How to verify if a value exists at a specific position (index and column) in a pandas dataframe using a substring and a for loop
I have a script that reads hundreds of Excel files from a directory. The files are of a similar format (all are .xls or .xlsx, so readable by pandas) and the script converts them to pandas dataframes in order to write them back out in the format I want. The first 10 rows of each Excel file are not needed, and I don't need the empty rows and columns either (example spreadsheet: I need all the data from row 11 down), so I strip them out with df.iloc[9:, 0:47] (47 is the maximum extent of the columns) and use df.dropna(how='all') to drop the rows that are entirely empty.
for fn in os.listdir(path):
    file = os.path.join(path, fn)
    if os.path.isfile(file):
        ## I need the second sheet of each spreadsheet
        data = pd.read_excel(file, sheet_name=1, index=False)
        relevantData = data.iloc[9:, 0:47]  ## This removes the first 10 rows of
                                            ## useless data and keeps the first
                                            ## 47 columns.
        relevantData.dropna(how='all')
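(Side note: dropna returns a new dataframe rather than modifying it in place, so the result of the call above is discarded unless it is reassigned or inplace=True is passed; a one-line sketch of the assignment I believe is intended:)

relevantData = relevantData.dropna(how='all')  # dropna() returns a new frame; keep the result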
However, it has come to my attention that some of the Excel files are one whole row short (example image). The headers then do not line up, the data will not append during further processing, and the script stops with string-vs-float interpretation errors (the header columns are all strings while the rest of the data is mostly numeric). I want to either filter out these files or dynamically add the missing row so they conform to the majority of files.
Is there a way, inside the for loop, to check whether the value at the df.iloc[0,0] position (even after I have already sliced with df.iloc earlier in the code) contains a substring indicating that the file is missing a row? If the cell value contains the substring 'act', I know the format is correct and the file can continue to further processing. If the substring is not there (i.e. the file is one row short or otherwise nonstandard), I would like to either insert a row at the 9th positional index (if possible) or, at the very least, write these files out separately so I can preprocess them manually.
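To illustrate the kind of check and row insertion I have in mind, here is a minimal sketch (the contents of missing_row are a placeholder; the real header row would have up to 47 labels):

import pandas as pd

# Placeholder for the header row the short files are missing
missing_row = ['Action Item/Dig #'] + [''] * (relevantData.shape[1] - 1)

cell = str(relevantData.iloc[0, 0])   # force to string; the cell could be NaN or numeric
if 'act' in cell.lower():
    pass                              # standard layout, continue processing
else:
    # prepend the missing row so the frame matches the standard layout
    top = pd.DataFrame([missing_row], columns=relevantData.columns)
    relevantData = pd.concat([top, relevantData], ignore_index=True)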
I have tried if df.iloc[row_position, column_position].str.contains('act'):, but at this point in the script the indexed value is apparently no longer a Series; it throws AttributeError: 'str' object has no attribute 'str'.
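From what I can tell, df.iloc[0, 0] returns the scalar cell value (a plain Python string here), and the .str accessor only exists on a Series, which is why the AttributeError appears; a plain substring test on the scalar seems to be the equivalent:

value = df.iloc[0, 0]            # scalar cell value, not a Series
# value.str.contains('act')      # AttributeError: 'str' object has no attribute 'str'
has_act = 'act' in str(value)    # ordinary Python substring test on the scalar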
Then I tried a different approach: if sub in df.iloc[0,0]: (where sub is a variable set to 'act'). But even when 'act' existed at that position, the script sent the file down the False branch of the condition, whereas I need it to take the True branch. (I also tried if sub in df[0,0]:, which throws KeyError: (0, 0).)
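As far as I understand, df[0, 0] is interpreted as a column lookup with the tuple (0, 0) as the key, not positional indexing, which explains the KeyError:

# df[0, 0]            # column lookup with key (0, 0) -> KeyError: (0, 0)
cell = df.iloc[0, 0]  # positional access: first row, first column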
import os
import pandas as pd

dfList = []
path = 'H:\\DirectoryWithExcelFiles'
newpath = 'H:\\FolderWithNewFiles_ThatContain_act'
newpath2 = 'H:\\FolderWithNonStandardFiles_DontContain_act'

for fn in os.listdir(path):
    file = os.path.join(path, fn)
    if os.path.isfile(file):
        ## I need the second sheet of each spreadsheet
        data = pd.read_excel(file, sheet_name=1, index=False)
        relevantData = data.iloc[9:, 0:47]  ## This removes the first 10 rows of
                                            ## useless data and keeps the first
                                            ## 47 columns.
        relevantData.dropna(how='all')
        sub = 'act'
        if sub in relevantData[0,0]:
            # Create a Pandas Excel writer using XlsxWriter as the engine.
            writer1 = pd.ExcelWriter('H:\\FolderWithNewFiles_ThatContain_act\\' + fn, engine='xlsxwriter')
            # Convert the dataframe to an XlsxWriter Excel object.
            relevantData.to_excel(writer1, sheet_name='Sheet1', index=False, header=None)
            # Close the Pandas Excel writer and output the Excel file.
            writer1.save()
        else:
            ## Ideally I would add the missing row to the dataframe here at the
            ## 9th position and then send it back to the start of the loop.
            writer2 = pd.ExcelWriter('H:\\FolderWithNonStandardFiles_DontContain_act\\' + fn, engine='xlsxwriter')
            # Convert the dataframe to an XlsxWriter Excel object.
            relevantData.to_excel(writer2, sheet_name='Sheet1', index=False, header=None)
            # Close the Pandas Excel writer and output the Excel file.
            writer2.save()
I expect that when the substring 'act' is in the value at iloc[0, 0], the file should be written out to newpath; however, every file only ever takes the else: branch (newpath2), even though printing the value just before the check with print(relevantData.iloc[0,0]) shows the string 'Action Item/Dig #', which clearly contains the substring (example picture of the position showing the substring).
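For reference, here is a standalone sketch of the comparison I am trying to express, written with an explicit str() conversion and a case-insensitive match in case either of those is what trips up the check ('act' versus the capital 'A' in 'Action Item/Dig #'):

sub = 'act'
cell = str(relevantData.iloc[0, 0])        # e.g. 'Action Item/Dig #'
is_standard = sub.lower() in cell.lower()  # case-insensitive substring test
if is_standard:
    print('write to newpath')              # standard layout
else:
    print('write to newpath2')             # short / nonstandard file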
Does anyone have an idea why the check in the for loop will not recognize the value at the iloc[] position and confirm whether the substring exists? I can provide sample spreadsheets if asked.
excel python-3.x pandas for-loop substring
asked Mar 26 at 15:02 by zmotuck (edited Mar 26 at 17:38)