How to verify if a value exists at a specific position (index and column) in a pandas dataframe using a substring and a for loop
I have a script that reads hundreds of Excel files from a directory. The files are of a similar format (all are .xls or .xlsx, so readable by pandas) and the script converts them to pandas dataframes in order to write them back out in the format I want. The first 10 rows of each Excel file are not needed, and I don't need the empty rows and columns either (example spreadsheet: I need all the data from row 11 down), so I strip them out with df.iloc[9:, 0:47] (47 is the maximum extent of the columns) and use df.dropna(how='all') to drop the rows that are entirely empty.
for fn in os.listdir(path):
    file = os.path.join(path, fn)
    if os.path.isfile(file):
        ## I need the second sheet of each spreadsheet
        data = pd.read_excel(file, sheet_name=1, index=False)
        relevantData = data.iloc[9:, 0:47]  ## This removes the first 10 rows of
                                            ## useless data and keeps the first
                                            ## 47 columns.
        relevantData.dropna(how='all')
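(Side note: dropna returns a new dataframe rather than modifying it in place, so the result of the call above is discarded unless it is reassigned or inplace=True is passed; a one-line sketch of the assignment I believe is intended:)

relevantData = relevantData.dropna(how='all')  # dropna() returns a new frame; keep the result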
However, it has come to my attention that some of the Excel files are one whole row short (example image). The headers then do not line up, the data will not append during further processing, and the script stops with string-vs-float interpretation errors (the header columns are all strings while the rest of the data is mostly numeric). I want to either filter out these files or dynamically add the missing row so they conform to the majority of files.
Is there a way, inside the for loop, to check whether the value at the df.iloc[0,0] position (even after I have already sliced with df.iloc earlier in the code) contains a substring indicating that the file is missing a row? If the cell value contains the substring 'act', I know the format is correct and the file can continue to further processing. If the substring is not there (i.e. the file is one row short or otherwise nonstandard), I would like to either insert a row at the 9th positional index (if possible) or, at the very least, write these files out separately so I can preprocess them manually.
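To illustrate the kind of check and row insertion I have in mind, here is a minimal sketch (the contents of missing_row are a placeholder; the real header row would have up to 47 labels):

import pandas as pd

# Placeholder for the header row the short files are missing
missing_row = ['Action Item/Dig #'] + [''] * (relevantData.shape[1] - 1)

cell = str(relevantData.iloc[0, 0])   # force to string; the cell could be NaN or numeric
if 'act' in cell.lower():
    pass                              # standard layout, continue processing
else:
    # prepend the missing row so the frame matches the standard layout
    top = pd.DataFrame([missing_row], columns=relevantData.columns)
    relevantData = pd.concat([top, relevantData], ignore_index=True)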
I have tried if df.iloc[row_position, column_position].str.contains('act'):, but at this point in the script the indexed value is apparently no longer a Series; it throws AttributeError: 'str' object has no attribute 'str'.
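From what I can tell, df.iloc[0, 0] returns the scalar cell value (a plain Python string here), and the .str accessor only exists on a Series, which is why the AttributeError appears; a plain substring test on the scalar seems to be the equivalent:

value = df.iloc[0, 0]            # scalar cell value, not a Series
# value.str.contains('act')      # AttributeError: 'str' object has no attribute 'str'
has_act = 'act' in str(value)    # ordinary Python substring test on the scalar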
Then I tried a different approach: if sub in df.iloc[0,0]: (where sub is a variable set to 'act'). But even when 'act' existed at that position, the script sent the file down the False branch of the condition, whereas I need it to take the True branch. (I also tried if sub in df[0,0]:, which throws KeyError: (0, 0).)
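As far as I understand, df[0, 0] is interpreted as a column lookup with the tuple (0, 0) as the key, not positional indexing, which explains the KeyError:

# df[0, 0]            # column lookup with key (0, 0) -> KeyError: (0, 0)
cell = df.iloc[0, 0]  # positional access: first row, first column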
import os
import pandas as pd

dfList = []
path = 'H:\\DirectoryWithExcelFiles'
newpath = 'H:\\FolderWithNewFiles_ThatContain_act'
newpath2 = 'H:\\FolderWithNonStandardFiles_DontContain_act'

for fn in os.listdir(path):
    file = os.path.join(path, fn)
    if os.path.isfile(file):
        ## I need the second sheet of each spreadsheet
        data = pd.read_excel(file, sheet_name=1, index=False)
        relevantData = data.iloc[9:, 0:47]  ## This removes the first 10 rows of
                                            ## useless data and keeps the first
                                            ## 47 columns.
        relevantData.dropna(how='all')
        sub = 'act'
        if sub in relevantData[0,0]:
            # Create a Pandas Excel writer using XlsxWriter as the engine.
            writer1 = pd.ExcelWriter('H:\\FolderWithNewFiles_ThatContain_act\\' + fn, engine='xlsxwriter')
            # Convert the dataframe to an XlsxWriter Excel object.
            relevantData.to_excel(writer1, sheet_name='Sheet1', index=False, header=None)
            # Close the Pandas Excel writer and output the Excel file.
            writer1.save()
        else:
            ## Ideally I would add the missing row to the dataframe here at the
            ## 9th position and then send it back to the start of the loop.
            writer2 = pd.ExcelWriter('H:\\FolderWithNonStandardFiles_DontContain_act\\' + fn, engine='xlsxwriter')
            # Convert the dataframe to an XlsxWriter Excel object.
            relevantData.to_excel(writer2, sheet_name='Sheet1', index=False, header=None)
            # Close the Pandas Excel writer and output the Excel file.
            writer2.save()
I expect that when the substring 'act' is in the value at iloc[0, 0], the file should be written out to newpath; however, every file only ever takes the else: branch (newpath2), even though printing the value just before the check with print(relevantData.iloc[0,0]) shows the string 'Action Item/Dig #', which clearly contains the substring (example picture of the position showing the substring).
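For reference, here is a standalone sketch of the comparison I am trying to express, written with an explicit str() conversion and a case-insensitive match in case either of those is what trips up the check ('act' versus the capital 'A' in 'Action Item/Dig #'):

sub = 'act'
cell = str(relevantData.iloc[0, 0])        # e.g. 'Action Item/Dig #'
is_standard = sub.lower() in cell.lower()  # case-insensitive substring test
if is_standard:
    print('write to newpath')              # standard layout
else:
    print('write to newpath2')             # short / nonstandard file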
Does anyone have an idea why the check in the for loop will not recognize the value at the iloc[] position and confirm whether the substring exists? I can provide sample spreadsheets if asked.
excel python-3.x pandas for-loop substring
asked Mar 26 at 15:02 by zmotuck (edited Mar 26 at 17:38)