

How to verify if a value exists at specific position in pandas dataframe (index and column) using a substring and for loop

















I have a script that reads hundreds of Excel files from a directory. The files share a similar format (all are .xls or .xlsx, so pandas can read them), and the script converts each one to a pandas DataFrame and writes it back out in the format I want. The first 10 rows of each Excel file are not needed, and neither are the empty rows and columns (example spreadsheet: I need all the data from row 11 down). I strip them out with df.iloc[9:, 0:47] (47 is the maximum column extent) and use df.dropna(how='all') to drop the rows that are entirely empty.



    for fn in os.listdir(path):
        file = os.path.join(path, fn)
        if os.path.isfile(file):
            ## I need the second sheet of each spreadsheet
            data = pd.read_excel(file, sheet_name=1)
            ## Skip the first 10 spreadsheet rows (the first row becomes the
            ## header, so this drops 9 data rows) and keep at most 47 columns.
            relevantData = data.iloc[9:, 0:47]
            ## dropna returns a new frame, so the result has to be assigned.
            relevantData = relevantData.dropna(how='all')
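A minimal sketch of the trimming step on an in-memory frame, assuming made-up data (the names `raw` and `trimmed` are illustrative and not part of the script). It shows that `iloc[9:, 0:47]` keeps at most 47 columns, and that `dropna(how='all')` returns a new frame, so the result has to be assigned for the empty rows to actually disappear:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one sheet: 15 rows x 50 columns of numbers.
raw = pd.DataFrame(np.arange(15 * 50, dtype=float).reshape(15, 50))
raw.iloc[12, :] = np.nan  # simulate one completely empty row

# Positional slice: skip the first 9 rows, keep column positions 0..46.
trimmed = raw.iloc[9:, 0:47]

# dropna returns a new DataFrame; without assignment `trimmed` is unchanged.
trimmed.dropna(how="all")
rows_before = len(trimmed)

# Assigning the result is what actually removes the empty row.
trimmed = trimmed.dropna(how="all")
rows_after = len(trimmed)

print(trimmed.shape, rows_before, rows_after)
```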


However, I have found that some of the Excel files are one whole row short (example image). Their headers therefore end up on a different row, the data will not append in further processing, and the script stops with string-vs-float interpretation errors (the header cells are all strings and the rest of the data is mostly numeric). I want to either filter out these files or dynamically add a row so that they conform to the majority of the files.



Is there a way, inside the for loop, to check whether the cell at df.iloc[0,0] (even after I have already used df.iloc earlier in the code) contains a substring that indicates whether the file is missing a row? If the cell contains the substring 'act', I know the format is correct and the file can continue to further processing. If the substring is absent (i.e. the file is one row short or in a nonstandard format), I would like to either add a row at the 9th index position (if possible) or at least set these files aside to preprocess manually.
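If padding a short file is preferred over setting it aside, one possible approach is to reindex the frame with a label that does not exist yet, which pandas fills with NaN. This is only a sketch: the helper name `insert_blank_row` and the sample data are assumptions, and whether a blank row is the right filler depends on what the missing row contained.

```python
import pandas as pd


def insert_blank_row(df: pd.DataFrame, position: int) -> pd.DataFrame:
    """Return a copy of df with an all-NaN row inserted at `position`."""
    n = len(df)
    df = df.reset_index(drop=True)  # labels are now 0..n-1
    # Label n does not exist, so reindex creates it as a row of NaN.
    order = list(range(position)) + [n] + list(range(position, n))
    return df.reindex(order).reset_index(drop=True)


# Hypothetical three-row frame; the blank row goes in at position 1.
sample = pd.DataFrame({"a": ["x", "y", "z"], "b": [1, 2, 3]})
padded = insert_blank_row(sample, 1)
print(padded)
```

Numeric columns are upcast to float by this step, because NaN cannot be stored in an integer column.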



I tried if df.iloc[row position, column position].str.contains('act'):, but at that point in the script the value is apparently not a Series. It raises AttributeError: 'str' object has no attribute 'str'.
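The `AttributeError` is consistent with how pandas indexing works: `iloc` with two integer positions returns the cell's own Python object (here a `str`), while the `.str` accessor exists only on a Series or Index. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical frame whose first cell holds the header text from the question.
df = pd.DataFrame({"col0": ["Action Item/Dig #", "12"], "col1": ["Owner", "7"]})

cell = df.iloc[0, 0]    # scalar: a plain Python str
column = df.iloc[:, 0]  # Series: has the .str accessor

print(type(cell).__name__)
print(type(column).__name__)

# Vectorised, case-insensitive test over the whole column.
mask = column.str.contains("act", case=False)
print(mask.tolist())
```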



Then I tried a different approach: if sub in df.iloc[0,0]: (where sub is a variable set to 'act'). But even when 'act' appeared to be at that position, the script took the False branch of the if statement, whereas I need it to take the True branch. (I also tried if sub in df[0,0]:, which raises KeyError: (0, 0).)
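A possible reason the `in` test falls through to the False branch: Python's substring test is case-sensitive, so the lowercase `'act'` is not found in `'Action Item/Dig #'`, which starts with a capital `A`. The sketch below lowercases both sides first and guards against non-string cells such as NaN; the helper name `cell_contains` and the sample frames are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def cell_contains(df, row, col, sub):
    """Case-insensitive substring test on the cell at positional (row, col)."""
    value = df.iloc[row, col]
    # Empty Excel cells arrive as float NaN, which has no .lower() method.
    if not isinstance(value, str):
        return False
    return sub.lower() in value.lower()


good = pd.DataFrame([["Action Item/Dig #", "Owner"], ["1", "A"]])
short = pd.DataFrame([[np.nan, "Owner"], ["1", "A"]])

print("act" in good.iloc[0, 0])            # plain test is case-sensitive
print(cell_contains(good, 0, 0, "act"))
print(cell_contains(short, 0, 0, "act"))
```

As for the `KeyError`, square brackets on a DataFrame select columns by label, so `df[0, 0]` looks for a column labelled `(0, 0)`; `iloc` is the accessor for positions.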



    import os
    import pandas as pd


    dfList = []
    path = r'H:\DirectoryWithExcelFiles'
    newpath = r'H:\FolderWithNewFiles_ThatContain_act'
    newpath2 = r'H:\FolderWithNonStandardFiles_DontContain_act'


    for fn in os.listdir(path):
        file = os.path.join(path, fn)
        if os.path.isfile(file):
            ## I need the second sheet of each spreadsheet
            data = pd.read_excel(file, sheet_name=1)
            ## Skip the first 10 spreadsheet rows (the first row becomes the
            ## header, so this drops 9 data rows) and keep at most 47 columns.
            relevantData = data.iloc[9:, 0:47]
            ## dropna returns a new frame, so the result has to be assigned.
            relevantData = relevantData.dropna(how='all')
            sub = 'act'

            ## This is the check that always takes the else branch.
            if sub in relevantData.iloc[0, 0]:
                # Create a Pandas Excel writer using XlsxWriter as the engine.
                writer1 = pd.ExcelWriter(os.path.join(newpath, fn), engine='xlsxwriter')

                # Convert the dataframe to an XlsxWriter Excel object.
                relevantData.to_excel(writer1, sheet_name='Sheet1', index=False, header=None)

                # Close the Pandas Excel writer and output the Excel file.
                writer1.close()

            else:
                ## Ideally I would want to add the new row to the data frame here at
                ## the 9th position and then send it back to the beginning of the loop.
                writer2 = pd.ExcelWriter(os.path.join(newpath2, fn), engine='xlsxwriter')

                # Convert the dataframe to an XlsxWriter Excel object.
                relevantData.to_excel(writer2, sheet_name='Sheet1', index=False, header=None)

                # Close the Pandas Excel writer and output the Excel file.
                writer2.close()


I expect that if the substring 'act' is in the value at relevantData.iloc[0,0], the file should be written to newpath. However, every file ends up in the else: branch (newpath2), even though printing the value just before the second if statement with print(relevantData.iloc[0,0]) shows the string 'Action Item/Dig #', which appears to contain 'act' (example picture of the position containing the substring).



Does anyone know why the check inside the for loop does not recognize the iloc[] position and confirm that the substring exists? I can provide sample spreadsheets if asked.










      excel python-3.x pandas for-loop substring














      asked Mar 26 at 15:02 by zmotuck; edited Mar 26 at 17:38 by zmotuck





















