Merging and cleaning up csv files in PythonHow to merge two dictionaries in a single expression?How do I check whether a file exists without exceptions?Calling an external command in PythonWhat are metaclasses in Python?Finding the index of an item given a list containing it in PythonHow can I safely create a nested directory?Does Python have a ternary conditional operator?How do I list all files of a directory?Does Python have a string 'contains' substring method?“Large data” work flows using pandas

Better random (unique) file name

Why would "dead languages" be the only languages that spells could be written in?

What happens if the limit of 4 billion files was exceeded in an ext4 partition?

How to find the version of extensions used on a Joomla website without access to the backend?

Examples of fluid (including air) being used to transmit digital data?

Do Goblin tokens count as Goblins?

Wouldn't putting an electronic key inside a small Faraday cage render it completely useless?

Taking advantage when HR forgets to communicate the rules

Is a lowball salary then a part-time offer standard Japanese employment negotiations?

What was the significance of Spider-Man: Far From Home being an MCU Phase 3 film instead of a Phase 4 film?

Passwordless authentication - how invalidate login code

How do resistors generate different heat if we make the current fixed and changed the voltage and resistance? Notice the flow of charge is constant

Find max number you can create from an array of numbers

How predictable is $RANDOM really?

What do I need to see before Spider-Man: Far From Home?

Implicit conversion between decimals with different precisions

How to reclaim personal item I've lent to the office without burning bridges?

Do grungs have a written language?

Park the computer

Chilling juice in copper vessel

I'm feeling like my character doesn't fit the campaign

Can the Four Elements monk's Shape the Flowing River elemental discipline create stairs by expending a single ki point?

How to factor a fourth degree polynomial

Why does this function pointer assignment work when assigned directly but not with the conditional operator?



Merging and cleaning up csv files in Python


How to merge two dictionaries in a single expression?How do I check whether a file exists without exceptions?Calling an external command in PythonWhat are metaclasses in Python?Finding the index of an item given a list containing it in PythonHow can I safely create a nested directory?Does Python have a ternary conditional operator?How do I list all files of a directory?Does Python have a string 'contains' substring method?“Large data” work flows using pandas






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;








0















I have been using pandas but am open to all suggestions, I'm not an expert at scripting but am a complete loss. My goal is the following:



  • Merge multiple CSV files. Was able to do this in Pandas and have a dataframe with the merged dataset.

Screenshot of how merged dataset looks like



  • Delete the duplicated "GEO" columns after the first set. This last part doesn't let me usedf = df.loc[:,~df.columns.duplicated()] because they are not technically duplicated.The repeated column names end with a .1,.2,etc. as I am guessing the concate adds this. Other problem is that some columns have a duplicated column name but are different datasets. I have been using the first row as the index since it's always the same coded values but this row is unnecessary and will be deleted afterwards in the script. This is my biggest problem right now.


  • Delete certain columns such as the ones with the "Margins". I use ~df2.columns.str.startswith for this and have no trouble with this.


  • Replace spaces, ":" and ";" with underscores in the first row. I have no clue how to do this.

  • Insert a new column, write '=TEXT(B1,0)' formula, do this for the whole column (formula would change to B2,B3, etc.), copy the column and paste as values. I was able to do this in openpyxl although was having trouble and was not able to try the final output thanks to excel trouble.
    source = excel.Workbooks.Open(filename)
    excel.Range("C1:C1337").Select()
    excel.Selection.Copy()
    excel.Selection.PasteSpecial(Paste=constants.xlPasteValues)

    Not sure if it works and was wondering if it was possible in pandas, win32com or I should stay with openpyxl. Thanks all!









share|improve this question






















  • Can you please paste the data as text values instead of an image?

    – Nathaniel
    Mar 25 at 20:55











  • It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.

    – Nathaniel
    Mar 25 at 20:57











  • Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using df = result.columns.duplicated(['Col.id','Col.id2'],keep='first) I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with the str.startswith(). But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.

    – BoniPhila
    Mar 26 at 16:37












  • Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.

    – Nathaniel
    Mar 26 at 20:23


















0















I have been using pandas but am open to all suggestions, I'm not an expert at scripting but am a complete loss. My goal is the following:



  • Merge multiple CSV files. Was able to do this in Pandas and have a dataframe with the merged dataset.

Screenshot of how merged dataset looks like



  • Delete the duplicated "GEO" columns after the first set. This last part doesn't let me usedf = df.loc[:,~df.columns.duplicated()] because they are not technically duplicated.The repeated column names end with a .1,.2,etc. as I am guessing the concate adds this. Other problem is that some columns have a duplicated column name but are different datasets. I have been using the first row as the index since it's always the same coded values but this row is unnecessary and will be deleted afterwards in the script. This is my biggest problem right now.


  • Delete certain columns such as the ones with the "Margins". I use ~df2.columns.str.startswith for this and have no trouble with this.


  • Replace spaces, ":" and ";" with underscores in the first row. I have no clue how to do this.

  • Insert a new column, write '=TEXT(B1,0)' formula, do this for the whole column (formula would change to B2,B3, etc.), copy the column and paste as values. I was able to do this in openpyxl although was having trouble and was not able to try the final output thanks to excel trouble.
    source = excel.Workbooks.Open(filename)
    excel.Range("C1:C1337").Select()
    excel.Selection.Copy()
    excel.Selection.PasteSpecial(Paste=constants.xlPasteValues)

    Not sure if it works and was wondering if it was possible in pandas, win32com or I should stay with openpyxl. Thanks all!









share|improve this question






















  • Can you please paste the data as text values instead of an image?

    – Nathaniel
    Mar 25 at 20:55











  • It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.

    – Nathaniel
    Mar 25 at 20:57











  • Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using df = result.columns.duplicated(['Col.id','Col.id2'],keep='first) I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with the str.startswith(). But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.

    – BoniPhila
    Mar 26 at 16:37












  • Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.

    – Nathaniel
    Mar 26 at 20:23














0












0








0








I have been using pandas but am open to all suggestions, I'm not an expert at scripting but am a complete loss. My goal is the following:



  • Merge multiple CSV files. Was able to do this in Pandas and have a dataframe with the merged dataset.

Screenshot of how merged dataset looks like



  • Delete the duplicated "GEO" columns after the first set. This last part doesn't let me usedf = df.loc[:,~df.columns.duplicated()] because they are not technically duplicated.The repeated column names end with a .1,.2,etc. as I am guessing the concate adds this. Other problem is that some columns have a duplicated column name but are different datasets. I have been using the first row as the index since it's always the same coded values but this row is unnecessary and will be deleted afterwards in the script. This is my biggest problem right now.


  • Delete certain columns such as the ones with the "Margins". I use ~df2.columns.str.startswith for this and have no trouble with this.


  • Replace spaces, ":" and ";" with underscores in the first row. I have no clue how to do this.

  • Insert a new column, write '=TEXT(B1,0)' formula, do this for the whole column (formula would change to B2,B3, etc.), copy the column and paste as values. I was able to do this in openpyxl although was having trouble and was not able to try the final output thanks to excel trouble.
    source = excel.Workbooks.Open(filename)
    excel.Range("C1:C1337").Select()
    excel.Selection.Copy()
    excel.Selection.PasteSpecial(Paste=constants.xlPasteValues)

    Not sure if it works and was wondering if it was possible in pandas, win32com or I should stay with openpyxl. Thanks all!









share|improve this question














I have been using pandas but am open to all suggestions, I'm not an expert at scripting but am a complete loss. My goal is the following:



  • Merge multiple CSV files. Was able to do this in Pandas and have a dataframe with the merged dataset.

Screenshot of how merged dataset looks like



  • Delete the duplicated "GEO" columns after the first set. This last part doesn't let me usedf = df.loc[:,~df.columns.duplicated()] because they are not technically duplicated.The repeated column names end with a .1,.2,etc. as I am guessing the concate adds this. Other problem is that some columns have a duplicated column name but are different datasets. I have been using the first row as the index since it's always the same coded values but this row is unnecessary and will be deleted afterwards in the script. This is my biggest problem right now.


  • Delete certain columns such as the ones with the "Margins". I use ~df2.columns.str.startswith for this and have no trouble with this.


  • Replace spaces, ":" and ";" with underscores in the first row. I have no clue how to do this.

  • Insert a new column, write '=TEXT(B1,0)' formula, do this for the whole column (formula would change to B2,B3, etc.), copy the column and paste as values. I was able to do this in openpyxl although was having trouble and was not able to try the final output thanks to excel trouble.
    source = excel.Workbooks.Open(filename)
    excel.Range("C1:C1337").Select()
    excel.Selection.Copy()
    excel.Selection.PasteSpecial(Paste=constants.xlPasteValues)

    Not sure if it works and was wondering if it was possible in pandas, win32com or I should stay with openpyxl. Thanks all!






python excel






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Mar 25 at 20:53









BoniPhilaBoniPhila

157 bronze badges




157 bronze badges












  • Can you please paste the data as text values instead of an image?

    – Nathaniel
    Mar 25 at 20:55











  • It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.

    – Nathaniel
    Mar 25 at 20:57











  • Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using df = result.columns.duplicated(['Col.id','Col.id2'],keep='first) I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with the str.startswith(). But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.

    – BoniPhila
    Mar 26 at 16:37












  • Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.

    – Nathaniel
    Mar 26 at 20:23


















  • Can you please paste the data as text values instead of an image?

    – Nathaniel
    Mar 25 at 20:55











  • It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.

    – Nathaniel
    Mar 25 at 20:57











  • Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using df = result.columns.duplicated(['Col.id','Col.id2'],keep='first) I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with the str.startswith(). But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.

    – BoniPhila
    Mar 26 at 16:37












  • Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.

    – Nathaniel
    Mar 26 at 20:23

















Can you please paste the data as text values instead of an image?

– Nathaniel
Mar 25 at 20:55





Can you please paste the data as text values instead of an image?

– Nathaniel
Mar 25 at 20:55













It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.

– Nathaniel
Mar 25 at 20:57





It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.

– Nathaniel
Mar 25 at 20:57













Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using df = result.columns.duplicated(['Col.id','Col.id2'],keep='first) I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with the str.startswith(). But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.

– BoniPhila
Mar 26 at 16:37






Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using df = result.columns.duplicated(['Col.id','Col.id2'],keep='first) I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with the str.startswith(). But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.

– BoniPhila
Mar 26 at 16:37














Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.

– Nathaniel
Mar 26 at 20:23






Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.

– Nathaniel
Mar 26 at 20:23













0






active

oldest

votes










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55346245%2fmerging-and-cleaning-up-csv-files-in-python%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes




Is this question similar to what you get asked at work? Learn more about asking and sharing private information with your coworkers using Stack Overflow for Teams.







Is this question similar to what you get asked at work? Learn more about asking and sharing private information with your coworkers using Stack Overflow for Teams.



















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55346245%2fmerging-and-cleaning-up-csv-files-in-python%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Kamusi Yaliyomo Aina za kamusi | Muundo wa kamusi | Faida za kamusi | Dhima ya picha katika kamusi | Marejeo | Tazama pia | Viungo vya nje | UrambazajiKuhusu kamusiGo-SwahiliWiki-KamusiKamusi ya Kiswahili na Kiingerezakuihariri na kuongeza habari

Swift 4 - func physicsWorld not invoked on collision? The Next CEO of Stack OverflowHow to call Objective-C code from Swift#ifdef replacement in the Swift language@selector() in Swift?#pragma mark in Swift?Swift for loop: for index, element in array?dispatch_after - GCD in Swift?Swift Beta performance: sorting arraysSplit a String into an array in Swift?The use of Swift 3 @objc inference in Swift 4 mode is deprecated?How to optimize UITableViewCell, because my UITableView lags

Access current req object everywhere in Node.js ExpressWhy are global variables considered bad practice? (node.js)Using req & res across functionsHow do I get the path to the current script with Node.js?What is Node.js' Connect, Express and “middleware”?Node.js w/ express error handling in callbackHow to access the GET parameters after “?” in Express?Modify Node.js req object parametersAccess “app” variable inside of ExpressJS/ConnectJS middleware?Node.js Express app - request objectAngular Http Module considered middleware?Session variables in ExpressJSAdd properties to the req object in expressjs with Typescript