Merging and cleaning up csv files in PythonHow to merge two dictionaries in a single expression?How do I check whether a file exists without exceptions?Calling an external command in PythonWhat are metaclasses in Python?Finding the index of an item given a list containing it in PythonHow can I safely create a nested directory?Does Python have a ternary conditional operator?How do I list all files of a directory?Does Python have a string 'contains' substring method?“Large data” work flows using pandas
Better random (unique) file name
Why would "dead languages" be the only languages that spells could be written in?
What happens if the limit of 4 billion files was exceeded in an ext4 partition?
How to find the version of extensions used on a Joomla website without access to the backend?
Examples of fluid (including air) being used to transmit digital data?
Do Goblin tokens count as Goblins?
Wouldn't putting an electronic key inside a small Faraday cage render it completely useless?
Taking advantage when HR forgets to communicate the rules
Is a lowball salary then a part-time offer standard Japanese employment negotiations?
What was the significance of Spider-Man: Far From Home being an MCU Phase 3 film instead of a Phase 4 film?
Passwordless authentication - how invalidate login code
How do resistors generate different heat if we make the current fixed and changed the voltage and resistance? Notice the flow of charge is constant
Find max number you can create from an array of numbers
How predictable is $RANDOM really?
What do I need to see before Spider-Man: Far From Home?
Implicit conversion between decimals with different precisions
How to reclaim personal item I've lent to the office without burning bridges?
Do grungs have a written language?
Park the computer
Chilling juice in copper vessel
I'm feeling like my character doesn't fit the campaign
Can the Four Elements monk's Shape the Flowing River elemental discipline create stairs by expending a single ki point?
How to factor a fourth degree polynomial
Why does this function pointer assignment work when assigned directly but not with the conditional operator?
Merging and cleaning up csv files in Python
How to merge two dictionaries in a single expression?How do I check whether a file exists without exceptions?Calling an external command in PythonWhat are metaclasses in Python?Finding the index of an item given a list containing it in PythonHow can I safely create a nested directory?Does Python have a ternary conditional operator?How do I list all files of a directory?Does Python have a string 'contains' substring method?“Large data” work flows using pandas
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
I have been using pandas but am open to all suggestions, I'm not an expert at scripting but am a complete loss. My goal is the following:
- Merge multiple CSV files. Was able to do this in Pandas and have a dataframe with the merged dataset.
Screenshot of how merged dataset looks like
Delete the duplicated "GEO" columns after the first set. This last part doesn't let me use
df = df.loc[:,~df.columns.duplicated()]
because they are not technically duplicated.The repeated column names end with a .1,.2,etc. as I am guessing the concate adds this. Other problem is that some columns have a duplicated column name but are different datasets. I have been using the first row as the index since it's always the same coded values but this row is unnecessary and will be deleted afterwards in the script. This is my biggest problem right now.Delete certain columns such as the ones with the "Margins". I use
~df2.columns.str.startswith
for this and have no trouble with this.- Replace spaces, ":" and ";" with underscores in the first row. I have no clue how to do this.
- Insert a new column, write '=TEXT(B1,0)' formula, do this for the whole column (formula would change to B2,B3, etc.), copy the column and paste as values. I was able to do this in openpyxl although was having trouble and was not able to try the final output thanks to excel trouble.
source = excel.Workbooks.Open(filename)
excel.Range("C1:C1337").Select()
excel.Selection.Copy()
excel.Selection.PasteSpecial(Paste=constants.xlPasteValues)
Not sure if it works and was wondering if it was possible in pandas, win32com or I should stay with openpyxl. Thanks all!
python excel
add a comment |
I have been using pandas but am open to all suggestions, I'm not an expert at scripting but am a complete loss. My goal is the following:
- Merge multiple CSV files. Was able to do this in Pandas and have a dataframe with the merged dataset.
Screenshot of how merged dataset looks like
Delete the duplicated "GEO" columns after the first set. This last part doesn't let me use
df = df.loc[:,~df.columns.duplicated()]
because they are not technically duplicated.The repeated column names end with a .1,.2,etc. as I am guessing the concate adds this. Other problem is that some columns have a duplicated column name but are different datasets. I have been using the first row as the index since it's always the same coded values but this row is unnecessary and will be deleted afterwards in the script. This is my biggest problem right now.Delete certain columns such as the ones with the "Margins". I use
~df2.columns.str.startswith
for this and have no trouble with this.- Replace spaces, ":" and ";" with underscores in the first row. I have no clue how to do this.
- Insert a new column, write '=TEXT(B1,0)' formula, do this for the whole column (formula would change to B2,B3, etc.), copy the column and paste as values. I was able to do this in openpyxl although was having trouble and was not able to try the final output thanks to excel trouble.
source = excel.Workbooks.Open(filename)
excel.Range("C1:C1337").Select()
excel.Selection.Copy()
excel.Selection.PasteSpecial(Paste=constants.xlPasteValues)
Not sure if it works and was wondering if it was possible in pandas, win32com or I should stay with openpyxl. Thanks all!
python excel
Can you please paste the data as text values instead of an image?
– Nathaniel
Mar 25 at 20:55
It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.
– Nathaniel
Mar 25 at 20:57
Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when usingdf = result.columns.duplicated(['Col.id','Col.id2'],keep='first)
I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with thestr.startswith()
. But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.
– BoniPhila
Mar 26 at 16:37
Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.
– Nathaniel
Mar 26 at 20:23
add a comment |
I have been using pandas but am open to all suggestions, I'm not an expert at scripting but am a complete loss. My goal is the following:
- Merge multiple CSV files. Was able to do this in Pandas and have a dataframe with the merged dataset.
Screenshot of how merged dataset looks like
Delete the duplicated "GEO" columns after the first set. This last part doesn't let me use
df = df.loc[:,~df.columns.duplicated()]
because they are not technically duplicated.The repeated column names end with a .1,.2,etc. as I am guessing the concate adds this. Other problem is that some columns have a duplicated column name but are different datasets. I have been using the first row as the index since it's always the same coded values but this row is unnecessary and will be deleted afterwards in the script. This is my biggest problem right now.Delete certain columns such as the ones with the "Margins". I use
~df2.columns.str.startswith
for this and have no trouble with this.- Replace spaces, ":" and ";" with underscores in the first row. I have no clue how to do this.
- Insert a new column, write '=TEXT(B1,0)' formula, do this for the whole column (formula would change to B2,B3, etc.), copy the column and paste as values. I was able to do this in openpyxl although was having trouble and was not able to try the final output thanks to excel trouble.
source = excel.Workbooks.Open(filename)
excel.Range("C1:C1337").Select()
excel.Selection.Copy()
excel.Selection.PasteSpecial(Paste=constants.xlPasteValues)
Not sure if it works and was wondering if it was possible in pandas, win32com or I should stay with openpyxl. Thanks all!
python excel
I have been using pandas but am open to all suggestions, I'm not an expert at scripting but am a complete loss. My goal is the following:
- Merge multiple CSV files. Was able to do this in Pandas and have a dataframe with the merged dataset.
Screenshot of how merged dataset looks like
Delete the duplicated "GEO" columns after the first set. This last part doesn't let me use
df = df.loc[:,~df.columns.duplicated()]
because they are not technically duplicated.The repeated column names end with a .1,.2,etc. as I am guessing the concate adds this. Other problem is that some columns have a duplicated column name but are different datasets. I have been using the first row as the index since it's always the same coded values but this row is unnecessary and will be deleted afterwards in the script. This is my biggest problem right now.Delete certain columns such as the ones with the "Margins". I use
~df2.columns.str.startswith
for this and have no trouble with this.- Replace spaces, ":" and ";" with underscores in the first row. I have no clue how to do this.
- Insert a new column, write '=TEXT(B1,0)' formula, do this for the whole column (formula would change to B2,B3, etc.), copy the column and paste as values. I was able to do this in openpyxl although was having trouble and was not able to try the final output thanks to excel trouble.
source = excel.Workbooks.Open(filename)
excel.Range("C1:C1337").Select()
excel.Selection.Copy()
excel.Selection.PasteSpecial(Paste=constants.xlPasteValues)
Not sure if it works and was wondering if it was possible in pandas, win32com or I should stay with openpyxl. Thanks all!
python excel
python excel
asked Mar 25 at 20:53
BoniPhilaBoniPhila
157 bronze badges
157 bronze badges
Can you please paste the data as text values instead of an image?
– Nathaniel
Mar 25 at 20:55
It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.
– Nathaniel
Mar 25 at 20:57
Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when usingdf = result.columns.duplicated(['Col.id','Col.id2'],keep='first)
I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with thestr.startswith()
. But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.
– BoniPhila
Mar 26 at 16:37
Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.
– Nathaniel
Mar 26 at 20:23
add a comment |
Can you please paste the data as text values instead of an image?
– Nathaniel
Mar 25 at 20:55
It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.
– Nathaniel
Mar 25 at 20:57
Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when usingdf = result.columns.duplicated(['Col.id','Col.id2'],keep='first)
I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with thestr.startswith()
. But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.
– BoniPhila
Mar 26 at 16:37
Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.
– Nathaniel
Mar 26 at 20:23
Can you please paste the data as text values instead of an image?
– Nathaniel
Mar 25 at 20:55
Can you please paste the data as text values instead of an image?
– Nathaniel
Mar 25 at 20:55
It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.
– Nathaniel
Mar 25 at 20:57
It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.
– Nathaniel
Mar 25 at 20:57
Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using
df = result.columns.duplicated(['Col.id','Col.id2'],keep='first)
I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with the str.startswith()
. But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.– BoniPhila
Mar 26 at 16:37
Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using
df = result.columns.duplicated(['Col.id','Col.id2'],keep='first)
I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with the str.startswith()
. But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.– BoniPhila
Mar 26 at 16:37
Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.
– Nathaniel
Mar 26 at 20:23
Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.
– Nathaniel
Mar 26 at 20:23
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55346245%2fmerging-and-cleaning-up-csv-files-in-python%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Is this question similar to what you get asked at work? Learn more about asking and sharing private information with your coworkers using Stack Overflow for Teams.
Is this question similar to what you get asked at work? Learn more about asking and sharing private information with your coworkers using Stack Overflow for Teams.
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55346245%2fmerging-and-cleaning-up-csv-files-in-python%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Can you please paste the data as text values instead of an image?
– Nathaniel
Mar 25 at 20:55
It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.
– Nathaniel
Mar 25 at 20:57
Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using
df = result.columns.duplicated(['Col.id','Col.id2'],keep='first)
I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with thestr.startswith()
. But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.– BoniPhila
Mar 26 at 16:37
Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.
– Nathaniel
Mar 26 at 20:23