Merging and cleaning up csv files in PythonHow to merge two dictionaries in a single expression?How do I check whether a file exists without exceptions?Calling an external command in PythonWhat are metaclasses in Python?Finding the index of an item given a list containing it in PythonHow can I safely create a nested directory?Does Python have a ternary conditional operator?How do I list all files of a directory?Does Python have a string 'contains' substring method?“Large data” work flows using pandas

Better random (unique) file name

Why would "dead languages" be the only languages that spells could be written in?

What happens if the limit of 4 billion files was exceeded in an ext4 partition?

How to find the version of extensions used on a Joomla website without access to the backend?

Examples of fluid (including air) being used to transmit digital data?

Do Goblin tokens count as Goblins?

Wouldn't putting an electronic key inside a small Faraday cage render it completely useless?

Taking advantage when HR forgets to communicate the rules

Is a lowball salary then a part-time offer standard Japanese employment negotiations?

What was the significance of Spider-Man: Far From Home being an MCU Phase 3 film instead of a Phase 4 film?

Passwordless authentication - how invalidate login code

How do resistors generate different heat if we make the current fixed and changed the voltage and resistance? Notice the flow of charge is constant

Find max number you can create from an array of numbers

How predictable is $RANDOM really?

What do I need to see before Spider-Man: Far From Home?

Implicit conversion between decimals with different precisions

How to reclaim personal item I've lent to the office without burning bridges?

Do grungs have a written language?

Park the computer

Chilling juice in copper vessel

I'm feeling like my character doesn't fit the campaign

Can the Four Elements monk's Shape the Flowing River elemental discipline create stairs by expending a single ki point?

How to factor a fourth degree polynomial

Why does this function pointer assignment work when assigned directly but not with the conditional operator?

Merging and cleaning up csv files in Python

How to merge two dictionaries in a single expression?How do I check whether a file exists without exceptions?Calling an external command in PythonWhat are metaclasses in Python?Finding the index of an item given a list containing it in PythonHow can I safely create a nested directory?Does Python have a ternary conditional operator?How do I list all files of a directory?Does Python have a string 'contains' substring method?“Large data” work flows using pandas

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;

I have been using pandas but am open to all suggestions, I'm not an expert at scripting but am a complete loss. My goal is the following:

Merge multiple CSV files. Was able to do this in Pandas and have a dataframe with the merged dataset.

Screenshot of how merged dataset looks like

Delete the duplicated "GEO" columns after the first set. This last part doesn't let me usedf = df.loc[:,~df.columns.duplicated()] because they are not technically duplicated.The repeated column names end with a .1,.2,etc. as I am guessing the concate adds this. Other problem is that some columns have a duplicated column name but are different datasets. I have been using the first row as the index since it's always the same coded values but this row is unnecessary and will be deleted afterwards in the script. This is my biggest problem right now.

Delete certain columns such as the ones with the "Margins". I use ~df2.columns.str.startswith for this and have no trouble with this.

Replace spaces, ":" and ";" with underscores in the first row. I have no clue how to do this.

Insert a new column, write '=TEXT(B1,0)' formula, do this for the whole column (formula would change to B2,B3, etc.), copy the column and paste as values. I was able to do this in openpyxl although was having trouble and was not able to try the final output thanks to excel trouble. source = excel.Workbooks.Open(filename) excel.Range("C1:C1337").Select() excel.Selection.Copy() excel.Selection.PasteSpecial(Paste=constants.xlPasteValues)
Not sure if it works and was wondering if it was possible in pandas, win32com or I should stay with openpyxl. Thanks all!

asked Mar 25 at 20:53

BoniPhila

157 bronze badges

Can you please paste the data as text values instead of an image?

– Nathaniel
Mar 25 at 20:55

It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.

– Nathaniel
Mar 25 at 20:57

Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using df = result.columns.duplicated(['Col.id','Col.id2'],keep='first) I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with the str.startswith(). But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.

– BoniPhila
Mar 26 at 16:37

Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.

– Nathaniel
Mar 26 at 20:23

add a comment |

I have been using pandas but am open to all suggestions, I'm not an expert at scripting but am a complete loss. My goal is the following:

Merge multiple CSV files. Was able to do this in Pandas and have a dataframe with the merged dataset.

Screenshot of how merged dataset looks like

Delete the duplicated "GEO" columns after the first set. This last part doesn't let me usedf = df.loc[:,~df.columns.duplicated()] because they are not technically duplicated.The repeated column names end with a .1,.2,etc. as I am guessing the concate adds this. Other problem is that some columns have a duplicated column name but are different datasets. I have been using the first row as the index since it's always the same coded values but this row is unnecessary and will be deleted afterwards in the script. This is my biggest problem right now.

Delete certain columns such as the ones with the "Margins". I use ~df2.columns.str.startswith for this and have no trouble with this.

Replace spaces, ":" and ";" with underscores in the first row. I have no clue how to do this.

Insert a new column, write '=TEXT(B1,0)' formula, do this for the whole column (formula would change to B2,B3, etc.), copy the column and paste as values. I was able to do this in openpyxl although was having trouble and was not able to try the final output thanks to excel trouble. source = excel.Workbooks.Open(filename) excel.Range("C1:C1337").Select() excel.Selection.Copy() excel.Selection.PasteSpecial(Paste=constants.xlPasteValues)
Not sure if it works and was wondering if it was possible in pandas, win32com or I should stay with openpyxl. Thanks all!

asked Mar 25 at 20:53

BoniPhila

157 bronze badges

Can you please paste the data as text values instead of an image?

– Nathaniel
Mar 25 at 20:55

It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.

– Nathaniel
Mar 25 at 20:57

Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using df = result.columns.duplicated(['Col.id','Col.id2'],keep='first) I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with the str.startswith(). But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.

– BoniPhila
Mar 26 at 16:37

Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.

– Nathaniel
Mar 26 at 20:23

add a comment |

I have been using pandas but am open to all suggestions, I'm not an expert at scripting but am a complete loss. My goal is the following:

Merge multiple CSV files. Was able to do this in Pandas and have a dataframe with the merged dataset.

Screenshot of how merged dataset looks like

Delete the duplicated "GEO" columns after the first set. This last part doesn't let me usedf = df.loc[:,~df.columns.duplicated()] because they are not technically duplicated.The repeated column names end with a .1,.2,etc. as I am guessing the concate adds this. Other problem is that some columns have a duplicated column name but are different datasets. I have been using the first row as the index since it's always the same coded values but this row is unnecessary and will be deleted afterwards in the script. This is my biggest problem right now.

Delete certain columns such as the ones with the "Margins". I use ~df2.columns.str.startswith for this and have no trouble with this.

Replace spaces, ":" and ";" with underscores in the first row. I have no clue how to do this.

Insert a new column, write '=TEXT(B1,0)' formula, do this for the whole column (formula would change to B2,B3, etc.), copy the column and paste as values. I was able to do this in openpyxl although was having trouble and was not able to try the final output thanks to excel trouble. source = excel.Workbooks.Open(filename) excel.Range("C1:C1337").Select() excel.Selection.Copy() excel.Selection.PasteSpecial(Paste=constants.xlPasteValues)
Not sure if it works and was wondering if it was possible in pandas, win32com or I should stay with openpyxl. Thanks all!

asked Mar 25 at 20:53

BoniPhila

157 bronze badges

I have been using pandas but am open to all suggestions, I'm not an expert at scripting but am a complete loss. My goal is the following:

Merge multiple CSV files. Was able to do this in Pandas and have a dataframe with the merged dataset.

Screenshot of how merged dataset looks like

Delete the duplicated "GEO" columns after the first set. This last part doesn't let me usedf = df.loc[:,~df.columns.duplicated()] because they are not technically duplicated.The repeated column names end with a .1,.2,etc. as I am guessing the concate adds this. Other problem is that some columns have a duplicated column name but are different datasets. I have been using the first row as the index since it's always the same coded values but this row is unnecessary and will be deleted afterwards in the script. This is my biggest problem right now.

Delete certain columns such as the ones with the "Margins". I use ~df2.columns.str.startswith for this and have no trouble with this.

Replace spaces, ":" and ";" with underscores in the first row. I have no clue how to do this.

Insert a new column, write '=TEXT(B1,0)' formula, do this for the whole column (formula would change to B2,B3, etc.), copy the column and paste as values. I was able to do this in openpyxl although was having trouble and was not able to try the final output thanks to excel trouble. source = excel.Workbooks.Open(filename) excel.Range("C1:C1337").Select() excel.Selection.Copy() excel.Selection.PasteSpecial(Paste=constants.xlPasteValues)
Not sure if it works and was wondering if it was possible in pandas, win32com or I should stay with openpyxl. Thanks all!

python excel

asked Mar 25 at 20:53

BoniPhila

157 bronze badges

asked Mar 25 at 20:53

BoniPhila

157 bronze badges

asked Mar 25 at 20:53

BoniPhila

157 bronze badges

asked Mar 25 at 20:53

BoniPhila

157 bronze badges

asked Mar 25 at 20:53

BoniPhila

157 bronze badges

Can you please paste the data as text values instead of an image?

– Nathaniel
Mar 25 at 20:55

It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.

– Nathaniel
Mar 25 at 20:57

Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using df = result.columns.duplicated(['Col.id','Col.id2'],keep='first) I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with the str.startswith(). But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.

– BoniPhila
Mar 26 at 16:37

Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.

– Nathaniel
Mar 26 at 20:23

add a comment |

Can you please paste the data as text values instead of an image?

– Nathaniel
Mar 25 at 20:55

It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.

– Nathaniel
Mar 25 at 20:57

Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using df = result.columns.duplicated(['Col.id','Col.id2'],keep='first) I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with the str.startswith(). But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.

– BoniPhila
Mar 26 at 16:37

Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.

– Nathaniel
Mar 26 at 20:23

Can you please paste the data as text values instead of an image?

– Nathaniel
Mar 25 at 20:55

It looks like some of these problems may be caused by the way you are merging the data frames. You might back-track to there to solve the issues.

– Nathaniel
Mar 25 at 20:57

Thank you @Nathaniel I was able to shorten the code and get the dataframe without .1,.2 at the end. Unfortunately when using df = result.columns.duplicated(['Col.id','Col.id2'],keep='first) I get the following error: TypeError: duplicated() got multiple values for keyword argument 'keep'. Works fine with the str.startswith(). But that would delete the first instance of the 'Col.id' which I dont want to do. I tried deleting the 'keep=' but get this error ValueError: keep must be either "first", "last" or False.

– BoniPhila
Mar 26 at 16:37

Glad to hear you are making progress. If you'd like to get some help here, please update your question to include a Minimal, Complete and Verifiable Example which includes example data in text form, the code required to demonstrate the problem, and a description of the desired output. Without this it is difficult to provide useful feedback. It may be helpful to post multiple specific questions and keep the focus on a single issue for each.

– Nathaniel
Mar 26 at 20:23

add a comment |

0

active

oldest

votes

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55346245%2fmerging-and-cleaning-up-csv-files-in-python%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

0

active

oldest

votes

0

active

oldest

votes

Is this question similar to what you get asked at work? Learn more about asking and sharing private information with your coworkers using Stack Overflow for Teams.

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Styjun

0

Your Answer

Post as a guest

0

0

Post as a guest

Popular posts from this blog

밀양 대씨 역사 각주 함께 보기 둘러보기 메뉴밀양 대씨

1973년 목차 사건 문화 탄생 사망 노벨상 달력 둘러보기 메뉴

0

Your Answer

Sign up or log in

Post as a guest

Post as a guest

0

0

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

밀양 대씨 역사 각주 함께 보기 둘러보기 메뉴밀양 대씨

1973년 목차 사건 문화 탄생 사망 노벨상 달력 둘러보기 메뉴