Merge DF's on str ID and take averages


I have a bunch of CSVs in a folder, in this format:



chunk timecodes chunk_completed chunk_id diffs_avg sd
0 [53] [[45930]] [45930] 53
1 [53, 50] [[45930], [46480]] [46480] 53-50 550.0
2 [53, 50, 63] [[45930], [46480], [47980]] [47980] 53-50-63 1025.0 671.7514421272201
3 [53, 50, 63, 60] [[45930], [46480], [47980], [49360]] [49360] 53-50-63-60 1143.3333333333333 517.3329037798903
4 [53, 50, 63, 60, 73] [[45930], [46480], [47980], [49360], [50040]] [50040] 53-50-63-60-73 1027.5 481.75893003313035
5 [53, 50, 63, 60, 73, 70] [[45930], [46480], [47980], [49360], [50040], [50310]] [50310] 53-50-63-60-73-70 876.0 537.4290650867331
6 [50] [[46480]] [46480] 50
7 [50, 63] [[46480], [47980]] [47980] 50-63 1500.0
8 [50, 63, 60] [[46480], [47980], [49360]] [49360] 50-63-60 1440.0 84.8528137423857
9 [50, 63, 60, 73] [[46480], [47980], [49360], [50040]] [50040] 50-63-60-73 1186.6666666666667 442.86943147313


I read them in as DataFrames and make a list of them:



import glob
import pandas as pd

csvs = []
list_of_files = glob.glob('*.{}'.format('csv'))  # note: '*.'.format('csv') evaluates to '*.' and matches nothing useful
for file in list_of_files:
    f = pd.read_csv(file)
    csvs.append(f)


What I am trying to do is reduce them to a single dataframe with no repeated "chunk_id" values; instead, I would like to merge on this ID.



I tried this:



from functools import reduce
red = reduce(pd.merge, csvs)


which just gives me a really wide dataframe with no entries.
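(As an aside, a minimal sketch with two toy frames, not the real CSVs, of what pd.merge does by default: with no on= argument it joins on every column the frames share, using an inner join, so a row only survives when all of its values match in both frames, which is why the reduced frame comes out empty; passing on='chunk_id' keeps the rows but duplicates the other columns with _x/_y suffixes rather than averaging them.)



import pandas as pd

df1 = pd.DataFrame({'chunk_id': ['60-62', '58-53'], 'diffs_avg': [2, 4]})
df2 = pd.DataFrame({'chunk_id': ['60-62', '30-33'], 'diffs_avg': [4, 6]})

# Joins on both shared columns ('chunk_id' and 'diffs_avg'): no row matches -> empty frame.
print(pd.merge(df1, df2))

# Joining only on 'chunk_id' keeps '60-62', but as diffs_avg_x / diffs_avg_y columns.
print(pd.merge(df1, df2, on='chunk_id'))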



I haven't attempted the averaging yet, but I would like to end up with a dataframe that has exactly the same columns as the example above, where rows sharing the same "chunk_id" across all the dataframes are merged into one, with their "diffs_avg", "timecodes", "chunk_completed" and "sd" columns averaged.



So, if I had read in the following dfs:



DF1



chunk timecodes chunk_completed chunk_id diffs_avg sd
[60 62] [100, 200] 500 60-62 2 1
[58 53] [800, 900] 1000 58-53 4 6


DF2



chunk timecodes chunk_completed chunk_id diffs_avg sd
[60 62] [200, 400] 1000 60-62 4 2
[30 33] [200, 700] 800 30-33 6 7


RESULT:



*[60 62] [150, 300] 750 60-62 3 1.5*
[58 53] [800, 900] 1000 58-53 4 6
[30 33] [200, 700] 800 30-33 6 7


Reproducible DF:



import numpy as np
import pandas as pd

df = pd.DataFrame({
    'chunk': {0: '[53]',
              1: '[53, 50]',
              2: '[53, 50, 63]',
              3: '[53, 50, 63, 60]',
              4: '[53, 50, 63, 60, 73]',
              5: '[53, 50, 63, 60, 73, 70]',
              6: '[50]',
              7: '[50, 63]',
              8: '[50, 63, 60]',
              9: '[50, 63, 60, 73]'},
    'chunk_completed': {0: '[45930]',
                        1: '[46480]',
                        2: '[47980]',
                        3: '[49360]',
                        4: '[50040]',
                        5: '[50310]',
                        6: '[46480]',
                        7: '[47980]',
                        8: '[49360]',
                        9: '[50040]'},
    'chunk_id': {0: '53',
                 1: '53-50',
                 2: '53-50-63',
                 3: '53-50-63-60',
                 4: '53-50-63-60-73',
                 5: '53-50-63-60-73-70',
                 6: '50',
                 7: '50-63',
                 8: '50-63-60',
                 9: '50-63-60-73'},
    'diffs_avg': {0: np.nan,
                  1: 550.0,
                  2: 1025.0,
                  3: 1143.3333333333333,
                  4: 1027.5,
                  5: 876.0,
                  6: np.nan,
                  7: 1500.0,
                  8: 1440.0,
                  9: 1186.6666666666667},
    'sd': {0: np.nan,
           1: np.nan,
           2: 671.7514421272201,
           3: 517.3329037798903,
           4: 481.75893003313035,
           5: 537.4290650867331,
           6: np.nan,
           7: np.nan,
           8: 84.8528137423857,
           9: 442.86943147313},
    'timecodes': {0: '[[45930]]',
                  1: '[[45930], [46480]]',
                  2: '[[45930], [46480], [47980]]',
                  3: '[[45930], [46480], [47980], [49360]]',
                  4: '[[45930], [46480], [47980], [49360], [50040]]',
                  5: '[[45930], [46480], [47980], [49360], [50040], [50310]]',
                  6: '[[46480]]',
                  7: '[[46480], [47980]]',
                  8: '[[46480], [47980], [49360]]',
                  9: '[[46480], [47980], [49360], [50040]]'}})









  • Which type is column timecodes? Are these lists or strings?

    – Erfan
    Mar 26 at 10:31

  • Please add the dataframe code, so that people working on this can directly copy-paste it.

    – min2bro
    Mar 26 at 10:33

  • They are strings, but they should be a list of integers. That's a mistake which I'll have to sort.

    – syntheso
    Mar 26 at 10:45

  • Added reproducible DF.

    – syntheso
    Mar 26 at 10:51

  • This is not the appended df, right? Because I'm not seeing any duplicate chunk_id, so it's not the same as your two small example dataframes. This changes your question, to be honest.

    – Erfan
    Mar 26 at 10:59
1 Answer
Without knowing which type your timecodes column has, you can average the rest over chunk_id by using pandas.DataFrame.groupby and .agg:



# First of all you should concat your csv's into one big dataframe:
df3 = pd.concat(csvs, axis=0, ignore_index=True)


# First we concat df1 & df2 which is the appending of the CSV's
# Note this is a simulation of your csv's
df3 = pd.concat([df1,df2], ignore_index=True)

print(df3)
chunk timecodes chunk_completed chunk_id diffs_avg sd
0 [60 62] [100, 200] 500 60-62 2 1
1 [58 53] [800, 900] 1000 58-53 4 6
2 [60 62] [200, 400] 1000 60-62 4 2
3 [30 33] [200, 700] 800 30-33 6 7


Now we can aggregate with groupby



df_grouped = df3.groupby('chunk_id').agg({'chunk_completed': 'mean',
                                          'diffs_avg': 'mean',
                                          'sd': 'mean'}).reset_index()

print(df_grouped)
chunk_id chunk_completed diffs_avg sd
0 30-33 800 6 7.0
1 58-53 1000 4 6.0
2 60-62 750 3 1.5
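

For completeness, a possible end-to-end sketch (my reading, not part of the original answer) that combines this with the file-reading loop from the question. It assumes the folder contains only the relevant CSVs, and it leaves timecodes and chunk_completed out of the mean because the comments note they are stored as strings:

import glob
import pandas as pd

# Read every CSV in the folder and stack them into one long dataframe.
csvs = [pd.read_csv(f) for f in glob.glob('*.csv')]
df3 = pd.concat(csvs, ignore_index=True)

# One row per chunk_id: keep the first 'chunk' value, average the numeric columns.
df_grouped = (df3.groupby('chunk_id', as_index=False)
                 .agg({'chunk': 'first',
                       'diffs_avg': 'mean',
                       'sd': 'mean'}))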





  • This isn't a complete solution, since a main issue is retrieving multiple files from a directory, which become many dataframes. How could this work with concat?

    – syntheso
    Mar 26 at 10:48

  • It's there to simulate your appending of the CSVs, which I understood you had already done and wasn't your problem. I cannot "append" your CSVs, I don't have them. It's just to show that I append the two dataframes. @SebSilas

    – Erfan
    Mar 26 at 10:50

  • See my edit, this should clear things up @SebSilas

    – Erfan
    Mar 26 at 10:53

  • Thank you Erfan!

    – syntheso
    Mar 26 at 11:07

  • Additionally, how might we get an average for each item in the list in the timecodes column? Assuming it is a list of integers.

    – syntheso
    Mar 26 at 11:08
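
The last comment is left open in the thread; one possible sketch (my own assumption, not part of the answer, with mean_timecodes as a hypothetical helper name) for averaging the timecodes lists position by position, assuming each cell parses back to a list and every row of a given chunk_id carries a list of the same shape:

import ast
import numpy as np

def mean_timecodes(series):
    # Parse the string cells, e.g. '[[45930], [46480]]' -> [[45930], [46480]],
    # then average position by position across the rows of one chunk_id.
    parsed = [ast.literal_eval(s) for s in series]
    return np.mean(np.array(parsed, dtype=float), axis=0).tolist()

df_timecodes = (df3.groupby('chunk_id')['timecodes']
                   .apply(mean_timecodes)
                   .reset_index())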










