Merge DF's on str ID and take averages
I have a bunch of csvs in a folder in this format:
chunk timecodes chunk_completed chunk_id diffs_avg sd
0 [53] [[45930]] [45930] 53
1 [53, 50] [[45930], [46480]] [46480] 53-50 550.0
2 [53, 50, 63] [[45930], [46480], [47980]] [47980] 53-50-63 1025.0 671.7514421272201
3 [53, 50, 63, 60] [[45930], [46480], [47980], [49360]] [49360] 53-50-63-60 1143.3333333333333 517.3329037798903
4 [53, 50, 63, 60, 73] [[45930], [46480], [47980], [49360], [50040]] [50040] 53-50-63-60-73 1027.5 481.75893003313035
5 [53, 50, 63, 60, 73, 70] [[45930], [46480], [47980], [49360], [50040], [50310]] [50310] 53-50-63-60-73-70 876.0 537.4290650867331
6 [50] [[46480]] [46480] 50
7 [50, 63] [[46480], [47980]] [47980] 50-63 1500.0
8 [50, 63, 60] [[46480], [47980], [49360]] [49360] 50-63-60 1440.0 84.8528137423857
9 [50, 63, 60, 73] [[46480], [47980], [49360], [50040]] [50040] 50-63-60-73 1186.6666666666667 442.86943147313
I read them in as DFs and make a list of the DFs:
import glob
import pandas as pd

csvs = []
list_of_files = glob.glob('*.{}'.format('csv'))  # i.e. '*.csv'
for file in list_of_files:
    f = pd.read_csv(file)
    csvs.append(f)
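The same read-in can also be written as a single comprehension (assuming the CSVs sit in the current working directory):
# Equivalent one-liner, using the glob and pandas imports above.
csvs = [pd.read_csv(f) for f in glob.glob('*.csv')]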
What I am trying to do is reduce them down to one dataframe with no repetitions of "chunk_id". Instead, I would like to merge on this ID.
I tried this:
from functools import reduce
red = reduce(pd.merge, csvs)
which just gives me a really wide dataframe with no entries.
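Presumably that is because pd.merge with no on= argument joins on every column the frames share. Pinning the key does merge on the ID, but (taking just two of the files, say) it only puts the columns side by side with _x/_y suffixes instead of averaging them:
# Merge two of the frames on the ID only; still wide, every shared column appears twice.
wide = pd.merge(csvs[0], csvs[1], on='chunk_id', how='outer')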
I haven't attempted the averaging yet, but I would like to end up with a dataframe that has exactly the same columns as the example above, where all rows across the dataframes that share a "chunk_id" are merged into one and their "diffs_avg", "timecodes", "chunk_completed" and "sd" values are averaged.
So, if I had read in the following dfs:
DF1
chunk timecodes chunk_completed chunk_id diffs_avg sd
[60 62] [100, 200] 500 60-62 2 1
[58 53] [800, 900] 1000 58-53 4 6
DF2
chunk timecodes chunk_completed chunk_id diffs_avg sd
[60 62] [200, 400] 1000 60-62 4 2
[30 33] [200, 700] 800 30-33 6 7
RESULT:
*[60 62] [150, 300] 750 60-62 3 1.5*
[58 53] [800, 900] 1000 58-53 4 6
[30 33] [200, 700] 800 30-33 6 7
Reproducible DF:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'chunk': {0: '[53]',
              1: '[53, 50]',
              2: '[53, 50, 63]',
              3: '[53, 50, 63, 60]',
              4: '[53, 50, 63, 60, 73]',
              5: '[53, 50, 63, 60, 73, 70]',
              6: '[50]',
              7: '[50, 63]',
              8: '[50, 63, 60]',
              9: '[50, 63, 60, 73]'},
    'chunk_completed': {0: '[45930]',
                        1: '[46480]',
                        2: '[47980]',
                        3: '[49360]',
                        4: '[50040]',
                        5: '[50310]',
                        6: '[46480]',
                        7: '[47980]',
                        8: '[49360]',
                        9: '[50040]'},
    'chunk_id': {0: '53',
                 1: '53-50',
                 2: '53-50-63',
                 3: '53-50-63-60',
                 4: '53-50-63-60-73',
                 5: '53-50-63-60-73-70',
                 6: '50',
                 7: '50-63',
                 8: '50-63-60',
                 9: '50-63-60-73'},
    'diffs_avg': {0: np.nan,
                  1: 550.0,
                  2: 1025.0,
                  3: 1143.3333333333333,
                  4: 1027.5,
                  5: 876.0,
                  6: np.nan,
                  7: 1500.0,
                  8: 1440.0,
                  9: 1186.6666666666667},
    'sd': {0: np.nan,
           1: np.nan,
           2: 671.7514421272201,
           3: 517.3329037798903,
           4: 481.75893003313035,
           5: 537.4290650867331,
           6: np.nan,
           7: np.nan,
           8: 84.8528137423857,
           9: 442.86943147313},
    'timecodes': {0: '[[45930]]',
                  1: '[[45930], [46480]]',
                  2: '[[45930], [46480], [47980]]',
                  3: '[[45930], [46480], [47980], [49360]]',
                  4: '[[45930], [46480], [47980], [49360], [50040]]',
                  5: '[[45930], [46480], [47980], [49360], [50040], [50310]]',
                  6: '[[46480]]',
                  7: '[[46480], [47980]]',
                  8: '[[46480], [47980], [49360]]',
                  9: '[[46480], [47980], [49360], [50040]]'}
})
python pandas dataframe merge
What type is the timecodes column? Are these lists or strings?
– Erfan
Mar 26 at 10:31
Please add the dataframe code, so that people working on this can copy-paste it directly.
– min2bro
Mar 26 at 10:33
They are strings, but they should be a list of integers. That's a mistake which I'll have to sort.
– syntheso
Mar 26 at 10:45
Added reproducible DF.
– syntheso
Mar 26 at 10:51
This is not the appended df, right? Because I'm not seeing any duplicate chunk_id,
so it's not the same as your two small example dataframes. This changes your question, to be honest.
– Erfan
Mar 26 at 10:59
1 Answer
Without knowing the type of your timecodes column, you can average the rest over chunk_id by using pandas.DataFrame.groupby and .agg:
# First of all you should concat your csv's into one big dataframe:
df3 = pd.concat(csvs, axis=0, ignore_index=True)
# First we concat df1 & df2 which is the appending of the CSV's
# Note this is a simulation of your csv's
df3 = pd.concat([df1,df2], ignore_index=True)
print(df3)
chunk timecodes chunk_completed chunk_id diffs_avg sd
0 [60 62] [100, 200] 500 60-62 2 1
1 [58 53] [800, 900] 1000 58-53 4 6
2 [60 62] [200, 400] 1000 60-62 4 2
3 [30 33] [200, 700] 800 30-33 6 7
Now we can aggregate with groupby
df_grouped = df3.groupby('chunk_id').agg({'chunk_completed': 'mean',
                                          'diffs_avg': 'mean',
                                          'sd': 'mean'}).reset_index()
print(df_grouped)
chunk_id chunk_completed diffs_avg sd
0 30-33 800 6 7.0
1 58-53 1000 4 6.0
2 60-62 750 3 1.5
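As a side note, if the remaining columns such as chunk should be carried along rather than dropped, .agg also accepts 'first'; a minimal sketch along the same lines:
# Keep one representative 'chunk' per group while averaging the numeric columns.
df_grouped = df3.groupby('chunk_id').agg({'chunk': 'first',
                                          'chunk_completed': 'mean',
                                          'diffs_avg': 'mean',
                                          'sd': 'mean'}).reset_index()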
This isn't a complete solution, since a main issue is retrieving multiple files from a directory, which then become many dataframes. How could this work with concat?
– syntheso
Mar 26 at 10:48
It's to simulate your appending of the CSVs, which I understood was already done by you and not your problem. I cannot "append" your csv's; I don't have them. It's just to show how I append
the two dataframes. @SebSilas
– Erfan
Mar 26 at 10:50
See my edit, this should clear things up @SebSilas
– Erfan
Mar 26 at 10:53
Thank you Erfan!
– syntheso
Mar 26 at 11:07
Additionally, how might we get an average for each item in the list in the timecodes column? Assuming it is a list of integers.
– syntheso
Mar 26 at 11:08
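One possible approach to that follow-up, assuming timecodes holds string representations of lists (as in the sample CSVs) and that the lists sharing a chunk_id have equal lengths, is to parse them with ast.literal_eval and average them element-wise; a minimal sketch:
import ast
import numpy as np

# Parse the stringified lists into real Python lists (assumes the strings are valid list literals).
df3['timecodes'] = df3['timecodes'].apply(ast.literal_eval)

# Element-wise mean per chunk_id; np.array requires equal-length lists within each group.
timecode_means = (df3.groupby('chunk_id')['timecodes']
                     .apply(lambda s: np.mean(np.array(list(s), dtype=float), axis=0).tolist())
                     .reset_index())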