How to join multiple rows in single pandas dataframe by common key column (fixed length limit)?
How can I join multiple rows of a single pandas DataFrame that share a common key column into one combined row, with a fixed limit on how many original rows go into each combined row (the number of rows per key varies)?
I have a DataFrame of the form:
key x1 x2 x3
-------------
1 a1 a2 a3
1 b1 b2 b3
2 c1 c2 c3
3 d1 d2 d3
3 e1 e2 e3
3 f1 f2 f3
3 g1 g2 g3
....
and would like to reshape it into something like:
key x11 x12 x13 x21 x22 x23 x31 x32 x33
-------------
1 a1 a2 a3 b1 b2 b3 NA NA NA
2 c1 c2 c3 NA NA NA NA NA NA
3 d1 d2 d3 e1 e2 e3 f1 f2 f3
....
Here column xjk is the kth feature of the jth original row with that key. The number of original rows combined per key is capped by a limit, set manually to 3 in this example. I may want to change the limit later, possibly to a value larger than the number of rows in any group (e.g. 5 here), in which case the extra columns should simply be filled with NA. In other words, when a key has fewer rows than the limit, the remaining slots are filled with NA, and when it has more, only rows up to the limit are combined and the rest are dropped. Note also that an individual row may itself have missing values.
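To make the required semantics concrete, here is a minimal pure-Python sketch of the transformation (no pandas). The function name `combine_rows`, the list-of-tuples input, and the use of `None` for NA are assumptions for illustration only. It keeps the first `lim` rows per key in input order and relies on Python 3.7+ dict ordering.

```python
def combine_rows(rows, lim):
    """Hypothetical helper: flatten rows sharing a key into one row per key.

    `rows` is a list of (key, feature1, feature2, ...) tuples. At most `lim`
    rows are kept per key (the first ones, in input order); shorter groups
    are padded with None so every output row has lim * n_features values.
    """
    groups = {}  # dicts keep insertion order in Python 3.7+
    for key, *features in rows:
        groups.setdefault(key, []).append(list(features))
    result = {}
    for key, members in groups.items():
        n_features = len(members[0])
        # Concatenate the features of the first `lim` rows, row by row.
        flat = [value for member in members[:lim] for value in member]
        # Pad short groups with None up to lim * n_features values.
        flat += [None] * (lim * n_features - len(flat))
        result[key] = flat
    return result


rows = [
    (1, 'a1', 'a2', 'a3'),
    (1, 'b1', 'b2', 'b3'),
    (2, 'c1', 'c2', 'c3'),
    (3, 'd1', 'd2', 'd3'),
    (3, 'e1', 'e2', 'e3'),
    (3, 'f1', 'f2', 'f3'),
    (3, 'g1', 'g2', 'g3'),
]

for key, values in combine_rows(rows, lim=3).items():
    print(key, values)
```

With `lim=3`, key 3 keeps the d, e and f rows and drops g; raising `lim` to 5 pads every key out to 15 values.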
Any suggestions on how this could be done?
python pandas
asked Mar 27 at 22:16 by lampShadesDrifter; edited Mar 28 at 1:50 by lampShadesDrifter
1 Answer
Using `groupby` and then `ravel` to flatten all values inside each group:
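```python
import numpy as np
import pandas as pd

lim = 5  # max number of original rows combined per key
df = df.set_index('key')  # df is the frame from the question
k = len(df.columns)  # number of features per original row
# For each key: flatten the first `lim` rows and pad with NaN to lim*k values
# (the padding count is negative, giving an empty list, for longer groups).
x = df.groupby(level=0).apply(
    lambda z: z.iloc[:lim].values.ravel().tolist() +
    [np.nan] * (lim * k - z.size))
x = pd.DataFrame(x.tolist(), x.index)
# Column i holds feature i % k of original row i // k (1-based in the label)
x.columns = [f'x{1 + i // k}{1 + i % k}' for i in x.columns]
print(x)
```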
Output:
x11 x12 x13 x21 x22 x23 x31 x32 x33 x41 x42 x43 x51 x52 x53
key
1 a1 a2 a3 b1 b2 b3 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 c1 c2 c3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 d1 d2 d3 e1 e2 e3 f1 f2 f3 g1 g2 g3 NaN NaN NaN
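As an alternative to `apply`, the same reshape can be sketched with `groupby().cumcount()` plus `unstack`, a common pandas idiom for long-to-wide reshaping. This is a swapped-in technique, not the approach in the answer above. The helper name `combine_wide` and the temporary `_pos` column are hypothetical, and the sketch assumes the feature columns are all columns except the key column.

```python
import pandas as pd


def combine_wide(df, key, lim):
    """Hypothetical helper: one row per `key`, at most `lim` original rows each.

    Column x{j}{k} holds feature k of the j-th original row with that key
    (both 1-based); short groups are padded with NaN and rows beyond `lim`
    are dropped.
    """
    features = [c for c in df.columns if c != key]
    # 0-based position of each row within its key group, in input order.
    with_pos = df.assign(_pos=df.groupby(key).cumcount())
    wide = (with_pos[with_pos['_pos'] < lim]
            .set_index([key, '_pos'])
            .unstack('_pos'))
    # unstack only creates positions that actually occur, so reindex to the
    # full (feature, position) grid; missing combinations become NaN.
    order = [(f, p) for p in range(lim) for f in features]
    wide = wide.reindex(columns=pd.MultiIndex.from_tuples(order))
    wide.columns = [f'x{p + 1}{features.index(f) + 1}' for f, p in order]
    return wide


df = pd.DataFrame({
    'key': [1, 1, 2, 3, 3, 3, 3],
    'x1': ['a1', 'b1', 'c1', 'd1', 'e1', 'f1', 'g1'],
    'x2': ['a2', 'b2', 'c2', 'd2', 'e2', 'f2', 'g2'],
    'x3': ['a3', 'b3', 'c3', 'd3', 'e3', 'f3', 'g3'],
})
print(combine_wide(df, 'key', lim=3))
```

Because `reindex` builds the full (feature, position) grid, a `lim` larger than the largest group (e.g. 5) simply yields extra all-NaN columns, matching the behaviour asked for in the question.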
answered Mar 27 at 22:24 by perl; edited Mar 28 at 6:19
wow, amazing answer
– Yuca, Mar 27 at 22:25
Thanks. A note for others using this answer: the f-string used to label the columns in the last line only works in Python 3.6+. On Python 2.7, use `'x{}{}'.format(1 + i // k, 1 + i % k)` instead.
– lampShadesDrifter, Mar 27 at 22:50
Sorry, you're right, I missed that requirement. I've updated my answer with a `lim` variable that sets this limit. We basically need to take the first `lim` rows in the `apply` with `.iloc[:lim]`.
– perl, Mar 27 at 22:53
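The truncate-and-pad step inside the `apply` can be checked on a single group. This sketch pulls the lambda out into a hypothetical `flatten_group` function (not part of the answer) and assumes every group has `k` feature columns. It shows why each combined row always has `lim * k` values: `z.size` counts all cells of the untruncated group, so for an oversized group the padding count is negative, and multiplying a list by a negative number gives an empty list.

```python
import numpy as np
import pandas as pd


def flatten_group(z, lim, k):
    """Hypothetical helper mirroring the answer's lambda for one group `z`."""
    kept = z.iloc[:lim].values.ravel().tolist()  # at most lim rows, row by row
    # z.size counts every cell of the *untruncated* group (rows * k); for a
    # group with more than lim rows this is negative and the list is empty.
    padding = [np.nan] * (lim * k - z.size)
    return kept + padding


lim, k = 3, 3  # row limit and features per row, as in the question
small = pd.DataFrame([['c1', 'c2', 'c3']])  # 1 row: 6 NaNs appended
large = pd.DataFrame([[r + str(i) for i in range(1, 4)] for r in 'defg'])  # 4 rows: g dropped

print(flatten_group(small, lim, k))
print(flatten_group(large, lim, k))
```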
@lampShadesDrifter: And thanks, that's a very good point about f-strings requiring Python 3.6+.
– perl, Mar 27 at 22:57
Oddly, this code does not seem to work for me (using Python 2.7) on a test DataFrame built like the one in the original question: I get the column labels `x11 x12 x13 x14 x15 x16 x17 x18 x19`. I think the last line of the code should be something like `x.columns = [f'x{1+i//len(df.columns)}{1+i%len(df.columns)}' for i in x.columns]`. That then gave me the results shown in this answer.
– lampShadesDrifter, Mar 28 at 2:17
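The label formula maps a flat column position `i` to a (row slot, feature) pair via integer division and remainder by `k`, the number of features per original row. This hypothetical `column_labels` helper uses `str.format`, which works on both Python 2.7 and 3.x, and shows how the choice of divisor changes the labels.

```python
def column_labels(n_columns, k):
    """Hypothetical helper: label flat positions 0..n_columns-1 as 'x{row}{feature}'.

    i // k is the 0-based slot of the original row and i % k the 0-based
    feature index, so k must be the number of features per row
    (len(df.columns) after set_index('key')), not the number of keys.
    """
    return ['x{}{}'.format(1 + i // k, 1 + i % k) for i in range(n_columns)]


print(column_labels(9, 3))  # correct divisor for three features per row
print(column_labels(9, 9))  # a divisor that is too large
```

With a divisor of 9 every position lands in row slot 1, producing `x11` through `x19`, the same label pattern described in the comment above.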