Pandas, stack some columns, unstack some others
Pandas tidy data, spread variables from one column, gather from another
My Problem

I need to turn the dataframe below into a tidy format, where each row is a unique `['GEOG_CODE', 'COUNTRY']` - `'YEAR'` pairing, and there are two variables, defined by `Group1`.

Using Hadley Wickham's notation for tidy data:
- The observations are defined by the Location-Time pairings.
- The variables are defined by the column `Group1`.
- The values are currently stored for different years in the columns `['2016', '2017', '2018']`.

In R I would want to:
- `gather` the values from the columns `['2016', '2017', '2018']`.
- `spread` the values from `Group1`.
- see Garrett Grolemund's explanation here

For my problem:
- Location is defined by `['GEOG_CODE', 'COUNTRY']`.
- Values at different times are defined in the columns `['2016', '2017', '2018']`.
- Variables are defined by `Group1 == A` or `Group1 == B`.

I want to have each row as a Location-Time pair, with two variables: one for `Group1 = A`, one for `Group1 = B`.
I have this

    import numpy as np
    import pandas as pd

    toy_data = {
        'GEOG_CODE': ['123', '234', '567', '901'],
        'COUNTRY': ['England' for _ in range(4)],
        'Group1': ['A', 'A', 'B', 'B'],
        '2016': np.arange(0, 4),
        '2017': np.arange(0, 4),
        '2018': np.arange(0, 4),
    }
    in_df = pd.DataFrame(toy_data)
    in_df

    Out[]:
      GEOG_CODE  COUNTRY Group1  2016  2017  2018
    0       123  England      A     0     0     0
    1       234  England      A     1     1     1
    2       567  England      B     2     2     2
    3       901  England      B     3     3     3
I want this

I want the output to look like the dataframe below, with one column for each of the values in `'Group1'`.

    outcome_data = {
        'GEOG_CODE': np.tile(['123', '234', '567', '901'], 3),
        'COUNTRY': ['England' for _ in range(4 * 3)],
        'year': np.tile([2016, 2017, 2018], 4),
        'low_A': np.tile(np.arange(0, 4), 3),
        'low_B': np.tile(np.arange(0, 4), 3),
    }
    out = pd.DataFrame(outcome_data)
    out

    Out[]:
       GEOG_CODE  COUNTRY  year  low_A  low_B
    0        123  England  2016      0      0
    1        234  England  2017      1      1
    2        567  England  2018      2      2
    3        901  England  2016      3      3
    4        123  England  2017      0      0
    5        234  England  2018      1      1
    6        567  England  2016      2      2
    7        901  England  2017      3      3
    8        123  England  2018      0      0
    9        234  England  2016      1      1
    10       567  England  2017      2      2
    11       901  England  2018      3      3
I tried `df.melt()`

I managed to get the data half of the way there by using `melt`, but then I don't know how to turn the groups into columns.

    id_vars = ['GEOG_CODE', 'COUNTRY', 'Group1']
    value_vars = ['2016', '2017', '2018']
    var_name = 'Year'
    value_name = 'low_Value'
    melt = in_df.melt(id_vars=id_vars, value_vars=value_vars,
                      var_name=var_name, value_name=value_name)
    melt

    Out[]:
       GEOG_CODE  COUNTRY Group1  Year  low_Value
    0        123  England      A  2016          0
    1        234  England      A  2016          1
    2        567  England      B  2016          2
    3        901  England      B  2016          3
    4        123  England      A  2017          0
    5        234  England      A  2017          1
    6        567  England      B  2017          2
    7        901  England      B  2017          3
    8        123  England      A  2018          0
    9        234  England      A  2018          1
    10       567  England      B  2018          2
    11       901  England      B  2018          3
python python-3.x pandas
Have you looked into the `pd.wide_to_long` function?
– aws_apprentice
Mar 25 at 0:08

Yes, but I can't see how that can help me directly because the values for `stubnames` are not columns, they're values in the column `Group1`.
– Tommy Lees
Mar 25 at 0:11

Why do you have a "low_C" and where does it come from?
– cs95
Mar 25 at 0:27

@TommyLees Let me know if the answer below wasn't what you're looking for. Thanks!
– cs95
Mar 25 at 21:05

You are 100% a genius. Thank you for editing my title last night too.
– Tommy Lees
Mar 25 at 21:16
asked Mar 25 at 0:04 by Tommy Lees, edited Mar 25 at 21:32
1 Answer
Perhaps you're looking for `stack` instead of `melt`:

    # df is the input frame (in_df in the question)
    (df.set_index(['GEOG_CODE', 'COUNTRY', 'Group1'])
       .stack()
       .unstack(-2)
       .ffill(axis=1)
       .bfill(axis=1, downcast='infer')
       .add_prefix('low_')
       .reset_index()
       .rename({'level_2': 'year'}, axis=1))

    Group1 GEOG_CODE  COUNTRY  year  low_A  low_B
    0            123  England  2016      0      0
    1            123  England  2017      0      0
    2            123  England  2018      0      0
    3            234  England  2016      1      1
    4            234  England  2017      1      1
    5            234  England  2018      1      1
    6            567  England  2016      2      2
    7            567  England  2017      2      2
    8            567  England  2018      2      2
    9            901  England  2016      3      3
    10           901  England  2017      3      3
    11           901  England  2018      3      3
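As a rough sketch of what each step in the chain does, the same pipeline can be split into named intermediates. This assumes the toy `in_df` from the question; the names `indexed`, `stacked`, `wide`, `filled` and `tidy` are only illustrative. The `downcast='infer'` argument is left out and replaced by an explicit `astype(int)`, on the assumption that you are on a newer pandas release where `downcast` triggers a deprecation warning.

```python
import numpy as np
import pandas as pd

# Toy frame from the question.
in_df = pd.DataFrame({
    'GEOG_CODE': ['123', '234', '567', '901'],
    'COUNTRY': ['England'] * 4,
    'Group1': ['A', 'A', 'B', 'B'],
    '2016': np.arange(0, 4),
    '2017': np.arange(0, 4),
    '2018': np.arange(0, 4),
})

# 1. Move the identifying columns into the index; only the year columns remain.
indexed = in_df.set_index(['GEOG_CODE', 'COUNTRY', 'Group1'])

# 2. stack(): the year columns become an extra, unnamed innermost index level,
#    giving one long Series indexed by (GEOG_CODE, COUNTRY, Group1, year).
stacked = indexed.stack()

# 3. unstack(-2): pivot the second-to-last index level (Group1) into columns A and B.
#    In the toy data each location has only one group, so the other column is NaN.
wide = stacked.unstack(-2)

# 4. ffill/bfill along axis=1 copy the one available value across A and B,
#    which is why low_A equals low_B in the toy output.
filled = wide.ffill(axis=1).bfill(axis=1).astype(int)

# 5. Prefix the new columns, flatten the index, and name the unnamed year level.
tidy = (filled.add_prefix('low_')
              .reset_index()
              .rename(columns={'level_2': 'year'}))

print(tidy)
```

Step 3 is the one that raises `ValueError: Index contains duplicate entries, cannot reshape` when a key combination occurs more than once, because `unstack` needs exactly one value per output cell.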
Could you help me interpret what the different lines do? Unfortunately the code doesn't work removing a line at a time because: `ValueError: Index contains duplicate entries, cannot reshape`.
– Tommy Lees
Mar 25 at 21:29

@TommyLees This works under the assumption that each `['GEOG_CODE', 'COUNTRY', 'Group1']` combination is unique. The idea is to stack all the columns, while unstacking "Group1". Perhaps you can try `df = df.drop_duplicates(subset=['GEOG_CODE', 'COUNTRY', 'Group1'])` before running this.
– cs95
Mar 25 at 21:34
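Before dropping anything, it may help to see which rows break the uniqueness assumption. The sketch below uses a small made-up frame (not data from the question) and the standard `DataFrame.duplicated` and `drop_duplicates` calls.

```python
import pandas as pd

keys = ['GEOG_CODE', 'COUNTRY', 'Group1']

# Hypothetical frame where the key ('123', 'England', 'A') appears twice.
df = pd.DataFrame({
    'GEOG_CODE': ['123', '123', '234'],
    'COUNTRY': ['England'] * 3,
    'Group1': ['A', 'A', 'B'],
    '2016': [0, 5, 1],
})

# keep=False marks every row of a repeated key, not just the later ones.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# drop_duplicates keeps the first row per key and discards the rest.
deduped = df.drop_duplicates(subset=keys)
print(deduped.set_index(keys).index.is_unique)
```

Note that `drop_duplicates` throws away the later rows for each key, so it is only appropriate when those rows really are redundant.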
If I turn it into a function with a docstring, can I add it to your answer? I'm happy to post it as a new answer but don't want to take any of the credit. I just think it might be useful to others (and future me) who want to do something similar.
– Tommy Lees
Mar 25 at 22:04

@TommyLees Sure, feel free to propose an edit at the bottom of my answer and I'll approve it.
– cs95
Mar 25 at 22:30

I am still having problems getting the code to work with a real problem. The issue is that I do have non-unique combinations, because `['COUNTRY', 'AREA CODE']` have multiple `Group1` values over many timesteps. Can I edit the question or should I ask a new question?
– Tommy Lees
Apr 4 at 15:16
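The thread does not resolve the non-unique case. One possible approach, sketched here under the assumption that repeated key combinations can sensibly be aggregated (the mean is used purely as an example), is to continue from `melt` and use `pivot_table`, which accepts an `aggfunc`. The frame and the `value` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: ('123', 'England', 'A') appears twice.
df = pd.DataFrame({
    'GEOG_CODE': ['123', '123', '123', '234'],
    'COUNTRY': ['England'] * 4,
    'Group1': ['A', 'A', 'B', 'B'],
    '2016': [0, 2, 10, 1],
    '2017': [4, 6, 20, 3],
})

# Gather the year columns into rows (all non-id columns are melted).
long_df = df.melt(id_vars=['GEOG_CODE', 'COUNTRY', 'Group1'],
                  var_name='year', value_name='value')

# Spread Group1 into columns; aggfunc decides how repeated keys are combined.
tidy = (long_df.pivot_table(index=['GEOG_CODE', 'COUNTRY', 'year'],
                            columns='Group1', values='value', aggfunc='mean')
               .add_prefix('low_')
               .reset_index())

print(tidy)
```

Whether averaging is right depends on what the repeated rows mean in the real data; `'sum'`, `'first'`, or another aggregation may fit better.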
answered Mar 25 at 0:31 by cs95