Pandas, stack some columns, unstack some others

















Pandas tidy data, spread variables from one column, gather from another



My Problem



I need to turn the dataframe below into a tidy format, where each row will be a unique ['GEOG_CODE','COUNTRY'] - 'YEAR' pairing, and there are two variables, defined by Group1.



Using Hadley Wickham's notation for tidy data:



  • The observations are defined by the Location-Time pairings.

  • The variables are defined by the column Group1

  • The values are currently stored for different Years in columns ['2016' '2017' '2018'].

Tidy Data Semantics



In R I would want to:




  • gather the values from the columns ['2016' '2017' '2018'].


  • spread the values from Group1.

  • see Garrett Grolemund's explanation here

For my problem:




  • Location is defined by the ['GEOG_CODE','COUNTRY'].


  • Values at different times are defined in the columns ['2016' '2017' '2018'].


  • Variables are defined by Group1 == A or Group1 == B.

I want each row to be a Location-Time pair with two variables: one for Group1 == A and one for Group1 == B.



 I have this



import numpy as np
import pandas as pd

toy_data = {
    'GEOG_CODE': ['123', '234', '567', '901'],
    'COUNTRY': ['England' for _ in range(4)],
    'Group1': ['A', 'A', 'B', 'B'],
    '2016': np.arange(0, 4),
    '2017': np.arange(0, 4),
    '2018': np.arange(0, 4),
}

in_df = pd.DataFrame(toy_data)
in_df

Out[]:
GEOG_CODE COUNTRY Group1 2016 2017 2018
0 123 England A 0 0 0
1 234 England A 1 1 1
2 567 England B 2 2 2
3 901 England B 3 3 3



I want this



So I want the output to look like the dataframe below with columns for each of the values in 'Group1'



outcome_data = {
    'GEOG_CODE': np.tile(['123', '234', '567', '901'], 3),
    'COUNTRY': ['England' for _ in range(4 * 3)],
    'year': np.tile([2016, 2017, 2018], 4),
    'low_A': np.tile(np.arange(0, 4), 3),
    'low_B': np.tile(np.arange(0, 4), 3),
}

out = pd.DataFrame(outcome_data)
out

Out[]:
GEOG_CODE COUNTRY year low_A low_B
0 123 England 2016 0 0
1 234 England 2017 1 1
2 567 England 2018 2 2
3 901 England 2016 3 3
4 123 England 2017 0 0
5 234 England 2018 1 1
6 567 England 2016 2 2
7 901 England 2017 3 3
8 123 England 2018 0 0
9 234 England 2016 1 1
10 567 England 2017 2 2
11 901 England 2018 3 3


 I tried df.melt()



I got halfway there with melt, but I don't know how to then spread the Group1 values into separate columns.



id_vars = ['GEOG_CODE', 'COUNTRY', 'Group1']
value_vars = ['2016', '2017', '2018']
var_name = 'Year'
value_name = 'low_Value'

melt = in_df.melt(id_vars=id_vars,value_vars=value_vars,var_name=var_name, value_name=value_name)
melt

Out[]:
GEOG_CODE COUNTRY Group1 Year low_Value
0 123 England A 2016 0
1 234 England A 2016 1
2 567 England B 2016 2
3 901 England B 2016 3
4 123 England A 2017 0
5 234 England A 2017 1
6 567 England B 2017 2
7 901 England B 2017 3
8 123 England A 2018 0
9 234 England A 2018 1
10 567 England B 2018 2
11 901 England B 2018 3




























  • Have you looked into the pd.wide_to_long function?

    – aws_apprentice
    Mar 25 at 0:08

  • Yes, but I can't see how that helps me directly, because the values for stubnames are not columns; they're values in the column Group1.

    – Tommy Lees
    Mar 25 at 0:11

  • Why do you have a "low_C" and where does it come from?

    – cs95
    Mar 25 at 0:27

  • @TommyLees Let me know if the answer below wasn't what you're looking for. Thanks!

    – cs95
    Mar 25 at 21:05

  • You are 100% a genius. Thank you for editing my title last night too.

    – Tommy Lees
    Mar 25 at 21:16

















python python-3.x pandas














asked Mar 25 at 0:04, edited Mar 25 at 21:32

Tommy Lees







1 Answer














Perhaps you're looking for stack instead of melt:



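# df is the question's in_df
(df.set_index(['GEOG_CODE', 'COUNTRY', 'Group1'])
   .stack()                          # gather the year columns into the index
   .unstack(-2)                      # spread Group1 into columns A and B
   .ffill(axis=1)                    # copy A into B where B is missing
   .bfill(axis=1, downcast='infer')  # copy B into A; cast back to integers
   .add_prefix('low_')
   .reset_index()
   .rename({'level_2': 'year'}, axis=1))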

Group1 GEOG_CODE COUNTRY year low_A low_B
0 123 England 2016 0 0
1 123 England 2017 0 0
2 123 England 2018 0 0
3 234 England 2016 1 1
4 234 England 2017 1 1
5 234 England 2018 1 1
6 567 England 2016 2 2
7 567 England 2017 2 2
8 567 England 2018 2 2
9 901 England 2016 3 3
10 901 England 2017 3 3
11 901 England 2018 3 3
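
To read the chain one line at a time, it can be split into named intermediate steps. This is only a sketch on the toy frame from the question: the names indexed, stacked, wide, filled and tidy are illustrative, and astype(int) is swapped in for downcast='infer' as an explicit cast.

```python
import numpy as np
import pandas as pd

# Toy frame from the question.
df = pd.DataFrame({
    'GEOG_CODE': ['123', '234', '567', '901'],
    'COUNTRY': ['England'] * 4,
    'Group1': ['A', 'A', 'B', 'B'],
    '2016': np.arange(4),
    '2017': np.arange(4),
    '2018': np.arange(4),
})

# 1. Move the identifier columns into the index; only year columns remain.
indexed = df.set_index(['GEOG_CODE', 'COUNTRY', 'Group1'])

# 2. stack() gathers the year columns into a new, unnamed innermost level.
stacked = indexed.stack()

# 3. unstack(-2) spreads the second-to-last level (Group1) into columns.
#    Each location has only one group, so the other column is NaN.
wide = stacked.unstack(-2)

# 4. Fill across the row in both directions, then cast back to integers.
filled = wide.ffill(axis=1).bfill(axis=1).astype(int)

# 5. Prefix the new columns, flatten the index, name the year column.
tidy = (filled.add_prefix('low_')
              .reset_index()
              .rename(columns={'level_2': 'year'}))
print(tidy)
```

Printing wide before the fill step shows what ffill/bfill are doing: each location carries a value for only one group, and the fill copies that value into the other column. If a missing group should stay missing, steps 4 could simply be skipped.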




























  • Could you help me interpret what the different lines do? Unfortunately the code doesn't work when removing a line at a time because of: ValueError: Index contains duplicate entries, cannot reshape.

    – Tommy Lees
    Mar 25 at 21:29

  • @TommyLees This works under the assumption that each ['GEOG_CODE', 'COUNTRY', 'Group1'] combination is unique. The idea is to stack all the columns, while unstacking "Group1". Perhaps you can try df = df.drop_duplicates(subset=['GEOG_CODE', 'COUNTRY', 'Group1']) before running this.

    – cs95
    Mar 25 at 21:34

  • If I turn it into a function with a docstring, can I add it to your answer? I'm happy to post it as a new answer but don't want to take any of the credit. I just think it might be useful to others (and future me) who want to do something similar.

    – Tommy Lees
    Mar 25 at 22:04

  • @TommyLees Sure, feel free to propose an edit at the bottom of my answer and I'll approve it.

    – cs95
    Mar 25 at 22:30

  • I am still having problems getting the code to work with a real problem. The issue is that I do have non-unique combinations, because ['COUNTRY','AREA CODE'] have multiple Group1 values over many timesteps. Can I edit the question or should I ask a new question?

    – Tommy Lees
    Apr 4 at 15:16
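
For the non-unique case raised in the last comment, one possible route (not part of the answer above) is to melt first and then use pivot_table, which aggregates duplicate index entries instead of raising. This is a sketch under assumptions: the extra ('123', 'England', 'A') row is a made-up duplicate, and averaging with aggfunc='mean' is only one choice; whether it is appropriate depends on what the duplicates mean in the real data.

```python
import pandas as pd

# Toy frame from the question plus one hypothetical duplicate
# ('123', 'England', 'A') row to reproduce the non-unique case.
df = pd.DataFrame({
    'GEOG_CODE': ['123', '123', '234', '567', '901'],
    'COUNTRY': ['England'] * 5,
    'Group1': ['A', 'A', 'A', 'B', 'B'],
    '2016': [0, 2, 1, 2, 3],
    '2017': [0, 2, 1, 2, 3],
    '2018': [0, 2, 1, 2, 3],
})

# Gather the year columns into rows.
melted = df.melt(id_vars=['GEOG_CODE', 'COUNTRY', 'Group1'],
                 var_name='year', value_name='value')

# Spread Group1 into columns; duplicates are combined by aggfunc.
tidy = (melted.pivot_table(index=['GEOG_CODE', 'COUNTRY', 'year'],
                           columns='Group1', values='value',
                           aggfunc='mean')  # assumption: average duplicates
              .add_prefix('low_')
              .rename_axis(columns=None)
              .reset_index())
print(tidy)
```

Unlike the stack/unstack chain, this leaves NaN where a location has no value for a group rather than filling it from the other column.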





















answered Mar 25 at 0:31

cs95











