Pandas, stack some columns, unstack some others

Pandas tidy data, spread variables from one column, gather from another

My Problem

I need to turn the dataframe below into a tidy format, where each row will be a unique ['GEOG_CODE','COUNTRY'] - 'YEAR' pairing, and there are two variables, defined by Group1.

Using Hadley Wickham's notation for tidy data:

The observations are defined by the Location-Time pairings.

The variables are defined by the column Group1.

The values are currently stored for different Years in columns ['2016', '2017', '2018'].

Tidy Data Semantics

In R I would want to:

gather the values from the columns ['2016', '2017', '2018'].

spread the values from Group1.

see Garrett Grolemund's explanation here

For my problem:

Location is defined by the ['GEOG_CODE','COUNTRY'].

Values at different times are defined in the columns ['2016', '2017', '2018'].

Variables are defined by Group1 == A or Group1 == B.

I want each row to be a Location-Time pair with two variables: one for Group1 == A and one for Group1 == B.

I have this

import numpy as np
import pandas as pd

toy_data = {
    'GEOG_CODE': ['123', '234', '567', '901'],
    'COUNTRY': ['England' for _ in range(4)],
    'Group1': ['A', 'A', 'B', 'B'],
    '2016': np.arange(0, 4),
    '2017': np.arange(0, 4),
    '2018': np.arange(0, 4),
}

in_df = pd.DataFrame(toy_data)
in_df

Out[]:
  GEOG_CODE  COUNTRY Group1  2016  2017  2018
0       123  England      A     0     0     0
1       234  England      A     1     1     1
2       567  England      B     2     2     2
3       901  England      B     3     3     3

I want this

I want the output to look like the dataframe below, with one column for each of the values in 'Group1'.

outcome_data = {
    'GEOG_CODE': np.tile(['123', '234', '567', '901'], 3),
    'COUNTRY': ['England' for _ in range(4 * 3)],
    'year': np.tile([2016, 2017, 2018], 4),
    'low_A': np.tile(np.arange(0, 4), 3),
    'low_B': np.tile(np.arange(0, 4), 3),
}

out = pd.DataFrame(outcome_data)
out

Out[]:
   GEOG_CODE  COUNTRY  year  low_A  low_B
0        123  England  2016      0      0
1        234  England  2017      1      1
2        567  England  2018      2      2
3        901  England  2016      3      3
4        123  England  2017      0      0
5        234  England  2018      1      1
6        567  England  2016      2      2
7        901  England  2017      3      3
8        123  England  2018      0      0
9        234  England  2016      1      1
10       567  England  2017      2      2
11       901  England  2018      3      3

I tried `df.melt()`

I managed to get the data halfway there using melt, but I don't know how to turn the Group1 values into columns.

id_vars = ['GEOG_CODE', 'COUNTRY', 'Group1']
value_vars = ['2016', '2017', '2018']
var_name = 'Year'
value_name = 'low_Value'

melt = in_df.melt(id_vars=id_vars, value_vars=value_vars,
                  var_name=var_name, value_name=value_name)
melt

Out[]:
   GEOG_CODE  COUNTRY Group1  Year  low_Value
0        123  England      A  2016          0
1        234  England      A  2016          1
2        567  England      B  2016          2
3        901  England      B  2016          3
4        123  England      A  2017          0
5        234  England      A  2017          1
6        567  England      B  2017          2
7        901  England      B  2017          3
8        123  England      A  2018          0
9        234  England      A  2018          1
10       567  England      B  2018          2
11       901  England      B  2018          3

asked Mar 25 at 0:04 by Tommy Lees, edited Mar 25 at 21:32

Have you looked into the pd.wide_to_long function?

– aws_apprentice
Mar 25 at 0:08

Yes, but I can't see how that can help me directly, because the values for stubnames are not columns; they're values in the column Group1.

– Tommy Lees
Mar 25 at 0:11

Why do you have a "low_C" and where does it come from?

– cs95
Mar 25 at 0:27

@TommyLees Let me know if the answer below wasn't what you're looking for. Thanks!

– cs95
Mar 25 at 21:05

You are 100% a genius. Thank you for editing my title last night too.

– Tommy Lees
Mar 25 at 21:16

python python-3.x pandas

1 Answer

Perhaps you're looking for stack instead of melt:

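(df.set_index(['GEOG_CODE', 'COUNTRY', 'Group1'])
   .stack()                           # year columns -> rows
   .unstack(-2)                       # Group1 values -> columns
   .ffill(axis=1)                     # fill the missing group from the other one
   .bfill(axis=1, downcast='infer')   # newer pandas versions may warn about downcast
   .add_prefix('low_')
   .reset_index()
   .rename({'level_2': 'year'}, axis=1))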

Group1 GEOG_CODE  COUNTRY  year  low_A  low_B
0            123  England  2016      0      0
1            123  England  2017      0      0
2            123  England  2018      0      0
3            234  England  2016      1      1
4            234  England  2017      1      1
5            234  England  2018      1      1
6            567  England  2016      2      2
7            567  England  2017      2      2
8            567  England  2018      2      2
9            901  England  2016      3      3
10           901  England  2017      3      3
11           901  England  2018      3      3
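
To make the chain easier to follow, here is a rough step-by-step sketch of the same idea on the toy data from the question. The intermediate names (indexed, stacked, wide, filled, tidy) are only illustrative, and `.astype(int)` is an assumption standing in for `downcast='infer'`, so treat it as one possible reading of the answer, not the definitive version.

```python
import numpy as np
import pandas as pd

in_df = pd.DataFrame({
    'GEOG_CODE': ['123', '234', '567', '901'],
    'COUNTRY': ['England'] * 4,
    'Group1': ['A', 'A', 'B', 'B'],
    '2016': np.arange(0, 4),
    '2017': np.arange(0, 4),
    '2018': np.arange(0, 4),
})

# Step 1: move the identifying columns into the index, leaving only year columns.
indexed = in_df.set_index(['GEOG_CODE', 'COUNTRY', 'Group1'])

# Step 2: stack the year columns into a new, unnamed innermost index level.
# The result is a Series with one value per (GEOG_CODE, COUNTRY, Group1, year).
stacked = indexed.stack()

# Step 3: unstack the second-to-last level (Group1) into columns 'A' and 'B'.
# A location that only has group A gets NaN under 'B', and vice versa.
wide = stacked.unstack(-2)

# Step 4: fill each row's missing group from the other group, then restore ints.
# (astype(int) is an assumption used here in place of downcast='infer'.)
filled = wide.ffill(axis=1).bfill(axis=1).astype(int)

# Step 5: prefix the group columns, turn the index back into columns,
# and give the unnamed year level (exposed as 'level_2') a proper name.
tidy = (filled.add_prefix('low_')
              .reset_index()
              .rename(columns={'level_2': 'year'}))
print(tidy)
```

Printing `wide` before the fill step shows where the NaN values come from: each location belongs to only one group in the toy data.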

answered Mar 25 at 0:31 by cs95

Could you help me interpret what the different lines do? Unfortunately, running the code with one line removed at a time doesn't work, because it raises: ValueError: Index contains duplicate entries, cannot reshape.

– Tommy Lees
Mar 25 at 21:29

@TommyLees This works under the assumption that each ['GEOG_CODE', 'COUNTRY', 'Group1'] combination is unique. The idea is to stack all the columns, while unstacking "Group1". Perhaps you can try df = df.drop_duplicates(subset=['GEOG_CODE', 'COUNTRY', 'Group1']) before running this.

– cs95
Mar 25 at 21:34
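
Before dropping anything, it may help to look at which rows actually violate the uniqueness assumption. The sketch below uses a small hypothetical frame (not the data from the question) with one repeated key.

```python
import pandas as pd

# Hypothetical frame in which the key ('123', 'England', 'A') appears twice.
df = pd.DataFrame({
    'GEOG_CODE': ['123', '123', '234'],
    'COUNTRY': ['England', 'England', 'England'],
    'Group1': ['A', 'A', 'B'],
    '2016': [0, 5, 1],
})

keys = ['GEOG_CODE', 'COUNTRY', 'Group1']

# keep=False marks every member of a duplicated key, not only the later ones.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# drop_duplicates keeps the first row per key and discards the rest,
# so the value 5 above is lost without any warning.
deduped = df.drop_duplicates(subset=keys)
print(deduped)
```

Note that drop_duplicates throws data away, so it is only appropriate when the repeated rows really are redundant.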

If I turn it into a function with a docstring, can I add it to your answer? I'm happy to post it as a new answer, but I don't want to take any of the credit. I just think it might be useful to others (and future me) who want to do something similar.

– Tommy Lees
Mar 25 at 22:04

@TommyLees Sure, feel free to propose an edit at the bottom of my answer and I'll approve it.

– cs95
Mar 25 at 22:30
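
A function along those lines might look roughly like the sketch below. The name gather_spread and its parameters are hypothetical and not taken from the thread. It leaves out the ffill/bfill step from the answer, so groups that a location does not have stay as NaN.

```python
import numpy as np
import pandas as pd


def gather_spread(df, id_cols, spread_col, var_name='year', prefix='low_'):
    """Gather the value columns into rows and spread `spread_col` into columns.

    Hypothetical helper. Every column not listed in `id_cols` or `spread_col`
    is treated as a value column. Assumes each combination of
    `id_cols + [spread_col]` is unique; otherwise unstack raises a ValueError.
    """
    wide = (df.set_index(id_cols + [spread_col])
              .rename_axis(var_name, axis=1)   # name the level that stack creates
              .stack()
              .unstack(spread_col))
    return (wide.add_prefix(prefix)
                .rename_axis(None, axis=1)     # drop the leftover columns name
                .reset_index())


in_df = pd.DataFrame({
    'GEOG_CODE': ['123', '234', '567', '901'],
    'COUNTRY': ['England'] * 4,
    'Group1': ['A', 'A', 'B', 'B'],
    '2016': np.arange(0, 4),
    '2017': np.arange(0, 4),
    '2018': np.arange(0, 4),
})

result = gather_spread(in_df, ['GEOG_CODE', 'COUNTRY'], 'Group1')
print(result)
```

Naming the columns axis before stacking means the year level already has a name, so no rename of 'level_2' is needed afterwards.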

I am still having problems getting the code to work on my real problem. The issue is that I do have non-unique combinations, because ['COUNTRY','AREA CODE'] have multiple Group1 values over many timesteps. Can I edit the question, or should I ask a new question?

– Tommy Lees
Apr 4 at 15:16
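
For non-unique combinations, one option (not proposed in the thread, so treat it as an assumption) is to melt first and then use pivot_table, which aggregates repeated keys instead of raising. The data below is hypothetical, and 'mean' is an arbitrary choice of aggregation.

```python
import pandas as pd

# Hypothetical data where ('123', 'England', 'A') appears twice.
df = pd.DataFrame({
    'GEOG_CODE': ['123', '123', '123', '234'],
    'COUNTRY': ['England'] * 4,
    'Group1': ['A', 'A', 'B', 'B'],
    '2016': [0, 2, 4, 6],
    '2017': [1, 3, 5, 7],
})

# Gather the year columns into rows.
long = df.melt(id_vars=['GEOG_CODE', 'COUNTRY', 'Group1'],
               var_name='year', value_name='value')

# Spread Group1 into columns; duplicated keys are combined with aggfunc.
tidy = (long.pivot_table(index=['GEOG_CODE', 'COUNTRY', 'year'],
                         columns='Group1', values='value', aggfunc='mean')
            .add_prefix('low_')
            .rename_axis(None, axis=1)
            .reset_index())
print(tidy)
```

Whether averaging is right depends on what the repeated rows represent; 'sum', 'first', or another aggregation may fit the real data better.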

