How to cycle a Pandas dataframe grouping by hierarchical multiindex from top to bottom and store results

I'm trying to create a forecasting process using hierarchical time series. My problem is that I can't find a way to write a for loop that walks the hierarchy and, for each node, extracts a daily time series from a pandas dataframe by summing the quantities by date. Each resulting daily time series should be passed to a function inside the loop, and the results stored in some other object.

Dataset

The initial dataset is a table that represents the daily sales data of 3 hierarchical levels: city, shop, product. The initial table has this structure:

+============+============+============+============+==========+
| Id_Level_1 | Id_Level_2 | Id_Level_3 | Date | Quantity |
+============+============+============+============+==========+
| Rome | Shop1 | Prod1 | 01/01/2015 | 50 |
+------------+------------+------------+------------+----------+
| Rome | Shop1 | Prod1 | 02/01/2015 | 25 |
+------------+------------+------------+------------+----------+
| Rome | Shop1 | Prod1 | 03/01/2015 | 73 |
+------------+------------+------------+------------+----------+
| Rome | Shop1 | Prod1 | 04/01/2015 | 62 |
+------------+------------+------------+------------+----------+
| ... | ... | ... | ... | ... |
+------------+------------+------------+------------+----------+
| Milan | Shop3 | Prod9 | 31/12/2018 | 185 |
+------------+------------+------------+------------+----------+
| Milan | Shop3 | Prod9 | 31/12/2018 | 147 |
+------------+------------+------------+------------+----------+
| Milan | Shop3 | Prod9 | 31/12/2018 | 206 |
+------------+------------+------------+------------+----------+

Each city (Id_Level_1) has many shops (Id_Level_2), and each shop has some products (Id_Level_3). Each shop has a different mix of products (for example, Shop1 and Shop3 may have Prod7, which is not available in other shops). All data are daily and the measure of interest is the quantity.
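A minimal sketch of a frame with this structure, for anyone who wants to try things out. The first four rows are copied from the table above; the Milan rows and the variable names (rows, df, daily_total) are made-up placeholders, and the dates are assumed to be day-first.

```python
import pandas as pd

# The first four rows are taken from the table above; the Milan rows are
# hypothetical values added only so that the frame has more than one node.
rows = [
    ("Rome", "Shop1", "Prod1", "01/01/2015", 50),
    ("Rome", "Shop1", "Prod1", "02/01/2015", 25),
    ("Rome", "Shop1", "Prod1", "03/01/2015", 73),
    ("Rome", "Shop1", "Prod1", "04/01/2015", 62),
    ("Milan", "Shop3", "Prod9", "01/01/2015", 10),  # hypothetical
    ("Milan", "Shop3", "Prod9", "02/01/2015", 20),  # hypothetical
]
columns = ["Id_Level_1", "Id_Level_2", "Id_Level_3", "Date", "Quantity"]
df = pd.DataFrame(rows, columns=columns)

# Dates in the table are assumed to be day-first (dd/mm/yyyy), so parse them explicitly.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Total quantity per day across the whole hierarchy.
daily_total = df.groupby("Date")["Quantity"].sum()
```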

Hierarchical Index (MultiIndex)

I need to create a tree (hierarchical) structure so that I can extract a time series for each "node" of the structure. I call a "node" a combination of the hierarchical keys, i.e. "Rome" and "Milan" are nodes of level 1, while "Rome|Shop1" and "Milan|Shop9" are nodes of level 2. In particular, I need this on level 3, because each product (Id_Level_3) has different sales in each shop of each city. The hierarchy is strict.
Nodes of level 3 are "Rome, Shop1, Prod1", "Rome, Shop1, Prod2", "Rome, Shop2, Prod1", and so on. The key of a node is logically the concatenation of the ids.

For each node, the time series is composed of two columns: Date and Quantity.

# MultiIndex dataframe
Liv_Labels = ['Id_Level_1', 'Id_Level_2', 'Id_Level_3', 'Date']
df.set_index(Liv_Labels, drop=False, inplace=True)

Then I need to extract the aggregated time series level by level, while keeping track of the hierarchical nodes.

Level 0:

Level_0 = df.groupby(level=['Date'])['Quantity'].sum()

Level 1:

idx = pd.IndexSlice  # needs "import pandas as pd" and a sorted index (df.sort_index())
Level_1 = {}

# Node Level 1 "Rome"
Level_1['Rome'] = df.loc[idx[['Rome'], :, :, :], 'Quantity'].groupby(level=['Date']).sum()

# Node Level 1 "Milan"
Level_1['Milan'] = df.loc[idx[['Milan'], :, :, :], 'Quantity'].groupby(level=['Date']).sum()

Level 2:

Level_2 = {}

# Node Level 2 "Rome, Shop1"
Level_2['Rome', 'Shop1'] = df.loc[idx[['Rome'], ['Shop1'], :, :], 'Quantity'].groupby(level=['Date']).sum()

... repeat for each level 2 node ...

# Node Level 2 "Milan, Shop9"
Level_2['Milan', 'Shop9'] = df.loc[idx[['Milan'], ['Shop9'], :, :], 'Quantity'].groupby(level=['Date']).sum()

Attempts

I already tried creating dictionaries and a MultiIndex, but my problem is that I can't get hold of a proper "node" inside the loop. I can't even extract the unique node keys of each level, so I can't collect the time series of a specific node.

# Number of hierarchy levels (city, shop, product)
Liv_Num = 3

# Get level labels
Level_Labels = ['Id_Level_'+str(n) for n in range(1, Liv_Num+1)]+['Date']

# Initialize dictionary
TimeSeries = {}

# Get Level 0 time series
TimeSeries["Level_0"] = df.groupby(level=['Date'])['Quantity'].sum()

# Get other levels time series from 1 to Liv_Num
for i in range(1, Liv_Num+1):
    TimeSeries["Level_"+str(i)] = df.groupby(level=Level_Labels[0:i]+['Date'])['Quantity'].sum()

Desired result

I would like a loop that cycles through my dataset and performs these actions:

Creates a structure of all the unique node keys

Extracts the time series of each node, with Quantity summed by Date

Stores the time series in a structure for later use

Thanks in advance for any suggestion! Best regards.
FR

asked Mar 28 at 21:22

Federico Rizzello

python-3.x for-loop pandas-groupby hierarchical-data

1 Answer

I'm currently working on a dataset of network switches that I pulled from an SQL database. Each port on each switch has its own data frame holding a time series. To access the time series of a specific port, I identify the switches by their IP addresses and the ports by their port numbers, and to avoid re-querying anything I have already queried I use the .unique() method to get each IP and each port only once.

I set my index to be the IP and Port indices and accessed the port information like so:

def yield_df(df):
    for ip in df.index.get_level_values('ip').unique():
        for port in df.loc[ip].index.get_level_values('port').unique():
            yield df.loc[ip].loc[port]

Then I cycled through the port data frames with a for loop like so:

for port_df in yield_df(adb_df):
    ...  # process each port's time series here

I'm sure there are faster ways to carry out these procedures in pandas, but I hope this helps you start solving your problem.
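One possibly faster alternative is to let groupby enumerate the nodes instead of slicing them one by one. The sketch below applies that idea to the column names from the question. The toy values, the helper name node_series and the tuple-keyed dictionary are assumptions made for illustration and are not part of the original posts.

```python
import pandas as pd


def node_series(df, level_cols, date_col="Date", value_col="Quantity"):
    """Return {node_key: daily Series} for every node of every level.

    node_key is a tuple: () for the grand total, ("Rome",) for a level 1
    node, ("Rome", "Shop1") for a level 2 node, and so on.
    """
    result = {(): df.groupby(date_col)[value_col].sum()}
    for depth in range(1, len(level_cols) + 1):
        keys = list(level_cols[:depth])
        # One pass per level: sum the value column by node and date.
        summed = df.groupby(keys + [date_col])[value_col].sum()
        # Split the summed Series into one daily series per node.
        for key, ts in summed.groupby(level=keys):
            # Older pandas versions return a scalar key for a single level.
            key = key if isinstance(key, tuple) else (key,)
            result[key] = ts.droplevel(keys)
    return result


# Hypothetical toy data using the question's column names.
df = pd.DataFrame({
    "Id_Level_1": ["Rome", "Rome", "Rome", "Milan"],
    "Id_Level_2": ["Shop1", "Shop1", "Shop2", "Shop3"],
    "Id_Level_3": ["Prod1", "Prod2", "Prod1", "Prod9"],
    "Date": pd.to_datetime(["2015-01-01"] * 4),
    "Quantity": [50, 25, 73, 62],
})
series_by_node = node_series(df, ["Id_Level_1", "Id_Level_2", "Id_Level_3"])

for key, ts in series_by_node.items():
    # Each ts is indexed by Date and could be passed to a forecasting function here.
    print(key, ts.sum())
```

The dictionary keys double as the list of unique node keys, so the same structure covers both the node enumeration and the storage of the per-node series.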

answered May 29 at 9:13

Mitch
