pandas get mapping of categories to integer value


I can transform categorical columns to their categorical codes, but how do I get an accurate picture of their mapping? Example:

import pandas as pd

df_labels = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab')})
df_labels['col2'] = df_labels['col2'].astype('category')

df_labels looks like this:

   col1 col2
0     1    a
1     2    b
2     3    c
3     4    a
4     5    b

How do I get an accurate mapping of the cat codes to the categories?
The Stack Overflow answer below says to enumerate the categories. However, I'm not sure whether enumerating is the way cat.codes generated the integer values. Is there a more accurate way?

Get mapping of categorical variables in pandas

>>> dict( enumerate(df.five.cat.categories) )

{0: 'bad', 1: 'good'}

What is a good way to get the mapping in the above format but accurate?

python pandas

edited Apr 7 at 13:52 by JohnE

asked Feb 13 '17 at 23:27 by jxn

FYI, I have since updated my answer (which you linked to) and added some explanation/verification. I believe it is accurate although I'm happy to improve it if you can elaborate about what you think is inaccurate about it.

– JohnE
Aug 25 '17 at 18:09

4 Answers

Edited answer (removed cat.categories and changed list to dict):

>>> dict(zip(df_labels.col2.cat.codes, df_labels.col2))

{0: 'a', 1: 'b', 2: 'c'}

The original answer which some of the comments are referring to:

>>> list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories))

[(0, 'a'), (1, 'b'), (2, 'c')]

As the comments note, the original answer works in this example only because the first three values happened to be [a,b,c], i.e. the categories in their sorted order. It would fail if they were instead [c,b,a] or [b,c,a].
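The following sketch shows the failure the comments describe. It is not part of the original answer. It uses a hypothetical series `s` whose first three values are 'c', 'b', 'a'. `.tolist()` is used only so the codes come out as plain ints.

```python
import pandas as pd

# Hypothetical data: the first three values are NOT in sorted category order.
s = pd.Series(list('cbaab'), dtype='category')
# Categories are inferred in sorted order ['a', 'b', 'c'], so codes are [2, 1, 0, 0, 1].

# Original answer: pairs each code with the category at the same *position*.
original = list(zip(s.cat.codes.tolist(), s.cat.categories))
# Edited answer: pairs each code with the value in the same *row*.
corrected = dict(zip(s.cat.codes.tolist(), s))

print(original)   # [(2, 'a'), (1, 'b'), (0, 'c')]  <- wrong pairings
print(corrected)  # {2: 'c', 1: 'b', 0: 'a'}
```

zip() stops at the shorter input. The original version therefore pairs the first three codes with the three categories purely by position, which is only correct when the data happens to start in category order.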

edited Apr 8 at 13:28 by JohnE

answered Feb 13 '17 at 23:44 by Boud

Yes, thanks! I needed to put set() in front, as I just want the unique mappings: set(list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories)))

– jxn
Feb 15 '17 at 0:11

I think this answer only works because of the way col2 is ordered: len(cat.categories) is 3 while len(cat.codes) is 5.

– pomber
Jul 9 '17 at 1:22

This is an incorrect answer, because ser.cat.categories returns all the unique values of the categorical, not the label corresponding to each item in the series.

– Woods Chen
Jan 14 at 9:47

@JohnE feel free to edit. I cannot delete my answer since it is the accepted one.

– Boud
Apr 7 at 5:12

Thanks, @boud, I edited it (while preserving the original with a note). Please add additional edits as you see fit.

– JohnE
Apr 7 at 14:02


I use:

dict([(category, code) for code, category in enumerate(df_labels.col2.cat.categories)])

# {'a': 0, 'b': 1, 'c': 2}
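One way to sanity-check this category-to-code mapping is to re-encode the column with it and compare the result against cat.codes. The sketch below does this on the question's df_labels. The dict comprehension is simply a more idiomatic spelling of the same dict([...]) call.

```python
import pandas as pd

df_labels = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab')})
df_labels['col2'] = df_labels['col2'].astype('category')

# Category -> code, same result as dict([(category, code) for ...]).
cat_to_code = {category: code for code, category in enumerate(df_labels.col2.cat.categories)}

# Map the plain (object) values through the dict and compare with pandas' own codes.
reencoded = df_labels.col2.astype(object).map(cat_to_code)

print(cat_to_code)                                               # {'a': 0, 'b': 1, 'c': 2}
print(reencoded.tolist() == df_labels.col2.cat.codes.tolist())  # True
```

Converting to object first keeps Series.map from operating on the categories themselves, so the output is a plain integer Series that can be compared row by row.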

edited Mar 23 at 15:16 by JohnE

answered Jul 9 '17 at 1:23 by pomber

Note that this is roughly equivalent to the answer rejected by the OP, dict(enumerate(df.five.cat.categories)), except that it swaps keys and values (e.g. 0: 'a' becomes 'a': 0). That is a minor difference: both keys and values are unique here, so the direction is in some sense irrelevant and easy enough to reverse. (I think the answer (mine!) rejected by the OP is actually correct, so I think this one is correct too!)

– JohnE
Mar 23 at 15:24


If you want to convert each column/Series from categorical back to its original form, you just need to reverse what you did in the for loop over the DataFrame. There are two ways to do that:

To get back to the original Series or numpy array, use Series.astype(original_dtype) or np.asarray(categorical).

If you already have the codes and categories, you can use the from_codes() constructor to skip the factorize step that the normal constructor performs.

See pandas: Categorical Data
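Here is a sketch of the first method applied to the question's df_labels. It assumes `object` is the dtype you want back, which may not match the column's original dtype in pandas versions that infer a dedicated string dtype.

```python
import numpy as np
import pandas as pd

df_labels = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab')})
df_labels['col2'] = df_labels['col2'].astype('category')

# Back to a plain, non-categorical Series of the labels.
restored = df_labels['col2'].astype(object)
# Or to a NumPy array of the labels.
as_array = np.asarray(df_labels['col2'])

print(restored.dtype)     # object
print(as_array.tolist())  # ['a', 'b', 'c', 'a', 'b']
```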

Usage of from_codes

As described in the official documentation, it builds a Categorical from arrays of codes and categories.

import numpy as np
import pandas as pd

splitter = np.random.choice([0, 1], 5, p=[0.5, 0.5])
s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
print(splitter)
print(s)

gives

[0 1 1 0 0]
0 train
1 test
2 test
3 train
4 train
dtype: category
Categories (2, object): [train, test]

For your codes

# after your previous conversion, df['col2'] holds the integer codes
print(df['col2'])
# apply from_codes; the 2nd argument is the list of categories from the mapping
s = pd.Series(pd.Categorical.from_codes(df['col2'], list('abcde')))
print(s)

gives

0 0
1 1
2 2
3 0
4 1
Name: col2, dtype: int8
0 a
1 b
2 c
3 a
4 b
dtype: category
Categories (5, object): [a, b, c, d, e]
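In the other direction, cat.codes and cat.categories are exactly the two inputs that from_codes() needs. A round trip on the question's df_labels should therefore rebuild the original column, as this sketch shows.

```python
import pandas as pd

df_labels = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab')})
df_labels['col2'] = df_labels['col2'].astype('category')

codes = df_labels['col2'].cat.codes            # integer codes, one per row
categories = df_labels['col2'].cat.categories  # the unique labels, in code order

# Rebuild the categorical column from the two pieces.
rebuilt = pd.Series(pd.Categorical.from_codes(codes, categories=categories), name='col2')

print(rebuilt.tolist())             # ['a', 'b', 'c', 'a', 'b']
print(dict(enumerate(categories)))  # {0: 'a', 1: 'b', 2: 'c'}
```

The second print is the compact code-to-label mapping asked about in the comments below. from_codes() relies on this same positional correspondence between codes and categories.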

edited Feb 14 '17 at 0:11

answered Feb 13 '17 at 23:41 by Neo X

There is not much documentation about using from_codes(). Can you show me how I can apply it?

– jxn
Feb 13 '17 at 23:48

As updated, hope it helps.

– Neo X
Feb 14 '17 at 0:11

I see, I just want the unique mapping values though, not the full mapping. For example {0: 'a', 1: 'b', 2: 'c'}

– jxn
Feb 14 '17 at 0:35

Then you can easily construct the map yourself from the codes and categories. However, a Python dictionary does not preserve order (before Python 3.7), so use two lists or a list of tuples as in @Boud's answer instead.

– Neo X
Feb 14 '17 at 0:41


OP asks for something "accurate" relative to the answer in the linked question:

dict(enumerate(df_labels.col2.cat.categories))

# {0: 'a', 1: 'b', 2: 'c'}

I believe that the above answer is indeed accurate (full disclosure: it is my answer in the other question that I'm defending). Note also that it is roughly equivalent to @pomber's answer, except that the ordering of the keys and values is reversed. (Since both keys and values are unique, the ordering is in some sense irrelevant, and easy enough to reverse as a consequence).
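The sketch below checks that claim using a hypothetical series `s` that also contains a missing value. Every code is a position in cat.categories. pandas uses the code -1 for missing values, which enumerate() never produces.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value.
s = pd.Series(['b', 'a', np.nan, 'c', 'a'], dtype='category')

mapping = dict(enumerate(s.cat.categories))
codes = s.cat.codes.tolist()

print(mapping)  # {0: 'a', 1: 'b', 2: 'c'}
print(codes)    # [1, 0, -1, 2, 0]

# Each non-missing code indexes straight into cat.categories,
# so decoding through the enumerate() mapping recovers the data.
decoded = [mapping.get(code) for code in codes]  # -1 (missing) -> None
print(decoded)  # ['b', 'a', None, 'c', 'a']
```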

However, the following way is arguably safer, or at least more transparent as to how it works:

dict(zip(df_labels.col2.cat.codes, df_labels.col2))

# {0: 'a', 1: 'b', 2: 'c'}

This is similar in spirit to @boud's answer, but corrects an error by replacing df_labels.col2.cat.categories with df_labels.col2. It also replaces list() with dict(), which seems more appropriate for a mapping and automatically gets rid of duplicates.

Note that the length of both arguments to zip() is len(df), whereas the length of df_labels.col2.cat.categories is the number of unique values, which will generally be much shorter than len(df).

Also note that this method is quite inefficient as it maps 0 to 'a' twice, and similarly for 'b'. In large dataframes the difference in speed could be pretty big. But it won't cause any error because dict() will remove redundancies like this -- it's just that it will be much less efficient than the other method.
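The two methods can also differ in content, not just in speed, when the dtype declares categories that never occur in the data. Here is a sketch with a hypothetical series `s`.

```python
import pandas as pd

# Hypothetical column whose dtype declares a category ('c') that never occurs.
s = pd.Series(pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c']))

from_categories = dict(enumerate(s.cat.categories))  # loops over the categories only
from_data = dict(zip(s.cat.codes.tolist(), s))       # loops over every row

print(from_categories)  # {0: 'a', 1: 'b', 2: 'c'}
print(from_data)        # {0: 'a', 1: 'b'}
```

The enumerate() version describes the full encoding defined by the dtype. The zip() version only reports codes that actually appear in the rows.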

edited Mar 23 at 15:32

answered Mar 22 at 16:51 by JohnE

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f42215354%2fpandas-get-mapping-of-categories-to-integer-value%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

Edited answer (removed cat.categories and changed list to dict):

>>> dict(zip(df_labels.col2.cat.codes, df_labels.col2))

0: 'a', 1: 'b', 2: 'c'

The original answer which some of the comments are referring to:

>>> list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories))

[(0, 'a'), (1, 'b'), (2, 'c')]

As the comments note, the original answer works in this example because the first three values happend to be [a,b,c], but would fail if they were instead [c,b,a] or [b,c,a].

edited Apr 8 at 13:28

JohnE

15.2k73762

answered Feb 13 '17 at 23:44

Boud

19.7k74059

1

Yes thanks! needed to put set in the front as i just want the unique mappings: set(list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories)))

– jxn
Feb 15 '17 at 0:11

5

I think this answer only works because of the way col2 is ordered. len(cat.categories) is 3 while len(cat.codes) is 5.

– pomber
Jul 9 '17 at 1:22

3

This is an incorrect answer, because ser.cat.categories will return all the unique values in the category but not the corresponding label of the items in the series.

– Woods Chen
Jan 14 at 9:47

@JohnE feel free to edit. I cannot delete my answer for it is the accepted one

– Boud
Apr 7 at 5:12

Thanks, @boud, I edited it (while preserving the original with a note). Please add additional edits as you see fit.

– JohnE
Apr 7 at 14:02

add a comment |

Edited answer (removed cat.categories and changed list to dict):

>>> dict(zip(df_labels.col2.cat.codes, df_labels.col2))

0: 'a', 1: 'b', 2: 'c'

The original answer which some of the comments are referring to:

>>> list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories))

[(0, 'a'), (1, 'b'), (2, 'c')]

As the comments note, the original answer works in this example because the first three values happend to be [a,b,c], but would fail if they were instead [c,b,a] or [b,c,a].

edited Apr 8 at 13:28

JohnE

15.2k73762

answered Feb 13 '17 at 23:44

Boud

19.7k74059

1

Yes thanks! needed to put set in the front as i just want the unique mappings: set(list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories)))

– jxn
Feb 15 '17 at 0:11

5

I think this answer only works because of the way col2 is ordered. len(cat.categories) is 3 while len(cat.codes) is 5.

– pomber
Jul 9 '17 at 1:22

3

This is an incorrect answer, because ser.cat.categories will return all the unique values in the category but not the corresponding label of the items in the series.

– Woods Chen
Jan 14 at 9:47

@JohnE feel free to edit. I cannot delete my answer for it is the accepted one

– Boud
Apr 7 at 5:12

Thanks, @boud, I edited it (while preserving the original with a note). Please add additional edits as you see fit.

– JohnE
Apr 7 at 14:02

add a comment |

Edited answer (removed cat.categories and changed list to dict):

>>> dict(zip(df_labels.col2.cat.codes, df_labels.col2))

0: 'a', 1: 'b', 2: 'c'

The original answer which some of the comments are referring to:

>>> list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories))

[(0, 'a'), (1, 'b'), (2, 'c')]

As the comments note, the original answer works in this example because the first three values happend to be [a,b,c], but would fail if they were instead [c,b,a] or [b,c,a].

edited Apr 8 at 13:28

JohnE

15.2k73762

answered Feb 13 '17 at 23:44

Boud

19.7k74059

Edited answer (removed cat.categories and changed list to dict):

>>> dict(zip(df_labels.col2.cat.codes, df_labels.col2))

0: 'a', 1: 'b', 2: 'c'

The original answer which some of the comments are referring to:

>>> list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories))

[(0, 'a'), (1, 'b'), (2, 'c')]

As the comments note, the original answer works in this example because the first three values happend to be [a,b,c], but would fail if they were instead [c,b,a] or [b,c,a].

edited Apr 8 at 13:28

JohnE

15.2k73762

answered Feb 13 '17 at 23:44

Boud

19.7k74059

edited Apr 8 at 13:28

JohnE

15.2k73762

edited Apr 8 at 13:28

JohnE

15.2k73762

edited Apr 8 at 13:28

JohnE

15.2k73762

answered Feb 13 '17 at 23:44

Boud

19.7k74059

answered Feb 13 '17 at 23:44

Boud

19.7k74059

answered Feb 13 '17 at 23:44

Boud

19.7k74059

1

Yes thanks! needed to put set in the front as i just want the unique mappings: set(list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories)))

– jxn
Feb 15 '17 at 0:11

5

I think this answer only works because of the way col2 is ordered. len(cat.categories) is 3 while len(cat.codes) is 5.

– pomber
Jul 9 '17 at 1:22

3

This is an incorrect answer, because ser.cat.categories will return all the unique values in the category but not the corresponding label of the items in the series.

– Woods Chen
Jan 14 at 9:47

@JohnE feel free to edit. I cannot delete my answer for it is the accepted one

– Boud
Apr 7 at 5:12

Thanks, @boud, I edited it (while preserving the original with a note). Please add additional edits as you see fit.

– JohnE
Apr 7 at 14:02

add a comment |

1

Yes thanks! needed to put set in the front as i just want the unique mappings: set(list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories)))

– jxn
Feb 15 '17 at 0:11

5

I think this answer only works because of the way col2 is ordered. len(cat.categories) is 3 while len(cat.codes) is 5.

– pomber
Jul 9 '17 at 1:22

3

This is an incorrect answer, because ser.cat.categories will return all the unique values in the category but not the corresponding label of the items in the series.

– Woods Chen
Jan 14 at 9:47

@JohnE feel free to edit. I cannot delete my answer for it is the accepted one

– Boud
Apr 7 at 5:12

Thanks, @boud, I edited it (while preserving the original with a note). Please add additional edits as you see fit.

– JohnE
Apr 7 at 14:02

Yes thanks! needed to put set in the front as i just want the unique mappings: set(list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories)))

– jxn
Feb 15 '17 at 0:11

I think this answer only works because of the way col2 is ordered. len(cat.categories) is 3 while len(cat.codes) is 5.

– pomber
Jul 9 '17 at 1:22

This is an incorrect answer, because ser.cat.categories will return all the unique values in the category but not the corresponding label of the items in the series.

– Woods Chen
Jan 14 at 9:47

@JohnE feel free to edit. I cannot delete my answer for it is the accepted one

– Boud
Apr 7 at 5:12

Thanks, @boud, I edited it (while preserving the original with a note). Please add additional edits as you see fit.

– JohnE
Apr 7 at 14:02

add a comment |

I use:

dict([(category, code) for code, category in enumerate(df_labels.col2.cat.categories)])

# 'a': 0, 'b': 1, 'c': 2

edited Mar 23 at 15:16

JohnE

15.2k73762

answered Jul 9 '17 at 1:23

pomber

12.8k85572

Note that this is roughly equivalent to the answer rejected by the OP: dict(enumerate(df.five.cat.categories)) except that it switches keys and values from e.g. 0:'a' to 'a':0 which is a minor difference as both keys and values here are unique so the key/value order is in some sense irrelevant and it's also easy enough to reverse. (I think the answer (mine!) rejected by the OP is actually correct so I also think this one is correct too!)

– JohnE
Mar 23 at 15:24

add a comment |

I use:

dict([(category, code) for code, category in enumerate(df_labels.col2.cat.categories)])

# 'a': 0, 'b': 1, 'c': 2

edited Mar 23 at 15:16

JohnE

15.2k73762

answered Jul 9 '17 at 1:23

pomber

12.8k85572

Note that this is roughly equivalent to the answer rejected by the OP: dict(enumerate(df.five.cat.categories)) except that it switches keys and values from e.g. 0:'a' to 'a':0 which is a minor difference as both keys and values here are unique so the key/value order is in some sense irrelevant and it's also easy enough to reverse. (I think the answer (mine!) rejected by the OP is actually correct so I also think this one is correct too!)

– JohnE
Mar 23 at 15:24

add a comment |

I use:

dict([(category, code) for code, category in enumerate(df_labels.col2.cat.categories)])

# 'a': 0, 'b': 1, 'c': 2

edited Mar 23 at 15:16

JohnE

15.2k73762

answered Jul 9 '17 at 1:23

pomber

12.8k85572

I use:

dict([(category, code) for code, category in enumerate(df_labels.col2.cat.categories)])

# 'a': 0, 'b': 1, 'c': 2

edited Mar 23 at 15:16

JohnE

15.2k73762

answered Jul 9 '17 at 1:23

pomber

12.8k85572

edited Mar 23 at 15:16

JohnE

15.2k73762

edited Mar 23 at 15:16

JohnE

15.2k73762

edited Mar 23 at 15:16

JohnE

15.2k73762

answered Jul 9 '17 at 1:23

pomber

12.8k85572

answered Jul 9 '17 at 1:23

pomber

12.8k85572

answered Jul 9 '17 at 1:23

pomber

12.8k85572

Note that this is roughly equivalent to the answer rejected by the OP: dict(enumerate(df.five.cat.categories)) except that it switches keys and values from e.g. 0:'a' to 'a':0 which is a minor difference as both keys and values here are unique so the key/value order is in some sense irrelevant and it's also easy enough to reverse. (I think the answer (mine!) rejected by the OP is actually correct so I also think this one is correct too!)

– JohnE
Mar 23 at 15:24

add a comment |

Note that this is roughly equivalent to the answer rejected by the OP: dict(enumerate(df.five.cat.categories)) except that it switches keys and values from e.g. 0:'a' to 'a':0 which is a minor difference as both keys and values here are unique so the key/value order is in some sense irrelevant and it's also easy enough to reverse. (I think the answer (mine!) rejected by the OP is actually correct so I also think this one is correct too!)

– JohnE
Mar 23 at 15:24

Note that this is roughly equivalent to the answer rejected by the OP: dict(enumerate(df.five.cat.categories)) except that it switches keys and values from e.g. 0:'a' to 'a':0 which is a minor difference as both keys and values here are unique so the key/value order is in some sense irrelevant and it's also easy enough to reverse. (I think the answer (mine!) rejected by the OP is actually correct so I also think this one is correct too!)

– JohnE
Mar 23 at 15:24

add a comment |

If you want to convert each column/ data series from categorical back to original, you just need to reverse what you did in the for loop of the dataframe. There are two methods to do that:

To get back to the original Series or numpy array, use Series.astype(original_dtype) or np.asarray(categorical).

If you have already codes and categories, you can use the from_codes()constructor to save the factorize step during normal constructor mode.

See pandas: Categorical Data

Usage of from_codes

As on official documentation, it makes a Categorical type from codes and categories arrays.

splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
print splitter
print s

gives

[0 1 1 0 0]
0 train
1 test
2 test
3 train
4 train
dtype: category
Categories (2, object): [train, test]

For your codes

# after your previous conversion
print df['col2']
# apply from_codes, the 2nd argument is the categories from mapping dict
s = pd.Series(pd.Categorical.from_codes(df['col2'], list('abcde')))
print s

gives

0 0
1 1
2 2
3 0
4 1
Name: col2, dtype: int8
0 a
1 b
2 c
3 a
4 b
dtype: category
Categories (5, object): [a, b, c, d, e]

edited Feb 14 '17 at 0:11

answered Feb 13 '17 at 23:41

Neo X

70759

There is not much documentation about using from_codes(). Can you show me how i can apply it ?

– jxn
Feb 13 '17 at 23:48

As updated, hope it helps.

– Neo X
Feb 14 '17 at 0:11

I see, i just want the unique mapping values though, not the full mapping. For example 0 : 'a', 1 : 'b', 2 : 'c'

– jxn
Feb 14 '17 at 0:35

Then you can easily construct the map by yourself using codes and categories. Yet you cannot maintain the order by a Python dictionary, use two lists or a list of tuples in @Boud answer instead.

– Neo X
Feb 14 '17 at 0:41

add a comment |

If you want to convert each column/ data series from categorical back to original, you just need to reverse what you did in the for loop of the dataframe. There are two methods to do that:

To get back to the original Series or numpy array, use Series.astype(original_dtype) or np.asarray(categorical).

If you have already codes and categories, you can use the from_codes()constructor to save the factorize step during normal constructor mode.

See pandas: Categorical Data

Usage of from_codes

As on official documentation, it makes a Categorical type from codes and categories arrays.

splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
print splitter
print s

gives

[0 1 1 0 0]
0 train
1 test
2 test
3 train
4 train
dtype: category
Categories (2, object): [train, test]

For your codes

# after your previous conversion
print df['col2']
# apply from_codes, the 2nd argument is the categories from mapping dict
s = pd.Series(pd.Categorical.from_codes(df['col2'], list('abcde')))
print s

gives

0 0
1 1
2 2
3 0
4 1
Name: col2, dtype: int8
0 a
1 b
2 c
3 a
4 b
dtype: category
Categories (5, object): [a, b, c, d, e]

edited Feb 14 '17 at 0:11

answered Feb 13 '17 at 23:41

Neo X

70759

There is not much documentation about using from_codes(). Can you show me how i can apply it ?

– jxn
Feb 13 '17 at 23:48

As updated, hope it helps.

– Neo X
Feb 14 '17 at 0:11

I see, i just want the unique mapping values though, not the full mapping. For example 0 : 'a', 1 : 'b', 2 : 'c'

– jxn
Feb 14 '17 at 0:35

Then you can easily construct the map by yourself using codes and categories. Yet you cannot maintain the order by a Python dictionary, use two lists or a list of tuples in @Boud answer instead.

– Neo X
Feb 14 '17 at 0:41

add a comment |

If you want to convert each column/ data series from categorical back to original, you just need to reverse what you did in the for loop of the dataframe. There are two methods to do that:

To get back to the original Series or numpy array, use Series.astype(original_dtype) or np.asarray(categorical).

If you have already codes and categories, you can use the from_codes()constructor to save the factorize step during normal constructor mode.

See pandas: Categorical Data

Usage of from_codes

As on official documentation, it makes a Categorical type from codes and categories arrays.

splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
print splitter
print s

gives

[0 1 1 0 0]
0 train
1 test
2 test
3 train
4 train
dtype: category
Categories (2, object): [train, test]

For your codes

# after your previous conversion
print df['col2']
# apply from_codes, the 2nd argument is the categories from mapping dict
s = pd.Series(pd.Categorical.from_codes(df['col2'], list('abcde')))
print s

gives

0 0
1 1
2 2
3 0
4 1
Name: col2, dtype: int8
0 a
1 b
2 c
3 a
4 b
dtype: category
Categories (5, object): [a, b, c, d, e]

edited Feb 14 '17 at 0:11

answered Feb 13 '17 at 23:41

Neo X

70759

If you want to convert each column/ data series from categorical back to original, you just need to reverse what you did in the for loop of the dataframe. There are two methods to do that:

To get back to the original Series or numpy array, use Series.astype(original_dtype) or np.asarray(categorical).

If you have already codes and categories, you can use the from_codes()constructor to save the factorize step during normal constructor mode.

See pandas: Categorical Data

Usage of from_codes

As on official documentation, it makes a Categorical type from codes and categories arrays.

splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
print splitter
print s

gives

[0 1 1 0 0]
0 train
1 test
2 test
3 train
4 train
dtype: category
Categories (2, object): [train, test]

For your codes

# after your previous conversion
print df['col2']
# apply from_codes, the 2nd argument is the categories from mapping dict
s = pd.Series(pd.Categorical.from_codes(df['col2'], list('abcde')))
print s

gives

0 0
1 1
2 2
3 0
4 1
Name: col2, dtype: int8
0 a
1 b
2 c
3 a
4 b
dtype: category
Categories (5, object): [a, b, c, d, e]

edited Feb 14 '17 at 0:11

answered Feb 13 '17 at 23:41

Neo X

70759

edited Feb 14 '17 at 0:11

answered Feb 13 '17 at 23:41

Neo X

70759

answered Feb 13 '17 at 23:41

Neo X

70759

answered Feb 13 '17 at 23:41

Neo X

70759

There is not much documentation about using from_codes(). Can you show me how i can apply it ?

– jxn
Feb 13 '17 at 23:48

As updated, hope it helps.

– Neo X
Feb 14 '17 at 0:11

I see, i just want the unique mapping values though, not the full mapping. For example 0 : 'a', 1 : 'b', 2 : 'c'

– jxn
Feb 14 '17 at 0:35

Then you can easily construct the map by yourself using codes and categories. Yet you cannot maintain the order by a Python dictionary, use two lists or a list of tuples in @Boud answer instead.

– Neo X
Feb 14 '17 at 0:41

The OP asks for something "accurate" relative to the answer in the linked question:

dict(enumerate(df_labels.col2.cat.categories))

# {0: 'a', 1: 'b', 2: 'c'}

However, the following approach is arguably safer, or at least more transparent about how it works:

dict(zip(df_labels.col2.cat.codes, df_labels.col2))

# {0: 'a', 1: 'b', 2: 'c'}

Note that both arguments to zip() have length len(df_labels), one entry per row, whereas df_labels.col2.cat.categories has one entry per unique category and will generally be much shorter than len(df_labels).
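The length point, and one practical difference between the two approaches, can be checked directly. A minimal sketch with a hypothetical df_labels whose col2 declares a category 'd' that never occurs in the data: enumerate over the categories includes every declared category, while zip over the rows only picks up categories that actually appear.

```python
import pandas as pd

# Hypothetical frame: 'd' is a declared category that never appears in the data.
df_labels = pd.DataFrame({"col2": pd.Categorical(list("abcab"), categories=list("abcd"))})
col = df_labels.col2

print(len(col.cat.codes), len(df_labels))  # 5 5  (one code per row)
print(len(col.cat.categories))             # 4    (one entry per category)

from_categories = dict(enumerate(col.cat.categories))
from_rows = dict(zip(col.cat.codes, col))

print(from_categories)  # {0: 'a', 1: 'b', 2: 'c', 3: 'd'}
print(from_rows)
```

When every declared category occurs in the data, both dicts contain the same pairs; the zip version simply does one pass over all rows to get there.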

edited Mar 23 at 15:32

answered Mar 22 at 16:51

JohnE

15.2k73762
