pandas get mapping of categories to integer value


I can transform categorical columns to their categorical codes, but how do I get an accurate picture of the mapping between codes and categories? Example:

import pandas as pd

df_labels = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab')})
df_labels['col2'] = df_labels['col2'].astype('category')

df_labels looks like this:

   col1 col2
0     1    a
1     2    b
2     3    c
3     4    a
4     5    b

How do I get an accurate mapping of the category codes to the categories?
The Stack Overflow answer linked below says to enumerate the categories. However, I'm not sure that enumeration is how cat.codes generated the integer values. Is there a more accurate way?

Get mapping of categorical variables in pandas

>>> dict(enumerate(df.five.cat.categories))
{0: 'bad', 1: 'good'}

What is a good way to get the mapping in the above format that is guaranteed to be accurate?

python pandas

edited Apr 7 at 13:52 by JohnE
asked Feb 13 '17 at 23:27 by jxn

  • FYI, I have since updated my answer (which you linked to) and added some explanation/verification. I believe it is accurate although I'm happy to improve it if you can elaborate about what you think is inaccurate about it.

    – JohnE
    Aug 25 '17 at 18:09


















4 Answers

Edited answer (removed cat.categories and changed list to dict):

>>> dict(zip(df_labels.col2.cat.codes, df_labels.col2))
{0: 'a', 1: 'b', 2: 'c'}

The original answer, which some of the comments refer to:

>>> list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories))
[(0, 'a'), (1, 'b'), (2, 'c')]

As the comments note, the original answer only works in this example because the first three values of col2 happen to be [a, b, c], whose codes are 0, 1, 2 in category order. Since zip() stops after the three categories, it pairs the first three row codes with the categories by position, so it would produce the wrong pairs if the first three values were instead [c, b, a] or [b, c, a].
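To make the failure concrete, here is a minimal sketch using a hypothetical Series `s` (not from the question) whose first values are not in category order. It contrasts the original zip over `cat.categories` with the edited zip over the column itself.

```python
import pandas as pd

# Hypothetical data whose first values are NOT in category order.
s = pd.Series(list('cbacb'), dtype='category')

print(list(s.cat.categories))   # ['a', 'b', 'c']  (sorted automatically)
print(s.cat.codes.tolist())     # [2, 1, 0, 2, 1]

# Original approach: zip() stops after the 3 categories and pairs the
# first three row codes with categories by position, which is wrong here.
flawed = list(zip(s.cat.codes, s.cat.categories))
print(flawed)                   # [(2, 'a'), (1, 'b'), (0, 'c')]

# Edited approach: pair each row's code with that same row's value.
correct = dict(zip(s.cat.codes, s))
print(correct)                  # {2: 'c', 1: 'b', 0: 'a'}
```

The edited version is correct regardless of row order because each pair comes from the same row.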






  • Yes, thanks! I needed to put set in front, as I just want the unique mappings: set(list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories)))

    – jxn
    Feb 15 '17 at 0:11

  • I think this answer only works because of the way col2 is ordered. len(cat.categories) is 3 while len(cat.codes) is 5.

    – pomber
    Jul 9 '17 at 1:22

  • This is an incorrect answer, because ser.cat.categories will return all the unique values in the category but not the corresponding label of the items in the series.

    – Woods Chen
    Jan 14 at 9:47

  • @JohnE feel free to edit. I cannot delete my answer because it is the accepted one.

    – Boud
    Apr 7 at 5:12

  • Thanks, @boud, I edited it (while preserving the original with a note). Please add additional edits as you see fit.

    – JohnE
    Apr 7 at 14:02

I use:

dict([(category, code) for code, category in enumerate(df_labels.col2.cat.categories)])

# {'a': 0, 'b': 1, 'c': 2}
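A quick sanity check for this category-to-code dictionary is to look every row up in it and compare the result with the codes pandas actually stored. The sketch below rebuilds the question's `df_labels` and writes the same mapping as a dict comprehension; `mapping` and `looked_up` are illustrative names.

```python
import pandas as pd

df_labels = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab')})
df_labels['col2'] = df_labels['col2'].astype('category')

# Same category -> code mapping as above, written as a dict comprehension.
mapping = {category: code for code, category in enumerate(df_labels.col2.cat.categories)}

# Look each row up in the mapping and compare with the codes pandas stored.
looked_up = [mapping[value] for value in df_labels.col2]
print(looked_up)                           # [0, 1, 2, 0, 1]
print(df_labels.col2.cat.codes.tolist())   # [0, 1, 2, 0, 1]
```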





  • Note that this is roughly equivalent to the answer rejected by the OP: dict(enumerate(df.five.cat.categories)) except that it switches keys and values from e.g. 0:'a' to 'a':0 which is a minor difference as both keys and values here are unique so the key/value order is in some sense irrelevant and it's also easy enough to reverse. (I think the answer (mine!) rejected by the OP is actually correct so I also think this one is correct too!)

    – JohnE
    Mar 23 at 15:24



















If you want to convert each column (data series) from categorical back to the original values, you just need to reverse whatever conversion you applied (for example, in a loop over the dataframe's columns). There are two ways to do that:

  1. To get back to the original Series or NumPy array, use Series.astype(original_dtype) or np.asarray(categorical).

  2. If you already have the codes and categories, you can use the from_codes() constructor, which skips the factorize step that the normal constructor performs.

See pandas: Categorical Data
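The two approaches listed above can be sketched as follows, using a hypothetical Series `s` shaped like the question's col2. This assumes the original values were plain Python strings, so `object` is taken as the "original dtype".

```python
import numpy as np
import pandas as pd

s = pd.Series(list('abcab'), dtype='category')

# Method 1a: cast back to the original dtype (object for Python strings).
restored = s.astype(object)
print(restored.tolist())        # ['a', 'b', 'c', 'a', 'b']

# Method 1b: materialise the values as a NumPy array.
arr = np.asarray(s)
print(arr)

# Method 2: the codes plus the categories are enough to rebuild the Categorical.
rebuilt = pd.Categorical.from_codes(s.cat.codes, categories=s.cat.categories)
print(list(rebuilt))            # ['a', 'b', 'c', 'a', 'b']
```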




Usage of from_codes

As described in the official documentation, it makes a Categorical from an array of codes and an array of categories.

import numpy as np
import pandas as pd

splitter = np.random.choice([0, 1], 5, p=[0.5, 0.5])
s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
print(splitter)
print(s)

gives (splitter is random, so your output will differ):



[0 1 1 0 0]
0 train
1 test
2 test
3 train
4 train
dtype: category
Categories (2, object): [train, test]


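For your codes:

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab')})
# the previous conversion: replace the categories with their integer codes
df['col2'] = df['col2'].astype('category').cat.codes
print(df['col2'])
# apply from_codes; the 2nd argument lists the categories in code order
s = pd.Series(pd.Categorical.from_codes(df['col2'], list('abcde')))
print(s)
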
gives



0 0
1 1
2 2
3 0
4 1
Name: col2, dtype: int8
0 a
1 b
2 c
3 a
4 b
dtype: category
Categories (5, object): [a, b, c, d, e]





  • There is not much documentation about using from_codes(). Can you show me how I can apply it?

    – jxn
    Feb 13 '17 at 23:48












  • As updated, hope it helps.

    – Neo X
    Feb 14 '17 at 0:11











  • I see. I just want the unique mapping values, though, not the full mapping. For example {0: 'a', 1: 'b', 2: 'c'}

    – jxn
    Feb 14 '17 at 0:35











  • Then you can easily construct the map yourself using the codes and categories. However, you cannot maintain the order with a Python dictionary; use two lists or a list of tuples as in @Boud's answer instead.

    – Neo X
    Feb 14 '17 at 0:41
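For the unique mapping asked about in this thread, here is a sketch that builds it directly from the categories. It assumes Python 3.7 or later, where the language guarantees that a dict preserves insertion order. The variable names are illustrative.

```python
import pandas as pd

df_labels = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab')})
df_labels['col2'] = df_labels['col2'].astype('category')

categories = df_labels.col2.cat.categories

# Unique code -> category mapping; insertion order follows category order,
# which is also code order (codes are positions in categories).
code_to_category = dict(enumerate(categories))
print(code_to_category)       # {0: 'a', 1: 'b', 2: 'c'}

# The "two lists" alternative mentioned in the comment.
codes = list(range(len(categories)))
labels = list(categories)
print(codes, labels)          # [0, 1, 2] ['a', 'b', 'c']
```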


















The OP asks for something "accurate" relative to the answer in the linked question:

dict(enumerate(df_labels.col2.cat.categories))

# {0: 'a', 1: 'b', 2: 'c'}

I believe that the above answer is indeed accurate (full disclosure: it is my answer in the other question that I'm defending). Note also that it is roughly equivalent to @pomber's answer, except that the keys and values are swapped. (Since both keys and values are unique, the direction is in some sense irrelevant, and easy enough to reverse.)
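Enumeration is accurate because pandas defines each entry of cat.codes as the position of that row's value within cat.categories, with -1 reserved for missing values. Enumerating the categories therefore reproduces the assignment pandas used. The check below uses a hypothetical Series `s` with out-of-order values and one missing value.

```python
import pandas as pd

# Hypothetical column with a missing value and values out of category order.
s = pd.Series(['c', 'a', None, 'b', 'c'], dtype='category')

mapping = dict(enumerate(s.cat.categories))
print(mapping)                       # {0: 'a', 1: 'b', 2: 'c'}
print(s.cat.codes.tolist())          # [2, 0, -1, 1, 2]

# Decode every row through the mapping; -1 marks a missing value.
decoded = [mapping[code] if code != -1 else None for code in s.cat.codes]
original = [None if pd.isna(v) else v for v in s]
print(decoded == original)           # True
```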



However, the following way is arguably safer, or at least more transparent about how it works:

dict(zip(df_labels.col2.cat.codes, df_labels.col2))

# {0: 'a', 1: 'b', 2: 'c'}

This is similar in spirit to @boud's answer, but corrects an error by replacing df_labels.col2.cat.categories with df_labels.col2. It also replaces list() with dict(), which seems more appropriate for a mapping and automatically gets rid of duplicates.

Note that the length of both arguments to zip() is len(df), whereas the length of df_labels.col2.cat.categories is the number of categories (here, the unique values), which will generally be much shorter than len(df).

Also note that this method is quite inefficient, as it maps 0 to 'a' twice, and similarly for 'b'. Because it iterates over every row rather than just over the categories, the difference in speed could be large for big dataframes. It won't cause any error, because dict() removes redundancies like this; it is just much less efficient than the other method.
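There is also one practical difference between the two methods besides speed. The zip() version only sees categories that actually occur in the column. Enumerating cat.categories also includes categories that are declared but unused. Here is a minimal sketch with a hypothetical unused category 'd':

```python
import pandas as pd

# Hypothetical column whose declared categories include one that never occurs.
s = pd.Series(pd.Categorical(list('abcab'), categories=list('abcd')))

from_categories = dict(enumerate(s.cat.categories))
from_rows = dict(zip(s.cat.codes, s))

print(from_categories)   # {0: 'a', 1: 'b', 2: 'c', 3: 'd'}
print(from_rows)         # {0: 'a', 1: 'b', 2: 'c'}
```

Wherever both dictionaries have a key, they agree. This is consistent with both methods being correct for the values that are present.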






Attribution for Boud's answer: answered Feb 13 '17 at 23:44 by Boud; edited Apr 8 at 13:28 by JohnE

Attribution for pomber's answer: answered Jul 9 '17 at 1:23 by pomber; edited Mar 23 at 15:16 by JohnE

    • Note that this is roughly equivalent to the answer rejected by the OP: dict(enumerate(df.five.cat.categories)) except that it switches keys and values from e.g. 0:'a' to 'a':0 which is a minor difference as both keys and values here are unique so the key/value order is in some sense irrelevant and it's also easy enough to reverse. (I think the answer (mine!) rejected by the OP is actually correct so I also think this one is correct too!)

      – JohnE
      Mar 23 at 15:24


















    • Note that this is roughly equivalent to the answer rejected by the OP: dict(enumerate(df.five.cat.categories)) except that it switches keys and values from e.g. 0:'a' to 'a':0 which is a minor difference as both keys and values here are unique so the key/value order is in some sense irrelevant and it's also easy enough to reverse. (I think the answer (mine!) rejected by the OP is actually correct so I also think this one is correct too!)

      – JohnE
      Mar 23 at 15:24

















    Note that this is roughly equivalent to the answer rejected by the OP: dict(enumerate(df.five.cat.categories)) except that it switches keys and values from e.g. 0:'a' to 'a':0 which is a minor difference as both keys and values here are unique so the key/value order is in some sense irrelevant and it's also easy enough to reverse. (I think the answer (mine!) rejected by the OP is actually correct so I also think this one is correct too!)

    – JohnE
    Mar 23 at 15:24






    Note that this is roughly equivalent to the answer rejected by the OP: dict(enumerate(df.five.cat.categories)) except that it switches keys and values from e.g. 0:'a' to 'a':0 which is a minor difference as both keys and values here are unique so the key/value order is in some sense irrelevant and it's also easy enough to reverse. (I think the answer (mine!) rejected by the OP is actually correct so I also think this one is correct too!)

    – JohnE
    Mar 23 at 15:24












    4














    If you want to convert each column/ data series from categorical back to original, you just need to reverse what you did in the for loop of the dataframe. There are two methods to do that:



    1. To get back to the original Series or numpy array, use Series.astype(original_dtype) or np.asarray(categorical).


    2. If you have already codes and categories, you can use the from_codes()constructor to save the factorize step during normal constructor mode.


    See pandas: Categorical Data




    Usage of from_codes



    As on official documentation, it makes a Categorical type from codes and categories arrays.



    splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
    s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
    print splitter
    print s


    gives



    [0 1 1 0 0]
    0 train
    1 test
    2 test
    3 train
    4 train
    dtype: category
    Categories (2, object): [train, test]


    For your codes



    # after your previous conversion
    print df['col2']
    # apply from_codes, the 2nd argument is the categories from mapping dict
    s = pd.Series(pd.Categorical.from_codes(df['col2'], list('abcde')))
    print s


    gives



    0 0
    1 1
    2 2
    3 0
    4 1
    Name: col2, dtype: int8
    0 a
    1 b
    2 c
    3 a
    4 b
    dtype: category
    Categories (5, object): [a, b, c, d, e]





    share|improve this answer

























    • There is not much documentation about using from_codes(). Can you show me how i can apply it ?

      – jxn
      Feb 13 '17 at 23:48












    • As updated, hope it helps.

      – Neo X
      Feb 14 '17 at 0:11











    • I see, i just want the unique mapping values though, not the full mapping. For example 0 : 'a', 1 : 'b', 2 : 'c'

      – jxn
      Feb 14 '17 at 0:35











    • Then you can easily construct the map by yourself using codes and categories. Yet you cannot maintain the order by a Python dictionary, use two lists or a list of tuples in @Boud answer instead.

      – Neo X
      Feb 14 '17 at 0:41















    4














    If you want to convert each column/ data series from categorical back to original, you just need to reverse what you did in the for loop of the dataframe. There are two methods to do that:



    1. To get back to the original Series or numpy array, use Series.astype(original_dtype) or np.asarray(categorical).


    2. If you have already codes and categories, you can use the from_codes()constructor to save the factorize step during normal constructor mode.


    See pandas: Categorical Data




    Usage of from_codes



    As on official documentation, it makes a Categorical type from codes and categories arrays.



    splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
    s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
    print splitter
    print s


    gives



    [0 1 1 0 0]
    0 train
    1 test
    2 test
    3 train
    4 train
    dtype: category
    Categories (2, object): [train, test]


    For your codes



    # after your previous conversion
    print df['col2']
    # apply from_codes, the 2nd argument is the categories from mapping dict
    s = pd.Series(pd.Categorical.from_codes(df['col2'], list('abcde')))
    print s


    gives



    0 0
    1 1
    2 2
    3 0
    4 1
    Name: col2, dtype: int8
    0 a
    1 b
    2 c
    3 a
    4 b
    dtype: category
    Categories (5, object): [a, b, c, d, e]













    edited Feb 14 '17 at 0:11

























    answered Feb 13 '17 at 23:41









    Neo X















    The OP asks for something more "accurate" than the answer to the linked question, which was:



    dict(enumerate(df_labels.col2.cat.categories))

    # 0: 'a', 1: 'b', 2: 'c'


    I believe the above answer is indeed accurate (full disclosure: it is my answer to the other question, which I'm defending here). Note also that it is roughly equivalent to @pomber's answer, except that the keys and values are swapped. (Since both the keys and the values are unique, which side serves as the key is in some sense irrelevant, and the mapping is easy to invert.)
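For illustration, here is that one-liner on a small hypothetical `df_labels` (assumed to mirror the question's data), together with the inverted category-to-code mapping:

```python
import pandas as pd

# Hypothetical stand-in for df_labels from the question.
df_labels = pd.DataFrame({"col2": pd.Series(["a", "b", "c", "a", "b"], dtype="category")})

# Code -> category: position i in .cat.categories is the category whose code is i.
code_to_cat = dict(enumerate(df_labels.col2.cat.categories))
print(code_to_cat)  # {0: 'a', 1: 'b', 2: 'c'}

# Category -> code: the same pairs with keys and values swapped.
cat_to_code = {cat: code for code, cat in code_to_cat.items()}
print(cat_to_code)  # {'a': 0, 'b': 1, 'c': 2}
```

This only walks over the categories, so its cost does not grow with the number of rows.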



    However, the following way is arguably safer, or at least more transparent as to how it works:



    dict(zip(df_labels.col2.cat.codes, df_labels.col2))

    # 0: 'a', 1: 'b', 2: 'c'


    This is similar in spirit to @Boud's answer, but corrects an error by replacing df_labels.col2.cat.categories with df_labels.col2. It also replaces list() with dict(), which seems more appropriate for a mapping and automatically gets rid of duplicates.



    Note that both arguments to zip() have length len(df), whereas df_labels.col2.cat.categories has one entry per unique value, so it will generally be much shorter than len(df).



    Also note that this method is quite inefficient: it maps 0 to 'a' twice, and similarly for 'b'. In large DataFrames the difference in speed could be substantial. It won't cause any error, because dict() removes redundancies like this; it is just much less efficient than the other method.
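The difference between zipping the codes with the column and zipping them with the categories shows up on a small hypothetical `df_labels`. The rows are deliberately ordered so that the first codes are not already 0, 1, 2; with rows like a, b, c, a, b the categories-based version happens to give the right answer by coincidence.

```python
import pandas as pd

# Hypothetical stand-in for df_labels; categories are sorted to ['a', 'b', 'c'],
# so the row codes are [2, 0, 1, 0].
df_labels = pd.DataFrame({"col2": pd.Series(["c", "a", "b", "a"], dtype="category")})
col = df_labels.col2

# Pairs each row's code with that row's value: correct, but visits every row.
row_pairs = dict(zip(col.cat.codes, col))

# Pairs row codes positionally with the categories: zip() stops at the shorter
# input (the categories), and the pairing is by position, not by code.
misaligned = dict(zip(col.cat.codes, col.cat.categories))

print(row_pairs)   # {2: 'c', 0: 'a', 1: 'b'}
print(misaligned)  # {2: 'a', 0: 'b', 1: 'c'}
```

Here `row_pairs` holds the same pairs as dict(enumerate(col.cat.categories)); that only holds as long as every category actually occurs in the column.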














        edited Mar 23 at 15:32

























        answered Mar 22 at 16:51









        JohnE
