One-hot encoding using sklearn.preprocessing LabelBinarizer


I am trying to use sklearn.preprocessing.LabelBinarizer() to create a one-hot encoding of labels that have only two classes, i.e. I only want to categorize two sets of objects. In this case, when I use fit(range(0, 2)), transform returns a single column instead of one column per class. This is fine on its own, but when I want to use the result in TensorFlow, the shape should really be (n_samples, 2) for dimensional consistency. Please advise how I can resolve this.

Here is the code:

    from sklearn import preprocessing
    lb = preprocessing.LabelBinarizer()
    lb.fit(range(0, 3))

Calling lb.transform([1, 0]), the result is:

    [[0 1 0]
     [1 0 0]]

whereas when we change 3 to 2, i.e. lb.fit(range(0, 2)), the result is

    [[1]
     [0]]

instead of

    [[0 1]
     [1 0]]

This creates problems for algorithms that expect arrays with a consistent number of columns (one per class). Is there any way to resolve this issue?
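For reference, the shapes involved can be checked with a short sketch. The variable names `three` and `two` are illustrative only, and the sketch assumes the default LabelBinarizer settings (dense output, neg_label=0, pos_label=1):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Three classes: transform yields one indicator column per class.
three = LabelBinarizer().fit(range(3)).transform([1, 0])

# Two classes: transform yields a single column (1 marks the second class).
two = LabelBinarizer().fit(range(2)).transform([1, 0])

print(three.shape)
print(two.shape)
```

The number of rows is the same in both cases; only the number of columns differs.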







python scikit-learn






edited Mar 26 at 18:29 by Eskapp

asked Mar 26 at 14:10 by HamidReza Mirkhani

  • Can you explain which method you call to get the result? lb.fit() only returns the fitted binarizer, not an encoded array.

    – Eskapp
    Mar 26 at 14:19

  • Sorry, I forgot to include it. Here is the code: print(lb.transform([1, 0]))

    – HamidReza Mirkhani
    Mar 26 at 14:53

  • First, this is not an issue with the method. According to the documentation, "Binary targets transform to a column vector" (scikit-learn.org/stable/modules/generated/…). You can build the array you want from the column vector result when the number of classes is 2. I'll try to write an answer if this is unclear.

    – Eskapp
    Mar 26 at 15:50

  • Thanks. I wouldn't call it an issue with the method either; however, to me, a better implementation would let developers control the output shape so that it stays consistent. As you highlighted, I have to write a custom method just for the n=2 case.

    – HamidReza Mirkhani
    Mar 26 at 16:01


















2 Answers

According to the documentation, the purpose of LabelBinarizer is to

"Binarize labels in a one-vs-all fashion.

Several regression and binary classification algorithms are available in scikit-learn. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme."

If your data has only two types of labels, you can feed them directly to a binary classifier. Hence, one column is enough to capture two classes in one-vs-rest fashion. As the documentation puts it: "Binary targets transform to a column vector".

    >>> from sklearn import preprocessing
    >>> lb = preprocessing.LabelBinarizer()
    >>> lb.fit_transform(['yes', 'no', 'no', 'yes'])
    array([[1],
           [0],
           [0],
           [1]])

If your intention is just to create a one-hot encoding, use OneHotEncoder instead:

    >>> from sklearn.preprocessing import OneHotEncoder
    >>> enc = OneHotEncoder()
    >>> enc.fit_transform([['yes'], ['no'], ['no'], ['yes']]).toarray()
    array([[0., 1.],
           [1., 0.],
           [1., 0.],
           [0., 1.]])

I hope this clarifies why scikit-learn's LabelBinarizer does not convert two-class data into a two-column output.
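Applied to the integer labels from the question, the same idea might look like the sketch below. It assumes a scikit-learn version where OneHotEncoder accepts the `categories` parameter; the names `labels` and `one_hot` are illustrative. Passing the categories explicitly is one way to pin the number and order of columns, even if a batch happens to contain only one of the two classes.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

labels = np.array([1, 0])

# categories=[[0, 1]] fixes the column layout: column j is the indicator for class j.
enc = OneHotEncoder(categories=[[0, 1]])

# OneHotEncoder expects a 2-D input (n_samples, n_features), hence the reshape.
one_hot = enc.fit_transform(labels.reshape(-1, 1)).toarray()
print(one_hot)
```

The result is a float array with two columns, which is the layout requested in the question.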







answered Mar 26 at 17:32 by ai_learning

As already mentioned in a comment, this is not an issue with the method. According to the documentation, "Binary targets transform to a column vector". You can build the array you want from the column vector result when the number of classes is 2.

A direct and simple way to do this is:

    import numpy as np
    from sklearn import preprocessing

    lb = preprocessing.LabelBinarizer()
    lb.fit(range(2))  # range(0, 2) is the same as range(2)
    a = lb.transform([1, 0])  # column vector of shape (2, 1)
    # Complement first, so that column j is the indicator for class j (the layout asked for in the question).
    result_2d = np.array([[0 if item[0] else 1, item[0]] for item in a])
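A vectorized variant that covers both the two-class and the multi-class case could be sketched as follows. `binarize_one_hot` is a hypothetical helper, not part of scikit-learn, and it assumes the default LabelBinarizer settings (dense output, neg_label=0, pos_label=1) and at least two fitted classes.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer


def binarize_one_hot(lb, labels):
    """Hypothetical helper: return one indicator column per class in lb.classes_."""
    encoded = lb.transform(labels)
    if len(lb.classes_) == 2:
        # Binary case: the single column marks classes_[1].
        # Prepend its complement so that column 0 marks classes_[0].
        encoded = np.hstack([1 - encoded, encoded])
    return encoded


lb = LabelBinarizer().fit(range(2))
print(binarize_one_hot(lb, [1, 0]))
```

With three or more classes the helper returns the LabelBinarizer output unchanged, so downstream code can rely on one column per class in every case.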






answered Mar 26 at 17:36 by Eskapp


























