Decision Tree status column & related numerical value column

I have a dataset with two related columns: one is categorical and shows the status of a feature, and the other is numerical and shows the corresponding value. It looks like this:

[Image: Status & Value columns]

I want to run a decision tree algorithm on this data with scikit-learn. I am not sure how to handle these two columns, because conceptually I cannot figure out how to tie together these two highly correlated features. Normally we are not supposed to leave null values in the data, but here the numerical column is meant to be null by nature. If we fill it with 0, that value carries a different meaning.

So, how should I pre-process this data so that the decision tree algorithm works properly?










  • Please share what you have tried so far and what specific programming issues you face. SO is not a code design service; I kindly suggest you re-read How to Ask and What topics can I ask about here?

    – desertnaut
    Mar 21 at 18:03











  • Thanks for the insight.

    – BTurkeli
    Mar 22 at 5:04















scikit-learn numeric decision-tree categorical-data






asked Mar 21 at 16:58









BTurkeli

1 Answer
My professor provided a reasonable answer, as follows.

First, fill the null cells with 0. If you feed the data into a decision tree algorithm with these two features, there are two cases:

  • If "Status" is split on first: the tree separates the 0's and 1's into two branches. Under the 0 branch, every Amount value is already 0, so Amount will not be chosen for a further split there. Under the 1 branch, there are no rows with Status 0.

  • If "Amount" is split on first: all Status 0 rows fall into a single branch, where they are grouped together with the rows that have very small amounts.

So, if the Amount data is noisy, it might be helpful to keep the Status column. Otherwise, I would remove the Status column.
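The steps above can be sketched roughly as follows. This is only a minimal illustration, assuming pandas and scikit-learn are installed. The toy values, the `target` column, and the variable names are all made up for the example and do not come from the original data.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: Amount is null whenever Status is 0.
df = pd.DataFrame({
    "Status": [0, 0, 0, 1, 1, 1, 1, 1],
    "Amount": [np.nan, np.nan, np.nan, 5.0, 12.0, 40.0, 55.0, 80.0],
    "target": [0, 0, 0, 0, 0, 1, 1, 1],  # made-up labels for illustration
})

# Step from the answer: fill the null cells with 0.
df["Amount"] = df["Amount"].fillna(0)

X_both = df[["Status", "Amount"]]   # keep the Status column
X_amount = df[["Amount"]]           # drop the Status column
y = df["target"]

# Fit one tree per feature set so the two options can be compared.
tree_both = DecisionTreeClassifier(random_state=0).fit(X_both, y)
tree_amount = DecisionTreeClassifier(random_state=0).fit(X_amount, y)

# Print the learned splits to see which feature each tree actually uses.
print(export_text(tree_both, feature_names=["Status", "Amount"]))
print(export_text(tree_amount, feature_names=["Amount"]))
```

On real data, comparing the two feature sets with cross-validation rather than by reading the printed splits would give a better idea of whether the Status column adds anything.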






        answered Mar 22 at 12:06









        BTurkeli
