Decision Tree status column & related numerical value column
















I have a dataset with two columns: one is categorical and shows the status of a feature, and the other is numerical and shows the related value, as below:



[Image: Status & Value columns]



I want to run a decision tree algorithm on this data with scikit-learn. I am not sure how to handle these two columns, because I cannot work out conceptually how to tie these two highly correlated features together. Normally we are not supposed to leave null values in the data, but in this case the numerical column is supposed to be null by nature. Filling it with 0 would give it a different meaning.



So, how should I pre-process this data to have the decision tree algorithm work properly?
































  • Please share what you have tried so far, and what specific programming issues you face; SO is not a code design service, I kindly suggest you re-read How to Ask and What topics can I ask about here?.

    – desertnaut
    Mar 21 at 18:03











  • Thanks for the insight.

    – BTurkeli
    Mar 22 at 5:04















scikit-learn numeric decision-tree categorical-data
















asked Mar 21 at 16:58 by BTurkeli
























1 Answer














My professor provided a reasonable answer, given below.



First, fill the null cells with 0.
If you then feed the data to a decision tree algorithm with these two features, there are two cases:



  • If "Status" is split on first:
    The tree separates the 0s and 1s into two branches. Under 0, all Amount values are already 0, so Amount will not be chosen in that branch. Under 1, no rows have Status 0.


  • If "Amount" is split on first: All Status 0 rows fall into a single branch, grouped together with the rows that have very small amounts.


So, if the Amount data is noisy, it might be helpful to keep the Status column. Otherwise, I would remove the Status column.
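
The approach above could be sketched roughly as follows. This is only a minimal illustration, assuming pandas and scikit-learn are available. The toy rows and the `target` label column are hypothetical and not taken from the original data, and `Amount` stands in for the numerical column that is null whenever `Status` is 0.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: Amount is null by nature whenever Status is 0.
df = pd.DataFrame({
    "Status": [0, 0, 0, 1, 1, 1, 1, 0],
    "Amount": [np.nan, np.nan, np.nan, 5.0, 120.0, 80.0, 3.0, np.nan],
    "target": [0, 0, 0, 0, 1, 1, 0, 0],  # hypothetical label to predict
})

# Fill the structurally missing amounts with 0, keeping Status as a feature
# so the tree can still tell "no amount" apart from "very small amount".
X = df[["Status", "Amount"]].fillna({"Amount": 0})
y = df["target"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Inspect which of the two features the fitted tree actually used.
importances = dict(zip(X.columns, tree.feature_importances_))
print(importances)
```

Comparing `feature_importances_` (or a cross-validated score) with and without the `Status` column is one way to check, on the real data, whether keeping it helps.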
















answered Mar 22 at 12:06 by BTurkeli




























