What columns should I use in order to train a Random Forest?
















Background



I am new to machine learning and want to train a model with the Random Forest algorithm. My dataset has 9 columns in total: 8 independent variables (predictors) and a 9th, dependent variable, 'Class'. The dependent (target) variable has 3 categories: S, N and R. All independent variables except 2 have more than 53 categories, and the code fails whenever a categorical predictor has more than 53 categories. The goal is to classify each row of the dataset as Suspicious (S), Normal (N) or Robot (R). Columns 4 and 7 contain more than 19,000 categories (levels). These are the most important columns because they contain the attack entries/features, and deriving other variables from them is complicated.
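To check which predictors actually exceed the 53-category limit described above, one can count the levels of every factor column. This is a minimal sketch, assuming categorical columns are read as factors; the data frame `toy` and its columns `ip`, `method` and `bytes` are hypothetical stand-ins for `data1.csv`:

```r
# Hypothetical toy data standing in for data1.csv
set.seed(1)
toy <- data.frame(
  ip     = factor(sprintf("10.0.0.%d", sample(1:200, 500, replace = TRUE))),
  method = factor(sample(c("GET", "POST"), 500, replace = TRUE)),
  bytes  = runif(500),
  Class  = factor(sample(c("S", "N", "R"), 500, replace = TRUE))
)

# Count levels of each factor predictor (the response Class is excluded,
# numeric columns such as bytes are skipped)
predictors <- setdiff(names(toy), "Class")
is_fac <- vapply(toy[predictors], is.factor, logical(1))
n_levels <- vapply(toy[predictors][is_fac], nlevels, integer(1))
print(n_levels)

# Predictors that randomForest cannot use as-is
too_many <- names(n_levels)[n_levels > 53]
print(too_many)
```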



Code



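library('ROCR')
library('randomForest')
library('caret')
library('ranger')

# stringsAsFactors = TRUE turns text columns, including Class, into factors,
# so randomForest fits a classification forest (default is FALSE since R 4.0.0)
database <- read.csv('data1.csv', stringsAsFactors = TRUE)
set.seed(1000)
# Random 70/30 train/test split (217,239 of 310,341 rows go to training)
train_idx <- sample(seq_len(nrow(database)), round(0.7 * nrow(database)))
traindata <- database[train_idx, ]
testdata <- database[-train_idx, ]
# fit <- train(Class ~ ., data = traindata, method = "ranger")
# proximity = TRUE is left out: it builds an n x n matrix
# (about 217k x 217k doubles here, several hundred GB)
fit <- randomForest(Class ~ ., data = traindata, ntree = 500,
                    importance = TRUE, na.action = na.roughfix)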
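One common workaround, shown only as a sketch and not as the definitive fix for this dataset, is to keep the most frequent levels of a high-cardinality column and lump the rest into a single "Other" level, so the predictor stays within the 53-category limit. The helpers `top_levels()` and `lump_levels()` are hypothetical, written in base R; the kept levels are learned on the training data and the same mapping is applied to the test data:

```r
# Hypothetical helper: names of the (max_levels - 1) most frequent levels
top_levels <- function(x, max_levels = 53) {
  counts <- sort(table(x), decreasing = TRUE)
  names(counts)[seq_len(min(length(counts), max_levels - 1))]
}

# Hypothetical helper: keep the given levels, map everything else to "Other"
lump_levels <- function(x, keep) {
  x <- as.character(x)
  factor(ifelse(x %in% keep, x, "Other"), levels = c(keep, "Other"))
}

# Toy example: a factor with about 100 levels
set.seed(42)
train_col <- factor(sample(sprintf("u%03d", 1:100), 1000, replace = TRUE))
test_col  <- factor(sample(sprintf("u%03d", 1:120), 300, replace = TRUE))

keep <- top_levels(train_col)                # learned on training data only
train_lumped <- lump_levels(train_col, keep)
test_lumped  <- lump_levels(test_col, keep)  # unseen levels become "Other"

nlevels(train_lumped)  # 52 kept levels + "Other" = 53
```

Lumping discards the identity of rare levels, which may matter if rare values are exactly the attack entries; that trade-off would need checking on the real data.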


Efforts taken



I have tried the code above, but because of the 4th and 7th columns it fails with an error saying that categorical predictors with more than 53 categories cannot be handled.
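Since the limit applies only to categorical predictors, another option is to replace a high-cardinality column by a numeric feature, for example how often each level occurs in the training data (frequency encoding). This is a sketch under assumptions: `freq_encode()` is a hypothetical base-R helper, and levels never seen in training are encoded as 0:

```r
# Hypothetical helper: map each value of new_x to its count in train_x
freq_encode <- function(train_x, new_x = train_x) {
  counts <- table(as.character(train_x))
  out <- as.numeric(counts)[match(as.character(new_x), names(counts))]
  out[is.na(out)] <- 0  # levels unseen in training get a count of 0
  out
}

train_ip <- factor(c("a", "b", "a", "c", "a", "b"))
test_ip  <- factor(c("a", "c", "z"))

freq_encode(train_ip)           # 3 2 3 1 3 2
freq_encode(train_ip, test_ip)  # 3 1 0
```

The encoded column is numeric, so randomForest treats it as an ordinary continuous predictor; whether counts carry enough signal for the S/N/R classes is an empirical question.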



Any help resolving this issue would be appreciated.















    Possible duplicate of R - Random Forest and more than 53 categories

    – divibisan
    Apr 2 at 17:49

















r random-forest














edited Mar 26 at 18:22







DSD

















asked Mar 25 at 11:06









DSD

12 bronze badges



