Handling categorical variables in sklearn with one-hot encoding


Is there an existing Python class for categorical encoding in sklearn that ticks all of the following boxes?

pandas friendly: has an option to return a DataFrame

can drop one column per feature in the one-hot encoding

handles unseen categories in the test data

is compatible with the sklearn Pipeline object

edited Mar 22 at 5:46

asked Mar 22 at 3:58

solver149


Such a thing does not exist natively in pandas or sklearn. However, with a little coding, you can wrap OneHotEncoder to do what you want.

– gmds
Mar 22 at 5:02

Yes, I couldn't find anything along these lines.

– solver149
Mar 22 at 5:04

python pandas dataframe machine-learning scikit-learn


1 Answer

I think you're looking for pandas.get_dummies

See the following example.

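import pandas as pd

df = pd.DataFrame({'col_a': ['cat','dog','cat','mouse','mouse','cat'], 'col_b': [10,14,16,18,20,22], 'col_c': ['a','a','a','b','b','a']})

# drop_first=True drops the first level of each encoded column, so no dummy is redundant;
# dtype=int keeps the 0/1 output shown below (newer pandas versions default to booleans)
df = pd.get_dummies(df, columns=['col_a','col_c'], drop_first=True, dtype=int)
print(df)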

Output:

   col_b  col_a_dog  col_a_mouse  col_c_b
0     10          0            0        0
1     14          1            0        0
2     16          0            0        0
3     18          0            1        1
4     20          0            1        1
5     22          0            0        0

This covers the first two requirements you listed.

For the third requirement, you can do the following.

Create the dummies on the training data:
dummy_train = pd.get_dummies(train)

Create the dummies on the new (unseen) data:
dummy_new = pd.get_dummies(new_data)

Re-index the new data to the columns of the training data, filling missing columns with 0 (reindex returns a new DataFrame, so assign the result):
dummy_new = dummy_new.reindex(columns=dummy_train.columns, fill_value=0)

Effectively, any category that appears only in the new data is dropped and never reaches the classifier. I think that should not cause problems, since the classifier would not know what to do with it anyway.
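To make the re-indexing step concrete, here is a small sketch. The column names (animal, weight) and the values are made up for illustration; only get_dummies and reindex come from the steps above. The new data contains a category ("horse") that the training data never saw.

```python
import pandas as pd

# Hypothetical training and new data; "horse" never appears in training
train = pd.DataFrame({"animal": ["cat", "dog", "mouse"], "weight": [4, 20, 1]})
new_data = pd.DataFrame({"animal": ["dog", "horse"], "weight": [25, 400]})

# dtype=int gives 0/1 columns regardless of the pandas version
dummy_train = pd.get_dummies(train, columns=["animal"], dtype=int)
dummy_new = pd.get_dummies(new_data, columns=["animal"], dtype=int)

# Align the new data to the training columns:
# columns missing from the new data (animal_cat, animal_mouse) are filled with 0,
# columns unknown to the training data (animal_horse) are dropped
dummy_new = dummy_new.reindex(columns=dummy_train.columns, fill_value=0)
print(dummy_new)
```

After the reindex, dummy_new has exactly the same columns as dummy_train, and the "horse" row is all zeros in the animal_* columns.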

answered Mar 22 at 4:35

AkshayNevrekar


Sorry, I am aware of this. I am looking for something that follows sklearn conventions and can fit into pipelines.

– solver149
Mar 22 at 4:44


@AkshayNevrekar I believe OP means a sklearn.pipeline.Pipeline object.

– gmds
Mar 22 at 5:01

Yes, you are right.

– solver149
Mar 22 at 5:05

@solver149 you should add that info to your question.

– AkshayNevrekar
Mar 22 at 5:14
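For the Pipeline requirement raised in these comments, one possible direction is the one gmds hints at under the question: wrap OneHotEncoder in a small transformer. The sketch below is only an illustration, not an existing sklearn class. The class name DataFrameOneHot and its parameters (columns, drop_first) are hypothetical, and it assumes string categories and a DataFrame as input. Unseen categories are encoded as all zeros via handle_unknown="ignore", and the first level of each feature is dropped by hand after encoding.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class DataFrameOneHot(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper: one-hot encode chosen columns and return a DataFrame."""

    def __init__(self, columns, drop_first=True):
        self.columns = columns
        self.drop_first = drop_first

    def fit(self, X, y=None):
        # handle_unknown="ignore" encodes categories unseen during fit as all zeros
        self.encoder_ = OneHotEncoder(handle_unknown="ignore")
        self.encoder_.fit(X[self.columns])
        return self

    def transform(self, X):
        # The encoder returns a sparse matrix by default, so densify it
        encoded = self.encoder_.transform(X[self.columns]).toarray().astype(int)
        levels = list(zip(self.columns, self.encoder_.categories_))
        names = [f"{col}_{cat}" for col, cats in levels for cat in cats]
        dummies = pd.DataFrame(encoded, columns=names, index=X.index)
        if self.drop_first:
            # Drop the first level of every encoded feature
            dummies = dummies.drop(columns=[f"{col}_{cats[0]}" for col, cats in levels])
        # Keep the untouched columns and append the dummy columns
        return pd.concat([X.drop(columns=self.columns), dummies], axis=1)


train = pd.DataFrame({"col_a": ["cat", "dog", "mouse"], "col_b": [10, 14, 16], "col_c": ["a", "b", "a"]})
new_data = pd.DataFrame({"col_a": ["dog", "horse"], "col_b": [1, 2], "col_c": ["b", "a"]})

pipe = Pipeline([("onehot", DataFrameOneHot(columns=["col_a", "col_c"]))])
pipe.fit(train)
out = pipe.transform(new_data)
print(out)
```

One caveat of this design: with drop_first=True, an unseen category ("horse" here) ends up with the same all-zero encoding as the dropped first level ("cat"), so the two cannot be told apart downstream. A model can be appended as a further Pipeline step in the usual way.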
