TargetEncoder (`from category_encoders`) in scikit-learn pipeline is causing `GridSearchCV` index error


I'm using target encoding on some features in my dataset. My full pipeline is as follows:

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

from category_encoders.target_encoder import TargetEncoder

# df_testing: my dataset (feature columns plus a 'target' column)

numeric_features = ['feature_1']
numeric_pipeline = Pipeline(steps=[('scaler', StandardScaler())])

ohe_features = ['feature_2', 'feature_3', 'feature_4']
ohe_pipeline = Pipeline(steps=[('ohe', OneHotEncoder())])

te_features = ['feature_5', 'feature_6']
te_pipeline = TargetEncoder()

preprocessor = ColumnTransformer(transformers=[
    ('numeric', numeric_pipeline, numeric_features),
    ('ohe_features', ohe_pipeline, ohe_features),
    ('te_features', te_pipeline, te_features),
])

clf_lr = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression()),
])

X_train, X_test, y_train, y_test = train_test_split(
    df_testing.drop(columns='target'),
    df_testing['target'],
    stratify=df_testing['target'],
)

# The parameter grid must be a dict
params = {'classifier__C': [0.001, 0.01, 0.05, 0.1, 1]}

gs = GridSearchCV(clf_lr, params, cv=3)
gs.fit(X_train, y_train)
```

The problem is that `gs.fit(X_train, y_train)` fails because of the `TargetEncoder` step in the pipeline. Specifically, it raises:

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
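For context, this message comes from pandas: it is raised when a boolean `Series` used as a mask has index labels that do not line up with the labels of the object being indexed. The sketch below reproduces the message with made-up data (`X_fold`, `y_fold` and the labels `[4, 5, 8, 9]` are illustrative assumptions); it shows what the error means, and does not claim this is the exact line inside `TargetEncoder` that fails.

```python
import pandas as pd

try:
    from pandas.errors import IndexingError  # public location in recent pandas
except ImportError:
    from pandas.core.indexing import IndexingError  # older pandas versions

# Hypothetical fold of training data whose index is NOT 0..n-1
X_fold = pd.DataFrame({"feature_5": ["a", "b", "a", "c"]}, index=[4, 5, 8, 9])
y_fold = pd.Series([1, 0, 1, 1], index=[4, 5, 8, 9], name="target")

# Mask built from a copy of y whose index was reset to 0..n-1
mask_reset = y_fold.reset_index(drop=True) == 1

raised = False
try:
    X_fold.loc[mask_reset]  # labels 0..3 do not exist in X_fold's index
except IndexingError as exc:
    raised = True
    print(type(exc).__name__, exc)

# Mask built from the aligned y: labels match, so selection works
selected = X_fold.loc[y_fold == 1]
print(selected.index.tolist())

# A plain NumPy boolean mask is positional and ignores labels entirely
selected_pos = X_fold[mask_reset.to_numpy()]
print(selected_pos.index.tolist())
```

Both successful selections keep the rows labelled 4, 8 and 9; only the label-based mask with mismatched labels fails.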

I get this error even when I call `reset_index(drop=True)` on both `X_train` and `y_train` before fitting.
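One plausible reason that resetting the index beforehand does not help: scikit-learn selects each cross-validation fold by position, and positional selection on a pandas object keeps the original index labels, so a fold's training data generally no longer has a 0..n-1 index. The sketch below uses made-up toy data and mimics `KFold(n_splits=3)` without shuffling (`np.array_split` yields the same contiguous test folds). `GridSearchCV(cv=3)` with a classifier actually uses `StratifiedKFold`, whose folds differ, but the effect on the index labels is the same. That `TargetEncoder` is tripped up by such an index is an assumption here, not something confirmed.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for X_train / y_train, already passed through reset_index(drop=True)
X = pd.DataFrame({
    "feature_5": list("aabbccddee"),
    "feature_6": list("xyxyxyxyxy"),
}).reset_index(drop=True)
y = pd.Series([0, 1, 0, 1, 1, 0, 1, 0, 0, 1], name="target").reset_index(drop=True)

positions = np.arange(len(X))
# Contiguous test folds, like KFold(n_splits=3) without shuffling
test_folds = np.array_split(positions, 3)

fold_train_indexes = []
for test_pos in test_folds:
    train_pos = np.setdiff1d(positions, test_pos)
    X_tr = X.iloc[train_pos]  # positional selection keeps the original labels
    y_tr = y.iloc[train_pos]
    fold_train_indexes.append(X_tr.index.tolist())
    # X and y still agree with each other, but the index is not always 0..n-1
    print(X_tr.index.tolist(), X_tr.index.equals(y_tr.index))
```

Here the first two folds train on rows with gaps in their index (`[4, ..., 9]` and `[0, 1, 2, 3, 7, 8, 9]`), so any step that rebuilds or resets the index on only one of `X` and `y` would produce exactly the kind of label mismatch shown by the pandas error.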

However, if I fit the pipeline directly, without `GridSearchCV`:

```python
clf_lr.fit(X_train.reset_index(drop=True), y_train.reset_index(drop=True))
# Score on the test split; without reset_index on both X and y the same IndexingError is raised
clf_lr.score(X_test.reset_index(drop=True), y_test.reset_index(drop=True))
```

the code works. However, I need cross-validation to find the best value of `C` for `LogisticRegression`, and I will need the same for any other model I want to try.

Could anyone tell me whether this is a known issue with `TargetEncoder`, or whether I have built or fitted my pipeline incorrectly?

edited Mar 25 at 7:58

Gad

2,4631834

asked Mar 25 at 5:14

Duke Kong

163

python-3.x pandas scikit-learn

edited Mar 25 at 7:58

Gad

2,4631834

asked Mar 25 at 5:14

Duke Kong

163

edited Mar 25 at 7:58

Gad

2,4631834

asked Mar 25 at 5:14

Duke Kong

163

edited Mar 25 at 7:58

Gad

2,4631834

edited Mar 25 at 7:58

Gad

2,4631834

edited Mar 25 at 7:58

Gad

2,4631834

asked Mar 25 at 5:14

Duke Kong

163

asked Mar 25 at 5:14

Duke Kong

163

asked Mar 25 at 5:14

Duke Kong

163

add a comment |

0

active

oldest

votes

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55331582%2ftargetencoder-from-category-encoders-in-scikit-learn-pipeline-is-causing-gr%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

0

active

oldest

votes

0

active

oldest

votes

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Styjun

0

Your Answer

Post as a guest

0

0

Post as a guest

Popular posts from this blog

용인 삼성생명 블루밍스 목차 통계 역대 감독 선수단 응원단 경기장 같이 보기 외부 링크 둘러보기 메뉴samsungblueminx.comeh선수 명단용인 삼성생명 블루밍스용인 삼성생명 블루밍스ehsamsungblueminx.comeheheheh

0

Your Answer

Sign up or log in

Post as a guest

Post as a guest

0

0

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

용인 삼성생명 블루밍스 목차 통계 역대 감독 선수단 응원단 경기장 같이 보기 외부 링크 둘러보기 메뉴samsungblueminx.comeh선수 명단용인 삼성생명 블루밍스용인 삼성생명 블루밍스ehsamsungblueminx.comeheheheh