TargetEncoder (`from category_encoders`) in scikit-learn pipeline is causing `GridSearchCV` index error
I'm using target encoding on some features in my dataset. My full pipeline is as follows:
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from category_encoders.target_encoder import TargetEncoder
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
numeric_features = ['feature_1']
numeric_pipeline = Pipeline(steps=[('scaler', StandardScaler())])
ohe_features = ['feature_2', 'feature_3', 'feature_4']
ohe_pipeline = Pipeline(steps=[('ohe', OneHotEncoder())])
te_features = ['feature_5', 'feature_6']
te_pipeline = TargetEncoder()
preprocessor = ColumnTransformer(transformers=[
    ('numeric', numeric_pipeline, numeric_features),
    ('ohe_features', ohe_pipeline, ohe_features),
    ('te_features', te_pipeline, te_features)
    ]
)
clf_lr = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression())
    ]
)
X_train, X_test, y_train, y_test = train_test_split(df_testing.drop(columns='target'),
                                                    df_testing['target'],
                                                    stratify=df_testing['target'])
params = {'classifier__C': [0.001, 0.01, 0.05, 0.1, 1]}
gs = GridSearchCV(clf_lr, params, cv=3)
gs.fit(X_train, y_train)
The problem is that the call to the fit method in GridSearchCV is failing because of the TargetEncoder step in the pipeline. Specifically, it is throwing
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match
Even when I call reset_index(drop=True) on both X_train and y_train, I get this error.
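A possible reason the up-front reset_index does not help: as far as I understand, GridSearchCV selects each fold's rows by position, so the per-fold subsets keep whatever labels those rows already had. The sketch below only illustrates that index behaviour on toy data (the frame, the column values and the use of a plain KFold are assumptions, not the original setup); it does not confirm the root cause inside TargetEncoder.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Toy stand-ins for X_train / y_train (assumed data, not the original dataset).
X = pd.DataFrame({"feature_5": list("abcabc")})
y = pd.Series([0, 1, 0, 1, 0, 1], name="target")

# Both objects start with a clean 0..n-1 index, as after reset_index(drop=True).
assert list(X.index) == [0, 1, 2, 3, 4, 5]

fold_indexes = []
for train_pos, _ in KFold(n_splits=3).split(X):
    # Positional selection keeps the original labels of the selected rows,
    # so a training fold's index is generally not 0..k-1 any more.
    X_fold = X.iloc[train_pos]
    y_fold = y.iloc[train_pos]
    fold_indexes.append(list(X_fold.index))
    print(list(X_fold.index), list(y_fold.index))
```

If a transformer internally builds a fresh 0..k-1 index for either X or y, it would then be out of step with labels like these, which is consistent with the error appearing only under cross-validation.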
If I just call:
clf_lr.fit(X_train.reset_index(drop=True), y_train.reset_index(drop=True))
clf_lr.score(X_test.reset_index(drop=True), y_test.reset_index(drop=True)) # both calls to reset_index required otherwise the same IndexingError is thrown
the code works. However, I need the cross validation to find the best parameter C for LogisticRegression. The same would apply for cross validation on any other model I wish to try.
Could anyone please let me know if this is a known issue with TargetEncoder or if I've implemented or fitted my pipeline incorrectly?
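Since resetting the index makes a plain fit work, one idea would be to do the reset inside the pipeline, so that it also happens on every cross-validation fold. The following is only a sketch under assumptions: ResetIndexWrapper is a hypothetical helper (not part of scikit-learn or category_encoders), and I have not verified that it removes the IndexingError for any particular category_encoders version.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone


class ResetIndexWrapper(BaseEstimator, TransformerMixin):
    """Hypothetical helper: reset the pandas index of X and y, then delegate."""

    def __init__(self, transformer):
        self.transformer = transformer

    @staticmethod
    def _reset(obj):
        # Only pandas objects carry an index; anything else passes through unchanged.
        if isinstance(obj, (pd.DataFrame, pd.Series)):
            return obj.reset_index(drop=True)
        return obj

    def fit(self, X, y=None):
        # Fit a clone so the wrapped estimator passed to __init__ stays untouched.
        self.transformer_ = clone(self.transformer)
        self.transformer_.fit(self._reset(X), self._reset(y))
        return self

    def transform(self, X):
        return self.transformer_.transform(self._reset(X))
```

Under that assumption, the target-encoding step would become te_pipeline = ResetIndexWrapper(TargetEncoder()), with the rest of the ColumnTransformer and the GridSearchCV call left as they are.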
python-3.x pandas scikit-learn
asked Mar 25 at 5:14 by Duke Kong
edited Mar 25 at 7:58 by Gad