TargetEncoder (`from category_encoders`) in scikit-learn pipeline is causing `GridSearchCV` index error


I'm using target encoding on some features in my dataset. My full pipeline is as follows:



from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

from category_encoders.target_encoder import TargetEncoder

# Numeric columns are standardised.
numeric_features = ['feature_1']
numeric_pipeline = Pipeline(steps=[('scaler', StandardScaler())])

# These columns are one-hot encoded.
ohe_features = ['feature_2', 'feature_3', 'feature_4']
ohe_pipeline = Pipeline(steps=[('ohe', OneHotEncoder())])

# These columns are target encoded.
te_features = ['feature_5', 'feature_6']
te_pipeline = TargetEncoder()

preprocessor = ColumnTransformer(transformers=[
    ('numeric', numeric_pipeline, numeric_features),
    ('ohe_features', ohe_pipeline, ohe_features),
    ('te_features', te_pipeline, te_features),
])

clf_lr = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression()),
])

# df_testing is the full DataFrame, including the 'target' column.
X_train, X_test, y_train, y_test = train_test_split(
    df_testing.drop(columns='target'),
    df_testing['target'],
    stratify=df_testing['target'],
)

params = {'classifier__C': [0.001, 0.01, 0.05, 0.1, 1]}

gs = GridSearchCV(clf_lr, params, cv=3)
gs.fit(X_train, y_train)


The call to GridSearchCV's fit method fails because of the TargetEncoder step in the pipeline. Specifically, it raises:



IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match


Even when I call reset_index(drop=True) on both X_train and y_train, I get this error.
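For reference, pandas raises this error when a boolean Series used as a mask cannot be aligned to the index of the object being filtered, that is, when some labels of the object are absent from the mask. The snippet below reproduces that in isolation. The toy Series are made up for illustration, and this is only a sketch of the general pandas behaviour, not a diagnosis of what TargetEncoder does internally.

```python
import pandas as pd

values = pd.Series([10, 20, 30], index=[0, 1, 2])

# A mask with the same labels in a different order is aligned by label.
reordered_mask = pd.Series([True, False, True], index=[2, 1, 0])
aligned = values[reordered_mask]

# A mask whose labels (5, 6, 7) do not exist in `values` cannot be aligned.
foreign_mask = pd.Series([True, False, True], index=[5, 6, 7])

try:
    values[foreign_mask]
    error_name = None
except Exception as exc:
    # The exception class has moved between pandas modules across versions,
    # so only its name is recorded here.
    error_name = type(exc).__name__

print(list(aligned), error_name)
```

Data that has been split by position, as cross-validation folds are, keeps its original non-contiguous labels, so any mask built with a fresh 0..n-1 index could run into this kind of mismatch.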



If I just call:



clf_lr.fit(X_train.reset_index(drop=True), y_train.reset_index(drop=True))
clf_lr.score(X_test.reset_index(drop=True), y_test.reset_index(drop=True))  # both calls to reset_index are required, otherwise the same IndexingError is thrown


the code works. However, I need the cross validation to find the best parameter C for LogisticRegression. The same would apply for cross validation on any other model I wish to try.
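A possible stopgap, sketched under the assumption that the reset_index workaround above also holds inside each fold, is to run the cross-validation loop by hand and reset the index of every fold before fitting and scoring. The helper name `manual_grid_search` is hypothetical, and it has not been checked against category_encoders. It uses `StratifiedKFold` without shuffling, which is what `GridSearchCV` uses for a classifier when `cv` is an integer.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.model_selection import ParameterGrid, StratifiedKFold


def manual_grid_search(estimator, X, y, param_grid, n_splits=3):
    """Hypothetical helper: mean CV score for each parameter setting,
    with a fresh 0..n-1 index on every fold before fit and score."""
    cv = StratifiedKFold(n_splits=n_splits)
    results = []
    for params in ParameterGrid(param_grid):
        scores = []
        for train_idx, valid_idx in cv.split(X, y):
            # Select rows by position, then drop the original labels.
            X_tr = X.iloc[train_idx].reset_index(drop=True)
            y_tr = y.iloc[train_idx].reset_index(drop=True)
            X_va = X.iloc[valid_idx].reset_index(drop=True)
            y_va = y.iloc[valid_idx].reset_index(drop=True)

            # Fit an unfitted copy so folds do not share state.
            model = clone(estimator).set_params(**params)
            model.fit(X_tr, y_tr)
            scores.append(model.score(X_va, y_va))
        results.append((params, float(np.mean(scores))))
    return results
```

With the pipeline from the question this would be called as `manual_grid_search(clf_lr, X_train, y_train, params)`, and the best setting is the entry with the highest mean score, for example `max(results, key=lambda r: r[1])`.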



Could anyone please let me know if this is a known issue with TargetEncoder or if I've implemented or fitted my pipeline incorrectly?










      python-3.x pandas scikit-learn






edited Mar 25 at 7:58 by Gad

asked Mar 25 at 5:14 by Duke Kong





















