Sending Pandas Dataframe with Int64 type to GCP Spanner INT64 column










I am using pandas DataFrames. I have a column from a CSV that contains integers mixed with nulls.

I am trying to convert this column and insert it into Spanner in as generalizable a way as possible (so I can reuse the same code for future jobs), which limits my ability to use sentinel values. However, a DataFrame cannot hold NaN in a plain int column, so you have to use the nullable Int64 dtype. When I try to insert this into Spanner I get an error saying the value is not an int64 type, whereas plain Python ints do work. Is there an automatic way to convert pandas Int64 values to Python int values during the insert? Converting the column before inserting doesn't work, again because of the null values. Is there another way around this?



Trying to convert from a Series goes like so:

    >>> import numpy as np
    >>> import pandas as pd
    >>> s2 = pd.Series([3.0, 5.0])
    >>> s2
    0    3.0
    1    5.0
    dtype: float64
    >>> s1 = pd.Series([3.0, None])
    >>> s1
    0    3.0
    1    NaN
    dtype: float64
    >>> df = pd.DataFrame(data=[s1, s2], dtype=np.int64)
    >>> df
       0    1
    0  3  NaN
    1  3  5.0
    >>> df = pd.DataFrame(data={"nullable": s1, "nonnullable": s2}, dtype=np.int64)

This last command produces the error ValueError: Cannot convert non-finite values (NA or inf) to integer
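To illustrate the dtype issue described above, here is a minimal sketch (the variable names cast_failed and nullable are only for this example). It assumes a pandas version that ships the nullable "Int64" extension dtype. Casting a float Series containing NaN to NumPy's int64 raises the ValueError, while the nullable dtype keeps the integers and marks the gap as missing.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([3.0, None])  # stored as float64, the None becomes NaN

# Casting to NumPy int64 fails because NaN has no integer representation.
try:
    s1.astype(np.int64)
    cast_failed = False
except ValueError:
    cast_failed = True

# The nullable extension dtype holds the integers and a missing-value marker.
nullable = s1.astype("Int64")

print(cast_failed)     # True
print(nullable.dtype)  # Int64
```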










      python pandas google-cloud-platform google-cloud-spanner














edited Mar 26 at 19:13
asked Mar 21 at 18:13 by WarSame






















          2 Answers
I was unable to reproduce your issue; everything seems to work as expected.

Is it possible you have a non-nullable column that you are writing null values to?

Retrieving the schema of a Spanner table

    from google.cloud import spanner

    client = spanner.Client()
    database = client.instance('testinstance').database('testdatabase')
    table_name = 'inttable'

    # List each column of the table with its Spanner type and nullability.
    query = f'''
    SELECT
        t.column_name,
        t.spanner_type,
        t.is_nullable
    FROM
        information_schema.columns AS t
    WHERE
        t.table_name = '{table_name}'
    '''

    with database.snapshot() as snapshot:
        print(list(snapshot.execute_sql(query)))
    # [['nonnullable', 'INT64', 'NO'], ['nullable', 'INT64', 'YES']]

Inserting into Spanner from a pandas DataFrame

    from google.cloud import spanner

    import numpy as np
    import pandas as pd

    client = spanner.Client()
    instance = client.instance('testinstance')
    database = instance.database('testdatabase')


    def insert(df):
        # Write every row of the DataFrame to the table in a single batch.
        with database.batch() as batch:
            batch.insert(
                table='inttable',
                columns=('nonnullable', 'nullable'),
                values=df.values.tolist()
            )

    print("Succeeds in inserting int rows.")
    d = {'nonnullable': [1, 2], 'nullable': [3, 4]}
    df = pd.DataFrame(data=d, dtype=np.int64)
    insert(df)

    print("Succeeds in inserting rows with None in nullable columns.")
    d = {'nonnullable': [3, 4], 'nullable': [None, 6]}
    df = pd.DataFrame(data=d, dtype=np.int64)
    insert(df)

    print("Fails (as expected) when inserting a row with None in a non-nullable column.")
    d = {'nonnullable': [5, None], 'nullable': [6, 0]}
    df = pd.DataFrame(data=d, dtype=np.int64)
    insert(df)
    # Fails with "google.api_core.exceptions.FailedPrecondition: 400 nonnullable must not be NULL in table inttable."
















answered Mar 26 at 16:48 by Christopher Wilcox












• I ran into a few problems, which I added to the main post. Could you help me understand why this happens with a Series when doing it with an array seems to work so well? Do I have to use an array?
  – WarSame, Mar 26 at 18:16































My solution was to leave the missing values as NaN until the very end. Just before inserting into the Spanner database, I replaced all NaN with None in the DataFrame, using code from another SO answer: df.replace({pd.np.nan: None}) (with recent pandas, which no longer ships the pd.np alias, use np.nan from NumPy instead). Spanner was seeing each NaN as the string 'nan' and rejecting it for insertion into an INT64 column. None is treated as NULL and can be inserted into Spanner without issue.
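A minimal sketch of that last step, written as an explicit loop rather than df.replace so that it also covers the nullable Int64 dtype. The helper name to_spanner_rows is hypothetical (it is not part of pandas or the Spanner client), and the Spanner call itself is left out so the snippet runs without a database.

```python
import numpy as np
import pandas as pd


def to_spanner_rows(df):
    """Return the rows as lists of plain Python values, with missing values as None.

    Hypothetical helper for illustration only.
    """
    rows = []
    for record in df.itertuples(index=False, name=None):
        row = []
        for value in record:
            if pd.isna(value):
                # NaN, pd.NA and None all become None (NULL on the database side).
                row.append(None)
            elif isinstance(value, np.generic):
                # NumPy scalars such as np.int64 become built-in Python types.
                row.append(value.item())
            else:
                row.append(value)
        rows.append(row)
    return rows


df = pd.DataFrame({
    "nonnullable": [3, 4],
    "nullable": pd.array([None, 6], dtype="Int64"),
})
rows = to_spanner_rows(df)
print(rows)  # [[3, None], [4, 6]]
```

The resulting list could then be passed as the values argument of batch.insert in place of df.values.tolist().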

















answered Mar 27 at 17:29 by WarSame


























