Sending Pandas Dataframe with Int64 type to GCP Spanner INT64 column
I am using pandas DataFrames. I have a column from a CSV that contains integers mixed with nulls.

I am trying to convert this column and insert it into Spanner in as generalizable a way as possible (so that I can reuse the same code for future jobs), which limits my ability to use sentinel values. However, DataFrames cannot hold `NaN`s in a pure `int` column, so you have to use `Int64`. When I try to insert this into Spanner, I get an error that it is not an `int64` type, whereas pure Python `int`s do work. Is there an automatic way to convert pandas `Int64` values to `int` values during the insert? Converting the column before inserting doesn't work either, again because of the null values. Is there another path around this?

Trying to convert from a Series goes like so:

    >>> s2 = pd.Series([3.0, 5.0])
    >>> s2
    0    3.0
    1    5.0
    dtype: float64
    >>> s1 = pd.Series([3.0, None])
    >>> s1
    0    3.0
    1    NaN
    dtype: float64
    >>> df = pd.DataFrame(data=[s1, s2], dtype=np.int64)
    >>> df
       0    1
    0  3  NaN
    1  3  5.0
    >>> df = pd.DataFrame(data={"nullable": s1, "nonnullable": s2}, dtype=np.int64)

This last command produces the error `ValueError: Cannot convert non-finite values (NA or inf) to integer`.
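For reference, here is a minimal sketch of the distinction between NumPy's `int64` and pandas' nullable `Int64`, and of one way to get plain Python `int`/`None` values out of a column. It assumes pandas 0.24 or later (where the `Int64` extension dtype exists); the variable names `cast_failed`, `nullable` and `as_python` are purely illustrative.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([3.0, None])  # float64 with one NaN

# Casting to NumPy's int64 fails because NaN has no integer representation.
try:
    s1.astype(np.int64)
    cast_failed = False
except ValueError:
    cast_failed = True

# The nullable extension dtype "Int64" (capital I) keeps the missing value.
nullable = s1.astype("Int64")

# Convert to plain Python objects: int for present values, None for missing ones.
as_python = [None if pd.isna(v) else int(v) for v in nullable]
print(cast_failed, as_python)
```

The list comprehension is deliberately explicit: it does not depend on how a given pandas version represents the missing entry (`NaN` or `pd.NA`), only on `pd.isna` recognizing it.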
python pandas google-cloud-platform google-cloud-spanner
asked Mar 21 at 18:13 by WarSame, edited Mar 26 at 19:13
2 Answers
I was unable to reproduce your issue; everything seems to work as expected. Is it possible you have a non-nullable column that you are writing null values to?

Retrieving the schema of a Spanner table

    from google.cloud import spanner

    client = spanner.Client()
    database = client.instance('testinstance').database('testdatabase')
    table_name = 'inttable'

    # Look up the name, type and nullability of every column in the table.
    query = f'''
    SELECT
        t.column_name,
        t.spanner_type,
        t.is_nullable
    FROM
        information_schema.columns AS t
    WHERE
        t.table_name = '{table_name}'
    '''

    with database.snapshot() as snapshot:
        print(list(snapshot.execute_sql(query)))
    # [['nonnullable', 'INT64', 'NO'], ['nullable', 'INT64', 'YES']]

Inserting into Spanner from a pandas DataFrame

    from google.cloud import spanner
    import numpy as np
    import pandas as pd

    client = spanner.Client()
    instance = client.instance('testinstance')
    database = instance.database('testdatabase')

    def insert(df):
        # Write every row of the DataFrame to the table in a single batch.
        with database.batch() as batch:
            batch.insert(
                table='inttable',
                columns=('nonnullable', 'nullable'),
                values=df.values.tolist()
            )

    print("Succeeds in inserting int rows.")
    d = {'nonnullable': [1, 2], 'nullable': [3, 4]}
    df = pd.DataFrame(data=d, dtype=np.int64)
    insert(df)

    print("Succeeds in inserting rows with None in nullable columns.")
    d = {'nonnullable': [3, 4], 'nullable': [None, 6]}
    df = pd.DataFrame(data=d, dtype=np.int64)
    insert(df)

    print("Fails (as expected) when inserting a row with None in a non-nullable column.")
    d = {'nonnullable': [5, None], 'nullable': [6, 0]}
    df = pd.DataFrame(data=d, dtype=np.int64)
    insert(df)
    # Fails with "google.api_core.exceptions.FailedPrecondition: 400 nonnullable must not be NULL in table inttable."
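If `df.values.tolist()` ends up containing floats and `NaN` (which can happen when a column mixes integers and missing values), the rows could be cleaned in plain Python before they are passed as `values`. The sketch below is only an illustration: `to_spanner_rows` is a hypothetical helper, not part of any library, and it assumes every numeric column in the target table is `INT64`, so whole-number floats can safely be cast to `int`.

```python
import math

def to_spanner_rows(rows):
    """Return rows with NaN replaced by None and whole-number floats cast to int."""
    cleaned = []
    for row in rows:
        new_row = []
        for value in row:
            if isinstance(value, float):
                if math.isnan(value):
                    value = None        # sent to Spanner as NULL
                elif value.is_integer():
                    value = int(value)  # e.g. 3.0 -> 3
            new_row.append(value)
        cleaned.append(new_row)
    return cleaned

rows = [[3.0, float("nan")], [4.0, 6.0]]
print(to_spanner_rows(rows))  # [[3, None], [4, 6]]
```

With a helper like this, the call in `insert` would become `values=to_spanner_rows(df.values.tolist())`. For tables that also have `FLOAT64` columns, the cast to `int` would have to be restricted to the integer columns.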
answered Mar 26 at 16:48 by Christopher Wilcox
I ran into a few problems which I put into the main post. Could you help me understand why this is the case with a Series when doing it with an array seems to work so well? Do I have to use an array?
– WarSame
Mar 26 at 18:16
My solution was to leave the missing values as `NaN` (it turns out `NaN` reaches Spanner as the string `'nan'`). Then, at the very end, as I went to insert into the Spanner DB, I replaced all `NaN` with `None` in the DF. I used code from another SO answer: `df.replace({pd.np.nan: None})`. Spanner was looking at the `NaN` as a `'nan'` string and rejecting it for insertion into an INT64 column. `None` is treated as `NULL` and can be inserted into Spanner with no issue.
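As a rough sketch of that replacement step: the `pd.np` alias was deprecated in pandas 1.0 and removed in 2.0, so the version below imports NumPy directly. The DataFrame contents are made up for illustration, and no Spanner call is made here.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the "nullable" column is float64 with one missing value.
df = pd.DataFrame({"nonnullable": [3, 4], "nullable": [np.nan, 6]})

# The dict form maps NaN to None; the affected column becomes object dtype.
cleaned = df.replace({np.nan: None})

# Rows in the list-of-lists shape that batch.insert() accepts as `values`.
rows = cleaned.values.tolist()
print(rows)
```

Note that the non-missing entries of the `nullable` column are still floats after this step, so depending on the client version an explicit cast to `int` may still be needed for an INT64 column.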
answered Mar 27 at 17:29 by WarSame