Is there a faster way to insert records into a PostgreSQL database while iterating over a very large ndarray?


I am trying to loop over an ndarray and record each index and value in a PostgreSQL table. Here is my code:

    for idx, val in enumerate(data):
        cur.execute("INSERT INTO public.spams(review_id, label, confidence_level, aoc, created_at) VALUES (%s, %s, %s, %s, %s)",
                    (idx + 1, spamlabel, 0, 0, dt.now()))

The ndarray has about 762k elements, and it took more than 8 hours to insert them all. Is there a more efficient way to do this?
Tags: python-3.x, postgresql, iteration, numpy-ndarray
asked Mar 24 at 11:02 by Mert Koç

  • That doesn't have anything to do with numpy, only with the strategy you use with the database library. Which library do you use here? Almost any up-to-date library should support batched INSERTs, which are the way to go here.
    – Ancoron, Mar 24 at 11:27

  • I am using psycopg2 for PostgreSQL. How can I do batched INSERTs with it for my ndarray?
    – Mert Koç, Mar 24 at 11:39
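
Editorial note, not from the thread: psycopg2 ships two batching helpers, extras.execute_batch and extras.execute_values. The accepted answer below uses execute_values; for comparison, a minimal execute_batch version of the same insert (reusing cur, data and spamlabel from the question, with a page_size chosen purely for illustration) might look like this:

    from psycopg2 import extras

    # Same statement executed with many parameter sets: roughly one round trip
    # per page_size rows instead of one per row. page_size=1000 is an assumption.
    extras.execute_batch(
        cur,
        "INSERT INTO public.spams(review_id, label, confidence_level, aoc, created_at) "
        "VALUES (%s, %s, %s, %s, CURRENT_TIMESTAMP)",
        ((idx + 1, spamlabel, 0, 0) for idx, val in enumerate(data)),
        page_size=1000)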
1 Answer
Use psycopg2's execute_values helper method, and push the constant column values into the template to limit the amount of data that has to be transferred, e.g.:

    from psycopg2 import extras

    extras.execute_values(
        cur,
        "INSERT INTO public.spams(review_id, label, confidence_level, aoc, created_at) VALUES %s",
        enumerate(data),                                   # rows of (index, value)
        template="(%s + 1, %s, 0, 0, CURRENT_TIMESTAMP)")  # constants stay in the template

You can also experiment with the page_size parameter for further throughput tuning.

– Ancoron, answered Mar 24 at 12:13, edited Mar 25 at 5:43
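
For readers who want the surrounding boilerplate, here is a minimal end-to-end sketch (editorial, not part of the answer): the INSERT statement and VALUES template are the ones above, while the connection DSN, the stand-in data array and the page_size value are illustrative assumptions.

    # Minimal sketch around the answer's call. The DSN, the stand-in `data` array
    # and page_size=1000 are assumptions for illustration only.
    import numpy as np
    import psycopg2
    from psycopg2 import extras

    data = np.random.rand(762_000)                      # stand-in for the real ndarray
    conn = psycopg2.connect("dbname=mydb user=myuser")  # hypothetical DSN

    with conn, conn.cursor() as cur:                    # commits on clean exit
        extras.execute_values(
            cur,
            "INSERT INTO public.spams(review_id, label, confidence_level, aoc, created_at) VALUES %s",
            enumerate(data),                            # rows of (index, value)
            template="(%s + 1, %s, 0, 0, CURRENT_TIMESTAMP)",
            page_size=1000)                             # rows per generated multi-row INSERT

Batching like this replaces one round trip per element with one multi-row INSERT per page, which is typically what turns hours of row-by-row inserts into minutes. If the array holds scalar types psycopg2 cannot adapt directly (numpy integer scalars, for example), convert them to plain Python types in a generator expression before passing them in.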
  • I have tried what you suggest, but extras.execute_values accepts only one %s placeholder in the query. How can I pass %s + 1 as review_id and %s as the value from data?
    – Mert Koç, Mar 24 at 23:01

  • Sorry, I was using execute_batch first. Updated for the VALUES template.
    – Ancoron, Mar 25 at 5:44

  • Thanks for the answer; I think this will work and will try it soon, but I have another problem. If the server closes the connection or any error occurs during execution, the inserts won't be committed and the data will be lost. Do you have any suggestions on how to commit the records part by part (thousands at a time)?
    – Mert Koç, Mar 25 at 15:09

  • Sorry for the late reply. In that case, you have to split up the data into chunks beforehand and iterate over them. You'd also have to make sure that your Python app can "remember" where it left off (which chunk has already been committed).
    – Ancoron, Mar 28 at 21:59
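
Following up on the chunked-commit approach described in the last comment, here is a hedged sketch of what that could look like. The chunk size, the checkpoint file and the helper names are illustrative assumptions, not part of the original thread; the INSERT statement and template are reused from the answer above.

    # Sketch of chunked, resumable inserts. Chunk size, checkpoint file name and
    # the DSN are assumptions; adjust them to the real setup.
    import os
    import numpy as np
    import psycopg2
    from psycopg2 import extras

    CHUNK = 10_000
    CHECKPOINT = "insert_checkpoint.txt"   # remembers the next chunk offset to insert

    def load_checkpoint() -> int:
        if not os.path.exists(CHECKPOINT):
            return 0
        with open(CHECKPOINT) as f:
            return int(f.read())

    def save_checkpoint(offset: int) -> None:
        with open(CHECKPOINT, "w") as f:
            f.write(str(offset))

    data = np.random.rand(762_000)                      # stand-in for the real ndarray
    conn = psycopg2.connect("dbname=mydb user=myuser")  # hypothetical DSN

    for offset in range(load_checkpoint(), len(data), CHUNK):
        chunk = data[offset:offset + CHUNK]
        with conn.cursor() as cur:
            extras.execute_values(
                cur,
                "INSERT INTO public.spams(review_id, label, confidence_level, aoc, created_at) VALUES %s",
                ((offset + i, float(v)) for i, v in enumerate(chunk)),
                template="(%s + 1, %s, 0, 0, CURRENT_TIMESTAMP)")
        conn.commit()                       # each chunk is durable on its own
        save_checkpoint(offset + CHUNK)     # so a restart skips already-committed chunks

If the process dies between commit() and save_checkpoint(), the same chunk would be inserted again on restart, so in practice you would also want a unique constraint on review_id (or an ON CONFLICT DO NOTHING clause) to make such a replay harmless.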