Data processing using AWS Glue

I am trying to use PySpark on AWS Glue for data processing/data cleaning. The data is in CSV format and stored in S3. It has around 7k columns and 7k rows.
The cleaning is defined as a set of rules in another CSV. I need to loop through each rule, query the data frame based on the rule's condition, and update the data based on its action.

I loaded the data into a data frame, and the cleaning takes more than 3 hours.

How can I improve the performance? How can I parallelise the cleaning? In plain Python, I can divide the data into chunks and apply the cleaning rules to each chunk in parallel.

Please suggest whether AWS Glue is suited for this.

Regards
MaX

asked Mar 25 at 9:55

mAx

105 bronze badges

Explain your scenario with sample data and a sample rule; it will help others understand your question and give you a better response. If you are using a DataFrame in AWS Glue and partition the data, you can process it in parallel.

– Abraham
Mar 26 at 18:34
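
For reference, the chunk-and-apply pattern mentioned in the question can be sketched in plain Python as below. This is only an illustration under assumptions: the question does not show the rule format, so the `column` / `equals` / `set_to` keys are hypothetical, as are all the function names. It is not Glue or PySpark code.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Dict, List

Row = Dict[str, str]
Rule = Dict[str, str]  # hypothetical format: {"column", "equals", "set_to"}


def apply_rules(row: Row, rules: List[Rule]) -> Row:
    """Return a cleaned copy of one row; the input row is not modified."""
    cleaned = dict(row)
    for rule in rules:
        # Condition: the column currently holds the value named by the rule.
        if cleaned.get(rule["column"]) == rule["equals"]:
            # Action: overwrite that column with the replacement value.
            cleaned[rule["column"]] = rule["set_to"]
    return cleaned


def clean_chunk(chunk: List[Row], rules: List[Rule]) -> List[Row]:
    """Apply every rule to every row of one chunk."""
    return [apply_rules(row, rules) for row in chunk]


def split_into_chunks(rows: List[Row], chunk_size: int) -> List[List[Row]]:
    """Cut the row list into consecutive slices of at most chunk_size rows."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def clean_parallel(rows: List[Row], rules: List[Rule],
                   chunk_size: int, executor: Executor) -> List[Row]:
    """Clean the chunks concurrently; Executor.map keeps the input order."""
    chunks = split_into_chunks(rows, chunk_size)
    results = executor.map(clean_chunk, chunks, [rules] * len(chunks))
    return [row for chunk in results for row in chunk]


if __name__ == "__main__":
    sample_rows = [{"age": "-1", "city": "N/A"}, {"age": "30", "city": "Pune"}]
    sample_rules = [{"column": "age", "equals": "-1", "set_to": ""},
                    {"column": "city", "equals": "N/A", "set_to": ""}]
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(clean_parallel(sample_rows, sample_rules, 1, pool))
```

The executor is passed in so that a `ProcessPoolExecutor` can be swapped in for CPU-bound rules; threads are used here only to keep the sketch short. In Spark, the partitions of a DataFrame play roughly the role of the chunks.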

bigdata aws-glue
