PySpark: Create a column with randomly selected primary keys from another dataframe

I have two dataframes, A and B.

A has a primary key, key_a.

I want to create a column of foreign keys on B, where each value is a key selected at random from A.

So B will look like:

key_b | key_a
1     | 1234123
2     | 5424352

etc.

The keys can repeat, but the goal is for every row of B to be assigned a random value from A's key_a column.










  • Is A small enough to broadcast?

    – pault
    Mar 26 at 19:47











  • @pault I'm not familiar with broadcasting but A is roughly 10,000 rows.

    – nao
    Mar 26 at 19:50











  • How about selecting sample by fraction and setting key_b? Like df_A.sample(0.5).withColumn("key_b", monotonically_increasing_id())

    – Sailesh Kotha
    Mar 26 at 19:54


















apache-spark pyspark apache-spark-sql






asked Mar 26 at 19:33 by nao














