New to Pyspark - importing a CSV and creating a parquet file with array columns

Tags: python apache-spark dataframe pyspark parquet


I am new to PySpark and I've been pulling my hair out trying to accomplish something I believe is fairly simple. I am trying to do an ETL process where a CSV file is converted to a Parquet file. The CSV file has a few simple columns, but one column is a delimited array of integers that I want to expand into the Parquet file. That Parquet file is consumed by a .NET Core microservice, which uses a Parquet reader to do calculations downstream. To keep this question simple, the structure of the column is:



"geomap" 5:3:7|4:2:1|8:2:78 -> this represents an array of 3 items, it is split at the "|" and then a tuple is build of the values (5,3,7), (4,2,1), (8,2,78)



I have tried various processes and schemas and I can't get this right. Via a UDF I am creating either a list of lists or a list of tuples, but I can't get the schema correct or explode the data into the Parquet write operation; I either get nulls, an error, or other problems. Do I need to approach this differently? Relevant code is below. I am only showing the problem column for simplicity, since I have the rest working. This is my first PySpark attempt, so apologies if I'm missing something obvious:



from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, IntegerType, StructField, StructType

def convert_geo(geo):
    return [tuple(x.split(':')) for x in geo.split('|')]

compression_type = 'snappy'

schema = ArrayType(StructType([
    StructField("c1", IntegerType(), False),
    StructField("c2", IntegerType(), False),
    StructField("c3", IntegerType(), False)
]))

spark_convert_geo = udf(lambda z: convert_geo(z), schema)

source_path = '...path to csv'
destination_path = 'path for generated parquet file'

df = spark.read.option('delimiter', ',').option('header', 'true').csv(source_path) \
    .withColumn("geomap", spark_convert_geo(col('geomap')).alias("geomap"))
df.write.mode("overwrite").format('parquet').option('compression', compression_type).save(destination_path)


EDIT: Per request, adding the printSchema() output. I'm not sure what's wrong here either; I still can't get the split string values to show up or render properly. This output contains all the columns. I do see the c1, c2 and c3 struct names...



root
 |-- lrsegid: integer (nullable = true)
 |-- loadsourceid: integer (nullable = true)
 |-- agencyid: integer (nullable = true)
 |-- acres: float (nullable = true)
 |-- sourcemap: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- geomap: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- c1: integer (nullable = false)
 |    |    |-- c2: integer (nullable = false)
 |    |    |-- c3: integer (nullable = false)
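One quick way to see whether the UDF is producing structs or just nulls is to inspect a few converted rows before writing the Parquet file. This is a minimal sketch, assuming the df built in the code above:

from pyspark.sql.functions import explode

# Peek at the converted column; all-null values would point at a type/schema mismatch.
df.select("geomap").show(5, truncate=False)

# Flatten the array of structs into rows to check the individual c1/c2/c3 values.
df.select(explode("geomap").alias("g")).select("g.c1", "g.c2", "g.c3").show(5)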









  • Can you post the output of df.printSchema()?

    – sramalingam24
    Mar 22 at 3:27











  • Sure, I have edited the post with the output of printSchema(). It contains all the other columns I left out for simplicity purposes.

    – MGK
    Mar 22 at 4:10

















1 Answer
The problem is that the convert_geo function returns a list of tuples whose elements are strings rather than the ints declared in the schema, which is why the column comes back as nulls. If you modify it as follows it will work:



def convert_geo(geo):
    return [tuple([int(y) for y in x.split(':')]) for x in geo.split('|')]
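As a quick sanity check of the fix, the parser can be run in plain Python without a Spark session (the sample string is the one from the question):

# The corrected parser from the answer above.
def convert_geo(geo):
    return [tuple([int(y) for y in x.split(':')]) for x in geo.split('|')]

print(convert_geo("5:3:7|4:2:1|8:2:78"))
# [(5, 3, 7), (4, 2, 1), (8, 2, 78)]

With integer elements the tuples now match the IntegerType() fields declared in the schema, so the UDF produces proper structs instead of nulls.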





  • I could have sworn I tried making the schema all StringType() and it still did not work. Let me check again. Also, what if the third item in the tuple actually needs to be a double? How do I edit the UDF to make the 3rd item a different value type?

    – MGK
    Mar 22 at 14:42











  • It worked for me with the above tweak. You could replace the list comprehension with a for loop and some conditional logic if you want different dtypes for the struct elements.

    – ags29
    Mar 22 at 14:47











  • You sir, are correct. Marking as answered. I was playing with a bunch of different schemas and structures, and I must have never tried matching the value types with the proper schema definition. I don't do a lot in Python and I wasn't sure if a list comprehension had a way to mix the value types on creation. A for loop would probably be a little slower, I assume? I guess it depends on the implementation of the list comprehension internally. But, yes, my parquet file now has 3 structures, same length, same repetition levels, and with the proper data.

    – MGK
    Mar 22 at 14:59












  • Thanks for accepting the answer. Thinking about it, you could probably avoid the for loop by having a list of type-casting functions the same length as the tuple (then zip with the colon split list and use a list comprehension as before)

    – ags29
    Mar 22 at 15:07











  • Hmmm, perhaps. Do you happen to have an example of that? If not, no worries, I can play with the idea in a bit. Thanks again for your help. I'm a .NET developer and somewhat rusty in my Python. I'm sure I can figure it out.

    – MGK
    Mar 22 at 15:09
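For reference, a minimal sketch of the idea ags29 describes (a list of casting functions zipped with the colon-split values); the int/int/float choice here is purely illustrative, and the schema's field types have to agree with whichever casts are used:

from pyspark.sql.types import ArrayType, DoubleType, IntegerType, StructField, StructType

# One casting function per position in the ':'-separated triple (illustrative choice).
CASTS = (int, int, float)

def convert_geo(geo):
    # Zip each value with its casting function, so each position can get a different type.
    return [tuple(cast(value) for cast, value in zip(CASTS, item.split(':')))
            for item in geo.split('|')]

# The UDF schema must agree with the casts: a Python float maps to DoubleType.
schema = ArrayType(StructType([
    StructField("c1", IntegerType(), False),
    StructField("c2", IntegerType(), False),
    StructField("c3", DoubleType(), False)
]))

print(convert_geo("5:3:7|4:2:1|8:2:78"))
# [(5, 3, 7.0), (4, 2, 1.0), (8, 2, 78.0)]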










