PySpark: Convert RDD to List


I am having trouble converting an RDD to a list, and I could use some help seeing where I am going wrong.

Here is what I am working with:

This RDD has 49995 elements, and was created using this function:

testList = myData.map(extract_values)

testList.take(1)

[[SparseVector(49995, {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0, 30: 1.0, 31: 1.0, 32: 1.0, 33: 1.0, 34: 1.0, 35: 1.0, 36: 1.0, 37: 1.0, 38: 1.0, 39: 1.0, 40: 1.0, 41: 1.0, 42: 1.0, 43: 1.0, 44: 1.0, 45: 1.0, 46: 1.0, 47: 1.0, 48: 1.0, 49: 1.0, 50: 1.0, 51: 1.0, 52: 1.0, 53: 1.0, 54: 1.0, 55: 1.0, 56: 1.0, 57: 1.0, 58: 1.0, 59: 1.0, 60: 1.0, 61: 1.0, 62: 1.0, 63: 1.0, 64: 1.0, 65: 1.0, 66: 1.0, 67: 1.0, 68: 1.0, 69: 1.0, 70: 1.0, 71: 1.0, 72: 1.0, 73: 1.0, 74: 1.0, 75: 1.0, 76: 1.0, 77: 1.0, 78: 1.0, 79: 1.0, 80: 1.0, 81: 1.0, 82: 1.0, 83: 1.0, 84: 1.0, 85: 1.0, 86: 1.0, 87: 1.0, 88: 1.0, 89: 1.0, 90: 1.0, 91: 1.0, 92: 1.0, 93: 1.0, 94: 1.0})]]

The extract_values function is:

def extract_values(friendRDD):
    list = []
    list.append(friendRDD[1])
    return list

At this point, I have tried:

myList = myData.map(extract_values).collect()

but it gives an error:

ValueError: invalid literal for int() with base 10: ''

I have no idea why it raises this error.

How can I convert the testList RDD into a list at this point?

Here is the output of myData.take(1):

[(0, SparseVector(49995, {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0, 30: 1.0, 31: 1.0, 32: 1.0, 33: 1.0, 34: 1.0, 35: 1.0, 36: 1.0, 37: 1.0, 38: 1.0, 39: 1.0, 40: 1.0, 41: 1.0, 42: 1.0, 43: 1.0, 44: 1.0, 45: 1.0, 46: 1.0, 47: 1.0, 48: 1.0, 49: 1.0, 50: 1.0, 51: 1.0, 52: 1.0, 53: 1.0, 54: 1.0, 55: 1.0, 56: 1.0, 57: 1.0, 58: 1.0, 59: 1.0, 60: 1.0, 61: 1.0, 62: 1.0, 63: 1.0, 64: 1.0, 65: 1.0, 66: 1.0, 67: 1.0, 68: 1.0, 69: 1.0, 70: 1.0, 71: 1.0, 72: 1.0, 73: 1.0, 74: 1.0, 75: 1.0, 76: 1.0, 77: 1.0, 78: 1.0, 79: 1.0, 80: 1.0, 81: 1.0, 82: 1.0, 83: 1.0, 84: 1.0, 85: 1.0, 86: 1.0, 87: 1.0, 88: 1.0, 89: 1.0, 90: 1.0, 91: 1.0, 92: 1.0, 93: 1.0, 94: 1.0}))]

Thank you for your help!

asked Mar 21 at 18:06 by MitterHai, edited Mar 22 at 5:30

It isn't necessarily a source of error since friendRDD could be anything, but just in case, I want to check to make sure you know Python list indices begin at zero.

– Jesse Amano
Mar 21 at 21:54

Since you're able to take an element from your RDD after mapping, you should be able to collect them all in the same way. The likely reason for this to be a problem would be if your source RDD contains some problematic rows and some non-problematic rows, and take(1) happened to choose a non-problematic row.

– Jesse Amano
Mar 21 at 22:03
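The difference between take(1) and collect() described in this comment can be illustrated without Spark. The following is only a plain-Python analogy, under the assumption that one malformed record sits somewhere after the first row: `raw_rows` and `parse_key` are hypothetical stand-ins, a lazy `map` plays the role of the RDD transformation, reading one item plays the role of take(1), and consuming everything plays the role of collect().

```python
from itertools import islice

def parse_key(raw):
    # Hypothetical parsing step; int('') raises the same ValueError as in the question.
    return int(raw)

# Hypothetical input: the third record is malformed (empty string).
raw_rows = ["0", "1", "", "3"]

# Like an RDD transformation, map() is lazy: nothing is parsed yet.
lazy = map(parse_key, raw_rows)

# Analogous to take(1): only the first record is ever evaluated, so no error.
first = list(islice(lazy, 1))

# Analogous to collect(): every record is evaluated, so the bad one surfaces.
try:
    everything = list(map(parse_key, raw_rows))
    error = None
except ValueError as exc:
    everything = None
    error = str(exc)

print(first)
print(error)
```

If the real data behaves like this sketch, a successful take(1) says nothing about the rows that collect() would also have to evaluate.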

To diagnose, check the source data (your myData RDD) for any "weird" records. Based on what the ValueError says, I'd especially keep an eye out for any vector fields that have an empty (or other non-integer) key.

– Jesse Amano
Mar 21 at 22:05
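One possible way to hunt for such "weird" records is to wrap the parsing step in a try/except and return the offending input instead of failing. The question does not show how myData was built, so everything in this sketch is an assumption: `parse_line`, the `"<row id>,<index> <index> ..."` line format, and the `sample` data are all hypothetical.

```python
def parse_line(line):
    # Hypothetical format: "<row id>,<index> <index> ..."
    row_id, _, indices = line.partition(",")
    return int(row_id), [int(tok) for tok in indices.split(" ")]

def explain_failure(line):
    """Return [] if the line parses, else [(line, error message)].

    Returning a list makes the function usable with flatMap-style APIs:
    good rows contribute nothing, bad rows contribute one diagnostic entry.
    """
    try:
        parse_line(line)
        return []
    except ValueError as exc:
        return [(line, str(exc))]

# Hypothetical sample: the second line has no indices, the third has a double space.
sample = ["0,1 2 3", "1,", "2,4  5", "3,7"]

bad = [hit for line in sample for hit in explain_failure(line)]
for line, message in bad:
    print(repr(line), "->", message)
```

With Spark, and assuming the raw input lines live in an RDD (here called rawLines, a made-up name), something like rawLines.flatMap(explain_failure).take(10) would list the first few offenders without aborting the job.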

@Jesse Amano I understand that Python indices begin at 0. I'll post the myData.take(1) output above. I intended to take the [1] element of the vector.

– MitterHai
Mar 22 at 5:28
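For reference, here is a small Spark-free sketch of what indexing with [1] does to rows shaped like the myData.take(1) output. A plain dict stands in for the SparseVector and the `rows` data is made up; it only shows why the mapped result is nested one level deeper than the vector itself.

```python
def extract_values(row):
    # Same behaviour as the function in the question: wrap row[1] in a list.
    return [row[1]]

# Stand-in rows of the form (id, vector); a dict plays the role of the SparseVector.
rows = [(0, {1: 1.0, 2: 1.0}), (1, {3: 1.0})]

# Mirrors myData.map(extract_values): each element is a one-item list.
nested = [extract_values(r) for r in rows]

# Mirrors mapping with row[1] directly: each element is the vector itself.
flat = [r[1] for r in rows]

print(nested)
print(flat)
```

Either shape can be collected into a Python list; the extra list wrapper changes the nesting of the result, not whether collect() succeeds.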

Ok, just checking ;) I still think you might have some rows in the RDD that don't have the exact same schema as the first row, since take(1) worked with that row but collect() didn't. There aren't any syntactical errors in what you've posted here, so it must be an issue with the data source.

– Jesse Amano
Mar 22 at 16:42

Tags: list, pyspark, rdd
