PySpark: Convert RDD to List
I am having trouble converting an RDD to a list, and I could use some help seeing where I am going wrong.
Here is what I am working with. The RDD has 49995 elements and was created with this call:
testList = myData.map(extract_values)
testList.take(1)
[[SparseVector(49995, {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0, 30: 1.0, 31: 1.0, 32: 1.0, 33: 1.0, 34: 1.0, 35: 1.0, 36: 1.0, 37: 1.0, 38: 1.0, 39: 1.0, 40: 1.0, 41: 1.0, 42: 1.0, 43: 1.0, 44: 1.0, 45: 1.0, 46: 1.0, 47: 1.0, 48: 1.0, 49: 1.0, 50: 1.0, 51: 1.0, 52: 1.0, 53: 1.0, 54: 1.0, 55: 1.0, 56: 1.0, 57: 1.0, 58: 1.0, 59: 1.0, 60: 1.0, 61: 1.0, 62: 1.0, 63: 1.0, 64: 1.0, 65: 1.0, 66: 1.0, 67: 1.0, 68: 1.0, 69: 1.0, 70: 1.0, 71: 1.0, 72: 1.0, 73: 1.0, 74: 1.0, 75: 1.0, 76: 1.0, 77: 1.0, 78: 1.0, 79: 1.0, 80: 1.0, 81: 1.0, 82: 1.0, 83: 1.0, 84: 1.0, 85: 1.0, 86: 1.0, 87: 1.0, 88: 1.0, 89: 1.0, 90: 1.0, 91: 1.0, 92: 1.0, 93: 1.0, 94: 1.0})]]
The extract_values function is:
def extract_values(friendRDD):
    list = []
    list.append(friendRDD[1])
    return list
At this point, I have tried:
myList = myData.map(extract_values).collect()
but it gives an error:
ValueError: invalid literal for int() with base 10: ''
I have no idea why it raises this error.
How can I convert the testList RDD into a list at this point?
Here is the output of myData.take(1):
[(0, SparseVector(49995, {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0, 30: 1.0, 31: 1.0, 32: 1.0, 33: 1.0, 34: 1.0, 35: 1.0, 36: 1.0, 37: 1.0, 38: 1.0, 39: 1.0, 40: 1.0, 41: 1.0, 42: 1.0, 43: 1.0, 44: 1.0, 45: 1.0, 46: 1.0, 47: 1.0, 48: 1.0, 49: 1.0, 50: 1.0, 51: 1.0, 52: 1.0, 53: 1.0, 54: 1.0, 55: 1.0, 56: 1.0, 57: 1.0, 58: 1.0, 59: 1.0, 60: 1.0, 61: 1.0, 62: 1.0, 63: 1.0, 64: 1.0, 65: 1.0, 66: 1.0, 67: 1.0, 68: 1.0, 69: 1.0, 70: 1.0, 71: 1.0, 72: 1.0, 73: 1.0, 74: 1.0, 75: 1.0, 76: 1.0, 77: 1.0, 78: 1.0, 79: 1.0, 80: 1.0, 81: 1.0, 82: 1.0, 83: 1.0, 84: 1.0, 85: 1.0, 86: 1.0, 87: 1.0, 88: 1.0, 89: 1.0, 90: 1.0, 91: 1.0, 92: 1.0, 93: 1.0, 94: 1.0}))]
Thank you for your help!
list pyspark rdd
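One detail in the question may be worth illustrating: extract_values wraps each value in its own list, so collecting the mapped RDD would give a list of one-element lists rather than a flat list of vectors. The sketch below is a plain-Python stand-in, not Spark itself. It assumes records shaped like the (key, vector) pairs shown by myData.take(1); the name records, the string placeholders, and the helper extract_value are made up for illustration. In PySpark, the flat result would presumably come from myData.map(lambda kv: kv[1]).collect() or myData.values().collect(), provided every record can be evaluated.

```python
# Plain-Python stand-in for the (key, vector) records in myData.
# The strings are hypothetical placeholders for SparseVector objects.
records = [(0, "vector_0"), (1, "vector_1"), (2, "vector_2")]


def extract_values(record):
    """Same logic as in the question: wrap the second field in a list."""
    values = []
    values.append(record[1])
    return values


def extract_value(record):
    """Alternative (hypothetical helper): return the second field directly."""
    return record[1]


# Built-in map() stands in for RDD.map(); list() stands in for collect().
nested = list(map(extract_values, records))
flat = list(map(extract_value, records))

print(nested)  # [['vector_0'], ['vector_1'], ['vector_2']]
print(flat)    # ['vector_0', 'vector_1', 'vector_2']
```

This does not explain the ValueError by itself; it only shows the shape of the result once collecting works.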
It isn't necessarily a source of error, since friendRDD could be anything, but just in case, I want to check that you know Python list indices begin at zero.
– Jesse Amano
Mar 21 at 21:54
Since you're able to take an element from your RDD after mapping, you should be able to collect them all in the same way. The likely reason for this to be a problem would be if your source RDD contains some problematic rows and some non-problematic rows, and take(1) happened to choose a non-problematic row.
– Jesse Amano
Mar 21 at 22:03
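The comment above can be illustrated with a plain-Python analogy. This is not Spark itself: the built-in map() is lazy in a way loosely similar to an RDD transformation, and the sample raw_fields list is a made-up assumption about what a bad source value might look like.

```python
from itertools import islice

# Hypothetical raw fields: the first parses cleanly, a later one is empty.
raw_fields = ["0", "1", "", "3"]

# map() is lazy, like an RDD transformation: nothing is parsed yet.
parsed = map(int, raw_fields)

# Analogous to take(1): only the first element is evaluated, so it succeeds.
first = list(islice(parsed, 1))
print(first)  # [0]

# Analogous to collect(): every element is evaluated, so the bad one surfaces.
try:
    everything = list(map(int, raw_fields))
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # invalid literal for int() with base 10: ''
```

The error only appears once the empty field is actually evaluated, which is consistent with take(1) succeeding while collect() fails.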
To diagnose, check the source data (your myData RDD) for any "weird" records. Based on what the ValueError says, I'd especially keep an eye out for any vector fields that have an empty (or other non-integer) key.
– Jesse Amano
Mar 21 at 22:05
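One way to look for such records is to wrap the failing step so that it reports bad input instead of raising. The sketch below is a minimal illustration under stated assumptions: tag_failures, parse_key, and sample are hypothetical names, and int merely stands in for whatever parsing actually builds myData, which the question does not show.

```python
def tag_failures(func):
    """Wrap func so it returns ("ok", result) or ("error", (record, message))
    instead of raising, which lets bad records be gathered and inspected."""
    def wrapper(record):
        try:
            return ("ok", func(record))
        except Exception as exc:  # broad on purpose: this is a diagnostic aid
            return ("error", (record, str(exc)))
    return wrapper


# Hypothetical parsing step and sample input, standing in for however
# myData is really built from its source.
parse_key = int
sample = ["0", "1", "", "3"]

tagged = [tag_failures(parse_key)(item) for item in sample]
bad = [payload for status, payload in tagged if status == "error"]
print(bad)  # [('', "invalid literal for int() with base 10: ''")]
```

With an RDD, the same wrapper could presumably be applied as source.map(tag_failures(parse)).filter(lambda t: t[0] == "error").take(5), where source and parse are placeholders for the real input RDD and parsing function.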
@Jesse Amano I understand that Python indices begin at 0. I'll post the output of myData.take(1) above. I intended to take the [1] element of the vector.
– MitterHai
Mar 22 at 5:28
Ok, just checking ;) I still think you might have some rows in the RDD that don't have the exact same schema as the first row, since take(1) worked with that row but collect() didn't. There aren't any syntactical errors in what you've posted here, so it must be an issue with the data source.
– Jesse Amano
Mar 22 at 16:42
edited Mar 22 at 5:30
asked Mar 21 at 18:06
MitterHai