PySpark: Convert RDD to List



I am having trouble converting an RDD to a list, and I could use some help seeing where I am going wrong.



Here is what I am working with:



This RDD has 49995 elements and was created by mapping a function over myData:



testList = myData.map(extract_values)

testList.take(1)

[[SparseVector(49995, {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0, 30: 1.0, 31: 1.0, 32: 1.0, 33: 1.0, 34: 1.0, 35: 1.0, 36: 1.0, 37: 1.0, 38: 1.0, 39: 1.0, 40: 1.0, 41: 1.0, 42: 1.0, 43: 1.0, 44: 1.0, 45: 1.0, 46: 1.0, 47: 1.0, 48: 1.0, 49: 1.0, 50: 1.0, 51: 1.0, 52: 1.0, 53: 1.0, 54: 1.0, 55: 1.0, 56: 1.0, 57: 1.0, 58: 1.0, 59: 1.0, 60: 1.0, 61: 1.0, 62: 1.0, 63: 1.0, 64: 1.0, 65: 1.0, 66: 1.0, 67: 1.0, 68: 1.0, 69: 1.0, 70: 1.0, 71: 1.0, 72: 1.0, 73: 1.0, 74: 1.0, 75: 1.0, 76: 1.0, 77: 1.0, 78: 1.0, 79: 1.0, 80: 1.0, 81: 1.0, 82: 1.0, 83: 1.0, 84: 1.0, 85: 1.0, 86: 1.0, 87: 1.0, 88: 1.0, 89: 1.0, 90: 1.0, 91: 1.0, 92: 1.0, 93: 1.0, 94: 1.0})]]


The extract_values function is:



def extract_values(friendRDD):
    list = []
    list.append(friendRDD[1])
    return list


At this point, I have tried:



myList = myData.map(extract_values).collect()


but it gives an error:



ValueError: invalid literal for int() with base 10: ''


and I have no idea why it raises this error.



How can I convert the testList RDD into a list at this point?



Here is the output of myData.take(1):



[(0, SparseVector(49995, {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 11: 1.0, 12: 1.0, 13: 1.0, 14: 1.0, 15: 1.0, 16: 1.0, 17: 1.0, 18: 1.0, 19: 1.0, 20: 1.0, 21: 1.0, 22: 1.0, 23: 1.0, 24: 1.0, 25: 1.0, 26: 1.0, 27: 1.0, 28: 1.0, 29: 1.0, 30: 1.0, 31: 1.0, 32: 1.0, 33: 1.0, 34: 1.0, 35: 1.0, 36: 1.0, 37: 1.0, 38: 1.0, 39: 1.0, 40: 1.0, 41: 1.0, 42: 1.0, 43: 1.0, 44: 1.0, 45: 1.0, 46: 1.0, 47: 1.0, 48: 1.0, 49: 1.0, 50: 1.0, 51: 1.0, 52: 1.0, 53: 1.0, 54: 1.0, 55: 1.0, 56: 1.0, 57: 1.0, 58: 1.0, 59: 1.0, 60: 1.0, 61: 1.0, 62: 1.0, 63: 1.0, 64: 1.0, 65: 1.0, 66: 1.0, 67: 1.0, 68: 1.0, 69: 1.0, 70: 1.0, 71: 1.0, 72: 1.0, 73: 1.0, 74: 1.0, 75: 1.0, 76: 1.0, 77: 1.0, 78: 1.0, 79: 1.0, 80: 1.0, 81: 1.0, 82: 1.0, 83: 1.0, 84: 1.0, 85: 1.0, 86: 1.0, 87: 1.0, 88: 1.0, 89: 1.0, 90: 1.0, 91: 1.0, 92: 1.0, 93: 1.0, 94: 1.0}))]


Thank you for your help!
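
For reference, here is a small sketch of what extract_values does to each record, using plain Python tuples as stand-ins for the (id, SparseVector) pairs shown above. The stand-in strings and the names records, nested and flat are assumptions made for illustration; no Spark is involved. It shows only why take(1) prints a doubly nested list, and it does not address the ValueError.

```python
def extract_values(record):
    # Same logic as the function in the question: wrap the second field in a list.
    return [record[1]]

# Stand-in records shaped like (id, vector); strings replace the SparseVectors.
records = [(0, "vec0"), (1, "vec1")]

# Mapping extract_values yields one single-element list per record.
nested = list(map(extract_values, records))

# Taking the second field directly yields the values without the extra nesting.
flat = [record[1] for record in records]

print(nested)  # [['vec0'], ['vec1']]
print(flat)    # ['vec0', 'vec1']
```

On a pair RDD the flat variant would correspond to something like myData.map(lambda kv: kv[1]).collect() or myData.values().collect(). Either call still evaluates every record, so it would hit the same error if the cause lies in the data.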










  • It isn't necessarily a source of error since friendRDD could be anything, but just in case, I want to check to make sure you know Python list indices begin at zero.

    – Jesse Amano
    Mar 21 at 21:54











  • Since you're able to take an element from your RDD after mapping, you should be able to collect them all in the same way. The likely reason for this to be a problem would be if your source RDD contains some problematic rows and some non-problematic rows, and take(1) happened to choose a non-problematic row.

    – Jesse Amano
    Mar 21 at 22:03











  • To diagnose, check the source data (your myData RDD) for any "weird" records. Based on what the ValueError says, I'd especially keep an eye out for any vector fields that have an empty (or other non-integer) key.

    – Jesse Amano
    Mar 21 at 22:05











  • @Jesse Amano I understand that Python indices begin at 0. I'll post the output of myData.take(1) above. I intended to take the [1] element of the vector.

    – MitterHai
    Mar 22 at 5:28












  • Ok, just checking ;) I still think you might have some rows in the RDD that don't have the exact same schema as the first row, since take(1) worked with that row but collect() didn't. There aren't any syntactical errors in what you've posted here, so it must be an issue with the data source.

    – Jesse Amano
    Mar 22 at 16:42















list pyspark rdd






edited Mar 22 at 5:30 by MitterHai

asked Mar 21 at 18:06 by MitterHai (145 reputation)











