Sort or orderBy in pyspark showing strange output
I am trying to sort the values in my PySpark DataFrame, but the output looks strange. Instead of sorting by the whole number, it sorts by the first digit of each number.
I have tried both sort and orderBy; both give the same result.
sdf=spark.read.csv("dummy.txt", header=True)
sdf.sort('1',ascending=False).show()
I am getting the following output:
+---+
| 98|
| 9|
| 8|
| 76|
| 7|
| 68|
| 6|
| 54|
| 5|
| 43|
| 4|
| 35|
| 34|
| 34|
| 3|
| 2|
| 2|
| 2|
| 10|
+---+
Can anyone explain why this happens?
python pyspark
asked Mar 23 at 6:16 by Talha Anwar
That's a string sort (your input is string of numbers, not ints). You need a natural sort, probably passing key=int.
– Austin, Mar 23 at 6:21
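To illustrate the comment above, here is a minimal plain-Python sketch (not PySpark) using the values from the question's output. The variable names are made up for illustration; it only shows how a string sort and a sort with key=int differ on the same data.

```python
# Values from the question's output, held as strings (as they are after
# reading a CSV without a schema).
values = ["98", "9", "8", "76", "7", "68", "6", "54", "5", "43",
          "4", "35", "34", "34", "3", "2", "2", "2", "10"]

# String sort: compares character by character, so "9" beats "76"
# and "2" beats "10" in descending order.
as_strings = sorted(values, reverse=True)

# Numeric sort: key=int converts each value before comparing.
as_numbers = sorted(values, key=int, reverse=True)

print(as_strings)
print(as_numbers)
```

The string-sorted list matches the order shown in the question, which is consistent with the column being compared as text rather than as numbers.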
1 Answer
Because your column contains data of string type, the values are compared as strings: character by character, from left to right, not as numbers. That is why "9" sorts ahead of "76" in descending order.
So, cast the column to a numeric type and then apply orderBy to get the required result.
>>> df
DataFrame[Numb: string]
>>> df.show()
+----+
|Numb|
+----+
| 20|
| 19|
| 1|
| 200|
| 60|
+----+
>>> df.orderBy(df.Numb.cast('int'),ascending=False).show()
+----+
|Numb|
+----+
| 200|
| 60|
| 20|
| 19|
| 1|
+----+
answered Mar 23 at 7:16 by Jim Todd