Sort or orderBy in pyspark showing strange output



I am trying to sort the values in my PySpark DataFrame, but the output looks strange. Instead of sorting by the whole number, it sorts by the first digit of each number.

I have tried both the sort and orderBy methods, and they give the same result.

sdf = spark.read.csv("dummy.txt", header=True)
sdf.sort('1', ascending=False).show()

I get the following output:

+---+
| 98|
|  9|
|  8|
| 76|
|  7|
| 68|
|  6|
| 54|
|  5|
| 43|
|  4|
| 35|
| 34|
| 34|
|  3|
|  2|
|  2|
|  2|
| 10|
+---+

Can anyone explain this?

python pyspark

asked Mar 23 at 6:16
Talha Anwar

That's a string sort (your input values are strings of digits, not ints). You need a natural sort, probably by passing key=int.

– Austin
Mar 23 at 6:21
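
The comment's point can be seen in plain Python, without Spark. This is only a sketch: the list below is a hypothetical sample modelled on the question's output, and key=int is an argument of Python's built-in sorted(), not of DataFrame.sort(), so it illustrates the idea rather than the PySpark fix.

```python
# Hypothetical sample: numbers stored as strings, as in the question.
values = ["98", "9", "8", "76", "10", "2"]

# Plain string sort: values are compared character by character,
# so "9" ranks above "76" and "2" ranks above "10".
as_text = sorted(values, reverse=True)

# Numeric sort: each string is converted with int() for comparison only;
# the list elements themselves stay strings.
as_numbers = sorted(values, key=int, reverse=True)

print(as_text)     # ['98', '9', '8', '76', '2', '10']
print(as_numbers)  # ['98', '76', '10', '9', '8', '2']
```

The first result reproduces the pattern in the question, where 2 appears ahead of 10.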












1 Answer

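Your column holds strings (spark.read.csv reads every column as a string unless you pass a schema or set inferSchema=True), so Spark compares the values lexicographically, character by character, rather than numerically. That is why 9 sorts ahead of 76 in descending order.

So you can cast the column to a numeric type and pass the cast expression to orderBy to get the result you want.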

>>> df
DataFrame[Numb: string]
>>> df.show()
+----+
|Numb|
+----+
|  20|
|  19|
|   1|
| 200|
|  60|
+----+

>>> df.orderBy(df.Numb.cast('int'), ascending=False).show()
+----+
|Numb|
+----+
| 200|
|  60|
|  20|
|  19|
|   1|
+----+





answered Mar 23 at 7:16
Jim Todd
