Python: How to get the similar-sounding words together



I am trying to get all the similar-sounding words from a list.

I tried to use cosine similarity, but that does not do what I need.



from sklearn.metrics.pairwise import cosine_similarity
dataList = ['two','fourth','forth','dessert','to','desert']
cosine_similarity(dataList)


I know this is not the right approach, but I cannot work out how to get a result like:



result = ['xx', 'xx', 'yy', 'yy', 'zz', 'zz'] 


where words that sound similar get the same label.










      python python-3.x list






      edited Mar 25 at 11:19 by DirtyBit
      asked Mar 25 at 5:31 by user11253016



          1 Answer














          First, you need the right tool for finding similar-sounding words: a phonetic algorithm such as Soundex, which encodes each word by how it sounds rather than how it is spelled. I would suggest:



          Using jellyfish:



          from jellyfish import soundex

          print(soundex("two"))
          print(soundex("to"))


          OUTPUT:



          T000
          T000


          Next, write a function that encodes every word in the list, then sort the codes so that matching ones end up next to each other:



          from jellyfish import soundex

          def getSoundexList(dList):
              res = [soundex(x) for x in dList]  # iterate over each elem in the dataList
              # print(res)  # ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']
              return res

          dataList = ['two','fourth','forth','dessert','to','desert']
          print(sorted(getSoundexList(dataList)))  # sort so matching codes sit next to each other


          OUTPUT:



          ['D263', 'D263', 'F630', 'F630', 'T000', 'T000']
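The sorted codes no longer say which word each code came from. If you need one label per word in the original order, like the `result` list in the question, a minimal sketch is to number each distinct code the first time it appears. The helper name `group_labels` and the `group0`/`group1` label format are hypothetical choices for illustration, and the input is the unsorted code list from the comment in the snippet above, hardcoded so this runs without jellyfish installed:

```python
from typing import Dict, List

def group_labels(codes: List[str]) -> List[str]:
    """Map each phonetic code to a label, numbering distinct codes by first appearance."""
    labels: Dict[str, str] = {}  # code -> label, e.g. 'T000' -> 'group0'
    result = []
    for code in codes:
        if code not in labels:
            labels[code] = f"group{len(labels)}"
        result.append(labels[code])
    return result

# Unsorted Soundex codes for ['two','fourth','forth','dessert','to','desert'],
# copied from the answer so this sketch runs without jellyfish.
codes = ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']
print(group_labels(codes))
# ['group0', 'group1', 'group1', 'group2', 'group0', 'group2']
```

Words that share a label sound alike under Soundex; the label text itself carries no meaning.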


          EDIT:



          Another way could be:



          Using fuzzy:



          import fuzzy
          soundex = fuzzy.Soundex(4)

          print(soundex("to"))
          print(soundex("two"))


          OUTPUT:



          T000
          T000


          EDIT 2:



          If you want them grouped, you could use itertools.groupby on the sorted codes:



          from itertools import groupby

          from jellyfish import soundex

          def getSoundexList(dList):
              return sorted([soundex(x) for x in dList])  # groupby needs sorted input

          dataList = ['two','fourth','forth','dessert','to','desert']
          print([list(g) for _, g in groupby(getSoundexList(dataList))])


          OUTPUT:



          [['D263', 'D263'], ['F630', 'F630'], ['T000', 'T000']]
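Sorting before `groupby` matters because `itertools.groupby` only merges *consecutive* equal items. A small stdlib-only illustration, using the unsorted Soundex codes of the example words (hardcoded here so it runs without jellyfish):

```python
from itertools import groupby

codes = ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']

# Without sorting, 'T000' and 'D263' are split into separate runs.
unsorted_groups = [list(g) for _, g in groupby(codes)]
# After sorting, equal codes are adjacent, so each code forms exactly one group.
sorted_groups = [list(g) for _, g in groupby(sorted(codes))]

print(unsorted_groups)  # [['T000'], ['F630', 'F630'], ['D263'], ['T000'], ['D263']]
print(sorted_groups)    # [['D263', 'D263'], ['F630', 'F630'], ['T000', 'T000']]
```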


          EDIT 3:



          This one's for @Eric Duminil: say you want both the words and their respective Soundex codes:



          Using a dict along with itemgetter:



          from itertools import groupby
          from operator import itemgetter

          from jellyfish import soundex

          def getSoundexDict(dList):
              dict_ = {x: soundex(x) for x in dList}  # dict_ with k,v as name/val
              return sorted(dict_.items(), key=itemgetter(1))  # sorting the dict_ on val

          dataList = ['two','fourth','forth','dessert','to','desert']
          print([list(g) for _, g in groupby(getSoundexDict(dataList), key=itemgetter(1))])


          OUTPUT:



          [[('dessert', 'D263'), ('desert', 'D263')], [('fourth', 'F630'), ('forth', 'F630')], [('two', 'T000'), ('to', 'T000')]]
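If you do not need the groups ordered by code, a `collections.defaultdict(list)` keyed on the code collects the words in one pass with no sorting, and it keeps repeated words (a dict keyed on the word would collapse them). This is a sketch, not part of the original answer: `group_by_code` is a hypothetical helper that takes the encoder as a parameter (in practice you would pass jellyfish's `soundex`); here a dict of the codes from the outputs above stands in for it so the snippet runs without jellyfish:

```python
from collections import defaultdict
from typing import Callable, Dict, List

def group_by_code(words: List[str], encode: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group words by their phonetic code, in first-seen order; no sorting needed."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        groups[encode(word)].append(word)
    return dict(groups)

# Soundex codes copied from the outputs above (stand-in for jellyfish's soundex).
known_codes = {'two': 'T000', 'fourth': 'F630', 'forth': 'F630',
               'dessert': 'D263', 'to': 'T000', 'desert': 'D263'}

dataList = ['two', 'fourth', 'forth', 'dessert', 'to', 'desert']
print(group_by_code(dataList, known_codes.__getitem__))
# {'T000': ['two', 'to'], 'F630': ['fourth', 'forth'], 'D263': ['dessert', 'desert']}
```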


          EDIT 4 (for OP):



          Soundex:




          Soundex is a system whereby values are assigned to names in such a
          manner that similar-sounding names get the same value. These values
          are known as soundex encodings. A search application based on soundex
          will not search for a name directly but rather will search for the
          soundex encoding. By doing so, it will obtain all names that sound
          like the name being sought.




          read more..
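To see why "two" and "to" both become T000, here is a minimal pure-Python sketch of the classic American Soundex rules: keep the first letter, map consonants to the digits 1-6, drop vowels, collapse adjacent equal digits (h and w do not break a run), then pad or truncate to four characters. `simple_soundex` is a hypothetical name for illustration; it may differ from the jellyfish or fuzzy implementations on edge cases such as non-ASCII input.

```python
# Letter groups of classic American Soundex; vowels, y, h and w get no digit.
_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}

def simple_soundex(word: str) -> str:
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter's own digit suppresses an identical digit right after it
    # (e.g. "Pfister" -> P236, not P123).
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:  # collapse runs of the same digit
                digits.append(code)
            prev = code
        elif ch not in "hw":
            # A vowel (or y) separates consonants, so a repeated digit after it is kept.
            prev = ""
        # h and w are skipped without resetting prev.
    return (first.upper() + "".join(digits) + "000")[:4]

for w in ['two', 'to', 'fourth', 'forth', 'dessert', 'desert']:
    print(w, simple_soundex(w))
# two T000, to T000, fourth F630, forth F630, dessert D263, desert D263
```

Because vowels are dropped and only the first letter is kept verbatim, homophones that start with the same letter (two/to, fourth/forth) collapse to the same code; homophones with different first letters (e.g. "know"/"no") will not.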






                edited Mar 26 at 6:07
                answered Mar 25 at 5:34 by DirtyBit