Python: How to get the similar-sounding words togetherHow can I represent an 'Enum' in Python?How do I copy a file in Python?How can I safely create a nested directory?How to get the current time in PythonHow can I make a time delay in Python?How to remove an element from a list by index?Getting the last element of a listHow do I get the number of elements in a list?How do I concatenate two lists in Python?Why not inherit from List<T>?
...and then she held the gun
Is swap gate equivalent to just exchanging the wire of the two qubits?
What kind of chart is this?
Is a sequel allowed to start before the end of the first book?
Is there a polite way to ask about one's ethnicity?
Bent arrow under a node
Is this broken pipe the reason my freezer is not working? Can it be fixed?
TV show starring two men who develop various gadgets
My student in one course asks for paid tutoring in another course. Appropriate?
How to sort human readable size
How to prevent cables getting intertwined
Would a 7805 5v regulator drain a 9v battery?
Does knowing the surface area of all faces uniquely determine a tetrahedron?
Is it a bad idea to have a pen name with only an initial for a surname?
What is the word?
Why "amatus est" instead of "*amavitur"
Digital signature that is only verifiable by one specific person
Having some issue with notation in a Hilbert space
How do credit card companies know what type of business I'm paying for?
Scaling an object to change its key
How much steel armor can you wear and still be able to swim?
How can the US president give an order to a civilian?
Is using Legacy mode is a bad thing to do?
writing a function between sets vertically
Python: How to get the similar-sounding words together
How can I represent an 'Enum' in Python?How do I copy a file in Python?How can I safely create a nested directory?How to get the current time in PythonHow can I make a time delay in Python?How to remove an element from a list by index?Getting the last element of a listHow do I get the number of elements in a list?How do I concatenate two lists in Python?Why not inherit from List<T>?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;
I am trying to get all the similar sounding words from a list.
I tried to get them using cosine similarity but that does not fulfil my purpose.
from sklearn.metrics.pairwise import cosine_similarity
dataList = ['two','fourth','forth','dessert','to','desert']
cosine_similarity(dataList)
I know this is not the right approach, I cannot seem to get a result like:
result = ['xx', 'xx', 'yy', 'yy', 'zz', 'zz']
where they mean that the words which sound similar
python python-3.x list
add a comment |
I am trying to get all the similar sounding words from a list.
I tried to get them using cosine similarity but that does not fulfil my purpose.
from sklearn.metrics.pairwise import cosine_similarity
dataList = ['two','fourth','forth','dessert','to','desert']
cosine_similarity(dataList)
I know this is not the right approach, I cannot seem to get a result like:
result = ['xx', 'xx', 'yy', 'yy', 'zz', 'zz']
where they mean that the words which sound similar
python python-3.x list
add a comment |
I am trying to get all the similar sounding words from a list.
I tried to get them using cosine similarity but that does not fulfil my purpose.
from sklearn.metrics.pairwise import cosine_similarity
dataList = ['two','fourth','forth','dessert','to','desert']
cosine_similarity(dataList)
I know this is not the right approach, I cannot seem to get a result like:
result = ['xx', 'xx', 'yy', 'yy', 'zz', 'zz']
where they mean that the words which sound similar
python python-3.x list
I am trying to get all the similar sounding words from a list.
I tried to get them using cosine similarity but that does not fulfil my purpose.
from sklearn.metrics.pairwise import cosine_similarity
dataList = ['two','fourth','forth','dessert','to','desert']
cosine_similarity(dataList)
I know this is not the right approach, I cannot seem to get a result like:
result = ['xx', 'xx', 'yy', 'yy', 'zz', 'zz']
where they mean that the words which sound similar
python python-3.x list
python python-3.x list
edited Mar 25 at 11:19
DirtyBit
1
1
asked Mar 25 at 5:31
user11253016
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
First, you need to use a right way to get the similar sounding words i.e. string similarity, I would suggest:
Using jellyfish
:
from jellyfish import soundex
print(soundex("two"))
print(soundex("to"))
OUTPUT:
T000
T000
Now perhaps, create a function that would handle the list and then sort it to get them:
def getSoundexList(dList):
res = [soundex(x) for x in dList] # iterate over each elem in the dataList
# print(res) # ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']
return res
dataList = ['two','fourth','forth','dessert','to','desert']
print([x for x in sorted(getSoundexList(dataList))])
OUTPUT:
['D263', 'D263', 'F630', 'F630', 'T000', 'T000']
EDIT:
Another way could be:
Using fuzzy
:
import fuzzy
soundex = fuzzy.Soundex(4)
print(soundex("to"))
print(soundex("two"))
OUTPUT:
T000
T000
EDIT 2:
If you want them grouped
, you could use groupby:
from itertools import groupby
def getSoundexList(dList):
return sorted([soundex(x) for x in dList])
dataList = ['two','fourth','forth','dessert','to','desert']
print([list(g) for _, g in groupby(getSoundexList(dataList), lambda x: x)])
OUTPUT:
[['D263', 'D263'], ['F630', 'F630'], ['T000', 'T000']]
EDIT 3:
This ones for @Eric Duminil, let's say you want both the names
and their respective val
:
Using a dict
along with itemgetter
:
from operator import itemgetter
def getSoundexDict(dList):
return sorted(dict_.items(), key=itemgetter(1)) # sorting the dict_ on val
dataList = ['two','fourth','forth','dessert','to','desert']
res = [soundex(x) for x in dataList] # to get the val for each elem
dict_ = dict(list(zip(dataList, res))) # dict_ with k,v as name/val
print([list(g) for _, g in groupby(getSoundexDict(dataList), lambda x: x[1])])
OUTPUT:
[[('dessert', 'D263'), ('desert', 'D263')], [('fourth', 'F630'), ('forth', 'F630')], [('two', 'T000'), ('to', 'T000')]]
EDIT 4 (for OP):
Soundex:
Soundex is a system whereby values are assigned to names in such a
manner that similar-sounding names get the same value. These values
are known as soundex encodings. A search application based on soundex
will not search for a name directly but rather will search for the
soundex encoding. By doing so, it will obtain all names that sound
like the name being sought.
read more..
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55331723%2fpython-how-to-get-the-similar-sounding-words-together%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
First, you need to use a right way to get the similar sounding words i.e. string similarity, I would suggest:
Using jellyfish
:
from jellyfish import soundex
print(soundex("two"))
print(soundex("to"))
OUTPUT:
T000
T000
Now perhaps, create a function that would handle the list and then sort it to get them:
def getSoundexList(dList):
res = [soundex(x) for x in dList] # iterate over each elem in the dataList
# print(res) # ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']
return res
dataList = ['two','fourth','forth','dessert','to','desert']
print([x for x in sorted(getSoundexList(dataList))])
OUTPUT:
['D263', 'D263', 'F630', 'F630', 'T000', 'T000']
EDIT:
Another way could be:
Using fuzzy
:
import fuzzy
soundex = fuzzy.Soundex(4)
print(soundex("to"))
print(soundex("two"))
OUTPUT:
T000
T000
EDIT 2:
If you want them grouped
, you could use groupby:
from itertools import groupby
def getSoundexList(dList):
return sorted([soundex(x) for x in dList])
dataList = ['two','fourth','forth','dessert','to','desert']
print([list(g) for _, g in groupby(getSoundexList(dataList), lambda x: x)])
OUTPUT:
[['D263', 'D263'], ['F630', 'F630'], ['T000', 'T000']]
EDIT 3:
This ones for @Eric Duminil, let's say you want both the names
and their respective val
:
Using a dict
along with itemgetter
:
from operator import itemgetter
def getSoundexDict(dList):
return sorted(dict_.items(), key=itemgetter(1)) # sorting the dict_ on val
dataList = ['two','fourth','forth','dessert','to','desert']
res = [soundex(x) for x in dataList] # to get the val for each elem
dict_ = dict(list(zip(dataList, res))) # dict_ with k,v as name/val
print([list(g) for _, g in groupby(getSoundexDict(dataList), lambda x: x[1])])
OUTPUT:
[[('dessert', 'D263'), ('desert', 'D263')], [('fourth', 'F630'), ('forth', 'F630')], [('two', 'T000'), ('to', 'T000')]]
EDIT 4 (for OP):
Soundex:
Soundex is a system whereby values are assigned to names in such a
manner that similar-sounding names get the same value. These values
are known as soundex encodings. A search application based on soundex
will not search for a name directly but rather will search for the
soundex encoding. By doing so, it will obtain all names that sound
like the name being sought.
read more..
add a comment |
First, you need to use a right way to get the similar sounding words i.e. string similarity, I would suggest:
Using jellyfish
:
from jellyfish import soundex
print(soundex("two"))
print(soundex("to"))
OUTPUT:
T000
T000
Now perhaps, create a function that would handle the list and then sort it to get them:
def getSoundexList(dList):
res = [soundex(x) for x in dList] # iterate over each elem in the dataList
# print(res) # ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']
return res
dataList = ['two','fourth','forth','dessert','to','desert']
print([x for x in sorted(getSoundexList(dataList))])
OUTPUT:
['D263', 'D263', 'F630', 'F630', 'T000', 'T000']
EDIT:
Another way could be:
Using fuzzy
:
import fuzzy
soundex = fuzzy.Soundex(4)
print(soundex("to"))
print(soundex("two"))
OUTPUT:
T000
T000
EDIT 2:
If you want them grouped
, you could use groupby:
from itertools import groupby
def getSoundexList(dList):
return sorted([soundex(x) for x in dList])
dataList = ['two','fourth','forth','dessert','to','desert']
print([list(g) for _, g in groupby(getSoundexList(dataList), lambda x: x)])
OUTPUT:
[['D263', 'D263'], ['F630', 'F630'], ['T000', 'T000']]
EDIT 3:
This ones for @Eric Duminil, let's say you want both the names
and their respective val
:
Using a dict
along with itemgetter
:
from operator import itemgetter
def getSoundexDict(dList):
return sorted(dict_.items(), key=itemgetter(1)) # sorting the dict_ on val
dataList = ['two','fourth','forth','dessert','to','desert']
res = [soundex(x) for x in dataList] # to get the val for each elem
dict_ = dict(list(zip(dataList, res))) # dict_ with k,v as name/val
print([list(g) for _, g in groupby(getSoundexDict(dataList), lambda x: x[1])])
OUTPUT:
[[('dessert', 'D263'), ('desert', 'D263')], [('fourth', 'F630'), ('forth', 'F630')], [('two', 'T000'), ('to', 'T000')]]
EDIT 4 (for OP):
Soundex:
Soundex is a system whereby values are assigned to names in such a
manner that similar-sounding names get the same value. These values
are known as soundex encodings. A search application based on soundex
will not search for a name directly but rather will search for the
soundex encoding. By doing so, it will obtain all names that sound
like the name being sought.
read more..
add a comment |
First, you need to use a right way to get the similar sounding words i.e. string similarity, I would suggest:
Using jellyfish
:
from jellyfish import soundex
print(soundex("two"))
print(soundex("to"))
OUTPUT:
T000
T000
Now perhaps, create a function that would handle the list and then sort it to get them:
def getSoundexList(dList):
res = [soundex(x) for x in dList] # iterate over each elem in the dataList
# print(res) # ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']
return res
dataList = ['two','fourth','forth','dessert','to','desert']
print([x for x in sorted(getSoundexList(dataList))])
OUTPUT:
['D263', 'D263', 'F630', 'F630', 'T000', 'T000']
EDIT:
Another way could be:
Using fuzzy
:
import fuzzy
soundex = fuzzy.Soundex(4)
print(soundex("to"))
print(soundex("two"))
OUTPUT:
T000
T000
EDIT 2:
If you want them grouped
, you could use groupby:
from itertools import groupby
def getSoundexList(dList):
return sorted([soundex(x) for x in dList])
dataList = ['two','fourth','forth','dessert','to','desert']
print([list(g) for _, g in groupby(getSoundexList(dataList), lambda x: x)])
OUTPUT:
[['D263', 'D263'], ['F630', 'F630'], ['T000', 'T000']]
EDIT 3:
This ones for @Eric Duminil, let's say you want both the names
and their respective val
:
Using a dict
along with itemgetter
:
from operator import itemgetter
def getSoundexDict(dList):
return sorted(dict_.items(), key=itemgetter(1)) # sorting the dict_ on val
dataList = ['two','fourth','forth','dessert','to','desert']
res = [soundex(x) for x in dataList] # to get the val for each elem
dict_ = dict(list(zip(dataList, res))) # dict_ with k,v as name/val
print([list(g) for _, g in groupby(getSoundexDict(dataList), lambda x: x[1])])
OUTPUT:
[[('dessert', 'D263'), ('desert', 'D263')], [('fourth', 'F630'), ('forth', 'F630')], [('two', 'T000'), ('to', 'T000')]]
EDIT 4 (for OP):
Soundex:
Soundex is a system whereby values are assigned to names in such a
manner that similar-sounding names get the same value. These values
are known as soundex encodings. A search application based on soundex
will not search for a name directly but rather will search for the
soundex encoding. By doing so, it will obtain all names that sound
like the name being sought.
read more..
First, you need to use a right way to get the similar sounding words i.e. string similarity, I would suggest:
Using jellyfish
:
from jellyfish import soundex
print(soundex("two"))
print(soundex("to"))
OUTPUT:
T000
T000
Now perhaps, create a function that would handle the list and then sort it to get them:
def getSoundexList(dList):
res = [soundex(x) for x in dList] # iterate over each elem in the dataList
# print(res) # ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']
return res
dataList = ['two','fourth','forth','dessert','to','desert']
print([x for x in sorted(getSoundexList(dataList))])
OUTPUT:
['D263', 'D263', 'F630', 'F630', 'T000', 'T000']
EDIT:
Another way could be:
Using fuzzy
:
import fuzzy
soundex = fuzzy.Soundex(4)
print(soundex("to"))
print(soundex("two"))
OUTPUT:
T000
T000
EDIT 2:
If you want them grouped
, you could use groupby:
from itertools import groupby
def getSoundexList(dList):
return sorted([soundex(x) for x in dList])
dataList = ['two','fourth','forth','dessert','to','desert']
print([list(g) for _, g in groupby(getSoundexList(dataList), lambda x: x)])
OUTPUT:
[['D263', 'D263'], ['F630', 'F630'], ['T000', 'T000']]
EDIT 3:
This ones for @Eric Duminil, let's say you want both the names
and their respective val
:
Using a dict
along with itemgetter
:
from operator import itemgetter
def getSoundexDict(dList):
return sorted(dict_.items(), key=itemgetter(1)) # sorting the dict_ on val
dataList = ['two','fourth','forth','dessert','to','desert']
res = [soundex(x) for x in dataList] # to get the val for each elem
dict_ = dict(list(zip(dataList, res))) # dict_ with k,v as name/val
print([list(g) for _, g in groupby(getSoundexDict(dataList), lambda x: x[1])])
OUTPUT:
[[('dessert', 'D263'), ('desert', 'D263')], [('fourth', 'F630'), ('forth', 'F630')], [('two', 'T000'), ('to', 'T000')]]
EDIT 4 (for OP):
Soundex:
Soundex is a system whereby values are assigned to names in such a
manner that similar-sounding names get the same value. These values
are known as soundex encodings. A search application based on soundex
will not search for a name directly but rather will search for the
soundex encoding. By doing so, it will obtain all names that sound
like the name being sought.
read more..
edited Mar 26 at 6:07
answered Mar 25 at 5:34
DirtyBitDirtyBit
1
1
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55331723%2fpython-how-to-get-the-similar-sounding-words-together%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown