How to output in R all possible deviations of a word for a fixed distance value?Grouping functions (tapply, by, aggregate) and the *apply familyRemove rows with all or some NAs (missing values) in data.frameHow to build a function to return the minimum distance to [-1,0,1]?Get all possible string sequences where each element comes from different set in RHow do I replace multiple words in the column of a data frame, with a single word?Word-level edit distance between two sentences in RFaster alternative to extract all words in text matching words in another vectorHow to calculate proximity of words to a specific term in a documentCount Word Occurrence in Distance to Defined TermRegex positive lookbehind including all words preceded by word
How could an animal "smell" carbon monoxide?
Why should I cook the flour first when making bechamel sauce?
Why did Steve Rogers choose this character in Endgame?
How to determine the optimal threshold to achieve the highest accuracy
Cine footage fron Saturn V launch's
Is straight-up writing someone's opinions telling?
Is it ethical for a company to ask its employees to move furniture on a weekend?
What details should I consider before agreeing for part of my salary to be 'retained' by employer?
What are "full piece" and "half piece" in chess?
What could be reasoning of male prison in VR world to only allow undershirt and sarong as nightwear to male prisoners
What "fuel more powerful than anything the West (had) in stock" put Laika in orbit aboard Sputnik 2?
Advice for paying off student loans and auto loans now that I have my first 'real' job
Interviewing with an unmentioned 9 months of sick leave taken during a job
Why doesn't philosophy have higher standards for its arguments?
Is the Malay "garam" (salt) related to the Latin "garum" (fish sauce)?
Why isn't aluminium involved in biological processes?
What problems was on a lunar module of Apollo 11?
Can a Resident Assistant Be Told to Ignore a Lawful Order?
What impact would a dragon the size of Asia have on the environment?
How to ask my office to remove the pride decorations without appearing anti-LGBTQ?
What exactly is a Hadouken?
What is the meaning of [[:space:]] in bash?
Can "plane" (aeroplane) be used as a non-count noun?
What is the word for "event executor"?
How to output in R all possible deviations of a word for a fixed distance value?
Grouping functions (tapply, by, aggregate) and the *apply familyRemove rows with all or some NAs (missing values) in data.frameHow to build a function to return the minimum distance to [-1,0,1]?Get all possible string sequences where each element comes from different set in RHow do I replace multiple words in the column of a data frame, with a single word?Word-level edit distance between two sentences in RFaster alternative to extract all words in text matching words in another vectorHow to calculate proximity of words to a specific term in a documentCount Word Occurrence in Distance to Defined TermRegex positive lookbehind including all words preceded by word
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
I have a word and want to output in R all possible deviatons (replacement, substitution, insertion) for a fixed distance value into a vector.
For instance, the word "Cat" and a fixed distance value of 1 results in a vector with the elements "cot", "at", ...
r text-mining tidyverse stringr quanteda
add a comment |
I have a word and want to output in R all possible deviatons (replacement, substitution, insertion) for a fixed distance value into a vector.
For instance, the word "Cat" and a fixed distance value of 1 results in a vector with the elements "cot", "at", ...
r text-mining tidyverse stringr quanteda
You might want to try all possible modifications at all possible places and collect result. Be aware, however, that for the distance greater than 1 some modifications may inadvertently cancel out. For example if you replace 'a' with 'o' and then insert 'a' you may get "cat"→"cot"→"coat" which appears just 1 away from "cat", even though 2 modifications were done.
– CiaPan
Mar 26 at 8:21
But how to do it automatically in R ???
– De De
Mar 26 at 12:18
Theadist
function would be a good place to start.
– mickey
Mar 26 at 14:10
Thanks for the suggestion. Is it just a rough idea or do you have anything in mind ??
– De De
Mar 26 at 14:16
add a comment |
I have a word and want to output in R all possible deviatons (replacement, substitution, insertion) for a fixed distance value into a vector.
For instance, the word "Cat" and a fixed distance value of 1 results in a vector with the elements "cot", "at", ...
r text-mining tidyverse stringr quanteda
I have a word and want to output in R all possible deviatons (replacement, substitution, insertion) for a fixed distance value into a vector.
For instance, the word "Cat" and a fixed distance value of 1 results in a vector with the elements "cot", "at", ...
r text-mining tidyverse stringr quanteda
r text-mining tidyverse stringr quanteda
edited Mar 26 at 23:06
Ken Benoit
8,19317 silver badges36 bronze badges
8,19317 silver badges36 bronze badges
asked Mar 26 at 8:12
De DeDe De
245 bronze badges
245 bronze badges
You might want to try all possible modifications at all possible places and collect result. Be aware, however, that for the distance greater than 1 some modifications may inadvertently cancel out. For example if you replace 'a' with 'o' and then insert 'a' you may get "cat"→"cot"→"coat" which appears just 1 away from "cat", even though 2 modifications were done.
– CiaPan
Mar 26 at 8:21
But how to do it automatically in R ???
– De De
Mar 26 at 12:18
Theadist
function would be a good place to start.
– mickey
Mar 26 at 14:10
Thanks for the suggestion. Is it just a rough idea or do you have anything in mind ??
– De De
Mar 26 at 14:16
add a comment |
You might want to try all possible modifications at all possible places and collect result. Be aware, however, that for the distance greater than 1 some modifications may inadvertently cancel out. For example if you replace 'a' with 'o' and then insert 'a' you may get "cat"→"cot"→"coat" which appears just 1 away from "cat", even though 2 modifications were done.
– CiaPan
Mar 26 at 8:21
But how to do it automatically in R ???
– De De
Mar 26 at 12:18
Theadist
function would be a good place to start.
– mickey
Mar 26 at 14:10
Thanks for the suggestion. Is it just a rough idea or do you have anything in mind ??
– De De
Mar 26 at 14:16
You might want to try all possible modifications at all possible places and collect result. Be aware, however, that for the distance greater than 1 some modifications may inadvertently cancel out. For example if you replace 'a' with 'o' and then insert 'a' you may get "cat"→"cot"→"coat" which appears just 1 away from "cat", even though 2 modifications were done.
– CiaPan
Mar 26 at 8:21
You might want to try all possible modifications at all possible places and collect result. Be aware, however, that for the distance greater than 1 some modifications may inadvertently cancel out. For example if you replace 'a' with 'o' and then insert 'a' you may get "cat"→"cot"→"coat" which appears just 1 away from "cat", even though 2 modifications were done.
– CiaPan
Mar 26 at 8:21
But how to do it automatically in R ???
– De De
Mar 26 at 12:18
But how to do it automatically in R ???
– De De
Mar 26 at 12:18
The
adist
function would be a good place to start.– mickey
Mar 26 at 14:10
The
adist
function would be a good place to start.– mickey
Mar 26 at 14:10
Thanks for the suggestion. Is it just a rough idea or do you have anything in mind ??
– De De
Mar 26 at 14:16
Thanks for the suggestion. Is it just a rough idea or do you have anything in mind ??
– De De
Mar 26 at 14:16
add a comment |
1 Answer
1
active
oldest
votes
I'm going to assume that you want all actual words, not just permutations of the characters with an edit distance of 1 that would include non-words such as "zat".
We can do this using adist()
to compute the edit distance between your target word and all eligible English words, taken from some word list. Here, I used the English syllable dictionary from the quanteda package (you did tag this question as quanteda
after all) but this could have been any vector of English dictionary words from any other source as well.
To narrow things down, we first exclude all words that are different in length from the target word by your distance value.
distfn <- function(word, distance = 1)
# select eligible words for efficiency
eligible_y_words <- names(quanteda::data_int_syllables)
wordlengths <- nchar(eligible_y_words)
eligible_y_words <- eligible_y_words[wordlengths >= (nchar(word) - distance) &
wordlengths <= (nchar(word) + distance)]
# compute Levenshtein distance
distances <- utils::adist(word, eligible_y_words)[1, ]
# return only those for the requested distance value
eligible_y_words[distances == distance]
distfn("cat", 1)
## [1] "at" "bat" "ca" "cab" "cac" "cad" "cai" "cal" "cam" "can"
## [11] "cant" "cao" "cap" "caq" "car" "cart" "cas" "cast" "cate" "cato"
## [21] "cats" "catt" "cau" "caw" "cay" "chat" "coat" "cot" "ct" "cut"
## [31] "dat" "eat" "fat" "gat" "hat" "kat" "lat" "mat" "nat" "oat"
## [41] "pat" "rat" "sat" "scat" "tat" "vat" "wat"
To demonstrate how this works on longer words, with alternative distance values.
distfn("coffee", 1)
## [1] "caffee" "coffeen" "coffees" "coffel" "coffer" "coffey" "cuffee"
## [8] "toffee"
distfn("coffee", 2)
## [1] "caffey" "calfee" "chafee" "chaffee" "cofer" "coffee's"
## [7] "coffelt" "coffers" "coffin" "cofide" "cohee" "coiffe"
## [13] "coiffed" "colee" "colfer" "combee" "comfed" "confer"
## [19] "conlee" "coppee" "cottee" "coulee" "coutee" "cuffe"
## [25] "cuffed" "diffee" "duffee" "hoffer" "jaffee" "joffe"
## [31] "mcaffee" "moffet" "noffke" "offen" "offer" "roffe"
## [37] "scoffed" "soffel" "soffer" "yoffie"
(Yes, according to the CMU pronunciation dictionary, those are all actual words...)
EDIT: Make for all permutations of letters, not just actual words
This involves permutations from the alphabet that have the fixed edit distances from the input word. Here I've done it not particular efficiently by forming all permutations of letters within the eligible ranges, and then computing their edit distance from the target word, and then selecting them. So it's a variation of above, except instead of a dictionary, it uses permuted words.
distfn2 <- function(word, distance = 1)
result <- character()
# start with deletions
for (i in max((nchar(word) - distance), 0):(nchar(word) - 1))
result <- c(
result,
combn(unlist(strsplit(word, "", fixed = TRUE)), i,
paste,
collapse = "", simplify = TRUE
)
)
# now for changes and insertions
for (i in (nchar(word)):(nchar(word) + distance))
# all possible edits
edits <- apply(expand.grid(rep(list(letters), i)),
1, paste0,
collapse = ""
)
# remove original word
edits <- edits[edits != word]
# get all distances, add to result
distances <- utils::adist(word, edits)[1, ]
result <- c(result, edits[distances == distance])
result
For the OP example:
distfn2("cat", 1)
## [1] "ca" "ct" "at" "caa" "cab" "cac" "cad" "cae" "caf" "cag"
## [11] "cah" "cai" "caj" "cak" "cal" "cam" "can" "cao" "cap" "caq"
## [21] "car" "cas" "aat" "bat" "dat" "eat" "fat" "gat" "hat" "iat"
## [31] "jat" "kat" "lat" "mat" "nat" "oat" "pat" "qat" "rat" "sat"
## [41] "tat" "uat" "vat" "wat" "xat" "yat" "zat" "cbt" "cct" "cdt"
## [51] "cet" "cft" "cgt" "cht" "cit" "cjt" "ckt" "clt" "cmt" "cnt"
## [61] "cot" "cpt" "cqt" "crt" "cst" "ctt" "cut" "cvt" "cwt" "cxt"
## [71] "cyt" "czt" "cau" "cav" "caw" "cax" "cay" "caz" "cata" "catb"
## [81] "catc" "catd" "cate" "catf" "catg" "cath" "cati" "catj" "catk" "catl"
## [91] "catm" "catn" "cato" "catp" "catq" "catr" "cats" "caat" "cbat" "acat"
## [101] "bcat" "ccat" "dcat" "ecat" "fcat" "gcat" "hcat" "icat" "jcat" "kcat"
## [111] "lcat" "mcat" "ncat" "ocat" "pcat" "qcat" "rcat" "scat" "tcat" "ucat"
## [121] "vcat" "wcat" "xcat" "ycat" "zcat" "cdat" "ceat" "cfat" "cgat" "chat"
## [131] "ciat" "cjat" "ckat" "clat" "cmat" "cnat" "coat" "cpat" "cqat" "crat"
## [141] "csat" "ctat" "cuat" "cvat" "cwat" "cxat" "cyat" "czat" "cabt" "cact"
## [151] "cadt" "caet" "caft" "cagt" "caht" "cait" "cajt" "cakt" "calt" "camt"
## [161] "cant" "caot" "capt" "caqt" "cart" "cast" "catt" "caut" "cavt" "cawt"
## [171] "caxt" "cayt" "cazt" "catu" "catv" "catw" "catx" "caty" "catz"
Also works with other edit distances, although it becomes very slow for longer words.
d2 <- distfn2("cat", 2)
set.seed(100)
c(head(d2, 50), sample(d2, 50), tail(d2, 50))
## [1] "c" "a" "t" "ca" "ct" "at" "aaa" "baa"
## [9] "daa" "eaa" "faa" "gaa" "haa" "iaa" "jaa" "kaa"
## [17] "laa" "maa" "naa" "oaa" "paa" "qaa" "raa" "saa"
## [25] "taa" "uaa" "vaa" "waa" "xaa" "yaa" "zaa" "cba"
## [33] "aca" "bca" "cca" "dca" "eca" "fca" "gca" "hca"
## [41] "ica" "jca" "kca" "lca" "mca" "nca" "oca" "pca"
## [49] "qca" "rca" "cnts" "cian" "pcatb" "cqo" "uawt" "hazt"
## [57] "cpxat" "aaet" "ckata" "caod" "ncatl" "qcamt" "cdtp" "qajt"
## [65] "bckat" "qcatr" "cqah" "rcbt" "cvbt" "bbcat" "vcaz" "ylcat"
## [73] "cahz" "jcgat" "mant" "jatd" "czlat" "cbamt" "cajta" "cafp"
## [81] "cizt" "cmaut" "qwat" "jcazt" "hdcat" "ucant" "hate" "cajtl"
## [89] "caaty" "cix" "nmat" "cajit" "cmnat" "caobt" "catoi" "ncau"
## [97] "ucoat" "ncamt" "jath" "oats" "chatz" "ciatz" "cjatz" "ckatz"
## [105] "clatz" "cmatz" "cnatz" "coatz" "cpatz" "cqatz" "cratz" "csatz"
## [113] "ctatz" "cuatz" "cvatz" "cwatz" "cxatz" "cyatz" "czatz" "cabtz"
## [121] "cactz" "cadtz" "caetz" "caftz" "cagtz" "cahtz" "caitz" "cajtz"
## [129] "caktz" "caltz" "camtz" "cantz" "caotz" "captz" "caqtz" "cartz"
## [137] "castz" "cattz" "cautz" "cavtz" "cawtz" "caxtz" "caytz" "caztz"
## [145] "catuz" "catvz" "catwz" "catxz" "catyz" "catzz"
This could be speeded up by less brute force formation of all permutations and then applying adist()
to them - it could consist of changes or insertions of known edit distances generated algorithmically from letters
.
Thank you very much for your effort! But I am searching for all possible combinations (permutations) ...
– De De
Mar 27 at 14:58
OK I added that.
– Ken Benoit
Mar 28 at 16:46
Many thanks for your support!
– De De
Apr 4 at 16:47
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55352498%2fhow-to-output-in-r-all-possible-deviations-of-a-word-for-a-fixed-distance-value%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
I'm going to assume that you want all actual words, not just permutations of the characters with an edit distance of 1 that would include non-words such as "zat".
We can do this using adist()
to compute the edit distance between your target word and all eligible English words, taken from some word list. Here, I used the English syllable dictionary from the quanteda package (you did tag this question as quanteda
after all) but this could have been any vector of English dictionary words from any other source as well.
To narrow things down, we first exclude all words that are different in length from the target word by your distance value.
distfn <- function(word, distance = 1)
# select eligible words for efficiency
eligible_y_words <- names(quanteda::data_int_syllables)
wordlengths <- nchar(eligible_y_words)
eligible_y_words <- eligible_y_words[wordlengths >= (nchar(word) - distance) &
wordlengths <= (nchar(word) + distance)]
# compute Levenshtein distance
distances <- utils::adist(word, eligible_y_words)[1, ]
# return only those for the requested distance value
eligible_y_words[distances == distance]
distfn("cat", 1)
## [1] "at" "bat" "ca" "cab" "cac" "cad" "cai" "cal" "cam" "can"
## [11] "cant" "cao" "cap" "caq" "car" "cart" "cas" "cast" "cate" "cato"
## [21] "cats" "catt" "cau" "caw" "cay" "chat" "coat" "cot" "ct" "cut"
## [31] "dat" "eat" "fat" "gat" "hat" "kat" "lat" "mat" "nat" "oat"
## [41] "pat" "rat" "sat" "scat" "tat" "vat" "wat"
To demonstrate how this works on longer words, with alternative distance values.
distfn("coffee", 1)
## [1] "caffee" "coffeen" "coffees" "coffel" "coffer" "coffey" "cuffee"
## [8] "toffee"
distfn("coffee", 2)
## [1] "caffey" "calfee" "chafee" "chaffee" "cofer" "coffee's"
## [7] "coffelt" "coffers" "coffin" "cofide" "cohee" "coiffe"
## [13] "coiffed" "colee" "colfer" "combee" "comfed" "confer"
## [19] "conlee" "coppee" "cottee" "coulee" "coutee" "cuffe"
## [25] "cuffed" "diffee" "duffee" "hoffer" "jaffee" "joffe"
## [31] "mcaffee" "moffet" "noffke" "offen" "offer" "roffe"
## [37] "scoffed" "soffel" "soffer" "yoffie"
(Yes, according to the CMU pronunciation dictionary, those are all actual words...)
EDIT: Make for all permutations of letters, not just actual words
This involves permutations from the alphabet that have the fixed edit distances from the input word. Here I've done it not particular efficiently by forming all permutations of letters within the eligible ranges, and then computing their edit distance from the target word, and then selecting them. So it's a variation of above, except instead of a dictionary, it uses permuted words.
distfn2 <- function(word, distance = 1)
result <- character()
# start with deletions
for (i in max((nchar(word) - distance), 0):(nchar(word) - 1))
result <- c(
result,
combn(unlist(strsplit(word, "", fixed = TRUE)), i,
paste,
collapse = "", simplify = TRUE
)
)
# now for changes and insertions
for (i in (nchar(word)):(nchar(word) + distance))
# all possible edits
edits <- apply(expand.grid(rep(list(letters), i)),
1, paste0,
collapse = ""
)
# remove original word
edits <- edits[edits != word]
# get all distances, add to result
distances <- utils::adist(word, edits)[1, ]
result <- c(result, edits[distances == distance])
result
For the OP example:
distfn2("cat", 1)
## [1] "ca" "ct" "at" "caa" "cab" "cac" "cad" "cae" "caf" "cag"
## [11] "cah" "cai" "caj" "cak" "cal" "cam" "can" "cao" "cap" "caq"
## [21] "car" "cas" "aat" "bat" "dat" "eat" "fat" "gat" "hat" "iat"
## [31] "jat" "kat" "lat" "mat" "nat" "oat" "pat" "qat" "rat" "sat"
## [41] "tat" "uat" "vat" "wat" "xat" "yat" "zat" "cbt" "cct" "cdt"
## [51] "cet" "cft" "cgt" "cht" "cit" "cjt" "ckt" "clt" "cmt" "cnt"
## [61] "cot" "cpt" "cqt" "crt" "cst" "ctt" "cut" "cvt" "cwt" "cxt"
## [71] "cyt" "czt" "cau" "cav" "caw" "cax" "cay" "caz" "cata" "catb"
## [81] "catc" "catd" "cate" "catf" "catg" "cath" "cati" "catj" "catk" "catl"
## [91] "catm" "catn" "cato" "catp" "catq" "catr" "cats" "caat" "cbat" "acat"
## [101] "bcat" "ccat" "dcat" "ecat" "fcat" "gcat" "hcat" "icat" "jcat" "kcat"
## [111] "lcat" "mcat" "ncat" "ocat" "pcat" "qcat" "rcat" "scat" "tcat" "ucat"
## [121] "vcat" "wcat" "xcat" "ycat" "zcat" "cdat" "ceat" "cfat" "cgat" "chat"
## [131] "ciat" "cjat" "ckat" "clat" "cmat" "cnat" "coat" "cpat" "cqat" "crat"
## [141] "csat" "ctat" "cuat" "cvat" "cwat" "cxat" "cyat" "czat" "cabt" "cact"
## [151] "cadt" "caet" "caft" "cagt" "caht" "cait" "cajt" "cakt" "calt" "camt"
## [161] "cant" "caot" "capt" "caqt" "cart" "cast" "catt" "caut" "cavt" "cawt"
## [171] "caxt" "cayt" "cazt" "catu" "catv" "catw" "catx" "caty" "catz"
Also works with other edit distances, although it becomes very slow for longer words.
d2 <- distfn2("cat", 2)
set.seed(100)
c(head(d2, 50), sample(d2, 50), tail(d2, 50))
## [1] "c" "a" "t" "ca" "ct" "at" "aaa" "baa"
## [9] "daa" "eaa" "faa" "gaa" "haa" "iaa" "jaa" "kaa"
## [17] "laa" "maa" "naa" "oaa" "paa" "qaa" "raa" "saa"
## [25] "taa" "uaa" "vaa" "waa" "xaa" "yaa" "zaa" "cba"
## [33] "aca" "bca" "cca" "dca" "eca" "fca" "gca" "hca"
## [41] "ica" "jca" "kca" "lca" "mca" "nca" "oca" "pca"
## [49] "qca" "rca" "cnts" "cian" "pcatb" "cqo" "uawt" "hazt"
## [57] "cpxat" "aaet" "ckata" "caod" "ncatl" "qcamt" "cdtp" "qajt"
## [65] "bckat" "qcatr" "cqah" "rcbt" "cvbt" "bbcat" "vcaz" "ylcat"
## [73] "cahz" "jcgat" "mant" "jatd" "czlat" "cbamt" "cajta" "cafp"
## [81] "cizt" "cmaut" "qwat" "jcazt" "hdcat" "ucant" "hate" "cajtl"
## [89] "caaty" "cix" "nmat" "cajit" "cmnat" "caobt" "catoi" "ncau"
## [97] "ucoat" "ncamt" "jath" "oats" "chatz" "ciatz" "cjatz" "ckatz"
## [105] "clatz" "cmatz" "cnatz" "coatz" "cpatz" "cqatz" "cratz" "csatz"
## [113] "ctatz" "cuatz" "cvatz" "cwatz" "cxatz" "cyatz" "czatz" "cabtz"
## [121] "cactz" "cadtz" "caetz" "caftz" "cagtz" "cahtz" "caitz" "cajtz"
## [129] "caktz" "caltz" "camtz" "cantz" "caotz" "captz" "caqtz" "cartz"
## [137] "castz" "cattz" "cautz" "cavtz" "cawtz" "caxtz" "caytz" "caztz"
## [145] "catuz" "catvz" "catwz" "catxz" "catyz" "catzz"
This could be speeded up by less brute force formation of all permutations and then applying adist()
to them - it could consist of changes or insertions of known edit distances generated algorithmically from letters
.
Thank you very much for your effort! But I am searching for all possible combinations (permutations) ...
– De De
Mar 27 at 14:58
OK I added that.
– Ken Benoit
Mar 28 at 16:46
Many thanks for your support!
– De De
Apr 4 at 16:47
add a comment |
I'm going to assume that you want all actual words, not just permutations of the characters with an edit distance of 1 that would include non-words such as "zat".
We can do this using adist()
to compute the edit distance between your target word and all eligible English words, taken from some word list. Here, I used the English syllable dictionary from the quanteda package (you did tag this question as quanteda
after all) but this could have been any vector of English dictionary words from any other source as well.
To narrow things down, we first exclude all words that are different in length from the target word by your distance value.
distfn <- function(word, distance = 1)
# select eligible words for efficiency
eligible_y_words <- names(quanteda::data_int_syllables)
wordlengths <- nchar(eligible_y_words)
eligible_y_words <- eligible_y_words[wordlengths >= (nchar(word) - distance) &
wordlengths <= (nchar(word) + distance)]
# compute Levenshtein distance
distances <- utils::adist(word, eligible_y_words)[1, ]
# return only those for the requested distance value
eligible_y_words[distances == distance]
distfn("cat", 1)
## [1] "at" "bat" "ca" "cab" "cac" "cad" "cai" "cal" "cam" "can"
## [11] "cant" "cao" "cap" "caq" "car" "cart" "cas" "cast" "cate" "cato"
## [21] "cats" "catt" "cau" "caw" "cay" "chat" "coat" "cot" "ct" "cut"
## [31] "dat" "eat" "fat" "gat" "hat" "kat" "lat" "mat" "nat" "oat"
## [41] "pat" "rat" "sat" "scat" "tat" "vat" "wat"
To demonstrate how this works on longer words, with alternative distance values.
distfn("coffee", 1)
## [1] "caffee" "coffeen" "coffees" "coffel" "coffer" "coffey" "cuffee"
## [8] "toffee"
distfn("coffee", 2)
## [1] "caffey" "calfee" "chafee" "chaffee" "cofer" "coffee's"
## [7] "coffelt" "coffers" "coffin" "cofide" "cohee" "coiffe"
## [13] "coiffed" "colee" "colfer" "combee" "comfed" "confer"
## [19] "conlee" "coppee" "cottee" "coulee" "coutee" "cuffe"
## [25] "cuffed" "diffee" "duffee" "hoffer" "jaffee" "joffe"
## [31] "mcaffee" "moffet" "noffke" "offen" "offer" "roffe"
## [37] "scoffed" "soffel" "soffer" "yoffie"
(Yes, according to the CMU pronunciation dictionary, those are all actual words...)
EDIT: Make for all permutations of letters, not just actual words
This involves permutations from the alphabet that have the fixed edit distances from the input word. Here I've done it not particular efficiently by forming all permutations of letters within the eligible ranges, and then computing their edit distance from the target word, and then selecting them. So it's a variation of above, except instead of a dictionary, it uses permuted words.
distfn2 <- function(word, distance = 1)
result <- character()
# start with deletions
for (i in max((nchar(word) - distance), 0):(nchar(word) - 1))
result <- c(
result,
combn(unlist(strsplit(word, "", fixed = TRUE)), i,
paste,
collapse = "", simplify = TRUE
)
)
# now for changes and insertions
for (i in (nchar(word)):(nchar(word) + distance))
# all possible edits
edits <- apply(expand.grid(rep(list(letters), i)),
1, paste0,
collapse = ""
)
# remove original word
edits <- edits[edits != word]
# get all distances, add to result
distances <- utils::adist(word, edits)[1, ]
result <- c(result, edits[distances == distance])
result
For the OP example:
distfn2("cat", 1)
## [1] "ca" "ct" "at" "caa" "cab" "cac" "cad" "cae" "caf" "cag"
## [11] "cah" "cai" "caj" "cak" "cal" "cam" "can" "cao" "cap" "caq"
## [21] "car" "cas" "aat" "bat" "dat" "eat" "fat" "gat" "hat" "iat"
## [31] "jat" "kat" "lat" "mat" "nat" "oat" "pat" "qat" "rat" "sat"
## [41] "tat" "uat" "vat" "wat" "xat" "yat" "zat" "cbt" "cct" "cdt"
## [51] "cet" "cft" "cgt" "cht" "cit" "cjt" "ckt" "clt" "cmt" "cnt"
## [61] "cot" "cpt" "cqt" "crt" "cst" "ctt" "cut" "cvt" "cwt" "cxt"
## [71] "cyt" "czt" "cau" "cav" "caw" "cax" "cay" "caz" "cata" "catb"
## [81] "catc" "catd" "cate" "catf" "catg" "cath" "cati" "catj" "catk" "catl"
## [91] "catm" "catn" "cato" "catp" "catq" "catr" "cats" "caat" "cbat" "acat"
## [101] "bcat" "ccat" "dcat" "ecat" "fcat" "gcat" "hcat" "icat" "jcat" "kcat"
## [111] "lcat" "mcat" "ncat" "ocat" "pcat" "qcat" "rcat" "scat" "tcat" "ucat"
## [121] "vcat" "wcat" "xcat" "ycat" "zcat" "cdat" "ceat" "cfat" "cgat" "chat"
## [131] "ciat" "cjat" "ckat" "clat" "cmat" "cnat" "coat" "cpat" "cqat" "crat"
## [141] "csat" "ctat" "cuat" "cvat" "cwat" "cxat" "cyat" "czat" "cabt" "cact"
## [151] "cadt" "caet" "caft" "cagt" "caht" "cait" "cajt" "cakt" "calt" "camt"
## [161] "cant" "caot" "capt" "caqt" "cart" "cast" "catt" "caut" "cavt" "cawt"
## [171] "caxt" "cayt" "cazt" "catu" "catv" "catw" "catx" "caty" "catz"
Also works with other edit distances, although it becomes very slow for longer words.
d2 <- distfn2("cat", 2)
set.seed(100)
c(head(d2, 50), sample(d2, 50), tail(d2, 50))
## [1] "c" "a" "t" "ca" "ct" "at" "aaa" "baa"
## [9] "daa" "eaa" "faa" "gaa" "haa" "iaa" "jaa" "kaa"
## [17] "laa" "maa" "naa" "oaa" "paa" "qaa" "raa" "saa"
## [25] "taa" "uaa" "vaa" "waa" "xaa" "yaa" "zaa" "cba"
## [33] "aca" "bca" "cca" "dca" "eca" "fca" "gca" "hca"
## [41] "ica" "jca" "kca" "lca" "mca" "nca" "oca" "pca"
## [49] "qca" "rca" "cnts" "cian" "pcatb" "cqo" "uawt" "hazt"
## [57] "cpxat" "aaet" "ckata" "caod" "ncatl" "qcamt" "cdtp" "qajt"
## [65] "bckat" "qcatr" "cqah" "rcbt" "cvbt" "bbcat" "vcaz" "ylcat"
## [73] "cahz" "jcgat" "mant" "jatd" "czlat" "cbamt" "cajta" "cafp"
## [81] "cizt" "cmaut" "qwat" "jcazt" "hdcat" "ucant" "hate" "cajtl"
## [89] "caaty" "cix" "nmat" "cajit" "cmnat" "caobt" "catoi" "ncau"
## [97] "ucoat" "ncamt" "jath" "oats" "chatz" "ciatz" "cjatz" "ckatz"
## [105] "clatz" "cmatz" "cnatz" "coatz" "cpatz" "cqatz" "cratz" "csatz"
## [113] "ctatz" "cuatz" "cvatz" "cwatz" "cxatz" "cyatz" "czatz" "cabtz"
## [121] "cactz" "cadtz" "caetz" "caftz" "cagtz" "cahtz" "caitz" "cajtz"
## [129] "caktz" "caltz" "camtz" "cantz" "caotz" "captz" "caqtz" "cartz"
## [137] "castz" "cattz" "cautz" "cavtz" "cawtz" "caxtz" "caytz" "caztz"
## [145] "catuz" "catvz" "catwz" "catxz" "catyz" "catzz"
This could be speeded up by less brute force formation of all permutations and then applying adist()
to them - it could consist of changes or insertions of known edit distances generated algorithmically from letters
.
Thank you very much for your effort! But I am searching for all possible combinations (permutations) ...
– De De
Mar 27 at 14:58
OK I added that.
– Ken Benoit
Mar 28 at 16:46
Many thanks for your support!
– De De
Apr 4 at 16:47
add a comment |
I'm going to assume that you want all actual words, not just permutations of the characters with an edit distance of 1 that would include non-words such as "zat".
We can do this using adist()
to compute the edit distance between your target word and all eligible English words, taken from some word list. Here, I used the English syllable dictionary from the quanteda package (you did tag this question as quanteda
after all) but this could have been any vector of English dictionary words from any other source as well.
To narrow things down, we first exclude all words that are different in length from the target word by your distance value.
distfn <- function(word, distance = 1)
# select eligible words for efficiency
eligible_y_words <- names(quanteda::data_int_syllables)
wordlengths <- nchar(eligible_y_words)
eligible_y_words <- eligible_y_words[wordlengths >= (nchar(word) - distance) &
wordlengths <= (nchar(word) + distance)]
# compute Levenshtein distance
distances <- utils::adist(word, eligible_y_words)[1, ]
# return only those for the requested distance value
eligible_y_words[distances == distance]
distfn("cat", 1)
## [1] "at" "bat" "ca" "cab" "cac" "cad" "cai" "cal" "cam" "can"
## [11] "cant" "cao" "cap" "caq" "car" "cart" "cas" "cast" "cate" "cato"
## [21] "cats" "catt" "cau" "caw" "cay" "chat" "coat" "cot" "ct" "cut"
## [31] "dat" "eat" "fat" "gat" "hat" "kat" "lat" "mat" "nat" "oat"
## [41] "pat" "rat" "sat" "scat" "tat" "vat" "wat"
To demonstrate how this works on longer words, with alternative distance values.
distfn("coffee", 1)
## [1] "caffee" "coffeen" "coffees" "coffel" "coffer" "coffey" "cuffee"
## [8] "toffee"
distfn("coffee", 2)
## [1] "caffey" "calfee" "chafee" "chaffee" "cofer" "coffee's"
## [7] "coffelt" "coffers" "coffin" "cofide" "cohee" "coiffe"
## [13] "coiffed" "colee" "colfer" "combee" "comfed" "confer"
## [19] "conlee" "coppee" "cottee" "coulee" "coutee" "cuffe"
## [25] "cuffed" "diffee" "duffee" "hoffer" "jaffee" "joffe"
## [31] "mcaffee" "moffet" "noffke" "offen" "offer" "roffe"
## [37] "scoffed" "soffel" "soffer" "yoffie"
(Yes, according to the CMU pronunciation dictionary, those are all actual words...)
EDIT: Make for all permutations of letters, not just actual words
This involves permutations from the alphabet that have the fixed edit distances from the input word. Here I've done it not particular efficiently by forming all permutations of letters within the eligible ranges, and then computing their edit distance from the target word, and then selecting them. So it's a variation of above, except instead of a dictionary, it uses permuted words.
distfn2 <- function(word, distance = 1)
result <- character()
# start with deletions
for (i in max((nchar(word) - distance), 0):(nchar(word) - 1))
result <- c(
result,
combn(unlist(strsplit(word, "", fixed = TRUE)), i,
paste,
collapse = "", simplify = TRUE
)
)
# now for changes and insertions
for (i in (nchar(word)):(nchar(word) + distance))
# all possible edits
edits <- apply(expand.grid(rep(list(letters), i)),
1, paste0,
collapse = ""
)
# remove original word
edits <- edits[edits != word]
# get all distances, add to result
distances <- utils::adist(word, edits)[1, ]
result <- c(result, edits[distances == distance])
result
For the OP example:
distfn2("cat", 1)
## [1] "ca" "ct" "at" "caa" "cab" "cac" "cad" "cae" "caf" "cag"
## [11] "cah" "cai" "caj" "cak" "cal" "cam" "can" "cao" "cap" "caq"
## [21] "car" "cas" "aat" "bat" "dat" "eat" "fat" "gat" "hat" "iat"
## [31] "jat" "kat" "lat" "mat" "nat" "oat" "pat" "qat" "rat" "sat"
## [41] "tat" "uat" "vat" "wat" "xat" "yat" "zat" "cbt" "cct" "cdt"
## [51] "cet" "cft" "cgt" "cht" "cit" "cjt" "ckt" "clt" "cmt" "cnt"
## [61] "cot" "cpt" "cqt" "crt" "cst" "ctt" "cut" "cvt" "cwt" "cxt"
## [71] "cyt" "czt" "cau" "cav" "caw" "cax" "cay" "caz" "cata" "catb"
## [81] "catc" "catd" "cate" "catf" "catg" "cath" "cati" "catj" "catk" "catl"
## [91] "catm" "catn" "cato" "catp" "catq" "catr" "cats" "caat" "cbat" "acat"
## [101] "bcat" "ccat" "dcat" "ecat" "fcat" "gcat" "hcat" "icat" "jcat" "kcat"
## [111] "lcat" "mcat" "ncat" "ocat" "pcat" "qcat" "rcat" "scat" "tcat" "ucat"
## [121] "vcat" "wcat" "xcat" "ycat" "zcat" "cdat" "ceat" "cfat" "cgat" "chat"
## [131] "ciat" "cjat" "ckat" "clat" "cmat" "cnat" "coat" "cpat" "cqat" "crat"
## [141] "csat" "ctat" "cuat" "cvat" "cwat" "cxat" "cyat" "czat" "cabt" "cact"
## [151] "cadt" "caet" "caft" "cagt" "caht" "cait" "cajt" "cakt" "calt" "camt"
## [161] "cant" "caot" "capt" "caqt" "cart" "cast" "catt" "caut" "cavt" "cawt"
## [171] "caxt" "cayt" "cazt" "catu" "catv" "catw" "catx" "caty" "catz"
Also works with other edit distances, although it becomes very slow for longer words.
d2 <- distfn2("cat", 2)
set.seed(100)
c(head(d2, 50), sample(d2, 50), tail(d2, 50))
## [1] "c" "a" "t" "ca" "ct" "at" "aaa" "baa"
## [9] "daa" "eaa" "faa" "gaa" "haa" "iaa" "jaa" "kaa"
## [17] "laa" "maa" "naa" "oaa" "paa" "qaa" "raa" "saa"
## [25] "taa" "uaa" "vaa" "waa" "xaa" "yaa" "zaa" "cba"
## [33] "aca" "bca" "cca" "dca" "eca" "fca" "gca" "hca"
## [41] "ica" "jca" "kca" "lca" "mca" "nca" "oca" "pca"
## [49] "qca" "rca" "cnts" "cian" "pcatb" "cqo" "uawt" "hazt"
## [57] "cpxat" "aaet" "ckata" "caod" "ncatl" "qcamt" "cdtp" "qajt"
## [65] "bckat" "qcatr" "cqah" "rcbt" "cvbt" "bbcat" "vcaz" "ylcat"
## [73] "cahz" "jcgat" "mant" "jatd" "czlat" "cbamt" "cajta" "cafp"
## [81] "cizt" "cmaut" "qwat" "jcazt" "hdcat" "ucant" "hate" "cajtl"
## [89] "caaty" "cix" "nmat" "cajit" "cmnat" "caobt" "catoi" "ncau"
## [97] "ucoat" "ncamt" "jath" "oats" "chatz" "ciatz" "cjatz" "ckatz"
## [105] "clatz" "cmatz" "cnatz" "coatz" "cpatz" "cqatz" "cratz" "csatz"
## [113] "ctatz" "cuatz" "cvatz" "cwatz" "cxatz" "cyatz" "czatz" "cabtz"
## [121] "cactz" "cadtz" "caetz" "caftz" "cagtz" "cahtz" "caitz" "cajtz"
## [129] "caktz" "caltz" "camtz" "cantz" "caotz" "captz" "caqtz" "cartz"
## [137] "castz" "cattz" "cautz" "cavtz" "cawtz" "caxtz" "caytz" "caztz"
## [145] "catuz" "catvz" "catwz" "catxz" "catyz" "catzz"
This could be speeded up by less brute force formation of all permutations and then applying adist()
to them - it could consist of changes or insertions of known edit distances generated algorithmically from letters
.
I'm going to assume that you want all actual words, not just permutations of the characters with an edit distance of 1 that would include non-words such as "zat".
We can do this using adist()
to compute the edit distance between your target word and all eligible English words, taken from some word list. Here, I used the English syllable dictionary from the quanteda package (you did tag this question as quanteda
after all) but this could have been any vector of English dictionary words from any other source as well.
To narrow things down, we first exclude all words that are different in length from the target word by your distance value.
distfn <- function(word, distance = 1)
# select eligible words for efficiency
eligible_y_words <- names(quanteda::data_int_syllables)
wordlengths <- nchar(eligible_y_words)
eligible_y_words <- eligible_y_words[wordlengths >= (nchar(word) - distance) &
wordlengths <= (nchar(word) + distance)]
# compute Levenshtein distance
distances <- utils::adist(word, eligible_y_words)[1, ]
# return only those for the requested distance value
eligible_y_words[distances == distance]
distfn("cat", 1)
## [1] "at" "bat" "ca" "cab" "cac" "cad" "cai" "cal" "cam" "can"
## [11] "cant" "cao" "cap" "caq" "car" "cart" "cas" "cast" "cate" "cato"
## [21] "cats" "catt" "cau" "caw" "cay" "chat" "coat" "cot" "ct" "cut"
## [31] "dat" "eat" "fat" "gat" "hat" "kat" "lat" "mat" "nat" "oat"
## [41] "pat" "rat" "sat" "scat" "tat" "vat" "wat"
To demonstrate how this works on longer words, with alternative distance values.
distfn("coffee", 1)
## [1] "caffee" "coffeen" "coffees" "coffel" "coffer" "coffey" "cuffee"
## [8] "toffee"
distfn("coffee", 2)
## [1] "caffey" "calfee" "chafee" "chaffee" "cofer" "coffee's"
## [7] "coffelt" "coffers" "coffin" "cofide" "cohee" "coiffe"
## [13] "coiffed" "colee" "colfer" "combee" "comfed" "confer"
## [19] "conlee" "coppee" "cottee" "coulee" "coutee" "cuffe"
## [25] "cuffed" "diffee" "duffee" "hoffer" "jaffee" "joffe"
## [31] "mcaffee" "moffet" "noffke" "offen" "offer" "roffe"
## [37] "scoffed" "soffel" "soffer" "yoffie"
(Yes, according to the CMU pronunciation dictionary, those are all actual words...)
EDIT: Make for all permutations of letters, not just actual words
This involves permutations from the alphabet that have the fixed edit distances from the input word. Here I've done it not particular efficiently by forming all permutations of letters within the eligible ranges, and then computing their edit distance from the target word, and then selecting them. So it's a variation of above, except instead of a dictionary, it uses permuted words.
distfn2 <- function(word, distance = 1)
result <- character()
# start with deletions
for (i in max((nchar(word) - distance), 0):(nchar(word) - 1))
result <- c(
result,
combn(unlist(strsplit(word, "", fixed = TRUE)), i,
paste,
collapse = "", simplify = TRUE
)
)
# now for changes and insertions
for (i in (nchar(word)):(nchar(word) + distance))
# all possible edits
edits <- apply(expand.grid(rep(list(letters), i)),
1, paste0,
collapse = ""
)
# remove original word
edits <- edits[edits != word]
# get all distances, add to result
distances <- utils::adist(word, edits)[1, ]
result <- c(result, edits[distances == distance])
result
For the OP example:
distfn2("cat", 1)
## [1] "ca" "ct" "at" "caa" "cab" "cac" "cad" "cae" "caf" "cag"
## [11] "cah" "cai" "caj" "cak" "cal" "cam" "can" "cao" "cap" "caq"
## [21] "car" "cas" "aat" "bat" "dat" "eat" "fat" "gat" "hat" "iat"
## [31] "jat" "kat" "lat" "mat" "nat" "oat" "pat" "qat" "rat" "sat"
## [41] "tat" "uat" "vat" "wat" "xat" "yat" "zat" "cbt" "cct" "cdt"
## [51] "cet" "cft" "cgt" "cht" "cit" "cjt" "ckt" "clt" "cmt" "cnt"
## [61] "cot" "cpt" "cqt" "crt" "cst" "ctt" "cut" "cvt" "cwt" "cxt"
## [71] "cyt" "czt" "cau" "cav" "caw" "cax" "cay" "caz" "cata" "catb"
## [81] "catc" "catd" "cate" "catf" "catg" "cath" "cati" "catj" "catk" "catl"
## [91] "catm" "catn" "cato" "catp" "catq" "catr" "cats" "caat" "cbat" "acat"
## [101] "bcat" "ccat" "dcat" "ecat" "fcat" "gcat" "hcat" "icat" "jcat" "kcat"
## [111] "lcat" "mcat" "ncat" "ocat" "pcat" "qcat" "rcat" "scat" "tcat" "ucat"
## [121] "vcat" "wcat" "xcat" "ycat" "zcat" "cdat" "ceat" "cfat" "cgat" "chat"
## [131] "ciat" "cjat" "ckat" "clat" "cmat" "cnat" "coat" "cpat" "cqat" "crat"
## [141] "csat" "ctat" "cuat" "cvat" "cwat" "cxat" "cyat" "czat" "cabt" "cact"
## [151] "cadt" "caet" "caft" "cagt" "caht" "cait" "cajt" "cakt" "calt" "camt"
## [161] "cant" "caot" "capt" "caqt" "cart" "cast" "catt" "caut" "cavt" "cawt"
## [171] "caxt" "cayt" "cazt" "catu" "catv" "catw" "catx" "caty" "catz"
Also works with other edit distances, although it becomes very slow for longer words.
d2 <- distfn2("cat", 2)
set.seed(100)
c(head(d2, 50), sample(d2, 50), tail(d2, 50))
## [1] "c" "a" "t" "ca" "ct" "at" "aaa" "baa"
## [9] "daa" "eaa" "faa" "gaa" "haa" "iaa" "jaa" "kaa"
## [17] "laa" "maa" "naa" "oaa" "paa" "qaa" "raa" "saa"
## [25] "taa" "uaa" "vaa" "waa" "xaa" "yaa" "zaa" "cba"
## [33] "aca" "bca" "cca" "dca" "eca" "fca" "gca" "hca"
## [41] "ica" "jca" "kca" "lca" "mca" "nca" "oca" "pca"
## [49] "qca" "rca" "cnts" "cian" "pcatb" "cqo" "uawt" "hazt"
## [57] "cpxat" "aaet" "ckata" "caod" "ncatl" "qcamt" "cdtp" "qajt"
## [65] "bckat" "qcatr" "cqah" "rcbt" "cvbt" "bbcat" "vcaz" "ylcat"
## [73] "cahz" "jcgat" "mant" "jatd" "czlat" "cbamt" "cajta" "cafp"
## [81] "cizt" "cmaut" "qwat" "jcazt" "hdcat" "ucant" "hate" "cajtl"
## [89] "caaty" "cix" "nmat" "cajit" "cmnat" "caobt" "catoi" "ncau"
## [97] "ucoat" "ncamt" "jath" "oats" "chatz" "ciatz" "cjatz" "ckatz"
## [105] "clatz" "cmatz" "cnatz" "coatz" "cpatz" "cqatz" "cratz" "csatz"
## [113] "ctatz" "cuatz" "cvatz" "cwatz" "cxatz" "cyatz" "czatz" "cabtz"
## [121] "cactz" "cadtz" "caetz" "caftz" "cagtz" "cahtz" "caitz" "cajtz"
## [129] "caktz" "caltz" "camtz" "cantz" "caotz" "captz" "caqtz" "cartz"
## [137] "castz" "cattz" "cautz" "cavtz" "cawtz" "caxtz" "caytz" "caztz"
## [145] "catuz" "catvz" "catwz" "catxz" "catyz" "catzz"
This could be speeded up by less brute force formation of all permutations and then applying adist()
to them - it could consist of changes or insertions of known edit distances generated algorithmically from letters
.
edited Mar 28 at 16:46
answered Mar 26 at 23:06
Ken BenoitKen Benoit
8,19317 silver badges36 bronze badges
8,19317 silver badges36 bronze badges
Thank you very much for your effort! But I am searching for all possible combinations (permutations) ...
– De De
Mar 27 at 14:58
OK I added that.
– Ken Benoit
Mar 28 at 16:46
Many thanks for your support!
– De De
Apr 4 at 16:47
add a comment |
Thank you very much for your effort! But I am searching for all possible combinations (permutations) ...
– De De
Mar 27 at 14:58
OK I added that.
– Ken Benoit
Mar 28 at 16:46
Many thanks for your support!
– De De
Apr 4 at 16:47
Thank you very much for your effort! But I am searching for all possible combinations (permutations) ...
– De De
Mar 27 at 14:58
Thank you very much for your effort! But I am searching for all possible combinations (permutations) ...
– De De
Mar 27 at 14:58
OK I added that.
– Ken Benoit
Mar 28 at 16:46
OK I added that.
– Ken Benoit
Mar 28 at 16:46
Many thanks for your support!
– De De
Apr 4 at 16:47
Many thanks for your support!
– De De
Apr 4 at 16:47
add a comment |
Got a question that you can’t ask on public Stack Overflow? Learn more about sharing private information with Stack Overflow for Teams.
Got a question that you can’t ask on public Stack Overflow? Learn more about sharing private information with Stack Overflow for Teams.
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55352498%2fhow-to-output-in-r-all-possible-deviations-of-a-word-for-a-fixed-distance-value%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
You might want to try all possible modifications at all possible places and collect result. Be aware, however, that for the distance greater than 1 some modifications may inadvertently cancel out. For example if you replace 'a' with 'o' and then insert 'a' you may get "cat"→"cot"→"coat" which appears just 1 away from "cat", even though 2 modifications were done.
– CiaPan
Mar 26 at 8:21
But how to do it automatically in R ???
– De De
Mar 26 at 12:18
The
adist
function would be a good place to start.– mickey
Mar 26 at 14:10
Thanks for the suggestion. Is it just a rough idea or do you have anything in mind ??
– De De
Mar 26 at 14:16