Overlapping matches in R


I searched and found this forum discussion on achieving the effect of overlapping matches.

I also found the following SO question about finding the indexes to perform this task, but I could not find anything concise about extracting overlapping matches in R.

I can perform this task in almost any language that supports PCRE by using a positive lookahead assertion with a capturing group inside it to capture the overlapping matches.

But when I do this in R the same way I would in other languages, using perl=T, I only get empty strings back.

> x <- 'ACCACCACCAC'
> regmatches(x, gregexpr('(?=([AC]C))', x, perl=T))[[1]]
[1] "" "" "" "" "" "" ""

The same happens with both the stringi and stringr packages.

> library(stringi)
> library(stringr)
> stri_extract_all_regex(x, '(?=([AC]C))')[[1]]
[1] "" "" "" "" "" "" ""
> str_extract_all(x, perl('(?=([AC]C))'))[[1]]
[1] "" "" "" "" "" "" ""

The results I expect are:

[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

Edit

I am well aware that regmatches does not work well with captured matches, but what exactly causes this behavior in regmatches, and why are no results returned? I am looking for a somewhat detailed answer.

Are the stringi and stringr packages not capable of doing this instead of regmatches?

Please feel free to add to my answer or come up with a different workaround from the one I found.

edited May 23 '17 at 11:51 by Community♦

asked Sep 12 '14 at 2:56 by hwnd

regex r string dna-sequence stringi


6 Answers

The standard regmatches does not work well with captured matches (specifically, multiple captured matches in the same string). In this case, since you're "matching" a lookahead (ignoring the capture), the match itself is zero-length. The replacement function regmatches()<- may illustrate this. Observe:

x <- 'ACCACCACCAC'
m <- gregexpr('(?=([AC]C))', x, perl=T)
regmatches(x, m) <- "~"
x
# [1] "~A~CC~A~CC~A~CC~AC"

Notice how all the letters are preserved; we've just replaced the locations of the zero-length matches with something we can observe.
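A quick way to see where the empty strings come from is to inspect the attributes that gregexpr() attaches when perl = TRUE. The sketch below only reads those attributes: "match.length" describes the whole (zero-width) match, while "capture.start" and "capture.length" describe capture group 1 inside the lookahead.

```r
x <- 'ACCACCACCAC'
m <- gregexpr('(?=([AC]C))', x, perl = TRUE)[[1]]

# Start position of each zero-width match (attributes dropped)
match_start <- as.integer(m)
# Length of the whole match: 0 everywhere, since a lookahead consumes nothing
match_len <- attr(m, "match.length")
# Start and length of capture group 1, stored as one-column matrices
cap_start <- attr(m, "capture.start")[, 1]
cap_len <- attr(m, "capture.length")[, 1]

print(match_start)
print(match_len)
print(cap_start)
print(cap_len)
```

Here all seven match lengths are 0 while every capture length is 2, which is why regmatches(), reading only "match.length", hands back empty strings.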

I've created a regcapturedmatches() function that I often use for such tasks. For example:

x <- 'ACCACCACCAC'
regcapturedmatches(x, gregexpr('(?=([AC]C))', x, perl=T))[[1]]

# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

The gregexpr result holds all the data just fine, so you can extract it from that object any way you like if you prefer not to use this helper function.
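The source of regcapturedmatches() is not shown here, so the following is only a minimal sketch of a helper in the same spirit, built from the "capture.start" and "capture.length" attributes. The name captured_matches() is hypothetical, and unlike the output above it returns one row per match and one column per capture group.

```r
# Hypothetical helper, not the regcapturedmatches() used above: for each
# string, return a character matrix with one row per match and one column
# per capture group.
captured_matches <- function(x, m) {
  lapply(seq_along(x), function(k) {
    mi <- m[[k]]
    # gregexpr() reports -1 when the string has no match at all
    if (mi[1] == -1L) return(matrix(character(0), nrow = 0, ncol = 0))
    st <- attr(mi, "capture.start")
    len <- attr(mi, "capture.length")
    # substring() is vectorised over start/end; the matrices are read
    # column by column, so refilling by column restores the layout
    out <- substring(x[k], st, st + len - 1L)
    matrix(out, nrow = nrow(st))
  })
}

x <- 'ACCACCACCAC'
res <- captured_matches(x, gregexpr('(?=([AC]C))', x, perl = TRUE))[[1]]
print(res)
```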

edited Sep 12 '14 at 3:50

answered Sep 12 '14 at 3:37 by MrFlick

+1 Interesting function you've created. I am well aware of zero-width matches, so basically regmatches and the other packages such as stringi and stringr are not meant to handle this?

– hwnd
Sep 12 '14 at 3:45

I can't speak to stringr as I've never used it myself, but regmatches really focuses on the match rather than the capture (which are highly related but slightly different). I've added an additional sample to try to make it clear what regmatches() is capturing compared to my function.

– MrFlick
Sep 12 '14 at 3:50

Yeah, I've used regmatches()<- like that beforehand to observe the effect of the zero-width matches.

– hwnd
Sep 12 '14 at 3:53


As for a workaround, this is what I came up with to extract the overlapping matches.

> x <- 'ACCACCACCAC'
> m <- gregexpr('(?=([AC]C))', x, perl=T)
> mapply(function(X) substr(x, X, X+1), m[[1]])
[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

Please feel free to add or comment on a better way to perform this task.
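As a side note, the mapply() call is probably not needed: substring() is already vectorised over its start and end positions. A sketch of the same fixed-width extraction in one call; it shares the limitation that every match is assumed to be exactly two characters long.

```r
x <- 'ACCACCACCAC'
m <- gregexpr('(?=([AC]C))', x, perl = TRUE)[[1]]
# One call extracts characters m..(m + 1) for every match start in m
pairs <- substring(x, m, m + 1)
print(pairs)
```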

edited Sep 12 '14 at 3:54

answered Sep 12 '14 at 2:56 by hwnd

The problem with this solution is that it only works when the captured region is always 2 characters long. A more general solution is this:

– Ken Williams
Aug 10 '15 at 14:46

Oops. I forgot I can't put code blocks in comments. Will make this a separate answer.

– Ken Williams
Aug 10 '15 at 14:48


Another roundabout way of extracting the same information, which I've used in the past, is to replace the "match.length" attribute with the "capture.length" attribute:

x <- c("ACCACCACCAC","ACCACCACCAC")
m <- gregexpr('(?=([AC]C))', x, perl=TRUE)
m <- lapply(m, function(i) {
  attr(i, "match.length") <- attr(i, "capture.length")
  i
})
regmatches(x,m)

#[[1]]
#[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"
#
#[[2]]
#[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"
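This trick relies on each capture starting at the same position as its match, which holds here because the pattern is nothing but a lookahead. With a pattern such as 'A(?=(CC))' (chosen only for illustration), the match is the "A" and the group is the "CC" after it, so swapping only "match.length" would return "AC" three times per string. A minimal sketch of a variant that also swaps in the capture start positions, assuming only the first capture group is wanted:

```r
x <- c("ACCACCACCAC", "ACCACCACCAC")
# The match is each "A"; capture group 1 is the "CC" that follows it
m <- gregexpr('A(?=(CC))', x, perl = TRUE)
m <- lapply(m, function(i) {
  # i[] <- ... replaces the values but keeps every attribute of i
  i[] <- attr(i, "capture.start")[, 1]
  attr(i, "match.length") <- attr(i, "capture.length")[, 1]
  i
})
res <- regmatches(x, m)
print(res)
```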

edited Sep 12 '14 at 5:46

answered Sep 12 '14 at 5:10 by thelatemail

+1 Thanks for the additional solution. I've done similar using capture.start and capture.length.

– hwnd
Sep 12 '14 at 5:28


It's not a regex solution, and doesn't really answer any of your more important questions, but you could also get your desired result by taking every two-character substring and then removing the unwanted "CA" elements.

x <- 'ACCACCACCAC'
y <- substring(x, 1:(nchar(x)-1), 2:nchar(x))
y[y != "CA"]
# [1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"
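The same sliding-window idea can keep the windows that match the pattern instead of dropping "CA" by name. This is only a sketch for fixed-width patterns; the window width k is an assumption and has to equal the length of every possible match.

```r
x <- 'ACCACCACCAC'
k <- 2  # window width; must equal the (fixed) match length
# Every substring of length k, one starting at each position
windows <- substring(x, 1:(nchar(x) - k + 1), k:nchar(x))
# Keep only the windows that match the pattern in full
hits <- windows[grepl('^[AC]C$', windows)]
print(hits)
```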

edited Aug 10 '15 at 15:52

answered Sep 13 '14 at 2:54 by Rich Scriven


A stringi solution using a capture group in the look-ahead part:

> stri_match_all_regex('ACCACCACCAC', '(?=([AC]C))')[[1]][,2]
## [1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"
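On the question of whether stringr can do this: stringr is built on stringi, and its str_match_all() returns the same kind of matrix (whole match in the first column, capture groups after it), so the same column selection should work. A sketch, assuming a reasonably recent stringr is installed:

```r
library(stringr)

x <- 'ACCACCACCAC'
# Column 1 holds the (empty) whole matches, column 2 holds capture group 1
mat <- str_match_all(x, '(?=([AC]C))')[[1]]
overlaps <- mat[, 2]
print(overlaps)
```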

edited Mar 24 at 20:16

answered Oct 26 '14 at 19:55 by gagolews

Weird, how come it failed to work with stri_extract_all_regex?

– hwnd
Oct 26 '14 at 20:00

@hwnd: it's a 0-length match; (?=...) does not advance the input position.

– gagolews
Oct 26 '14 at 20:02

Yes I know it's a zero-width match =) I guess there is a difference between extract_all_regex and match_all_regex

– hwnd
Oct 26 '14 at 20:04

No, the 1st column of the resulting matrix (the whole match) consists only of empty strings :)

– gagolews
Oct 26 '14 at 20:05

Ok now I see and understand what you mean.

– hwnd
Oct 26 '14 at 20:06


An additional answer, based on @hwnd's own answer (the original didn't allow variable-length captured regions), using just built-in R functions:

> x <- 'ACCACCACCAC'
> m <- gregexpr('(?=([AC]C))', x, perl=T)[[1]]
> start <- attr(m,"capture.start")
> end <- attr(m,"capture.start") + attr(m,"capture.length") - 1
> sapply(seq_along(m), function(i) substr(x, start[i], end[i]))
[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

Pretty ugly, which is why the stringr etc. packages exist.
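To see why the capture lengths matter, here is a small sketch with a pattern whose captures vary in length. The pattern '(?=(C+))' is chosen only for illustration: the fixed two-character extraction from the earlier workaround returns "CA" wherever the capture is a single "C", while the capture-based extraction does not.

```r
x <- 'ACCACCACCAC'
# Illustrative pattern: runs of C, so captures are 1 or 2 characters long
m <- gregexpr('(?=(C+))', x, perl = TRUE)[[1]]
start <- attr(m, "capture.start")[, 1]
len <- attr(m, "capture.length")[, 1]

# Fixed width: assumes every capture is exactly 2 characters
fixed <- substring(x, m, m + 1)
# Capture-based: uses each capture's own start and length
captured <- substring(x, start, start + len - 1)

print(fixed)
print(captured)
```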

answered Aug 10 '15 at 14:51 by Ken Williams

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f25800042%2foverlapping-matches-in-r%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

6 Answers
6

active

oldest

votes

6 Answers
6

active

oldest

votes

x <- 'ACCACCACCAC'
m <- gregexpr('(?=([AC]C))', x, perl=T)
regmatches(x, m) <- "~"
x
# [1] "~A~CC~A~CC~A~CC~AC"

Notice how all the letters are preserved, we've just replaced the locations of the zero-length matches with something we can observe.

I've created a regcapturedmatches() function that I often use for such tasks. For example

x <- 'ACCACCACCAC'
regcapturedmatches(x, gregexpr('(?=([AC]C))', x, perl=T))[[1]]

# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

The gregexpr is grabbing all the data just fine so you can extract it from that object anyway you life if you prefer not to use this helper function.

edited Sep 12 '14 at 3:50

answered Sep 12 '14 at 3:37

MrFlick

132k12 gold badges159 silver badges193 bronze badges

+1 Interesting function you've created. I am well aware of zero-width matches, so basically regmatches and the other packages such as stringi, r are not meant to handle this?

– hwnd
Sep 12 '14 at 3:45

I can't speak to stringr as I've never used that myself, but regmatches really focuses on the match rather than the capture (which are highly related by slightly different). I've added an additional sample to try to make it clear what the regmatches() is capturing compared to my function.`

– MrFlick
Sep 12 '14 at 3:50

Yea I've used regmatches()<- like that before hand to observe the effect of the zero-width matches.

– hwnd
Sep 12 '14 at 3:53

add a comment |

x <- 'ACCACCACCAC'
m <- gregexpr('(?=([AC]C))', x, perl=T)
regmatches(x, m) <- "~"
x
# [1] "~A~CC~A~CC~A~CC~AC"

Notice how all the letters are preserved, we've just replaced the locations of the zero-length matches with something we can observe.

I've created a regcapturedmatches() function that I often use for such tasks. For example

x <- 'ACCACCACCAC'
regcapturedmatches(x, gregexpr('(?=([AC]C))', x, perl=T))[[1]]

# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

The gregexpr is grabbing all the data just fine so you can extract it from that object anyway you life if you prefer not to use this helper function.

edited Sep 12 '14 at 3:50

answered Sep 12 '14 at 3:37

MrFlick

132k12 gold badges159 silver badges193 bronze badges

+1 Interesting function you've created. I am well aware of zero-width matches, so basically regmatches and the other packages such as stringi, r are not meant to handle this?

– hwnd
Sep 12 '14 at 3:45

I can't speak to stringr as I've never used that myself, but regmatches really focuses on the match rather than the capture (which are highly related by slightly different). I've added an additional sample to try to make it clear what the regmatches() is capturing compared to my function.`

– MrFlick
Sep 12 '14 at 3:50

Yea I've used regmatches()<- like that before hand to observe the effect of the zero-width matches.

– hwnd
Sep 12 '14 at 3:53

add a comment |

x <- 'ACCACCACCAC'
m <- gregexpr('(?=([AC]C))', x, perl=T)
regmatches(x, m) <- "~"
x
# [1] "~A~CC~A~CC~A~CC~AC"

Notice how all the letters are preserved, we've just replaced the locations of the zero-length matches with something we can observe.

I've created a regcapturedmatches() function that I often use for such tasks. For example

x <- 'ACCACCACCAC'
regcapturedmatches(x, gregexpr('(?=([AC]C))', x, perl=T))[[1]]

# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

The gregexpr is grabbing all the data just fine so you can extract it from that object anyway you life if you prefer not to use this helper function.

edited Sep 12 '14 at 3:50

answered Sep 12 '14 at 3:37

MrFlick

132k12 gold badges159 silver badges193 bronze badges

x <- 'ACCACCACCAC'
m <- gregexpr('(?=([AC]C))', x, perl=T)
regmatches(x, m) <- "~"
x
# [1] "~A~CC~A~CC~A~CC~AC"

Notice how all the letters are preserved, we've just replaced the locations of the zero-length matches with something we can observe.

I've created a regcapturedmatches() function that I often use for such tasks. For example

x <- 'ACCACCACCAC'
regcapturedmatches(x, gregexpr('(?=([AC]C))', x, perl=T))[[1]]

# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

The gregexpr is grabbing all the data just fine so you can extract it from that object anyway you life if you prefer not to use this helper function.

edited Sep 12 '14 at 3:50

answered Sep 12 '14 at 3:37

MrFlick

132k12 gold badges159 silver badges193 bronze badges

edited Sep 12 '14 at 3:50

answered Sep 12 '14 at 3:37

MrFlick

132k12 gold badges159 silver badges193 bronze badges

answered Sep 12 '14 at 3:37

MrFlick

132k12 gold badges159 silver badges193 bronze badges

answered Sep 12 '14 at 3:37

MrFlick

132k12 gold badges159 silver badges193 bronze badges

+1 Interesting function you've created. I am well aware of zero-width matches, so basically regmatches and the other packages such as stringi, r are not meant to handle this?

– hwnd
Sep 12 '14 at 3:45

I can't speak to stringr as I've never used that myself, but regmatches really focuses on the match rather than the capture (which are highly related by slightly different). I've added an additional sample to try to make it clear what the regmatches() is capturing compared to my function.`

– MrFlick
Sep 12 '14 at 3:50

Yea I've used regmatches()<- like that before hand to observe the effect of the zero-width matches.

– hwnd
Sep 12 '14 at 3:53

add a comment |

+1 Interesting function you've created. I am well aware of zero-width matches, so basically regmatches and the other packages such as stringi, r are not meant to handle this?

– hwnd
Sep 12 '14 at 3:45

I can't speak to stringr as I've never used that myself, but regmatches really focuses on the match rather than the capture (which are highly related by slightly different). I've added an additional sample to try to make it clear what the regmatches() is capturing compared to my function.`

– MrFlick
Sep 12 '14 at 3:50

Yea I've used regmatches()<- like that before hand to observe the effect of the zero-width matches.

– hwnd
Sep 12 '14 at 3:53

+1 Interesting function you've created. I am well aware of zero-width matches, so basically regmatches and the other packages such as stringi, r are not meant to handle this?

– hwnd
Sep 12 '14 at 3:45

I can't speak to stringr as I've never used that myself, but regmatches really focuses on the match rather than the capture (which are highly related by slightly different). I've added an additional sample to try to make it clear what the regmatches() is capturing compared to my function.`

– MrFlick
Sep 12 '14 at 3:50

Yea I've used regmatches()<- like that before hand to observe the effect of the zero-width matches.

– hwnd
Sep 12 '14 at 3:53

add a comment |

As far as a workaround, this is what I have come up with to extract the overlapping matches.

> x <- 'ACCACCACCAC'
> m <- gregexpr('(?=([AC]C))', x, perl=T)
> mapply(function(X) substr(x, X, X+1), m[[1]])
[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

Please feel free to add or comment on a better way to perform this task.

edited Sep 12 '14 at 3:54

answered Sep 12 '14 at 2:56

hwnd

61.3k4 gold badges59 silver badges102 bronze badges

The problem with this solution is that it only works when the captured region is always 2 characters long. A more general solution is this:

– Ken Williams
Aug 10 '15 at 14:46

Oops. I forgot I can't put code blocks in comments. Will make this a separate answer.

– Ken Williams
Aug 10 '15 at 14:48

add a comment |

As far as a workaround, this is what I have come up with to extract the overlapping matches.

> x <- 'ACCACCACCAC'
> m <- gregexpr('(?=([AC]C))', x, perl=T)
> mapply(function(X) substr(x, X, X+1), m[[1]])
[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

Please feel free to add or comment on a better way to perform this task.

edited Sep 12 '14 at 3:54

answered Sep 12 '14 at 2:56

hwnd

61.3k4 gold badges59 silver badges102 bronze badges

The problem with this solution is that it only works when the captured region is always 2 characters long. A more general solution is this:

– Ken Williams
Aug 10 '15 at 14:46

Oops. I forgot I can't put code blocks in comments. Will make this a separate answer.

– Ken Williams
Aug 10 '15 at 14:48

add a comment |

As far as a workaround, this is what I have come up with to extract the overlapping matches.

> x <- 'ACCACCACCAC'
> m <- gregexpr('(?=([AC]C))', x, perl=T)
> mapply(function(X) substr(x, X, X+1), m[[1]])
[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

Please feel free to add or comment on a better way to perform this task.

edited Sep 12 '14 at 3:54

answered Sep 12 '14 at 2:56

hwnd

61.3k4 gold badges59 silver badges102 bronze badges

As far as a workaround, this is what I have come up with to extract the overlapping matches.

> x <- 'ACCACCACCAC'
> m <- gregexpr('(?=([AC]C))', x, perl=T)
> mapply(function(X) substr(x, X, X+1), m[[1]])
[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

Please feel free to add or comment on a better way to perform this task.

edited Sep 12 '14 at 3:54

answered Sep 12 '14 at 2:56

hwnd

61.3k4 gold badges59 silver badges102 bronze badges

edited Sep 12 '14 at 3:54

answered Sep 12 '14 at 2:56

hwnd

61.3k4 gold badges59 silver badges102 bronze badges

answered Sep 12 '14 at 2:56

hwnd

61.3k4 gold badges59 silver badges102 bronze badges

answered Sep 12 '14 at 2:56

hwnd

61.3k4 gold badges59 silver badges102 bronze badges

The problem with this solution is that it only works when the captured region is always 2 characters long. A more general solution is this:

– Ken Williams
Aug 10 '15 at 14:46

Oops. I forgot I can't put code blocks in comments. Will make this a separate answer.

– Ken Williams
Aug 10 '15 at 14:48

add a comment |

The problem with this solution is that it only works when the captured region is always 2 characters long. A more general solution is this:

– Ken Williams
Aug 10 '15 at 14:46

Oops. I forgot I can't put code blocks in comments. Will make this a separate answer.

– Ken Williams
Aug 10 '15 at 14:48

The problem with this solution is that it only works when the captured region is always 2 characters long. A more general solution is this:

– Ken Williams
Aug 10 '15 at 14:46

Oops. I forgot I can't put code blocks in comments. Will make this a separate answer.

– Ken Williams
Aug 10 '15 at 14:48

add a comment |

Another roundabout way of extracting the same information that I've done in the past is to replace the "match.length" with the "capture.length":

x <- c("ACCACCACCAC","ACCACCACCAC")
m <- gregexpr('(?=([AC]C))', x, perl=TRUE)
m <- lapply(m, function(i) 
 attr(i,"match.length") <- attr(i,"capture.length")
 i
 )
regmatches(x,m)

#[[1]]
#[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"
#
#[[2]]
#[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

edited Sep 12 '14 at 5:46

answered Sep 12 '14 at 5:10

thelatemail

71.4k10 gold badges91 silver badges158 bronze badges

+1 Thanks for the additional solution. I've done similar using capture.start and capture.length.

– hwnd
Sep 12 '14 at 5:28

add a comment |

Another roundabout way of extracting the same information that I've done in the past is to replace the "match.length" with the "capture.length":

x <- c("ACCACCACCAC","ACCACCACCAC")
m <- gregexpr('(?=([AC]C))', x, perl=TRUE)
m <- lapply(m, function(i) 
 attr(i,"match.length") <- attr(i,"capture.length")
 i
 )
regmatches(x,m)

#[[1]]
#[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"
#
#[[2]]
#[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

edited Sep 12 '14 at 5:46

answered Sep 12 '14 at 5:10

thelatemail

71.4k10 gold badges91 silver badges158 bronze badges

+1 Thanks for the additional solution. I've done similar using capture.start and capture.length.

– hwnd
Sep 12 '14 at 5:28

add a comment |

Another roundabout way of extracting the same information that I've done in the past is to replace the "match.length" with the "capture.length":

x <- c("ACCACCACCAC","ACCACCACCAC")
m <- gregexpr('(?=([AC]C))', x, perl=TRUE)
m <- lapply(m, function(i) 
 attr(i,"match.length") <- attr(i,"capture.length")
 i
 )
regmatches(x,m)

#[[1]]
#[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"
#
#[[2]]
#[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

edited Sep 12 '14 at 5:46

answered Sep 12 '14 at 5:10

thelatemail

71.4k10 gold badges91 silver badges158 bronze badges

Another roundabout way of extracting the same information that I've done in the past is to replace the "match.length" with the "capture.length":

x <- c("ACCACCACCAC","ACCACCACCAC")
m <- gregexpr('(?=([AC]C))', x, perl=TRUE)
m <- lapply(m, function(i) 
 attr(i,"match.length") <- attr(i,"capture.length")
 i
 )
regmatches(x,m)

#[[1]]
#[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"
#
#[[2]]
#[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

edited Sep 12 '14 at 5:46

answered Sep 12 '14 at 5:10

thelatemail

71.4k10 gold badges91 silver badges158 bronze badges

edited Sep 12 '14 at 5:46

answered Sep 12 '14 at 5:10

thelatemail

71.4k10 gold badges91 silver badges158 bronze badges

answered Sep 12 '14 at 5:10

thelatemail

71.4k10 gold badges91 silver badges158 bronze badges

answered Sep 12 '14 at 5:10

thelatemail

71.4k10 gold badges91 silver badges158 bronze badges

+1 Thanks for the additional solution. I've done similar using capture.start and capture.length.

– hwnd
Sep 12 '14 at 5:28

add a comment |

+1 Thanks for the additional solution. I've done similar using capture.start and capture.length.

– hwnd
Sep 12 '14 at 5:28

+1 Thanks for the additional solution. I've done similar using capture.start and capture.length.

– hwnd
Sep 12 '14 at 5:28

add a comment |

x <- 'ACCACCACCAC'
y <- substring(x, 1:(nchar(x)-1), 2:nchar(x))
y[y != "CA"]
# [1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

edited Aug 10 '15 at 15:52

answered Sep 13 '14 at 2:54

Rich Scriven

79.6k8 gold badges117 silver badges186 bronze badges

A stringi solution using a capture group inside the look-ahead (column 2 of the result holds the first capture group):

> library(stringi)
> stri_match_all_regex('ACCACCACCAC', '(?=([AC]C))')[[1]][,2]
## [1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

edited Mar 24 at 20:16

answered Oct 26 '14 at 19:55

gagolews

10.7k, 2 gold badges, 36 silver badges, 66 bronze badges

Weird, how come it failed to work with stri_extract_all_regex?

– hwnd
Oct 26 '14 at 20:00

@hwnd: it's a 0-length match; (?=...) does not advance the input position.

– gagolews
Oct 26 '14 at 20:02

Yes, I know it's a zero-width match =) I guess there is a difference between stri_extract_all_regex and stri_match_all_regex.

– hwnd
Oct 26 '14 at 20:04

No, the 1st column of the resulting matrix (the whole match) consists only of empty strings :)

– gagolews
Oct 26 '14 at 20:05

Ok now I see and understand what you mean.

– hwnd
Oct 26 '14 at 20:06
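The point made in these comments, that a look-ahead is a zero-length match, can be checked with base R alone. This sketch uses `gregexpr` with `perl = TRUE`, as the other answers do; because `[AC]C` always has width 2, the start positions are enough to rebuild the overlapping matches:

```r
x <- 'ACCACCACCAC'
m <- gregexpr('(?=[AC]C)', x, perl = TRUE)[[1]]

starts <- as.integer(m)             # where each zero-width match begins
lens <- attr(m, "match.length")     # every length is 0: the look-ahead consumes nothing
all(lens == 0)

# The pattern has a fixed width of 2, so each match is the 2 characters from its start
overlapping <- substring(x, starts, starts + 1)
overlapping
```

This is why extracting the "whole match" only yields empty strings, and why the answers above read the text from a capture group (or, for fixed-width patterns, from the start positions).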

An additional answer, based on @hwnd's own answer (the original didn't allow variable-length captured regions), using only base R functions:

> x <- 'ACCACCACCAC'
> m <- gregexpr('(?=([AC]C))', x, perl=TRUE)[[1]]   # zero-width matches; group 1 holds the text
> start <- attr(m, "capture.start")
> end <- start + attr(m, "capture.length") - 1
> sapply(seq_along(m), function(i) substr(x, start[i], end[i]))
[1] "AC" "CC" "AC" "CC" "AC" "CC" "AC"

Pretty ugly, which is why packages such as stringr exist.
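If this comes up often, the base-R approach can be wrapped in a small function. This is a sketch: `overlapping_matches` is a hypothetical name, and it assumes `pattern` is a PCRE regex that can safely be wrapped in `(?=(...))`, so that the outer group is always capture group 1. It returns start positions alongside the matched text and handles the no-match case, where `gregexpr` returns -1:

```r
# Hypothetical helper (not base R): overlapping, possibly variable-length
# matches of a PCRE `pattern` in a single string `x`.
overlapping_matches <- function(x, pattern) {
  # The look-ahead keeps each match zero-width, so the engine retries at
  # every position; the outer parentheses make the text capture group 1.
  m <- gregexpr(paste0('(?=(', pattern, '))'), x, perl = TRUE)[[1]]
  if (m[1] == -1) {
    return(data.frame(start = integer(0), match = character(0),
                      stringsAsFactors = FALSE))
  }
  start <- as.vector(attr(m, "capture.start")[, 1])
  len <- as.vector(attr(m, "capture.length")[, 1])
  data.frame(start = start,
             match = substring(x, start, start + len - 1),
             stringsAsFactors = FALSE)
}

res <- overlapping_matches('ACCACCACCAC', '[AC]C')
print(res)

# Variable-length captures also work
res2 <- overlapping_matches('ACCACCACCAC', 'C+A')
print(res2)
```

Because the capture group can have variable length, `'C+A'` returns "CCA" and "CA" from overlapping starting points, which the fixed-width approaches cannot do.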

answered Aug 10 '15 at 14:51

Ken Williams

13.5k, 5 gold badges, 61 silver badges, 106 bronze badges
