How to match a pattern that repeats n times in a string using findall()?
I'm trying to parse a repetitive string and find multiple matches in it that fit the pattern in my findall() call.
Essentially, I want to extract the names of the players from the string. They're separated by commas, and the name of the last player is preceded by "and". I tried to get the comma part working first, but for some reason findall() doesn't seem to repeat the matching pattern even though I added a *.

    import re

    x = re.findall(r'Players(?:\s([A-Z]+[a-z]+)),*', 'Players Jeremiah, Lou, Martha and Kay,')

I haven't gotten the "and" part down yet since I'm stuck on the commas, but from the above code I feel like x should capture Jeremiah, Lou and Martha at the very least. My code only manages to capture Jeremiah.
python-3.x
edited Mar 28 at 6:26 by Stefan Becker
asked Mar 28 at 6:11 by investigate311
2 Answers
Your pattern starts with "Players", hence it can only match once, because your string contains "Players" only once.
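A quick way to see why the question's code yields a single name (a sketch; the second pattern is a hypothetical variation, not from the question): in 'Players(?:\s([A-Z]+[a-z]+)),*' the * only quantifies the comma, and the literal "Players" occurs once, so findall() stops after one match. Moving the * onto the group does not help either, because a repeated capturing group only remembers its last repetition.

```python
import re

text = 'Players Jeremiah, Lou, Martha and Kay,'

# The question's pattern: '*' applies only to the comma, and the literal
# 'Players' occurs once, so findall() can produce at most one match.
question_result = re.findall(r'Players(?:\s([A-Z]+[a-z]+)),*', text)

# Hypothetical variation: '*' now repeats the whole group, but a repeated
# capturing group keeps only the text of its last successful repetition.
repeated_group_result = re.findall(r'Players(?:\s([A-Z][a-z]+),?)*', text)

print(question_result)        # ['Jeremiah']
print(repeated_group_result)  # ['Martha']
```

In the variation the repetition stops at " and" (lowercase "a" does not match [A-Z]), so the last repetition, and therefore the only captured value, is "Martha".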
This would be the correct pattern, which also handles the "and" part:
- a player name is one or more characters that are neither white space nor a comma,
- it must be followed either by a comma or by white space + "and".

    import re

    x = re.findall(r'([^\s,]+)(?:,|\s+and)',
                   'Players Jeremiah, Lou, Martha and Kay,')
    print(x)

Test run:

    $ python3 dummy.py
    ['Jeremiah', 'Lou', 'Martha', 'Kay']

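Note that this pattern relies on every name being followed by a comma or by "and"; the sample string happens to end with a trailing comma. A sketch of one possible tweak (a hypothetical variant, not part of the answer) that also accepts the end of the string after the last name:

```python
import re

ORIGINAL = r'([^\s,]+)(?:,|\s+and)'
# Hypothetical tweak: also accept the end of the string after the last name.
WITH_END = r'([^\s,]+)(?:,|\s+and|$)'

no_trailing_comma = 'Players Jeremiah, Lou, Martha and Kay'

original_result = re.findall(ORIGINAL, no_trailing_comma)
with_end_result = re.findall(WITH_END, no_trailing_comma)

print(original_result)  # ['Jeremiah', 'Lou', 'Martha']  (Kay is lost)
print(with_end_result)  # ['Jeremiah', 'Lou', 'Martha', 'Kay']
```

On the original sample with the trailing comma, both patterns return the same four names.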
Update: to match the further sample sentences given by the OP, a single regex is no longer enough:
- match the mandatory prefix "Players" and extract the rest,
- match an optional postfix, e.g. "are ...", and strip it,
- detect player names in the remaining sentence; a name
  - starts at a word boundary,
  - starts with an uppercase letter,
  - is followed by one or more lowercase letters,
  - ends at a word boundary.

    import re

    for sentence in (
        'Jeremiah, Lou, Martha and Kay,',
        'Players Jeremiah, Lou, Martha and Kay,',
        'The Players are Martha, Joe, Toby and Kay.',
        'The Players Martha, Joe and Toby are German.',
        'The Players Martha, Joe and Toby are German,',
    ):
        # strip mandatory prefix; sentences without "Players" are skipped
        match = re.search(r'Players(.*)', sentence)
        if not match:
            continue

        # strip optional postfix such as " are German." or " are German,"
        postfix = re.search(r'(.*)(?:\s+(?:are)\s+\S+[,.])$', match[1])
        if postfix:
            match = postfix

        # collect capitalized words: one uppercase letter + lowercase letters
        result = re.findall(r'(\b[A-Z][a-z]+\b)', match[1])
        print(sentence, '->', result)

Test run:

    $ python3 dummy.py
    Players Jeremiah, Lou, Martha and Kay, -> ['Jeremiah', 'Lou', 'Martha', 'Kay']
    The Players are Martha, Joe, Toby and Kay. -> ['Martha', 'Joe', 'Toby', 'Kay']
    The Players Martha, Joe and Toby are German. -> ['Martha', 'Joe', 'Toby']
    The Players Martha, Joe and Toby are German, -> ['Martha', 'Joe', 'Toby']

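The same three steps can be wrapped in a small helper (a sketch; the name extract_players is hypothetical). It also makes one assumption of the postfix step visible: only a single word after "are" is stripped, so a longer postfix such as "are from Germany." leaks into the result.

```python
import re

def extract_players(sentence):
    """Hypothetical helper: names after 'Players', or None if there is no prefix."""
    # 1. mandatory prefix: keep everything after 'Players'
    prefix = re.search(r'Players(.*)', sentence)
    if not prefix:
        return None
    rest = prefix[1]
    # 2. optional postfix: drop a trailing "are <word>" ending in ',' or '.'
    postfix = re.search(r'(.*)(?:\s+are\s+\S+[,.])$', rest)
    if postfix:
        rest = postfix[1]
    # 3. capitalized words: one uppercase letter followed by lowercase letters
    return re.findall(r'\b[A-Z][a-z]+\b', rest)

print(extract_players('The Players Martha, Joe and Toby are German.'))
print(extract_players('The Players Martha, Joe and Toby are from Germany.'))
print(extract_players('Jeremiah, Lou, Martha and Kay,'))
```

The second call returns "Germany" as a fourth "name", and the third returns None because the sentence has no "Players" prefix.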
edited Mar 28 at 10:28
answered Mar 28 at 6:24 by Stefan Becker
This makes sense. Thanks! But the strings could be in different formats as well, e.g. "The Players are Martha, Joe, Toby and Kay." or "The Players Martha, Joe and Toby are German." Essentially, what condition would you use to filter them out? I think I could figure out the syntax myself.
– investigate311
Mar 28 at 6:57
That goes into the area of "natural language processing". A simple regex won't help there anymore.
– Stefan Becker
Mar 28 at 10:21
I ended up implementing a far less efficient solution, but this was insightful! I suppose this too rests on the assumption that the postfix contains at least an "are". My particular problem guarantees that no words have a capitalised first letter besides the names, the "Players" prefix and a handful of special words. I used this to narrow down my results. Thanks a lot, Stefan!
– investigate311
Mar 28 at 11:17
If this helped, then please remember "What should I do when someone answers my question?"
– Stefan Becker
Mar 28 at 11:54
I guess you're basically looking for a pattern for proper nouns. Your pattern 'Players(?:\s([A-Z]+[a-z]+)),*' only captures "Jeremiah" because it specifically looks for a proper noun right after the word "Players".
Please try this pattern instead: (?!^[A-Z]\w+)(?<![.]\s)([A-Z]\w+)
It looks for proper nouns but excludes capitalized words at the beginning of a sentence:
- ([A-Z]\w+) matches a capitalized word,
- (?<![.]\s) ensures that we don't pick anything that follows a full stop and a space,
- (?!^[A-Z]\w+) leaves out a capitalized word at the very beginning of the string (or at the beginning of every line, if the pattern is used with re.MULTILINE).
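Without flags, ^ matches only at the very start of the string, so the negative lookahead only skips the first word of the whole text. A short sketch (the two-line sample text is made up) showing the difference re.MULTILINE makes:

```python
import re

PATTERN = r'(?!^[A-Z]\w+)(?<![.]\s)([A-Z]\w+)'
text = 'Lou and Kay\nJoe and Toby'  # made-up two-line sample

default_result = re.findall(PATTERN, text)
multiline_result = re.findall(PATTERN, text, flags=re.MULTILINE)

print(default_result)    # ['Kay', 'Joe', 'Toby']
print(multiline_result)  # ['Kay', 'Toby']
```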
Warning
A generalized pattern may not be ideal if you require 100% precision in your results. This pattern could understate the number of names if a sentence begins with a name, and it also picks up any other capitalized word that does not start a sentence (e.g. "Players" or "German").
Test it out here
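A quick check of the pattern on the OP's sample sentences plus one made-up sentence that starts with a name (a sketch, not part of the answer) shows both effects: a name at the very start of the text is dropped, and other capitalized words such as "Players" or "German" are kept.

```python
import re

PROPER_NOUN = re.compile(r'(?!^[A-Z]\w+)(?<![.]\s)([A-Z]\w+)')

samples = [
    'Players Jeremiah, Lou, Martha and Kay,',      # from the question
    'The Players are Martha, Joe, Toby and Kay.',  # from the comments
    'Martha, Joe and Toby are German.',            # made up: starts with a name
]

results = [PROPER_NOUN.findall(s) for s in samples]
for sentence, names in zip(samples, results):
    print(sentence, '->', names)
```

Expected lines: ['Jeremiah', 'Lou', 'Martha', 'Kay'], then ['Players', 'Martha', 'Joe', 'Toby', 'Kay'], then ['Joe', 'Toby', 'German'].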
edited Mar 28 at 9:20
answered Mar 28 at 6:25 by kerwei
Thanks for the reply! What does '\b' do?
– investigate311
Mar 28 at 7:03
@investigate311 It's called a word boundary token. I hope you don't mind, but here's the exact definition provided by regex101.com: "Matches, without consuming any characters, immediately between a character matched by \w and a character not matched by \w (in either order). It cannot be used to separate non words from words."
– kerwei
Mar 28 at 7:04
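To illustrate \b in Python (a sketch; the sample string is made up): with \b around the pattern, a capital letter inside a word such as "McKay" cannot start or end a match, whereas without it the pattern also matches word fragments.

```python
import re

text = 'McKay and Kay'  # made-up sample

with_boundaries = re.findall(r'\b[A-Z][a-z]+\b', text)
without_boundaries = re.findall(r'[A-Z][a-z]+', text)

print(with_boundaries)     # ['Kay']
print(without_boundaries)  # ['Mc', 'Kay', 'Kay']
```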
I guess essentially I want something that captures all the words whose first letter is a capital. But what about words that are capitalized because they're at the beginning of the sentence? How do I filter those out specifically?
– investigate311
Mar 28 at 8:19
@investigate311 I've updated the pattern :)
– kerwei
Mar 28 at 8:59
Thank you! I tinkered with the code here and used a variation of it for my solution! Thanks for all the help!
– investigate311
Mar 28 at 11:17