
How to match a pattern that repeats n times in a string using findall()?


I'm trying to parse a repetitive string and find multiple matches from it that match the pattern in my findall() function.

Essentially, what I want to do is extract the players' names from the string. They're separated by commas, and the last name in the list is preceded by "and". I tried to get the comma part down, but for some reason findall() doesn't seem to repeat the matching pattern even though I added a *.

import re

x = re.findall(r'Players(?:\s([A-Z]+[a-z]+)),*', 'Players Jeremiah, Lou, Martha and Kay,')

I haven't gotten the "and" part down yet since I'm stuck on the commas, but from the above code I feel like x should capture Jeremiah, Lou and Martha at the very least. My code only manages to capture Jeremiah.

python-3.x

edited Mar 28 at 6:26 by Stefan Becker

asked Mar 28 at 6:11 by investigate311



2 Answers

Your pattern starts with "Players", so it can only match once, because your string contains only one "Players".
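A small sketch of why adding * does not help: findall() returns one item per match of the whole pattern, and a capturing group inside a repetition only keeps the text from its last iteration. The second pattern below is an illustrative variant, not code from the question.

```python
import re

s = 'Players Jeremiah, Lou, Martha and Kay,'

# The asker's pattern: one literal "Players", so findall can only find one match.
single = re.findall(r'Players(?:\s([A-Z]+[a-z]+)),*', s)
print(single)    # ['Jeremiah']

# Illustrative variant: repeating the group does not collect every name,
# because a repeated capturing group only remembers its last iteration.
repeated = re.findall(r'Players(?:\s([A-Z][a-z]+),?)*', s)
print(repeated)  # ['Martha']  (the repetition stops at "and")
```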

This pattern works and also handles the "and" part:

a player name is any run of characters that are neither whitespace nor a comma,

it must be followed either by a comma or by whitespace + "and"

import re

x = re.findall(r'([^\s,]+)(?:,|\s+and)',
               'Players Jeremiah, Lou, Martha and Kay,')

print(x)

Test run:

$ python3 dummy.py
['Jeremiah', 'Lou', 'Martha', 'Kay']
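A quick check of where this delimiter-based pattern stops working, using the two extra sentence shapes the OP mentions in the comments: a name is only captured when a comma or "and" follows it, so a name before "." or "are" is missed.

```python
import re

# The delimiter-based pattern from the answer above.
pattern = r'([^\s,]+)(?:,|\s+and)'

are_list = re.findall(pattern, 'The Players are Martha, Joe, Toby and Kay.')
print(are_list)      # ['Martha', 'Joe', 'Toby']: 'Kay' is followed by '.'

postfix_list = re.findall(pattern, 'The Players Martha, Joe and Toby are German.')
print(postfix_list)  # ['Martha', 'Joe']: 'Toby' is followed by 'are'
```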

Update: to match the further sample sentences given by the OP, a single regex is no longer enough:

match the mandatory prefix "Players" and extract the rest

match an optional postfix, e.g. "are ...", and strip it

detect player names in the remaining sentence
- starts at a word boundary
- starts with an uppercase letter
- followed by one-or-more lowercase letters
- ends at a word boundary

import re

for sentence in (
    'Jeremiah, Lou, Martha and Kay,',
    'Players Jeremiah, Lou, Martha and Kay,',
    'The Players are Martha, Joe, Toby and Kay.',
    'The Players Martha, Joe and Toby are German.',
    'The Players Martha, Joe and Toby are German,',
):
    # strip mandatory prefix (sentences without "Players" are skipped)
    match = re.search(r'Players(.*)', sentence)
    if not match:
        continue

    # strip optional postfix, e.g. " are German."
    postfix = re.search(r'(.*)(?:\s+(?:are)\s+\S+[,.])$', match[1])
    if postfix:
        match = postfix

    # player names: capitalized words between word boundaries
    result = re.findall(r'(\b[A-Z][a-z]+\b)', match[1])
    print(sentence, '->', result)

Test run:

$ python3 dummy.py
Players Jeremiah, Lou, Martha and Kay, -> ['Jeremiah', 'Lou', 'Martha', 'Kay']
The Players are Martha, Joe, Toby and Kay. -> ['Martha', 'Joe', 'Toby', 'Kay']
The Players Martha, Joe and Toby are German. -> ['Martha', 'Joe', 'Toby']
The Players Martha, Joe and Toby are German, -> ['Martha', 'Joe', 'Toby']
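The OP notes in the comments that this relies on the postfix containing "are". A minimal sketch of the same three steps wrapped in a function, assuming the set of verbs that can introduce the postfix is known in advance; extract_players and postfix_verbs are hypothetical names, not from the answer.

```python
import re

def extract_players(sentence, postfix_verbs=('are',)):
    """Hypothetical helper: names after 'Players', or None if the prefix is missing.

    postfix_verbs (an assumption) lists verbs that may start a trailing
    clause such as "are German." which is stripped before matching names.
    """
    prefix = re.search(r'Players(.*)', sentence)
    if not prefix:
        return None
    rest = prefix[1]

    # build "are|were|..." safely from the given verbs
    verbs = '|'.join(map(re.escape, postfix_verbs))
    postfix = re.search(r'(.*)\s+(?:' + verbs + r')\s+\S+[,.]$', rest)
    if postfix:
        rest = postfix[1]

    return re.findall(r'\b[A-Z][a-z]+\b', rest)

print(extract_players('The Players Martha, Joe and Toby are German.'))
print(extract_players('The Players Martha and Kay were Dutch.', ('are', 'were')))
```

With only the default ('are',), the second sentence would also return 'Dutch', which is the limitation the OP points out.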

edited Mar 28 at 10:28

answered Mar 28 at 6:24 by Stefan Becker

This makes sense. Thanks! But the strings could be in different formats as well, e.g. "The Players are Martha, Joe, Toby and Kay." or "The Players Martha, Joe and Toby are German." Essentially, what condition would you use to filter them out? I think I could figure out the syntax myself.

– investigate311
Mar 28 at 6:57

That goes into the area of "natural language processing". A simple regex won't help there anymore.

– Stefan Becker
Mar 28 at 10:21

I ended up implementing a far less efficient solution, but this was insightful! I suppose this too rests on the assumption that the postfix will contain an "are" at the very least. My particular problem has the guarantee of not having any words where the first letters are capitalised besides the names, the "Players" prefix and a handful of special words. I used this to narrow down my results. Thanks a lot, Stefan!

– investigate311
Mar 28 at 11:17

If this helped, please remember "What should I do when someone answers my question?"

– Stefan Becker
Mar 28 at 11:54


I guess you're basically looking for a pattern for proper nouns. Your pattern 'Players(?:\s([A-Z]+[a-z]+)),*' only captures "Jeremiah" because it specifically looks for a proper noun after the word "Players".

Please try this pattern instead:
(?!^[A-Z]\w+)(?<![.]\s)([A-Z]\w+)

It looks for proper nouns but excludes capitalized words at the beginning of a sentence.

([A-Z]\w+) matches a capitalized word

(?<![.]\s) ensures that we don't pick anything that follows a full stop and a space

(?!^[A-Z]\w+) leaves out a capitalized word at the very beginning of the string (or of each line, if re.MULTILINE is set)

Warning
A generalized pattern may not be ideal if you require 100% precision in your results. This pattern could potentially understate the number of names if your sentence begins with a name.

Test it out here
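A sketch of this pattern on the sample sentences from the thread. Because "Players" and "German" are capitalized mid-sentence, they are matched too, so the sketch filters them with a hypothetical NON_NAMES set, similar to what the OP describes doing in the comments. The last sentence shows the warning above: a name at the very start is dropped.

```python
import re

# kerwei's pattern with the backslashes restored
NAME_PATTERN = r'(?!^[A-Z]\w+)(?<![.]\s)([A-Z]\w+)'

# Hypothetical set of capitalized words known not to be names
NON_NAMES = {'Players', 'German'}

def find_names(sentence):
    """Capitalized words matched by NAME_PATTERN, minus known non-names."""
    return [w for w in re.findall(NAME_PATTERN, sentence) if w not in NON_NAMES]

for sentence in (
    'Players Jeremiah, Lou, Martha and Kay,',
    'The Players are Martha, Joe, Toby and Kay.',
    'The Players Martha, Joe and Toby are German.',
    'Martha and Joe are Players.',  # starts with a name: 'Martha' is dropped
):
    print(sentence, '->', find_names(sentence))
```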

edited Mar 28 at 9:20

answered Mar 28 at 6:25 by kerwei

Thanks for the reply! What does '\b' do?

– investigate311
Mar 28 at 7:03

@investigate311 It's called a word boundary token. I hope you don't mind, but here's the exact definition provided by regex101.com: "Matches, without consuming any characters, immediately between a character matched by \w and a character not matched by \w (in either order). It cannot be used to separate non words from words."

– kerwei
Mar 28 at 7:04
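A small illustration of the difference \b makes, using a hypothetical input with a capital letter inside a word: without the boundaries, [A-Z][a-z]+ happily matches fragments of "McDonald".

```python
import re

text = 'McDonald and Kay'  # hypothetical input with an inner capital letter

without_b = re.findall(r'[A-Z][a-z]+', text)
with_b = re.findall(r'\b[A-Z][a-z]+\b', text)

print(without_b)  # ['Mc', 'Donald', 'Kay']
print(with_b)     # ['Kay']
```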

I guess essentially I want something that captures all the words where the first letter is capital. But what about the words that are capitalized because they're at the beginning of the sentence? How do I filter those out specifically?

– investigate311
Mar 28 at 8:19

@investigate311 I've updated the pattern :)

– kerwei
Mar 28 at 8:59


Thank you! I tinkered with the code here and used a variation of it for my solution! Thanks for all the help!

– investigate311
Mar 28 at 11:17
