How to match a pattern that repeats n times in a string using findall()?


I'm trying to parse a repetitive string and extract every substring that matches the pattern I pass to findall().

Essentially, I want to extract the names of the players from the string. The names are separated by commas, and the last player's name is preceded by "and". I tried to get the comma part working first, but for some reason findall() doesn't seem to repeat the matching pattern even though I added a *.

    x = re.findall(r'Players(?:\s([A-Z]+[a-z]+)),*', 'Players Jeremiah, Lou, Martha and Kay,')

I haven't handled the "and" part yet since I'm stuck on the commas, but from the code above I would expect x to capture Jeremiah, Lou and Martha at the very least. My code only manages to capture Jeremiah.

python-3.x

asked Mar 28 at 6:11 by investigate311
edited Mar 28 at 6:26 by Stefan Becker

          2 Answers

I guess you're basically looking for a pattern that matches proper nouns. Your pattern r'Players(?:\s([A-Z]+[a-z]+)),*' only captures "Jeremiah" because it specifically looks for a proper noun directly after the word "Players".

Please try this pattern instead:

    (?!^[A-Z]\w+)(?<![.]\s)([A-Z]\w+)

It looks for proper nouns but excludes capitalized words at the beginning of a sentence.

• ([A-Z]\w+) matches a capitalized word
• (?<![.]\s) ensures that we don't pick up anything that follows a full stop and a whitespace character
• (?!^[A-Z]\w+) leaves out a capitalized word at the beginning of the string (or at the beginning of each line, if the re.MULTILINE flag is set)

Warning: a generalized pattern may not be ideal if you require 100% precision in your results. This pattern will miss a name that begins a sentence, so it can undercount the names.
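For anyone who wants to try the pattern in Python rather than in an online tester, here is a minimal sketch. The name PROPER_NOUN and the third sample sentence are made up for illustration; the first two sentences come from this thread.

```python
import re

# The pattern from this answer, with its backslashes written out.
# PROPER_NOUN is an illustrative name, not something from the answer.
PROPER_NOUN = re.compile(r'(?!^[A-Z]\w+)(?<![.]\s)([A-Z]\w+)')

sentences = (
    'Players Jeremiah, Lou, Martha and Kay,',
    'The Players are Martha, Joe, Toby and Kay.',
    'Martha and Joe are here. Toby and Kay are not.',  # made-up sample
)

for sentence in sentences:
    # findall returns the contents of the single capture group
    print(sentence, '->', PROPER_NOUN.findall(sentence))
```

Two things are worth noticing when you run it. In the second sentence "Players" is itself capitalized and does not start the string, so it is returned along with the names and would still have to be filtered out. In the third sentence "Martha" and "Toby" both start a sentence and are skipped, which is the undercounting the warning describes.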






• Thanks for the reply! What does '\b' do?
  – investigate311, Mar 28 at 7:03

• @investigate311 It's called a word boundary token. I hope you don't mind, but here's the exact definition provided by regex101.com: "Matches, without consuming any characters, immediately between a character matched by \w and a character not matched by \w (in either order). It cannot be used to separate non words from words."
  – kerwei, Mar 28 at 7:04

• I guess essentially I want something that captures all the words whose first letter is capitalized. But what about the words that are capitalized only because they're at the beginning of a sentence? How do I filter those out specifically?
  – investigate311, Mar 28 at 8:19

• @investigate311 I've updated the pattern :)
  – kerwei, Mar 28 at 8:59

• Thank you! I tinkered with the code here and used a variation of it for my solution! Thanks for all the help!
  – investigate311, Mar 28 at 11:17
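The word boundary token discussed in these comments is easy to try out. The following is only a small sketch; the sample text and variable names are invented for the demonstration.

```python
import re

text = 'Kay, Kayla and McKay'  # invented sample text

# Without boundaries, "Kay" is also found inside longer words.
loose = re.findall(r'Kay', text)

# With \b on both sides, only the standalone word matches.
strict = re.findall(r'\bKay\b', text)

# \b consumes no characters: the match spans just the three letters.
span = re.search(r'\bKay\b', text).span()

print(loose)   # ['Kay', 'Kay', 'Kay']
print(strict)  # ['Kay']
print(span)    # (0, 3)
```

Note the raw string prefix: in an ordinary Python string literal, '\b' is the backspace character, so the pattern has to be written as r'\b' (or '\\b') for the regex engine to see a word boundary.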














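Your pattern starts with the literal text Players, so it can match only once: your string contains Players only once. Note also that the * in your pattern applies only to the comma, not to the group in front of it.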
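To make this concrete, here is a small sketch that runs the pattern from the question. The second pattern, where the whole group is repeated, is an illustrative assumption and not something the asker wrote; it is there to show that repeating the group would not help either.

```python
import re

text = 'Players Jeremiah, Lou, Martha and Kay,'

# Pattern from the question: the trailing * repeats only the comma,
# and the literal "Players" occurs at a single place in the text.
original = re.findall(r'Players(?:\s([A-Z]+[a-z]+)),*', text)
print(original)  # ['Jeremiah']

# Hypothetical variant that repeats the whole group: a capture group
# that matches several times keeps only its last repetition.
repeated = re.findall(r'Players(?:\s([A-Z]+[a-z]+),*)*', text)
print(repeated)  # ['Martha']
```

So findall() returns one item per match of the whole pattern, not one item per repetition inside a match. To get every name, the pattern itself has to be able to match at each name.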



This pattern would work, and it also handles the "and" part:

• a player name is a run of characters that are neither whitespace nor a comma,
• it must be followed either by a comma or by whitespace + "and"

    import re

    x = re.findall(r'([^\s,]+)(?:,|\s+and)',
                   'Players Jeremiah, Lou, Martha and Kay,')

    print(x)

Test run:

    $ python3 dummy.py
    ['Jeremiah', 'Lou', 'Martha', 'Kay']



Update: to handle the further sample sentences given by the OP, a single regex is no longer enough:

• match a mandatory prefix Players and extract the rest
• match an optional postfix, e.g. are ..., and strip it
• detect player names in the remaining sentence; each name
  • starts at a word boundary
  • starts with an uppercase letter
  • is followed by one or more lowercase letters
  • ends at a word boundary

    import re

    for sentence in (
        'Jeremiah, Lou, Martha and Kay,',
        'Players Jeremiah, Lou, Martha and Kay,',
        'The Players are Martha, Joe, Toby and Kay.',
        'The Players Martha, Joe and Toby are German.',
        'The Players Martha, Joe and Toby are German,',
    ):
        # strip the mandatory prefix; skip sentences that lack it
        match = re.search(r'Players(.*)', sentence)
        if not match:
            continue

        # strip the optional postfix, e.g. " are German."
        postfix = re.search(r'(.*)(?:\s+(?:are)\s+\S+[,.])$', match[1])
        if postfix:
            match = postfix

        # the names are the capitalized words in what remains
        result = re.findall(r'(\b[A-Z][a-z]+\b)', match[1])
        print(sentence, '->', result)

Test run:

    $ python3 dummy.py
    Players Jeremiah, Lou, Martha and Kay, -> ['Jeremiah', 'Lou', 'Martha', 'Kay']
    The Players are Martha, Joe, Toby and Kay. -> ['Martha', 'Joe', 'Toby', 'Kay']
    The Players Martha, Joe and Toby are German. -> ['Martha', 'Joe', 'Toby']
    The Players Martha, Joe and Toby are German, -> ['Martha', 'Joe', 'Toby']





answered Mar 28 at 6:24 by Stefan Becker
edited Mar 28 at 10:28

• This makes sense. Thanks! But the strings could be in different formats as well, for example "The Players are Martha, Joe, Toby and Kay." or "The Players Martha, Joe and Toby are German." Essentially, what condition would you use to filter them out? I think I could figure out the syntax myself.
  – investigate311, Mar 28 at 6:57

• That goes into the area of "natural language processing". A simple regex won't help there anymore.
  – Stefan Becker, Mar 28 at 10:21

• I ended up implementing a far less efficient solution, but this was insightful! I suppose this too rests on the assumption that the postfix will contain an "are" at the very least. My particular problem guarantees that the only capitalised words are the names, the "Players" prefix and a handful of special words. I used this to narrow down my results. Thanks a lot, Stefan!
  – investigate311, Mar 28 at 11:17

• If this helped, then please remember "What should I do when someone answers my question?"
  – Stefan Becker, Mar 28 at 11:54
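The approach described in the third comment, collecting every capitalised word and discarding the known non-names, might look roughly like the sketch below. The asker's actual code is not shown in the thread, so the function name player_names and the contents of NOT_NAMES are assumptions made for illustration.

```python
import re

# Hypothetical stop list: "Players" plus the special capitalised words
# the data is known to contain. These entries are made up.
NOT_NAMES = {'Players', 'The', 'German'}


def player_names(sentence):
    """Return the capitalised words that are not on the stop list."""
    words = re.findall(r'\b[A-Z][a-z]+\b', sentence)
    return [word for word in words if word not in NOT_NAMES]


for sentence in (
    'Players Jeremiah, Lou, Martha and Kay,',
    'The Players are Martha, Joe, Toby and Kay.',
    'The Players Martha, Joe and Toby are German.',
):
    print(sentence, '->', player_names(sentence))
```

This only works under the guarantee stated in the comment: if a capitalised word that is neither a name nor on the stop list appears, it will be reported as a player.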


















          • This makes sense. Thanks! But the strings could be in different formats as well: -"The Players are Martha, Joe, Toby and Kay." - "The Players Martha, Joe and Toby are German." Essentially, what condition would you use to filter them out? I think I could figure out the syntax myself.

            – investigate311
            Mar 28 at 6:57












          • That goes into the area of "natural language processing". A simple regex won't help there anymore.

            – Stefan Becker
            Mar 28 at 10:21











          • I ended up implementing a far less efficient solution but this was insightful! I suppose this too rests on the assumption that the postfix will contain an "are" at the very least. My particular problem has the guarantee of not having any words where the first letters are capitalised besides the names, the "Players" prefix and a handful of special words. I used this to narrow down my results. Thanks a lot, Stefan!

            – investigate311
            Mar 28 at 11:17











          • If this helped then please remember What should I do when someone answers my question?

            – Stefan Becker
            Mar 28 at 11:54

















          This makes sense. Thanks! But the strings could be in different formats as well: -"The Players are Martha, Joe, Toby and Kay." - "The Players Martha, Joe and Toby are German." Essentially, what condition would you use to filter them out? I think I could figure out the syntax myself.

          – investigate311
          Mar 28 at 6:57






          This makes sense. Thanks! But the strings could be in different formats as well: -"The Players are Martha, Joe, Toby and Kay." - "The Players Martha, Joe and Toby are German." Essentially, what condition would you use to filter them out? I think I could figure out the syntax myself.

          – investigate311
          Mar 28 at 6:57














          That goes into the area of "natural language processing". A simple regex won't help there anymore.

          – Stefan Becker
          Mar 28 at 10:21





          That goes into the area of "natural language processing". A simple regex won't help there anymore.

          – Stefan Becker
          Mar 28 at 10:21













          I ended up implementing a far less efficient solution but this was insightful! I suppose this too rests on the assumption that the postfix will contain an "are" at the very least. My particular problem has the guarantee of not having any words where the first letters are capitalised besides the names, the "Players" prefix and a handful of special words. I used this to narrow down my results. Thanks a lot, Stefan!

          – investigate311
          Mar 28 at 11:17





          I ended up implementing a far less efficient solution but this was insightful! I suppose this too rests on the assumption that the postfix will contain an "are" at the very least. My particular problem has the guarantee of not having any words where the first letters are capitalised besides the names, the "Players" prefix and a handful of special words. I used this to narrow down my results. Thanks a lot, Stefan!

          – investigate311
          Mar 28 at 11:17













          If this helped then please remember What should I do when someone answers my question?

          – Stefan Becker
          Mar 28 at 11:54






          If this helped then please remember What should I do when someone answers my question?

          – Stefan Becker
          Mar 28 at 11:54














          0
















          I guess you're basically looking for a pattern for proper nouns. In the pattern you're using, it only captures "Jeremiah" because your pattern 'Players(?:s([A-Z]+[a-z]+)),*' specifically looks for a proper noun after the word "Players".



          Please try this pattern instead:
          (?!^[A-Z]\w+)(?<![.]\s)([A-Z]\w+)



          It looks for proper nouns but excludes capitalized words at the beginning of a sentence.




          ([A-Z]\w+) matches a capitalized word of at least two characters



          (?<![.]\s) ensures that we don't pick anything that follows a full-stop and a whitespace character



          (?!^[A-Z]\w+) leaves out a capitalized word at the beginning of the string (or at the beginning of each line, if the re.MULTILINE flag is set)




          Warning
          A generalized pattern may not be ideal if you require 100% precision in your results. This pattern could potentially understate the number of names if your sentence begins with a name.
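Here is a small sketch of how the pattern behaves with `findall()`. The `find_capitalized` helper and both sample sentences are made up for illustration. Note that the result still contains "Players" and "German", and that a name at the very start of the string is dropped, which is the imprecision the warning above is about.

```python
import re

# The pattern from this answer, with the backslashes written out.
PATTERN = re.compile(r"(?!^[A-Z]\w+)(?<![.]\s)([A-Z]\w+)")


def find_capitalized(text):
    """Return capitalised words that do not start the string or a sentence."""
    return PATTERN.findall(text)


# Hypothetical inputs.
mid_sentence = find_capitalized(
    "The Players Martha, Joe and Toby are German. They won."
)
leading_name = find_capitalized("Martha and Joe are Players.")

print(mid_sentence)  # ['Players', 'Martha', 'Joe', 'Toby', 'German']
print(leading_name)  # ['Joe', 'Players']
```

So you'd probably still need to filter out known non-name words afterwards.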




































          • Thanks for the reply! What does '\b' do?

            – investigate311
            Mar 28 at 7:03











          • @investigate311 It's called a word boundary token. I hope you don't mind, but here's the exact definition provided by regex101.com: "Matches, without consuming any characters, immediately between a character matched by \w and a character not matched by \w (in either order). It cannot be used to separate non words from words."

            – kerwei
            Mar 28 at 7:04












          • I guess essentially I want something that captures all the words where the first letter is capital. But what about the words that are capitalized only because they're at the beginning of the sentence? How do I filter those out specifically?

            – investigate311
            Mar 28 at 8:19











          • @investigate311 I've updated the pattern :)

            – kerwei
            Mar 28 at 8:59






          • Thank you! I tinkered with the code here and used a variation of it for my solution! Thanks for all the help!

            – investigate311
            Mar 28 at 11:17
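To illustrate the word boundary token from the comments above, here is a minimal sketch (the sample text is made up). One thing to watch in Python: write the pattern as a raw string, because in a normal string literal "\b" means a backspace character rather than a word boundary.

```python
import re

# Hypothetical sample text.
text = "Toby met Tobyas"

# With \b on both sides, only the standalone word matches.
with_boundary = re.findall(r"\bToby\b", text)

# Without it, the pattern also matches inside "Tobyas".
without_boundary = re.findall(r"Toby", text)

print(with_boundary)     # ['Toby']
print(without_boundary)  # ['Toby', 'Toby']
```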























          edited Mar 28 at 9:20

          answered Mar 28 at 6:25

          kerwei
































