Use regex with Python to get a specific part of the iframe src


I am trying to capture the content of an iframe src attribute so that I can change it. I don't have direct access to the HTML; I get it from an API.

You can see some example iframes below:



<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490">
<iframe allowfullscreen="" frameborder="0" height="276" mozallowfullscreen="" scrolling="no" src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490"></iframe>


I have many other types of iframes; the only part they have in common is this part of the src content: https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302

I wrote the following code to find an element:



import re
from bs4 import BeautifulSoup

# some code
regex_page_embed = r"http.?://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/*"
soup = BeautifulSoup(page_html, 'html.parser')
page_elements = list(soup.children)
for element in page_elements:
    s1 = re.search(regex_page_embed, str(element))
    if s1:
        print(s1)
        print(s1.group())


After that I have more code that effectively changes the HTML through the API; I don't think it is necessary to include it here.
But when I use:

print(s1)
print(s1.group())

I get the following result:



<_sre.SRE_Match object; span=(686, 771), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/
<_sre.SRE_Match object; span=(126, 211), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/
<_sre.SRE_Match object; span=(686, 771), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/
<_sre.SRE_Match object; span=(227, 312), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/


I want to get the last part of the iframe src content. In the example below:



<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490">


f2c5f6ca3a4610c55d70cb211ef9d977 is the part that I want.

print(s1) and print(s1.group()) don't show the last part of the src content. How can I get it?







python regex iframe






edited Mar 26 at 19:35 by fabiobh
asked Mar 26 at 18:45 by fabiobh







  • In the regex, change the star at the end to (.*?)(?=").

    – Quixrick
    Mar 26 at 18:49

  • Relevant read on parsing HTML content with regex: stackoverflow.com/a/1732454/9183344

    – Roca
    Mar 26 at 19:34

  • I'd just use bs4 to parse the iframe and then extract the src text content and go from there...

    – Roca
    Mar 26 at 19:36

  • I tried bs4 first to get the content, but I get more results with regex than with bs4. I investigated why and found that some iframes are inserted into the page using JavaScript document.write. Only regex was able to find those; bs4 can't find them.

    – fabiobh
    Mar 26 at 19:41

  • Ah right, since it's dynamic content you should be using a different module like selenium or requests-html. I'm actually surprised you are able to get the iframe in the bs4-extracted content at all.

    – Roca
    Mar 26 at 19:46
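
To make the first comment concrete, here is a minimal sketch of why the pattern from the question stops at the slash and what the suggested change does. The names PREFIX, html, original and suggested are only illustrative and are not from the post; the pattern pieces and the iframe string are copied from the question.

```python
import re

# Shared prefix copied from the question's regex (names here are illustrative).
PREFIX = r"http.?://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/"

# First iframe example from the question, split across lines for readability.
html = (
    '<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" '
    'webkitallowfullscreen="" width="490">'
)

# Pattern from the question: the trailing "/*" means "zero or more slashes",
# so the match ends right after the prefix.
original = re.search(PREFIX + "*", html)

# Suggested change: lazily capture everything up to the next double quote.
suggested = re.search(PREFIX + r'(.*?)(?=")', html)

print(original.group())    # the URL only up to the slash after the prefix
print(suggested.group(1))  # the part after the prefix
```

In a regex, `*` is a quantifier on the preceding character rather than a wildcard, which is why the original match never extends past the slash.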












2 Answers
A better regex for capturing the whole URL, allowing any optional content between the <iframe tag and the src attribute, is this:

<iframe .*?\bsrc="(https?://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/[^"]+)

Match using this regex and capture your URL from group 1.

Here is your updated Python code:



import re
from bs4 import BeautifulSoup

regex_page_embed = r'<iframe .*?\bsrc="(https?://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/[^"]+)'
soup = BeautifulSoup(page_html, 'html.parser')
page_elements = list(soup.children)
for element in page_elements:
    s1 = re.search(regex_page_embed, str(element))
    if s1:
        print(s1.group(1))  # extract the URL from the first capture group






answered Mar 26 at 19:47 by Pushpesh Kumar Rajwanshi
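
The answer above captures the whole URL, while the question asks only for the last part. One possible follow-up, sketched here with the standard library, is to split the captured URL afterwards. The helper name last_src_segment is hypothetical and not part of the answer. Decoding and stripping the segment is an assumption based on the second iframe example, whose last part starts with %20.

```python
from urllib.parse import unquote, urlsplit


def last_src_segment(url):
    """Return the final path segment of url, percent-decoded and stripped.

    Hypothetical helper: it assumes the wanted id is the last path segment.
    """
    path = urlsplit(url).path             # path only, still percent-encoded
    segment = path.rsplit("/", 1)[-1]     # text after the last slash
    return unquote(segment).strip()       # "%20abc" -> " abc" -> "abc"


# The two src values from the question.
url_plain = ("https://fast.player.liquidplatform.com/pApiv2/embed/"
             "e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977")
url_encoded = ("https://fast.player.liquidplatform.com/pApiv2/embed/"
               "e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977")

print(last_src_segment(url_plain))
print(last_src_segment(url_encoded))
```

With this, s1.group(1) from the loop above could be passed to last_src_segment to get just the id.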























Use r'<iframe src="[^"]*/([^"]+)"' as the pattern for your search.

Example:

>>> import re
>>> text = """<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490">"""
>>> pat = r'<iframe src="[^"]*/([^"]+)"'
>>> search = re.search(pat, text)
>>> search[1]
'f2c5f6ca3a4610c55d70cb211ef9d977'






answered Mar 26 at 19:02 by Russ Brown












  • I have edited my question to include a second iframe example. I forgot to mention that the HTML includes other types of iframes. Your answer would be correct if all iframes were based only on the first example. My page has other iframes that are completely different from the two examples I provided; the only common part is the iframe src content.

    – fabiobh
    Mar 26 at 19:27
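
Since the only stable part is the src prefix, one option is to ignore the iframe tag entirely and search the raw HTML text for that prefix. The sketch below assumes the id never contains quotes, backslashes, whitespace, or angle brackets, which the question does not confirm. PREFIX, PATTERN and find_embed_ids are hypothetical names, and the document.write line in the sample is made up for illustration.

```python
import re

# Common src prefix from the question; re.escape makes the dots literal.
PREFIX = "fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/"

# Capture everything after the prefix up to a quote, backslash, whitespace,
# or angle bracket. Accepting either quote style is an assumption.
PATTERN = re.compile(r"https?://" + re.escape(PREFIX) + r"""([^"'\\\s<>]+)""")


def find_embed_ids(raw_html):
    """Return the text after the shared prefix for every matching URL."""
    return PATTERN.findall(raw_html)


# Two iframes shaped like the question's examples, plus an illustrative
# document.write call (not taken from the real page).
sample = '''
<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490">
<iframe allowfullscreen="" frameborder="0" height="276" src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977" width="490"></iframe>
<script>document.write('<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977"></iframe>');</script>
'''

print(find_embed_ids(sample))
```

Because the pattern does not depend on attribute order or on the markup being parsed, it also picks up URLs that only appear inside script text. It runs on the page_html string directly instead of on the BeautifulSoup children.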

















