Use regex with Python to get a specific part of the iframe src


I am trying to capture the content of an iframe src attribute so that I can change it. I don't have direct access to the HTML; I get it from an API.

Here are some example iframes:

<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490">
<iframe allowfullscreen="" frameborder="0" height="276" mozallowfullscreen="" scrolling="no" src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490"></iframe>


I have many other kinds of iframes. The only thing they have in common is the beginning of the src value: https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302

I wrote the following code to find the matching elements:

import re
from bs4 import BeautifulSoup

# ... some code; page_html holds the HTML string returned by the API
regex_page_embed = r"http.?://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/*"
soup = BeautifulSoup(page_html, 'html.parser')
page_elements = list(soup.children)
for element in page_elements:
    s1 = re.search(regex_page_embed, str(element))
    if s1:
        print(s1)
        print(s1.group())


After that I have more code that changes the HTML through the API; I don't think it is necessary to include it here.
But when I use:

print(s1)
print(s1.group())


I get the following result:

<_sre.SRE_Match object; span=(686, 771), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/
<_sre.SRE_Match object; span=(126, 211), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/
<_sre.SRE_Match object; span=(686, 771), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/
<_sre.SRE_Match object; span=(227, 312), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/


I want to get the last part of the iframe src value. In the example below:

<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490">


f2c5f6ca3a4610c55d70cb211ef9d977 is the part that I want.

print(s1) and print(s1.group()) don't show the last part of the src value. How can I get it?

python regex iframe






asked Mar 26 at 18:45 (edited Mar 26 at 19:35)
fabiobh







  • In the regex, change the star at the end to (.*?)(?=").

    – Quixrick
    Mar 26 at 18:49

  • Relevant read on parsing HTML content with regex: stackoverflow.com/a/1732454/9183344

    – Roca
    Mar 26 at 19:34

  • I'd just use bs4 to parse the iframe, extract the src attribute, and go from there.

    – Roca
    Mar 26 at 19:36

  • I tried bs4 first, but I got more results with regex than with bs4. I investigated and found that some iframes are inserted into the page with JavaScript document.write, so only the regex could find them; bs4 could not.

    – fabiobh
    Mar 26 at 19:41

  • Ah right, since the content is dynamic you should be using a different module such as selenium or requests-html. I'm actually surprised you can get the iframe in the bs4-extracted content at all.

    – Roca
    Mar 26 at 19:46
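
The first comment's suggestion can be sketched roughly as follows. This is only an illustration, not the asker's actual code: the sample HTML string and the helper name find_embed_ids are assumptions made for the example, as is the choice to scan the raw HTML text instead of parsed elements. Scanning the raw text should also pick up iframes that appear only inside document.write(...) strings, as long as their quotes are not escaped. The dots in the prefix are escaped, and the trailing /* is replaced by a capture group that stops at the closing quote.

```python
import re
from urllib.parse import unquote

# Common prefix taken from the question; the dots are escaped so they match literally.
PREFIX = r"https?://fast\.player\.liquidplatform\.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/"

# Capture everything after the prefix, lazily, up to (not including) the closing quote.
EMBED_RE = re.compile(PREFIX + r'(.*?)(?=")')


def find_embed_ids(html):
    """Return the last src segment of every matching URL found in the raw HTML text."""
    ids = []
    for match in EMBED_RE.finditer(html):
        # unquote() turns %20 into a space; strip() then removes it.
        ids.append(unquote(match.group(1)).strip())
    return ids


# Hypothetical sample input built from the two iframes shown in the question.
sample_html = (
    '<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" width="490">'
    '<iframe allowfullscreen="" src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977" width="490"></iframe>'
)

print(find_embed_ids(sample_html))
```

With the sample above, both iframes yield the same 32-character value, because the %20 in the second src is decoded and stripped.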













2 Answers

A better regex, which captures the whole URL and allows any other attributes between the <iframe tag and the src attribute, is this:

<iframe .*?\bsrc="(https?://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/[^"]+)

Match using this regex and take your URL from group 1.

Here is your updated Python code:

import re
from bs4 import BeautifulSoup

regex_page_embed = r'<iframe .*?\bsrc="(https?://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/[^"]+)'
soup = BeautifulSoup(page_html, 'html.parser')  # page_html: the HTML string returned by the API
page_elements = list(soup.children)
for element in page_elements:
    s1 = re.search(regex_page_embed, str(element))
    if s1:
        print(s1.group(1))  # extract the URL from the first capture group






answered Mar 26 at 19:47
Pushpesh Kumar Rajwanshi
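
Since the question asks only for the last part of the src value, the URL captured in group 1 can be reduced to its final path segment. A minimal sketch, assuming the URL has already been captured as a string; the helper name last_segment is hypothetical, and only standard-library calls are used.

```python
from urllib.parse import unquote, urlsplit


def last_segment(url):
    """Return the final path segment of a URL, percent-decoded and stripped."""
    path = urlsplit(url).path                      # '/pApiv2/embed/<common part>/<last part>'
    segment = path.rstrip("/").rsplit("/", 1)[-1]  # text after the last '/'
    return unquote(segment).strip()                # also handles the '%20' in the second example


url = ("https://fast.player.liquidplatform.com/pApiv2/embed/"
       "e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977")
print(last_segment(url))  # → f2c5f6ca3a4610c55d70cb211ef9d977
```

Splitting the URL instead of extending the regex keeps the pattern short, and the same helper could be reused on a src value obtained any other way.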
























Use r'<iframe src="[^"]*/([^"]+)"' as the pattern for your search.

Example:

>>> import re
>>> text = """<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490">"""
>>> pat = r'<iframe src="[^"]*/([^"]+)"'
>>> search = re.search(pat, text)
>>> search[1]
'f2c5f6ca3a4610c55d70cb211ef9d977'






answered Mar 26 at 19:02
Russ Brown












  • I have edited my question and added a second iframe example. I forgot to mention that the HTML includes other kinds of iframes. Your answer would be correct if all iframes looked like the first example, but my page has other iframes that are completely different from the two examples I provided; the only part they have in common is the beginning of the src value.

    – fabiobh
    Mar 26 at 19:27
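
One possible way to address this comment is to let the pattern skip any attributes that come before src. The following is a hedged sketch and not part of the original answer: the pattern assumes double-quoted attribute values with no '>' inside them, and the sample strings are shortened versions of the iframes from the question.

```python
import re

# '<iframe', then any attributes (anything except '>'), then src="...";
# the group captures the text after the last '/' of the src value.
PAT = re.compile(r'<iframe\b[^>]*?\bsrc="[^"]*/([^"/]+)"')

# Shortened copies of the two iframes shown in the question.
samples = [
    '<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" width="490">',
    '<iframe allowfullscreen="" frameborder="0" height="276" scrolling="no" '
    'src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977" width="490"></iframe>',
]

for text in samples:
    m = PAT.search(text)
    if m:
        print(m.group(1))
```

For the second sample the captured value still starts with the literal %20; passing it through urllib.parse.unquote and then strip() would remove it. Like the answer's pattern, this matches any host, so it could be combined with the common URL prefix if only those embeds are wanted.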

















