Use regex with Python to get a specific part of the iframe src

I am trying to capture the src of an iframe so that I can change it. I don't have direct access to the HTML; I get the HTML from an API.

Some example iframes are shown below:

<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490">
<iframe allowfullscreen="" frameborder="0" height="276" mozallowfullscreen="" scrolling="no" src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490"></iframe>

I have many other types of iframes; the only part they have in common is this part of the src: https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302

I wrote the following code to find the element:

import re
from bs4 import BeautifulSoup

# some code (page_html is the HTML string returned by the API)
regex_page_embed = r"http.?://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/*"
soup = BeautifulSoup(page_html, 'html.parser')
page_elements = list(soup.children)
for element in page_elements:
    s1 = re.search(regex_page_embed, str(element))
    if s1:
        print(s1)
        print(s1.group())

After that I have more code that changes the HTML through the API; I don't think it is necessary to include it here.
But when I run:

print(s1)
print(s1.group())

I get the following result:

<_sre.SRE_Match object; span=(686, 771), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/
<_sre.SRE_Match object; span=(126, 211), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/
<_sre.SRE_Match object; span=(686, 771), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/
<_sre.SRE_Match object; span=(227, 312), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/

I want to get the last part of the iframe src. In the example below:

<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490">

f2c5f6ca3a4610c55d70cb211ef9d977 is the part that I want.

print(s1) and print(s1.group()) don't show the last part of the src. How can I get it?

edited Mar 26 at 19:35

asked Mar 26 at 18:45

fabiobh

161 reputation, 1 gold badge, 2 silver badges, 13 bronze badges

In the regex, change the star at the end to (.*?)(?=").

– Quixrick
Mar 26 at 18:49
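
To illustrate the suggestion in this comment, here is a minimal sketch. It assumes the sample tag from the question as input; the variable names `html`, `original`, and `modified` are only illustrative. The trailing `/*` in the original pattern means "zero or more slashes", so the match stops right after the slash. Swapping the star for a lazy capture group followed by a lookahead for the closing quote exposes the remainder as group 1.

```python
import re

# Sample tag copied from the question.
html = (
    '<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" '
    'webkitallowfullscreen="" width="490">'
)

# Original pattern: "/*" matches only slashes, so nothing after the
# last slash becomes part of the match.
original = re.search(
    r"http.?://fast.player.liquidplatform.com/pApiv2/embed/"
    r"e50a2b66dc19adc532f288eb4bf2d302/*",
    html,
)

# Suggested change: replace the star with a lazy group and a lookahead
# for the double quote that closes the src attribute.
modified = re.search(
    r"http.?://fast.player.liquidplatform.com/pApiv2/embed/"
    r'e50a2b66dc19adc532f288eb4bf2d302/(.*?)(?=")',
    html,
)

print(original.group())   # ends at the slash after ...d302
print(modified.group(1))  # the part after that slash
```

Note that `group()` without arguments still returns the whole match; the trailing part is only available through `group(1)`.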

Relevant read on parsing HTML content with regex: stackoverflow.com/a/1732454/9183344

– Roca
Mar 26 at 19:34

I'd just use bs4 to parse the iframe and then extract the src text content and go from there...

– Roca
Mar 26 at 19:36
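
A rough sketch of the parser-based approach suggested here. To keep it dependency-free it uses the standard library's `html.parser.HTMLParser` as a stand-in for bs4, which is a substitution and not what the comment names. The class name `IframeSrcCollector` and the `sample_html` string are hypothetical; the two tags are the examples from the question.

```python
from html.parser import HTMLParser


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# The two example tags from the question, joined into one string.
sample_html = (
    '<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" '
    'webkitallowfullscreen="" width="490">'
    '<iframe allowfullscreen="" frameborder="0" height="276" '
    'mozallowfullscreen="" scrolling="no" '
    'src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977" '
    'webkitallowfullscreen="" width="490"></iframe>'
)

collector = IframeSrcCollector()
collector.feed(sample_html)

for src in collector.sources:
    print(src)
```

Because the parser reads attributes by name, the position of `src` inside the tag does not matter. The src values are returned as written, so the `%20` in the second example is still present.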

I tried bs4 first to get the content, but I saw that I get more results with regex than with bs4. I investigated why this happens and found that some iframes are inserted into the page using JavaScript document.write. Only the regex was able to find those; bs4 can't find them.

– fabiobh
Mar 26 at 19:41
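
The situation described in this comment can be sketched as follows. The `raw_html` string is a made-up page, and the second ID (`0123456789abcdef0123456789abcdef`) is a hypothetical placeholder, not a value from the question. Running the pattern over the raw text finds the embed URL whether it appears in a real tag or inside a JavaScript string passed to `document.write`.

```python
import re

BASE = (
    "https://fast.player.liquidplatform.com/pApiv2/embed/"
    "e50a2b66dc19adc532f288eb4bf2d302/"
)

# Hypothetical page: one iframe in the markup, one written by a script.
raw_html = f"""<div><iframe src="{BASE}f2c5f6ca3a4610c55d70cb211ef9d977" width="490"></iframe></div>
<script>
document.write('<iframe src="{BASE}0123456789abcdef0123456789abcdef" width="490"></iframe>');
</script>
"""

# Escape the fixed prefix, then capture everything up to the next quote,
# backslash, whitespace, or angle bracket.
pattern = re.compile(
    r"https?://"
    + re.escape(
        "fast.player.liquidplatform.com/pApiv2/embed/"
        "e50a2b66dc19adc532f288eb4bf2d302/"
    )
    + r"""([^"'\s<>\\]+)"""
)

# With a single capture group, findall returns the captured parts only.
found = pattern.findall(raw_html)
print(found)
```

This searches text rather than document structure, so it cannot tell which tag or script a URL came from; it only reports that the URL is present.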

Ah right, since it's dynamic content you should be using a different module like selenium or requests-html. I'm actually surprised you are able to get the iframe in the bs4-extracted content at all.

– Roca
Mar 26 at 19:46


2 Answers

A better regex, which captures the whole URL while allowing any optional content between the <iframe tag and the src attribute, is this:

<iframe .*?\bsrc="(https?://fast\.player\.liquidplatform\.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/[^"]+)

Match using this regex and take your URL from group 1.

Here is your updated Python code:

import re
from bs4 import BeautifulSoup

regex_page_embed = r'<iframe .*?\bsrc="(https?://fast\.player\.liquidplatform\.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/[^"]+)'
soup = BeautifulSoup(page_html, 'html.parser')
page_elements = list(soup.children)
for element in page_elements:
    s1 = re.search(regex_page_embed, str(element))
    if s1:
        print(s1.group(1))  # extract the URL using the first group
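
As a self-contained check of this idea, the sketch below runs a close variant of the pattern against the two sample tags from the question. It is an assumption-laden variant, not the answer's exact regex: `.*?` is swapped for `[^>]*?` so the match cannot run past the end of the opening tag, and the names `pattern`, `samples`, and `urls` are illustrative.

```python
import re

# Variant of the answer's pattern: [^>]*? keeps the match inside one tag,
# \b makes sure "src" is a whole attribute name, dots are escaped.
pattern = re.compile(
    r'<iframe\s[^>]*?\bsrc="(https?://fast\.player\.liquidplatform\.com'
    r'/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/[^"]+)'
)

# The two example tags from the question.
samples = [
    '<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" '
    'webkitallowfullscreen="" width="490">',
    '<iframe allowfullscreen="" frameborder="0" height="276" '
    'mozallowfullscreen="" scrolling="no" '
    'src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977" '
    'webkitallowfullscreen="" width="490"></iframe>',
]

# group(1) holds the full URL, including the part after the last slash.
urls = [pattern.search(tag).group(1) for tag in samples]
for url in urls:
    print(url)
```

Both tags match even though `src` is the first attribute in one and the sixth in the other.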

answered Mar 26 at 19:47

Pushpesh Kumar Rajwanshi

17.1k reputation, 2 gold badges, 13 silver badges, 33 bronze badges

Use r'<iframe src="[^"]*/([^"]+)"' as the pattern for your search.

Example:

>>> import re
>>> text = """<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490">"""
>>> pat = r'<iframe src="[^"]*/([^"]+)"'
>>> search = re.search(pat, text)
>>> search[1]
'f2c5f6ca3a4610c55d70cb211ef9d977'

answered Mar 26 at 19:02

Russ Brown

151 reputation, 6 bronze badges

I have edited my question to include a second iframe example. I forgot to mention that the HTML contains other types of iframes. Your answer would be correct if all iframes looked like the first example. My page has other iframes that are completely different from the two examples I provided; the only common part is the iframe src content.

– fabiobh
Mar 26 at 19:27
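
Since the src value is the only stable part, one option is to work on the URL itself once it has been extracted by whatever means. The sketch below is one possible approach using `urllib.parse` from the standard library. The function name `last_src_part` is hypothetical, and decoding and stripping the `%20` seen in the second example is an assumption about what is wanted.

```python
from urllib.parse import unquote, urlsplit

EXPECTED_HOST = "fast.player.liquidplatform.com"
COMMON_PATH = "/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/"


def last_src_part(src):
    """Return the part of src after the common path, or None if src
    is not one of the expected embed URLs."""
    parts = urlsplit(src)
    if parts.netloc != EXPECTED_HOST or not parts.path.startswith(COMMON_PATH):
        return None
    tail = parts.path[len(COMMON_PATH):]
    # Assumption: a leading %20 is an accidental space, so decode and strip.
    return unquote(tail).strip() or None


first = (
    "https://fast.player.liquidplatform.com/pApiv2/embed/"
    "e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977"
)
second = (
    "https://fast.player.liquidplatform.com/pApiv2/embed/"
    "e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977"
)

print(last_src_part(first))
print(last_src_part(second))
print(last_src_part("https://example.com/other"))
```

Splitting the URL this way does not depend on how the surrounding iframe tag is written, which is the variation this comment describes.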
