Use Regex with Python to get a specific part of the iframe src
I am trying to capture the src content of an iframe so that I can change it. I don't have direct access to the HTML; I get it from an API.
You can see some iframe examples below:
<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490">
<iframe allowfullscreen="" frameborder="0" height="276" mozallowfullscreen="" scrolling="no" src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490"></iframe>
I have many other types of iframes; the only part they have in common is this part of the src content: https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302
I wrote the following code to find an element:
# some code (page_html is the HTML returned by the API)
import re
from bs4 import BeautifulSoup

regex_page_embed = r"http.?://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/*"
soup = BeautifulSoup(page_html, 'html.parser')
page_elements = list(soup.children)
for element in page_elements:
    # search the serialized element for the embed URL
    s1 = re.search(regex_page_embed, str(element))
    if s1:
        print(s1)
        print(s1.group())
After that I have more code that actually changes the HTML through the API; I don't think it is necessary to include it here.
But when I use:
print(s1)
print(s1.group())
I got the following result:
<_sre.SRE_Match object; span=(686, 771), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/
<_sre.SRE_Match object; span=(126, 211), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/
<_sre.SRE_Match object; span=(686, 771), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/
<_sre.SRE_Match object; span=(227, 312), match='https://fast.player.liquidplatform.com/pApiv2/emb>
https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/
I want to get the last part of the iframe src content. In the example below
<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490">
f2c5f6ca3a4610c55d70cb211ef9d977 is the part that I want.
print(s1) and print(s1.group()) don't show the last part of the src content. How can I get it?
python regex iframe
edited Mar 26 at 19:35
fabiobh
asked Mar 26 at 18:45
fabiobh
In the regex, change the star at the end to (.*?)(?=").
– Quixrick
Mar 26 at 18:49
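To illustrate that suggestion, here is a minimal sketch using only the first sample iframe from the question. The names PREFIX, html, original and fixed are made up for the illustration. The trailing /* in the question's pattern means "zero or more slashes", which seems to be why the match stops right after the common prefix; the lazy group followed by a lookahead captures everything up to the closing quote instead.

```python
import re

# Common part of the src, written as in the question (dots left unescaped on purpose).
PREFIX = r"http.?://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/"

html = ('<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/'
        'e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" '
        'webkitallowfullscreen="" width="490">')

# Question's pattern: PREFIX already ends in "/", so adding "*" gives "/*",
# i.e. zero or more slashes, and nothing after the slash is matched.
original = re.search(PREFIX + "*", html)

# Suggested change: lazily capture up to (but not including) the next double quote.
fixed = re.search(PREFIX + r'(.*?)(?=")', html)

print(original.group())  # ends at the slash after the common prefix
print(fixed.group(1))    # the part after the common prefix
```

Note that group(1) is the captured tail, while group() or group(0) is still the whole match.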
Relevant read on parsing html content with regex: stackoverflow.com/a/1732454/9183344
– Roca
Mar 26 at 19:34
I'd just use bs4 to parse the iframe and then extract the src text content and go from there...
– Roca
Mar 26 at 19:36
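A rough sketch of that parser-first idea follows. To keep it dependency-free it uses the standard library's html.parser rather than bs4 (with bs4 the equivalent would presumably be soup.find_all("iframe") and tag.get("src")). The class IframeSrcCollector and the helper last_segment are hypothetical names, and the two iframes are the samples from the question. An HTML parser only sees real tags, so iframes that exist only inside JavaScript strings would not be collected this way.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, unquote


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def last_segment(url):
    """Return the last path segment of url, percent-decoded and stripped."""
    path = urlsplit(url).path
    return unquote(path.rsplit("/", 1)[-1]).strip()


page_html = (
    '<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" '
    'webkitallowfullscreen="" width="490">'
    '<iframe allowfullscreen="" frameborder="0" height="276" scrolling="no" '
    'src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977" '
    'width="490"></iframe>'
)

collector = IframeSrcCollector()
collector.feed(page_html)
last_parts = [last_segment(src) for src in collector.sources]
print(last_parts)
```

The unquote and strip calls are there because the second sample has %20 (an encoded space) in front of the last part.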
I tried bs4 first, but I get more results with regex than with bs4. I investigated why and found that some iframes are inserted into the page using JavaScript document.write. Only the regex was able to find those; bs4 can't.
– fabiobh
Mar 26 at 19:41
Ah right, since it's dynamic content you should be using a different module like selenium or requests-html. I'm actually surprised you are able to get the iframe in the bs4 extracted content at all.
– Roca
Mar 26 at 19:46
2 Answers
A better regex for capturing the whole URL, while allowing any optional content between the <iframe tag and the src attribute, is this:
<iframe .*?\bsrc="(https?://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/[^"]+)
Match using this regex and capture your URL from group 1.
Here is your updated Python code:
import re
from bs4 import BeautifulSoup

# \b is a word boundary, so "src" is matched as a whole attribute name
regex_page_embed = r'<iframe .*?\bsrc="(https?://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/[^"]+)'
soup = BeautifulSoup(page_html, 'html.parser')
page_elements = list(soup.children)
for element in page_elements:
    s1 = re.search(regex_page_embed, str(element))
    if s1:
        print(s1.group(1))  # extract the URL using the first group
answered Mar 26 at 19:47
Pushpesh Kumar Rajwanshi
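As a quick check of this answer against both sample iframes from the question, here is a small sketch. The word boundary is written as \b, and escaping the dots in the host name is an added tightening, not part of the answer. The names samples, urls and last_parts are illustrative. Since group 1 is the whole URL, the last part is taken by splitting on the final slash.

```python
import re
from urllib.parse import unquote

# Answer's pattern with \b before src; the escaped dots are an assumption of this sketch.
pattern = re.compile(
    r'<iframe .*?\bsrc="(https?://fast\.player\.liquidplatform\.com/pApiv2/embed/'
    r'e50a2b66dc19adc532f288eb4bf2d302/[^"]+)'
)

samples = [
    '<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" '
    'webkitallowfullscreen="" width="490">',
    '<iframe allowfullscreen="" frameborder="0" height="276" scrolling="no" '
    'src="https://fast.player.liquidplatform.com/pApiv2/embed/'
    'e50a2b66dc19adc532f288eb4bf2d302/%20f2c5f6ca3a4610c55d70cb211ef9d977" '
    'width="490"></iframe>',
]

urls = []
last_parts = []
for html in samples:
    match = pattern.search(html)
    if match:
        url = match.group(1)  # whole URL, without the surrounding quotes
        urls.append(url)
        # keep only what follows the final slash; decode %20 and trim the space
        last_parts.append(unquote(url.rsplit("/", 1)[-1]).strip())

print(urls)
print(last_parts)
```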
Use r'<iframe src="[^"]*/([^"]+)"' as the pattern for your search.
Example:
>>> import re
>>> text = """<iframe src="https://fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/f2c5f6ca3a4610c55d70cb211ef9d977" webkitallowfullscreen="" width="490">"""
>>> pat = r'<iframe src="[^"]*/([^"]+)"'
>>> search = re.search(pat, text)
>>> search[1]
'f2c5f6ca3a4610c55d70cb211ef9d977'
answered Mar 26 at 19:02
Russ Brown
I have edited my question to include a second iframe example. I forgot to mention that the HTML includes other types of iframes. Your answer would be correct if all iframes followed the first example. My page has other iframes that are completely different from the two examples I provided; the only common part is the iframe src content.
– fabiobh
Mar 26 at 19:27
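If the only stable part really is the common src prefix, one option is to anchor the pattern on that prefix alone and ignore the rest of the tag. The sketch below is written under that assumption: COMMON_PATH, tail_pattern, raw_page and NEW_ID are hypothetical names, NEW_ID is a made-up replacement value, and the document.write line is an invented sample of a script-inserted iframe. It searches the raw text, so attribute order and the way the iframe was inserted do not matter, and re.sub with a function swaps the last part in place.

```python
import re

# Host and path shared by every src in the question.
COMMON_PATH = "fast.player.liquidplatform.com/pApiv2/embed/e50a2b66dc19adc532f288eb4bf2d302/"

# Group 1: scheme plus common prefix. Group 2: the tail, up to a quote,
# backslash, whitespace or angle bracket.
tail_pattern = re.compile(
    r"(https?://" + re.escape(COMMON_PATH) + r""")([^"'\\\s<>]+)"""
)

# Invented page: one plain iframe and one written by JavaScript.
raw_page = (
    '<iframe src="https://' + COMMON_PATH + 'f2c5f6ca3a4610c55d70cb211ef9d977" width="490">\n'
    + "<script>document.write('<iframe width=\"490\" src=\"https://"
    + COMMON_PATH
    + "%20f2c5f6ca3a4610c55d70cb211ef9d977\"></iframe>');</script>"
)

# Every tail found in the raw text, wherever it appears.
tails = [m.group(2) for m in tail_pattern.finditer(raw_page)]
print(tails)

NEW_ID = "0123456789abcdef0123456789abcdef"  # hypothetical replacement value

# A function is used as the replacement so the digits in NEW_ID cannot be
# misread as part of a group reference.
updated = tail_pattern.sub(lambda m: m.group(1) + NEW_ID, raw_page)
print(updated)
```

Matching on raw text trades robustness for reach: it finds script-inserted iframes, but it would also match the same URL inside comments or unrelated attributes.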