Print URL from two different BeautifulSoup outputs
I am scraping a few URLs in batch using BeautifulSoup.
Here is my script (only relevant stuff):
import urllib2
from bs4 import BeautifulSoup
quote_page = 'https://example.com/foo/bar'
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
url_box = soup.find('div', attrs={'class': 'player'})
print url_box
This prints one of two kinds of output, depending on the page's HTML (about half the pages give the first kind and the rest give the second).
Here's the first kind of output:
<div class="player">
<video class="video-js vjs-fluid video-player" height="100%" id="some-player" poster="https://example.com/path/to/jpg/random.jpg" width="100%"></video>
<span data-type="trailer-src" data-url="https://example.com/path/to/mp4/random.mp4"></span>
</div>
And here's the other:
<div class="player">
<img alt="Image description here" src="https://example.com/path/to/jpg/random.jpg"/>
</div>
I want to extract the image URL, which is the poster attribute in the first output and the src attribute in the second.
How can I do that so the same script extracts the URL from either kind of output?
P.S. The first output also has an mp4 link, which I do not need.
python web-scraping beautifulsoup
asked Mar 28 at 14:57
mumer91
Please, check "How to create a Minimal, Complete, and Verifiable example".
– accdias
Mar 28 at 15:00
The question only contains absolutely relevant parts of code. Anything not related to the question is already removed.
– mumer91
Mar 28 at 15:03
Your question could get more attention if you, for example, had posted the real URL. That way we could verify the code and give you better alternative snippets.
– accdias
Mar 28 at 15:07
Since you are new on SO, I would suggest reading "How to ask".
– accdias
Mar 28 at 15:09
I want to keep my URLs anonymous as I have my reasons. It should be fairly obvious as I have obviously replaced them before posting the question.
– mumer91
Mar 28 at 15:10
2 Answers
You can use the get() method to read the value of an attribute from the targeted tag.
You should be able to do something like this:
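# First layout: the image URL is the <video> tag's poster attribute.
if url_box.find('video'):
    url = url_box.find('video').get('poster')
    mp4 = url_box.find('span').get('data-url')  # optional; the question does not need it
# Second layout: the image URL is the <img> tag's src attribute.
if url_box.find('img'):
    url = url_box.find('img').get('src')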
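The same idea can be wrapped in one function that is safe to call on every page in a batch. This is only a sketch, not the answerer's code: it assumes Python 3 with the bs4 package installed, the function name extract_image_url and the shortened sample HTML are made up for illustration, and it returns None when the div or the attribute is missing.

```python
from bs4 import BeautifulSoup


def extract_image_url(url_box):
    """Return the poster or src URL inside a player div, or None if absent."""
    if url_box is None:  # soup.find() returns None when nothing matches
        return None
    video = url_box.find('video')
    if video is not None and video.get('poster'):
        return video.get('poster')
    img = url_box.find('img')
    if img is not None:
        return img.get('src')  # get() returns None if src is missing
    return None


# Shortened copies of the two layouts shown in the question.
video_html = '''<div class="player">
<video class="video-js" poster="https://example.com/path/to/jpg/random.jpg"></video>
<span data-type="trailer-src" data-url="https://example.com/path/to/mp4/random.mp4"></span>
</div>'''
image_html = '''<div class="player">
<img alt="Image description here" src="https://example.com/path/to/jpg/random.jpg"/>
</div>'''

for html in (video_html, image_html):
    soup = BeautifulSoup(html, 'html.parser')
    print(extract_image_url(soup.find('div', attrs={'class': 'player'})))
```

Both sample layouts should yield the same jpg URL. Returning None instead of raising lets a batch loop skip pages that match neither layout.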
answered Mar 28 at 15:11
Maaz
This worked great. Thanks.
– mumer91
Mar 28 at 15:33
Decide which version you are dealing with and split accordingly:
firstVersion = '''<div class="player">
<video class="video-js vjs-fluid video-player" height="100%" id="some-player" poster="https://example.com/path/to/jpg/random.jpg" width="100%"></video>
<span data-type="trailer-src" data-url="https://example.com/path/to/mp4/random.mp4"></span>
</div>'''
secondVersion = '''<div class="player">
<img alt="Image description here" src="https://example.com/path/to/jpg/random.jpg"/>
</div>'''
def extractImageUrl(htmlInput):
    imageUrl = ""
    if "poster" in htmlInput:
        imageUrl = htmlInput.split('poster="')[1].split('"')[0]
    elif "src" in htmlInput:
        imageUrl = htmlInput.split('src="')[1].split('"')[0]
    return imageUrl
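A short usage sketch of the string-splitting approach follows, using only the standard library. It is a slightly more defensive variant rather than the answer's exact function: the name extract_image_url_from_text is hypothetical, and it tests for the full poster=" or src=" marker so that the split cannot fail with an IndexError. Note that this approach needs a string, so a BeautifulSoup tag such as url_box would first have to be converted with str(url_box); it also assumes the attribute values are double-quoted.

```python
def extract_image_url_from_text(html_input):
    """Return the poster or src URL found in an HTML string, or '' if neither exists."""
    # Check poster first: the video layout is the one that carries it.
    for marker in ('poster="', 'src="'):
        if marker in html_input:
            # Text after the marker, up to the closing double quote.
            return html_input.split(marker)[1].split('"')[0]
    return ""


first_version = '''<div class="player">
<video class="video-js vjs-fluid video-player" height="100%" id="some-player" poster="https://example.com/path/to/jpg/random.jpg" width="100%"></video>
<span data-type="trailer-src" data-url="https://example.com/path/to/mp4/random.mp4"></span>
</div>'''
second_version = '''<div class="player">
<img alt="Image description here" src="https://example.com/path/to/jpg/random.jpg"/>
</div>'''

print(extract_image_url_from_text(first_version))
print(extract_image_url_from_text(second_version))
print(repr(extract_image_url_from_text('<div class="player"></div>')))
```

String splitting depends on the exact quoting and attribute spelling in the markup, so it is more fragile than reading attributes from parsed tags.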
answered Mar 28 at 15:10
drumino