Print URL from two different BeautifulSoup outputs

I am scraping a few URLs in batch using BeautifulSoup.

Here is my script (only relevant stuff):

import urllib2
from bs4 import BeautifulSoup
quote_page = 'https://example.com/foo/bar'
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
url_box = soup.find('div', attrs={'class': 'player'})
print url_box

This prints one of two different kinds of output, depending on the HTML of the page (about half of the pages give the first kind and the rest give the second).

Here's the first kind of output:

<div class="player">
<video class="video-js vjs-fluid video-player" height="100%" id="some-player" poster="https://example.com/path/to/jpg/random.jpg" width="100%"></video>
<span data-type="trailer-src" data-url="https://example.com/path/to/mp4/random.mp4"></span>
</div>

And here's the other:

<div class="player">
<img alt="Image description here" src="https://example.com/path/to/jpg/random.jpg"/>
</div>

I want to extract the image URL, which is the poster attribute in the first case and the src attribute in the second.

Any ideas how I can make the same script extract that URL from either kind of output?

P.S. The first output also contains an mp4 link, which I do not need.

asked Mar 28 at 14:57

mumer91

235 bronze badges

Please, check "How to create a Minimal, Complete, and Verifiable example".

– accdias
Mar 28 at 15:00

The question only contains absolutely relevant parts of code. Anything not related to the question is already removed.

– mumer91
Mar 28 at 15:03

Your question could get more attention if you, for example, had posted the real URL. That way we could verify the code and give you better alternative snippets.

– accdias
Mar 28 at 15:07

Since you are new on SO, I would suggest reading "How to ask".

– accdias
Mar 28 at 15:09

I want to keep my URLs anonymous as I have my reasons. It should be fairly obvious as I have obviously replaced them before posting the question.

– mumer91
Mar 28 at 15:10

python web-scraping beautifulsoup

2 Answers

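You can use the get() method to read the value of an attribute from the targeted tag.

You should be able to do something like this:

# The video layout stores the image URL in the poster attribute.
if url_box.find('video'):
    url = url_box.find('video').get('poster')
    mp4 = url_box.find('span').get('data-url')  # trailer link, if you need it
# The image layout stores the image URL in the src attribute.
if url_box.find('img'):
    url = url_box.find('img').get('src')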
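
As a self-contained sketch of the same idea, the checks can be wrapped in one function. This assumes Python 3 and the bs4 package. The function name extract_image_url and the two constants are hypothetical, and the sample HTML is copied from the question.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (both layouts).
VIDEO_HTML = '''<div class="player">
<video class="video-js vjs-fluid video-player" height="100%" id="some-player" poster="https://example.com/path/to/jpg/random.jpg" width="100%"></video>
<span data-type="trailer-src" data-url="https://example.com/path/to/mp4/random.mp4"></span>
</div>'''

IMG_HTML = '''<div class="player">
<img alt="Image description here" src="https://example.com/path/to/jpg/random.jpg"/>
</div>'''


def extract_image_url(html):
    """Return the poster or src URL inside div.player, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    url_box = soup.find('div', attrs={'class': 'player'})
    if url_box is None:
        return None  # the page has no player div at all

    # Video layout: the image URL is the poster attribute.
    video = url_box.find('video')
    if video is not None and video.get('poster'):
        return video.get('poster')

    # Image layout: the image URL is the src attribute.
    img = url_box.find('img')
    if img is not None:
        return img.get('src')
    return None


print(extract_image_url(VIDEO_HTML))
print(extract_image_url(IMG_HTML))
```

Returning None when nothing matches lets a batch loop skip pages that have neither layout instead of raising an AttributeError.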

answered Mar 28 at 15:11

Maaz

1,463 · 1 gold badge · 8 silver badges · 15 bronze badges

This worked great. Thanks.

– mumer91
Mar 28 at 15:33

Decide which version you are dealing with and split accordingly:


firstVersion = '''<div class="player">
<video class="video-js vjs-fluid video-player" height="100%" id="some-player" poster="https://example.com/path/to/jpg/random.jpg" width="100%"></video>
<span data-type="trailer-src" data-url="https://example.com/path/to/mp4/random.mp4"></span>
</div>'''

secondVersion = '''<div class="player">
<img alt="Image description here" src="https://example.com/path/to/jpg/random.jpg"/>
</div>'''

def extractImageUrl(htmlInput):
    imageUrl = ""
    if "poster" in htmlInput:
        imageUrl = htmlInput.split('poster="')[1].split('"')[0]
    elif "src" in htmlInput:
        imageUrl = htmlInput.split('src="')[1].split('"')[0]
    return imageUrl
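
String splitting depends on the exact quoting and spelling of the markup. If you would rather not rely on that, a hedged alternative is to let a parser read the attributes. The sketch below swaps in the standard-library html.parser module (Python 3) instead of split(). The class ImageUrlParser and the function extract_image_url are hypothetical names, not part of the answer above.

```python
from html.parser import HTMLParser


class ImageUrlParser(HTMLParser):
    """Remember the first video poster or img src seen in the markup."""

    def __init__(self):
        super().__init__()
        self.image_url = None

    def handle_starttag(self, tag, attrs):
        if self.image_url is not None:
            return  # keep the first match only
        attributes = dict(attrs)
        if tag == 'video' and attributes.get('poster'):
            self.image_url = attributes['poster']
        elif tag == 'img' and attributes.get('src'):
            self.image_url = attributes['src']


def extract_image_url(html_input):
    """Return the image URL from either layout, or None if there is none."""
    parser = ImageUrlParser()
    parser.feed(html_input)
    return parser.image_url


# Works for both layouts from the question, regardless of quote style.
print(extract_image_url('<div class="player"><img alt="x" src=\'a.jpg\'/></div>'))
```

Because the parser only looks at video and img tags, the data-url on the span (the mp4 link) is never picked up by accident.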

answered Mar 28 at 15:10

drumino

654 bronze badges
