
Print URL from two different BeautifulSoup outputs































































I am scraping a few URLs in batch using BeautifulSoup.

Here is my script (only the relevant parts):



import urllib2
from bs4 import BeautifulSoup

quote_page = 'https://example.com/foo/bar'
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
# Find the first <div class="player"> on the page
url_box = soup.find('div', attrs={'class': 'player'})
print url_box


This prints one of two kinds of output, depending on the HTML of the page (about half of the pages give the first kind and the rest give the second).

Here's the first kind of output:



<div class="player">
<video class="video-js vjs-fluid video-player" height="100%" id="some-player" poster="https://example.com/path/to/jpg/random.jpg" width="100%"></video>
<span data-type="trailer-src" data-url="https://example.com/path/to/mp4/random.mp4"></span>
</div>


And here's the other:



<div class="player">
<img alt="Image description here" src="https://example.com/path/to/jpg/random.jpg"/>
</div>


I want to extract the image URL, which is the poster attribute in the first case and the src attribute in the second.

How can I make the same script extract that URL from either kind of output?

P.S. The first output also contains an mp4 link, which I do not need.







python web-scraping beautifulsoup
















asked Mar 28 at 14:57









mumer91















  • Please, check "How to create a Minimal, Complete, and Verifiable example".

    – accdias
    Mar 28 at 15:00






  • The question only contains absolutely relevant parts of code. Anything not related to the question is already removed.

    – mumer91
    Mar 28 at 15:03











  • Your question could get more attention if you had, for example, posted the real URL. That way we could verify the code and give you better alternative snippets.

    – accdias
    Mar 28 at 15:07












  • Since you are new on SO, I would suggest reading "How to ask".

    – accdias
    Mar 28 at 15:09












  • I want to keep my URLs anonymous as I have my reasons. It should be fairly obvious as I have obviously replaced them before posting the question.

    – mumer91
    Mar 28 at 15:10































2 Answers































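You can use a tag's get() method to read the value of an attribute from that tag.

You should be able to do something like this:

if url_box.find('video'):
    # First layout: the image URL is the <video> tag's poster attribute
    url = url_box.find('video').get('poster')
    # Optional: the mp4 link, in case it is ever needed
    mp4 = url_box.find('span').get('data-url')
if url_box.find('img'):
    # Second layout: the image URL is the <img> tag's src attribute
    url = url_box.find('img').get('src')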
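As a rough sketch, the same idea can be wrapped in one reusable helper. It assumes Python 3 and the bs4 package; the function name extract_image_url and the two sample strings are made up for illustration and are not part of the original script. It returns None when neither layout is found, so a batch run can skip pages without a player div:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the two layouts shown in the question.
VIDEO_HTML = '''<div class="player">
<video class="video-js vjs-fluid video-player" height="100%" id="some-player" poster="https://example.com/path/to/jpg/random.jpg" width="100%"></video>
<span data-type="trailer-src" data-url="https://example.com/path/to/mp4/random.mp4"></span>
</div>'''

IMG_HTML = '''<div class="player">
<img alt="Image description here" src="https://example.com/path/to/jpg/random.jpg"/>
</div>'''


def extract_image_url(html):
    """Return the image URL from either player layout, or None if absent.

    Hypothetical helper: the name and the None fallback are assumptions.
    """
    soup = BeautifulSoup(html, 'html.parser')
    url_box = soup.find('div', attrs={'class': 'player'})
    if url_box is None:
        return None
    # First layout: <video poster="...">
    video = url_box.find('video')
    if video is not None and video.get('poster'):
        return video.get('poster')
    # Second layout: <img src="...">
    img = url_box.find('img')
    if img is not None:
        return img.get('src')
    return None


print(extract_image_url(VIDEO_HTML))
print(extract_image_url(IMG_HTML))
```

Checking for the video tag first and returning early means the mp4 link in the span is never touched, which matches the P.S. in the question.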
















answered Mar 28 at 15:11

Maaz















  • This worked great. Thanks.

    – mumer91
    Mar 28 at 15:33






























Decide which version you are dealing with and split accordingly:

firstVersion = '''<div class="player">
<video class="video-js vjs-fluid video-player" height="100%" id="some-player" poster="https://example.com/path/to/jpg/random.jpg" width="100%"></video>
<span data-type="trailer-src" data-url="https://example.com/path/to/mp4/random.mp4"></span>
</div>'''

secondVersion = '''<div class="player">
<img alt="Image description here" src="https://example.com/path/to/jpg/random.jpg"/>
</div>'''

def extractImageUrl(htmlInput):
    # Returns the poster URL if present, otherwise the img src, otherwise ""
    imageUrl = ""
    if "poster" in htmlInput:
        # Take the text between poster=" and the next double quote
        imageUrl = htmlInput.split('poster="')[1].split('"')[0]
    elif "src" in htmlInput:
        # Take the text between src=" and the next double quote
        imageUrl = htmlInput.split('src="')[1].split('"')[0]
    return imageUrl
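The split-based version relies on the exact quoting of the attributes and on the words poster and src not appearing elsewhere in the markup (the first layout, for instance, also contains "trailer-src"). A slightly more tolerant variant can be sketched with the standard library's html.parser, which needs no third-party package. This assumes Python 3; the names PosterOrSrcParser and extract_image_url_stdlib are hypothetical and chosen only for this example:

```python
from html.parser import HTMLParser


class PosterOrSrcParser(HTMLParser):
    """Records the first <video poster=...> and the first <img src=...>."""

    def __init__(self):
        super().__init__()
        self.poster = None
        self.src = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attr_map = dict(attrs)
        if tag == 'video' and self.poster is None:
            self.poster = attr_map.get('poster')
        elif tag == 'img' and self.src is None:
            self.src = attr_map.get('src')


def extract_image_url_stdlib(html_input):
    """Return the poster URL if present, else the img src, else ''."""
    parser = PosterOrSrcParser()
    parser.feed(html_input)
    parser.close()
    return parser.poster or parser.src or ''


first_version = ('<div class="player"><video poster="https://example.com/path/to/jpg/random.jpg"></video>'
                 '<span data-type="trailer-src" data-url="https://example.com/path/to/mp4/random.mp4"></span></div>')
second_version = ('<div class="player">'
                  '<img alt="Image description here" src="https://example.com/path/to/jpg/random.jpg"/></div>')

print(extract_image_url_stdlib(first_version))
print(extract_image_url_stdlib(second_version))
```

Because the parser inspects attributes per tag, a data-url or data-type value on the span cannot be mistaken for the image URL.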

















answered Mar 28 at 15:10

drumino






























