Looking for a better solution to scrape multiple webpages with beautifulsoup

I am trying to scrape the results of sports games off a website. The website has the results of all the games, which is perfect, but they are spread over many pages. Each page represents one day, and I am gathering data over many months of games, so it will be quite a lot of URLs to enter.

The way I have set it up now, I have a base URL and a list of dates that I append to it in a for loop. This works fine, but I was curious whether there is a better way before I enter the many dates I will be scraping.



import requests
from bs4 import BeautifulSoup

url = 'http://www.url.com?'

# this list would hold hundreds of dates
dates = ['month=11&day=1&year=2016', 'month=11&day=2&year=2016', ...]

for date_query in dates:
    page = requests.get(url + date_query)
    soup = BeautifulSoup(page.text, 'html.parser')

    # and so on, this part works as intended









  • What you have already looks OK. There are a few improvements that could make the code more readable, but they are not essential. What bothers you about your current implementation?

    – Rafael
    Mar 23 at 19:47











  • What's bothering me is that I would have to manually enter hundreds of dates into the dates list. I was just wondering if there was a more efficient way around that; if not, I will do it. Any other tips for improvement would be appreciated!

    – Gmuersch
    Mar 23 at 19:49












  • Well, you are only interested in specific dates, right? Do you have some constraints on the dates? You could automate the creation of the dates, but I am not sure if you want to do this.

    – Rafael
    Mar 23 at 19:50












  • The dates have to be in the format of the dates list. As long as there isn't an obvious solution to automate this, I don't mind entering them. I just didn't want to do it this way if it would be considered bad practice.

    – Gmuersch
    Mar 23 at 19:55











  • Do the pages have some reference to where the next page is, like a "next issue" link or something?

    – yorodm
    Mar 23 at 19:55

















python beautifulsoup






asked Mar 23 at 19:40 by Gmuersch
edited Mar 23 at 19:50 by Gmuersch












1 Answer

If you really want to request every single day, datetime and timedelta can be used to iterate over all the possible days. Give it a starting date, then advance it one day at a time until an end date (which could be datetime.now() for today):



from datetime import datetime, timedelta

import requests
from bs4 import BeautifulSoup

# the {} placeholders are filled in by str.format() below
base_url = "http://www.url.com?month={}&day={}&year={}"

search_date = datetime(2016, 11, 1)
end_date = datetime(2017, 1, 1)
one_day = timedelta(days=1)

while search_date < end_date:
    url = base_url.format(search_date.month, search_date.day, search_date.year)
    print(url)

    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')

    search_date += one_day


This would give you something like:



http://www.url.com?month=11&day=1&year=2016
http://www.url.com?month=11&day=2&year=2016
http://www.url.com?month=11&day=3&year=2016
http://www.url.com?month=11&day=4&year=2016
.
.
.
http://www.url.com?month=12&day=29&year=2016
http://www.url.com?month=12&day=30&year=2016
http://www.url.com?month=12&day=31&year=2016
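
The same iteration could also be wrapped in a small generator, which keeps the URL building separate from the scraping. This is only a sketch: `daily_urls` is a hypothetical helper name, `http://www.url.com` is the placeholder host from the question, and the `month`, `day` and `year` parameter names are taken from the question's example URLs.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE_URL = "http://www.url.com"  # placeholder host from the question


def daily_urls(start, end):
    """Yield one URL per day from start (inclusive) to end (exclusive)."""
    for offset in range((end - start).days):
        day = start + timedelta(days=offset)
        # urlencode turns the mapping into "month=..&day=..&year=.."
        query = urlencode({"month": day.month, "day": day.day, "year": day.year})
        yield f"{BASE_URL}?{query}"
```

Something like `for url in daily_urls(date(2016, 11, 1), date(2017, 1, 1)):` would then drive the existing `requests.get(url)` and `BeautifulSoup` calls.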


A better approach, though, would be to follow the "next" link on your page. For this, the URL of the actual page would be needed. BeautifulSoup could then be used to extract the link.
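
As a rough illustration of the "next link" idea, the sketch below pulls the target of a link out of a page and resolves it against that page's URL. The real page's markup is not shown in the question, so the assumption that the link is an `<a>` element whose text is exactly "Next" is purely hypothetical, as are the helper name `find_next_url` and the sample HTML.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_next_url(html, page_url):
    """Return the absolute URL of the "Next" link, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # ASSUMPTION: the link text is exactly "Next"; adjust this to the real page.
    link = soup.find("a", string="Next")
    if link is None or not link.get("href"):
        return None
    # The href may be relative, so resolve it against the current page URL.
    return urljoin(page_url, link["href"])


# Hypothetical markup standing in for one day's results page
sample_html = '<p>scores</p><a href="?month=11&amp;day=2&amp;year=2016">Next</a>'
next_url = find_next_url(sample_html, "http://www.url.com/?month=11&day=1&year=2016")
print(next_url)
```

A scraping loop could keep requesting `next_url` until `find_next_url` returns `None`, so no list of dates is needed at all.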






answered Apr 3 at 10:11 by Martin Evans




























