Looking for a better solution to scrape multiple webpages with beautifulsoupScraping with BeautifulSoup and multiple paragraphsWeb scraping daily tables into CSV with BeautifulSoup in PythonScraping data from multiple previous dates in WSJ stocks websiteSimulate clicking a link when scraping with Python and BeautifulSoupBeautifulSoup Scraping: loading div instead of the contentScraping URLs using BeautifulSoupHow to scrape multiple pages with an unchanging URL - Python & BeautifulSoupHow can I scrape a site with multiple pages using beautifulsoup and python?Scrape table from HTML with beautifulsoupScrape results of multiple search pages with Beautiful Soup and Python

How should I mix small caps with digits or symbols?

tikz: 5 squares on a row, roman numbered 1 -> 5

Separate the element after every 2nd ',' and push into next row in bash

How to play vs. 1.e4 e5 2.Nf3 Nc6 3.Bc4 d6?

Parse a C++14 integer literal

How would a physicist explain this starship engine?

Is being an extrovert a necessary condition to be a manager?

Are there any nuances between "dismiss" and "ignore"?

Keeping the dodos out of the field

pwaS eht tirsf dna tasl setterl fo hace dorw

Difference in 1 user doing 1000 iterations and 1000 users doing 1 iteration in Load testing

What are the domains of the multiplication and unit morphisms of a monoid object?

Best practice for printing and evaluating formulas with the minimal coding

US F1 Visa grace period attending a conference

How do we explain the use of a software on a math paper?

List of lists elementwise greater/smaller than

How is dynamic resistance of a diode modeled for large voltage variations?

Department head said that group project may be rejected. How to mitigate?

400–430 degrees Celsius heated bath

What is metrics.roc_curve and metrics.auc measuring when I'm comparing binary data with probability estimates?

Gambler's Fallacy Dice

Salesforce bug enabled "Modify All"

Way of refund if scammed?

How can sister protect herself from impulse purchases with a credit card?

Looking for a better solution to scrape multiple webpages with beautifulsoup

Scraping with BeautifulSoup and multiple paragraphsWeb scraping daily tables into CSV with BeautifulSoup in PythonScraping data from multiple previous dates in WSJ stocks websiteSimulate clicking a link when scraping with Python and BeautifulSoupBeautifulSoup Scraping: loading div instead of the contentScraping URLs using BeautifulSoupHow to scrape multiple pages with an unchanging URL - Python & BeautifulSoupHow can I scrape a site with multiple pages using beautifulsoup and python?Scrape table from HTML with beautifulsoupScrape results of multiple search pages with Beautiful Soup and Python

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;

I am trying to scrape the results of sports game off a website. The website has all the results of all the games which is perfect, but they are on many pages. Each page represents one day and I am gathering data over many months of games so it will be quite a lot of urls to enter.

The way I set it up now is that i have a base url, and a list of the dates that I can append using a for loop. This way works fine, but I was curious if there was a better way before I enter the many many dates I will be scraping over.

 url = 'http://www.url.com?'

 #this list would hold hundreds of dates
 dates = ['month=11&day=1&year=2016', 'month=11&day=2&year=2016', ...]

 for i in dates:
 page = requests.get(url+i)
 soup = BeautifulSoup(page.text, 'html.parser')

 #and so on, this part works as intended

edited Mar 23 at 19:50

asked Mar 23 at 19:40

Gmuersch

I mean, what you have looks already ok. What is the thing that bothers you with your current implementation? There are few improvements that can be done to improve the readability of the code, but it's not that essential. What's bothering you with your current implementation?

– Rafael
Mar 23 at 19:47

Whats bothering me is that I would have to manually enter hundreds of dates into the date list. I was just wondering if their was a more efficient way around that, If not then I would do it. any other tips for improvement would be appreciated!

– Gmuersch
Mar 23 at 19:49

Well, you are only interested in specific dates right? Do you have some constrains on the dates? Because you could automate the creation of the dates but I am not sure if you want to do this.

– Rafael
Mar 23 at 19:50

The dates have to be in the format of the dates list. As long as there isnt an obvious solution to automate this, I dont mind entering them. I just didnt want to do it this way if it would be considered bad practice.

– Gmuersch
Mar 23 at 19:55

Does the pages have some references to where the next page is, like a "next issue" link or something?

– yorodm
Mar 23 at 19:55

|
show 1 more comment

 url = 'http://www.url.com?'

 #this list would hold hundreds of dates
 dates = ['month=11&day=1&year=2016', 'month=11&day=2&year=2016', ...]

 for i in dates:
 page = requests.get(url+i)
 soup = BeautifulSoup(page.text, 'html.parser')

 #and so on, this part works as intended

edited Mar 23 at 19:50

asked Mar 23 at 19:40

Gmuersch

I mean, what you have looks already ok. What is the thing that bothers you with your current implementation? There are few improvements that can be done to improve the readability of the code, but it's not that essential. What's bothering you with your current implementation?

– Rafael
Mar 23 at 19:47

Whats bothering me is that I would have to manually enter hundreds of dates into the date list. I was just wondering if their was a more efficient way around that, If not then I would do it. any other tips for improvement would be appreciated!

– Gmuersch
Mar 23 at 19:49

Well, you are only interested in specific dates right? Do you have some constrains on the dates? Because you could automate the creation of the dates but I am not sure if you want to do this.

– Rafael
Mar 23 at 19:50

The dates have to be in the format of the dates list. As long as there isnt an obvious solution to automate this, I dont mind entering them. I just didnt want to do it this way if it would be considered bad practice.

– Gmuersch
Mar 23 at 19:55

Does the pages have some references to where the next page is, like a "next issue" link or something?

– yorodm
Mar 23 at 19:55

|
show 1 more comment

 url = 'http://www.url.com?'

 #this list would hold hundreds of dates
 dates = ['month=11&day=1&year=2016', 'month=11&day=2&year=2016', ...]

 for i in dates:
 page = requests.get(url+i)
 soup = BeautifulSoup(page.text, 'html.parser')

 #and so on, this part works as intended

edited Mar 23 at 19:50

asked Mar 23 at 19:40

Gmuersch

 url = 'http://www.url.com?'

 #this list would hold hundreds of dates
 dates = ['month=11&day=1&year=2016', 'month=11&day=2&year=2016', ...]

 for i in dates:
 page = requests.get(url+i)
 soup = BeautifulSoup(page.text, 'html.parser')

 #and so on, this part works as intended

python beautifulsoup

edited Mar 23 at 19:50

asked Mar 23 at 19:40

Gmuersch

edited Mar 23 at 19:50

asked Mar 23 at 19:40

Gmuersch

edited Mar 23 at 19:50

asked Mar 23 at 19:40

Gmuersch

asked Mar 23 at 19:40

Gmuersch

asked Mar 23 at 19:40

Gmuersch

I mean, what you have looks already ok. What is the thing that bothers you with your current implementation? There are few improvements that can be done to improve the readability of the code, but it's not that essential. What's bothering you with your current implementation?

– Rafael
Mar 23 at 19:47

Whats bothering me is that I would have to manually enter hundreds of dates into the date list. I was just wondering if their was a more efficient way around that, If not then I would do it. any other tips for improvement would be appreciated!

– Gmuersch
Mar 23 at 19:49

Well, you are only interested in specific dates right? Do you have some constrains on the dates? Because you could automate the creation of the dates but I am not sure if you want to do this.

– Rafael
Mar 23 at 19:50

The dates have to be in the format of the dates list. As long as there isnt an obvious solution to automate this, I dont mind entering them. I just didnt want to do it this way if it would be considered bad practice.

– Gmuersch
Mar 23 at 19:55

Does the pages have some references to where the next page is, like a "next issue" link or something?

– yorodm
Mar 23 at 19:55

|
show 1 more comment

I mean, what you have looks already ok. What is the thing that bothers you with your current implementation? There are few improvements that can be done to improve the readability of the code, but it's not that essential. What's bothering you with your current implementation?

– Rafael
Mar 23 at 19:47

Whats bothering me is that I would have to manually enter hundreds of dates into the date list. I was just wondering if their was a more efficient way around that, If not then I would do it. any other tips for improvement would be appreciated!

– Gmuersch
Mar 23 at 19:49

Well, you are only interested in specific dates right? Do you have some constrains on the dates? Because you could automate the creation of the dates but I am not sure if you want to do this.

– Rafael
Mar 23 at 19:50

The dates have to be in the format of the dates list. As long as there isnt an obvious solution to automate this, I dont mind entering them. I just didnt want to do it this way if it would be considered bad practice.

– Gmuersch
Mar 23 at 19:55

Does the pages have some references to where the next page is, like a "next issue" link or something?

– yorodm
Mar 23 at 19:55

I mean, what you have looks already ok. What is the thing that bothers you with your current implementation? There are few improvements that can be done to improve the readability of the code, but it's not that essential. What's bothering you with your current implementation?

– Rafael
Mar 23 at 19:47

Whats bothering me is that I would have to manually enter hundreds of dates into the date list. I was just wondering if their was a more efficient way around that, If not then I would do it. any other tips for improvement would be appreciated!

– Gmuersch
Mar 23 at 19:49

Well, you are only interested in specific dates right? Do you have some constrains on the dates? Because you could automate the creation of the dates but I am not sure if you want to do this.

– Rafael
Mar 23 at 19:50

The dates have to be in the format of the dates list. As long as there isnt an obvious solution to automate this, I dont mind entering them. I just didnt want to do it this way if it would be considered bad practice.

– Gmuersch
Mar 23 at 19:55

Does the pages have some references to where the next page is, like a "next issue" link or something?

– yorodm
Mar 23 at 19:55

|
show 1 more comment

1 Answer
1

active

oldest

votes

If you really wish to search on every single day, then datetime and timedelta can be used to iterate over all the possible days. Give it a starting date, this can then be advanced one day at a time until an end date (which could be datetime.now() for today):

from datetime import datetime, timedelta

base_url = "http://www.url.com?month=&day=&year="

search_date = datetime(2016, 11, 1)
end_date = datetime(2017, 1, 1)
one_day = timedelta(days=1)

while search_date < end_date:
 url = base_url.format(search_date.month, search_date.day, search_date.year)
 print(url)

 page = requests.get(url)
 soup = BeautifulSoup(page.text, 'html.parser')

 search_date += one_day

This would give you something like:

http://www.url.com?month=11&day=1&year=2016
http://www.url.com?month=11&day=2&year=2016
http://www.url.com?month=11&day=3&year=2016
http://www.url.com?month=11&day=4&year=2016
.
.
.
http://www.url.com?month=12&day=29&year=2016
http://www.url.com?month=12&day=30&year=2016
http://www.url.com?month=12&day=31&year=2016

A better approach though would be to make use of the next link on your page. For this the URL of the actual page would be needed. BeautifulSoup could then be used to easily extract the link.

answered Apr 3 at 10:11

Martin Evans

29.1k133357

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55317655%2flooking-for-a-better-solution-to-scrape-multiple-webpages-with-beautifulsoup%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

from datetime import datetime, timedelta

base_url = "http://www.url.com?month=&day=&year="

search_date = datetime(2016, 11, 1)
end_date = datetime(2017, 1, 1)
one_day = timedelta(days=1)

while search_date < end_date:
 url = base_url.format(search_date.month, search_date.day, search_date.year)
 print(url)

 page = requests.get(url)
 soup = BeautifulSoup(page.text, 'html.parser')

 search_date += one_day

This would give you something like:

http://www.url.com?month=11&day=1&year=2016
http://www.url.com?month=11&day=2&year=2016
http://www.url.com?month=11&day=3&year=2016
http://www.url.com?month=11&day=4&year=2016
.
.
.
http://www.url.com?month=12&day=29&year=2016
http://www.url.com?month=12&day=30&year=2016
http://www.url.com?month=12&day=31&year=2016

A better approach though would be to make use of the next link on your page. For this the URL of the actual page would be needed. BeautifulSoup could then be used to easily extract the link.

answered Apr 3 at 10:11

Martin Evans

29.1k133357

add a comment |

from datetime import datetime, timedelta

base_url = "http://www.url.com?month=&day=&year="

search_date = datetime(2016, 11, 1)
end_date = datetime(2017, 1, 1)
one_day = timedelta(days=1)

while search_date < end_date:
 url = base_url.format(search_date.month, search_date.day, search_date.year)
 print(url)

 page = requests.get(url)
 soup = BeautifulSoup(page.text, 'html.parser')

 search_date += one_day

This would give you something like:

http://www.url.com?month=11&day=1&year=2016
http://www.url.com?month=11&day=2&year=2016
http://www.url.com?month=11&day=3&year=2016
http://www.url.com?month=11&day=4&year=2016
.
.
.
http://www.url.com?month=12&day=29&year=2016
http://www.url.com?month=12&day=30&year=2016
http://www.url.com?month=12&day=31&year=2016

A better approach though would be to make use of the next link on your page. For this the URL of the actual page would be needed. BeautifulSoup could then be used to easily extract the link.

answered Apr 3 at 10:11

Martin Evans

29.1k133357

add a comment |

from datetime import datetime, timedelta

base_url = "http://www.url.com?month=&day=&year="

search_date = datetime(2016, 11, 1)
end_date = datetime(2017, 1, 1)
one_day = timedelta(days=1)

while search_date < end_date:
 url = base_url.format(search_date.month, search_date.day, search_date.year)
 print(url)

 page = requests.get(url)
 soup = BeautifulSoup(page.text, 'html.parser')

 search_date += one_day

This would give you something like:

http://www.url.com?month=11&day=1&year=2016
http://www.url.com?month=11&day=2&year=2016
http://www.url.com?month=11&day=3&year=2016
http://www.url.com?month=11&day=4&year=2016
.
.
.
http://www.url.com?month=12&day=29&year=2016
http://www.url.com?month=12&day=30&year=2016
http://www.url.com?month=12&day=31&year=2016

A better approach though would be to make use of the next link on your page. For this the URL of the actual page would be needed. BeautifulSoup could then be used to easily extract the link.

answered Apr 3 at 10:11

Martin Evans

29.1k133357

from datetime import datetime, timedelta

base_url = "http://www.url.com?month=&day=&year="

search_date = datetime(2016, 11, 1)
end_date = datetime(2017, 1, 1)
one_day = timedelta(days=1)

while search_date < end_date:
 url = base_url.format(search_date.month, search_date.day, search_date.year)
 print(url)

 page = requests.get(url)
 soup = BeautifulSoup(page.text, 'html.parser')

 search_date += one_day

This would give you something like:

http://www.url.com?month=11&day=1&year=2016
http://www.url.com?month=11&day=2&year=2016
http://www.url.com?month=11&day=3&year=2016
http://www.url.com?month=11&day=4&year=2016
.
.
.
http://www.url.com?month=12&day=29&year=2016
http://www.url.com?month=12&day=30&year=2016
http://www.url.com?month=12&day=31&year=2016

A better approach though would be to make use of the next link on your page. For this the URL of the actual page would be needed. BeautifulSoup could then be used to easily extract the link.

answered Apr 3 at 10:11

Martin Evans

29.1k133357

answered Apr 3 at 10:11

Martin Evans

29.1k133357

answered Apr 3 at 10:11

Martin Evans

29.1k133357

answered Apr 3 at 10:11

Martin Evans

29.1k133357

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Styjun

1 Answer
1

Your Answer

Post as a guest

1 Answer
1

1 Answer
1

Post as a guest

Popular posts from this blog

밀양 대씨 역사 각주 함께 보기 둘러보기 메뉴밀양 대씨

1973년 목차 사건 문화 탄생 사망 노벨상 달력 둘러보기 메뉴

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

1 Answer 1

1 Answer 1

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

밀양 대씨 역사 각주 함께 보기 둘러보기 메뉴밀양 대씨

1973년 목차 사건 문화 탄생 사망 노벨상 달력 둘러보기 메뉴

1 Answer
1

1 Answer
1

1 Answer
1