Google search results scraping gives 'Service Unavailable' error


I am trying to scrape Google search results using Cheerio in Node.js, but I keep getting a '503 - Service Unavailable' error. A few requests return proper responses, but then this error pops up. I have read similar questions on Stack Overflow but couldn't find an answer.

I've tried adding a User-Agent header and even setting proxies in the headers, but with no success.

How can I get around this, if it can be done at all?

Appreciate any help!

Code:






const request = require("request");

var getPage = url => {
  return new Promise((resolve, reject) => {
    request(
      {
        url: url,
        headers: {
          "User-Agent":
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763"
        }
        // proxy settings were also tried here:
        // host : "37.59.248.190",
        // port : 8080
      },
      (error, response, html) => {
        // response is undefined when the request itself fails, so guard before logging
        if (error) {
          return reject(error);
        }
        console.log(response.statusCode, response.statusMessage);
        if (response.statusCode == 200) {
          resolve(response);
        } else {
          reject(response);
        }
      }
    );
  });
};

module.exports = getPage;
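
For context, a minimal sketch of how this module might be consumed with the kind of URL described in the comments below. The file path ./getPage and the example search term are assumptions, not part of the question:

const getPage = require("./getPage"); // path is an assumption

// Example values; in the real code these come from variables declared earlier.
const searchTerm = "nodejs scraping";
const page = 0;

const url = `https://www.google.com/search?q=${encodeURIComponent(searchTerm)}&start=${page}`;

getPage(url)
  .then(response => console.log("Received", response.body.length, "bytes"))
  .catch(err => console.error("Request failed:", err.statusCode || err));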












Tags: node.js, web-scraping, google-search
asked Mar 25 at 5:55 by Rohit Mishra, edited Mar 25 at 10:47 by Roberrrt




  • can you post your code – Janith, Mar 25 at 6:19

  • here you go – Rohit Mishra, Mar 25 at 6:48

  • what is the URL you are passing – Janith, Mar 25 at 6:50

  • const url = `https://www.google.com/search?q=${searchTerm}&start=${page}`. I pick the search term and page number from variables declared earlier. – Rohit Mishra, Mar 25 at 7:17













1 Answer
I have tried your code and it worked fine for me, running it 20 times in a row with the same URL.

Depending on the search term and how frequently you query, Google may refuse to serve your requests if it suspects irregular client activity. Some sources also state that Google has mechanisms to detect scraping, and it may even block your IP once you exceed a certain number of requests. See the following links for more information:



  • Error with Google search in Python: 503 Service Unavailable

  • Is it ok to scrape data from Google results?

  • https://security.stackexchange.com/questions/191470/how-does-google-protect-against-scraping

  • https://blog.hyperiongray.com/6-golden-rules-google-scraping/
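
Not part of the original answer, but one way to act on the rate-limiting explanation above is to space requests out and back off when a 503 comes back. A rough sketch reusing the question's getPage module; the retry count and delays are arbitrary assumptions:

const getPage = require("./getPage"); // the question's module; path is an assumption

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Fetch a URL, waiting progressively longer and retrying when Google answers with 503.
async function getPageWithBackoff(url, retries = 3, baseDelayMs = 5000) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await getPage(url);
    } catch (res) {
      const status = res && res.statusCode;
      if (status !== 503 || attempt === retries) throw res;
      // Exponential backoff: 5s, 10s, 20s, ...
      await sleep(baseDelayMs * Math.pow(2, attempt));
    }
  }
}

module.exports = getPageWithBackoff;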





answered Mar 25 at 10:38 by Marcus, edited Mar 25 at 10:45
  • I'm running at a 1-minute interval and scraping about 16-17 pages of results. I think they have probably blocked my IP, so I tried to add the user agent and keep switching the IP using this. I am yet to try this IP-switching technique and will know later if it works! – Rohit Mishra, Mar 25 at 12:50

  • Another approach to consider is to use the Google Custom Search API instead of using a scraper. According to the current pricing model you can perform up to 100 search queries per day for free. developers.google.com/custom-search – Marcus, Mar 25 at 14:44

  • Yeah, I came across that but found out that I would need more searches than that. I'll just try to change the IP and see if it works. Thanks for the reply though! – Rohit Mishra, Mar 26 at 6:00

  • I tried switching the IP address and port and got the following error: "Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames: IP: 179.106.88.162 is not in the cert's list". Do you have any idea how this could be resolved? – Rohit Mishra, Mar 28 at 4:59
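
Following up on the Custom Search API suggestion in the comments above, a minimal sketch of querying the Custom Search JSON API with the same request library. GOOGLE_API_KEY and SEARCH_ENGINE_ID are placeholders you would create yourself in the Google developer console and Programmable Search Engine settings; they are not part of the question or answer:

const request = require("request");

// Placeholder credentials; obtain your own API key and search engine id (cx).
const GOOGLE_API_KEY = "YOUR_API_KEY";
const SEARCH_ENGINE_ID = "YOUR_CX_ID";

function customSearch(query, start = 1) {
  const url =
    "https://www.googleapis.com/customsearch/v1" +
    `?key=${GOOGLE_API_KEY}&cx=${SEARCH_ENGINE_ID}` +
    `&q=${encodeURIComponent(query)}&start=${start}`;

  return new Promise((resolve, reject) => {
    request({ url: url, json: true }, (error, response, body) => {
      if (error) return reject(error);
      if (response.statusCode !== 200) return reject(body);
      // body.items holds the individual results (title, link, snippet, ...)
      resolve(body.items || []);
    });
  });
}

module.exports = customSearch;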










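As for the ERR_TLS_CERT_ALTNAME_INVALID error in the last comment: that error typically appears when the proxy's IP ends up being contacted as if it were the target host, so Google's certificate no longer matches the hostname. With the request library a proxy is normally passed through the proxy option rather than via host/port headers, roughly as sketched below; the address is just the one from the question's commented-out code:

const request = require("request");

request(
  {
    url: "https://www.google.com/search?q=test",
    proxy: "http://37.59.248.190:8080", // address from the question; any reachable HTTP proxy URL goes here
    headers: { "User-Agent": "Mozilla/5.0" }
  },
  (error, response, body) => {
    if (error) return console.error(error);
    console.log(response.statusCode, response.statusMessage);
  }
);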
