Google search results scraping gives 'Service Unavailable' error


I am trying to scrape Google search results using Cheerio in Node.js, but I keep getting a '503 - Service Unavailable' error. A few requests return proper responses, but then this error pops up. I have read similar questions on Stack Overflow but couldn't find an answer.

I've tried adding a User-Agent header and even setting proxies in the headers, but with no success.

How can I get around this, if it can be done at all?

Appreciate any help!

Code:






const request = require("request");

var getPage = url => {
  return new Promise((resolve, reject) => {
    request(
      {
        url: url,
        headers: {
          "User-Agent":
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763"
        }
        // proxy settings were also tried here:
        // host : "37.59.248.190",
        // port : 8080
      },
      (error, response, html) => {
        // response is undefined when the request itself fails, so guard before logging
        if (error) {
          return reject(error);
        }
        console.log(response.statusCode, response.statusMessage);
        if (response.statusCode == 200) {
          resolve(response);
        } else {
          reject(response);
        }
      }
    );
  });
};

module.exports = getPage;
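
For context, a minimal sketch of how this module might be consumed with the kind of URL described in the comments below. The file path ./getPage and the example search term are assumptions, not part of the question:

const getPage = require("./getPage"); // path is an assumption

// Example values; in the real code these come from variables declared earlier.
const searchTerm = "nodejs scraping";
const page = 0;

const url = `https://www.google.com/search?q=${encodeURIComponent(searchTerm)}&start=${page}`;

getPage(url)
  .then(response => console.log("Received", response.body.length, "bytes"))
  .catch(err => console.error("Request failed:", err.statusCode || err));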












Tags: node.js, web-scraping, google-search
asked Mar 25 at 5:55 by Rohit Mishra, edited Mar 25 at 10:47 by Roberrrt




  • can you post your code – Janith, Mar 25 at 6:19

  • here you go – Rohit Mishra, Mar 25 at 6:48

  • what is the URL you are passing – Janith, Mar 25 at 6:50

  • const url = `https://www.google.com/search?q=${searchTerm}&start=${page}`. I pick the search term and page number from variables declared earlier. – Rohit Mishra, Mar 25 at 7:17













1 Answer
I have tried your code and it worked fine for me, running it 20 times in a row with the same URL.

Depending on the search term and how frequently you query, Google may refuse to serve your requests if it suspects irregular client activity. Some sources also state that Google has mechanisms to detect scraping, and it may even block your IP once you exceed a certain number of requests. See the following links for more information:



  • Error with Google search in Python: 503 Service Unavailable

  • Is it ok to scrape data from Google results?

  • https://security.stackexchange.com/questions/191470/how-does-google-protect-against-scraping

  • https://blog.hyperiongray.com/6-golden-rules-google-scraping/
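
Not part of the original answer, but one way to act on the rate-limiting explanation above is to space requests out and back off when a 503 comes back. A rough sketch reusing the question's getPage module; the retry count and delays are arbitrary assumptions:

const getPage = require("./getPage"); // the question's module; path is an assumption

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Fetch a URL, waiting progressively longer and retrying when Google answers with 503.
async function getPageWithBackoff(url, retries = 3, baseDelayMs = 5000) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await getPage(url);
    } catch (res) {
      const status = res && res.statusCode;
      if (status !== 503 || attempt === retries) throw res;
      // Exponential backoff: 5s, 10s, 20s, ...
      await sleep(baseDelayMs * Math.pow(2, attempt));
    }
  }
}

module.exports = getPageWithBackoff;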





answered Mar 25 at 10:38 by Marcus, edited Mar 25 at 10:45
  • I'm running at a 1-minute interval and scraping about 16-17 pages of results. I think they have probably blocked my IP, so I tried to add the user agent and keep switching the IP using this. I am yet to try this IP-switching technique and will know later if it works! – Rohit Mishra, Mar 25 at 12:50

  • Another approach to consider is to use the Google Custom Search API instead of using a scraper. According to the current pricing model you can perform up to 100 search queries per day for free. developers.google.com/custom-search – Marcus, Mar 25 at 14:44

  • Yeah, I came across that but found out that I would need more searches than that. I'll just try to change the IP and see if it works. Thanks for the reply though! – Rohit Mishra, Mar 26 at 6:00

  • I tried switching the IP address and port and got the following error: "Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames: IP: 179.106.88.162 is not in the cert's list". Do you have any idea how this could be resolved? – Rohit Mishra, Mar 28 at 4:59
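
Following up on the Custom Search API suggestion in the comments above, a minimal sketch of querying the Custom Search JSON API with the same request library. GOOGLE_API_KEY and SEARCH_ENGINE_ID are placeholders you would create yourself in the Google developer console and Programmable Search Engine settings; they are not part of the question or answer:

const request = require("request");

// Placeholder credentials; obtain your own API key and search engine id (cx).
const GOOGLE_API_KEY = "YOUR_API_KEY";
const SEARCH_ENGINE_ID = "YOUR_CX_ID";

function customSearch(query, start = 1) {
  const url =
    "https://www.googleapis.com/customsearch/v1" +
    `?key=${GOOGLE_API_KEY}&cx=${SEARCH_ENGINE_ID}` +
    `&q=${encodeURIComponent(query)}&start=${start}`;

  return new Promise((resolve, reject) => {
    request({ url: url, json: true }, (error, response, body) => {
      if (error) return reject(error);
      if (response.statusCode !== 200) return reject(body);
      // body.items holds the individual results (title, link, snippet, ...)
      resolve(body.items || []);
    });
  });
}

module.exports = customSearch;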










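As for the ERR_TLS_CERT_ALTNAME_INVALID error in the last comment: that error typically appears when the proxy's IP ends up being contacted as if it were the target host, so Google's certificate no longer matches the hostname. With the request library a proxy is normally passed through the proxy option rather than via host/port headers, roughly as sketched below; the address is just the one from the question's commented-out code:

const request = require("request");

request(
  {
    url: "https://www.google.com/search?q=test",
    proxy: "http://37.59.248.190:8080", // address from the question; any reachable HTTP proxy URL goes here
    headers: { "User-Agent": "Mozilla/5.0" }
  },
  (error, response, body) => {
    if (error) return console.error(error);
    console.log(response.statusCode, response.statusMessage);
  }
);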
