Google search results scraping gives 'Service Unavailable' error
I am trying to scrape Google search results using Cheerio in Node.js. I keep getting a '503 Service Unavailable' error. A few requests return proper responses, but then this error pops up. I read similar questions on Stack Overflow but couldn't find an answer.
I've tried adding a user agent and even set proxies in the headers, but with no success.
How can I get around this, if it can be done at all?
Appreciate any help!
Code:
const request = require("request");

// Returns a promise that resolves with the response on a 200, and rejects
// with the error or the non-200 response otherwise.
const getPage = url =>
  new Promise((resolve, reject) => {
    request(
      {
        url: url,
        headers: {
          "User-Agent":
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763"
        }
        // Proxy settings were commented out in the original:
        // host: "37.59.248.190",
        // port: 8080
      },
      (error, response, html) => {
        // Check for a transport error first: `response` is undefined when the
        // request itself fails, so logging its fields here would throw.
        if (error) return reject(error);
        console.log(response.statusCode, response.statusMessage);
        if (response.statusCode === 200) resolve(response);
        else reject(response);
      }
    );
  });

module.exports = getPage;
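For completeness, here is a sketch of how the exported `getPage` might be consumed. `buildSearchUrl` is a hypothetical helper modeled on the template literal mentioned in the comments, and the `"h3"` Cheerio selector is only a guess at where result titles live in Google's markup:

```javascript
// Hypothetical helper that builds the search URL described in the comments.
// encodeURIComponent guards against spaces and special characters in the query.
function buildSearchUrl(searchTerm, page) {
  return `https://www.google.com/search?q=${encodeURIComponent(searchTerm)}&start=${page}`;
}

// Usage sketch, assuming getPage from above and cheerio are installed:
// const cheerio = require("cheerio");
// const getPage = require("./getPage");
// getPage(buildSearchUrl("node.js scraping", 0))
//   .then(response => {
//     const $ = cheerio.load(response.body);
//     $("h3").each((i, el) => console.log($(el).text())); // selector is a guess
//   })
//   .catch(err => console.error("request failed:", err.statusCode || err));
```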
node.js web-scraping google-search
can you post your code – Janith, Mar 25 at 6:19
here you go – Rohit Mishra, Mar 25 at 6:48
what is the URL you are passing – Janith, Mar 25 at 6:50
const url = `https://www.google.com/search?q=${searchTerm}&start=${page}` (I pick the search term and page number from variables declared earlier.) – Rohit Mishra, Mar 25 at 7:17
asked Mar 25 at 5:55 by Rohit Mishra, edited Mar 25 at 10:47 by Roberrrt
1 Answer
I tried your code and it worked fine for me, running it 20 times in a row with the same URL.
Depending on the search term and the frequency of your queries, Google may refuse to serve your requests if it suspects irregular client activity. Some sources also state that Google has mechanisms to detect scraping, and it may even block your IP if you exceed a certain number of requests. See the following links for more information:
- Error with Google search in Python: 503 Service Unavailable
- Is it ok to scrape data from Google results?
- https://security.stackexchange.com/questions/191470/how-does-google-protect-against-scraping
- https://blog.hyperiongray.com/6-golden-rules-google-scraping/
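If rate limiting is indeed the cause, spacing requests out and backing off on 503s may help. The following is a generic sketch, not something Google documents; the delay values and the `fetchWithBackoff` name are made up for illustration:

```javascript
// Exponential backoff delay: 2s, 4s, 8s, ... capped at 60s (illustrative values).
function backoffDelay(attempt, baseMs = 2000) {
  return Math.min(baseMs * 2 ** attempt, 60000);
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Retries fetchFn when it rejects with a 503, waiting longer between attempts.
// fetchFn is any promise-returning function, such as the question's getPage.
async function fetchWithBackoff(fetchFn, url, maxAttempts = 4) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fetchFn(url);
    } catch (err) {
      const is503 = err && err.statusCode === 503;
      if (!is503 || attempt >= maxAttempts - 1) throw err; // give up
      await sleep(backoffDelay(attempt));
    }
  }
}
```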
I'm running at a 1-minute interval and scraping about 16-17 pages of results. I think they have probably blocked my IP, so I tried to add the user agent and keep switching the IP using this. I am yet to try this IP-switching technique and will know later if it works! – Rohit Mishra, Mar 25 at 12:50
Another approach to consider is the Google Custom Search API instead of a scraper. Under the current pricing model you can perform up to 100 search queries per day for free. developers.google.com/custom-search – Marcus, Mar 25 at 14:44
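The Custom Search API route can be sketched as follows. `YOUR_API_KEY` and `YOUR_CX` (the search engine ID) are placeholders for credentials created in the Google developer console; the query-string shape follows the Custom Search JSON API's v1 endpoint:

```javascript
// Builds a Custom Search JSON API request URL. apiKey and cx come from the
// Google developer console; the values passed in usage below are placeholders.
function customSearchUrl(apiKey, cx, query, start = 1) {
  return "https://www.googleapis.com/customsearch/v1" +
    `?key=${apiKey}&cx=${cx}&q=${encodeURIComponent(query)}&start=${start}`;
}

// Usage sketch (assumes Node 18+ global fetch; items/title/link are the
// response fields documented for this API):
// const res = await fetch(customSearchUrl("YOUR_API_KEY", "YOUR_CX", "node.js"));
// const data = await res.json();
// data.items.forEach(item => console.log(item.title, item.link));
```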
yeah I came across that but found out that I would need more searches than that. I'll just try to change the IP and see if it works. Thanks for the reply though! – Rohit Mishra, Mar 26 at 6:00
I tried switching the IP address and port and got the following error: "Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames: IP: 179.106.88.162 is not in the cert's list". Do you have any idea how this could be resolved? – Rohit Mishra, Mar 28 at 4:59
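On the ERR_TLS_CERT_ALTNAME_INVALID error: it typically means the client connected to the proxy's IP as if it were the target host, so the certificate presented doesn't list Google's hostnames. With the `request` library, the proxy belongs in a top-level `proxy` option rather than in `headers` (the commented-out host/port in the question's code has no effect there). A sketch, with the proxy address purely hypothetical:

```javascript
// Builds request() options with the proxy where request expects it. For
// https targets, request then tunnels through the proxy, keeping the
// target hostname intact for TLS certificate verification.
function withProxy(url, proxyUrl, userAgent) {
  return {
    url,             // the real target, e.g. a google.com search URL
    proxy: proxyUrl, // e.g. "http://37.59.248.190:8080" (hypothetical address)
    headers: { "User-Agent": userAgent },
  };
}

// Usage: request(withProxy(searchUrl, "http://37.59.248.190:8080", ua), callback);
```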
answered Mar 25 at 10:38 by Marcus, edited Mar 25 at 10:45