scrapy 1.5.1 CrawlSpider: handle 302 redirect
I am trying to handle the HTTP 302 status code in a Scrapy CrawlSpider. I searched Google, this site, and the Scrapy documentation (https://docs.scrapy.org/en/latest/topics/downloader-middleware.html?highlight=302), and tried the following:

    handle_httpstatus_list = [302]
    meta = {'dont_redirect': True, "handle_httpstatus_list": [302]}
    # and
    custom_settings = {'REDIRECT_ENABLED': False}

None of these work for me.
Here is my code:

    import time

    import scrapy
    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule
    from selenium import webdriver


    class LagouSpider(CrawlSpider):
        # Spider-level attribute: status codes the callbacks want to receive.
        handle_httpstatus_list = [302]
        # Plain class attribute; it only takes effect if passed to a Request.
        meta = {'dont_redirect': True, "handle_httpstatus_list": [302]}
        name = 'lagou'
        allowed_domains = ['www.lagou.com']
        start_urls = ['https://www.lagou.com']
        login_url = "https://passport.lagou.com/login/login.html"
        custom_settings = {'REDIRECT_ENABLED': False}

        rules = (
            Rule(LinkExtractor(allow=("zhaopin/.*",)), follow=True),
            Rule(LinkExtractor(allow=(r"gongsi/j\d+.html",)), follow=True),
            Rule(LinkExtractor(allow=r'jobs/\d+.html'), callback='parse_job', follow=True),
        )

        headers = {
            'Accept': 'application/json, text/javascript, */*; q=0.01',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Connection': 'keep-alive',
            'Host': 'www.lagou.com',
            'Referer': 'https://www.lagou.com/',
            'X-Anit-Forge-Code': '0',
            'X-Anit-Forge-Token': 'None',
            'Accept-Encoding': 'gzip, deflate, br',
            'X-Requested-With': 'XMLHttpRequest',
        }

        def start_requests(self):
            global rc, im
            # Log in through a real browser first, then reuse its cookies.
            browser = webdriver.Chrome(executable_path="/home/wqh/下载/chromedriver")
            browser.get(self.login_url)
            # ··········(some code)
            return [scrapy.Request(self.start_urls[0], cookies=cookie_dict,
                                   dont_filter=True)]
            # I have tried to use meta in scrapy.Request and it failed.
            # return [scrapy.Request(self.start_urls[0], cookies=cookie_dict,
            #                        meta=self.meta)]

        def parse_job(self, response):
            if response.status == 302:
                print("302")
                time.sleep(100)

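A side note on the link patterns in the rules: as far as I know, the `allow` values are regular expressions that are searched for in each absolute URL, so an unescaped `.` matches any character. The sketch below uses only the standard `re` module to show the difference. It assumes the job pattern is meant to be `jobs/\d+.html` (numeric ID), and both URLs are made up for illustration.

```python
import re

# Pattern as written in the rule, with the dot left unescaped.
loose = re.compile(r"jobs/\d+.html")
# Same pattern with the dot escaped, so only a literal "." is accepted.
strict = re.compile(r"jobs/\d+\.html")

# Hypothetical URLs, not taken from the real site.
urls = [
    "https://www.lagou.com/jobs/123456.html",
    "https://www.lagou.com/jobs/123456xhtml",
]

# search() finds the pattern anywhere in the URL.
loose_hits = [u for u in urls if loose.search(u)]
strict_hits = [u for u in urls if strict.search(u)]

print(len(loose_hits), len(strict_hits))
```

The unescaped pattern accepts both URLs, the escaped one only the first. In practice this rarely matters for this site, but escaping the dot makes the intent explicit.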
It never prints 302 when a page returns a 302 status.
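To reason about why a 302 may or may not reach `parse_job`, here is a rough model of the two checks as I understand them: the redirect middleware follows a redirect unless `dont_redirect` is set or the status is listed as handled, and the HTTP-error filter then drops any non-2xx response whose status is not listed as handled. This is a stdlib-only sketch, not Scrapy's source. `FakeRequest` and `reaches_callback` are hypothetical names, and the set of redirect codes is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request; only meta matters here."""
    meta: Dict[str, Any] = field(default_factory=dict)


def reaches_callback(status: int,
                     request: FakeRequest,
                     spider_status_list: List[int],
                     redirect_enabled: bool = True) -> bool:
    """Return True if a response with this status would be handed to the
    callback unchanged, under the simplified rules described above."""
    if 200 <= status < 300:
        return True

    # A status counts as "handled" if the spider attribute or the request
    # meta lists it, or if meta asks for every status.
    allowed = (status in spider_status_list
               or status in request.meta.get("handle_httpstatus_list", [])
               or request.meta.get("handle_httpstatus_all", False))

    # Assumed set of status codes that the redirect step acts on.
    is_redirect = status in (301, 302, 303, 307, 308)
    if is_redirect and redirect_enabled:
        if not (request.meta.get("dont_redirect", False) or allowed):
            # The redirect is followed; the 302 itself never gets through.
            return False

    # Non-2xx responses are dropped unless the status is handled.
    return bool(allowed)


# Spider-level handle_httpstatus_list = [302]: the 302 gets through.
print(reaches_callback(302, FakeRequest(), [302]))
# dont_redirect alone: not followed, but still dropped as an HTTP error.
print(reaches_callback(302, FakeRequest({"dont_redirect": True}), []))
```

Under this model, `dont_redirect` or `REDIRECT_ENABLED = False` on its own is not enough, while listing 302 in `handle_httpstatus_list` (on the spider or in the request's meta) is. A `meta` class attribute does nothing unless it is actually passed to a `Request`.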
selenium redirect scrapy http-status-code-302
It works. Thank you! I had written the wrong code. My apologies.
– qihuan wu
Mar 23 at 3:16
asked Mar 22 at 3:32 by qihuan wu
edited Mar 23 at 3:11