Cannot parse a protected page behind a login portal - requests module Python


I'm trying to parse data from http://134.209.71.24/ui/attacks/, but the page sits behind a login form at http://134.209.71.24/ui/login/?next=%2F. I'm using Python's requests module with BeautifulSoup.

nikhilh@ubuntu:~/combine$ python -V
Python 2.7.15rc1

I wrote the following code:

import re
import sys
import requests
from bs4 import BeautifulSoup

url = "http://134.209.71.24/ui/attacks/"
url_login = re.sub('attacks', 'login/?next=%2F', url)
print('Need to login into ' + url_login)

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0'}

with requests.Session() as client:
    soup = BeautifulSoup(client.get(url_login).text, 'lxml')

    # Find csrf token value
    csrftoken_field = soup.find_all("input", type="hidden")
    csrftoken_value = csrftoken_field[0]['value']
    login_data = {"email": "valid_email",
                  "passwd": "valid_passwd",
                  "_csrf_token": csrftoken_value}

    # login
    post_result = client.post(url_login, data=login_data, headers=headers)

    status_code = post_result.status_code
    if status_code == 502:
        print("Failed to login into " + url_login + ". Exiting...")
        sys.exit()
    print("Status code: " + str(status_code) + ". Login successful")

    # Get required data from URL
    read_data = client.get(url)
    print(read_data.text)
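
One detail worth checking in the code above: `re.sub('attacks', 'login/?next=%2F', url)` replaces only the word `attacks`, so the trailing slash of the original URL is carried over and the login URL ends in `%2F/`. Whether that extra slash matters to this server is not known from the question. A minimal sketch of building the login URL from its parts instead, using only the standard library; the helper name `build_login_url` is hypothetical:

```python
try:
    from urllib.parse import urljoin, urlencode  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7
    from urllib import urlencode


def build_login_url(page_url, next_path="/"):
    """Return the login URL next to page_url (hypothetical helper)."""
    # "../login/" is resolved against the directory of page_url,
    # so /ui/attacks/ becomes /ui/login/
    login_base = urljoin(page_url, "../login/")
    # urlencode percent-encodes "/" as %2F and appends no trailing slash
    return login_base + "?" + urlencode({"next": next_path})


url = "http://134.209.71.24/ui/attacks/"
print(build_login_url(url))  # http://134.209.71.24/ui/login/?next=%2F
```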

The login POST returns status code 200, but when I then request http://134.209.71.24/ui/attacks/ I still get the login page's HTML document. Here are the relevant parts of the output:

Need to login into http://134.209.71.24/ui/login/?next=%2F/
Status code: 200. Login successful
<!doctype html>
...
...
 <input id="_csrf_token" name="_csrf_token" type="hidden" value="valid_csrf_token">
 <fieldset>
 <legend>Log In</legend>
 <label>Email</label>
 <input id="email" name="email" type="text" />
 <label>Password</label>
 <input id="passwd" name="passwd" type="password" />
...
...
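
A 200 status on its own does not show that the login worked: a site can re-render its login form with status 200 when the credentials or the CSRF token are rejected, which would be consistent with the output above. A minimal sketch of checking the returned HTML for a password field instead. The helper name `looks_like_login_page` is hypothetical, and the check assumes the login form is the only page containing an `<input type="password">`. It uses the standard-library HTML parser so it runs without extra dependencies; the same test could be written with BeautifulSoup.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class PasswordFieldFinder(HTMLParser):
    """Records whether any <input type="password"> tag is seen."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        input_type = dict(attrs).get("type") or ""
        if tag == "input" and input_type.lower() == "password":
            self.found = True


def looks_like_login_page(html):
    """Return True if the HTML contains a password input (hypothetical helper)."""
    finder = PasswordFieldFinder()
    finder.feed(html)
    return finder.found


login_html = '<form><input id="passwd" name="passwd" type="password" /></form>'
data_html = "<table><tr><td>attack data</td></tr></table>"
print(looks_like_login_page(login_html))  # True
print(looks_like_login_page(data_html))   # False
```

In the script above, calling this on `post_result.text` would be a stricter test than `status_code == 502`.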

asked Mar 22 at 5:31 by Nikhil Hegde (edited Mar 30 at 16:46)

Try replacing data=login_data with json=login_data. – Fozoro, Mar 22 at 11:44

@Fozoro That didn't work. The 200 response code turned into 400 instead. – Nikhil Hegde, Mar 24 at 1:41
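
For context on the exchange above: with requests, a dict passed as `data=` is sent as a form-encoded body (`application/x-www-form-urlencoded`), which is what an HTML login form submits by default, while `json=` sends the same dict as a JSON document. A server that only reads form fields would then find no `_csrf_token`, which is one plausible reason for the 400, though the question does not confirm it. The sketch below only reproduces the two body formats with the standard library; it does not call requests, and the field values are placeholders.

```python
import json

try:
    from urllib.parse import urlencode, parse_qs  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7
    from urlparse import parse_qs

login_data = {
    "email": "valid_email",
    "passwd": "valid_passwd",
    "_csrf_token": "valid_csrf_token",
}

# Roughly what data=login_data sends: key=value pairs joined by "&"
form_body = urlencode(login_data)

# Roughly what json=login_data sends: a single JSON object
json_body = json.dumps(login_data)

print(form_body)
print(json_body)

# A form parser recovers the token from the first body but not the second
print(parse_qs(form_body).get("_csrf_token"))  # ['valid_csrf_token']
print(parse_qs(json_body).get("_csrf_token"))  # None
```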

Tags: python, python-2.7, beautifulsoup, python-requests, html-parsing
