Cannot parse a protected page behind a login portal - requests module Python



I'm trying to parse data from http://134.209.71.24/ui/attacks/, but the page sits behind a login form at http://134.209.71.24/ui/login/?next=%2F. I'm using Python's requests module with BeautifulSoup.



nikhilh@ubuntu:~/combine$ python -V
Python 2.7.15rc1


I wrote the following code:



import re
import sys
import requests
from bs4 import BeautifulSoup

url = "http://134.209.71.24/ui/attacks/"
url_login = re.sub('attacks', 'login/?next=%2F', url)
print('Need to login into ' + url_login)

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0'}

with requests.Session() as client:
    soup = BeautifulSoup(client.get(url_login).text, 'lxml')

    # Find csrf token value
    csrftoken_field = soup.find_all("input", type="hidden")
    csrftoken_value = csrftoken_field[0]['value']
    login_data = {"email": "valid_email",
                  "passwd": "valid_passwd",
                  "_csrf_token": csrftoken_value}

    # login
    post_result = client.post(url_login, data=login_data, headers=headers)

    status_code = post_result.status_code
    if status_code == 502:
        print("Failed to login into " + url_login + ". Exiting...")
        sys.exit()
    print("Status code: " + str(status_code) + ". Login successful")

    # Get required data from URL
    read_data = client.get(url)
    print(read_data.text)


The login POST returns status code 200, but when I then request http://134.209.71.24/ui/attacks/ in the same session, I still get the login page's HTML. Here are the relevant parts of the output:



Need to login into http://134.209.71.24/ui/login/?next=%2F/
Status code: 200. Login successful
<!doctype html>
...
...
<input id="_csrf_token" name="_csrf_token" type="hidden" value="valid_csrf_token">
<fieldset>
<legend>Log In</legend>
<label>Email</label>
<input id="email" name="email" type="text" />
<label>Password</label>
<input id="passwd" name="passwd" type="password" />
...
...
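A status code of 200 does not by itself show that the login worked, since a rejected login can simply re-render the form, as the output above suggests. One way to check is to look for the password field in the returned HTML. The following is a minimal sketch, not part of the original script: the helper name `is_login_page` is hypothetical, it uses only the standard library rather than BeautifulSoup, and it is written for Python 3 rather than the Python 2.7 shown above.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Records whether the document contains an <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None.
        if tag == "input":
            input_type = dict(attrs).get("type") or ""
            if input_type.lower() == "password":
                self.found = True


def is_login_page(html):
    """Return True if the HTML appears to contain a login form (hypothetical helper)."""
    finder = _PasswordFieldFinder()
    finder.feed(html)
    return finder.found


# Fragment taken from the output above: still the login form.
login_html = '<legend>Log In</legend><input id="passwd" name="passwd" type="password" />'
print(is_login_page(login_html))  # True

# Made-up stand-in for a protected page without a password field.
print(is_login_page("<table><tr><td>attack data</td></tr></table>"))  # False
```

In the script above, this could be applied to `post_result.text` right after the POST. Inspecting `post_result.url` and `post_result.history` may also help, as they show where requests ended up after following any redirects.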









  • try replacing data=login_data with json=login_data

    – Fozoro
    Mar 22 at 11:44











  • @Fozoro that didn't work. The 200 response code turned into 400 instead.

    – Nikhil Hegde
    Mar 24 at 1:41


















python python-2.7 beautifulsoup python-requests html-parsing






asked Mar 22 at 5:31 by Nikhil Hegde, edited Mar 30 at 16:46











