Cannot parse a protected page behind a login portal - requests module Python
I'm trying to parse data from http://134.209.71.24/ui/attacks/, but the page sits behind a login portal at http://134.209.71.24/ui/login/?next=%2F. I'm using Python's requests module with BeautifulSoup.
nikhilh@ubuntu:~/combine$ python -V
Python 2.7.15rc1
I wrote the following code:
import re
import sys

import requests
from bs4 import BeautifulSoup

url = "http://134.209.71.24/ui/attacks/"
url_login = re.sub('attacks', 'login/?next=%2F', url)
print('Need to login into ' + url_login)

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0'}

with requests.Session() as client:
    soup = BeautifulSoup(client.get(url_login).text, 'lxml')

    # Find csrf token value
    csrftoken_field = soup.find_all("input", type="hidden")
    csrftoken_value = csrftoken_field[0]['value']

    login_data = {
        "email": "valid_email",
        "passwd": "valid_passwd",
        "_csrf_token": csrftoken_value,
    }

    # login
    post_result = client.post(url_login, data=login_data, headers=headers)
    status_code = post_result.status_code
    if status_code == 502:
        print("Failed to login into " + url_login + ". Exiting...")
        sys.exit()
    print("Status code: " + str(status_code) + ". Login successful")

    # Get required data from URL
    read_data = client.get(url)
    print(read_data.text)
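A side note on how the login URL is built: the re.sub call replaces only the word "attacks", so the trailing slash of the original URL survives and ends up after the query string. Whether the server cares about that is not known from this post. The sketch below just makes the difference visible and shows one possible alternative using urljoin; the variable names url_login_sub and url_login_join are made up for illustration.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7, as used in the question

url = "http://134.209.71.24/ui/attacks/"

# The substitution from the question: only the word "attacks" is replaced,
# so the final "/" of the original URL is left behind after the query string.
url_login_sub = re.sub('attacks', 'login/?next=%2F', url)

# Possible alternative: resolve the login path relative to the parent directory.
url_login_join = urljoin(url, '../login/?next=%2F')

print(url_login_sub)
print(url_login_join)
```

The first value ends in "%2F/", the second in "%2F", which is the login address quoted at the top of the question.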
The login POST returns status code 200, but when I then request http://134.209.71.24/ui/attacks/ I still get the login page's HTML document. Here are the relevant parts of the output:
Need to login into http://134.209.71.24/ui/login/?next=%2F/
Status code: 200. Login successful
<!doctype html>
...
...
<input id="_csrf_token" name="_csrf_token" type="hidden" value="valid_csrf_token">
<fieldset>
<legend>Log In</legend>
<label>Email</label>
<input id="email" name="email" type="text" />
<label>Password</label>
<input id="passwd" name="passwd" type="password" />
...
...
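Since the script treats every status other than 502 as success, a 200 here does not prove the login worked: a server can answer a rejected login with 200 and simply render the form again. A minimal sketch of a stricter check, assuming that the presence of a password input means "still on the login page". It uses the standard-library HTML parser instead of BeautifulSoup only to stay dependency-free; LoginFormDetector and looks_like_login_page are hypothetical names, not part of requests or bs4.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class LoginFormDetector(HTMLParser):
    """Sets self.found to True if the page contains an <input type="password">."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "input" and dict(attrs).get("type") == "password":
            self.found = True


def looks_like_login_page(html):
    """Heuristic: a password field suggests we were sent back to the login form."""
    detector = LoginFormDetector()
    detector.feed(html)
    return detector.found


# Fragment taken from the output shown in the question.
sample = '''<legend>Log In</legend>
<label>Password</label>
<input id="passwd" name="passwd" type="password" />'''

print(looks_like_login_page(sample))
print(looks_like_login_page("<table><tr><td>attack data</td></tr></table>"))
```

In the script above this could be called on post_result.text right after the POST. Looking at post_result.url and post_result.history, which requests fills in when it follows redirects, may also show where the server actually sent the session.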
python python-2.7 beautifulsoup python-requests html-parsing
Try replacing data=login_data with json=login_data.
– Fozoro
Mar 22 at 11:44
@Fozoro that didn't work. The 200 response code turned into 400 instead.
– Nikhil Hegde
Mar 24 at 1:41
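For context on the exchange above: when given a dict, requests sends data= as a form-encoded body and json= as a JSON document with a JSON content type. An HTML login form like the one in the output normally submits form-encoded fields, which may be why the JSON attempt came back as 400; that is an inference, not something confirmed in this thread. The sketch below shows the two body shapes using only the standard library, with the placeholder values from the question.

```python
import json

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

# Placeholder values, as in the question.
login_data = {
    "email": "valid_email",
    "passwd": "valid_passwd",
    "_csrf_token": "valid_csrf_token",
}

# Roughly what data=login_data puts in the request body (key=value pairs joined by "&").
form_body = urlencode(login_data)

# Roughly what json=login_data puts in the request body (a JSON object).
json_body = json.dumps(login_data)

print(form_body)
print(json_body)
```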
Asked Mar 22 at 5:31 and edited Mar 30 at 16:46 by Nikhil Hegde