Data is missing while scraping using beautifulsoup4
I'm new to parsing with Python and BeautifulSoup 4. I was scraping this website, and I need the "Current Price Per Mil" value on the front page.
I have already spent three hours on this. While looking for a solution online, I learned that the PyQt4 library can mimic a web browser, load the content, and let you extract the required data once loading is done, but my attempt crashed.
I used the approach below to collect the data in raw text format. I tried other approaches too.
import requests
from bs4 import BeautifulSoup

def parseMe(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    osrs_text = soup.find('div', class_='col-md-12 text-center')
    print(osrs_text.encode('utf-8'))
Please have a look at this image. I think the problem involves the ::before and ::after pseudo-elements; they appear once the page gets loaded.
Any help would be highly appreciated.
python python-3.x web-scraping beautifulsoup python-requests
asked Mar 24 at 23:14 by woloho
4 Answers
The web page makes an XHR request to fetch a JSON file with the buy price in it:
import requests
r = requests.get('https://api.boglagold.com/api/product/?id=osrs-gold&couponCode=null')
j = r.json()
# print(j)
print('sellPrice', j['sellPrice'])
print('buyPrice', j['buyPrice'])
Outputs:
sellPrice 0.8
buyPrice 0.62
answered Mar 25 at 0:09 by Dan-Dev
As mentioned in the other answers, this page itself only contains the text "Current Price Per Mil:" and "0USD". The value in the middle, 0.8, is obtained dynamically with JavaScript from the URL shown below (which can be found using a process described, for example, here and in many other places). The site also checks for bots, so you have to send a browser-like User-Agent header, as described, for example, here.
So all together:
import requests

url = 'https://api.boglagold.com/api/product/?id=osrs-gold&couponCode=null'
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})
response.json()['sellPrice']
Output:
0.8
answered Mar 25 at 0:21 by Jack Fleeting
You should use selenium instead of requests:
from selenium import webdriver
from bs4 import BeautifulSoup

def parse(url):
    driver = webdriver.Chrome(r'D:\Programming\utilities\chromedriver.exe')
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    return soup.find('h4', {'id': 'curr-price-per-mil-text'}).text

parse('https://boglagold.com/buy-runescape-gold/')
Output:
'Current Price Per Mil: 0.80USD'
The reason is that the value of that element is obtained through JavaScript, which requests can't handle. This particular snippet of code uses the Chrome driver; if you prefer, you can use the Firefox or another browser equivalent (you will need to install the selenium library and download the matching driver yourself).
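Whichever rendering approach is used, the numeric value still has to be pulled out of a string like 'Current Price Per Mil: 0.80USD'. A minimal sketch using only the standard library (the helper name is mine, and it assumes the first decimal number in the string is the price):

```python
import re

def price_from_text(text):
    """Return the first decimal number found in a price string, or None."""
    match = re.search(r'\d+(?:\.\d+)?', text)
    return float(match.group()) if match else None

print(price_from_text('Current Price Per Mil: 0.80USD'))  # 0.8
```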
answered Mar 24 at 23:20 by gmds
The issue is that JavaScript dynamically adds the data you want to scrape on that website. You could run the JS on the client side, wait for the data you want to be fetched, and then read the DOM contents; if you want to do it that way, please look at @gmds's answer to this question. The other method is to check which requests the JavaScript code makes and which one contains the information you need. Then you can make those requests from Python and get the required data without needing PyQt4 or even BS4.
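The second method can be sketched with just the standard library; the endpoint URL and field names below are the ones found by the other answers, and the User-Agent string is an arbitrary browser-like value:

```python
import json
from urllib.request import Request, urlopen

API_URL = 'https://api.boglagold.com/api/product/?id=osrs-gold&couponCode=null'

def price_from_payload(payload, field='sellPrice'):
    """Read one price field out of the decoded JSON payload."""
    return payload[field]

def get_price(url=API_URL, field='sellPrice'):
    # Send a browser-like User-Agent, since the site checks for bots.
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(req, timeout=10) as resp:
        return price_from_payload(json.loads(resp.read().decode('utf-8')), field)
```

Finding the endpoint itself is done in the browser's developer tools: open the Network tab, reload the page, and look for the XHR/fetch request whose response contains the price.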
answered Mar 24 at 23:20 by Tomasz Kajtoch