Data is missing while scraping using beautifulsoup4
I'm new to parsing with Python and BeautifulSoup 4. I was scraping this website, and I need the "Current Price Per Mil" value on the front page.
I have already spent three hours on this. While looking for a solution online, I learned that the PyQt4 library can mimic a web browser, load the content, and let you extract the required data once loading is done, but my attempt crashed.
I used the approach below to collect the data in raw text format. I tried other approaches too.
import requests
from bs4 import BeautifulSoup

def parseMe(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    osrs_text = soup.find('div', class_='col-md-12 text-center')
    print(osrs_text.encode('utf-8'))
Please have a look at this image. I think the problem involves the ::before and ::after pseudo-elements; they appear once the page gets loaded.
Any help would be highly appreciated.
python python-3.x web-scraping beautifulsoup python-requests
asked Mar 24 at 23:14 by woloho
4 Answers
The web page makes an XHR request to fetch a JSON file with the buy price in it:
import requests
r = requests.get('https://api.boglagold.com/api/product/?id=osrs-gold&couponCode=null')
j = r.json()
# print(j)
print('sellPrice', j['sellPrice'])
print('buyPrice', j['buyPrice'])
Outputs:
sellPrice 0.8
buyPrice 0.62
answered Mar 25 at 0:09 by Dan-Dev
As mentioned in the other answers, this page itself only contains the text "Current Price Per Mil:" and "0USD". The value in the middle, 0.8, is obtained dynamically with JavaScript from the URL shown below (which can be found using a process described, for example, here and in many other places). The site also checks for bots, so you have to send a browser-like User-Agent header, as described, for example, here.
So all together:
import requests

url = 'https://api.boglagold.com/api/product/?id=osrs-gold&couponCode=null'
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})
response.json()['sellPrice']
Output:
0.8
answered Mar 25 at 0:21 by Jack Fleeting
You should use selenium instead of requests:
from selenium import webdriver
from bs4 import BeautifulSoup

def parse(url):
    driver = webdriver.Chrome(r'D:\Programming\utilities\chromedriver.exe')
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    return soup.find('h4', {'id': 'curr-price-per-mil-text'}).text

parse('https://boglagold.com/buy-runescape-gold/')
Output:
'Current Price Per Mil: 0.80USD'
The reason is that the value of that element is obtained through JavaScript, which requests can't handle. This particular snippet of code uses the Chrome driver; if you prefer, you can use the Firefox or another browser equivalent (you will need to install the selenium library and download the matching driver yourself).
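Whichever rendering approach is used, the numeric value still has to be pulled out of a string like 'Current Price Per Mil: 0.80USD'. A minimal sketch using only the standard library (the helper name is mine, and it assumes the first decimal number in the string is the price):

```python
import re

def price_from_text(text):
    """Return the first decimal number found in a price string, or None."""
    match = re.search(r'\d+(?:\.\d+)?', text)
    return float(match.group()) if match else None

print(price_from_text('Current Price Per Mil: 0.80USD'))  # 0.8
```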
answered Mar 24 at 23:20 by gmds
The issue is that JavaScript dynamically adds the data you want to scrape on that website. You could run the JS on the client side, wait for the data you want to be fetched, and then read the DOM contents; if you want to do it that way, please look at @gmds's answer to this question. The other method is to check which requests the JavaScript code makes and which one contains the information you need. Then you can make those requests from Python and get the required data without needing PyQt4 or even BS4.
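The second method can be sketched with just the standard library; the endpoint URL and field names below are the ones found by the other answers, and the User-Agent string is an arbitrary browser-like value:

```python
import json
from urllib.request import Request, urlopen

API_URL = 'https://api.boglagold.com/api/product/?id=osrs-gold&couponCode=null'

def price_from_payload(payload, field='sellPrice'):
    """Read one price field out of the decoded JSON payload."""
    return payload[field]

def get_price(url=API_URL, field='sellPrice'):
    # Send a browser-like User-Agent, since the site checks for bots.
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(req, timeout=10) as resp:
        return price_from_payload(json.loads(resp.read().decode('utf-8')), field)
```

Finding the endpoint itself is done in the browser's developer tools: open the Network tab, reload the page, and look for the XHR/fetch request whose response contains the price.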
answered Mar 24 at 23:20 by Tomasz Kajtoch