Data is missing while scraping using beautifulsoup4


I'm new to parsing with Python and BeautifulSoup4. I am scraping this website and need the "Current Price Per Mil" value shown on the front page.

I have already spent three hours on this. While searching for a solution online, I learned that the PyQt4 library can mimic a web browser: it loads the page, and once loading has finished you can extract the data you need. My attempt with it crashed, though.

I used the following approach to collect the data as raw text. I tried other approaches too.

    import requests
    from bs4 import BeautifulSoup

    def parseMe(url):
        # Download the raw HTML and parse it without running any JavaScript.
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        osrs_text = soup.find('div', class_='col-md-12 text-center')
        print(osrs_text.encode('utf-8'))

Please have a look at this image. I think the problem is with the ::before and ::after pseudo-elements, which appear once the page has loaded.
Any help would be highly appreciated.

Tags: python python-3.x web-scraping beautifulsoup python-requests

asked Mar 24 at 23:14 by woloho

          4 Answers

The web page makes an XHR (XMLHttpRequest) call to fetch a JSON file with the buy price in it. You can request that file directly:

    import requests

    # Call the same endpoint the page's JavaScript uses.
    r = requests.get('https://api.boglagold.com/api/product/?id=osrs-gold&couponCode=null')
    j = r.json()  # decode the JSON body into a dict
    # print(j)
    print('sellPrice', j['sellPrice'])
    print('buyPrice', j['buyPrice'])

Outputs:

    sellPrice 0.8
    buyPrice 0.62

answered Mar 25 at 0:09 by Dan-Dev
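
Indexing the decoded response directly raises a bare `KeyError` if the payload ever changes shape. A minimal sketch of a more defensive lookup is shown here. The sample body is an assumption trimmed to the two fields printed above (the real response may well contain more), and `extract_prices` is a hypothetical helper name, not part of any library.

```python
import json

# Sample body: only the two keys shown in the answer's output. The real
# response may contain additional fields (assumption).
SAMPLE_BODY = '{"sellPrice": 0.8, "buyPrice": 0.62}'

def extract_prices(body):
    """Return the sell and buy prices from a JSON response body as floats."""
    data = json.loads(body)
    prices = {}
    for key in ("sellPrice", "buyPrice"):
        if key not in data:
            raise ValueError(f"response has no {key!r} field")
        prices[key] = float(data[key])
    return prices

prices = extract_prices(SAMPLE_BODY)
print(prices["sellPrice"], prices["buyPrice"])
```

With requests, the same helper could be given `r.text` from the call above.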


As the other answers mention, the page's HTML only contains the text "Current Price Per Mil:" and "0USD". The value in the middle (0.8) is fetched dynamically by JavaScript from the URL below, which you can find using a process described (for example) here and in many other places. The site checks for bots, so you have to use a method described (for example) here; the code below sends a browser-like User-Agent header.

So all together:

    import requests

    url = 'https://api.boglagold.com/api/product/?id=osrs-gold&couponCode=null'
    # Browser-like User-Agent so the request is not rejected as a bot.
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'}
    response = requests.get(url, headers=headers)

    response.json()['sellPrice']

Output:

    0.8

answered Mar 25 at 0:21 by Jack Fleeting
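
The request from the answer above can also be wrapped in a small helper with a timeout. This is only a sketch under assumptions: it swaps in the standard library's `urllib.request` for requests so that it has no third-party dependency, `build_request` and `fetch_json` are hypothetical names, and the 10-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://api.boglagold.com/api/product/?id=osrs-gold&couponCode=null"
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
)

def build_request(url, user_agent=USER_AGENT):
    """Create a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch_json(url, timeout=10):
    """Send the request and decode the JSON body (performs network I/O)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Building the request does not touch the network, so it is safe to inspect.
request = build_request(API_URL)
print(request.get_method(), request.full_url)
```

Calling `fetch_json(API_URL)['sellPrice']` would then return the price, assuming the endpoint still responds the way it did when the answer was written.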


You should use selenium instead of requests:

    from selenium import webdriver
    from bs4 import BeautifulSoup

    def parse(url):
        # Path to a local chromedriver executable (Selenium 3 style argument;
        # newer Selenium releases take a Service object instead).
        driver = webdriver.Chrome(r'D:\Programming\utilities\chromedriver.exe')
        driver.get(url)
        # page_source holds the DOM after the browser has run the page's JavaScript.
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        driver.quit()
        return soup.find('h4', {'id': 'curr-price-per-mil-text'}).text

    parse('https://boglagold.com/buy-runescape-gold/')

Output:

    'Current Price Per Mil: 0.80USD'

The reason is that the value of that element is filled in by JavaScript, which requests cannot execute. This snippet uses the Chrome driver; if you prefer, you can use the equivalent for Firefox or another browser. You will need to install the selenium library and download the matching browser driver yourself.

answered Mar 24 at 23:20 by gmds
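
To see why a plain HTML parser only finds the placeholder, it may help to parse a static snapshot by hand. The markup below is hypothetical: the id comes from the Selenium answer and the "0USD" placeholder from another answer, but the real page's structure is not shown in this thread. The sketch uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra packages; `TextById` and `static_text` are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical static markup, as the server might send it before any
# JavaScript has run (assumption modeled on the answers in this thread).
STATIC_HTML = """
<div class="col-md-12 text-center">
  <h4 id="curr-price-per-mil-text">Current Price Per Mil: 0USD</h4>
  <script>/* the real page fills in the price here via an XHR call */</script>
</div>
"""

class TextById(HTMLParser):
    """Collect the text inside elements with a given id.

    Sketch only: void tags such as <br> inside the target are not handled.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

def static_text(html, target_id):
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)

# The parser sees only the placeholder; the script is never executed.
print(static_text(STATIC_HTML, "curr-price-per-mil-text"))
```

Nothing in this path executes the `<script>` element, which is the same limitation requests plus BeautifulSoup run into.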


The issue is that JavaScript dynamically adds the data you want to scrape to that page. One option is to run the JavaScript on the client side, wait for it to fetch the data, and then read the DOM contents; if you want to do it that way, see @gmds's answer to this question. The other option is to check which requests the JavaScript code makes and which one contains the information you need. You can then make that request yourself from Python and get the data without needing PyQt4 or even BS4.

answered Mar 24 at 23:20 by Tomasz Kajtoch
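
The second method can be sketched as a search over captured responses. Everything in `CAPTURED` is made-up sample data standing in for what you would copy out of the browser's network tab; only the API URL and the `sellPrice`/`buyPrice` keys come from the other answers. `contains_key` and `find_responses_with` are hypothetical helpers.

```python
import json

# Made-up stand-ins for (URL, response body) pairs copied from the browser's
# developer tools. Only the last URL and its two keys appear in this thread.
CAPTURED = [
    ("https://example.com/static/app.js", "console.log('not json');"),
    ("https://example.com/api/banner", '{"title": "Welcome"}'),
    (
        "https://api.boglagold.com/api/product/?id=osrs-gold&couponCode=null",
        '{"sellPrice": 0.8, "buyPrice": 0.62}',
    ),
]

def contains_key(value, key):
    """Recursively check whether a decoded JSON value has the given key."""
    if isinstance(value, dict):
        return key in value or any(contains_key(v, key) for v in value.values())
    if isinstance(value, list):
        return any(contains_key(item, key) for item in value)
    return False

def find_responses_with(captured, key):
    """Return the URLs whose body is JSON and contains the key somewhere."""
    matches = []
    for url, body in captured:
        try:
            decoded = json.loads(body)
        except ValueError:  # body is not JSON (scripts, HTML, images, ...)
            continue
        if contains_key(decoded, key):
            matches.append(url)
    return matches

print(find_responses_with(CAPTURED, "sellPrice"))
```

Once the right URL is identified this way, it can be requested directly from Python, as the other answers do.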
