Extracting a json out of the main body of page source

I am trying to scrape data from the webpage below, using Selenium in Python 3:

https://www.whoscored.com/Matches/1285051/Live/England-Premier-League-2018-2019-West-Ham-Huddersfield

If this URL is viewed as page source (for Chrome users: view-source: https://www.whoscored.com/Matches/1285051/Live/England-Premier-League-2018-2019-West-Ham-Huddersfield), there are several JSON objects embedded in the text. My aim is to scrape the first, quite substantial one, which is assigned in the 'var matchCentreData' statement. A snippet is shown below:

<script type="text/javascript">

var matchCentreData = {"playerIdNameDictionary":{"14244":"Pablo Zabaleta",
"89998":"Manuel Lanzini","34693":"Marko Arnautovic","93026":"Felipe Anderson",
"300359":"Issa Diop","122980"

I am able to scrape the entire page source; however, I am struggling to extract only the JSON above. Any help would be much appreciated!










Tags: python, selenium, web-scraping






edited Mar 22 at 15:00 by Ivan Kaloyanov










asked Mar 22 at 14:55 by bobman






















2 Answers


















Score: 3







This is all you need:

page_json = driver.execute_script("return JSON.stringify(matchCentreData)")
# page_json is now a JSON string; do what you want with it.

This worked for me just now. If you want both this JSON and the page HTML, run this step alongside your existing page-source logic. There is no need to extract the JSON from the page source when you can read the variable directly.
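A minimal sketch of how the returned string might be consumed, assuming `driver` is an already-created Selenium WebDriver that has finished loading the match page. The helper name `load_match_centre_data` is hypothetical and not part of Selenium. `execute_script` hands back the `JSON.stringify` output as a Python string, so `json.loads` can turn it into a dict:

```python
import json


def load_match_centre_data(driver):
    """Return the page's matchCentreData global as a Python dict.

    `driver` is assumed to be a Selenium WebDriver (or any object that
    exposes execute_script) which has already navigated to the match page.
    """
    # The browser serialises the JavaScript object into a JSON string.
    raw = driver.execute_script("return JSON.stringify(matchCentreData)")
    if raw is None:
        # JSON.stringify(undefined) is undefined, which arrives here as None.
        raise ValueError("matchCentreData has no value on this page")
    # Parse the JSON string into ordinary Python dicts and lists.
    return json.loads(raw)
```

With a real driver this would be called after `driver.get(url)`; `data["playerIdNameDictionary"]` should then hold the id-to-name mapping shown in the question.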







answered Mar 22 at 15:25 by Asyranok























Score: 0







You may have to do some string manipulation. Check out BeautifulSoup; I think it will give you the entire DOM, and you can then do some string work to pull out the relevant data.

Edit: I see you're actually trying to extract just the JSON string from the full page source. What substring or regex approaches have you tried?
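For the string-manipulation route, here is a rough sketch that works on the raw page source (for example the string from `driver.page_source`). It assumes the value assigned to `matchCentreData` is strict JSON, as the double-quoted keys in the question's snippet suggest. The function name `extract_match_centre_data` is made up for illustration. Rather than matching the whole object with a regex, which tends to break on nested braces, it uses a regex only to find the assignment and then lets `json.JSONDecoder.raw_decode` parse one value from that position:

```python
import json
import re

# Matches the assignment that precedes the JSON literal in the page source.
_ASSIGNMENT = re.compile(r"var\s+matchCentreData\s*=\s*")


def extract_match_centre_data(page_source):
    """Parse the object assigned to matchCentreData out of raw HTML text."""
    match = _ASSIGNMENT.search(page_source)
    if match is None:
        raise ValueError("no 'var matchCentreData =' assignment found")
    # raw_decode parses a single JSON value starting at the given index and
    # ignores whatever follows it (the trailing ';' and the rest of the page).
    data, _end = json.JSONDecoder().raw_decode(page_source, match.end())
    return data
```

If the assigned value turns out not to be valid JSON (single quotes, unquoted keys), `raw_decode` raises `json.JSONDecodeError`, and reading the variable through the browser is likely the more robust option.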







answered Mar 22 at 15:02 by Doug Clark












• Yeah, I think I'm going to have to go down the regex route. It just seems a relatively long-winded process when a JSON is just sitting there! – bobman, Mar 22 at 15:16
















