Extracting a json out of the main body of page source
I am trying to scrape the data off the webpage below, using Selenium in Python 3:
https://www.whoscored.com/Matches/1285051/Live/England-Premier-League-2018-2019-West-Ham-Huddersfield
Viewing the page source (in Chrome: view-source:https://www.whoscored.com/Matches/1285051/Live/England-Premier-League-2018-2019-West-Ham-Huddersfield) shows several JSON objects embedded in the text. My aim is to extract the first and quite substantial one, which is assigned to 'var matchCentreData'. A snippet is shown below:
<script type="text/javascript">
var matchCentreData = {"playerIdNameDictionary":{"14244":"Pablo Zabaleta",
"89998":"Manuel Lanzini","34693":"Marko Arnautovic","93026":"Felipe Anderson",
"300359":"Issa Diop","122980"
I am able to scrape the entire page source, but I am struggling to extract only the JSON above. Any help would be much appreciated!
python selenium web-scraping
asked Mar 22 at 14:55 by bobman; edited Mar 22 at 15:00 by Ivan Kaloyanov
2 Answers
This is all you need:
page_json = driver.execute_script("return JSON.stringify(matchCentreData)")
# page_json is now a JSON string; do what you want with it.
This worked for me just now. If you want both this data and the page HTML, run this step alongside your existing page-source logic. There is no need to extract the JSON from the page source when you can read it directly like this.
answered Mar 22 at 15:25 by Asyranok
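The one-liner above returns a JSON string rather than a Python object. A minimal sketch of wrapping it and decoding the result follows; the helper name get_match_centre_data is hypothetical, and it assumes driver is a Selenium WebDriver that has already loaded the match page and that matchCentreData is a global variable there, as the question's snippet suggests.

```python
import json


def get_match_centre_data(driver):
    """Return the page's matchCentreData variable as a Python dict.

    `driver` is assumed to be a Selenium WebDriver that has already
    loaded the page; the function name is illustrative only.
    """
    # JSON.stringify runs inside the browser, so the value comes back as a str.
    raw = driver.execute_script("return JSON.stringify(matchCentreData)")
    if raw is None:
        # A variable that is declared but still undefined stringifies to
        # undefined, which Selenium hands back as None.
        raise LookupError("matchCentreData has no value on this page")
    # Decode the JSON text into ordinary dicts, lists and strings.
    return json.loads(raw)
```

After driver.get(url), calling get_match_centre_data(driver)["playerIdNameDictionary"] should then give the mapping of player ids to names shown in the question, assuming the page still defines that key.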
You may have to do some string manipulation. Check out BeautifulSoup; I think it will return the entire DOM, and you can then do some string work to parse out the relevant data.
Edit: I see you're actually trying to extract just the JSON string from the entire DOM string. What substring searches or regexes have you tried?
answered Mar 22 at 15:02 by Doug Clark
Yeah, I think I'm going to have to go down the regex route. It just seems a relatively long-winded process when a JSON is just sitting there!
– bobman
Mar 22 at 15:16
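If only the raw page source is available, the regex route can be kept fairly short. The sketch below is one possible approach, not something taken from the answers: it uses a regex only to find the start of the assignment and then lets the standard library's json.JSONDecoder.raw_decode read exactly one JSON value from that point, so nested braces need no special handling. The function name extract_json_variable is hypothetical, and the code assumes the literal after 'var matchCentreData =' is valid JSON (double-quoted keys, no trailing commas), as the snippet in the question suggests.

```python
import json
import re


def extract_json_variable(page_source, name="matchCentreData"):
    """Parse the JSON literal assigned to `var <name>` in raw page text.

    Assumes the assigned value is strict JSON; a looser JavaScript
    object literal would make the decoder raise an error.
    """
    # Find "var <name> =" and consume the whitespace after the equals sign,
    # because raw_decode expects the value to start exactly at the index.
    match = re.search(r"var\s+" + re.escape(name) + r"\s*=\s*", page_source)
    if match is None:
        raise ValueError(f"no assignment to {name!r} found in page source")
    # raw_decode parses a single JSON value starting at the given index and
    # ignores what follows it (the trailing ';' and the rest of the page).
    value, _end = json.JSONDecoder().raw_decode(page_source, match.end())
    return value
```

With Selenium this could be called as extract_json_variable(driver.page_source); the same function works on source fetched any other way, since it only needs a string.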