Getting full html back from a website request using Python
I'm trying to send an HTTP request to a website (for example, Digikey) and read back the full HTML. I'm using this link pattern: https://www.digikey.com/products/en?keywords=part_number with a part number such as 511-8002-KIT: https://www.digikey.com/products/en?keywords=511-8002-KIT. However, what I get back is not the full HTML.
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.digikey.com/products/en?keywords=511-8002-KIT')
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.prettify())
Output:
<!DOCTYPE html>
<html>
<head>
<script>
var i10cdone =(function() function pingBeacon(msg) var i10cimg = document.createElement('script'); i10cimg.src='/i10c@p1/botox/file/nv-loaded.js?status='+window.encodeURIComponent(msg); i10cimg.onload = function() (document.head ; i10cimg.onerror = function() (document.head ; ( document.head ; pingBeacon('loaded'); if(String(document.cookie).indexOf('i10c.bdddb=c2-f0103ZLNqAeI3BH6yYOfG7TZlRtCrMwqUo')>=0) document.cookie = 'i10c.bdddb=;path=/';; var error=''; function errorHandler(e) if (e && e.error && e.error.stack ) error=e.error.stack; else if( e && e.message ) error = e.message; else error = 'unknown'; if(window.addEventListener) window.addEventListener('error',errorHandler, false); else if ( window.attachEvent ) window.attachEvent('onerror',errorHandler); return function() if (window.removeEventListener) window.removeEventListener('error',errorHandler); else if (window.detachEvent) window.detachEvent('onerror',errorHandler); if(error) pingBeacon('error-' + String(error).substring(0,500)); document.cookie='i10c.bdddb=c2-f0103ZLNqAeI3BH6yYOfG7TZlRtCrMwqUo;path=/'; ; )();
</script>
<script src="/i10c@p1/client/latest/auto/instart.js?i10c.nv.bucket=pci&i10c.nv.host=www.digikey.com&i10c.opts=botox&bcb=1" type="text/javascript">
</script>
<script type="text/javascript">
INSTART.Init("apiDomain":"assets.insnw.net","correlation_id":"1553546232:4907a9bdc85fe4e8","custName":"digikey","devJsExtraFlags":""disableQuerySelectorInterception" :true, 'rumDataConfigKey':'/instartlogic/clientdatacollector/getconfig/monitorprod.json','custName':'digikey','propName':'northamerica'","disableInjectionXhr":true,"disableInjectionXhrQueryParam":"instart_disable_injection","iframeCommunicationTimeout":3000,"nanovisorGlobalNameSpace":"I10C","partialImage":false,"propName":"northamerica","rId":"0","release":"latest","rum":false,"serveNanovisorSameDomain":true,"third_party":["IA://www.digikey.com/js/geotargeting.js"],"useIframeRpc":false,"useWrapper":false,"ver":"auto","virtualDomains":4,"virtualizeDomains":["^auth\.digikey\.com$","^authtest\.digikey\.com$","^blocked\.digikey\.com$","^dynatrace\.digikey\.com$","^search\.digikey\.com$","^www\.digikey\.ca$","^www\.digikey\.com$","^www\.digikey\.com\.mx$"]
);
</script>
<script>
typeof i10cdone === 'function' && i10cdone();
</script>
</head>
<body>
<script>
setTimeout(function()document.cookie="i10c.eac23=1";window.location.reload(true);,30);
</script>
</body>
</html>
The reason I need the full HTML is to search it for specific keywords, such as whether the terms "Lead free" or "Through hole" appear in the result for a particular part number. I'm not only doing this for Digikey, but also for other sites.
Any help would be appreciated!
Thanks!
EDIT:
Thank you all for your suggestions/answers. More info for others who are interested in this: Web-scraping JavaScript page with Python
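Once the fully rendered HTML is in hand, the keyword check itself needs no third-party library. A minimal sketch using only the standard library; the sample HTML and the helper names (`TextExtractor`, `find_keywords`) are made up for illustration:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML document, skipping script/style."""
    def __init__(self):
        super().__init__()
        self._skip = False
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False
    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def find_keywords(html, keywords):
    # Extract the text content, then do a case-insensitive substring scan
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks).lower()
    return {kw: kw.lower() in text for kw in keywords}

# Made-up fragment standing in for a rendered product page
page = "<html><body><td>Lead Free Status</td><td>Lead free / RoHS Compliant</td></body></html>"
print(find_keywords(page, ["Lead free", "Through hole"]))
# {'Lead free': True, 'Through hole': False}
```

The same scan works on whatever source a browser driver hands back, so it can be shared across Digikey and the other sites.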
python-3.x beautifulsoup python-requests
edited Mar 26 at 5:29 by Shanteshwar Inde
asked Mar 25 at 20:45 by ItM
This is because the website is rendered with JavaScript, which means you'll need a browser to retrieve the rendered markup. Check out selenium.
– C.Nivs
Mar 25 at 20:47
2 Answers
Most likely the parts of the page you are looking for include content that is generated dynamically with JavaScript.
Open view-source:https://www.digikey.com/products/en?keywords=part_number
in your browser and you will see that requests is fetching the full HTML the server sends; it's just not executing the JavaScript code.
If you right-click the page and choose Inspect (Chrome), you will see the final DOM that is created after the JavaScript code has executed.
To get the rendered content, you need a full web driver like Selenium that is capable of executing the JavaScript and rendering the full page.
Here is an example of how to achieve that using Selenium, adapted (and updated for Python 3) from:
How can I parse a website using Selenium and Beautifulsoup in python?
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://news.ycombinator.com')
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
for tag in soup.find_all('title'):
    print(tag.text)
Output:
Hacker News
answered Mar 25 at 20:48, edited Mar 25 at 20:55 by gtalarico
The issue is that the page's JavaScript does not get a chance to run, and therefore never populates the necessary HTML elements. One solution is to drive a browser with Selenium:
from selenium import webdriver
chrome = webdriver.Chrome()
chrome.get("https://www.digikey.com/products/en?keywords=511-8002-KIT")
source = chrome.page_source
Often this is a lot less efficient, since you have to wait for the page to load fully. One way around that is to look for an API the website provides to access the data you want directly; I would recommend doing some research into what those might be.
Here are some of the potential APIs you can use to get the data directly
https://api-portal.digikey.com/product
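If an API route works out, the response is typically JSON, which is far easier to scan than rendered HTML. A minimal sketch with a made-up response shape; the real Digi-Key API schema will differ, so the field names here are illustration only:

```python
import json

# Hypothetical API response for one part; real field names will differ
sample = """
{
  "Product": {
    "RoHSStatus": "Lead free / RoHS Compliant",
    "MountingType": "Through Hole"
  }
}
"""

data = json.loads(sample)
# Flatten the attribute values into one lowercase string and scan it
attributes = " ".join(str(v) for v in data["Product"].values()).lower()
matches = {kw: kw.lower() in attributes for kw in ("Lead free", "Through hole")}
print(matches)
# {'Lead free': True, 'Through hole': True}
```

Compared with scraping, this also avoids the JavaScript-rendering problem entirely, since the data never passes through HTML at all.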
Seems like the APIs have limits on how many searches you can do per day and Selenium is painfully slow for searching thousands of parts. Thanks though!
– ItM
Mar 26 at 0:18
Selenium isn’t necessarily the part that makes it “slow”; it’s the page running the script. Selenium takes however long the page takes to render. If you need it fast, as stated above, you need to get the data directly (i.e. from the API), or you just have to wait for the page to render.
– chitown88
Mar 26 at 7:08
answered Mar 25 at 20:56, edited Mar 25 at 21:05 by NightShade