soup.findAll returning empty list
I am trying to scrape a page with BeautifulSoup, but I get an empty list when I call findAll:

```python
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.sainsburys.co.uk/webapp/wcs/stores/servlet/SearchDisplayView?catalogId=10123&langId=44&storeId=10151&krypto=70KutR16JmLgr7Ka%2F385RFXrzDpOkSqx%2FRC3DnlU09%2BYcw0pR5cfIfC0kOlQywiD%2BTEe7ppq8ENXglbpqA8sDUtif1h3ZjrEoQkV29%2B90iqljHi2gm2T%2BDZHH2%2FCNeKB%2BkVglbz%2BNx1bKsSfE5L6SVtckHxg%2FM%2F%2FVieWp8vgaJTan0k1WrPjCrVuDs5WnbRN#langId=44&storeId=10151&catalogId=10123&categoryId=&parent_category_rn=&top_category=&pageSize=60&orderBy=RELEVANCE&searchTerm=milk&beginIndex=0&hideFilters=true&categoryFacetId1='

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, 'html.parser')
containers = page_soup.findAll("div", {"class": "product"})
print(containers)  # prints []
```
I also got empty results after trying the answers to these questions:

- findAll returning empty for html
- BeautifulSoup find_all() returns no data
Can anyone offer any help?
python beautifulsoup urllib findall
I think you just got unlucky. Look at the page source. You'll notice that for "product" there is a stray space after the class name: `class="product "`, which means you are referencing a class that doesn't exist. If you search (Ctrl+F) for `class="product"`, you'll find 0 results, but for `class="product "` you'll find 54. – Recessive, Mar 27 at 23:17
Please don't post pictures of code. Edit the question and use the snippet tool to include HTML; for Python code, paste it in, select it and press Ctrl+K. – QHarr, Mar 28 at 3:02
Noted. Removed the pictures of code. – alex, Mar 28 at 9:33
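On the trailing-space theory in the first comment: BeautifulSoup treats `class` as a multi-valued attribute, splits its value on whitespace and matches a `{"class": "product"}` filter against the individual tokens, so `class="product "` is matched and the stray space is unlikely to explain the empty list on its own. The stdlib sketch below is a simplified model of that rule (not BeautifulSoup's own code), run against hypothetical markup built from the class values quoted in this thread; `ClassTokenCounter` is an illustrative name.

```python
from html.parser import HTMLParser


class ClassTokenCounter(HTMLParser):
    """Count <div> tags whose class attribute contains a given class token.

    Simplified model of how BeautifulSoup matches class: the attribute value
    is split on whitespace and each token is compared on its own.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.exact_matches = 0  # whole attribute string compared as-is
        self.token_matches = 0  # attribute split into tokens, token compared

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        value = dict(attrs).get("class") or ""
        if value == self.wanted:
            self.exact_matches += 1
        if self.wanted in value.split():
            self.token_matches += 1


# Hypothetical markup modelled on the class values quoted in this thread.
sample = """
<div class="product ">Milk 1</div>
<div class="product hl-product hookLogic highlighted straplineRow">Milk 2</div>
<div class="productLister">not a product tile</div>
"""

counter = ClassTokenCounter("product")
counter.feed(sample)
print(counter.exact_matches)  # 0: no class string is exactly "product"
print(counter.token_matches)  # 2: both tiles carry the token "product"
```

Token matching also avoids false hits on longer names such as the hypothetical `productLister`, which a plain substring search would catch.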
asked Mar 27 at 22:55 by alex (edited Mar 28 at 9:32)
1 Answer
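The page content is loaded with JavaScript, so the HTML that `urlopen` downloads does not contain the product list and BeautifulSoup has nothing to find. You need a tool that executes the page's JavaScript first, such as Selenium driving a real browser, and then parse the rendered HTML.

Here is an example: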
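```python
from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://www.sainsburys.co.uk/webapp/wcs/stores/servlet/SearchDisplayView?catalogId=10123&langId=44&storeId=10151&krypto=70KutR16JmLgr7Ka%2F385RFXrzDpOkSqx%2FRC3DnlU09%2BYcw0pR5cfIfC0kOlQywiD%2BTEe7ppq8ENXglbpqA8sDUtif1h3ZjrEoQkV29%2B90iqljHi2gm2T%2BDZHH2%2FCNeKB%2BkVglbz%2BNx1bKsSfE5L6SVtckHxg%2FM%2F%2FVieWp8vgaJTan0k1WrPjCrVuDs5WnbRN#langId=44&storeId=10151&catalogId=10123&categoryId=&parent_category_rn=&top_category=&pageSize=60&orderBy=RELEVANCE&searchTerm=milk&beginIndex=0&hideFilters=true&categoryFacetId1='

driver = webdriver.Firefox()  # needs geckodriver on PATH
try:
    driver.get(url)
    # Wait up to 10 s for the JavaScript-rendered product tiles to appear
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "div.product")))
    page = driver.page_source  # HTML after the page's JavaScript has run
finally:
    driver.quit()  # always close the browser

page_soup = soup(page, 'html.parser')
containers = page_soup.findAll("div", {"class": "product"})
print(containers)
print(len(containers))
```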
Output:

```text
[
<div class="product "> ...
...,
<div class="product hl-product hookLogic highlighted straplineRow" ...
]
64
```
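One detail in the question's URL is consistent with this: the `searchTerm=milk` parameter appears only after the `#`, i.e. in the URL fragment, which HTTP clients keep on the client side and never send to the server; in a browser, only the page's JavaScript can read it. (The opaque `krypto` token may encode more; that is unknown.) The sketch below uses only the standard library and does not touch the network: constructing a `urllib.request.Request` exposes `.selector`, the path and query that `urlopen` would put in the HTTP request line. The URL is a shortened copy of the question's, an assumption made for readability.

```python
from urllib.parse import parse_qs, urldefrag
from urllib.request import Request

# Shortened copy of the question's URL: the long krypto token and the empty
# parameters are dropped, but the "?query#fragment" layout is the same.
url = ("https://www.sainsburys.co.uk/webapp/wcs/stores/servlet/SearchDisplayView"
       "?catalogId=10123&langId=44&storeId=10151"
       "#langId=44&storeId=10151&catalogId=10123&pageSize=60"
       "&orderBy=RELEVANCE&searchTerm=milk&beginIndex=0")

# Building a Request does not open a connection; .selector is the request
# target urlopen would send, with the fragment already stripped off.
request_target = Request(url).selector

# Everything after '#' is the fragment; it never leaves the client.
_, fragment = urldefrag(url)
fragment_params = parse_qs(fragment)

print(request_target)
print(fragment_params["searchTerm"])  # ['milk']
```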
Unfortunately, I am having issues installing Selenium: `WebDriverException: Message: 'geckodriver' executable needs to be in PATH`. Hoping that once I solve that, I can accept your answer. – alex, Mar 28 at 23:29
You can check here for this: stackoverflow.com/questions/40208051/… – Maaz, Mar 29 at 7:45
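For the `'geckodriver' executable needs to be in PATH` error: the Selenium versions that raise it look the driver program up on the `PATH` environment variable. A minimal stdlib sketch, assuming geckodriver has already been downloaded and unpacked, reports whether Python can find it and can prepend a folder to `PATH` for the current process (and the driver process it starts) only. `ensure_on_path` is a hypothetical helper, and `~/drivers` is a placeholder for wherever geckodriver actually lives.

```python
import os
import shutil


def ensure_on_path(executable, extra_dir=None):
    """Return the full path of `executable`, or None if it cannot be found.

    If extra_dir is given, it is prepended to PATH first. The change affects
    only this Python process and the child processes it launches.
    """
    if extra_dir:
        os.environ["PATH"] = extra_dir + os.pathsep + os.environ.get("PATH", "")
    # shutil.which searches os.environ["PATH"] (and PATHEXT on Windows)
    return shutil.which(executable)


# Placeholder location: replace with the folder that contains geckodriver.
found = ensure_on_path("geckodriver", extra_dir=os.path.expanduser("~/drivers"))
if found is None:
    print("geckodriver is still not on PATH")
else:
    print("using", found)
```

For a permanent fix, add that folder to the system `PATH` instead; Selenium 4 also accepts an explicit driver location via `webdriver.Firefox(service=Service(executable_path=...))`, with `Service` imported from `selenium.webdriver.firefox.service`.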
answered Mar 28 at 8:40 by Maaz