Screen scraping using Beautiful soup
I am trying to extract some information from a website. I need to click on a link inside an 'a' tag. I can reach the tag, but when I try to click on it I get the error 'NoneType' object is not callable.
from selenium import webdriver
import time
from bs4 import BeautifulSoup
import pandas as pd

browser = webdriver.Chrome()
browser.get("url")
browser.find_element_by_class_name('formButton').click()

# Parse the page Selenium has rendered
soup = BeautifulSoup(browser.page_source, 'html.parser')
embargo = soup.find_all(class_="dataOff")
for row in embargo:
    cells = row.find_all("td")
    rail = cells[0].get_text().strip()
    # This line raises: TypeError: 'NoneType' object is not callable
    embargo = cells[1].find_element_by_class_name('dataOff').click()
Here is the HTML I want BeautifulSoup to click on:
<table class="dataLiquidTable">
<tr id = "headerRow> .... </tr>
<tr class = "dataOff">
<td> AO </td>
<td> <a href="url"> </a> </td>
The code should click the link inside the 'a' tag.
python html web-scraping beautifulsoup
edited Mar 26 at 15:26 by QHarr
asked Mar 26 at 15:20 by Unicorn-17
It won't help that the HTML is broken with "headerRow (the closing quote is missing). Also, it looks like you are trying to find "dataOff" within "dataOff". According to your fragment, "dataOff" is only on the row, so this will fail.
– Andy G
Mar 26 at 15:22
The header row contains the heading of the table.
– Unicorn-17
Mar 26 at 15:32
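Andy G's point about searching for "dataOff" inside the row can be checked offline. The following is a minimal sketch that assumes BeautifulSoup 4 is installed. The header cells, link text and example.com URL are placeholders, and the quote on id="headerRow" has been closed so the markup parses cleanly.

```python
from bs4 import BeautifulSoup

# Placeholder, well-formed version of the question's fragment.
html = """
<table class="dataLiquidTable">
  <tr id="headerRow"><th>Rail</th><th>Embargo</th></tr>
  <tr class="dataOff">
    <td> AO </td>
    <td> <a href="https://example.com/embargo/ao">sample text</a> </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr", class_="dataOff")

# find() only searches descendants, and the class sits on the <tr> itself,
# so looking for "dataOff" inside the row finds nothing.
inner = row.find(class_="dataOff")

# Going through the row class to its cells works instead.
rows = []
for tr in soup.select(".dataLiquidTable tr.dataOff"):
    cells = tr.find_all("td")
    rail = cells[0].get_text(strip=True)
    link = cells[1].find("a")
    rows.append((rail, link["href"] if link else None))
```

BeautifulSoup can read the href but cannot click anything. Opening the link is Selenium's job, as the answer below describes.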
1 Answer
Try the following, which targets the first 'a' tag inside an element with class dataOff in the table:
browser.find_element_by_css_selector(".dataLiquidTable .dataOff a").click()
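Selenium 4 deprecated the find_element_by_* helpers, and Selenium 4.3 removed them. On current versions the same click uses find_element(by, value). The sketch below is hedged: click_first_link is a hypothetical name, and the selector assumes the question's table markup. The string "css selector" is the value of selenium's By.CSS_SELECTOR, which lets the sketch run without the import.

```python
# Sketch for Selenium 4.x, where find_element_by_css_selector no longer exists.
CSS_SELECTOR = "css selector"  # same value as selenium.webdriver.common.by.By.CSS_SELECTOR
LINK_SELECTOR = ".dataLiquidTable .dataOff a"  # assumes the question's table markup


def click_first_link(driver):
    """Click the first <a> inside a row with class dataOff (hypothetical helper)."""
    # find_element returns the first match in document order,
    # or raises NoSuchElementException if nothing matches.
    link = driver.find_element(CSS_SELECTOR, LINK_SELECTOR)
    link.click()
    return link
```

Usage: click_first_link(browser), after the formButton click from the question.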
It looks like you may want multiple links. In that case, try extracting the links first (hopefully they are valid URLs):
links = [item.get_attribute('href') for item in browser.find_elements_by_css_selector(".dataLiquidTable .dataOff a")]
for link in links:
    browser.get(link)
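The extract-then-visit approach above can be sketched against the Selenium 4 API, using find_elements(by, value) in place of the removed find_elements_by_css_selector. This is a sketch under assumptions. The function names are hypothetical, the selector assumes the question's table markup, and visit_links returns each page's HTML at the point where your own scraping would go.

```python
CSS_SELECTOR = "css selector"  # same value as selenium's By.CSS_SELECTOR
LINK_SELECTOR = ".dataLiquidTable .dataOff a"  # assumes the question's table markup


def collect_links(driver):
    """Return the href of every matching link, skipping anchors without one."""
    # find_elements returns an empty list (no exception) when nothing matches.
    elements = driver.find_elements(CSS_SELECTOR, LINK_SELECTOR)
    hrefs = [el.get_attribute("href") for el in elements]
    # get_attribute returns None when the attribute is absent.
    return [h for h in hrefs if h]


def visit_links(driver, links):
    """Open each URL in turn and keep its HTML (placeholder for real scraping)."""
    pages = []
    for link in links:
        driver.get(link)  # plain strings stay valid across page loads
        pages.append(driver.page_source)
    return pages
```

Collecting the hrefs as strings before navigating means nothing goes stale when the browser leaves the table page.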
You would then join the info you get from those pages with the info from the start of your code, assuming the returned lists are the same length.
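Joining the two lists only works if they line up one to one. The following is a small, hypothetical helper. It pairs each rail code (the cells[0] text from the question's loop) with whatever was scraped from its linked page, and it fails loudly rather than letting zip() silently drop the extras.

```python
def join_rows(rails, details):
    """Pair each rail code with the data scraped from its linked page.

    Raises ValueError instead of silently truncating when the lengths differ,
    which is what zip() on its own would do.
    """
    if len(rails) != len(details):
        raise ValueError(
            f"got {len(rails)} rail codes but {len(details)} detail pages"
        )
    return list(zip(rails, details))
```

On Python 3.10 and later, zip(rails, details, strict=True) performs the same length check.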
I am not sure
embargo = cells[1].find_element_by_class_name('dataOff').click()
is valid, as it performs an action (click() returns nothing) yet you assign its result. I assume you want to go to a new page; please clarify if so. That step is what I am replacing by gathering the links from the 'a' tag elements to use as required.
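The reason the error says 'NoneType' object is not callable is that cells[1] is a BeautifulSoup Tag, not a Selenium WebElement. The following minimal sketch assumes BeautifulSoup 4 is installed and uses a placeholder row and URL to reproduce the error offline.

```python
from bs4 import BeautifulSoup

# Placeholder, well-formed version of the row from the question.
html = """
<table class="dataLiquidTable">
  <tr class="dataOff">
    <td> AO </td>
    <td> <a href="https://example.com/detail">sample text</a> </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find("tr", class_="dataOff").find_all("td")

# On a BeautifulSoup Tag, an unknown attribute name is treated as a search for
# a child tag of that name, i.e. cells[1].find("find_element_by_class_name").
# No such tag exists, so the lookup evaluates to None.
lookup = cells[1].find_element_by_class_name

# Calling None is what produces the error from the question.
try:
    lookup("dataOff")
    error = None
except TypeError as exc:
    error = str(exc)

# BeautifulSoup can only read the page; hand the href to Selenium instead.
href = cells[1].find("a")["href"]
```

In the question's loop, the fix is therefore to read the href with BeautifulSoup and pass it to browser.get(), or to locate and click the link with Selenium itself.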
Otherwise, you can always gather the WebElements themselves with:
elems = browser.find_elements_by_css_selector(".dataLiquidTable .dataOff a")
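If you keep WebElements and click them, navigating away usually invalidates them, and Selenium raises StaleElementReferenceException when they are used afterwards. The sketch below is hedged. It re-locates the list on every pass and assumes that each link opens in the same tab and that driver.back() returns to the table. The names click_each_link and scrape are hypothetical.

```python
CSS_SELECTOR = "css selector"  # same value as selenium's By.CSS_SELECTOR
LINK_SELECTOR = ".dataLiquidTable .dataOff a"  # assumes the question's table markup


def click_each_link(driver, scrape):
    """Click every matching link in turn and return what `scrape` extracts.

    Elements found before a navigation go stale afterwards, so the list is
    located again on every pass and only the index is carried over.
    `scrape` is a hypothetical callback that reads data from the opened page.
    """
    results = []
    count = len(driver.find_elements(CSS_SELECTOR, LINK_SELECTOR))
    for i in range(count):
        elems = driver.find_elements(CSS_SELECTOR, LINK_SELECTOR)
        elems[i].click()          # assumed to open the page in the same tab
        results.append(scrape(driver))
        driver.back()             # return to the table before the next click
    return results
```

Collecting hrefs first, as shown earlier, avoids the repeated lookups. Clicking is mainly useful when the links run JavaScript instead of having a usable href.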
edited Mar 26 at 16:44
answered Mar 26 at 15:24 by QHarr
Where would this line go in the code?
– Unicorn-17
Mar 26 at 15:30
Erm... what is your question, please?
– QHarr
Mar 26 at 16:50
<td> <a href="url"> sample text </a> </td>. I want the code to click on this, e.g. click on the sample text, which will take me to the new page. Also, embargo = cells[1].find_element_by_class_name('dataOff').click() is not working; it fails with the error 'NoneType' object is not callable.
– Unicorn-17
Mar 26 at 16:54
Then see my section where I extract all the links into a list that you can .get() in turn.
– QHarr
Mar 26 at 16:57
How can I click on the first link stored in the list?
– Unicorn-17
Mar 26 at 17:06
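The links list holds href strings, not WebElements, so there is nothing to click. Opening the first entry means passing it to browser.get(). The following is a minimal sketch with a hypothetical helper name.

```python
def open_first_link(driver, links):
    """Navigate to the first URL collected from the table, if there is one."""
    if not links:
        return None          # nothing matched the selector
    driver.get(links[0])     # a URL string is not a WebElement: use get(), not click()
    return links[0]
```

Usage: open_first_link(browser, links), where links is the list built from the 'a' tags' href attributes.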