How to use BeautifulSoup to scrape table links from reddit
I'm trying to scrape links from a Reddit table by using Beautiful Soup, and can successfully extract all of the table's contents except for the URLs. I am using item.find_all('a') but it's returning an empty list when using this code:
import praw
import csv
import requests
from bs4 import BeautifulSoup


def Authorize():
    """Authorizes Reddit API"""
    reddit = praw.Reddit(client_id='',
                         client_secret='',
                         username='',
                         password='',
                         user_agent='user')


url = 'https://old.reddit.com/r/formattesting/comments/94nc49/will_it_work/'
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')

table_extract = soup.find_all('table')[0]
table_extract_items = table_extract.find_all('a')

for item in table_extract_items:
    letter_name = item.contents[0]
    links = item.find_all('a')
    print(letter_name)
    print(links)
This is what it returns:
6GB EVGA GTX 980 TI
[]
Intel i7-4790K
[]
Asus Z97-K Motherboard
[]
2x8 HyperX Fury DDR3 RAM
[]
Elagto HD 60 Pro Capture Card
[]
I would like each empty list to be replaced by the URL for that table row.
I am not sure if this makes a difference in the construct, but the end goal is to extract all of the table contents and links (keeping the association between the two) and save to a CSV as two columns. But for now I am just trying to print to keep it simple.
python python-3.x beautifulsoup
edited Mar 24 at 1:13
Grijesh Chauhan
asked Mar 24 at 0:25
MSD
Are you looking for the links to the Imgur images?
– Jack Fleeting
Mar 24 at 0:40
1 Answer
You were almost there. Your table_extract_items are already HTML anchors, so you need to read two things from each one directly: its text (the content) and its href attribute, using the [ ] operator. I guess the choice of variable names confused you. The line links = item.find_all('a') inside the for loop is wrong: it searches for anchors nested inside each anchor, finds none, and so returns an empty list.
Here is my solution:
for anchor in table.findAll('a'):
    # if not anchor:  findAll returns an empty list, .find() returns None
    #     continue
    href = anchor['href']
    print(href)
    print(anchor.text)
Here, table in my code is what you named table_extract in yours.
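To see why the original loop printed empty lists, here is a minimal sketch. It parses a small hard-coded HTML table (the markup and the example.com URLs are made up for illustration and are not taken from the Reddit page) and compares calling find_all('a') on an anchor with reading the anchor's own text and href.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Reddit table; not the real page markup.
SAMPLE_HTML = """
<table>
  <tr><td><a href="https://example.com/gpu">Example GPU</a></td></tr>
  <tr><td><a href="https://example.com/cpu">Example CPU</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table")
anchors = table.find_all("a")

# find_all() searches only the descendants of a tag. These anchors contain
# plain text and no nested <a> tags, so every result is empty.
nested = [anchor.find_all("a") for anchor in anchors]

# The link data lives on the anchor itself: its text and its href attribute.
pairs = [(anchor.get_text(strip=True), anchor["href"]) for anchor in anchors]

print(nested)
print(pairs)
```

Each entry in nested is empty, while pairs holds one (text, URL) tuple per anchor, which keeps the association between the two.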
Check this:
In [40]: for anchor in table.findAll('a'):
    # if not anchor:
    #     continue
    href = anchor['href']
    text = anchor.text
    print(href, "--", text)
   ....:
https://imgur.com/a/Y1WlDiK -- 6GB EVGA GTX 980 TI
https://imgur.com/gallery/yxkPF3g -- Intel i7-4790K
https://imgur.com/gallery/nUKnya3 -- Asus Z97-K Motherboard
https://imgur.com/gallery/9YIU19P -- 2x8 HyperX Fury DDR3 RAM
https://imgur.com/gallery/pNqXC2z -- Elagto HD 60 Pro Capture Card
https://imgur.com/gallery/5K3bqMp -- Samsung EVO 250 GB SSD
https://imgur.com/FO8JoQO -- Corsair Scimtar MMO Mouse
https://imgur.com/C8PFsX0 -- Corsair K70 RGB Rapidfire Keyboard
https://imgur.com/hfCEzMA -- I messed up
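The question's end goal was a two-column CSV, so here is one way that could look, as a sketch rather than a definitive implementation. The helper names extract_links and write_csv, the column headers, and the sample HTML are all assumptions made for illustration. The sample is parsed from a string so the example runs without a network request.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the Reddit table; not the real page markup.
SAMPLE_HTML = """
<table>
  <tr><td><a href="https://example.com/gpu">Example GPU</a></td></tr>
  <tr><td><a href="https://example.com/cpu">Example CPU</a></td></tr>
  <tr><td><a name="no-link">Anchor without href</a></td></tr>
</table>
"""


def extract_links(html):
    """Return (text, href) pairs for every linked anchor in the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    # href=True skips anchors that have no href attribute, so a["href"]
    # cannot raise KeyError.
    return [
        (a.get_text(strip=True), a["href"])
        for a in table.find_all("a", href=True)
    ]


def write_csv(rows, handle):
    """Write a header row followed by one name/url row per link."""
    writer = csv.writer(handle)
    writer.writerow(["name", "url"])
    writer.writerows(rows)


rows = extract_links(SAMPLE_HTML)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, open it with open(path, "w", newline="") and pass the handle to write_csv. For the live page, pass page.text from the requests call in the question to extract_links.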
edited Mar 31 at 4:34
answered Mar 24 at 1:11
Grijesh Chauhan
Thank you, that did the trick!
– MSD
Mar 25 at 12:28