How to use BeautifulSoup to scrape table links from Reddit
















I'm trying to scrape links from a Reddit table by using Beautiful Soup, and can successfully extract all of the table's contents except for the URLs. I am using item.find_all('a') but it's returning an empty list when using this code:



import praw
import csv
import requests
from bs4 import BeautifulSoup

def Authorize():
    """Authorizes Reddit API"""
    reddit = praw.Reddit(client_id='',
                         client_secret='',
                         username='',
                         password='',
                         user_agent='user')

url = 'https://old.reddit.com/r/formattesting/comments/94nc49/will_it_work/'
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)

soup = BeautifulSoup(page.text, 'html.parser')

table_extract = soup.find_all('table')[0]
table_extract_items = table_extract.find_all('a')

for item in table_extract_items:
    letter_name = item.contents[0]
    links = item.find_all('a')
    print(letter_name)
    print(links)


This is what it returns:



6GB EVGA GTX 980 TI
[]
Intel i7-4790K
[]
Asus Z97-K Motherboard
[]
2x8 HyperX Fury DDR3 RAM
[]
Elagto HD 60 Pro Capture Card
[]


I would like each empty list to be replaced by the URL for that table row.

I am not sure whether this affects the approach, but the end goal is to extract all of the table contents and links (keeping the association between the two) and save them to a CSV file as two columns. For now I am just printing to keep it simple.


































  • Are you looking for the links to the Imgur images?

    – Jack Fleeting
    Mar 24 at 0:40
























python python-3.x beautifulsoup














edited Mar 24 at 1:13 by Grijesh Chauhan

asked Mar 24 at 0:25 by MSD
























1 Answer














You were almost there. Each element of table_extract_items is already an HTML anchor (an a tag), so what you need from it is its text (the content) and its href attribute, which you read with the [ ] operator. I guess the choice of variable names confused you. The line links = item.find_all('a') inside the for loop is wrong: it searches for a tags nested inside each anchor, finds none, and returns an empty list.
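To see why the empty list appears, here is a minimal offline sketch. The HTML string and the example.com URL are made-up stand-ins for one row of the Reddit table, not the real page; only the bs4 calls themselves are real.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one table row (assumption, not the real Reddit markup).
html = '<table><tr><td><a href="https://example.com/item">Sample item</a></td></tr></table>'
soup = BeautifulSoup(html, 'html.parser')

# This is what each element of table_extract_items looks like: an <a> tag.
anchor = soup.find('table').find('a')

nested = anchor.find_all('a')  # searches only inside the anchor, so nothing matches
text = anchor.get_text()       # the visible link text
href = anchor['href']          # attribute lookup with the [ ] operator

print(nested)
print(text, href)
```

The anchor has no a tags among its own descendants, which is why find_all comes back empty even though the href is right there on the tag.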



Here is my solution:



for anchor in table.findAll('a'):
    # No guard is needed here: findAll returns an empty list when nothing
    # matches (only .find() returns None), so the loop body is simply skipped.
    # if not anchor:
    #     continue
    href = anchor['href']
    print(href)
    print(anchor.text)


Here, table in my code is what you named table_extract in your code.

Check this:



In [40]: for anchor in table.findAll('a'):
             # if not anchor:
             #     continue
             href = anchor['href']
             text = anchor.text
             print(href, "--", text)
   ....:
https://imgur.com/a/Y1WlDiK -- 6GB EVGA GTX 980 TI
https://imgur.com/gallery/yxkPF3g -- Intel i7-4790K
https://imgur.com/gallery/nUKnya3 -- Asus Z97-K Motherboard
https://imgur.com/gallery/9YIU19P -- 2x8 HyperX Fury DDR3 RAM
https://imgur.com/gallery/pNqXC2z -- Elagto HD 60 Pro Capture Card
https://imgur.com/gallery/5K3bqMp -- Samsung EVO 250 GB SSD
https://imgur.com/FO8JoQO -- Corsair Scimtar MMO Mouse
https://imgur.com/C8PFsX0 -- Corsair K70 RGB Rapidfire Keyboard
https://imgur.com/hfCEzMA -- I messed up
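Since the stated end goal is a two-column CSV that keeps each name paired with its link, the loop above could be extended roughly as follows. This is only a sketch under assumptions: SAMPLE_HTML, the example.com URLs, the helper names extract_links and write_csv, and the column headers name and url are all hypothetical, and the sample is parsed offline instead of fetching the Reddit page.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup (assumption); the real table comes from the Reddit page.
SAMPLE_HTML = """
<table>
  <tr><td><a href="https://example.com/a">First item</a></td></tr>
  <tr><td><a href="https://example.com/b">Second item</a></td></tr>
  <tr><td><a>No link here</a></td></tr>
</table>
"""


def extract_links(table):
    """Return (text, href) pairs for every anchor in the table that has an href."""
    pairs = []
    for anchor in table.find_all('a'):
        href = anchor.get('href')  # .get returns None instead of raising KeyError
        if href is None:
            continue
        pairs.append((anchor.get_text(strip=True), href))
    return pairs


def write_csv(pairs, out):
    """Write a header row followed by one row per (text, href) pair."""
    writer = csv.writer(out)
    writer.writerow(['name', 'url'])
    writer.writerows(pairs)


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
pairs = extract_links(soup.find('table'))

# An in-memory buffer keeps the sketch free of file-system side effects.
buffer = io.StringIO()
write_csv(pairs, buffer)
print(buffer.getvalue())
```

To write a real file, pass a file object instead of the buffer, for example one opened with open('links.csv', 'w', newline=''). Using anchor.get('href') rather than anchor['href'] is a design choice for anchors that carry no href attribute, which would otherwise raise a KeyError.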






























  • Thank you, that did the trick!

    – MSD
    Mar 25 at 12:28



















edited Mar 31 at 4:34

answered Mar 24 at 1:11 by Grijesh Chauhan





































