How to use BeautifulSoup to scrape table links from Reddit

I'm trying to scrape links from a Reddit table by using Beautiful Soup, and can successfully extract all of the table's contents except for the URLs. I am using item.find_all('a') but it's returning an empty list when using this code:

import praw
import csv
import requests
from bs4 import BeautifulSoup

def Authorize():
    """Authorizes Reddit API"""
    reddit = praw.Reddit(client_id='',
                         client_secret='',
                         username='',
                         password='',
                         user_agent='user')

url = 'https://old.reddit.com/r/formattesting/comments/94nc49/will_it_work/'
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)

soup = BeautifulSoup(page.text, 'html.parser')

table_extract = soup.find_all('table')[0]
table_extract_items = table_extract.find_all('a')

for item in table_extract_items:
    letter_name = item.contents[0]
    links = item.find_all('a')
    print(letter_name)
    print(links)

This is what it returns:

6GB EVGA GTX 980 TI
[]
Intel i7-4790K
[]
Asus Z97-K Motherboard
[]
2x8 HyperX Fury DDR3 RAM
[]
Elagto HD 60 Pro Capture Card
[]

I would like each item's URL to be printed where the empty list currently appears below each table row.

I am not sure if this makes a difference in the construct, but the end goal is to extract all of the table contents and links (keeping the association between the two) and save to a CSV as two columns. But for now I am just trying to print to keep it simple.
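For the CSV goal described above, a minimal sketch using only the standard library might look like the following. It assumes the item names and URLs have already been collected as (name, url) pairs; the function name `write_pairs_csv`, the column headers, and the sample rows are illustrative placeholders and not part of the original post.

```python
import csv
import io


def write_pairs_csv(pairs, fileobj):
    """Write (name, url) pairs as a two-column CSV with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["name", "url"])  # hypothetical column names
    for name, url in pairs:
        writer.writerow([name, url])


# Placeholder rows standing in for scraped data (not real results).
sample_pairs = [
    ("Item A", "https://example.com/a"),
    ("Item B", "https://example.com/b"),
]

buffer = io.StringIO()  # in-memory file; swap for open("out.csv", "w", newline="")
write_pairs_csv(sample_pairs, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing each pair as a single row is what keeps the name and its link associated; when writing to a real file, opening it with `newline=""` is the usage the `csv` module documentation recommends.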

edited Mar 24 at 1:13 by Grijesh Chauhan

asked Mar 24 at 0:25 by MSD

Are you looking for the links to the Imgur images?

– Jack Fleeting
Mar 24 at 0:40

python python-3.x beautifulsoup

1 Answer

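You were almost there. The items in table_extract_items are already HTML anchors (a tags), so you only need to extract two things from each one: the text content via .text, and the href attribute via the [ ] operator. I guess the choice of variable names confused you. The line links = item.find_all('a') inside the for loop is wrong: it searches for anchors nested inside each anchor, and there are none, so it returns an empty list.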

Here is my solution:

for anchor in table.find_all('a'):
    # find_all() returns an empty list when nothing matches (.find() returns None),
    # so no "if not anchor: continue" guard is needed inside this loop.
    href = anchor['href']
    print(href)
    print(anchor.text)

Here, table is what you named table_extract in your code. (findAll in the session below is the older alias of find_all.)

Check this:

In [40]: for anchor in table.findAll('a'):
   ....:     # if not anchor:
   ....:     #     continue
   ....:     href = anchor['href']
   ....:     text = anchor.text
   ....:     print(href, "--", text)
   ....:
https://imgur.com/a/Y1WlDiK -- 6GB EVGA GTX 980 TI
https://imgur.com/gallery/yxkPF3g -- Intel i7-4790K
https://imgur.com/gallery/nUKnya3 -- Asus Z97-K Motherboard
https://imgur.com/gallery/9YIU19P -- 2x8 HyperX Fury DDR3 RAM
https://imgur.com/gallery/pNqXC2z -- Elagto HD 60 Pro Capture Card
https://imgur.com/gallery/5K3bqMp -- Samsung EVO 250 GB SSD
https://imgur.com/FO8JoQO -- Corsair Scimtar MMO Mouse
https://imgur.com/C8PFsX0 -- Corsair K70 RGB Rapidfire Keyboard
https://imgur.com/hfCEzMA -- I messed up
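To see why the question's inner `item.find_all('a')` call comes back empty, here is a small self-contained sketch. The HTML snippet and the example.com URLs are made up for illustration and are not the Reddit page's actual markup; `find_all('a', href=True)` is used here as one possible way to skip anchors that lack an `href`.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the Reddit table (an assumption, not the real page).
html = """
<table>
  <tr><td><a href="https://example.com/a">Item A</a></td></tr>
  <tr><td><a href="https://example.com/b">Item B</a></td></tr>
  <tr><td><a>No link here</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# find_all() searches an element's descendants. An <a> tag that contains only
# text has no <a> descendants, so searching inside it yields an empty list.
first_anchor = table.find("a")
nested = first_anchor.find_all("a")
print(nested)

# href=True keeps only anchors that actually carry an href attribute.
pairs = [(a.get_text(strip=True), a["href"]) for a in table.find_all("a", href=True)]
for name, url in pairs:
    print(url, "--", name)
```

Collecting (text, href) tuples rather than printing inside the loop keeps each name tied to its link, which is convenient if the pairs are later written out as rows.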

answered Mar 24 at 1:11 by Grijesh Chauhan

edited Mar 31 at 4:34

Thank you, that did the trick!

– MSD
Mar 25 at 12:28
