How to read the entire urls from the first column in a csv file
I am trying to read the URLs from the first column of a CSV file. The file contains 6051 URLs in total, all of which I want to read. To do so, I tried the following code:
import csv

urls = []
with open("C:/Users/hyoungm/Downloads/urls.csv") as csvfile:
    blogurl = csv.reader(csvfile)
    for row in blogurl:
        row = row[0]
        print(row)
        len(row)
However, only 65 URLs are printed. I have no idea why the number of URLs shown differs from what is in the CSV file.
Can anybody help me figure out how to read all 6051 URLs from the file?
I also tried several other pieces of code, which either produced the same 65 URLs or failed outright, such as:
1)
import csv

openfile = open("C:/Users/hyoungm/Downloads/urls.csv")
r = csv.reader(openfile)
for i in r:
    # the urls are in the first column ... 0 refers to the first column
    blogurls = i[0]
    print(blogurls)
len(blogurls)
2)
import csv
import pandas as pd
import requests
from contextlib import closing

urls = pd.read_csv("C:/Users/hyoungm/Downloads/urls.csv")
with closing(requests.get(urls, stream=True)) as r:
    reader = csv.reader(r.iter_lines(), delimiter=',', quotechar='""')
    for row in reader:
        print(row)
        len(row)
3)
import csv

with open("C:/Users/hyoungm/Downloads/urls.csv") as csvfile:
    lines = csv.reader(csvfile)
    for i, line in enumerate(lines):
        if i == 0:
            for line in csvfile:
                print(line[1:])
                len(line)
4)
import csv
import requests
from urllib.request import urlopen
from bs4 import BeautifulSoup

blogurls = []
with open("C:/Users/hyoungm/Downloads/urls.csv") as csvfile:
    r = csv.reader(csvfile)
    for i in r:
        blogurl = i[0]
        r = requests.get(blogurl)
        blogurls.append(blogurl)

for url in blogurls:
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "html.parser")

len(blogurls)
I expect an output of 6051 URLs, as originally collected in the CSV file, instead of 65.
After reading all the URLs, I am going to scrape the textual data from each one. I expected to get the textual data shown in the image at the following link:
the code and the outcomes based on 65 URLs so far
python url
edited Mar 24 at 21:33
asked Mar 24 at 20:53
– Hyoungeun Moon
Is it possible that you post part of the CSV file here? It might be a formatting issue (unexpected newline characters or separators). Since your CSV is not very large, you could read it in full with open(...) as csvfile: content = csvfile.read() and then do some analysis to check whether the file is read correctly, e.g. see if len(content.splitlines()) == 6051.
– Elias Strehle
Mar 24 at 21:08
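(A minimal sketch of the check suggested above, assuming the path from the question; if splitlines() reports 6051 but csv.reader yields far fewer rows, the mismatch points at quoted multi-line fields or odd separators.)

import csv

path = "C:/Users/hyoungm/Downloads/urls.csv"  # path taken from the question

# Count physical lines in the raw file.
with open(path) as csvfile:
    content = csvfile.read()
print(len(content.splitlines()))

# Count the rows the csv module actually parses.
with open(path, newline="") as csvfile:
    print(sum(1 for _ in csv.reader(csvfile)))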
@EliasStrehle Thank you for your suggestion. I tried your code and it shows a total of 6051. However, the links printed are still only 65. In addition, with the code you suggested I cannot extract the textual data that I was able to before (please see the attached image in my question). Would you mind if I sent the csv file to your email or another channel? I was unable to find a way to attach the file here. Thank you again!
– Hyoungeun Moon
Mar 24 at 21:43
Try to separate getting the list of URLs from actually crawling them; that makes it easier to debug. What do you see when you run the loop with print(row)? Does it print only 65 URLs or does it print 6051?
– Elias Strehle
Mar 24 at 21:48
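(A sketch of that separation, reusing the path and the BeautifulSoup step from the question; verify the count before any network calls.)

import csv
import requests
from bs4 import BeautifulSoup

# Step 1: build the URL list only -- no crawling yet.
with open("C:/Users/hyoungm/Downloads/urls.csv", newline="") as csvfile:
    urls = [row[0] for row in csv.reader(csvfile) if row]
print(len(urls))  # this should say 6051 before moving on

# Step 2: crawl only once the count above is correct.
for url in urls:
    page = requests.get(url).text
    soup = BeautifulSoup(page, "html.parser")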
When I run print(row), it shows 65 URLs, although there are actually 6051 in the first column. I am not sure why the code can't read all 6051. I need all of them to scrape the textual data (shown in the image above). Would you mind trying again with my csv file?
– Hyoungeun Moon
Mar 26 at 5:44
Can you upload somewhere public? On GitHub for example?
– Elias Strehle
Mar 26 at 9:18
1 Answer
The following two approaches work for me:
import requests
r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
urls = r.text.splitlines()
print(len(urls)) # Returns 6051
and
import csv
import requests
from io import StringIO
r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
reader = csv.reader(StringIO(r.text))
urls = [line[0] for line in reader]
print(len(urls)) # Returns 6051
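The same pattern should work on the local file from the question (a sketch; opening with newline='' is what the csv docs recommend when handing a file object to csv.reader):

import csv

# Read the first column straight from the local file.
with open("C:/Users/hyoungm/Downloads/urls.csv", newline="") as csvfile:
    urls = [row[0] for row in csv.reader(csvfile) if row]
print(len(urls))  # should also return 6051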
answered Mar 28 at 10:13 – Elias Strehle
Wow, I will try these. Thank you for checking!
– Hyoungeun Moon
Mar 29 at 0:28
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55328465%2fhow-to-read-the-entire-urls-from-the-first-column-in-a-csv-file%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
The following two approaches work for me:
import requests
r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
urls = r.text.splitlines()
print(len(urls)) # Returns 6051
and
import csv
import requests
from io import StringIO
r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
reader = csv.reader(StringIO(r.text))
urls = [line[0] for line in reader]
print(len(urls)) # Returns 6051
Wow, I will try these. Thank you for your check!
– Hyoungeun Moon
Mar 29 at 0:28
add a comment |
The following two approaches work for me:
import requests
r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
urls = r.text.splitlines()
print(len(urls)) # Returns 6051
and
import csv
import requests
from io import StringIO
r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
reader = csv.reader(StringIO(r.text))
urls = [line[0] for line in reader]
print(len(urls)) # Returns 6051
Wow, I will try these. Thank you for your check!
– Hyoungeun Moon
Mar 29 at 0:28
add a comment |
The following two approaches work for me:
import requests
r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
urls = r.text.splitlines()
print(len(urls)) # Returns 6051
and
import csv
import requests
from io import StringIO
r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
reader = csv.reader(StringIO(r.text))
urls = [line[0] for line in reader]
print(len(urls)) # Returns 6051
The following two approaches work for me:
import requests
r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
urls = r.text.splitlines()
print(len(urls)) # Returns 6051
and
import csv
import requests
from io import StringIO
r = requests.get('https://raw.githubusercontent.com/GemmyMoon/MultipleUrls/master/urls.csv')
reader = csv.reader(StringIO(r.text))
urls = [line[0] for line in reader]
print(len(urls)) # Returns 6051
answered Mar 28 at 10:13
Elias StrehleElias Strehle
416416
416416
Wow, I will try these. Thank you for your check!
– Hyoungeun Moon
Mar 29 at 0:28
add a comment |
Wow, I will try these. Thank you for your check!
– Hyoungeun Moon
Mar 29 at 0:28
Wow, I will try these. Thank you for your check!
– Hyoungeun Moon
Mar 29 at 0:28
Wow, I will try these. Thank you for your check!
– Hyoungeun Moon
Mar 29 at 0:28
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55328465%2fhow-to-read-the-entire-urls-from-the-first-column-in-a-csv-file%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Is it possible that you post part of the CSV file here? Might be a formatting issue (unexpected newline characters or separators). Since your CSV is not very large, you could read it in in full
with open(...) as csvfile: content = file.read()
. Then you could do some analysis to check if the file is read correctly, e.g. see iflen(content.splitlines()) == 6051
.– Elias Strehle
Mar 24 at 21:08
@EliasStrehle Thank you for your suggestion. I tried your code and it shows the total number of 6051. However, the links are still only 65. In addition, using the codes you suggested, I cannot extract the textual data that I was able to (please see the attached image in my question). Would you mind me sending me the csv file to your email or other channels? I was unable to find how to attach the file here. Thank you again!
– Hyoungeun Moon
Mar 24 at 21:43
Try to separate getting the list of URLs from actually crawling them. That makes it easier to debug. What do you see when you run the loop with
print(row)
? Does it print only 65 URLs or does it print 6051 URLs?– Elias Strehle
Mar 24 at 21:48
When I run print(row), it shows the 65 urls although there are actually 6051 urls in the first column. I am not sure why the codes can't read all 6051 urls. I need all the urls to scrawl down the textual data from them (shown in the image above). Would you mind trying again with my csv file?
– Hyoungeun Moon
Mar 26 at 5:44
Can you upload somewhere public? On GitHub for example?
– Elias Strehle
Mar 26 at 9:18