Problem with files downloaded by using python


I am trying to download some JPGs from a site and save them to my hard drive, but I can't open the saved files because of a formatting problem. For some reason, every file is also 115 KB.



I've tried changing the chunk size and played around with the request a little, but that didn't help. There are no errors in the shell, and the site's URL is correct.



import os

import bs4
import requests

url = 'http://<site>'
os.makedirs('photos', exist_ok=True)
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, features="html.parser")
elem = soup.select('a img')
if elem == []:
    print('no images')
else:
    for i in range(len(elem)):
        link = elem[i].get('src')
        if link != None:
            plik = open(os.path.join('photos', os.path.basename(link)), 'wb')
            for chunk in res.iter_content(100000):
                plik.write(chunk)
            plik.close()
            print('downloaded %s' % os.path.basename(link))


Solution (in the 'for i...' loop):



import os

import bs4
import requests

url = 'http://<site>'
os.makedirs('photos', exist_ok=True)
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, features="html.parser")
elem = soup.select('a img')
if elem == []:
    print('no images')
else:
    for i in range(len(elem)):
        src = elem[i].get('src')
        # check for a missing src before building the link
        if src is not None:
            link = url + src  # assumes src is relative to url
            # request the image itself instead of re-reading the page response
            res2 = requests.get(link)
            res2.raise_for_status()
            plik = open(os.path.join('photos', os.path.basename(link)), 'wb')
            # write the image response (res2), not the page response (res)
            for chunk in res2.iter_content(100000):
                plik.write(chunk)
            plik.close()
            print('downloaded %s' % os.path.basename(link))









python beautifulsoup downloading-website-files






edited Mar 24 at 21:10







Ensien

















asked Mar 24 at 20:40









Ensien







  • That (especially the identical file sizes) sounds an awful lot like a 404 or other error. Have you looked at the files in a text editor?

    – Ed Cottrell
    Mar 24 at 20:43











  • Hmm, when I open one in a text editor I can see some HTML with content from the site's main page, as if it didn't download the picture. (To be honest, I don't know much about HTML and CSS.) Do you think something is wrong with my select() call?

    – Ensien
    Mar 24 at 20:48











  • I don't know much about BeautifulSoup, so I don't know where your error is. I do know, though, that if you are trying to download images and are getting HTML then something is wrong with how you're requesting the images. This is a good candidate for some old-fashioned debugging (walk through the program step-by-step and inspect variables to make sure you're doing what you think you're doing).

    – Ed Cottrell
    Mar 24 at 20:51
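
The manual check suggested in these comments (open a saved file and see whether it is really an image) can also be done in code. The following is only a minimal sketch: the function names `looks_like_jpeg` and `looks_like_html` are made up for illustration, and the HTML test is a rough heuristic rather than a real parser. It relies on the fact that JPEG data starts with the bytes FF D8 FF.

```python
def looks_like_jpeg(data):
    """Return True if the bytes start with the JPEG start-of-image marker."""
    return data[:3] == b'\xff\xd8\xff'


def looks_like_html(data):
    """Rough heuristic: does the start of the payload look like HTML markup?"""
    head = data[:512].lstrip().lower()
    return head.startswith(b'<!doctype html') or head.startswith(b'<html')
```

Reading the first few hundred bytes of each saved file in 'rb' mode and passing them to these helpers would show quickly whether the script saved pictures or copies of a web page.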












1 Answer
After reading the HTML page response and extracting the src of each image, you have to use that src to make another HTTP(S) request that streams the image itself from that URL.



At the moment it appears that you are still reading from the initial response, which holds the page's HTML rather than the image.



Note: browsers do the same thing. For every resource a page references, such as an image, the browser makes a further HTTP request.
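
As a rough sketch of that second request, something like the following could work. The function name `download_images` and its `get` parameter are made up for illustration; `get` is assumed to behave like `requests.get`, so the real call would pass `requests.get` itself. Using `urljoin` instead of string concatenation is my own substitution, so that relative, root-relative and absolute src values all resolve against the page URL.

```python
import os
from urllib.parse import urljoin, urlsplit


def download_images(page_url, srcs, dest_dir, get):
    """Fetch each image with its own request and save it under dest_dir.

    `get` is any callable that behaves like requests.get (a hypothetical
    parameter, so the sketch can be tried without network access).
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for src in srcs:
        if not src:  # <img> tags without a src attribute yield None
            continue
        # Resolve the src against the page URL.
        image_url = urljoin(page_url, src)
        response = get(image_url)  # a new request for every image
        response.raise_for_status()
        # Name the file after the URL path, ignoring any query string.
        name = os.path.basename(urlsplit(image_url).path)
        if not name:  # URL path ends in '/', so there is no file name
            continue
        path = os.path.join(dest_dir, name)
        with open(path, 'wb') as image_file:
            # Read from the image response, not from the page response.
            for chunk in response.iter_content(100000):
                image_file.write(chunk)
        saved.append(path)
    return saved
```

With the names from the question, the call would look like `download_images(url, [img.get('src') for img in elem], 'photos', requests.get)`.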






answered Mar 24 at 20:56









Gro












  • You were right. I edited the post with the right code. Thank you!

    – Ensien
    Mar 24 at 21:10
















