Problem with files downloaded by using python

I am trying to download some JPGs from a site and save them to my hard drive, but I can't open the downloaded files because of a formatting problem. For some reason, every file is also exactly 115 KB.

I've tried changing the chunk size and experimenting with the request call, but that didn't help. There are no errors in the shell, and the website's URL is correct.

import os

import bs4
import requests

url = 'http://<site>'
os.makedirs('photos', exist_ok=True)
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, features="html.parser")
elem = soup.select('a img')
if elem == []:
    print('no images')
else:
    for i in range(len(elem)):
        link = elem[i].get('src')
        if link != None:
            plik = open(os.path.join('photos', os.path.basename(link)), 'wb')
            for chunk in res.iter_content(100000):
                plik.write(chunk)
            plik.close()
            print('downloaded %s' % os.path.basename(link))

Solution (in the 'for i...' loop):

import os

import bs4
import requests

url = 'http://<site>'
os.makedirs('photos', exist_ok=True)
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, features="html.parser")
elem = soup.select('a img')
if not elem:
    print('no images')
else:
    for i in range(len(elem)):
        src = elem[i].get('src')
        if src is not None:  # check before building the URL
            link = url + src
            # Request the image itself; the page response only holds HTML.
            res2 = requests.get(link)
            res2.raise_for_status()
            with open(os.path.join('photos', os.path.basename(link)), 'wb') as plik:
                # Stream from res2 (the image), not res (the page).
                for chunk in res2.iter_content(100000):
                    plik.write(chunk)
            print('downloaded %s' % os.path.basename(link))

edited Mar 24 at 21:10

asked Mar 24 at 20:40

Ensien

That (especially the identical file sizes) sounds an awful lot like a 404 or other error. Have you looked at the files in a text editor?

– Ed Cottrell♦
Mar 24 at 20:43

Hmm, when I open it with a text editor I can see some HTML with content from the main site. (To be honest, I don't know much about HTML and CSS.) It looks like it didn't download the picture. Do you think something is wrong with my select() call?

– Ensien
Mar 24 at 20:48

I don't know much about BeautifulSoup, so I don't know where your error is. I do know, though, that if you are trying to download images and are getting HTML then something is wrong with how you're requesting the images. This is a good candidate for some old-fashioned debugging (walk through the program step-by-step and inspect variables to make sure you're doing what you think you're doing).

– Ed Cottrell♦
Mar 24 at 20:51
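
The check suggested in the comments above (look at what was actually saved) can also be done in code. This is only a minimal sketch: the function name `classify_download` is hypothetical, and it assumes that a JPEG starts with the bytes FF D8 FF and that an HTML page starts with a doctype or an `<html>` tag.

```python
def classify_download(data: bytes) -> str:
    """Guess whether downloaded bytes are a JPEG image or an HTML page."""
    # JPEG files begin with the marker bytes FF D8 FF.
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    # An HTML page (for example an error page or the site's main page)
    # usually begins with a doctype or an <html> tag.
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

Reading one of the saved files in binary mode and passing its contents to this function should show whether the file holds an image or a copy of a web page.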

python beautifulsoup downloading-website-files

1 Answer

After reading the HTML page response and extracting the src of each image, you have to use that src to make another HTTP(S) request and stream the image from that URL.

At the moment it appears that you are still reading from the initial response, which contains only the page's HTML.

Note: a browser does the same thing. It makes a separate HTTP request for each resource a page references, such as every image.
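
The second request described above could look roughly like the sketch below. It is an illustration under assumptions rather than the asker's final code: the helper name `save_image` is hypothetical, `session` is assumed to be a `requests.Session()` (or anything with a compatible `get()`), and the relative src is resolved with the standard library's `urljoin` instead of string concatenation.

```python
import os
from urllib.parse import urljoin, urlsplit


def save_image(session, page_url, src, out_dir="photos"):
    """Fetch one image with its own request and save it under out_dir.

    `session` is assumed to behave like requests.Session().
    """
    # Resolve a relative src (e.g. "img/a.jpg") against the page URL.
    image_url = urljoin(page_url, src)
    # Separate request for the image; the page response only holds HTML.
    response = session.get(image_url, stream=True)
    response.raise_for_status()
    # Name the file after the last path segment of the image URL.
    filename = os.path.basename(urlsplit(image_url).path)
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename)
    with open(path, "wb") as out_file:
        # Stream from the image response, not from the page response.
        for chunk in response.iter_content(100000):
            out_file.write(chunk)
    return path
```

In the loop from the question, this might be called as `save_image(requests.Session(), url, elem[i].get('src'))` for every element whose src is not None.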

answered Mar 24 at 20:56

Gro

You were right. I edited the post with the right code. Thank you!

– Ensien
Mar 24 at 21:10
