Web scraping from remax.com
I am trying to scrape some data from Remax.com, such as the lot size or square footage of a property. However, I am getting the following errors:
---------------------------------------------------------------------------
Error Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
    440 try:
--> 441 cnx.do_handshake()
    442 except OpenSSL.SSL.WantReadError:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\OpenSSL\SSL.py in do_handshake(self)
   1715 result = _lib.SSL_do_handshake(self._ssl)
-> 1716 self._raise_ssl_error(self._ssl, result)
   1717

~\AppData\Local\Continuum\anaconda3\lib\site-packages\OpenSSL\SSL.py in _raise_ssl_error(self, ssl, result)
   1455 else:
-> 1456 _raise_current_error()
   1457

~\AppData\Local\Continuum\anaconda3\lib\site-packages\OpenSSL\_util.py in exception_from_error_queue(exception_type)
     53
---> 54 raise exception_type(errors)
     55

Error: [('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')]

During handling of the above exception, another exception occurred:

SSLError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    600 body=body, headers=headers,
--> 601 chunked=chunked)
    602

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    345 try:
--> 346 self._validate_conn(conn)
    347 except (SocketTimeout, BaseSSLError) as e:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in _validate_conn(self, conn)
    849 if not getattr(conn, 'sock', None): # AppEngine might not have `.sock`
--> 850 conn.connect()
    851

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connection.py in connect(self)
    325 server_hostname=hostname,
--> 326 ssl_context=context)
    327

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\util\ssl_.py in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version, ciphers, ssl_context, ca_cert_dir)
    328 if HAS_SNI: # Platform-specific: OpenSSL with enabled SNI
--> 329 return context.wrap_socket(sock, server_hostname=server_hostname)
    330

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
    447 except OpenSSL.SSL.Error as e:
--> 448 raise ssl.SSLError('bad handshake: %r' % e)
    449 break

SSLError: ("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",)

During handling of the above exception, another exception occurred:

MaxRetryError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    439 retries=self.max_retries,
--> 440 timeout=timeout
    441 )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    638 retries = retries.increment(method, url, error=e, _pool=self,
--> 639 _stacktrace=sys.exc_info()[2])
    640 retries.sleep()

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    387 if new_retry.is_exhausted():
--> 388 raise MaxRetryError(_pool, url, error or ResponseError(cause))
    389

MaxRetryError: HTTPSConnectionPool(host='www.remax.com', port=443): Max retries exceeded with url: /api/listings?nwlat=33.8426971435546875&nwlong=-118.3811187744140625&selat=33.8426971435546875&selong=-118.3783721923828125&Count=100&pagenumber=1&SiteID=68000000&pageCount=10&tab=map&sh=true&forcelatlong=true&maplistings=1&maplistcards=0&sv=true&sortorder=newest&view=forsale (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))

During handling of the above exception, another exception occurred:

SSLError Traceback (most recent call last)
<ipython-input-22-bcfdfdfb0a4e> in <module>()
----> 1 get_info('119 S IRENA AVE B, Redondo Beach, CA 90277')

<ipython-input-21-f3c942a87400> in get_info(address)
     32 }
     33 # proxies = {'http': 'http://user:pass@10.10.1.10:3128/'}
---> 34 req_properties = requests.get("https://www.remax.com/api/listings", params=params)
     35 matching_properties_json = req_properties.json()
     36 for p in matching_properties_json[0]:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\api.py in get(url, params, **kwargs)
     70
     71 kwargs.setdefault('allow_redirects', True)
---> 72 return request('get', url, params=params, **kwargs)
     73
     74

~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\api.py in request(method, url, **kwargs)
     56 # cases, and look like a memory leak in others.
     57 with sessions.Session() as session:
---> 58 return session.request(method=method, url=url, **kwargs)
     59
     60

~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    506 }
    507 send_kwargs.update(settings)
--> 508 resp = self.send(prep, **send_kwargs)
    509
    510 return resp

~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\sessions.py in send(self, request, **kwargs)
    616
    617 # Send the request
--> 618 r = adapter.send(request, **kwargs)
    619
    620 # Total elapsed time of the request (approximately)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    504 if isinstance(e.reason, _SSLError):
    505 # This branch is for urllib3 v1.22 and later.
--> 506 raise SSLError(e, request=request)
    507
    508 raise ConnectionError(e, request=request)

SSLError: HTTPSConnectionPool(host='www.remax.com', port=443): Max retries exceeded with url: /api/listings?nwlat=33.8426971435546875&nwlong=-118.3811187744140625&selat=33.8426971435546875&selong=-118.3783721923828125&Count=100&pagenumber=1&SiteID=68000000&pageCount=10&tab=map&sh=true&forcelatlong=true&maplistings=1&maplistcards=0&sv=true&sortorder=newest&view=forsale (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))

Here is my code:
import urllib
from bs4 import BeautifulSoup
import pandas as pd
import geopy
from geopy.geocoders import Nominatim
import geolib
from geolib import geohash
from geopy.extra.rate_limiter import RateLimiter
import requests

geolocator = Nominatim(timeout=None)

def get_dir(address):
    # Geocode the address, then decode the neighbouring geohash cells
    # to get the corners of a small bounding box around it.
    location = geolocator.geocode(address)
    lat = location.latitude
    lng = location.longitude
    h = geolib.geohash.encode(lat, lng, 7)
    hashes = geolib.geohash.neighbours(h)
    NW = geohash.decode(hashes.nw)
    SE = geohash.decode(hashes.se)  # south-east neighbour
    nwlat = NW.lat
    nwlon = NW.lon
    selat = SE.lat
    selon = SE.lon
    return nwlat, nwlon, selat, selon

def get_info(address):
    try:
        nwlat, nwlon, selat, selon = get_dir(address)
        params = {
            "nwlat" : nwlat,
            "nwlong" : nwlon,
            "selat" : selat,
            "selong" : selon,
            "Count" : 100,
            "pagenumber" : 1,
            "SiteID" : "68000000",
            "pageCount" : "10",
            "tab" : "map",
            "sh" : "true",
            "forcelatlong" : "true",
            "maplistings" : "1",
            "maplistcards" : "0",
            "sv" : "true",
            "sortorder" : "newest",
            "view" : "homeestimates",
        }
        proxies = {'http': 'http://user:pass@10.10.1.10:3128/'}
        req_properties = requests.get("https://www.remax.com/api/listings", params=params, proxies=proxies, verify=False)
        matching_properties_json = req_properties.json()
        # Print one summary line per listing in the first result group.
        for p in matching_properties_json[0]:
            print(f"{p['Address']:<40} {p.get('BedRooms', 0)} beds | {int(p.get('BathRooms',0))} baths | {p['SqFt']} sqft")
    except AttributeError:
        return 'NaN'

x = get_info('693 Bluebird Canyon Drive, Laguna Beach CA, 92651')
print(x)
I am not sure how to fix this problem, as I am new to web scraping. I tried adding a proxy in the code, but I still get the same errors as above.
Update:
adding
proxies = {'http': 'http://user:pass@10.10.1.10:3128/'}
req_properties = requests.get("https://www.remax.com/api/listings", params=params, proxies=proxies, verify=False)
yields no errors but also no output at all.
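A possible reason the proxy setting seems to change nothing: requests picks a proxy by the scheme of the target URL, so a mapping with only an 'http' key is not consulted for an https:// URL. If that is what happens here, the difference in the updated call comes from verify=False, which switches certificate verification off rather than repairing it. The sketch below checks the lookup with requests.utils.select_proxy. The proxy address is the placeholder from the snippet above, and the 'https' entry in the second mapping is an assumption added purely for illustration.

```python
from requests.utils import select_proxy

url = "https://www.remax.com/api/listings"

# Mapping as in the update: only an entry for plain-HTTP traffic.
http_only = {"http": "http://user:pass@10.10.1.10:3128/"}

# Hypothetical variant that also has an entry for HTTPS traffic.
both = {
    "http": "http://user:pass@10.10.1.10:3128/",
    "https": "http://user:pass@10.10.1.10:3128/",
}

# Show which proxy, if any, would be chosen for the https URL.
print(select_proxy(url, http_only))
print(select_proxy(url, both))
```

If no real proxy is involved, the proxies argument can probably be dropped. Looking at req_properties.status_code and req_properties.text before calling .json() may also show what the server actually returned.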
python web-scraping
Are you using a proxy?
– Nipun Wijerathne
Mar 25 at 16:41
@NipunWijerathne I do not believe so, how would I know?
– Wolfy
Mar 25 at 16:41
Take a look at this answer which pertains to the error you are getting - stackoverflow.com/questions/10667960/…
– Bert
Mar 25 at 17:18
@Bert I tried following the suggestions, but that still doesn't solve the problem. Thank you for the comment, though.
– Wolfy
Mar 25 at 17:24
Does the code in the original question return results for you? If so the issue should not be proxy related.
– Martin Evans
Mar 26 at 10:13
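Separately from the SSL error, the query string in the traceback may be worth a look: nwlat and selat carry the same value, so the requested box would have no height. Here is a small standard-library check, using the URL from the MaxRetryError message shortened to the coordinate parameters. The helper name box_is_degenerate is made up for this sketch.

```python
from urllib.parse import urlsplit, parse_qs

# Request URL taken from the traceback, shortened to the relevant parameters.
request_url = (
    "https://www.remax.com/api/listings"
    "?nwlat=33.8426971435546875&nwlong=-118.3811187744140625"
    "&selat=33.8426971435546875&selong=-118.3783721923828125"
    "&Count=100&pagenumber=1"
)

def box_is_degenerate(url):
    """Return True if the NW and SE corners share a latitude or a longitude."""
    query = parse_qs(urlsplit(url).query)
    nw = (float(query["nwlat"][0]), float(query["nwlong"][0]))
    se = (float(query["selat"][0]), float(query["selong"][0]))
    return nw[0] == se[0] or nw[1] == se[1]

print(box_is_degenerate(request_url))
```

A box with zero height is unlikely to contain any listings, which could be one reason for an empty result once the connection itself succeeds.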
|
show 1 more comment
I am trying to scrape some data from Remax.com for information like lotsize or square feet of property. Although I am get the following errors:
---------------------------------------------------------------------------
Error Traceback (most recent call last)
~AppDataLocalContinuumanaconda3libsite-packagesurllib3contribpyopenssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
440 try:
--> 441 cnx.do_handshake()
442 except OpenSSL.SSL.WantReadError:
~AppDataLocalContinuumanaconda3libsite-packagesOpenSSLSSL.py in do_handshake(self)
1715 result = _lib.SSL_do_handshake(self._ssl)
-> 1716 self._raise_ssl_error(self._ssl, result)
1717
~AppDataLocalContinuumanaconda3libsite-packagesOpenSSLSSL.py in _raise_ssl_error(self, ssl, result)
1455 else:
-> 1456 _raise_current_error()
1457
~AppDataLocalContinuumanaconda3libsite-packagesOpenSSL_util.py in exception_from_error_queue(exception_type)
53
---> 54 raise exception_type(errors)
55
Error: [('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')]
During handling of the above exception, another exception occurred:
SSLError Traceback (most recent call last)
~AppDataLocalContinuumanaconda3libsite-packagesurllib3connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
600 body=body, headers=headers,
--> 601 chunked=chunked)
602
~AppDataLocalContinuumanaconda3libsite-packagesurllib3connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
345 try:
--> 346 self._validate_conn(conn)
347 except (SocketTimeout, BaseSSLError) as e:
~AppDataLocalContinuumanaconda3libsite-packagesurllib3connectionpool.py in _validate_conn(self, conn)
849 if not getattr(conn, 'sock', None): # AppEngine might not have `.sock`
--> 850 conn.connect()
851
~AppDataLocalContinuumanaconda3libsite-packagesurllib3connection.py in connect(self)
325 server_hostname=hostname,
--> 326 ssl_context=context)
327
~AppDataLocalContinuumanaconda3libsite-packagesurllib3utilssl_.py in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version, ciphers, ssl_context, ca_cert_dir)
328 if HAS_SNI: # Platform-specific: OpenSSL with enabled SNI
--> 329 return context.wrap_socket(sock, server_hostname=server_hostname)
330
~AppDataLocalContinuumanaconda3libsite-packagesurllib3contribpyopenssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
447 except OpenSSL.SSL.Error as e:
--> 448 raise ssl.SSLError('bad handshake: %r' % e)
449 break
SSLError: ("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",)
During handling of the above exception, another exception occurred:
MaxRetryError Traceback (most recent call last)
~AppDataLocalContinuumanaconda3libsite-packagesrequestsadapters.py in send(self, request, stream, timeout, verify, cert, proxies)
439 retries=self.max_retries,
--> 440 timeout=timeout
441 )
~AppDataLocalContinuumanaconda3libsite-packagesurllib3connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
638 retries = retries.increment(method, url, error=e, _pool=self,
--> 639 _stacktrace=sys.exc_info()[2])
640 retries.sleep()
~AppDataLocalContinuumanaconda3libsite-packagesurllib3utilretry.py in increment(self, method, url, response, error, _pool, _stacktrace)
387 if new_retry.is_exhausted():
--> 388 raise MaxRetryError(_pool, url, error or ResponseError(cause))
389
MaxRetryError: HTTPSConnectionPool(host='www.remax.com', port=443): Max retries exceeded with url: /api/listings?nwlat=33.8426971435546875&nwlong=-118.3811187744140625&selat=33.8426971435546875&selong=-118.3783721923828125&Count=100&pagenumber=1&SiteID=68000000&pageCount=10&tab=map&sh=true&forcelatlong=true&maplistings=1&maplistcards=0&sv=true&sortorder=newest&view=forsale (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))
During handling of the above exception, another exception occurred:
SSLError Traceback (most recent call last)
<ipython-input-22-bcfdfdfb0a4e> in <module>()
----> 1 get_info('119 S IRENA AVE B, Redondo Beach, CA 90277')
<ipython-input-21-f3c942a87400> in get_info(address)
32 }
33 # proxies = 'http': 'http://user:pass@10.10.1.10:3128/'
---> 34 req_properties = requests.get("https://www.remax.com/api/listings", params=params)
35 matching_properties_json = req_properties.json()
36 for p in matching_properties_json[0]:
~AppDataLocalContinuumanaconda3libsite-packagesrequestsapi.py in get(url, params, **kwargs)
70
71 kwargs.setdefault('allow_redirects', True)
---> 72 return request('get', url, params=params, **kwargs)
73
74
~AppDataLocalContinuumanaconda3libsite-packagesrequestsapi.py in request(method, url, **kwargs)
56 # cases, and look like a memory leak in others.
57 with sessions.Session() as session:
---> 58 return session.request(method=method, url=url, **kwargs)
59
60
~AppDataLocalContinuumanaconda3libsite-packagesrequestssessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
506 }
507 send_kwargs.update(settings)
--> 508 resp = self.send(prep, **send_kwargs)
509
510 return resp
~AppDataLocalContinuumanaconda3libsite-packagesrequestssessions.py in send(self, request, **kwargs)
616
617 # Send the request
--> 618 r = adapter.send(request, **kwargs)
619
620 # Total elapsed time of the request (approximately)
~AppDataLocalContinuumanaconda3libsite-packagesrequestsadapters.py in send(self, request, stream, timeout, verify, cert, proxies)
504 if isinstance(e.reason, _SSLError):
505 # This branch is for urllib3 v1.22 and later.
--> 506 raise SSLError(e, request=request)
507
508 raise ConnectionError(e, request=request)
SSLError: HTTPSConnectionPool(host='www.remax.com', port=443): Max retries exceeded with url: /api/listings?nwlat=33.8426971435546875&nwlong=-118.3811187744140625&selat=33.8426971435546875&selong=-118.3783721923828125&Count=100&pagenumber=1&SiteID=68000000&pageCount=10&tab=map&sh=true&forcelatlong=true&maplistings=1&maplistcards=0&sv=true&sortorder=newest&view=forsale (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))
Here is my code:
import urllib
from bs4 import BeautifulSoup
import pandas as pd
import geopy
from geopy.geocoders import Nominatim
import geolib
from geolib import geohash
from geopy.extra.rate_limiter import RateLimiter
import requests
geolocator = Nominatim(timeout=None)
def get_dir(address):
location = geolocator.geocode(address)
lat = location.latitude
lng = location.longitude
h = geolib.geohash.encode(lat, lng, 7)
hashes = geolib.geohash.neighbours(h)
NW = geohash.decode(hashes.nw)
SE = geohash.decode(hashes.ne)
nwlat = NW.lat
nwlon = NW.lon
selat = SE.lat
selon = SE.lon
return nwlat, nwlon, selat, selon
def get_info(address):
try:
nwlat, nwlon, selat, selon = get_dir(address)
params =
"nwlat" : nwlat,
"nwlong" : nwlon,
"selat" : selat,
"selong" : selon,
"Count" : 100,
"pagenumber" : 1,
"SiteID" : "68000000",
"pageCount" : "10",
"tab" : "map",
"sh" : "true",
"forcelatlong" : "true",
"maplistings" : "1",
"maplistcards" : "0",
"sv" : "true",
"sortorder" : "newest",
"view" : "homeestimates",
proxies = 'http': 'http://user:pass@10.10.1.10:3128/'
req_properties = requests.get("https://www.remax.com/api/listings", params=params, proxies=proxies, verify=False)
matching_properties_json = req_properties.json()
for p in matching_properties_json[0]:
print(f"p['Address']:<40 p.get('BedRooms', 0) beds | int(p.get('BathRooms',0)) baths | p['SqFt'] sqft")
except (AttributeError):
return 'NaN'
x = get_info('693 Bluebird Canyon Drive, Laguna Beach CA, 92651')
print(x)
I am not sure how to fix this problem as I am new to web scraping, I tried adding a proxy in the code but I still get the same errors in the latter above.
Update:
adding
proxies = 'http': 'http://user:pass@10.10.1.10:3128/'
req_properties = requests.get("https://www.remax.com/api/listings", params=params, proxies=proxies, verify=False)
yields no errors but also no output at all.
python web-scraping
I am trying to scrape some data from Remax.com for information like lotsize or square feet of property. Although I am get the following errors:
---------------------------------------------------------------------------
Error Traceback (most recent call last)
~AppDataLocalContinuumanaconda3libsite-packagesurllib3contribpyopenssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
440 try:
--> 441 cnx.do_handshake()
442 except OpenSSL.SSL.WantReadError:
~AppDataLocalContinuumanaconda3libsite-packagesOpenSSLSSL.py in do_handshake(self)
1715 result = _lib.SSL_do_handshake(self._ssl)
-> 1716 self._raise_ssl_error(self._ssl, result)
1717
~AppDataLocalContinuumanaconda3libsite-packagesOpenSSLSSL.py in _raise_ssl_error(self, ssl, result)
1455 else:
-> 1456 _raise_current_error()
1457
~AppDataLocalContinuumanaconda3libsite-packagesOpenSSL_util.py in exception_from_error_queue(exception_type)
53
---> 54 raise exception_type(errors)
55
Error: [('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')]
During handling of the above exception, another exception occurred:
SSLError Traceback (most recent call last)
~AppDataLocalContinuumanaconda3libsite-packagesurllib3connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
600 body=body, headers=headers,
--> 601 chunked=chunked)
602
~AppDataLocalContinuumanaconda3libsite-packagesurllib3connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
345 try:
--> 346 self._validate_conn(conn)
347 except (SocketTimeout, BaseSSLError) as e:
~AppDataLocalContinuumanaconda3libsite-packagesurllib3connectionpool.py in _validate_conn(self, conn)
849 if not getattr(conn, 'sock', None): # AppEngine might not have `.sock`
--> 850 conn.connect()
851
~AppDataLocalContinuumanaconda3libsite-packagesurllib3connection.py in connect(self)
325 server_hostname=hostname,
--> 326 ssl_context=context)
327
~AppDataLocalContinuumanaconda3libsite-packagesurllib3utilssl_.py in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version, ciphers, ssl_context, ca_cert_dir)
328 if HAS_SNI: # Platform-specific: OpenSSL with enabled SNI
--> 329 return context.wrap_socket(sock, server_hostname=server_hostname)
330
~AppDataLocalContinuumanaconda3libsite-packagesurllib3contribpyopenssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
447 except OpenSSL.SSL.Error as e:
--> 448 raise ssl.SSLError('bad handshake: %r' % e)
449 break
SSLError: ("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",)
During handling of the above exception, another exception occurred:
MaxRetryError Traceback (most recent call last)
~AppDataLocalContinuumanaconda3libsite-packagesrequestsadapters.py in send(self, request, stream, timeout, verify, cert, proxies)
439 retries=self.max_retries,
--> 440 timeout=timeout
441 )
~AppDataLocalContinuumanaconda3libsite-packagesurllib3connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
638 retries = retries.increment(method, url, error=e, _pool=self,
--> 639 _stacktrace=sys.exc_info()[2])
640 retries.sleep()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
387 if new_retry.is_exhausted():
--> 388 raise MaxRetryError(_pool, url, error or ResponseError(cause))
389
MaxRetryError: HTTPSConnectionPool(host='www.remax.com', port=443): Max retries exceeded with url: /api/listings?nwlat=33.8426971435546875&nwlong=-118.3811187744140625&selat=33.8426971435546875&selong=-118.3783721923828125&Count=100&pagenumber=1&SiteID=68000000&pageCount=10&tab=map&sh=true&forcelatlong=true&maplistings=1&maplistcards=0&sv=true&sortorder=newest&view=forsale (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))
During handling of the above exception, another exception occurred:
SSLError Traceback (most recent call last)
<ipython-input-22-bcfdfdfb0a4e> in <module>()
----> 1 get_info('119 S IRENA AVE B, Redondo Beach, CA 90277')
<ipython-input-21-f3c942a87400> in get_info(address)
32 }
33 # proxies = {'http': 'http://user:pass@10.10.1.10:3128/'}
---> 34 req_properties = requests.get("https://www.remax.com/api/listings", params=params)
35 matching_properties_json = req_properties.json()
36 for p in matching_properties_json[0]:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\api.py in get(url, params, **kwargs)
70
71 kwargs.setdefault('allow_redirects', True)
---> 72 return request('get', url, params=params, **kwargs)
73
74
~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\api.py in request(method, url, **kwargs)
56 # cases, and look like a memory leak in others.
57 with sessions.Session() as session:
---> 58 return session.request(method=method, url=url, **kwargs)
59
60
~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
506 }
507 send_kwargs.update(settings)
--> 508 resp = self.send(prep, **send_kwargs)
509
510 return resp
~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\sessions.py in send(self, request, **kwargs)
616
617 # Send the request
--> 618 r = adapter.send(request, **kwargs)
619
620 # Total elapsed time of the request (approximately)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
504 if isinstance(e.reason, _SSLError):
505 # This branch is for urllib3 v1.22 and later.
--> 506 raise SSLError(e, request=request)
507
508 raise ConnectionError(e, request=request)
SSLError: HTTPSConnectionPool(host='www.remax.com', port=443): Max retries exceeded with url: /api/listings?nwlat=33.8426971435546875&nwlong=-118.3811187744140625&selat=33.8426971435546875&selong=-118.3783721923828125&Count=100&pagenumber=1&SiteID=68000000&pageCount=10&tab=map&sh=true&forcelatlong=true&maplistings=1&maplistcards=0&sv=true&sortorder=newest&view=forsale (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))
Here is my code:
import urllib
from bs4 import BeautifulSoup
import pandas as pd
import geopy
from geopy.geocoders import Nominatim
import geolib
from geolib import geohash
from geopy.extra.rate_limiter import RateLimiter
import requests

geolocator = Nominatim(timeout=None)

def get_dir(address):
    location = geolocator.geocode(address)
    lat = location.latitude
    lng = location.longitude
    h = geolib.geohash.encode(lat, lng, 7)
    hashes = geolib.geohash.neighbours(h)
    NW = geohash.decode(hashes.nw)
    SE = geohash.decode(hashes.ne)
    nwlat = NW.lat
    nwlon = NW.lon
    selat = SE.lat
    selon = SE.lon
    return nwlat, nwlon, selat, selon

def get_info(address):
    try:
        nwlat, nwlon, selat, selon = get_dir(address)
        params = {
            "nwlat" : nwlat,
            "nwlong" : nwlon,
            "selat" : selat,
            "selong" : selon,
            "Count" : 100,
            "pagenumber" : 1,
            "SiteID" : "68000000",
            "pageCount" : "10",
            "tab" : "map",
            "sh" : "true",
            "forcelatlong" : "true",
            "maplistings" : "1",
            "maplistcards" : "0",
            "sv" : "true",
            "sortorder" : "newest",
            "view" : "homeestimates",
        }
        proxies = {'http': 'http://user:pass@10.10.1.10:3128/'}
        req_properties = requests.get("https://www.remax.com/api/listings", params=params, proxies=proxies, verify=False)
        matching_properties_json = req_properties.json()
        for p in matching_properties_json[0]:
            print(f"{p['Address']:<40} {p.get('BedRooms', 0)} beds | {int(p.get('BathRooms',0))} baths | {p['SqFt']} sqft")
    except (AttributeError):
        return 'NaN'

x = get_info('693 Bluebird Canyon Drive, Laguna Beach CA, 92651')
print(x)
I am not sure how to fix this problem, as I am new to web scraping. I tried adding a proxy in the code, but I still get the same errors as above.
Update:
Adding
proxies = {'http': 'http://user:pass@10.10.1.10:3128/'}
req_properties = requests.get("https://www.remax.com/api/listings", params=params, proxies=proxies, verify=False)
yields no errors but also no output at all.
python web-scraping
edited Mar 25 at 21:55
Wolfy
asked Mar 25 at 16:39
Wolfy
137 reputation, 1 silver badge, 11 bronze badges
Are you using a proxy?
– Nipun Wijerathne
Mar 25 at 16:41
@NipunWijerathne I do not believe so. How would I know?
– Wolfy
Mar 25 at 16:41
Take a look at this answer, which pertains to the error you are getting: stackoverflow.com/questions/10667960/…
– Bert
Mar 25 at 17:18
@Bert I tried following the suggestions, but they still don't solve the problem. Thank you for the comment, though.
– Wolfy
Mar 25 at 17:24
Does the code in the original question return results for you? If so, the issue should not be proxy related.
– Martin Evans
Mar 26 at 10:13
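On the proxy question raised in these comments: a proxies mapping passed to requests is keyed by URL scheme, so an entry under 'http' alone is not consulted for an https:// URL such as the remax.com endpoint. The sketch below is a simplified, standard-library-only illustration of that lookup. pick_proxy is a hypothetical helper, not the function requests itself uses, and it ignores environment variables and per-host keys.

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Return the proxy configured for this URL's scheme, or None.

    Simplified illustration: checks the scheme key, then an 'all' fallback.
    """
    scheme = urlparse(url).scheme
    return proxies.get(scheme) or proxies.get("all")


# Same mapping as in the question: only plain-HTTP traffic is covered.
proxies = {"http": "http://user:pass@10.10.1.10:3128/"}

print(pick_proxy("https://www.remax.com/api/listings", proxies))  # None
print(pick_proxy("http://www.remax.com/api/listings", proxies))
```

If the installed requests version behaves this way, the proxy entry in the question was never applied to the HTTPS call, which fits the observation that verify=False, rather than the proxy, made the SSLError go away.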
1 Answer
There appear to be a number of issues:
1. Proxy is not an issue, as you have said the previous question is working without needing one to be configured.
2. Your geohash.decode(hashes.ne) call is using ne instead of se.
3. The returned coordinates are not returning any valid properties. The API appears to return a different kind of response in this case, which does not include the values you want. It does include the price, though.
4. Make sure that verify=False is configured for the get. The warning message can be suppressed.
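For context on the last point: verify=False tells requests to skip certificate validation entirely, which hides the "certificate verify failed" error rather than fixing its cause. requests also accepts a path to a CA bundle as the verify value, which keeps validation switched on. The standard-library sketch below only illustrates the difference between a verifying and a non-verifying TLS context. describe is a hypothetical helper, and nothing here contacts remax.com.

```python
import ssl


def describe(ctx):
    """Summarize the two settings that control certificate validation."""
    return {
        "verify_mode": ctx.verify_mode.name,
        "check_hostname": ctx.check_hostname,
    }


# Default context: the server certificate and hostname are both checked.
verified = ssl.create_default_context()

# Roughly what disabling verification amounts to: no checks at all.
unverified = ssl.create_default_context()
unverified.check_hostname = False  # must be disabled before verify_mode
unverified.verify_mode = ssl.CERT_NONE

print(describe(verified))
print(describe(unverified))
```

Disabling verification is reasonable for a quick experiment like this one, but it is worth treating as a workaround, since the connection is no longer protected against a man-in-the-middle.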
If the search square is increased slightly in size, it does return results:
import urllib
import urllib3
from bs4 import BeautifulSoup
import pandas as pd
import geopy
from geopy.geocoders import Nominatim
import geolib
from geolib import geohash
from geopy.extra.rate_limiter import RateLimiter
import requests

# Disable the certificate warning
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

geolocator = Nominatim(timeout=None)

def get_dir(address):
    location = geolocator.geocode(address)
    lat = location.latitude
    lng = location.longitude
    h = geolib.geohash.encode(lat, lng, 7)
    hashes = geolib.geohash.neighbours(h)
    NW = geohash.decode(hashes.nw)
    SE = geohash.decode(hashes.se)
    return NW, SE

def get_info(address):
    try:
        NW, SE = get_dir(address)
        square_size = 0.001
        params = {
            "nwlat" : float(NW.lat) + square_size,
            "nwlong" : float(NW.lon) - square_size,
            "selat" : float(SE.lat) - square_size,
            "selong" : float(SE.lon) + square_size,
            "Count" : 100,
            "pagenumber" : 1,
            "SiteID" : "68000000",
            "pageCount" : "10",
            "tab" : "map",
            "sh" : "true",
            "forcelatlong" : "true",
            "maplistings" : "1",
            "maplistcards" : "0",
            "sv" : "true",
            "sortorder" : "newest",
            "view" : "homeestimates",
        }
        req_properties = requests.get("https://www.remax.com/api/listings", params=params, verify=False)
        matching_properties_json = req_properties.json()
        for p in matching_properties_json[0]:
            address = f"{p['Address']}, {p['City']}, {p['State']}, {p['Zip']}"
            try:
                print(f"  {address:<50} | {p.get('BedRooms', 0)} beds | {int(p.get('BathRooms',0))} baths | {p['SqFt']} sqft")
            except KeyError:
                print(f"None found - {address} - ${p['PriceFormatted']}")
    except (AttributeError):
        return 'NaN'

get_info('693 Bluebird Canyon Drive, Laguna Beach CA, 92651')
This displays:
1566 Glenneyre Street, Laguna Beach, CA, 92651 | 0 beds | 0 baths | sqft
1585 S Coast 4, Laguna Beach, CA, 92651 | 3 beds | 2 baths | 1448 sqft
429 Shadow Lane, Laguna Beach, CA, 92651 | 2 beds | 2 baths | 1102 sqft
243 Calliope Street 1, Laguna Beach, CA, 92651 | 2 beds | 2 baths | 1350 sqft
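The widening step in the code above can be isolated into a small pure function, which makes it easy to check the arithmetic without calling the API. This is only a sketch: widen_box is a hypothetical name, and the corner coordinates in the usage lines are made-up values near Laguna Beach, not real geohash output.

```python
def widen_box(nw_lat, nw_lon, se_lat, se_lon, margin=0.001):
    """Push the NW corner up/left and the SE corner down/right by `margin` degrees.

    float() mirrors the conversion used in the answer, since the decoded
    geohash values may not be plain floats.
    """
    return {
        "nwlat": float(nw_lat) + margin,   # further north: larger latitude
        "nwlong": float(nw_lon) - margin,  # further west: smaller longitude
        "selat": float(se_lat) - margin,   # further south: smaller latitude
        "selong": float(se_lon) + margin,  # further east: larger longitude
    }


# Illustrative coordinates only (assumed, not taken from the API).
box = widen_box("33.535", "-117.775", "33.532", "-117.772")
print(box)
```

The returned dictionary uses the same four key names as the request parameters, so it could be merged into the params dictionary with params.update(box).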
I get this: None found - 1566 Glenneyre Street, Laguna Beach, CA, 92651 - $2,595,000
– Wolfy
Mar 26 at 15:58
Set square_size = 0.001 and try again.
– Martin Evans
Mar 26 at 15:58
Perfect. Any idea how to get lotsize? I noticed that the JSON doesn't specify lotsize as a name, so it's difficult to retrieve.
– Wolfy
Mar 26 at 15:59
I suggest you add print(p). You can then see all of the available data for each property. I could only see SqFt.
– Martin Evans
Mar 26 at 16:02
Thanks for your help, really appreciate it.
– Wolfy
Mar 26 at 16:03
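Building on the print(p) suggestion in the comments, one way to look for a lot-size field is to collect every key the listings expose and filter by name. The sketch below uses made-up sample records. Only Address, BedRooms, BathRooms, SqFt and PriceFormatted appear in this thread, and the LotSize key is purely hypothetical.

```python
def collect_keys(listings):
    """Return the sorted list of keys that appear in any listing dict."""
    keys = set()
    for listing in listings:
        keys.update(listing.keys())
    return sorted(keys)


def find_keys(listings, fragment):
    """Return keys whose name contains `fragment`, ignoring case."""
    return [k for k in collect_keys(listings) if fragment.lower() in k.lower()]


# Sample data for illustration only; "LotSize" is an assumed key name.
sample = [
    {"Address": "1585 S Coast 4", "BedRooms": 3, "BathRooms": 2, "SqFt": "1448"},
    {"Address": "429 Shadow Lane", "SqFt": "1102", "LotSize": "0.1"},
]

print(collect_keys(sample))
print(find_keys(sample, "lot"))
```

With real data, the listings argument would be matching_properties_json[0] from the answer's code. An empty result from find_keys would suggest the API simply does not return that field.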
edited Mar 26 at 15:58
answered Mar 26 at 15:48
Martin Evans
29.7k reputation, 13 gold badges, 37 silver badges, 61 bronze badges