I have a web scraper script which runs fine on my (Windows) PC, but I'm trying to get it to run from a (Linux) web server. I have a number of other scripts which run fine on the server (connecting to different websites than this one), but when I run this script, I get an [Errno 111] Connection refused error.
Here is a minimal version of the script to demonstrate the problem:
import time
import requests
import urllib.request
from bs4 import BeautifulSoup
s = requests.Session()
target = "http://taxsearch.co.grayson.tx.us:8443/"
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
    "Host": "taxsearch.co.grayson.tx.us:8443",
    "Pragma": "no-cache",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36",
}
time.sleep(1)
response = s.get(target, headers=headers)
if response.status_code == requests.codes.ok:
    results = BeautifulSoup(response.text, 'html.parser')
    # Do something with output
else:
    response.raise_for_status()
This runs fine on my PC, but when running on the server, I get the following error:
Traceback (most recent call last):
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/opt/alt/python36/lib64/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/opt/alt/python36/lib64/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/opt/alt/python36/lib64/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/opt/alt/python36/lib64/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/opt/alt/python36/lib64/python3.6/http/client.py", line 964, in send
self.connect()
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 181, in connect
conn = self._new_conn()
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x2af700598c18>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/urllib3/util/retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='taxsearch.co.grayson.tx.us', port=8443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2af700598c18>: Failed to establish a new connection: [Errno 111] Connection refused',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "../python/grayson-2year.py", line 22, in <module>
response = s.get(target, headers=headers)
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/requests/sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/home/jken/virtualenv/web-scraper/3.6/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='taxsearch.co.grayson.tx.us', port=8443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2af700598c18>: Failed to establish a new connection: [Errno 111] Connection refused',))
My guess is that a firewall on the web server is blocking the connection, but I'm really not sure. Is there something I'm missing?
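One way to test the firewall guess is to take requests out of the picture and try a plain TCP connection to the same host and port from both machines. The following is only a minimal sketch under that assumption; can_connect is a hypothetical helper name, not part of the script above, and the 5-second timeout is an arbitrary choice.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try to open a plain TCP connection to (host, port).

    Hypothetical helper for diagnosis only.
    Returns (True, None) on success, or (False, exception) on failure.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake;
        # the socket is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers ConnectionRefusedError, timeouts and DNS failures alike.
        return False, exc
```

Calling can_connect("taxsearch.co.grayson.tx.us", 8443) on the PC and on the server should show whether the refusal happens at the TCP level. If it fails on the server only, the cause is likely somewhere in the network path (for example an outbound rule at the host, or the remote site rejecting the server's address) rather than in the headers or the requests code.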