I'm writing some automation software using selenium==3.141.0, Python 3.6.7 and ChromeDriver 2.44.
Most of the logic is fine to be executed by a single browser instance, but for some parts I have to launch 10-20 instances to get decent execution speed.
When it comes to the part executed by ThreadPoolExecutor, browser interactions start throwing this error:
WARNING|05/Dec/2018 17:33:11|connectionpool|_put_conn|274|Connection pool is full, discarding connection: 127.0.0.1
WARNING|05/Dec/2018 17:33:11|connectionpool|urlopen|662|Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))': /session/119df5b95710793a0421c13ec3a83847/url
WARNING|05/Dec/2018 17:33:11|connectionpool|urlopen|662|Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcee7ada048>: Failed to establish a new connection: [Errno 111] Connection refused',)': /session/119df5b95710793a0421c13ec3a83847/url
browser setup:
# Classmethod on Utils; webdriver comes from `from selenium import webdriver`,
# while driver_paths, bundle_dir and logger are module-level globals.
@classmethod
def init_chromedriver(cls):
    try:
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument(f"user-agent={Utils.get_random_browser_agent()}")
        prefs = {"profile.managed_default_content_settings.images": 2}
        chrome_options.add_experimental_option("prefs", prefs)
        driver = webdriver.Chrome(driver_paths['chrome'],
                                  chrome_options=chrome_options,
                                  service_args=['--verbose', f'--log-path={bundle_dir}/selenium/chromedriver.log'])
        driver.implicitly_wait(10)
        return driver
    except Exception as e:
        logger.error(e)
relevant code:
ProfileParser instantiates a webdriver and executes a few page interactions. I suppose the interactions themselves are not relevant, because everything works without ThreadPoolExecutor.
However, in short:
class ProfileParser(object):
    def __init__(self, acc):
        self.driver = Utils.init_chromedriver()

    def __enter__(self):
        # Needed for the `with ProfileParser(acc) as pparser:` usage below.
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        Utils.shutdown_chromedriver(self.driver)
        self.driver = None

    def collect_user_info(self, post_url):
        self.driver.get(post_url)
        profile_url = self.driver.find_element_by_xpath('xpath_here').get_attribute('href')
When run in the ThreadPoolExecutor, the error above appears at self.driver.find_element_by_xpath or at self.driver.get.
this is working:
with ProfileParser(acc) as pparser:
    pparser.collect_user_info(posts[0])
these options are not working (connectionpool errors):
futures = []

# one worker, one future
with ThreadPoolExecutor(max_workers=1) as executor:
    with ProfileParser(acc) as pparser:
        futures.append(executor.submit(pparser.collect_user_info, posts[0]))

# 10 workers, multiple futures
with ThreadPoolExecutor(max_workers=10) as executor:
    for p in posts:
        with ProfileParser(acc) as pparser:
            futures.append(executor.submit(pparser.collect_user_info, p))
UPDATE:
I found a temporary solution (which does not invalidate the initial question): instantiating the webdriver outside of the ProfileParser class. I don't know why this works while the initial version does not; I suppose the cause lies in some language specifics?
Thanks for the answers; however, the problem doesn't seem to be the ThreadPoolExecutor max_workers limit - as you can see in one of the options above, I tried to submit a single instance and it still didn't work.
current workaround:
futures = []
with ThreadPoolExecutor(max_workers=10) as executor:
    for p in posts:
        driver = Utils.init_chromedriver()
        futures.append({
            'future': executor.submit(collect_user_info, driver, acc, p),
            'driver': driver
        })

for f in futures:
    f['future'].result()  # wait for the task to finish before quitting its driver
    Utils.shutdown_chromedriver(f['driver'])
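A tidier variant of the same workaround - a minimal sketch, assuming the same Utils helpers, with collect_user_info_task as a hypothetical wrapper - keeps the whole driver lifecycle inside the task function, so no code outside the worker thread can close a driver that is still in use:

from concurrent.futures import ThreadPoolExecutor, as_completed

def collect_user_info_task(acc, post_url):
    # Each worker thread creates, uses and disposes of its own driver.
    driver = Utils.init_chromedriver()
    try:
        return collect_user_info(driver, acc, post_url)
    finally:
        Utils.shutdown_chromedriver(driver)

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(collect_user_info_task, acc, p) for p in posts]
    for future in as_completed(futures):
        future.result()  # re-raises any exception from the worker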
This error message...
WARNING|05/Dec/2018 17:33:11|connectionpool|_put_conn|274|Connection pool is full, discarding connection: 127.0.0.1
WARNING|05/Dec/2018 17:33:11|connectionpool|urlopen|662|Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))': /session/119df5b95710793a0421c13ec3a83847/url
WARNING|05/Dec/2018 17:33:11|connectionpool|urlopen|662|Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcee7ada048>: Failed to establish a new connection: [Errno 111] Connection refused',)': /session/119df5b95710793a0421c13ec3a83847/url
...seems to be an issue in urllib3's connection pooling, which raised these WARNINGs while executing the _put_conn(self, conn) method in connectionpool.py:
def _put_conn(self, conn):
    """
    Put a connection back into the pool.

    :param conn:
        Connection object for the current host and port as returned by
        :meth:`._new_conn` or :meth:`._get_conn`.

    If the pool is already full, the connection is closed and discarded
    because we exceeded maxsize. If connections are discarded frequently,
    then maxsize should be increased.

    If the pool is closed, then the connection will be closed and discarded.
    """
    try:
        self.pool.put(conn, block=False)
        return  # Everything is dandy, done.
    except AttributeError:
        # self.pool is None.
        pass
    except queue.Full:
        # This should never happen if self.block == True
        log.warning(
            "Connection pool is full, discarding connection: %s",
            self.host)

    # Connection never got put back into the pool, close it.
    if conn:
        conn.close()
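The docstring above points at the two knobs that govern this behaviour: maxsize (how many connections to one host the pool retains) and block (whether a caller waits for a free connection instead of opening a surplus one that is later discarded). A minimal urllib3 sketch - the host, port and path here are purely illustrative:

import urllib3

# With block=True, a thread that needs a connection while all 20 are in use
# waits for one to be returned, rather than creating an extra connection
# that later triggers the 'Connection pool is full' warning in _put_conn().
http = urllib3.HTTPConnectionPool('127.0.0.1', port=9515, maxsize=20, block=True)
response = http.request('GET', '/status')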
ThreadPoolExecutor is an Executor subclass that uses a pool of threads to execute calls asynchronously. Deadlocks can occur when the callable associated with a Future waits on the results of another Future.
class concurrent.futures.ThreadPoolExecutor(max_workers=None, thread_name_prefix='', initializer=None, initargs=())
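For reference, a minimal correct-usage sketch of that API (standard library only, the worker function is purely illustrative):

from concurrent.futures import ThreadPoolExecutor, as_completed

def work(n):
    return n * n

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(work, n) for n in range(10)]
    for future in as_completed(futures):
        print(future.result())  # re-raises here if work() raised in its worker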
As per your question, since you are trying to launch 10-20 instances, the default connection pool size of 10, which is hardcoded in adapters.py, seems not to be enough in your case.
Moreover, @EdLeafe in the discussion Getting error: Connection pool is full, discarding connection mentions:
It looks like within the requests code, None objects are normal. If _get_conn() gets None from the pool, it simply creates a new connection. It seems odd, though, that it should start with all those None objects, and that _put_conn() isn't smart enough to replace None with the connection.
However, the merge Add pool size parameter to client constructor has fixed this issue.
Increasing the default connection pool size of 10, which was earlier hardcoded in adapters.py and is now configurable, should solve your issue.
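If the client in question sits on top of requests, the pool size can be raised by mounting a larger HTTPAdapter - a minimal sketch, with the sizes chosen arbitrarily for illustration:

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# pool_connections: number of per-host pools to cache;
# pool_maxsize: connections kept per host pool (both default to 10).
adapter = HTTPAdapter(pool_connections=25, pool_maxsize=25)
session.mount('http://', adapter)
session.mount('https://', adapter)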
As per your comment update (...submit a single instance and the outcome is the same...), @meferguson84 mentions within the discussion Getting error: Connection pool is full, discarding connection:
I stepped into the code to the point where it mounts the adapter just to play with the pool size and see if it made a difference. What I found was that the queue is full of NoneType objects with the actual upload connection being the last item in the list. The list is 10 items long (which makes sense). What doesn't make sense is that the unfinished_tasks parameter for the pool is 11. How can this be when the queue itself is only 11 items? Also, is it normal for the queue to be full of NoneType objects with the connection we are using being the last item on the list?
That sounds like a possible cause in your use case as well. It may sound redundant, but you may still perform a couple of ad-hoc steps as follows: