I'm new to multi-threading in Python and am currently writing a script that appends to a CSV file. If I have multiple threads submitted to a concurrent.futures.ThreadPoolExecutor, each appending lines to the same CSV file, what can I do to guarantee thread safety if appending is the only file-related operation these threads perform?
Simplified version of my code:
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    for count, ad_id in enumerate(advertisers):
        downloadFutures.append(executor.submit(downloadThread, arguments.....))
        time.sleep(random.randint(1, 3))
And my thread function being:
def downloadThread(arguments......):
    # Some code.....
    writer.writerow(re.split(',', line.decode()))
Should I set up a separate single-threaded executor to handle writing, or is it worth worrying about at all if I am just appending?
EDIT: I should elaborate that the write operations can occur at very irregular intervals, sometimes with minutes between appends. This scenario has not come up while testing my script, but I would prefer to be covered for it.
I am not sure if csvwriter is thread-safe. The documentation doesn't specify, so to be safe, if multiple threads use the same object, you should protect the usage with a threading.Lock:
# create the lock
import threading
csv_writer_lock = threading.Lock()

def downloadThread(arguments......):
    # pass csv_writer_lock somehow
    # Note: use csv_writer_lock on *any* access to the shared writer
    # Some code.....
    with csv_writer_lock:
        writer.writerow(re.split(',', line.decode()))
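Here is a minimal sketch of one way to share the lock, passing both the lock and the writer into each task via executor.submit (the names download_and_write, output.csv, the dummy advertisers list and the placeholder row are illustrative, not taken from your code):

import concurrent.futures
import csv
import threading

advertisers = [1, 2, 3]  # placeholder for the real list of advertiser ids
csv_writer_lock = threading.Lock()

def download_and_write(ad_id, writer, lock):
    # ... download work for ad_id ...
    row = ["example", "data", str(ad_id)]  # placeholder for the real row
    with lock:                             # serialize access to the shared writer
        writer.writerow(row)

with open("output.csv", "a", newline="") as f:
    writer = csv.writer(f)
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        for ad_id in advertisers:
            executor.submit(download_and_write, ad_id, writer, csv_writer_lock)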
That being said, it may indeed be more elegant for the downloadThread to submit write tasks to an executor, instead of explicitly using locks like this.
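A minimal sketch of that alternative, assuming the download work can build a row and hand it to a single-threaded writer executor (again, download_thread, write_executor and the placeholder row are illustrative names, not your actual code):

import concurrent.futures
import csv

advertisers = [1, 2, 3]  # placeholder for the real list of advertiser ids

def download_thread(ad_id, writer, write_executor):
    # ... download work for ad_id ...
    row = ["example", "data", str(ad_id)]  # placeholder for the real row
    # hand the write off to the single-threaded writer executor
    write_executor.submit(writer.writerow, row)

with open("output.csv", "a", newline="") as f:
    writer = csv.writer(f)
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as write_executor:
        with concurrent.futures.ThreadPoolExecutor(max_workers=3) as download_executor:
            for ad_id in advertisers:
                download_executor.submit(download_thread, ad_id, writer, write_executor)

Because write_executor has max_workers=1, the writerow calls run one at a time, so no explicit lock is needed; the nesting of the with blocks also guarantees the downloads finish before the writer executor shuts down and the file is closed.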