I have a large read-only data structure (a graph loaded in networkx, though this shouldn't be important) that I use in my web service. The webservice is built in Flask and then served through Gunicorn. Turns out that for every gunicorn worker I spin up, that worked holds its own copy of my data-structure. Thus, my ~700mb data structure which is perfectly manageable with one worker turns into a pretty big memory hog when I have 8 of them running. Is there any way I can share this data structure between gunicorn processes so I don't have to waste so much memory?
It looks like the easiest way to do this is to tell gunicorn to preload your application using the preload_app
option. This assumes that you can load the data structure as a module-level variable:
from flask import Flask
from your.application import CustomDataStructure
CUSTOM_DATA_STRUCTURE = CustomDataStructure('/data/lives/here')
# @app.routes, etc.
Alternatively, you could use a memory-mapped file (if you can wrap the shared memory with your custom data structure), gevent with gunicorn to ensure that you're only using one process, or the multi-processing module to spin up your own data-structure server which you connect to using IPC.