I am researching high performance logging in Python and so far have been disappointed by the performance of the python standard logging module - but there seem to be no alternatives. Below is a piece of code to performance test 4 different ways of logging:
import logging
import timeit
import time
import datetime
from logutils.queue import QueueListener, QueueHandler
import Queue
import threading
tmpq = Queue.Queue()
def std_manual_threading():
start = datetime.datetime.now()
logger = logging.getLogger()
hdlr = logging.FileHandler('std_manual.out', 'w')
logger.addHandler(hdlr)
logger.setLevel(logging.DEBUG)
def logger_thread(f):
while True:
item = tmpq.get(0.1)
if item == None:
break
logging.info(item)
f = open('manual.out', 'w')
lt = threading.Thread(target=logger_thread, args=(f,))
lt.start()
for i in range(100000):
tmpq.put("msg:%d" % i)
tmpq.put(None)
lt.join()
print datetime.datetime.now() - start
def nonstd_manual_threading():
start = datetime.datetime.now()
def logger_thread(f):
while True:
item = tmpq.get(0.1)
if item == None:
break
f.write(item+"\n")
f = open('manual.out', 'w')
lt = threading.Thread(target=logger_thread, args=(f,))
lt.start()
for i in range(100000):
tmpq.put("msg:%d" % i)
tmpq.put(None)
lt.join()
print datetime.datetime.now() - start
def std_logging_queue_handler():
start = datetime.datetime.now()
q = Queue.Queue(-1)
logger = logging.getLogger()
hdlr = logging.FileHandler('qtest.out', 'w')
ql = QueueListener(q, hdlr)
# Create log and set handler to queue handle
root = logging.getLogger()
root.setLevel(logging.DEBUG) # Log level = DEBUG
qh = QueueHandler(q)
root.addHandler(qh)
ql.start()
for i in range(100000):
logging.info("msg:%d" % i)
ql.stop()
print datetime.datetime.now() - start
def std_logging_single_thread():
start = datetime.datetime.now()
logger = logging.getLogger()
hdlr = logging.FileHandler('test.out', 'w')
logger.addHandler(hdlr)
logger.setLevel(logging.DEBUG)
for i in range(100000):
logging.info("msg:%d" % i)
print datetime.datetime.now() - start
if __name__ == "__main__":
"""
Conclusion: std logging about 3 times slower so for 100K lines simple file write is ~1 sec while std
logging ~3. If threads are introduced some overhead causes to go to ~4 and if QueueListener and events
are used with enhancement for thread sleeping that goes to ~5 (probably because log records are being
inserted into queue).
"""
print "Testing"
#std_logging_single_thread() # 3.4
std_logging_queue_handler() # 7, 6, 7 (5 seconds with sleep optimization)
#nonstd_manual_threading() # 1.08
#std_manual_threading() # 4.3
My question is, why is there so much performance overhead with the logging module and are there any alternatives? Being a performance sensitive app does it even make sense to use the logging module?
p.s.: I have profiled the different scenarios and seems like LogRecord creation is expensive.
The stdlib logging
package provides a lot of flexibility and functionality for developers / devops / support staff, and that flexibility comes at some cost, obviously. If the need for performance trumps the need for flexibility, you need to go with something else. Did you take the steps to optimise described in the docs? A typical logging call takes of the order of tens of microseconds on reasonable hardware, which hardly seems excessive. However, logging in tight loops is seldom advisable, if only because the amount of info generated might take too much time to wade through.
The code to find the caller can be quite expensive, but is needed if you want e.g. filename and line number where the logging call was made.
QueueHandler
is intended for scenarios where the logging I/O will take significant time and can't be done in-band. For example, a web application whose logs need to be sent by email to site administrators cannot risk using SMTPHandler
directly, because the email handshake can be slow.
Don't forget that thread context switching in Python is slow. Did you try SocketHandler
? There is a suitable starting point in the docs for a separate receiver process that does the actual I/O to file, email etc. So your process is only doing socket I/O and not doing context switches just for logging. And using domain sockets or UDP might be faster still, though the latter is of course lossy.
There are other ways to optimise. For example, standard handlers in logging do locking around emit()
, for thread safety - if in a specific scenario under your control there is no contention for the handler, you could have a handler subclass that no-ops the lock acquisition and release. And so on.