I have around 6,00,000 entries in MongoDB
in the following format:
feature:category:count
where
I want to cache the top 1000 tuples, let's say so as not to query database each time.
How does one build an LRU cache in Python? Or are there any known solutions to this?
The LRU cache in Python3.3 has O(1) insertion, deletion, and search.
The design uses a circular doubly-linked list of entries (arranged oldest-to-newest) and a hash table to locate individual links. Cache hits use the hash table to find the relevant link and move it to the head of the list. Cache misses delete the oldest link and create a new link at the head of the linked list.
Here's a simplified (but fast) version in 33 lines of very basic Python (using only simple dictionary and list operations). It runs on Python2.0 and later (or PyPy or Jython or Python3.x):
class LRU_Cache:
def __init__(self, original_function, maxsize=1024):
# Link structure: [PREV, NEXT, KEY, VALUE]
self.root = [None, None, None, None]
self.root[0] = self.root[1] = self.root
self.original_function = original_function
self.maxsize = maxsize
self.mapping = {}
def __call__(self, *key):
mapping = self.mapping
root = self.root
link = mapping.get(key)
if link is not None:
link_prev, link_next, link_key, value = link
link_prev[1] = link_next
link_next[0] = link_prev
last = root[0]
last[1] = root[0] = link
link[0] = last
link[1] = root
return value
value = self.original_function(*key)
if len(mapping) >= self.maxsize:
oldest = root[1]
next_oldest = oldest[1]
root[1] = next_oldest
next_oldest[0] = root
del mapping[oldest[2]]
last = root[0]
last[1] = root[0] = mapping[key] = [last, root, key, value]
return value
if __name__ == '__main__':
p = LRU_Cache(ord, maxsize=3)
for c in 'abcdecaeaa':
print(c, p(c))