Caching is an inescapable and incredibly valuable part of any data pipeline or software system. When it comes to the Python ecosystem, there are several libraries that provide helpful APIs for caching data in-memory. In this guide, you will learn about the Expiring Dict library within the Python ecosystem and how you can use it as a performant caching mechanism within your own data pipeline.
The Expiring Dict library provides a single class,
ExpiringDict, which can be used to instantiate an in-memory cache.
At the time of this writing, the Expiring Dict library works with the following Python versions:
You can install Expiring Dict by using either pip or conda.
To install using pip, run:
pip install expiringdict
To install using conda, run:
conda install -c conda-forge expiringdict
Once the library is installed, you can import it and start using the supplied
ExpiringDict type like this:
from expiringdict import ExpiringDict
You can create an in-memory cache using
ExpiringDict like this:
user_cache = ExpiringDict(max_len=50, max_age_seconds=25, items=None)
The ExpiringDict class provides three named arguments that you can use during instantiation. The
max_len argument specifies how many items are allowed in your cache. When this limit is reached, the cache evicts its oldest items to make room. The
max_age_seconds argument sets the time-to-live (TTL) of each item within the cache. Finally, the
items argument populates the cache initially; it accepts a plain dict, an OrderedDict, or another ExpiringDict.
You can add an entry to your cache like this:
user_cache['user_one'] = 'Alfred'
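To see how max_len eviction and max_age_seconds expiry behave without installing anything, here is a minimal stdlib-only stand-in that mimics those two arguments. This is a simplified sketch, not the library's actual implementation:

```python
import time
from collections import OrderedDict

class MiniExpiringCache:
    """Simplified stand-in (not the real ExpiringDict) illustrating
    max_len eviction and max_age_seconds expiry semantics."""

    def __init__(self, max_len, max_age_seconds):
        self.max_len = max_len
        self.max_age = max_age_seconds
        self._store = OrderedDict()  # key -> (value, stored_at)

    def __setitem__(self, key, value):
        if key in self._store:
            del self._store[key]
        elif len(self._store) >= self.max_len:
            self._store.popitem(last=False)  # evict the oldest entry
        self._store[key] = (value, time.time())

    def get(self, key, default=None):
        if key not in self._store:
            return default
        value, stored_at = self._store[key]
        if time.time() - stored_at >= self.max_age:
            del self._store[key]  # entry has outlived its TTL
            return default
        return value

cache = MiniExpiringCache(max_len=2, max_age_seconds=60)
cache['a'] = 1
cache['b'] = 2
cache['c'] = 3          # exceeds max_len, so the oldest key 'a' is evicted
print(cache.get('a'))   # -> None
print(cache.get('c'))   # -> 3
```

The real ExpiringDict behaves the same way from the caller's perspective: once max_len is hit, the oldest entries make way for new ones, and any entry older than max_age_seconds is treated as gone.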
To get a single value out of your cache, use the
get method. Here is an example:
user_one = user_cache.get('user_one')
print(user_one)  # -> 'Alfred'
You can set a default value in case the key doesn't exist like this:
user_two = user_cache.get('user_two', default='Fallback Charlie')
print(user_two)  # -> 'Fallback Charlie'
You can also retrieve the age of a key-value pair (the number of seconds since it was stored) like this:
user_one = user_cache.get('user_one', with_age=True)
print(user_one)  # -> ('Alfred', 20.0)
The ExpiringDict class also provides the method
items_with_timestamp, which lets you grab every entry in the cache along with the timestamp at which it was stored. You can use this method like this:
users_with_timestamps = user_cache.items_with_timestamp()
print(users_with_timestamps)
There is also a
ttl method that will give you the remaining time-to-live of a given key within the cache. Here is an example:
user_one_ttl = user_cache.ttl('user_one')
print(user_one_ttl)  # -> 20.0
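Conceptually, the remaining TTL is just the configured max_age_seconds minus the item's current age. The following sketch works through that arithmetic with stdlib code only; the 25-second max_age matches the cache created above, and the 5-second age is an assumed example value:

```python
import time

MAX_AGE_SECONDS = 25          # matches the max_age_seconds used above

stored_at = time.time() - 5   # pretend the item was stored 5 seconds ago
age = time.time() - stored_at # seconds elapsed since the item was stored
remaining_ttl = MAX_AGE_SECONDS - age

print(round(remaining_ttl))   # -> 20
```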
One of the nice things about the Expiring Dict library is that it ensures that all of the key internal API methods are wrapped in locks, making alterations to the cache thread-safe. You can see how this is achieved by digging into the source code. The following is the source code for the internal API method that grabs an item out of the cache:
def __getitem__(self, key, with_age=False):
    """ Return the item of the dict.

    Raises a KeyError if key is not in the map.
    """
    with self.lock:
        item = OrderedDict.__getitem__(self, key)
        item_age = time.time() - item[1]
        if item_age < self.max_age:
            if with_age:
                return item[0], item_age
            else:
                return item[0]
        else:
            del self[key]
            raise KeyError(key)
Note the with self.lock portion of the code. This creates a context manager around the fetching of the item from the internal
OrderedDict. This ensures that the access of the given key is conducted exclusively and means that, while using Expiring Dict, it is safe to attempt to get items and set items in the same cache across two different threads. The Expiring Dict library successfully locks core operations so that you don't have to manually keep track of locks within your own code. That's pretty nice!
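The lock-guarded access pattern described above can be sketched with nothing but the standard library. This is a simplified illustration of the pattern, not the library's actual code:

```python
import threading
import time
from collections import OrderedDict

class LockedCache:
    """Sketch of the lock-guarded access pattern that Expiring Dict
    uses internally (simplified; not the library's actual code)."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self.lock = threading.RLock()
        self._store = OrderedDict()  # key -> (value, stored_at)

    def __setitem__(self, key, value):
        with self.lock:  # only one thread mutates the dict at a time
            self._store[key] = (value, time.time())

    def __getitem__(self, key):
        with self.lock:
            value, stored_at = self._store[key]
            if time.time() - stored_at >= self.max_age:
                del self._store[key]  # expired: remove and raise
                raise KeyError(key)
            return value

cache = LockedCache(max_age_seconds=60)

def writer(n):
    # Each thread writes the same keys; the lock keeps every
    # individual set operation atomic.
    for i in range(n):
        cache[f'user_{i}'] = i

threads = [threading.Thread(target=writer, args=(100,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(cache['user_99'])  # -> 99
```

Because every read and write acquires the same lock, no thread ever observes the OrderedDict in a half-updated state, which is exactly the guarantee Expiring Dict provides for you out of the box.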
In this guide, you learned about the Python Expiring Dict library and the caching capability that it provides. You are now well-equipped to use the provided
ExpiringDict class to create efficient, performant, and even multi-threaded in-memory caches. You also know how to handle cache invalidation properly by setting TTLs on your instances of ExpiringDict.
You can be confident in your ability to easily create in-memory caches in either a single-threaded or multi-threaded environment.