Caching is a valuable part of almost any data pipeline or software system. The Python ecosystem offers several libraries with helpful APIs for caching data in memory. In this guide, you will learn about the Expiring Dict library and how you can use it as a performant caching mechanism within your own data pipeline.
The Expiring Dict library provides a single class, ExpiringDict, which can be used to instantiate an in-memory cache.
At the time of this writing, the Expiring Dict library works with the following Python versions:
3.6
You can install Expiring Dict by using pip or conda.
To install using pip:
pip install expiringdict
To install using conda:
conda install -c conda-forge expiringdict
Once the library is installed, you can import the supplied ExpiringDict type like this:
from expiringdict import ExpiringDict
You can create an in-memory cache using ExpiringDict like this:
user_cache = ExpiringDict(max_len=50, max_age_seconds=25, items=None)
The ExpiringDict class accepts three named arguments during instantiation. The max_len argument specifies how many items are allowed in your cache. When this limit is reached, the cache evicts its oldest items to make room for new ones. The max_age_seconds argument sets the time-to-live (TTL), in seconds, of each item within the cache. Finally, the items argument populates the cache initially. The type of the items argument can be any of the following:
dict
OrderedDict
ExpiringDict
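To make the interplay between max_len and max_age_seconds concrete, here is a minimal standard-library sketch of a size-bounded cache with a TTL. TinyTTLCache and its clock parameter are hypothetical names used only for illustration. This is not the library's implementation, and it assumes that the oldest entry is the one evicted when the cache is full.

```python
import time
from collections import OrderedDict


class TinyTTLCache:
    """Hypothetical, simplified model of a size-bounded cache with a TTL."""

    def __init__(self, max_len, max_age_seconds, clock=time.time):
        self.max_len = max_len
        self.max_age = max_age_seconds
        self._clock = clock            # injectable so expiry can be simulated
        self._data = OrderedDict()     # key -> (value, time the value was set)

    def set(self, key, value):
        if key in self._data:
            del self._data[key]        # re-inserting moves the key to the end
        elif len(self._data) >= self.max_len:
            self._data.popitem(last=False)  # evict the oldest entry
        self._data[key] = (value, self._clock())

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, set_time = item
        if self._clock() - set_time < self.max_age:
            return value
        del self._data[key]            # expired: drop it and report a miss
        return default


# Simulated clock so the example does not need to sleep.
now = [0.0]
cache = TinyTTLCache(max_len=2, max_age_seconds=25, clock=lambda: now[0])

cache.set("user_one", "Alfred")
cache.set("user_two", "Bruce")
cache.set("user_three", "Clark")    # cache is full, so "user_one" is evicted

evicted = cache.get("user_one")      # None: removed to make room
fresh = cache.get("user_three")      # "Clark": still within its TTL

now[0] = 30.0                        # 30 simulated seconds later
expired = cache.get("user_three")    # None: older than max_age_seconds
```

The two limits act independently: an entry can disappear because the cache is full even though it is still fresh, or because it is too old even though the cache has room.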
You can add an entry to your cache like this:
user_cache['user_one'] = 'Alfred'
To get a single value out of your cache, use the get method. Here is an example:
user_one = user_cache.get('user_one')
print(user_one)  # -> 'Alfred'
You can set a default value in case the key doesn't exist like this:
user_two = user_cache.get('user_two', default='Fallback Charlie')
print(user_two)  # -> 'Fallback Charlie'
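A common way to use get with a default in a pipeline is the cache-aside pattern: look up the key, and only on a miss compute the value and store it. The sketch below uses a hypothetical helper, get_or_load, and a hypothetical load_user function. It works with any dict-like cache, so a plain dict stands in here to keep the example runnable without extra dependencies.

```python
_MISSING = object()  # sentinel, so a cached None is not mistaken for a miss


def get_or_load(cache, key, loader):
    """Return the cached value, calling loader(key) and caching it on a miss."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = loader(key)
        cache[key] = value
    return value


calls = []


def load_user(key):
    # Hypothetical stand-in for a slow lookup (database, HTTP call, ...).
    calls.append(key)
    return key.upper()


user_cache = {}  # stand-in; an ExpiringDict instance could be passed instead
first = get_or_load(user_cache, "user_one", load_user)   # miss: loader runs
second = get_or_load(user_cache, "user_one", load_user)  # hit: served from cache
```

With an expiring cache in place of the plain dict, the loader would simply run again once the entry's TTL has elapsed.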
You can also get the age of a key-value pair, that is, the number of seconds since it was set, like this:
user_one = user_cache.get('user_one', with_age=True)
print(user_one)  # -> ('Alfred', <age in seconds>)
The ExpiringDict class provides the method items_with_timestamp, which returns a list containing, for each key, the key, its value, and the timestamp at which the value was set. You can use this method like this:
users_with_timestamps = user_cache.items_with_timestamp()
print(users_with_timestamps)  # -> a list with one entry per key: key, value, and set timestamp
There is also a ttl method that returns the remaining TTL, in seconds, of a given key within the cache. Here is an example:
user_one_ttl = user_cache.ttl('user_one')
print(user_one_ttl)  # -> about 20, if 'user_one' was set roughly 5 seconds earlier
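An item's remaining time-to-live is the cache's configured maximum age minus the time elapsed since the item was set. The helper below is a hypothetical illustration of that arithmetic, not part of the library, and it assumes that an expired item is reported as None.

```python
def remaining_ttl(max_age_seconds, set_time, now):
    """Seconds left before an item set at `set_time` expires, or None if expired."""
    age = now - set_time
    remaining = max_age_seconds - age
    return remaining if remaining > 0 else None


# An item stored at t=100 in a cache with max_age_seconds=25:
ttl_after_5s = remaining_ttl(25, set_time=100.0, now=105.0)   # 20.0
ttl_after_30s = remaining_ttl(25, set_time=100.0, now=130.0)  # None (expired)
```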
One of the nice things about the Expiring Dict library is that its core internal methods are wrapped in locks, which makes reading and altering the cache thread-safe. You can see how this is achieved by digging into the source code. The following is the source code for the internal method that grabs an item out of the ExpiringDict cache:
def __getitem__(self, key, with_age=False):
    """ Return the item of the dict.
    Raises a KeyError if key is not in the map.
    """
    with self.lock:
        item = OrderedDict.__getitem__(self, key)
        item_age = time.time() - item[1]
        if item_age < self.max_age:
            if with_age:
                return item[0], item_age
            else:
                return item[0]
        else:
            del self[key]
            raise KeyError(key)
Notice the with self.lock portion of the code. This statement uses the lock as a context manager: the lock is acquired before the item is fetched from the internal OrderedDict and released when the block exits, even if a KeyError is raised. This ensures that access to the given key is exclusive and means that, while using Expiring Dict, it is safe to get items and set items in the same cache from two different threads. The Expiring Dict library locks these core operations so that you don't have to manually keep track of locks within your own code. That's pretty nice!
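The same idea of holding a lock around every operation can be sketched with the standard library alone. LockedCounterCache below is a hypothetical class, not taken from the library. It guards a read-modify-write with a lock so that concurrent increments from several threads are not lost.

```python
import threading


class LockedCounterCache:
    """Hypothetical dict wrapper whose operations each hold a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def increment(self, key):
        # The whole read-modify-write runs under the lock, so no update is lost.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + 1

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)


def worker(cache, key, times):
    for _ in range(times):
        cache.increment(key)


cache = LockedCounterCache()
threads = [
    threading.Thread(target=worker, args=(cache, "hits", 1000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = cache.get("hits")  # 4 threads x 1000 increments each
```

A lock inside the cache protects individual operations only. A check-then-set sequence that spans several calls still needs its own coordination in the calling code.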
In this guide, you learned about the Python Expiring Dict library and the caching capability that it provides. You are now well-equipped to use the provided ExpiringDict class to create efficient, performant, and thread-safe in-memory caches. You also know how to handle cache invalidation properly by setting TTLs on your instances of ExpiringDict.
You can be confident in your ability to easily create in-memory caches in either a single-threaded or multi-threaded environment.