Zachary Bennett

Explore Python Libraries: In-Memory Caching Using Expiring Dict

  • Jul 22, 2020
  • 5 Min read
  • 339 Views

Introduction

Caching is a valuable part of most data pipelines and software systems. The Python ecosystem offers several libraries that provide helpful APIs for caching data in memory. In this guide, you will learn about the Expiring Dict library and how you can use it as a performant caching mechanism within your own data pipeline.

Expiring Dict Installation

The Expiring Dict library provides a single class, ExpiringDict, which can be used to instantiate an in-memory cache.

At the time of this writing, the Expiring Dict library works with the following Python versions:

  • 2
  • 2.7
  • 3
  • 3.6

You can install Expiring Dict by using pip or conda.

To install using pip:

    pip install expiringdict
bash

To install using conda:

    conda install -c conda-forge expiringdict
bash

Once the library is installed, you can import the ExpiringDict type and start using it like this:

    from expiringdict import ExpiringDict
python

Expiring Dict API

You can create an in-memory cache using ExpiringDict like this:

    user_cache = ExpiringDict(max_len=50, max_age_seconds=25, items=None)
python

The ExpiringDict constructor accepts three named arguments. The max_len argument sets the maximum number of items the cache can hold; once that limit is reached, adding a new key evicts the oldest entry to make room. The max_age_seconds argument sets the time-to-live (TTL) of each item, measured from the moment the item is stored. Finally, the optional items argument pre-populates the cache. The type of the items argument can be any of the following:

  • dict
  • OrderedDict
  • ExpiringDict

Getting/Setting Values

You can add an entry to your cache like this:

    user_cache['user_one'] = 'Alfred'
python

To get a single value out of your cache, use the get method. Here is an example:

    user_one = user_cache.get('user_one')
    print(user_one) # -> 'Alfred'
python

You can set a default value in case the key doesn't exist like this:

    user_two = user_cache.get('user_two', default='Fallback Charlie')
    print(user_two) # -> 'Fallback Charlie'
python

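You can also retrieve a value together with its age, that is, the number of seconds since it was stored, by passing with_age=True:

    user_one_with_age = user_cache.get('user_one', with_age=True)
    print(user_one_with_age) # -> ('Alfred', <age in seconds, as a float>)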
python

Miscellaneous API Methods

The ExpiringDict class also provides the items_with_timestamp method, which returns a list with one entry per key. Each entry carries the key, its value, and the timestamp at which the value was stored. You can use this method like this:

    users_with_timestamps = user_cache.items_with_timestamp()
    print(users_with_timestamps) # -> a list with one entry per key: key, value, and stored timestamp
python

There is also a ttl method that returns the remaining TTL, in seconds, of a given key within the cache. Here is an example:

    user_one_ttl = user_cache.ttl('user_one')
    print(user_one_ttl) # -> seconds remaining before expiry, e.g. about 20 when 5 of the 25 seconds have elapsed
python

Multi-threaded Caching

One of the nice things about the Expiring Dict library is that its core internal methods are wrapped in locks, which makes reading from and writing to the cache thread-safe. You can see how this works by digging into the source code. The following is the source for the internal method __getitem__, which fetches an item from the ExpiringDict cache:

    def __getitem__(self, key, with_age=False):
        """ Return the item of the dict.
        Raises a KeyError if key is not in the map.
        """
        with self.lock:
            item = OrderedDict.__getitem__(self, key)
            item_age = time.time() - item[1]
            if item_age < self.max_age:
                if with_age:
                    return item[0], item_age
                else:
                    return item[0]
            else:
                del self[key]
                raise KeyError(key)
python

Notice the with self.lock portion of the code. This statement uses the lock as a context manager around the lookup in the internal OrderedDict, so the lock is held for the whole read, including the age check and the deletion of an expired item. As a result, access to a given key is exclusive, and it is safe to get and set items in the same cache from different threads. Because the library locks these core operations itself, you don't have to manage locks for them in your own code.

Conclusion

In this guide, you learned about the Python Expiring Dict library and the caching capability that it provides. You are now well-equipped to use the provided ExpiringDict class to create efficient, performant, and thread-safe in-memory caches. You also know how to handle cache invalidation by setting a TTL on your instances of ExpiringDict.

You can be confident in your ability to easily create in-memory caches in either a single-threaded or multi-threaded environment.
