
Quick Profiling in Python

Quickly find the most obvious performance bottlenecks in your code with this high-level approach to profiling and speeding up Python code.

Apr 17, 2019 • 7 Minute Read

Introduction

This is my 'high-level' approach to profiling and speeding up code in Python. It is by no means exhaustive and will most likely only point out the quick and easy places to speed up your code.

The approach is pretty standard for profiling any code: start at the top level and methodically work your way down the codebase to find the bottlenecks.

This approach is designed to find the most obvious performance bottlenecks quickly and allow you to get back to coding. Fortunately, it's easy enough to run on your code periodically and find some pain points sooner rather than later.

It would be useful to fold some of this process into your Continuous Integration server to catch performance regressions, but that is a different guide altogether.

Step One: Profile Entire Application

The first step is to run the script/application with the built-in profiler and make sure to stress the slower or more interesting code sections:

      python -m cProfile -o stats <my_awesome_code>.py
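
If you'd rather start the profiler from within the code, the standard library supports that too. A minimal sketch (main and the work inside it are stand-ins for your application's real entry point):

      import cProfile

      def main():
          # stand-in for your application's real entry point
          return sum(i * i for i in range(1_000_000))

      # writes the same "stats" data file the command-line invocation produces
      cProfile.run("main()", "stats")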

Next, exit the application and use a little script to see where most of the running time was spent.

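A minimal version of such a script, using the standard library's pstats module (this assumes the profile data was saved to a file named stats, as in the command above):

      import pstats

      # load the data file written by `python -m cProfile -o stats ...`
      stats = pstats.Stats("stats")

      # sort by cumulative time and print every recorded function call
      stats.sort_stats("cumulative").print_stats()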

This script will output all the function calls your application performed and sort them by cumulative time.

Step Two: Reduce Calls to Expensive Code

This profile output gives you an idea of where the majority of the processing time was spent. This may or may not be useful at this point, but it can yield a few clues.

  1. Take a look at the percall column
    • Is any particular function taking a big chunk of time in a single call?
  2. Take a look at the ncalls column
    • Can you reduce the number of times you call this function? Caching results is often the quickest way to do that (see the sketch after this list).
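
If the expensive function is called repeatedly with the same arguments and has no side effects, memoization can cut the call count dramatically. A minimal sketch using functools.lru_cache (expensive_lookup is a hypothetical stand-in for your hot function):

      from functools import lru_cache

      @lru_cache(maxsize=None)  # remember results so repeat calls are free
      def expensive_lookup(key):
          # stand-in for slow work, e.g. a heavy computation
          return sum(i * i for i in range(100_000)) + key

      expensive_lookup(7)  # computed once
      expensive_lookup(7)  # served from the cache; ncalls of the slow body drops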

Step Three: Focus on a Single Function

It's possible that steps one and two did not yield any useful performance gains. So, now is the time to drop down a level and zero in on a single function.

Take another look at the profiling output from step one. Find the function with the largest percall column. Now, profile each line in this function to see if there is something slow that can be easily refactored.

For this step a few additional profiling tools are needed:

  1. Line Profiler
  2. Kernprof.py

Both tools ship as part of the line_profiler project. In general, I've found that they are a little awkward to use and the documentation is a bit lacking. Luckily, this simple approach to profiling doesn't need too much documentation.

The first, and probably most awkward, step is to place the @profile decorator on the function you're interested in profiling. Don't worry, there is nothing to import for @profile because it's magic.
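
For example, decorating a function might look like this (find_unique is a hypothetical stand-in; the script will only run under kernprof.py, which injects @profile for you):

      @profile  # injected into __builtins__ by kernprof.py; no import needed
      def find_unique(items):
          result = []
          for item in items:
              if item not in result:  # candidate hot spot: O(n) list scan
                  result.append(item)
          return result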

I've seen problems trying to use @profile on more than one function at a time.

I wish @profile were implemented in a different way. The magic of inserting it into __builtins__ really bothers me philosophically.

Next, run the kernprof.py script with your script/application as the argument:

      kernprof.py -v -l <script> <your_script_args>

Now, perform the operations that will cause the profiled function to run.

Finally, we can use the line_profiler module to look at the results. The above invocation of the kernprof.py script created a profiler data file, typically called <your_profiled_script>.lprof.

Feed this data file to the line_profiler module, and it will print the timing of the function broken down by line:

      python -m line_profiler <data_file>

At this point, you will probably see a few lines that stand out, as far as the Time and % Time columns go. In my experience, the slowness at this level tends to be something like building a list with a lot of elements, constantly looking for the existence of an item in a big list, and other operations that deal with large iterables.

Step Four: Back to Basics

The easy part is finished. It's time to think about the algorithm and data structures. The process of improving your slow-performing code is a bit outside the scope of this guide, since that process is usually very specific to the code itself.

Remember all that algorithm complexity and Big O homework from college? Bust out your books, refresh your memory, and let the real journey begin.

Example Optimization

This is a pretty classic case, but you would be surprised how often it shows up.

For example, consider a snippet of code that spends a lot of time looking for the existence of an object in a large list within a tight loop. This could be a great place to use a dictionary instead.

The dictionary will greatly improve your lookup time but will use more memory. This is the classic time/space trade-off, and only you can decide if the increased memory usage is worth it in the context of your application.
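
As an illustrative sketch (the data here is made up), the refactor usually looks something like this:

      # Before: each membership test scans the whole list; O(n) per lookup
      big_list = list(range(100_000))
      incoming = [5, 99_999, 123_456]
      hits = [x for x in incoming if x in big_list]

      # After: build a dictionary once, then each lookup is O(1) on average
      as_dict = dict.fromkeys(big_list)
      hits = [x for x in incoming if x in as_dict]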

Still Slow?

As I mentioned in the beginning, this approach might leave you needing more speed. Luckily, there are still several options.

  1. Cython

    • Cython is a common approach to speeding up Python code. It provides a way to add data type information to your existing code and other mechanisms to generally push your code closer to the C layer.
  2. PyPy

    • PyPy is an alternative implementation of the Python language spec. PyPy is, in a nutshell, Python implemented in Python. A real explanation of PyPy is beyond this guide and my expertise. However, if you're interested in languages and some serious Computer Science, you should definitely look into it. It will test your thought process, which is always a good thing.
  3. Numba

    • Numba is similar in concept to Cython. It uses the ever-popular LLVM compiler toolchain to achieve super fast results. Another benefit of Numba is that it's written by the great folks at Continuum Analytics, so it has some of the founders of the scientific Python community and numpy behind it. This project is still relatively young, but I expect big things from it in the future. A minimal example follows this list.
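
As a quick taste, here is a minimal Numba sketch (it assumes the numba and numpy packages are installed; total is a hypothetical example function):

      import numpy as np
      from numba import jit

      @jit(nopython=True)  # compile to machine code via LLVM on first call
      def total(values):
          s = 0.0
          for v in values:
              s += v
          return s

      print(total(np.arange(1_000_000, dtype=np.float64)))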

Caveats

Some bottlenecks, like storage I/O and network I/O, are not going to show up easily with this type of profiling. This approach is a quick way to profile and fix CPU-bound tasks.

Conclusion

Profiling and optimization is a very complicated topic, so my simple approach barely scratches the surface. This is a topic you will definitely want to learn more about if you are interested in becoming a better programmer. Luckily, there are some really great talks on this subject to help you learn more from real experts.