This is my 'high-level' approach to profiling and speeding up Python code. This is by no means an exhaustive approach and will most likely only point out quick and easy places to speed up your code.
The approach is pretty standard for profiling any code: start at the top-level and methodically work your way down the codebase to find the bottlenecks.
This approach is designed to find the most obvious performance bottlenecks quickly and allow you to get back to coding. Fortunately, it's easy enough to run on your code periodically and find some pain points sooner rather than later.
It would be useful to automate some of this process into your Continuous Integration server to find performance regressions, but that is a different guide altogether.
The first step is to run the script/application with the built-in profiler and make sure to stress the slower or more interesting code sections:
`python -m cProfile -o stats <my_awesome_code>.py`
Next, exit the application and use this little script to see where most of the running time was spent:
[gist 4272487]
This script will output all the function calls your application performed and sort them by cumulative time.
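If the gist isn't handy, a minimal script along those lines might look like the following. This is just a sketch built on the standard `pstats` module; the actual gist may differ in its details.

```python
# Sketch of a stats-reading script; the real gist may differ.
import pstats

# Load the file written by `python -m cProfile -o stats <my_awesome_code>.py`
stats = pstats.Stats('stats')

# Sort every recorded function call by cumulative time and print the report
stats.sort_stats('cumulative').print_stats()
```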
This profile output represents an idea of where the majority of the processing time was spent. This may or may not be useful at this point, but it can yield a few clues, starting with the `percall` and `ncalls` columns.

It's possible that steps one and two did not yield any useful performance increases. So, now is the time to drop down a level and narrow the focus to a single function.

Take another look at the profiling output from step one. Find the function with the largest value in the `percall` column. Now, profile each line in this function to see if there is something slow that can be easily refactored.
For this step a few additional profiling tools are needed:

- `line_profiler`
- `kernprof.py` (bundled with `line_profiler`)

You can find some documentation for both in the `line_profiler` project. In general, I've found that these are a little awkward to use and the documentation is a bit lacking. Luckily, this simple approach to profiling doesn't need too much documentation.
The first, and probably most awkward, step is to place the `@profile` decorator on the function you're interested in profiling. Don't worry, there is nothing to import for `@profile` because it's magic.
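For example, suppose the function you care about lives in a script called `example.py` (both the filename and `slow_function` below are made up for illustration):

```python
# example.py
# Nothing to import: kernprof injects `profile` into __builtins__ when it
# runs this script, so the @profile name only exists under kernprof.
@profile
def slow_function(items):
    total = 0
    for item in items:
        total += item
    return total

if __name__ == '__main__':
    slow_function(range(100000))
```

Be aware that running `example.py` directly, without `kernprof.py`, will fail with a `NameError`, because nothing defines `profile` in that case.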
I've seen problems trying to use `@profile` on more than one function at a time. I also wish `@profile` were implemented in a different way; the magic of inserting it into `__builtins__` really bothers me philosophically.
Next, run the `kernprof.py` script with your script/application as the argument:

`kernprof.py -v -l <script> <your_script_args>`
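For instance, with the hypothetical `example.py` from above:

`kernprof.py -v -l example.py`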
Now, perform the operations that will cause the profiled function to run.
Finally, we can use the `line_profiler` module to look at the results. The above invocation of the `kernprof.py` script created a profiler data file, typically called `<your_profiled_script>.lprof`.

Feed this data file to the `line_profiler` module, and it will print the timing of the function broken down by line:

`python -m line_profiler <data_file>`
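The report looks roughly like the following; the numbers here are invented purely to show the shape of the output for the hypothetical `example.py` above:

```
Timer unit: 1e-06 s

File: example.py
Function: slow_function at line 4
Total time: 0.112003 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           @profile
     5                                           def slow_function(items):
     6         1          2.0      2.0      0.0      total = 0
     7    100001      45000.0      0.5     40.2      for item in items:
     8    100000      67000.0      0.7     59.8          total += item
     9         1          1.0      1.0      0.0      return total
```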
At this point, you will probably see a few lines that stand out as far as the `Time` and `% Time` columns go. In my experience, the slowness at this level tends to be something like building a list with a lot of elements, constantly looking for the existence of an item in a big list, and other operations that deal with large iterables.
The easy part is finished. It's time to think about the algorithm and data structures. The process of improving your slow-performing code is a bit outside the scope of this post, since that process is usually very specific to the code itself.
Remember all that algorithm complexity and Big O homework from college? Bust out your books, refresh your memory, and let the real journey begin.
For example, consider a snippet of code that spends a lot of time looking for the existence of an object in a large list within a tight loop. This could be a great place to use a dictionary (or a set) instead. This is a pretty classic case, but you would be surprised how often it shows up.

The dictionary will greatly improve your lookup time but will use more memory. This is the classic trade-off, and only you can decide if the increased memory usage is worth it in the context of your application.
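A minimal sketch of the idea (the names and sizes here are made up for illustration):

```python
# Hypothetical example: checking membership in a tight loop.
haystack_list = list(range(100000))

# A set stores the same items in a hash table, so membership tests are
# O(1) on average instead of an O(n) scan -- at the cost of extra memory.
haystack_set = set(haystack_list)

needles = range(0, 100000, 100)

# Slow: each `in` scans the list until it finds a match.
hits_slow = sum(1 for n in needles if n in haystack_list)

# Fast: each `in` is a single hash lookup.
hits_fast = sum(1 for n in needles if n in haystack_set)

assert hits_slow == hits_fast
```

If the values carry associated data rather than standing alone, a dictionary keyed on the item gives you the same constant-time lookup.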
As I mentioned in the beginning, this approach might leave you needing more speed. Luckily, there are still several options.
Some bottlenecks, like storage I/O and network I/O, are not going to show up easily with this type of profiling. This approach is a quick way to profile and fix CPU-bound tasks.

Profiling and optimization is a very complicated topic, so my simple approach barely scratches the surface. This is a topic you will definitely want to learn more about if you are interested in becoming a better programmer. Luckily, there are some really great talks on this subject to help you learn more from real experts.