Profiling and optimizing Python code
Premature optimization is the root of all evil.
There are two opposite directions a programmer can take when writing a piece of software: aiming for an elegant software design, or for heavily optimized code. A good design leads to better readability and maintainability, often at the expense of raw performance. Conversely, highly optimized code tends to be harder to read, and can lead to bugs that are hard to fix.
Of those two possible directions, design and code readability matter more at first than premature optimization. That might seem counterintuitive: why bother writing good code if the architecture will need to change later anyway because of poor performance?
First, one needs to begin somewhere, and it is normal for any piece of software to be progressively improved over time. It is much easier to improve well-written code than a mess filled with hideous optimization tricks. Second, the performance may not be that bad; the only way to know is to try with a first, well-written version. It may not even be necessary to optimize anything, if the performance is already good enough. Finally, and most importantly, it is very hard to predict in advance which part of the code will require optimization. That's something I surprisingly discovered only recently, despite many years of programming.
Whenever your code is too slow, you can, at least at first, make some guesses. Maybe that function, or this code snippet, is the bottleneck. Maybe this algorithm has too high a complexity, the hard drive is too slow when writing this file, allocating this amount of memory or binding this socket takes too much time, or spawning this process is the bottleneck. In my experience, whatever your best guess is, you can be pretty confident that you'll be wrong. The vast majority of the time, the actual bottleneck is completely unexpected, sometimes incredibly stupid, sometimes very subtle.
The only way to know for sure is to profile. With the right tools, profiling code can be highly valuable. Whenever your code is becoming too slow, profile and find the bottleneck, the small part of the code that is taking too long. Very often, it is a small portion of the code that is responsible for most of the slow-down (Pareto principle).
I had never taken the time to play with Python's profiling tools until recently, and I want to share here what I've been using lately.
Profiling Python code
The Python standard library contains the cProfile module, which measures the time spent in every Python function when running the code. The pstats module lets you read the profiling results. Third-party profiling libraries include in particular line_profiler, for profiling code line by line, and memory_profiler, for profiling memory usage. All these tools are very powerful and extremely useful when optimizing code, but they might not be very easy to use at first.
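memory_profiler is not covered in more detail below, so here is a rough sketch of how it is typically used, assuming the package is installed (the allocate function and its allocations are just a made-up illustration): decorate the function you are interested in and run the script normally, and a line-by-line memory report is printed.

from memory_profiler import profile

@profile
def allocate():
    a = [0] * (10 ** 6)  # allocate a list of one million integers
    b = a * 10           # allocate a list ten times larger
    del a                # release the first list
    return b

allocate()

The output is a small table giving, for each line of allocate, the total memory usage and the increment caused by that line.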
So, how do you profile Python code? First, prepare a Python script that executes the code you want to profile. Ideally, this code should be deterministic (e.g., use a fixed seed if you use a pseudorandom number generator).
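For instance, with NumPy (which the toy example below uses), a fixed seed can be set at the top of the script; the value itself is arbitrary, it only guarantees that successive profiling runs execute exactly the same computations.

import numpy.random as rdn

rdn.seed(2015)  # any fixed value will do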
Then, use cProfile to execute and profile this script. The cProfile module generates a binary file with all the information related to the profiling session. To convert this file into a human-readable format, you have to use the pstats module. I found the following solution convenient: I create a .bat file with the following lines (this is for Windows):
python -m cProfile -o prof script.py
python dumpprof.py > stats.txt
The first line profiles 'script.py' and saves the result in a 'prof' file. The second line converts the 'prof' binary file into a text file.
The dumpprof.py file contains the following code:
import pstats
pstats.Stats('prof').strip_dirs().sort_stats("cumulative").print_stats()
You can take a look at the Python documentation to find all the options in the pstats module.
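For example, here are a few variations I find useful (the 'prof' file name matches the one generated above, and the 'repeat' restriction is only an illustration):

import pstats

p = pstats.Stats('prof').strip_dirs()

# sort by internal time (time spent in the function itself, excluding sub-calls)
# and print only the 10 most expensive entries
p.sort_stats('time').print_stats(10)

# restrict the output to functions whose name matches a regular expression
p.sort_stats('cumulative').print_stats('repeat')

# show which functions called the matching ones
p.print_callers('repeat')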
When I launch the .bat file, the script is executed, and I get a text file with the profiling results, that is, with the time each function took.
Here is a toy example with the following script:
import numpy as np
import numpy.random as rdn
# uncomment for line_profiler
# @profile
def test():
    a = rdn.randn(100000)
    b = np.repeat(a, 100)
    c = b ** 2
test()
An excerpt of the output of cProfile on my laptop is:
10895 function calls (10706 primitive calls) in 0.472 CPU seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.474 0.474 {execfile}
1 0.025 0.025 0.474 0.474 script.py:1(<module>)
1 0.009 0.009 0.240 0.240 __init__.py:106(<module>)
1 0.074 0.074 0.209 0.209 script.py:5(test)
[...]
1 0.000 0.000 0.126 0.126 fromnumeric.py:300(repeat)
1 0.126 0.126 0.126 0.126 {method 'repeat' of 'numpy.ndarray' objects}
[...]
1 0.009 0.009 0.009 0.009 {method 'randn' of 'mtrand.RandomState' objects}
The test function took 209 ms, 126 of which were spent in the repeat function (allocating and copying large amounts of data takes time). But we do not directly get the time that the square operation took.
For this, we can use line_profiler to profile the code line by line.
After installation (using Christoph Gohlke's website on Windows), you can use the kernprof.py module. I had to copy it from the Python scripts directory (e.g. C:\Python27\Scripts) to my script directory.
Then, add the @profile decorator to every function you want to profile line by line. Finally, use the following command:
python -m kernprof -l -v script.py > statsline.txt
This executes the script and generates a text file with the line profiler output. Here is the output on the toy example:
Wrote profile results to script.py.lprof
Timer unit: 5.13284e-07 s
File: script.py
Function: test at line 4
Total time: 0.205144 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 @profile
5 def test():
6 1 18069 18069.0 4.5 a = rdn.randn(100000)
7 1 241293 241293.0 60.4 b = np.repeat(a, 100)
8 1 140307 140307.0 35.1 c = b ** 2
The output gives, for each line, the percentage of time and the actual time (in units of the timer unit, about 0.5 microseconds here) spent on it. This line-by-line profiling can be extremely valuable when optimizing a complex function.
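Note that the @profile decorator only exists when the script is executed through kernprof, which is why it is commented out in the toy script above when running cProfile. A common workaround (just a convenience trick, not something required by line_profiler) is to fall back to a do-nothing decorator when the name is not defined, so that the same script can also be run with a plain python script.py:

try:
    profile  # kernprof injects this name when profiling
except NameError:
    def profile(func):
        # do-nothing fallback so the decorated script also runs normally
        return func

@profile
def test():
    pass  # body of the function to profile goes here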
Conclusion
That's it for this quick introduction. In conclusion:
- Do not try to optimize your code prematurely: always favor code readability over unnecessary optimization.
- Do not try to optimize unless you really need to: very often, good enough performance is better than over-optimization.
- Always profile your code before optimizing it: you need to know exactly which portion of your code needs to be optimized first.
- When encountering a bottleneck, do not guess where it's hiding: you'll probably be wrong.
- In Python, you can profile your code thanks to the following tools: cProfile, line_profiler and memory_profiler.