Here at HumanGeo we use a lot of Python, and it is tons of fun. Python is a great language for writing beautiful and functional code amazingly fast, and it is most definitely my favorite language to use both privately and professionally. However, even though it is a wonderful language, Python can be painfully slow. Luckily, there are some amazing tools to help profile your code so that you can keep your beautiful code fast.

When I started working here at HumanGeo, I was tasked with taking a program that took many hours to run, finding the bottlenecks, and then doing whatever I could to make it run faster. I used many tools, including cProfilePyCallGraph (source), and even PyPy (an alternate, fast, interpreter for Python), to determine the best ways to optimize the program. I will go through how I used all of these programs, except for PyPy (which I ruled out to maintain interpreter consistency in production), and how they can help even the most seasoned developers find ways to better optimize their code.

Disclaimer: do not prematurely optimize! I'll just leave this here.

Tools

Lets talk about some of the handy tools that you can use to profile Python code.

cProfile

The CPython distribution comes with two profiling tools, profile and cProfile. Both share the same API, and should act the same, however the former has greater runtime overhead, so we shall stick with cProfile for this post.

cProfile is a handy tool for getting a nice greppable profile of your code, and for getting a good idea of where the hot parts of your code are. Lets look at some sample slow code:

--> % cat slow.py
import time

def main():
    sum = 0
    for i in range(10):
    	sum += expensive(i // 2)
    return sum

def expensive(t):
    time.sleep(t)
    return t

if __name__ == '__main__':
    print(main())

Here we are simulating a long running program by calling time.sleep, and pretending that the result matters. Lets profile this and see what we find:

--> % python -m cProfile slow.py
20
         34 function calls in 20.030 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 __future__.py:48(<module>)
        1    0.000    0.000    0.000    0.000 __future__.py:74(_Feature)
        7    0.000    0.000    0.000    0.000 __future__.py:75(__init__)
       10    0.000    0.000   20.027    2.003 slow.py:11(expensive)
        1    0.002    0.002   20.030   20.030 slow.py:2(<module>)
        1    0.000    0.000   20.027   20.027 slow.py:5(main)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {print}
        1    0.000    0.000    0.000    0.000 {range}
       10   20.027    2.003   20.027    2.003 {time.sleep}

Now this is very trivial code, but I find this not as helpful as it could be. The list of calls is sorted alphabetically, which has no importance to us, and I would much rather see the list sorted by number of calls, or by cumulative run time. Luckily, the -s argument exists, and we can sort the list and see the hot parts of our code right now!

--> % python -m cProfile -s calls slow.py
20
         34 function calls in 20.028 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       10    0.000    0.000   20.025    2.003 slow.py:11(expensive)
       10   20.025    2.003   20.025    2.003 {time.sleep}
        7    0.000    0.000    0.000    0.000 __future__.py:75(__init__)
        1    0.000    0.000   20.026   20.026 slow.py:5(main)
        1    0.000    0.000    0.000    0.000 __future__.py:74(_Feature)
        1    0.000    0.000    0.000    0.000 {print}
        1    0.000    0.000    0.000    0.000 __future__.py:48(<module>)
        1    0.003    0.003   20.028   20.028 slow.py:2(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {range}

Ah! Now we see that the hot code is in our expensive function, which ends up being calling time.sleep enough times to cause an annoying slowdown.

The list of valid arguments to the -s parameter can be found in the Python documentation.  Make sure to use the output option, -o, if you want to save these results to a file!

With the basics down, lets look at some other ways that we can find hot code using profiling tools.

PyCallGraph

PyCallGraph can be seen as a visual extension of cProfile, where we can follow the flow of the code with a nice Graphviz image to look through. PyCallGraph is not part of the standard Python installation, and therefore can be simply installed with:

-> % pip install pycallgraph

We can run this graphical application with the following command:

-> % pycallgraph graphviz -- python slow.py
A pycallgraph.png file should be created in the directory where you ran the script, and it should give you some familiar results (if you have already run cProfile). The numbers should be the same as the data we got from cProfile, however, the benefit of PyCallGraph is in its ability to show the relationships of functions being called.

Let us look at what that graph looks like

This is so handy! It shows the flow of the program, and nicely notifies us of each function, module, and file that the program runs through, along with runtime and number of calls. Running this in a big application generates a large image, but with the coloring, it is quite easy to find the code that matters. Here is a graph from the PyCallGraph documentation, showing the flow of code involving complex regular expression calls:

source code of graph

What can we do with this information?

Once we determine the cause of the slow code, we can then properly choose a course of action to speed up our code. Lets talk about some possible solutions to slow code, given an issue.

I/O

If you find your code is heavily input/output dependant, including sending many web requests, then you may be able to solve this problem by using Python's standard threading module. Non I/O related threading is not suited for Python, due to  CPython's GIL, which precludes it from using more than one core at a time for code centric tasks.

Regex

You know what they say, once you decide to use regular expressions to fix a problem, you now have two problems. Regular expressions are hard to get right, and hard to maintain. I could write a whole separate blog post on this (and I will not, regexes are hard, and there are much better posts than one that I could write), but I will give a few quick tips:

  1. Avoid .*, greedy catchalls are slow, and using character classes as much as possible can help with this
  2. Do not use regex! many regexes can be solved with simple string methods anyways, such as str.startswith, and str.endswith. Check out the str documentation for tons of great info.
  3. Use re.VERBOSE! Python's Regex engine is great and super helpful, use it!

Thats all I will say on Regex, there are some great posts all over the internet if you want more information.

Python Code

In the case of the code I was profiling, we were running a Python function tens of thousands of times in order to stem English words. The best part about finding that this was the culprit was that this kind of operation is easily cachable. We were able to save the results of this function, and in the end, the code ran 10 times as fast as it did before. Creating a cache in Python is super easy:

from functools import wraps
def memoize(f):
    cache = {}
    @wraps(f)
    def inner(arg):
        if arg not in cache:
            cache[arg] = f(arg)
        return cache[arg]
    return inner

This technique is called memoization, and is shown being implemented as a decorator, which can be easily applied to Python functions as so:

import time
@memoize
def slow(you):
    time.sleep(3)
    print("Hello after 3 seconds, {}!".format(you))
    return 3

Now if we run this function multiple times, then the result will only be computed once.

>>> slow("Davis")
Hello after 3 seconds, Davis!
3
>>> slow("Davis")
3
>>> slow("Visitor")
Hello after 3 seconds, Visitor!
3
>>> slow("Visitor")
3

This was a great speedup for the project, and the code runs without a hitch.

disclaimer: make sure this is only used for pure functions! If memoization is used for functions with side effects such as I/O, then caching can have unintended results.

Other cases

If your code is not readily memoizable, your algorithm is not something crazy like O(n!), or your profile is 'flat', as in there are no obvious hot sections of code, then maybe you should look into another runtime or language. PyPy is a great option, along with possibly writing a C-extension for the meat of your algorithm. Luckily, what I was working on did not come to this, but the option is there if it is needed.

Conclusion

Profiling code can help you understand the flow of the project, where the hot code is, and what you can do as a developer to speed it up. Python profiling tools are great, super easy to use, and in depth enough to help you get to the root of the cause fast. Python is not meant to be a fast language, but that does not mean that you should be writing slow code! Take charge of your algorithms, do not forget to profile, and never prematurely optimize.

We’re hiring!  If you’re interested in geospatial, big data, social media analytics, Amazon Web Services (AWS), visualization, and/or the latest UI and server technologies, drop us an e-mail at info@thehumangeo.com.