PyPy: Faster Than an Unladen Swallow (Part 2)

In our last article, we discussed some of the alternative runtimes available for Python and the potential benefits of using them. Of the many we mentioned, we focused mostly on Pyston and PyPy.

But there’s a question we left unanswered: Why did Dropbox pull the plug on Pyston after less than three years and despite substantial performance improvements?

Why Speeding Up Python is Hard

Before we delve into that question, let’s back up a bit and gain a little more context. There are a lot of challenges (both technical and non-technical) to implementing a performance-oriented alternative Python runtime:

Compatibility: Achieving and maintaining compatibility with CPython’s syntax is only a first step. Any successful alternative Python implementation must also be compatible with at least a critical subset of the standard and third-party Python modules. Other compatibility issues crop up as well, and PyPy’s compatibility page provides examples of some of them.

Porting to other architectures: Implementing a new Python runtime requires more direct interfacing with the underlying microprocessor architecture. The more architectures supported, the more work (and money) is required.

Consistency of speedup: Not all Python code will run significantly faster, and the degree of improvement will vary depending on the algorithm and how that algorithm is implemented by the coder. This is why we see large differences in the speedup benchmarks between different tests from both PyPy and Pyston.

Managing tradeoffs: Whether it’s additional syntax for the developer to learn to use effectively, longer and/or less readable code, navigating around some of the compatibility issues, quirks of the new memory management scheme, and so on, there are always practical tradeoffs which need to be made, either in the alternative runtime itself or by the coders who use it.

That’s because not everything can be implemented perfectly all the time, due to...

Scarce development resources: The number of developers contributing to a project, their time, and the project’s funding are all finite. This is true not only for open-source efforts like PyPy (mostly a volunteer effort with some outside funding), but for company-backed projects like Pyston.

Why Dropbox Dropped Pyston

Given this context, the reasons Dropbox gave for ending support of Pyston make a lot of sense.

Since Dropbox has relied heavily on Python in the past, investing in Pyston made business sense: it would have led to Dropbox’s own products requiring fewer CPU resources for the same workload, and thus to decreased operational costs.

Recently, however, Dropbox has increasingly been using other languages like Go for high-performance code, so the company is no longer as reliant on Python as it once was. Add to that the unexpected time investment needed to fix Pyston’s memory management and compatibility issues.

In short, launching Pyston was a business decision, and when the business incentive was no longer as compelling, supporting the project no longer made much business sense.

With Pyston gone, PyPy stands as the leading high-performance runtime for Python, so we’ll now explain how it speeds up Python code.

What Makes PyPy Faster?

In short, PyPy is a tracing just-in-time (JIT) compiler. Here’s a good summary of what that means, from PyPy’s website:

A JIT like PyPy's works based on the assumption that the only thing worth optimizing are loops that are executed often. Whenever the interpreter enters a loop in the interpreted program, the JIT records what the interpreter does, creating a trace. This trace is optimized, compiled to machine code and executed when the loop is hit with the conditions observed during tracing.

Due to this JIT tracing, PyPy is often able to cut through the layers of programming abstraction which normally cause Python code to slow down, effectively giving developers “abstraction for free” as core PyPy developer Antonio Cuni explains and demonstrates in this EuroPython Conference talk from last year.
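
To make that idea concrete, here is a minimal sketch (our own illustration, not an example from Cuni’s talk) of the kind of abstraction a tracing JIT can make essentially free. The Point class below adds an object allocation and several attribute lookups to every iteration; CPython pays that overhead on each pass, while PyPy’s JIT can largely optimize it away once the hot loop has been traced.

# Hypothetical example: a small abstraction used inside a hot loop.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def dot(self, other):
        return self.x * other.x + self.y * other.y

def accumulate(n):
    total = 0.0
    for i in range(n):
        p = Point(float(i), float(i + 1))   # allocation the JIT can often remove
        total += p.dot(Point(2.0, 3.0))     # attribute lookups traced and inlined
    return total

print(accumulate(1_000_000))

The same file runs unchanged under CPython and PyPy; the difference lies in how much of the per-iteration bookkeeping survives compilation.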

Abstraction without a performance penalty is a great benefit, because abstraction makes it easier for developers to express their ideas in code.

Optimization, on the other hand, forces the developer to state those ideas more clearly so that the computer can carry out the developer’s intent more efficiently. A perfect example of this can be found in this article, which discusses how to speed up large matrix multiplication both through manual optimization (e.g. rewriting the Python code in C, using AVX instructions) and by using PyPy.
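
For illustration, here is a minimal sketch (not the code from that article) of a naive, pure-Python matrix multiplication. The optimization route means restating this logic in C or with vector instructions; the PyPy route means running the plain version largely as-is and letting the tracing JIT compile the hot inner loop.

# Hypothetical example: naive matrix multiplication, written for clarity
# rather than speed. PyPy's JIT targets the innermost loop once it gets hot.
def matmul(a, b):
    n, m, p = len(a), len(b), len(b[0])
    result = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):            # hot inner loop
                result[i][j] += aik * b[k][j]
    return result

a = [[float(i + j) for j in range(100)] for i in range(100)]
b = [[float(i - j) for j in range(100)] for i in range(100)]
print(matmul(a, b)[0][0])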

So how can developers harness PyPy’s potential from their existing code?

Effectively Harnessing PyPy

Fortunately, the PyPy project itself provides advice to guide coders through the process of writing more performant Python code. Only after a developer has properly profiled their code, tightened the loops, and optimized the algorithm to avoid quadratic costs should they attempt PyPy-specific optimizations. One such optimization is loop unrolling, which in certain instances can increase execution speed by a factor of 80. That said, PyPy’s biggest advantage is that it can provide an instant speed boost to almost any Python code without modifying that code first.
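
The sketch below illustrates that order of work under our own, hypothetical function names: profile first with the standard library’s cProfile module to see where the time actually goes, then remove the quadratic cost (here, by replacing list membership tests with a set) before reaching for any runtime-specific tricks.

import cProfile

def common_items_quadratic(a, b):
    # O(len(a) * len(b)): each membership test scans the whole list
    return [x for x in a if x in b]

def common_items_linear(a, b):
    # Converting b to a set makes each membership test O(1) on average
    b_set = set(b)
    return [x for x in a if x in b_set]

a = list(range(5_000))
b = list(range(2_500, 7_500))

cProfile.run("common_items_quadratic(a, b)")  # profiling shows the hot spot
cProfile.run("common_items_linear(a, b)")     # same result, far fewer comparisons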

When speed matters, developers can turn to PyPy to boost the performance of their code. Thanks to its high compatibility and speed advantages, PyPy is a valuable tool in a Python developer’s toolbox.

Copyright © Python People