Software Development is All About the Details

In this article we’ll dive into a couple of recent Python changes and show how they illustrate the constant challenges and constraints faced by developers of large software packages like Python.

Fixes and Refixes for Expat

Last year, we discussed a vulnerability in the XML parser expat that allowed an overflow attack. That vulnerability was fixed in version 2.1.1 of expat, the version used in Python 2.7.12 and 3.4.5. As we explained at the time, the story didn’t end there:

“As of this writing, however the current version of expat is 2.2.0, and one of the four security fixes in this version is an improvement to the ‘insufficient fix’ which expat 2.1.1 implemented. Python developers should expect this re-fix to eventually flow downstream into both Python 2.7 and 3.X.”

That change has landed. Python 3.6.2’s built-in copy of expat was upgraded to 2.2.0 earlier this summer and then, almost immediately, to 2.2.1. Both changes happened while 3.6.2 was still at the release candidate stage.

Why the back-to-back version bumps? Two reasons: a refix and a regression fix.

First, the refix. The original overflow vulnerability in expat (CVE-2015-1283) was fixed in 2.1.1, but that fix was insufficient (resulting in CVE-2016-4472), so expat 2.2.0 implemented a better solution, which also improved the fix for a related bug (CVE-2015-2716).

Now, the regression fix. Version 2.2.0 of expat introduced a new problem caused by a fix for CVE-2016-0718, a bug which allowed nefarious XML to cause a heap overflow, resulting in either an application crash (bad) or possible arbitrary code execution (really bad). This regression was fixed in expat 2.2.1.

Chronologically, these two versions of expat are separated by almost exactly one year. But because Python’s release cycle doesn’t line up with expat’s, Python 3.6.2rc1 (June 17, 2017) shipped expat 2.2.0, while Python 3.6.2rc2 (July 7, 2017) shipped expat 2.2.1. In other words, two release candidates of the same Python version, released less than a month apart, bundled upstream versions that were separated by a full year.
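If you want to check which expat your own interpreter uses, the standard library’s xml.parsers.expat module reports it. The sketch below is a minimal example; has_expat_at_least is a hypothetical helper, and the (2, 2, 1) threshold simply reflects the release discussed above. Note that some Linux distributions build Python against the system expat rather than the bundled copy, so the answer can differ between two installs of the same Python version.

```python
from xml.parsers import expat

# Version string and (major, minor, micro) tuple of the expat library
# that this interpreter's pyexpat module was built against.
print(expat.EXPAT_VERSION)
print(expat.version_info)

def has_expat_at_least(minimum: tuple) -> bool:
    """Hypothetical helper: True if this interpreter's expat is at least `minimum`."""
    return expat.version_info >= minimum

# 2.2.1 is the expat release that carried the regression fix discussed above.
print(has_expat_at_least((2, 2, 1)))
```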

Python on Windows Momentarily Lost its Random 

While Python 3.6 was in its release candidate phase, it briefly failed to seed Random() well enough on Windows. Interestingly, the root cause was a combination of the Windows system clock’s resolution (about 15 ms) and a recent patch to os.urandom() aimed at Linux. Because the process ID and the system time were used in some cases to seed the generator, a program that created two or more Random() instances within the same 15 ms window could seed them identically, so they produced the same sequence of “random” numbers.
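To make the failure mode concrete, here is a minimal sketch, assuming a seed built only from the process ID and a clock that ticks every 15 ms. The helper names weak_seed and strong_seed, and the way the seed combines its inputs, are hypothetical; this is not CPython’s actual seeding code, only an illustration of why a coarse clock produces repeated seeds.

```python
import os
import random

CLOCK_TICK_MS = 15  # assumed Windows system clock resolution, per the text above

def weak_seed(now_ms: int, pid: int) -> int:
    """Hypothetical seed built only from the process ID and a coarse clock.

    Illustration only; this is not CPython's actual seeding code.
    """
    tick = now_ms // CLOCK_TICK_MS  # every instant inside one tick maps to the same value
    return (pid << 32) ^ tick

def strong_seed() -> int:
    """Seed drawn from the operating system's randomness source."""
    return int.from_bytes(os.urandom(16), "big")

pid = os.getpid()

# Two generators created 10 ms apart fall inside the same 15 ms tick
# (30000 // 15 == 30010 // 15), so they receive the same seed.
gen_a = random.Random(weak_seed(30_000, pid))
gen_b = random.Random(weak_seed(30_010, pid))
weak_a = [gen_a.random() for _ in range(3)]
weak_b = [gen_b.random() for _ in range(3)]
print(weak_a == weak_b)  # True: identical seeds give identical sequences

# Seeding from os.urandom() does not depend on the clock at all.
gen_c = random.Random(strong_seed())
gen_d = random.Random(strong_seed())
strong_a = [gen_c.random() for _ in range(3)]
strong_b = [gen_d.random() for _ in range(3)]
print(strong_a == strong_b)
```

Passing times as integer milliseconds keeps the tick arithmetic exact: any two calls that land in the same 15 ms tick get the same seed, while the OS-seeded generators are independent of timing.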

Small Issues, Big Lessons

Neither of the two changes above was a headline feature or a headline fix in a recent Python release, so why did we devote this month’s post to them?

Sometimes a closer look at how the sausage is made makes you appreciate its flavor even more. These issues illustrate important insights about Python and software development in general.

Did you have trouble following along with the CVE numbers, Python versions and expat version numbers—and which of those included or superseded the others?

We included all those dates and details to show that Python, like many other widely used software packages, is not one monolithic piece of software. Instead, it consists of many individual pieces, including third-party libraries like expat. Python only appears monolithic and stable thanks to the massive investment of time, effort, and expertise by Python core developers (many of them volunteers). Good-quality software requires investing in integration and testing.

Expat is only one of the pieces that make up Python, and each of those pieces has its own release cycle. Whenever a version of Python is released (whether a release candidate or an official release), someone must decide which upstream updates to include and which to skip and defer to a later release. The factors to weigh include the severity of security bugs, stability (use an older, more stable version of an upstream library, or a newer, less tested one with more features?), and the desire to improve the software by adding new features.

Another lesson: in the rush to plug security holes, the first repair is often not the best solution, even though the quick fix may be the best decision at the time. Sometimes, as we saw with expat 2.2.0 and 2.2.1, the bugfixes introduce new bugs.

One of Python’s key features is that the language is cross-platform: for the most part, the same code runs on Windows, macOS, or Linux without modification. This is because Python (unlike C, C++ or assembly) was designed to shift the burden of dealing with OS differences from the developers who write Python code to the developers who write Python itself. The Random() bug on Windows shows why Python users have it easy: the core Python developers constantly have to consider and address technical minutiae such as operating system clock resolution and finding high-quality sources of randomness.
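As a small illustration of the platform details Python absorbs for you, the standard library reports which clock implementation it uses on the current OS, and it offers one portable call for OS-provided randomness. This is a sketch for exploring your own machine; the output varies by platform and Python version, so none is shown here.

```python
import secrets
import time

# The same calls work on Windows, macOS and Linux; CPython picks the
# platform-specific clock and reports its implementation and resolution.
for name in ("time", "monotonic", "perf_counter"):
    info = time.get_clock_info(name)
    print(f"{name:>12}: {info.implementation}, resolution={info.resolution} s")

# secrets draws on os.urandom(), which hides the OS-specific source of
# high-quality randomness behind a single cross-platform function.
token = secrets.token_hex(16)
print(len(token))  # 32 hex characters for 16 random bytes
```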

Python continues to be a robust, high-quality programming language capable of powering AI software, making our data conveniently accessible in the cloud, and harnessing the Internet of Things. That quality and robustness result from a large, active community of developers working behind the scenes, paying attention to the small things so that we can use Python to build the big things.

Copyright © Python People