Python: Lord of Strings (Part 1)

Developers often work with text strings. Strings can be handy by themselves, but are much more useful when we’re able to modify them based on the results of input or calculations. Think about a web-based email system that tells you the last time you logged in.

Here’s a more detailed example: Imagine you are writing some Python code for an Internet of Things (IoT) project in your house and you need to process data from sensors connected to an Arduino or Raspberry Pi. You might, for example, have three digital sensors: one each for humidity, ambient light and air temperature. These values are sent over a web or serial interface and thus are reported as strings. From previous experimentation, you’ve determined the average nominal values for each of these sensors, and recorded those numbers in a dictionary. You’ve also written a for loop to iterate over each sensor and compare the reported value (contained in a list) to the nominal value, with the relative error (as a fraction of the nominal value) printed out to the console. The index of each item in the reported value list corresponds to the key in the dictionary containing the nominal values.

Being a good programmer, you’re interested in how your error reporting code handles reported values that are too low, too high and way, way off. After a few run-and-fix cycles, the test program looks like this:

# Nominal (expected) reading for each sensor, keyed by sensor index
nominal = {0: 200, 1: 3000, 2: 50000}
# Reported readings arrive as strings, in the same sensor order
reported = ["100", "5000", "15"]
for sensor in range(3):
    reported_num = float(reported[sensor])
    error = (reported_num - nominal[sensor]) / nominal[sensor]
    error_str = str(error)
    sensor_str = str(sensor)
    print("Sensor " + sensor_str + " reads an error of " + error_str)

Since the reported values are originally strings, we must first convert them via float() to numerical values before we can perform calculations with them. Almost right after that, we need to convert the resulting numerical error values back into strings, because the + operator can only concatenate a string with another string and the interpreter won’t implicitly convert a number to str.
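A minimal sketch of where that conversion is actually forced (the value and variable names here are illustrative, not part of the program above): print() on its own will accept a float, while + between a str and a float raises TypeError.

```python
error = 0.25  # illustrative value, not taken from the sensor data

# print() converts each of its arguments to text by itself, so this works:
print("Error:", error)

# Concatenation with + performs no such conversion and raises TypeError:
try:
    message = "Error: " + error
except TypeError as exc:
    message = "TypeError: " + str(exc)

print(message)
```

So the explicit str() calls are there to satisfy +, not print().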

The above code is a great example of what I like to call the “stack and pack” approach. The “stack” consists of all the preparatory lines of code needed for converting our strings into numbers and then back into the substrings which we finally “pack” together with “+” into the final text string.

Although code written this way can be easy to follow, there are a number of problems with this example program:

Data Duplication: The explicit type conversions create two intermediate variables (error_str and sensor_str) which hold exactly the same information as strings that the original variables already held as numbers. This results in unnecessary memory use.

Inefficient “Pack”aging: Remember that each + generally builds a brand-new string and copies both operands into it, so building longer strings by concatenating many shorter strings is neither fast nor memory efficient.

Verbosity: Due to extra type conversions, this program is longer than it really needs to be. Three of the five statements in the body of that for loop are type conversions.

Unconstrained Output: In Python 3, the / operator always returns a float, and a float converted with str() can run to many decimal places. This is obvious from the program’s output:

Sensor 0 reads an error of -0.5
Sensor 1 reads an error of 0.6666666666666666
Sensor 2 reads an error of -0.9997

It would be really nice to limit the number of decimal places with rounding. That second sensor was only 12-bit precision anyway.
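One stopgap, sketched here for the second sensor only and with a precision of two decimal places chosen purely for illustration, is the built-in round(). It trims the number itself before the conversion to str.

```python
nominal = 3000
reported = float("5000")

error = (reported - nominal) / nominal
rounded = round(error, 2)  # two decimal places is an arbitrary choice

print(str(error))    # 0.6666666666666666
print(str(rounded))  # 0.67
```

Note that round() changes the value rather than its presentation: an error of -0.5 would still print as -0.5, not -0.50, and the str() conversions and + concatenation all remain in place.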

As long as this program remains small and runs on a desktop or laptop, all these problems can be safely ignored…but this is just the beginning of your project, and this program will get longer over time. As any professional coder can attest, bad programming habits which can be ignored in small programs often grow into big problems in larger projects.

We could fix all the above problems if Python made it possible to:

Use a single “template” string that holds all the unchanging parts along with placeholders for the changing parts, so that we could avoid using +.

Perform the conversion to string only at the moment we need it, instead of wasting memory by creating (and holding onto) a string copy of data we already have.

Control the formatting of our substituted substrings, to avoid excessive decimal places, for example.

…and do all the above at the same time in a way that is concise and flexible.
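As a preview of what such a feature can look like, here is a sketch using a formatted string literal (an f-string, which requires Python 3.6 or later). The two-decimal format and the lines list are choices made for this illustration, not part of the original program.

```python
nominal = {0: 200, 1: 3000, 2: 50000}
reported = ["100", "5000", "15"]

lines = []
for sensor, raw in enumerate(reported):
    error = (float(raw) - nominal[sensor]) / nominal[sensor]
    # One template: fixed text plus placeholders; ".2f" limits the decimals
    lines.append(f"Sensor {sensor} reads an error of {error:.2f}")

print("\n".join(lines))
# Sensor 0 reads an error of -0.50
# Sensor 1 reads an error of 0.67
# Sensor 2 reads an error of -1.00
```

There are no sensor_str or error_str variables and no + here; the conversion to text happens inside the placeholders.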

The features in our wish list actually fall under the umbrella term of string formatting. Before the debut of formatted string literals (f-strings) in Python 3.6, none of the three existing ways to format strings in Python fulfilled all of these criteria.

In Part 2, we’ll see why.

Copyright © Python People