What Python Lists Are (and Aren’t) Good For

One of Python’s basic data types is the list. Lists are a lot like arrays in C or C++ in that a Python list is a sequential collection of data. Unlike C/C++ arrays, however, Python lists aren’t restricted to holding the same type of data in every element. A single Python list can contain a floating point number, a text string, a boolean value and even another list, all at the same time.
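As a small illustration of that flexibility, here is a minimal sketch (the variable names are made up for this example) of one list holding four different types at once:

```python
# A single list holding a float, a string, a boolean, and a nested list.
mixed = [3.14, "pie", True, [1, 2, 3]]

# Collect the type name of each element to show that they really differ.
type_names = [type(item).__name__ for item in mixed]
print(type_names)  # ['float', 'str', 'bool', 'list']
```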

There is a more fundamental difference between Python lists and C arrays: their relationship to text strings. In C, a text string is merely an array of characters.  In Python,  strings are the fundamental text data type (and thus there is no char data type).  This means that a single letter “P” is simply a string with a length of 1.
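A quick way to see this for yourself (a sketch, with names chosen just for this example) is to check the type and length of a single letter:

```python
letter = "P"

# A single character is an ordinary str of length 1; Python has no char type.
print(type(letter).__name__)  # str
print(len(letter))            # 1

# Indexing into a string hands back another string, not a separate character type.
first = "Python"[0]
print(first == letter)        # True
```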

Even though lists and strings are different basic data types in Python, developers can access elements in a list (and sub-strings in a string) with both index and slice notation. This makes sense, as both lists and strings are sequences. This convenient and logical overlap between strings and lists is one of the many reasons why Python is often called a “high-level” language. Developers can easily use lists and strings in their code despite the “low-level” distinction between the two: the documentation describes lists as “mutable sequences, typically used to store collections of homogeneous items”, while strings are immutable sequences of Unicode code points.
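Here is a short sketch of that shared notation, using an example list and string invented for this post. It also shows the one place where the two types part ways: assigning to an index.

```python
languages = ["C", "C++", "Python", "Rust"]
word = "Python"

# Indexing works the same way on both sequence types.
print(languages[2])    # Python
print(word[2])         # t

# So does slicing: a list slice is a list, a string slice is a string.
print(languages[1:3])  # ['C++', 'Python']
print(word[1:3])       # yt

# Lists are mutable, so assigning to an index is allowed...
languages[0] = "Go"

# ...but strings are immutable, so the same assignment raises TypeError.
try:
    word[0] = "J"
    string_is_immutable = False
except TypeError:
    string_is_immutable = True
print(string_is_immutable)  # True
```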

So now that we know what lists are, what are they good for?

Since lists are sequences, we can use a loop to iterate over them, performing some action (or not) on each element depending on the value it contains. Since lists are mutable (“modifiable in place”), we could even delete elements from the list or insert new ones, changing its size while we are looping. We could, but we shouldn’t. In fact, Python’s official tutorial has a specific warning against doing exactly that.
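The two usual ways around the problem are to build a new list or to loop over a copy. A minimal sketch of both, using made-up sample data:

```python
readings = [4, -1, 7, -3, 10]

# Safer option 1: build a new list and leave the original untouched.
positives = [value for value in readings if value >= 0]

# Safer option 2: iterate over a copy, so removing items from the
# original list cannot disturb the loop.
for value in readings.copy():
    if value < 0:
        readings.remove(value)

print(positives)  # [4, 7, 10]
print(readings)   # [4, 7, 10]
```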

Astute readers of this blog will notice that the “safer” approach recommended there is exactly the one taken by the log condensing program in one of our earlier posts. Iterating over a list produced by the str.split() method and storing a modified version in a new “output” list is a very powerful combination when you need to condense, analyze or otherwise process text.
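The pattern looks roughly like this. This is only a sketch of the general idea, not the program from the earlier post: the log text and the “keep only ERROR lines” rule are invented for illustration.

```python
# Hypothetical log text; real input would normally come from a file.
log_text = """INFO service started
ERROR disk full
INFO heartbeat
ERROR connection lost"""

# Iterate over the list produced by str.split() and collect the lines
# we care about in a separate output list.
condensed = []
for line in log_text.split("\n"):
    if line.startswith("ERROR"):
        condensed.append(line)

print("\n".join(condensed))
```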

Lists are also very good for implementing a stack. The list methods append() and pop() make this easy. Since elements are only added to and removed from the end of the list, none of the other elements change their place. This means using a list as a stack is very fast.
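A minimal sketch of a list used as a stack (the page names are just example data):

```python
stack = []

# Push items onto the top of the stack (the end of the list).
stack.append("page1")
stack.append("page2")
stack.append("page3")

# Pop returns the most recently added item first: last in, first out.
top = stack.pop()
print(top)    # page3
print(stack)  # ['page1', 'page2']
```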

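Lists are a great tool for many string manipulation applications, since it is generally faster (and more memory efficient) to str.join() a list of sub-strings than it is to use + to concatenate them one at a time:

substring_list = ["M", "A", "S", "H"]
print("*".join(substring_list))

These two lines output the text string M*A*S*H. With only four short pieces the savings are negligible, but when a string is built from many pieces, join() avoids creating a new intermediate string at every step, which is what repeated + does:

print("M" + "*" + "A" + "*" + "S" + "*" + "H")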
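The difference matters most when the pieces are produced in a loop. The sketch below (with arbitrary example data) builds the same string both ways; the results are identical, but the join() version collects the pieces in a list and assembles the final string in one step.

```python
# Collect the pieces in a list, then join them once at the end.
pieces = []
for number in range(5):
    pieces.append(str(number))
joined = "-".join(pieces)

# The same result built with repeated concatenation. Each += may create
# a brand new string, which adds up when there are many pieces.
concatenated = ""
for number in range(5):
    if concatenated:
        concatenated += "-"
    concatenated += str(number)

print(joined)                  # 0-1-2-3-4
print(joined == concatenated)  # True
```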

Lists are really useful for easy multiprocessing using “data parallelism” via Pool():

from multiprocessing import Pool

def f(x):
    return x*x

num_list = [1, 2, 3]

if __name__ == '__main__':
    with Pool(2) as p:
        print(p.map(f, num_list))

The above example code is straight from the official Python documentation with one minor twist: I made the number of worker processes in the Pool() smaller than the number of elements in the list. I wanted to show that Pool() handles assigning data to the worker processes for you, handing each worker new input as it finishes its previous input, until all the data is processed. Compare the above multiprocessing code to the equivalent iterative approach:

def f(x):
    return x*x

num_list = [1, 2, 3]

for p in num_list:
    print(f(p))

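Compared to this, the implementation using Pool() adds parallel processing with very little extra code length and complexity. For a function as trivial as f(), the cost of starting the worker processes outweighs any gain, but for CPU-heavy work on long lists the performance boost can be substantial.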

What Aren’t Lists Good For?

Lists are a bad match for situations where you are generating or reading in a lot of elements, only to iterate over them one by one anyway.  Recall our previous discussion of iterables vs. lists when it comes to memory usage.
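As a rough illustration (a sketch with an arbitrary element count), compare a list comprehension with the equivalent generator expression. Note that sys.getsizeof() only measures the container itself, not the integers it refers to, so the real gap is even larger than the printed numbers suggest.

```python
import sys

# The list stores a reference to all 100,000 results at once.
squares_list = [n * n for n in range(100_000)]

# The generator produces one value at a time and keeps only its own state.
squares_gen = (n * n for n in range(100_000))

print(sys.getsizeof(squares_list))  # hundreds of kilobytes
print(sys.getsizeof(squares_gen))   # a few hundred bytes at most

# If we only need to walk the values once, the generator does the job.
total = sum(squares_gen)
print(total == sum(squares_list))   # True
```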

Lists are bad for FIFO (First-In-First-Out) queues because whenever an element is added to or removed from the beginning of the list, all the other elements have to shift over by one. These extra shifting operations make list-based FIFO queues slower than equivalent list-based stacks.
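For queues, the Python tutorial points to collections.deque instead, which was designed for fast appends and pops at both ends. A minimal sketch, with example customer names made up for this post:

```python
from collections import deque

queue = deque()

# New arrivals join at the back of the queue.
queue.append("alice")
queue.append("bob")
queue.append("carol")

# popleft() serves whoever arrived first, without shifting the rest.
served = queue.popleft()
print(served)       # alice
print(list(queue))  # ['bob', 'carol']
```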

Python lists can be fast and powerful abstractions when used appropriately.  Due to the thoughtful design of this basic Python data type, programmers can more easily tackle development tasks both mundane and massive.

Copyright © Python People