How to handle Sequences the Pythonic way?
Advanced tips for better sequence manipulation
Sequences and Collections are an integral part of any programming language, Python particularly handles them uniformly, ie, any type of sequence from Strings and XML elements to arrays, lists and Tuples are handled in a uniform way.
Understanding the variety of these sequences will spare us the reinvention of the wheel as the existing common sequence interface allows us to leverage and support any new sequence types.
The built-in sequences and how to group them?
There are 2 types of sequences:
The container sequences: these hold items of different types, including nested containers. Like list, tuple, and double-ended queues.
The flat sequences: these hold items of simple types. Like str and byte.
A container sequence holds references to the objects it contains, which may be of any type, while a flat sequence stores the value of its contents in its own memory space, not as distinct Python objects.
Another way to group sequences, and it’s Mutability VS Immutability:
Mutable sequences: sequences that can change their values, for example, list, bytearray, array.array, and collections.deque.
Immutable sequences: sequences that can not change their values, for example, tuple, str, and bytes.
ListComps and GenExps
List Comprehensions(ListComps) and Generator expressions(GenExps) are Python's solutions to build sequences in a readable and clear way.
The first deals mainly with Lists, the latter with any other type of sequences.
List Comprehensions
ListComp is a very concise way to create a list from an existing list, sometimes by operating on the existing items, using a simpler cleaner more compact syntax, all of this without having to deal with Python lambda.
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlistUsingForLoop = []
newlistUsingListComp = []
# using traditional for loop
for x in fruits:
if "a" in x:
newlistUsingForLoop.append(x)
print(newlistUsingForLoop) # ['apple', 'banana', 'mango']
# using ListComp
newlistUsingListComp = [x for x in fruits if "a" in x]
print(newlistUsingListComp) # ['apple', 'banana', 'mango']
# we can notice how easily readable the list comprehension version
Generator Expressions
A generator expression(GenExps) is an expression that returns a generator object, ie, a function that contains a yield statement and returns a generator object.
They use the same syntax as ListComps, but are enclosed in parentheses rather than brackets.
# create the generator object
squares_generator = (i * i for i in range(5))
# iterate over the generator and print the values
for i in squares_generator:
print(i)
# this will output the square of numbers from 0 to 4
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
for tshirt in (f'{c} {s}' for c in colors for s in sizes):
print(tshirt)
# black S
# black M
# black L
# white S
# white M
# white L
Tuples are Not Just Immutable Lists
Python presents Tuples as Immutable Lists, only this does not do them justice as they can be used as immutable lists and also as records with no field names.
Tuples as records
Tuples can serve as temporary records with no field names, this can be done only if the number of items is fixed and the order of items is respected as it is important.
coordinates = [(33.9425, -118.408056), (31.9425, -178.408056)]
for lat, _ in coordinates:
print(f 'cordinate latitude: {lat}')
# this will print the latitude each time ignoring the longitude value
There are two ways of creating tuples with named fields but in this instance, they will be regarded as data classes and it is not what we want to discuss here.
Tuples as immutable lists
Tuples are highly used in Python standard library as they are lists that do not change in size which brings clarity and performance optimization to the code.
a = (10, 'alpha', [1, 2])
b = (10, 'alpha', [1, 2])
print(a == b)
# True
b[-1].append(99
print(a == b)
# False
print(b)
# (10, 'alpha', [1, 2, 99])
There is one caveat to mention, as portrayed above in the code example, only the references contained in the tuple are immutable, the objects held in the references can change their values.
Changing the value of an item in a tuple can lead to serious bugs as tuples are hashable, it is better to use a different data structure for your specific use case.
How to Unpack and Pattern Match sequences?
Unpacking Sequences and Iterables
Unpacking is an operation that consists of assigning an iterable of values to a list of variables in a single assignment statement, it avoids unnecessary and error-prone use of indexes to extract elements from sequences.
coordinates = (33.9425, -118.408056)
latitude, longitude = coordinates # unpacking
print(latitude)
# 33.9425
print(longitude)
# -118.408056
We can use the excess for tuples and the excess * for dictionaries when unpacking too.
The target of unpacking can use nesting, ie, (a, b, (c, d)). Python will do the right thing if the value has the same nesting structure.
# Tuple Example
fruits = ("apple", "mango", "papaya", "pineapple", "cherry")
(green, *tropic, red) = fruits
print(green)
# apple
print(tropic)
# ['mango', 'papaya', 'pineapple']
print(red)
# cherry
# Dictionary Example
def myFish(**fish):
for name, value in fish.items():
print(f'I have {value} {name}')
fish = {
'guppies': 2,
'zebras' : 5,
'bettas': 10
}
myFish(**fish)
# I have 2 guppies
# I have 5 zebras
# I have 10 bettas
# Nested Unpacking Example
metro_areas = [
('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
('São Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]
print(f'{"":15} | {"latitude":>9} | {"longitude":>9}')
for name, _, _, (lat, lon) in metro_areas:
if lon <= 0:
print(f'{name:15} | {lat:9.4f} | {lon:9.4f}')
# | latitude | longitude
# Mexico City | 19.4333 | -99.1333
# New York-Newark | 40.8086 | -74.0204
# São Paulo | -23.5478 | -46.6358
Sequence Pattern Matching
Pattern matching is done with the match/case statement. It is very similar to the if-elif-else statement only cleaner more readable and wields the power of Destructuring.
Destructuring is a more advanced form of unpacking, as it allows writing sequence patterns as tuples or lists or any combination of both.
Here’s a nice example I found on the GUICommits website that could simplify pattern matching with sequences.
baskets = [
["apple", "pear", "banana"],
["chocolate", "strawberry"],
["chocolate", "banana"],
["chocolate", "pineapple"],
["apple", "pear", "banana", "chocolate"],
]
match basket:
# Matches any 3 items
case [i1, i2, i3]:
print(f"Wow, your basket is full with: '{i1}', '{i2}' and '{i3}'")
# Matches >= 4 items
case [_, _, _, *_] as basket_items:
print(f"Wow, your basket has so many items: {len(basket_items)}")
# 2 items. First should be chocolate, second should be strawberry or banana
case ["chocolate", "strawberry" | "banana"]:
print("This is a superb combination")
# Any amount of items starting with chocolate
case ["chocolate", *_]:
print("I don't know what you plan but it looks delicious")
# If nothing matched before
case _:
print("Don't be cheap, buy something else")
One thing to point out, it’s that the *_ matches any number of items, without binding them to a variable. Using * extra instead of *_ would bind the items to extra as a list with 0 or more items.
Time to Slice them Up
A common feature of list, tuple, str, and all sequence types in Python is the support of slicing operations.
We slice a list like this: seq[start, stop, step], step is the number of items to skip. To evaluate the expression seq[start:stop:step], Python calls The Special Method seq.getitem(slice(start, stop, step)).
# On a list
l = [10, 20, 30, 40, 50, 60]
l[:2] # [10, 20]
l[2:] # [30, 40, 50, 60]
l[:3] # [10, 20, 30]
# On a str
s = 'bicycle'
s[::3] # 'bye'
s[::-1] # 'elcycib'
s[::-2] # 'eccb'
# Add, Mul, Del, assign operations on slices
l = list(range(10)) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
l[2:5] = [20, 30] # [0, 1, 20, 30, 5, 6, 7, 8, 9]
del l[5:7] # [0, 1, 20, 30, 5, 8, 9]
print(5 * 'abcd') # 'abcdabcdabcdabcdabcd'
What's next?
There are different uses for different types of sequences, and depending on your use case, there are better options. Arrays, Memory views and Double ended queues are prime examples of that.
Memory Views
The built-in memoryview class is a shared-memory sequence type that lets you handle slices of arrays without copying bytes. Using notation similar to the array module, the memoryview.cast method lets you change the way multiple bytes are read or written as units without moving bits around.
memoryview.cast returns yet another memoryview object, always sharing the same memory.
octets = array('B', range(6)) # array of 6 bytes (typecode 'B')
m1 = memoryview(octets)
m1.tolist() # [0, 1, 2, 3, 4, 5]
m2 = m1.cast('B', [2, 3])
m2.tolist() # [[0, 1, 2], [3, 4, 5]]
m3 = m1.cast('B', [3, 2])
m3.tolist() # [[0, 1], [2, 3], [4, 5]]
m2[1,1] = 22
m3[1,1] = 33
print(octets) # array('B', [0, 1, 2, 33, 22, 5])
You can find more about Specialised sequence types on my Medium blog where I get into more detail about Sequences.
Further reading
The Full article on my Blog
“Data Structures” chapter of the Python Cookbook, 3rd edition
Extended Iterable Unpacking is the canonical source to read about the
uses of * extra.