6 Practical Consequences of Set Theory in Python

6 Practical Consequences of Set Theory in Python

A Guide to Implement and Use Sets efficiently

Python has had sets for a while, but they aren't used as much as lists. Nevertheless, they can be very beneficial in certain cases.

What Are Sets?

A set is a collection of unique objects. A basic use case is removing duplication.

test_set = ['spam', 'spam', 'eggs', 'spam', 'bacon', 'eggs']
print(set(test_set))
# {'eggs', 'spam', 'bacon'}
print(list(set(l)))
#['eggs', 'spam', 'bacon']

There are two types of sets in Python:

  • The mutable set Type.

  • The immutable sibling frozenset Type.

The set Type itself is not hashable however, the set elements must be hashable to enforce uniqueness.

The frozenset Type is hashable since it's immutable by default.

In addition to this, set Type implements many operations as Infix Operators, for example, given two sets a and b:

  • a | b returns their union.

  • a & b computes the intersection.

  • a - b the difference.

  • a ^ b the symmetric difference.

# Defining the two sets
first_set = {1, 5, 7, 4, 5}
second_set = {4, 5, 6, 7, 8}
# Creating the intersection of the two sets
intersect_set = first_set & second_set
print(intersect_set) # Output: {4, 5}
# Creating the union of the two sets
union_set = first_set | second_set
print(union_set) # Output: {1, 2, 3, 4, 5, 6, 7, 8}
# Creating the difference of the two sets
diff_set = first_set - second_set
print(diff_set) # Output: {1, 2, 3}

How do I create a Set?

Python provides different ways to create or instantiate set objects, the first is through Literal Syntax and the second is Set Comprehension, and this comes as no surprise since a set is a Python sequence.

Set Literals

The standard string representation of sets always uses the {…} notation.

The syntax is quite simple with the one caveat, there is no way to create an empty Set, so we must call the constructor directly.

not_an_empty_set = {1, 5, 7, 4, 5} # Set Literal
empty_sest = set() # Empty Set

The syntax {1, 2, 3} is quicker and easier to read than writing out set([1, 2, 3]). It's slower to evaluate the latter form since Python has to check the set name, build a list, and afterwards pass it to the constructor.

Set Comprehension

SetComp is similar to listComp and dictComp, it’s the creation of a set based on existing iterables.

new_set = {s for s in [1, 2, 1, 0]}
print(new_set) 
# set([0, 1, 2])

Practical Consequences of Using Sets

The set Type and frozenset Type are implemented with a hash table, and this can have certain effects:

  • Set elements must be hashable objects. They must implement proper "hash" and "eq" methods.

  • Membership testing is very efficient. A set may have millions of elements, but an element can be located directly by computing its hash code and deriving an index offset.

  • Sets have a significant memory overhead, sometimes a low-level array would be more compact and efficient.

  • If two elements are different but have the same hash code, their position depends on which element is added first, so basically, element ordering depends on the insertion order.

  • Adding elements to a set may change the order of existing elements. That’s because the algorithm becomes less efficient if the hash table is more than two-thirds full.

  • Dict views implement similar operations to set operations, using set operators with views will save a lot of loops and ifs when inspecting the contents of dictionaries in your code.

Further Reading

Did you find this article valuable?

Support Algoryst's Corner by becoming a sponsor. Any amount is appreciated!