Collection like objects in python

python

Collections are an important part of every programming language. In python, there is a couple of built-in collections like list, set, dictionary but I’m not going to dig into them now. In this post, I’ll explore what it takes to implement collection like objects on your own using collections protocol.

Slices

Before we start implementing (or emulating) collection like objects we should get familiar with slice object. It does exactly what the name suggests - helps with slicing of collections. Beside obvious attributes like start, stop and step which you can provide in the constructor it provides indices method. It can be used to determine start, stop and step attributes that are tailored to collection size provided as an argument.

s = slice(2, 10, 2)
s.start, s.stop, s.step
# (2, 10, 2)
s.indices(5), list(range(*s.indices(5)))
# ((2, 5, 2), [2, 4])

Notice how easy it is to convert the result of slice.indices to a range object which produces indexes of items requested in a slice.

Collections

class SlotsMachine:
    def __init__(self, slots_count):
        self._slots = list(range(1, slots_count+1))

    def __len__(self):
        return len(self._slots)

    def __getitem__(self, key):
        return self._slots[key]

    def __setitem__(self, key, value):
        self._slots[key] = value

    def __delitem__(self, key):
        del self._slots[key]

    def __iter__(self):
        return iter(self._slots)

    def __reversed__(self):
        return iter(reversed(self._slots))

    def __contains__(self, value):
        return value in self._slots

slot_machine = SlotsMachine(5)

There is a lot that’s happening in there. I’m delegating all operations to the “private” list instance so we’ll be able to focus on what methods do instead of how they are implemented.

# __len__
len(slot_machine)
# 5

# __getitem__
slot_machine[0], slot_machine[1:]
# (1, [2, 3, 4, 5])

# __setitem__
slot_machine[0] = 100

# __setitem__ with slice is also possible
slot_machine[1:2] = (20, 30)

# __delitem__
del slot_machine[0]

# __delitem__ with slice
del slot_machine[:2]

In __getitem__, __setitem__, __delitem__ key passed as an argument doesn’t have to be int or slice. It can be a string or any other object and you can implement your class in a way that simulates more dictionary-like behavior. If you want to handle negative indices in those methods it’s up to you to implement it (with the list as an underlying collection I can delegate all heavy lifting to it).

# __iter__
list(slot_machine), \
list(filter(lambda s: s > 10, slot_machine)), \
[s for s in slot_machine if s % 10 == 0]
# ([100, 20, 30, 3, 4, 5], [100, 20, 30], [100, 20, 30])

# __reveresed__
list(reversed(slot_machine)), list(slot_machine)
([5, 4, 3, 30, 20, 100], [100, 20, 30, 3, 4, 5])

Notice how implementing __iter__ method allows your objects to feel like native Python objects. With it, you can now do list comprehensions and all functional operations on it.

# __contains__
4 in slot_machine, 1000 in slot_machine
(True, False)

A similar time-saver can be __contains__ method which allows us to easily check if a particular object is inside.

Last on to mention is __missing__ method which doesn’t make a lot of sense unless you are extending dict class and is useful only when you don’t override __getitem__:

class DefaultDictLike(dict):
    def __init__(self, default_value):
        super().__init__()
        self.default_value = default_value

    def __missing__(self, key):
        return self.default_value(key)

d = DefaultDictLike(lambda key: "missing " + key)
d["asd"]
# 'missing asd'

__missing__ method is called from dict.__getitem__ method when item corresponding to a key is not found.

To give you another example on how you can use those methods to expand your DSL you can implement following class:

from collections import namedtuple

Book = namedtuple("Book", "name, isbn, author")

class Library:
    def __init__(self, *books):
        self.books = [*books]

    def __getitem__(self, key):
        book = next((b for b in self.books if Library._matches(b, key)), None)
        if book is None:
            raise KeyError(key)
        return book

    def __contains__(self, key):
        return any(book for book in self.books if Library._matches(book, key))

    def __iter__(self):
        return iter(self.books)

    @staticmethod
    def _matches(book, key):
        return book.name.lower() == key.lower() or book.isbn == key.lower()

library = Library(
    Book("First", "isbn1", "author1"),
    Book("Second", "isbn2", "author2"))

library["isbn1"], library["second"]
# (Book(name='First', isbn='isbn1', author='author1'),
# Book(name='Second', isbn='isbn2', author='author2'))

library["unknown isbn"]
# KeyError...

"isbn1" in library, "first" in library, "unknown" in library
# (True, True, False)

[book.name for book in library]
# ['First', 'Second']

Library instances can be used almost like something built into python. As we don’t have operator overloading in java it’s refreshing to have the possibility to add such synthetic sugar without problems or weird methods.

15 Apr 2020 #basics