How you make iterators and iterables
There are two ways to do this:
1. Implement `__iter__` to return `self` and nothing else, and implement `__next__` on the same class. You’ve written an iterator.
2. Implement `__iter__` to return some other object that follows the rules of #1 (a cheap way to do this is to write it as a generator function so you don’t have to hand-implement the other class). Don’t implement `__next__`. You’ve written an iterable that is not an iterator.
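A minimal sketch of both approaches, using two hypothetical classes: `Countdown` follows approach 1 (an iterator) and `Letters` follows approach 2 (an iterable whose `__iter__` is a generator function). The `collections.abc` checks are only there to confirm how Python classifies each one.

```python
from collections.abc import Iterable, Iterator


class Countdown:
    """Approach 1: an iterator. __iter__ returns self, __next__ does the work."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


class Letters:
    """Approach 2: a non-iterator iterable. No __next__; __iter__ builds a fresh iterator."""

    def __init__(self, text):
        self.text = text

    def __iter__(self):
        # Generator function: each call returns a new generator object,
        # and generators already follow the rules of approach 1.
        for ch in self.text:
            yield ch


print(list(Countdown(3)))                   # [3, 2, 1]
print(isinstance(Countdown(3), Iterator))   # True
print(isinstance(Letters("ab"), Iterator))  # False
print(isinstance(Letters("ab"), Iterable))  # True
```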
For correctly implemented versions of each protocol, the way you tell them apart is the `__iter__` method. If the body is just `return self` (maybe with a logging statement or something, but no other side effects), then either it’s an iterator or it was written incorrectly. If the body is anything else, then either it’s a non-iterator iterable or it was written incorrectly. Anything that fits neither description violates the requirements of the protocols.
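One incorrect combination is caught by Python itself: `iter()` checks that whatever `__iter__` returns is an iterator. A small sketch with a hypothetical class `Broken`, whose `__iter__` returns `self` but which never implements `__next__` (the exact wording of the error is CPython’s and may differ between versions):

```python
class Broken:
    """Written incorrectly: __iter__ returns self, but there is no __next__."""

    def __iter__(self):
        return self


try:
    iter(Broken())
except TypeError as exc:
    # iter() rejects the result of __iter__ because it has no __next__
    print("TypeError:", exc)
```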
In case #2, the other object is of another class by definition (because a class either has an idempotent `__iter__` and implements `__next__`, or has only `__iter__`, without `__next__`, which produces a new iterator).
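The built-in containers follow case #2. A quick sketch that checks a few of them: `iter()` hands back an object of a different class, and that object’s own `__iter__` returns itself. The printed iterator class names (such as `list_iterator`) are CPython implementation details, so no expected output is given here.

```python
# Built-in containers are non-iterator iterables: iter() returns an object
# of a different class, and that iterator's __iter__ returns itself.
for container in ([1, 2], {"a": 1}, range(3), "xy"):
    it = iter(container)
    print(type(container).__name__, "->", type(it).__name__, iter(it) is it)
```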
Why the protocol is designed this way
The reason you need `__iter__` even on iterators is to support patterns like:
```python
iterable = MyIterable(...)
iterator = iter(iterable)  # Invokes MyIterable.__iter__
next(iterator, None)       # Throw away first item
for x in iterator:         # for implicitly calls iterator's __iter__; raises TypeError if you don't provide __iter__
    ...
```
The reason iterables always return a new iterator, rather than being iterators themselves that reset their state whenever `__iter__` is invoked, is to handle the above case (if `MyIterable` just returned itself and reset iteration, the `for` loop’s implicit call to `__iter__` would reset it again and undo the intended skip of the first element) and to support patterns like this:
```python
for x in iterable:
    for y in iterable:  # Operating over product of all elements in iterable
        ...
```
If `__iter__` reset the object to the beginning and there were only a single shared state, this would:
- Get the first item and put it in `x`
- Reset, then iterate through the whole of `iterable`, putting each value in `y`
- Try to continue the outer loop, discover it’s already exhausted, and never give any other value to `x`
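Both failure modes can be reproduced with a sketch of the rejected design. `ResettingRange` is a hypothetical class that is its own iterator and rewinds its single shared state every time `__iter__` is called; the built-in `range` is used for comparison because it returns a fresh iterator each time.

```python
class ResettingRange:
    """The rejected design: one shared state, reset whenever __iter__ is called."""

    def __init__(self, stop):
        self.stop = stop
        self.current = 0

    def __iter__(self):
        self.current = 0  # Side effect: rewinds every loop already using this object
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


r = ResettingRange(3)
it = iter(r)
next(it, None)    # Try to skip the first item...
print(list(it))   # [0, 1, 2]: list() called iter() again and undid the skip

print([(x, y) for x in r for y in r])
# [(0, 0), (0, 1), (0, 2)]: the outer loop finds r exhausted after one pass

print(len([(x, y) for x in range(3) for y in range(3)]))  # 9
```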
This design also matters because Python code routinely treats `iter(x) is x` as a safe, side-effect-free way to test whether an iterable is an iterator. If your `__iter__` modifies your own state, it’s not side-effect free. At worst, for iterables, it should waste a little time making an iterator that is immediately thrown away. For iterators, it should be effectively free (since it just returns itself).
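A sketch of one place that test shows up: a hypothetical `normalize` function that must traverse its argument twice, and therefore rejects one-shot iterators up front. The check is only safe because a well-behaved `__iter__` has no side effects.

```python
def normalize(values):
    """Hypothetical helper that iterates over `values` twice.

    `iter(values) is values` is True for iterators (which could only be
    consumed once) and False for re-iterable containers.
    """
    if iter(values) is values:
        raise TypeError("expected a re-iterable container, not an iterator")
    total = sum(values)                         # First pass
    return [value / total for value in values]  # Second pass


print(normalize([1, 3]))  # [0.25, 0.75]
try:
    normalize(x for x in [1, 3])  # A generator is an iterator, so it is rejected
except TypeError as exc:
    print("TypeError:", exc)
```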
To answer your questions directly:
Does this mean you can put `__iter__()` and `__next__()` in two different objects?
For iterators, you can’t (an iterator must have both methods, though its `__iter__` is trivial). For non-iterator iterables, you must (the iterable has only `__iter__`, which returns some other iterator object). There is no “can”.
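To see why an iterator can’t drop `__iter__`, here is a sketch with a hypothetical class `NextOnly` that defines `__next__` alone. `next()` works on it, but a `for` loop does not, because `for` first calls `iter()`, which needs `__iter__`.

```python
class NextOnly:
    """Hypothetical half-iterator: has __next__ but no __iter__."""

    def __init__(self, values):
        self._values = list(values)
        self._index = 0

    def __next__(self):
        if self._index >= len(self._values):
            raise StopIteration
        value = self._values[self._index]
        self._index += 1
        return value


half = NextOnly([10, 20, 30])
print(next(half, None))  # 10 (next() only needs __next__)
try:
    for x in half:       # for calls iter(half) first, which needs __iter__
        print(x)
except TypeError as exc:
    print("TypeError:", exc)
```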
Can it be done for objects belonging to different classes?
Yes.
Can it only be done for objects belonging to different classes?
Yes.
Examples
Example of an iterable:
```python
class MyRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return MyRangeIterator(self)  # Returns new iterator, as this is a non-iterator iterable

    # Likely to have other methods (because iterables are often collections of
    # some sort and support many other behaviors)
    # Does *not* have __next__, as this is not an iterator
```
Example of an iterator:
```python
# Class is often non-public and/or defined inside the iterable as a nested
# class; it exists solely to store state for the iterator
class MyRangeIterator:
    def __init__(self, rangeobj):  # Constructed from iterable; could pass raw values if you preferred
        self.current = rangeobj.start
        self.stop = rangeobj.stop

    def __iter__(self):
        return self  # Returns self, because this is an iterator

    def __next__(self):  # Has __next__ because this is an iterator
        retval = self.current  # Must cache current because we need to modify it before we return
        if retval >= self.stop:
            raise StopIteration  # Indicates iterator exhausted
        self.current += 1  # Ensure state updated for next call
        return retval  # Return cached value

    # Unlikely to have other methods; iterators are generally iterated and that's it
```
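The comment above mentions that the iterator class is often nested inside the iterable. A sketch of that variant, using hypothetical names `MyNestedRange` and `_Iterator`; it is meant to behave the same as `MyRange` plus `MyRangeIterator`.

```python
class MyNestedRange:
    """Variant of MyRange with its iterator class nested inside it."""

    class _Iterator:
        # Exists solely to hold the state of one iteration
        def __init__(self, rangeobj):
            self.current = rangeobj.start
            self.stop = rangeobj.stop

        def __iter__(self):
            return self

        def __next__(self):
            if self.current >= self.stop:
                raise StopIteration
            retval = self.current
            self.current += 1
            return retval

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return self._Iterator(self)  # A fresh iterator on every call


r = MyNestedRange(2, 5)
print(list(r))                                 # [2, 3, 4]
print(len([(x, y) for x in r for y in r]))     # 9
```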
Example of an “easy iterable” where you don’t implement your own iterator class, by making `__iter__` a generator function:
```python
class MyEasyRange:
    def __init__(self, start, stop):  # Same as for MyRange
        self.start = start
        self.stop = stop

    def __iter__(self):  # Generator function is simpler (and faster)
                         # than writing your own iterator class
        current = self.start  # Use a local, not an attribute: multiple iterators might rely on this one iterable
        while current < self.stop:
            yield current  # Produces value and freezes generator until iteration resumes
            current += 1
        # Reaching the end of the function acts as an implicit StopIteration for a generator
```
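A short usage sketch for the generator-based version (the class is repeated so the snippet runs on its own). Each call to `iter()` produces an independent generator object, and each generator is itself an iterator in the sense of rule #1, with its own copy of the local `current`.

```python
class MyEasyRange:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        current = self.start
        while current < self.stop:
            yield current
            current += 1


r = MyEasyRange(0, 3)
a, b = iter(r), iter(r)           # Two independent generator objects
print(a is b)                     # False
print(iter(a) is a)               # True: a generator is itself an iterator
print(next(a), next(a), next(b))  # 0 1 0 (each keeps its own `current`)
print(list(r), list(r))           # [0, 1, 2] [0, 1, 2]
```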