Let’s explore two Python libraries, itertools and more_itertools, and see how to take advantage of them to process data. There are many great Python libraries, but few come close to what the built-in itertools module and the third-party more_itertools package provide for iterating over and processing data. At first glance, though, some of the functions in these libraries may not seem useful, so let’s look at the most interesting ones, with examples of how to get the most out of them!
from itertools import compress

dates = [
    "2020-01-01",
    "2020-02-04",
    "2020-02-01",
    "2020-01-24",
    "2020-01-08",
    "2020-02-10",
    "2020-02-15",
    "2020-02-11",
]
counts = [1, 4, 3, 8, 0, 7, 9, 2]

bools = [n > 3 for n in counts]
print(list(compress(dates, bools)))  # compress returns an iterator!
# ['2020-02-04', '2020-01-24', '2020-02-10', '2020-02-15']
You have quite a few options when it comes to filtering sequences. One of them is compress, which takes an iterable and a selector of booleans and yields the items of the iterable whose corresponding element in the selector is true.
We can use this to apply the result of filtering one sequence to another, as in the example above, where we build a list of the dates whose corresponding count is greater than 3.
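To make the selector behaviour concrete, here is a minimal sketch comparing compress with a plain comprehension over zip. The names `letters` and `keep` are hypothetical sample data, not part of the original example.

```python
from itertools import compress

letters = ["a", "b", "c", "d", "e"]        # hypothetical sample data
keep = [True, False, True, False, True]    # hypothetical selector

# compress yields the items whose selector element is truthy ...
compressed = list(compress(letters, keep))

# ... which gives the same result as filtering over zip(data, selectors).
zipped = [item for item, flag in zip(letters, keep) if flag]

print(compressed)            # ['a', 'c', 'e']
print(compressed == zipped)  # True
```

Both forms stop as soon as the shorter of the two inputs is exhausted.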
As the name suggests, accumulate accumulates the results of a (binary) function applied across an iterable. Examples include a running maximum or factorials:
from itertools import accumulate
import operator
data = [3, 4, 1, 3, 5, 6, 9, 0, 1]
list(accumulate(data, max)) # running maximum
# [3, 4, 4, 4, 5, 6, 9, 9, 9]
list(accumulate(range(1, 11), operator.mul)) # Factorial
# [1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]
If you are not interested in the intermediate results, you can use functools.reduce instead, which keeps only the final value and is therefore more memory efficient.
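As a small sketch of that difference, the snippet below computes 10! both ways. The variable names are chosen here for illustration only.

```python
from functools import reduce
from itertools import accumulate
import operator

numbers = range(1, 11)

# accumulate keeps every intermediate product ...
running = list(accumulate(numbers, operator.mul))

# ... while reduce returns only the final one.
final = reduce(operator.mul, numbers)

print(running[-1])  # 3628800
print(final)        # 3628800
```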
The cycle function takes an iterable and repeats it endlessly. This can be useful, for example, in games where players take turns. Another neat use of cycle is an infinite spinner:
# Cycling through players
from itertools import cycle

players = ["John", "Ben", "Martin", "Peter"]
next_player = cycle(players).__next__
player = next_player()
# "John"
player = next_player()
# "Ben"
# ...

# Infinite spinner
import time

for c in cycle('/-\\|'):  # the backslash has to be escaped
    print(c, end='\r', flush=True)  # flush so each frame shows up immediately
    time.sleep(0.2)
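Because cycle never ends on its own, it is usually paired with something that does. The sketch below shows two common ways to bound it, islice and zip. The `moves` list is hypothetical sample data.

```python
from itertools import cycle, islice

players = ["John", "Ben", "Martin", "Peter"]

# Take only the first six turns from the otherwise endless cycle.
first_six_turns = list(islice(cycle(players), 6))
print(first_six_turns)
# ['John', 'Ben', 'Martin', 'Peter', 'John', 'Ben']

# zip stops at the shorter input, so the cycle ends together with the moves.
moves = ["e4", "e5", "Nf3", "Nc6", "Bb5"]  # hypothetical sample data
turns = list(zip(cycle(players), moves))
print(turns[-1])  # ('John', 'Bb5')
```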
The last function from the itertools module is tee, which creates multiple iterators from a single one, letting us "remember" values we have already passed. An example is the pairwise function from the itertools recipes (also available in more_itertools), which returns pairs of values from the input iterable (the previous value and the current one):
from itertools import tee

def pairwise(iterable):
    """
    s -> (s0, s1), (s1, s2), (s2, s3), ...
    """
    a, b = tee(iterable, 2)
    next(b, None)
    return zip(a, b)
This function is useful whenever you need multiple independent iterators over the same stream of data. Be careful when using it, though, as it can be quite expensive in terms of memory. Also note that you should not use the original iterable after calling tee on it, because the new tee objects now depend on it and advancing it directly would make them skip items.
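The following sketch illustrates the "separate pointers" idea on a one-shot stream. The `readings` generator is a hypothetical stand-in for any data source that can only be read once.

```python
from itertools import tee

def readings():
    """Hypothetical one-shot data stream (a generator can be read only once)."""
    yield from [3, 7, 12, 20]

current, following = tee(readings(), 2)
next(following, None)  # advance the second iterator by one step

# Each tee iterator keeps its own position in the shared stream.
differences = [b - a for a, b in zip(current, following)]
print(differences)  # [4, 5, 8]
```

Here the two iterators stay only one item apart, so tee has to buffer very little. The memory cost grows when one iterator runs far ahead of the other.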
First up from more_itertools is divide. As the name suggests, it divides an iterable into the given number of sub-iterables. As you can see in the example below, the sub-iterables may differ in length, since that depends on the number of elements being divided and the number of sub-iterables.
from more_itertools import divide
data = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh"]
[list(l) for l in divide(3, data)]
# [['first', 'second', 'third'], ['fourth', 'fifth'], ['sixth', 'seventh']]
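To show where the uneven lengths come from, here is a rough standard-library sketch of the same splitting rule. `divide_sketch` is a hypothetical helper written for this illustration, not the library's implementation, and it returns lists rather than iterators.

```python
def divide_sketch(n, iterable):
    """Split iterable into n parts whose lengths differ by at most one."""
    items = list(iterable)
    size, extra = divmod(len(items), n)
    parts = []
    start = 0
    for i in range(n):
        # The first `extra` parts receive one additional item.
        stop = start + size + (1 if i < extra else 0)
        parts.append(items[start:stop])
        start = stop
    return parts

data = ["first", "second", "third", "fourth", "fifth", "sixth", "seventh"]
parts = divide_sketch(3, data)
print([len(part) for part in parts])  # [3, 2, 2]
```

Seven items divided into three parts gives a base size of two with one item left over, which goes to the first part.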
nguyen chi thanh @ nguyen.chi.thanh
Published Monday, 9:46 AM 5 min read
The power of Python Itertools
The partition function also splits an iterable, but this time using a predicate:
# Split based on age
from datetime import datetime, timedelta
from more_itertools import partition

dates = [
    datetime(2015, 1, 15),
    datetime(2020, 1, 16),
    datetime(2020, 1, 17),
    datetime(2019, 2, 1),
    datetime(2020, 2, 2),
    datetime(2018, 2, 4)
]

# The predicate is true for dates less than 30 days old, i.e. the recent ones
is_recent = lambda x: datetime.now() - x < timedelta(days=30)
# partition returns the items failing the predicate first, then the ones passing it
old, recent = partition(is_recent, dates)
list(old)
# [datetime.datetime(2015, 1, 15, 0, 0), datetime.datetime(2019, 2, 1, 0, 0), datetime.datetime(2018, 2, 4, 0, 0)]
list(recent)
# [datetime.datetime(2020, 1, 16, 0, 0), datetime.datetime(2020, 1, 17, 0, 0), datetime.datetime(2020, 2, 2, 0, 0)]
# (output as of February 2020; the result depends on the current date)
# Split based on file extension
files = [
    "foo.jpg",
    "bar.exe",
    "baz.gif",
    "text.txt",
    "data.bin",
]

ALLOWED_EXTENSIONS = ('jpg', 'jpeg', 'gif', 'bmp', 'png')
# rsplit takes the part after the last dot and copes with names that have no dot
is_allowed = lambda x: x.rsplit(".", 1)[-1] in ALLOWED_EXTENSIONS

# Items failing the predicate come first, items passing it second
forbidden, allowed = partition(is_allowed, files)
list(forbidden)
# ['bar.exe', 'text.txt', 'data.bin']
list(allowed)
# ['foo.jpg', 'baz.gif']
In the first example above, we split the list of dates into old and recent ones using a simple lambda. In the second example, we partition files based on their extension, again using a lambda that splits each filename into name and extension and checks whether the extension is in the list of allowed extensions. Note that partition returns the items for which the predicate is false first, followed by the items for which it is true.
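The ordering of the two results is easy to get backwards, so here is a standard-library sketch in the spirit of the partition recipe from the itertools documentation. `partition_sketch` is a name chosen for this illustration and is not claimed to be the library's actual code.

```python
from itertools import filterfalse, tee

def partition_sketch(pred, iterable):
    """Return (items where pred is false, items where pred is true)."""
    false_side, true_side = tee(iterable)
    return filterfalse(pred, false_side), filter(pred, true_side)

numbers = range(10)
odd, even = partition_sketch(lambda n: n % 2 == 0, numbers)
odd_numbers = list(odd)
even_numbers = list(even)

print(odd_numbers)   # [1, 3, 5, 7, 9]
print(even_numbers)  # [0, 2, 4, 6, 8]
```

The predicate tests for even numbers, yet the odd ones come back first, because the false side is always the first element of the pair.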
If you need to find runs of consecutive numbers, dates, letters, booleans or any other orderable objects, consecutive_groups can help:
# Consecutive groups of dates
import datetime
import more_itertools

dates = [
    datetime.datetime(2020, 1, 15),
    datetime.datetime(2020, 1, 16),
    datetime.datetime(2020, 1, 17),
    datetime.datetime(2020, 2, 1),
    datetime.datetime(2020, 2, 2),
    datetime.datetime(2020, 2, 4)
]

ordinal_dates = []
for d in dates:
    ordinal_dates.append(d.toordinal())

groups = [
    list(map(datetime.datetime.fromordinal, group))
    for group in more_itertools.consecutive_groups(ordinal_dates)
]
# Three groups: January 15-17, February 1-2 and February 4
In this example, we have a list of dates, some of which are consecutive. To be able to pass these dates to consecutive_groups, we first have to convert them to ordinal numbers. Then, using a list comprehension, we iterate over the groups of consecutive dates produced by consecutive_groups and convert them back to datetime.datetime using map and fromordinal.
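The grouping idea itself can be sketched with the standard library alone, using the well-known groupby trick of keying each value by its distance from its position. `consecutive_groups_sketch` is a hypothetical name for this illustration and it assumes ascending integers as input.

```python
from itertools import groupby

def consecutive_groups_sketch(numbers):
    """Yield lists of consecutive integers from an ascending sequence."""
    # Within a consecutive run, value minus position stays constant,
    # so that difference can serve as the grouping key.
    for _, run in groupby(enumerate(numbers), key=lambda pair: pair[1] - pair[0]):
        yield [value for _, value in run]

ordinals = [1, 2, 3, 10, 11, 13]  # hypothetical sample data
grouped = list(consecutive_groups_sketch(ordinals))
print(grouped)  # [[1, 2, 3], [10, 11], [13]]
```

This also shows why the dates had to be converted to ordinals first: the trick relies on subtracting integers.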
Let’s say you need to trigger a side effect while iterating over a list of items. The side effect could be, for example, logging, writing to a file or, as in the example below, counting the number of events that occurred. This is what side_effect is for:
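import more_itertools

events = ["login", "click", "logout"]  # hypothetical sample events (not in the original)
num_events = 0

def _increment_num_events(_):
    # num_events lives at module level, so it needs global rather than nonlocal
    global num_events
    num_events += 1

# Iterator that will be consumed
event_iterator = more_itertools.side_effect(_increment_num_events, events)
more_itertools.consume(event_iterator)
print(num_events)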
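The pattern above can be approximated with the standard library. The sketch below is a simplified stand-in, not the library's implementation: `side_effect_sketch` is a hypothetical helper, and a zero-length deque is used to exhaust the iterator.

```python
from collections import deque

def side_effect_sketch(func, iterable):
    """Call func on each item, then pass the item along unchanged."""
    for item in iterable:
        func(item)
        yield item

seen = []  # the side effect here is recording each item
lazy = side_effect_sketch(seen.append, ["a", "b", "c"])
before = list(seen)     # nothing has run yet, because generators are lazy

deque(lazy, maxlen=0)   # exhaust the iterator, discarding its items
print(before)  # []
print(seen)    # ['a', 'b', 'c']
```

The side effect only happens once the iterator is actually consumed, which is why the original example calls consume explicitly.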