Python lists are a great data structure to use when you’re working with many related values. A list is a mutable, ordered sequence that can hold elements of different types.
Getting started with lists is really easy. They’re written in square brackets, as shown below:
brands = ['McD', 'KFC', 'Nike', 'Apple', 'Google']
Let’s do a quick round-up of a few basic operations on Python lists:
The in keyword is used to determine if an element exists in the list.
The remove() method removes the specified element, while the pop() method returns and removes the element at a given index (or the last element if an index isn’t specified).
The * operator can be used to multiply lists. It replicates the list by the number we specify.
Python lists are widely used, so it’s of the utmost importance that we handle them efficiently. In the next few sections, we’ll explore a few use cases of Python lists and ensure that we’re using them efficiently.
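The basic operations listed above can be sketched as follows. This is a minimal illustration that reuses the brands list from earlier; the variable names other than brands are chosen here for the example.

```python
brands = ['McD', 'KFC', 'Nike', 'Apple', 'Google']

# Membership test with `in`
has_nike = 'Nike' in brands          # True

# remove() deletes the first matching value and returns None
brands.remove('KFC')                 # brands: ['McD', 'Nike', 'Apple', 'Google']

# pop() removes and returns the element at an index (the last one by default)
last = brands.pop()                  # 'Google'
first = brands.pop(0)                # 'McD'

# * replicates the list
doubled = brands * 2                 # ['Nike', 'Apple', 'Nike', 'Apple']
```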
There are many ways to copy a list, but using an assignment operator isn’t one of them. Let’s confirm this:
>>> a = [1, 2, 3, 4, 5]
>>> b = a
>>> id(a)
4345924656
>>> id(b)
4345924656
The assignment just creates another reference to the list a. Both names now point to the same object in memory, so any change made through one is visible through the other.
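A short sketch of what that shared reference means in practice, using the same names a and b as the session above:

```python
a = [1, 2, 3, 4, 5]
b = a            # b is another name for the same list object

b.append(6)      # mutate the list through b

print(a)         # [1, 2, 3, 4, 5, 6]
print(a is b)    # True
```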
Following are some possible ways to create a standalone “shallow” copy of a Python list, ranked from the most efficient to the least in terms of speed:
b = [*a]
b = a * 1
b = a[:]
b = a.copy()  # Python 3.3+
b = [x for x in a]
b = copy.copy(a)  # requires import copy; also works in Python 2
The differences in speed between these approaches are small. Sometimes, though, doing a deep copy (which is the slowest and most memory-hungry approach) is unavoidable.
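One way to check such a ranking on your own machine is a small timeit harness. The sketch below is an assumption-laden illustration rather than the benchmark behind the ranking above: the list size, the iteration count, and the names copiers and timings are arbitrary choices, and the resulting order may vary with the Python version and hardware.

```python
import copy
import timeit

a = list(range(1000))  # sample size is an arbitrary choice for illustration

# Each callable builds a shallow copy of `a` in a different way
copiers = {
    '[*a]': lambda: [*a],
    'a * 1': lambda: a * 1,
    'a[:]': lambda: a[:],
    'a.copy()': lambda: a.copy(),
    '[x for x in a]': lambda: [x for x in a],
    'copy.copy(a)': lambda: copy.copy(a),
}

# Time each approach; results depend on the machine and Python version
timings = {name: timeit.timeit(fn, number=2000) for name, fn in copiers.items()}

# Print the approaches from fastest to slowest
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f'{name:<16} {seconds:.4f}s')
```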
Unlike a deep copy, a shallow copy doesn’t clone nested objects. Instead, it copies only the references to them. Let’s look at the following example to validate this:
>>> a = [[0,1],[2,3]]
>>> b = [*a]
>>> a[1][0] = 5
#Output of b: [[0, 1], [5, 3]]
Updating the nested list element with a[1][0] = 5 changes the list b as well. When the list isn’t one-dimensional, the following approaches copy the nested elements as well (the first handles only one level of nesting, while copy.deepcopy() handles any depth):
b = [x[:] for x in a]
b = copy.deepcopy(a)  # requires import copy
We can find the number of occurrences of an element using a list comprehension, filter(), or count(). While list comprehensions make it easy to write elegant code, in this case they are the least efficient. The most efficient is the built-in method count():
>>> a = ['apple', 'orange', 'orange', 'grape', 'apple']
#List Comprehension
>>> len([i for i in a if i == 'apple'])
#Filter
>>> len(list(filter(lambda x: x == 'apple', a)))
#Count
>>> a.count('apple')
#Output of each: 2
Using the Counter class from the collections module, we can retrieve the number of occurrences of each element. It is a dictionary subclass in which the elements are stored as keys and their counts as values.
>>> from collections import Counter
>>> a = ['apple', 'orange', 'orange', 'grape', 'apple']
>>> Counter(a)
#Output: Counter({'apple': 2, 'orange': 2, 'grape': 1})
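Because a Counter behaves like a dictionary, individual counts can be looked up by key. The following is a minimal sketch; the name counts and the 'banana' lookup are illustrative additions.

```python
from collections import Counter

a = ['apple', 'orange', 'orange', 'grape', 'apple']
counts = Counter(a)

print(counts['orange'])   # 2
print(counts['banana'])   # 0 (missing keys report zero instead of raising KeyError)

# A Counter can be updated incrementally with more items
counts.update(['grape', 'grape'])
print(counts['grape'])    # 3
```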
Here’s one way of finding the most common element in a list:
x = [1,2,2,3,2,1,1]
common_element = max(set(x), key=x.count)
While the implementation above might look smart, concise, and elegant, the count() method is called once for every distinct element, and each call scans the whole list, which leads to O(n²) time in the worst case.
By using the Counter class that we just saw, finding the most common element (or second-most common) is much faster. The following code snippet displays how we can achieve this:
from collections import Counter
a = [1,2,2,3,2,1,1]
c = Counter(a)
print(c.most_common(1))
#Output: [(1, 3)]
print(c.most_common(2))
#Output: [(1, 3), (2, 3)]
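Since most_common() returns (element, count) pairs, a small wrapper is handy when only the element itself is needed. The helper below, most_common_element, is a hypothetical name introduced for this sketch and not part of the collections module.

```python
from collections import Counter

def most_common_element(items):
    """Return the most frequent item, or None for an empty sequence."""
    if not items:
        return None
    # most_common(1) returns a one-element list of (item, count) pairs
    element, _count = Counter(items).most_common(1)[0]
    return element

print(most_common_element([1, 2, 2, 3, 2, 1]))   # 2
print(most_common_element([]))                   # None
```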
Parsing files to a Python list is a very common use case, and while the following code looks good, it is inefficient:
with open('filename.txt', 'r') as f:
    lines = [line.split() for line in f.readlines()]
readlines() reads the entire file into memory at once and parses it into a list. This might not cause much trouble with small files, but as the file size increases, memory use grows with it and the program takes longer to start.
Luckily, you don’t need to load the whole file up front in order to parse it. A file object is iterable by default and yields one line at a time, so the following code snippet does the job by putting each line into a nested list in an efficient way:
with open('filename.txt') as f:
    a = [i.split() for i in f]
The split() method breaks each line on whitespace, which also gets rid of the annoying trailing newline character in each nested list.
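To try the line-by-line approach without an existing file, the sketch below first writes a small temporary file. The file contents and the names tmp, path, and rows are assumptions made for the example.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('alpha beta\ngamma delta epsilon\n')
    path = tmp.name

# Iterating over the file object yields one line at a time
with open(path) as f:
    rows = [line.split() for line in f]

os.remove(path)  # clean up the temporary file
print(rows)      # [['alpha', 'beta'], ['gamma', 'delta', 'epsilon']]
```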
The slower way of searching through dictionaries contained in a list is by using filter() with a lambda:
next(filter(lambda obj: obj.get('name') == 'Anupam', dicts), None)
#Time Elapsed: 0.001047
Now, using a generator expression instead is not just concise but also saves the function call overhead of the lambda, which makes it somewhat faster in this measurement:
next((item for item in dicts if item["name"] == "Anupam"), None)
#Time Elapsed: 0.0009327
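The snippets above assume a list named dicts already exists. A self-contained sketch with made-up sample records might look like this; find_by_name is a hypothetical helper, and it uses .get() so that records without a 'name' key don’t raise a KeyError.

```python
dicts = [
    {'name': 'Anupam', 'role': 'author'},   # sample records, invented for illustration
    {'name': 'Maya', 'role': 'editor'},
    {'role': 'guest'},                      # no 'name' key
]

def find_by_name(records, name):
    """Return the first dict whose 'name' equals `name`, else None."""
    # next() stops at the first match; None is the default when nothing matches
    return next((item for item in records if item.get('name') == name), None)

print(find_by_name(dicts, 'Maya'))     # {'name': 'Maya', 'role': 'editor'}
print(find_by_name(dicts, 'Nobody'))   # None
```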
The concatenation or merging of lists isn’t equivalent to appending. The append() method adds a single element to the list, thereby increasing its length by one, as shown below:
x = [1, 2, 3]
x.append([4, 5])
#Output of x: [1, 2, 3, [4, 5]]
On the other hand, extend(), by iterating over its argument, adds each element of the collection to the list. It’s worth noting that passing a non-iterable value, like an integer, doesn’t work with extend().
x = [1, 2, 3]
y = [4,5]
x.extend(y)
#Output of x: [1,2,3,4,5]
x = [1,2,3]
x.extend(4)
# TypeError: 'int' object is not iterable
Concatenating or merging lists is properly done with the extend() method or by using the += operator.
In terms of speed, there’s just a marginal difference between extend() and +=, and while they might look equivalent, they aren’t really. Because += is an assignment, it can’t be applied to a list from an enclosing scope unless that name is declared nonlocal (or global). Also, unlike extend(), += doesn’t work on the result of a function call (getListA() += listB would throw a syntax error).
a = [1, 2]
b = [2, 3]
b.extend(a)
#Time elapsed: 0.000189
#Second way
b += a
#Time elapsed: 0.0002098
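The scoping difference between extend and += can be seen with a pair of nested functions. This is a sketch with hypothetical names (make_collectors, add_with_extend, add_with_iadd) chosen for illustration.

```python
def make_collectors():
    items = []

    def add_with_extend(values):
        items.extend(values)      # mutates the enclosing list; no rebinding

    def add_with_iadd(values):
        items += values           # the assignment makes `items` local to this function

    return items, add_with_extend, add_with_iadd

items, add_with_extend, add_with_iadd = make_collectors()

add_with_extend([1, 2])
print(items)                      # [1, 2]

try:
    add_with_iadd([3])
except UnboundLocalError as exc:
    print('+= failed:', exc)
```

Declaring nonlocal items inside add_with_iadd would make the += version work as well.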
The splat operator (*) unpacks lists inside a new list literal, which lets us concatenate them and add literal elements at the same time.
a = [1,2]
b = [2,3]
a = [*a, *b]
#Output of a: [1, 2, 2, 3]
#Time elapsed : 0.00021719
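As a brief sketch of mixing literal elements with unpacked lists (the names merged and mixed are illustrative):

```python
a = [1, 2]
b = [2, 3]

# Literal elements can sit alongside unpacked lists
merged = [0, *a, *b, 99]
print(merged)                 # [0, 1, 2, 2, 3, 99]

# Unpacking accepts any iterable, not only lists
mixed = [*a, *range(3), *'hi']
print(mixed)                  # [1, 2, 0, 1, 2, 'h', 'i']
```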
To iterate over lists in parallel, we can use the built-in zip() function. It returns an iterator of tuples by combining the elements at the same index from each list.
If the lists differ in length, zip() stops when the shortest list is exhausted. To get a list of tuples, use list(zip(a, b)). The following code shows an example of iterating over multiple lists in parallel:
country = ['India', 'US', 'Australia']
capital = ['New Delhi', 'Washington DC', 'Canberra']
names = zip(country, capital)
for name in names:
    print(name)
# Outputs:
# ('India', 'New Delhi')
# ('US', 'Washington DC')
# ('Australia', 'Canberra')
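The truncation behaviour with lists of unequal length can be sketched as follows. The shortened capital list is deliberate, and itertools.zip_longest is shown as one possible alternative when no item should be dropped; the fill value 'unknown' is an arbitrary choice.

```python
from itertools import zip_longest

country = ['India', 'US', 'Australia']
capital = ['New Delhi', 'Washington DC']   # deliberately one item short

# zip() stops at the end of the shorter list
truncated = list(zip(country, capital))
print(truncated)
# [('India', 'New Delhi'), ('US', 'Washington DC')]

# zip_longest pads the shorter input instead of truncating
padded = list(zip_longest(country, capital, fillvalue='unknown'))
print(padded)
# [('India', 'New Delhi'), ('US', 'Washington DC'), ('Australia', 'unknown')]
```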
Among the various possible ways to flatten out a list of lists, let’s look at three popular approaches and compare their efficiency over 500 iterations in timeit.
l = [[1, 2, None], [3, 4], [5, 6]]
#Nested list comprehension
[item for items in l for item in items]
#Time is 0.0009621297940611839
#itertools.chain
import itertools
list(itertools.chain.from_iterable(l))
#Time is 0.0007339292205870152
Using functools.reduce() with operator.iconcat is the fastest approach (though just marginally) to flatten out a list of lists, although it does require importing two modules:
import functools
import operator
l = functools.reduce(operator.iconcat, l, [])
# Output: [1, 2, None, 3, 4, 5, 6]
#Time is 0.0006934418343007565
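A harness along the following lines could reproduce the comparison. It is a sketch, not the exact script behind the timings above: the function names and the variable nested are made up here, and absolute numbers will differ from machine to machine.

```python
import functools
import itertools
import operator
import timeit

nested = [[1, 2, None], [3, 4], [5, 6]]

def with_comprehension():
    return [item for items in nested for item in items]

def with_chain():
    return list(itertools.chain.from_iterable(nested))

def with_reduce():
    # A fresh [] as the initial value keeps iconcat from mutating `nested`
    return functools.reduce(operator.iconcat, nested, [])

# 500 iterations per approach, matching the count mentioned in the text
for fn in (with_comprehension, with_chain, with_reduce):
    elapsed = timeit.timeit(fn, number=500)
    print(f'{fn.__name__:<20} {elapsed:.6f}s')
```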
Python provides functions in the itertools module to generate permutations and combinations of a sequence. Let’s look at how to create them from a Python list.
permutations() generates n! permutations for an input sequence of length n. We can also pass the optional r argument to generate permutations of a different length.
import itertools
print(list(itertools.permutations([1,2,3])))
#Output: [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
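Passing a shorter length might look like the sketch below, which also checks the n! count with math.factorial. The names items and pairs are illustrative.

```python
import itertools
import math

items = [1, 2, 3]

# r=2 yields ordered pairs instead of full-length orderings
pairs = list(itertools.permutations(items, r=2))
print(pairs)
# [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

# The number of full-length permutations matches n!
full_count = len(list(itertools.permutations(items)))
print(full_count == math.factorial(len(items)))   # True
```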
Just like the seventh-grade math concept we breezed through, combinations() takes an input r and generates all possible combinations of r elements as tuples.
import itertools
a = [1,2,3,4]
list(itertools.combinations(a, r=2))
#Output: [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
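As a quick sanity check, the number of combinations should equal “n choose r”. The sketch below assumes Python 3.8 or later, where math.comb is available.

```python
import itertools
import math

a = [1, 2, 3, 4]

pairs = list(itertools.combinations(a, r=2))
print(len(pairs), math.comb(len(a), 2))   # 6 6

# Order does not matter: (2, 1) never appears alongside (1, 2)
print((2, 1) in pairs)                     # False
```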
To create a list from a string, we simply need to iterate over it using a list comprehension, as shown in the code below:
a = [_ for _ in 'abcdefghi']
For doing the opposite, we’ll use the join() method, which concatenates the elements of an iterable into a single string, separated by the string it’s called on:
b = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
x = ', '.join(b)
#Output x is:
'a, b, c, d, e, f, g, h, i'
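One caveat worth sketching: join() accepts only strings, so other element types need converting first. The sample values below are made up for illustration.

```python
values = [1, 2.5, None, 'text']

# ', '.join(values) would raise TypeError because of the non-string items
joined = ', '.join(str(v) for v in values)
print(joined)                      # 1, 2.5, None, text

# split() reverses join() when the separator is unambiguous
parts = 'a, b, c'.split(', ')
print(parts)                       # ['a', 'b', 'c']
```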
This piece did a round-up of the basic syntax and some common use cases of Python lists while placing emphasis on the efficiency of different approaches.
Thanks for reading.