Proficient and Efficient Use of Python Lists

Python lists are a great data structure to use when you’re working with many related values. A list is a mutable, ordered sequence that can hold elements of different types.

Getting started with lists is really easy. They’re written in square brackets, as shown below:

brands = ['McD', 'KFC', 'Nike', 'Apple', 'Google']

Let’s do a quick round-up of a few basic operations on Python lists:

You can access a single element by its index, or a range of elements with slice notation.
The in keyword is used to determine if an element exists in the list.
The remove() method removes the specified element, while the pop() method returns and removes the element at a given index (or the last element if an index isn’t specified).
The * operator can be used to multiply lists. It replicates the list by the number we specify.
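The operations listed above can be sketched in a few lines. This is only an illustration built on the brands list from earlier; the variable names on the left are made up for the example.

```python
brands = ['McD', 'KFC', 'Nike', 'Apple', 'Google']

first = brands[0]             # indexing: 'McD'
middle = brands[1:3]          # slicing: ['KFC', 'Nike']
has_nike = 'Nike' in brands   # membership test: True

brands.remove('KFC')          # removes the first element equal to 'KFC'
last = brands.pop()           # no index given: removes and returns 'Google'
second = brands.pop(1)        # removes and returns the element at index 1

repeated = [0, 1] * 3         # [0, 1, 0, 1, 0, 1]
```

After these calls, brands holds only 'McD' and 'Apple'.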

Python lists are widely used, so it’s of the utmost importance that we handle them efficiently. In the next few sections, we’ll explore a few use cases of Python lists and ensure that we’re using them efficiently.

1. Copy List by Value

There are many ways to copy a list, but using an assignment operator isn’t one of them. Let’s confirm this:

>>> a = [1, 2, 3, 4, 5]
>>> b = a

>>> id(a)
4345924656

>>> id(b)
4345924656

The assignment just binds a second name, b, to the same list object as a. Both names now refer to the same object in memory, so any change made through one is visible through the other.
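A quick way to see the effect of this aliasing is to mutate the list through one name and inspect it through the other. The snippet below is a small sketch of that check:

```python
a = [1, 2, 3, 4, 5]
b = a                 # b is a second name for the same list object

b.append(6)           # mutate the list through b...
same_object = a is b  # ...and a sees the change, because a and b are one object
```

Here a ends up as [1, 2, 3, 4, 5, 6] even though only b was touched.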

Following are some possible ways to create a standalone “shallow” copy of a Python list, roughly ordered from fastest to slowest (the exact ranking varies with list size and Python version):

b = [*a]
b = a * 1
b = a[:]
b = a.copy() (Python 3.3+)
b = [x for x in a]
b = copy.copy(a) (requires import copy; works in Python 2 and 3)

The speed differences among these are small. Sometimes, though, a deep copy (the slowest and most memory-hungry approach) is unavoidable.
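If you want to check the ranking on your own machine, a small timeit harness along these lines should do. It is a rough sketch: the list size, the number of runs, and the labels in the dictionary are arbitrary choices for illustration, and the numbers it prints will differ between machines and Python versions.

```python
import copy
import timeit

a = list(range(1000))

# Each candidate builds a new outer list; the keys are only labels for the report.
candidates = {
    '[*a]': lambda: [*a],
    'a * 1': lambda: a * 1,
    'a[:]': lambda: a[:],
    'a.copy()': lambda: a.copy(),
    '[x for x in a]': lambda: [x for x in a],
    'copy.copy(a)': lambda: copy.copy(a),
}

timings = {label: timeit.timeit(fn, number=10_000) for label, fn in candidates.items()}

# Print the candidates from fastest to slowest.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f'{label:>16}: {seconds:.4f}s')
```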

Unlike a deep copy, a shallow copy doesn’t clone the nested objects. Instead, it copies only the references to them. Let’s look at the following example to validate this:

>>> a = [[0,1],[2,3]]
>>> b = [*a]
>>> a[1][0] = 5

#Output of b: [[0, 1], [5, 3]]

Updating the nested list element a[1][0] = 5 changes the list b as well. When the list is nested, the following options create independent copies. Note that the list comprehension copies only one level of nesting, while copy.deepcopy() handles any depth:

b = [x[:] for x in a]
b = copy.deepcopy(a) (requires import copy)
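The difference between the two options only shows up once the nesting goes deeper than one level. The following sketch uses a made-up list with a second level of nesting to show it:

```python
import copy

a = [[0, [1, 2]], [3, 4]]

one_level = [x[:] for x in a]   # copies the outer list and each first-level inner list
full = copy.deepcopy(a)         # recursively copies every nested object

a[1][0] = 5                     # change a first-level inner list
a[0][1].append(99)              # change a list nested two levels down
```

one_level is protected from the first change but still shares the innermost list, so it sees the appended 99. full is unaffected by both changes.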

2. Number of Occurrences in the List

We can find the number of occurrences of an element using list comprehension, filter(), or count(). While list comprehensions make it easy to write code that is elegant, in the current case, they are the least efficient. The most efficient is the built-in method count():

>>> a = ['apple', 'orange', 'orange', 'grape', 'apple']
#List Comprehension
>>> len([i for i in a if i == 'apple'])
#Filter
>>> len(list(filter(lambda x: x == 'apple', a)))
#Count
>>> a.count('apple')
#Output of each: 2

Using the Counter class from the collections module, we can retrieve the number of occurrences of every element at once. It is a dict subclass whose keys are the elements and whose values are their counts.

>>> from collections import Counter
>>> a = ['apple', 'orange', 'orange', 'grape', 'apple']
>>> Counter(a)

#Output: Counter({'apple': 2, 'orange': 2, 'grape': 1})
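Because a Counter behaves like a dictionary, individual counts can be looked up by key. The sketch below shows that, plus two behaviors that differ from a plain dict: a missing key reports 0 instead of raising KeyError, and update() adds to the existing counts. The variable names are only for illustration.

```python
from collections import Counter

counts = Counter(['apple', 'orange', 'orange', 'grape', 'apple'])

apples = counts['apple']    # look up one count by key
kiwis = counts['kiwi']      # a missing key reports 0 rather than raising KeyError

counts.update(['grape'])    # update() adds to the existing counts
grapes = counts['grape']
```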

3. Most Common Element in List

Here’s one way of finding the most common element in a list:

x = [1,2,2,3,2,1,1]
common_element = max(set(x), key=x.count)

While the implementation above might look smart, concise, and elegant, it calls count() once for every distinct element, and each call scans the whole list. In the worst case, when all elements are distinct, that is O(n²) time.

By using the Counter class that we just saw, finding the most common element (or second-most common) is much faster, because the list is scanned only once. The following code snippet shows how we can achieve this:

from collections import Counter

a = [1,2,2,3,2,1,1]

c = Counter(a)

print(c.most_common(1))
#Output: [(1, 3)]
print(c.most_common(2))
#Output: [(1, 3), (2, 3)]

4. Parsing Files to List

Parsing files to a Python list is a very common use case, and while the following code looks good, it is inefficient:

with open('filename.txt', 'r') as f:
    lines = [line.split() for line in f.readlines()]

readlines() reads the entire file into memory at once and builds a list of its lines before the comprehension even starts. This might not cause much trouble with small files, but as the file size increases, so do the memory use and the delay before any line is processed.

Luckily, you don’t need to load the whole file up front in order to parse it. A file object is iterable and yields one line at a time, so the following code snippet puts each line into a nested list without building an intermediate list of raw lines:

with open('filename.txt') as f:
    a = [line.split() for line in f]

Called with no arguments, split() breaks each line on whitespace, which also gets rid of the annoying newline character at the end of each line.
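When even the final nested list is too large to keep around, the same idea can be wrapped in a generator so that rows are produced one at a time. The sketch below assumes a helper named iter_rows (a hypothetical name, not from any library) and writes a tiny temporary file only so that the example runs end to end.

```python
import os
import tempfile

def iter_rows(path):
    """Yield one list of whitespace-separated fields per line, lazily."""
    with open(path) as f:
        for line in f:
            yield line.split()

# Hypothetical sample file, created only so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('a b c\n1 2 3\n')
    sample_path = tmp.name

rows = list(iter_rows(sample_path))   # in real use, loop over iter_rows() instead
os.remove(sample_path)
```

Calling list() here is only for demonstration; iterating with a for loop keeps a single row in memory at a time.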

5. Search Through List of Dictionaries

The slower way of searching through dictionaries contained in a list is to use filter() with a lambda:

next(filter(lambda obj: obj.get('name') == 'Anupam', dicts), None)

#Time Elapsed: 0.001047

Using a generator expression instead is not just concise but also saves the function call overhead of the lambda, which makes it somewhat faster:

next((item for item in dicts if item["name"] == "Anupam"), None)

#Time Elapsed: 0.0009327
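One detail worth noting is that the two snippets above are not quite interchangeable: item["name"] raises KeyError for a dictionary that lacks the key, while obj.get('name') quietly returns None. The following sketch uses made-up sample data (the dicts list is an assumption for illustration) and sticks with .get() so that incomplete records are skipped.

```python
# Hypothetical sample data; the last record has no 'name' key on purpose.
dicts = [
    {'name': 'Tom', 'age': 10},
    {'name': 'Anupam', 'age': 25},
    {'age': 40},
]

# next() stops at the first match and falls back to None when nothing matches.
match = next((item for item in dicts if item.get('name') == 'Anupam'), None)
missing = next((item for item in dicts if item.get('name') == 'Nobody'), None)
```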

6. Merging of Lists

The concatenation or merging of lists isn’t equivalent to appending. The append() method adds its argument to the list as a single element, thereby increasing the length by one, as shown below:

x = [1, 2, 3]
x.append([4, 5])

#Output: [1,2,3,[4,5]]

On the other hand, extend() iterates over its argument and adds each element to the list. It’s worth noting that passing a non-iterable value, like an integer, raises a TypeError.

x = [1, 2, 3]
y = [4,5]
x.extend(y)

#Output of x: [1,2,3,4,5]

x = [1,2,3]
x.extend(4)

# TypeError: 'int' object is not iterable

Concatenating or merging lists is properly done with the extend() method or the += operator.

extend vs. +=

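In terms of speed, there’s just a marginal difference between extend and +=, and while they might look equivalent, they aren’t really. For instance, inside a function, += rebinds the name, so using it on a list from an enclosing or global scope raises UnboundLocalError unless the name is declared nonlocal or global; extend has no such restriction. Also, unlike extend, += doesn’t work on the result of a function call (getListA() += listB is a syntax error).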

a = [1, 2]
b = [2, 3]
b.extend(a)

#Time elapsed: 0.000189

#Second way

b += a
#Time elapsed: 0.0002098
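The scoping difference between extend and += is easiest to see with a closure. In the sketch below, both factory functions are hypothetical helpers written for this example: the first mutates the enclosing list with extend and works, while the second uses += without a nonlocal declaration and fails at call time.

```python
def make_collector_extend():
    items = []
    def add(values):
        items.extend(values)   # only mutates the list; the name is never rebound
        return items
    return add

def make_collector_iadd():
    items = []
    def add(values):
        items += values        # augmented assignment makes `items` local to add()
        return items
    return add

extended = make_collector_extend()([1, 2])

try:
    make_collector_iadd()([1, 2])
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__
```

Adding nonlocal items as the first line of the second add() would make the += version work as well.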

Splat operator

The splat operator (*) unpacks lists so that they can be concatenated inside a new list literal, and it lets us mix in literal elements as well.

a = [1,2]
b = [2,3]
a = [*a, *b]

#Output of a: [1, 2, 2, 3]
#Time elapsed : 0.00021719
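As a small illustration of mixing literal elements with unpacked ones, consider the sketch below. It also unpacks a tuple, since the splat operator accepts any iterable and not just lists.

```python
a = [1, 2]
b = (3, 4)                # any iterable can be unpacked, not only lists

merged = [0, *a, *b, 5]   # literals and unpacked elements in one list literal
```

Unlike extend and +=, this builds a new list and leaves a and b untouched.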

7. Iterate Multiple Lists in Parallel

To iterate over lists in parallel, we can use the built-in zip() function. It returns an iterator of tuples, each combining the elements found at the same index in every list.

If the lists have different lengths, zip() stops as soon as the shortest list is exhausted. To get a list of tuples, use list(zip(a, b)). The following code shows an example of iterating over two lists in parallel:

country = ['India', 'US', 'Australia']
capital = ['New Delhi', 'Washington DC', 'Canberra']
names = zip(country, capital)
for name in names:
    print(name)
# Outputs:
('India', 'New Delhi')
('US', 'Washington DC')
('Australia', 'Canberra')
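Since zip() silently drops the leftover elements of the longer list, it can hide missing data. When that matters, itertools.zip_longest pads the shorter list instead. The sketch below shortens the capital list on purpose, and the fill value 'unknown' is an arbitrary choice.

```python
from itertools import zip_longest

country = ['India', 'US', 'Australia']
capital = ['New Delhi', 'Washington DC']   # one entry short on purpose

truncated = list(zip(country, capital))    # stops at the shorter list
padded = list(zip_longest(country, capital, fillvalue='unknown'))
```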

8. Flatten Out a List of Lists

Among the various possible ways to flatten out a list of lists, let’s look at three popular approaches and compare their efficiency over 500 iterations in timeit.

Approach 1: Using list comprehensions — slow

l = [[1, 2, None], [3, 4], [5, 6]]

[item for items in l for item in items]

#Time is 0.0009621297940611839

Approach 2: Using itertools.chain.from_iterable — faster

import itertools
list(itertools.chain.from_iterable(l))

#Time is 0.0007339292205870152

Approach 3: Using functools — fastest

Using functools.reduce with operator.iconcat is the fastest approach (though just marginally) to flatten out a list of lists, although it does require importing two modules:

import functools
import operator

l = functools.reduce(operator.iconcat, l, [])

# Output: [1, 2, None, 3, 4, 5, 6]
#Time is 0.0006934418343007565
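To reproduce a comparison like this one, the three approaches can be wrapped in functions and passed to timeit. The sketch below is one possible harness, not the exact setup behind the figures above; the function names are made up, and the timings it produces will vary by machine and Python version.

```python
import functools
import itertools
import operator
import timeit

l = [[1, 2, None], [3, 4], [5, 6]]

def with_comprehension():
    return [item for items in l for item in items]

def with_chain():
    return list(itertools.chain.from_iterable(l))

def with_reduce():
    # The fresh [] accumulator is extended in place, so `l` itself is not modified.
    return functools.reduce(operator.iconcat, l, [])

# Time each approach over 500 calls.
results = {fn.__name__: timeit.timeit(fn, number=500)
           for fn in (with_comprehension, with_chain, with_reduce)}

for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f'{name}: {seconds:.6f}s')
```

With a list this small, the differences are tiny and noisy, so it is worth rerunning with data closer to your real workload.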

9. Permutations and Combinations in a List

Python provides functions in the standard-library itertools module to generate permutations and combinations of a sequence. Let’s look at how to create them from a Python list.

Permutations

permutations() generates n! permutations for an input sequence of length n. We can also pass an optional second argument, r, to generate permutations of a shorter length.

import itertools 
print(list(itertools.permutations([1,2,3])))

#Output: [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
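For completeness, here is a short sketch of the optional length argument. With three elements and r=2, there are 3!/(3-2)! = 6 ordered pairs.

```python
import itertools

# Ordered pairs drawn from three elements: (1, 2) and (2, 1) both appear.
pairs = list(itertools.permutations([1, 2, 3], 2))
```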

Combinations

Just like the seventh-grade math concept we breezed through, this takes an input r and generates all possible r-length combinations as tuples, where the order of elements doesn’t matter.

import itertools
a = [1,2,3,4]
list(itertools.combinations(a, r=2)) 

#Output: [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

10. String to List, List to String

To create a list from a string, we simply need to iterate over it using a list comprehension, as shown in the code below:

a = [ch for ch in 'abcdefghi']

For doing the opposite, we’ll use the string join() method, which concatenates the elements of an iterable, using the string it is called on as the separator:

b = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
x = ', '.join(b)

#Output x is:

'a, b, c, d, e, f, g, h, i'
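Two related points are sketched below. First, list() gives the same result as the comprehension when no per-character processing is needed. Second, join() only accepts strings, so a list of numbers has to be converted first; map(str, ...) is one common way to do that. The variable names are only for illustration.

```python
chars = list('abc')                     # same result as [ch for ch in 'abc']

numbers = [1, 2, 3]
joined = ', '.join(map(str, numbers))   # join() needs strings, so convert first

back = joined.split(', ')               # split() reverses the join, yielding strings
```

Note that the round trip returns strings, not the original integers.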

Conclusion

This piece did a round-up of the basic syntax and some common use cases of Python lists while placing emphasis on the efficiency of different approaches.

Thanks for reading.
