Faster Lists In Python save your time

Faster Lists In Python save your time

Python lists are beautifully dynamic data structures. They are the choice data structure for so many things, so it is of the utmost importance to be conscious of the speed of each operation. This is commonly referred to as the Big O or time complexity of the solution.

Python lists are beautifully dynamic data structures. They are the choice data structure for so many things, so it is of the utmost importance to be conscious of the speed of each operation. This is commonly referred to as the Big O or time complexity of the solution.

List Creation

There are many ways to create arrays in python so let's take a look at which take the least time. Here is the setup we used:

import random
from time import time
import numpy as np
import pandas as pd

# Initilize Random
random.seed(time())

#### List Tests ####

# How many test passes to run
num_tests = 25

# Create a dict for which to store tests
data = {'length':[], 'listComprehension': [], 'append': [], 'preAllocate': [], 'while': []}

for i in range(0, num_tests):
    # Randomize list length
    end = random.randint(100_000, 10_000_000)

    startTime = time()
    listComprehension = [i for i in range(0, end)]
    endTime = time()
    
    # Add timed entry
    tmpdata['listComprehension'].append(endTime - startTime)

# Get the median for all the rows
df = pd.DataFrame(data)
print(df.describe())

List Comprehensions are the first method for creating arrays. They are simple and easy to read which makes them just as easy to write.

listComp = [i for i in range(0, end)]

Append is the next method. I have a tendency to jump to this solution as it is one of the easier to use. It is similar to how you add to lists in most other languages.

append_list = []
for i in range(0,end):
  append_list.append(i)

Pre-Allocating is the final method we will test. This only works for arrays with a predetermined length. This involves creating all of the array objects beforehand and then modifying their values by index.

preAllocate = [0] * end
for i in range(0, end):
  preAllocate[i] = i

Results: While list comprehensions don’t always make the most sense here they are the clear winner. I am not recommending you try and jam every for loop into a comprehension. Where it makes sense is when you are using simple for loops or creating especially large arrays.

Method:                            Median Time:
listComprehension            0.5556 ms
preAllocate                        1.1466 ms
append                              1.3842 ms
Comparing Lists

The test for this was similar to the one above. We created two lists with 50,000 random numbers between 0 and 50,000. Then we compare the two lists to see if

Double For Loop: This should have a time complexity of O(n²). For each element, we will have to iterate over part or all of the second list.

for i in list_a:
  for j in list_b:
    if i == j:
      break

In Search: The time complexity of ‘in’ is confusing. Knowing that the double for loop executes in O(n²) time, it would make sense for the ‘in’ access to operate in a similar manner. This does not seem to be the case. It seems to operate in closer to O(n log n) time. If someone happens to be familiar with the ‘in’ implementation in CPython leave a comment below and I will include the answer here.

for i in list_a:
  if i in list_b:
    continue

Set Search time complexity is a little different. The implementation of set in Python is essentially that of a hash table so it has O(1) access. Therefore because we are going through the list one time and checking in the second list is an O(1) operation the set search should operate in O(n) time.

unique = set(list_b)
for i in list_a:
  if i in unique:
    continue

Results: Comparisons with sets are great. For anyone wondering, the double for loop was one I never expected to do well. What did surprise me was just how much less time using the ‘in’ operator took than the double for.

Method:               Median Time:        Big O: 
Set Syntax:            0.0046ms            O(n)
In Syntax:              3.4710ms            O(n log n)
Double For:          42.3418ms           O(n^2)
Sorting Lists

For our last section, we will talk about sorting lists in python. There are many common sorting algorithms we will compare.

When talking about sorting we need to remember that there are stable and unstable sort algorithms. An algorithm is stable if it preserves the original order of equal items. Here is an example of why stability may be important:

Stability Example

Say we have a bug in our code and we want to see what went wrong. We might go to the log file and sort the requests by the user. Here is our sample log file:

user10: { 'request_number': 2 }
user11: { 'request_number': 1 }
user10: { 'request_number': 1 }

In an unstable sort we might end up with something like this, where the requests are sorted by the user, but because it is unstable we have lost the original sequence of the events. We might never find our bug because we wouldn't be able to see that user10 had request 2 reach the server before request 1.

# Unstable Sort Result
user10: { 'request_number': 1 }
user10: { 'request_number': 2 }
user11: { 'request_number': 1 }
# Stable Sort Result
user10: { 'request_number': 2 }
user10: { 'request_number': 1 }
user11: { 'request_number': 1 }

Heapsort has a typical time complexity of O(n log n). This is not a stable sorting algorithm however so there is no guarantee that items will be in their original order. This algorithm works by placing items on a binary tree where the root node is the maximum value. Then it will deconstruct that tree to form the sorted array.

# Courtesy of https://stackabuse.com/sorting-algorithms-in-python/

def heapify(nums, heap_size, root_index):
    # Assume the index of the largest element is the root index
    largest = root_index
    left_child = (2 * root_index) + 1
    right_child = (2 * root_index) + 2

    # If the left child of the root is a valid index, and the element is greater
    # than the current largest element, then update the largest element
    if left_child < heap_size and nums[left_child] > nums[largest]:
        largest = left_child

    # Do the same for the right child of the root
    if right_child < heap_size and nums[right_child] > nums[largest]:
        largest = right_child

    # If the largest element is no longer the root element, swap them
    if largest != root_index:
        nums[root_index], nums[largest] = nums[largest], nums[root_index]
        # Heapify the new root element to ensure it's the largest
        heapify(nums, heap_size, largest)


def heap_sort(nums):
    n = len(nums)

    # Create a Max Heap from the list
    # The 2nd argument of range means we stop at the element before -1 i.e.
    # the first element of the list.
    # The 3rd argument of range means we iterate backwards, reducing the count
    # of i by 1
    for i in range(n, -1, -1):
        heapify(nums, n, i)

    # Move the root of the max heap to the end of
    for i in range(n - 1, 0, -1):
        nums[i], nums[0] = nums[0], nums[i]
        heapify(nums, i, 0)

Mergesort is a divide and conquer algorithm that has a typical time complexity of O(n log n). This algorithm separates the list in half and those halves in half until there are individual elements left. Then the elements are paired up, sorted, and merged repeatedly until the list is complete again. This algorithm is stable so it is great where that is necessary.

# Courtesy of https://stackabuse.com/sorting-algorithms-in-python/

def merge(left_list, right_list):
    sorted_list = []
    left_list_index = right_list_index = 0

    # We use the list lengths often, so its handy to make variables
    left_list_length, right_list_length = len(left_list), len(right_list)

    for _ in range(left_list_length + right_list_length):
        if left_list_index < left_list_length and right_list_index < right_list_length:
            # We check which value from the start of each list is smaller
            # If the item at the beginning of the left list is smaller, add it
            # to the sorted list
            if left_list[left_list_index] <= right_list[right_list_index]:
                sorted_list.append(left_list[left_list_index])
                left_list_index += 1
            # If the item at the beginning of the right list is smaller, add it
            # to the sorted list
            else:
                sorted_list.append(right_list[right_list_index])
                right_list_index += 1

        # If we've reached the end of the of the left list, add the elements
        # from the right list
        elif left_list_index == left_list_length:
            sorted_list.append(right_list[right_list_index])
            right_list_index += 1
        # If we've reached the end of the of the right list, add the elements
        # from the left list
        elif right_list_index == right_list_length:
            sorted_list.append(left_list[left_list_index])
            left_list_index += 1

    return sorted_list


def merge_sort(nums):
    # If the list is a single element, return it
    if len(nums) <= 1:
        return nums

    # Use floor division to get midpoint, indices must be integers
    mid = len(nums) // 2

    # Sort and merge each half
    left_list = merge_sort(nums[:mid])
    right_list = merge_sort(nums[mid:])

Quicksort is one very commonly used algorithm as it is easy to implement and has an average time complexity of O(n log n). The problem with this is everything is based around picking a good pivot point that is already in the proper place and sorting around it. Picking a poor pivot point could lead to an O(n²) time complexity. This tied with the fact that it is not stable leads me to not recommend it.

# Courtesy of https://stackabuse.com/sorting-algorithms-in-python/

def partition(nums, low, high):
    # We select the middle element to be the pivot. Some implementations select
    # the first element or the last element. Sometimes the median value becomes
    # the pivot, or a random one. There are many more strategies that can be
    # chosen or created.
    pivot = nums[(low + high) // 2]
    i = low - 1
    j = high + 1
    while True:
        i += 1
        while nums[i] < pivot:
            i += 1

        j -= 1
        while nums[j] > pivot:
            j -= 1

        if i >= j:
            return j

        # If an element at i (on the left of the pivot) is larger than the
        # element at j (on right right of the pivot), then swap them
        nums[i], nums[j] = nums[j], nums[i]


def quick_sort(nums):
    # Create a helper function that will be called recursively
    def _quick_sort(items, low, high):
        if low < high:
            # This is the index after the pivot, where our lists are split
            split_index = partition(items, low, high)
            _quick_sort(items, low, split_index)
            _quick_sort(items, split_index + 1, high)

    _quick_sort(nums, 0, len(nums) - 1)

Timsort is a sort that you may not have ever heard of. This sort, named for its inventor Tim Peters, is a combination of a binary search, insert sort, and a merge sort. Its average time complexity is O(n log n) however it is able to hit this with better consistency and has a best-case complexity of O(n). Also, this is a stable algorithm which makes it great to use anywhere.

The best part is using Timsort in python only takes one line. That's right. It is the built-in sort function.

array.sort()

Results: Timsort outperforms all the others which is why it is the core sorting algorithm for python. Next Mergesort and Quicksort are neck and neck. Finally, Heapsort is pulling up the rear… well, the rear of this list at least. There are plenty of worse sorting algorithms such as random sort.

Algorithm:           Time:          
timsort             0.001594 ms
mergesort       0.033845 ms
quicksort         0.097097 ms
heapsort         0.365555 ms

Here is a chart that shows the Big O of all of these sorting algorithms. Also, we have space complexity which is how much extra space is needed to complete the algorithm.

That is all for this list. Now go forth and use these in your day to day Python and follow along for more tips and tricks.

Thank you.

Python Programming & Data Handling

Python Programming & Data Handling

Python Programming & Data Handling

Description
Students will be able to know various commands include:

Python Basic Programming

string,if else,Range, for loop, while loop, making functions, tuple, dictionary etc

WEB WORKING:

Download files from Web, Exceptional Handling etc

Data Handling:

Get data from files create , read, write, delete.

IMAGE PROCESSING: (NEW DEVELOPING ART)

Image transformation , Image crop , edit and much more

Basic knowledge
Listener should know how about very basic C Language basic commands to understand Python Language
What will you learn
I will more update if any student want need to add more skills related to python

Python For Data Science - How to use Data Science with Python

Python For Data Science - How to use Data Science with Python

This Edureka video on 'Python For Data Science - How to use Data Science with Python - Data Science using Python ' will help you understand how we can use python for data science along with various use cases. What is Data Science? Why Python? Python Libraries For Data Science. Roadmap To Data Science With Python. Data Science Jobs and Salary Trends

This Edureka video on 'Python For Data Science - How to use Data Science with Python - Data Science using Python
' will help you understand how we can use python for data science along with various use cases. Following are the topics discussed this Python Data Science Tutorial:

  • What is Data Science?
  • Why Python?
  • Python Libraries For Data Science
  • Roadmap To Data Science With Python
  • Data Science Jobs and Salary Trends
  • How Edureka Helps?

Guide to Python Programming Language

Guide to Python Programming Language

Guide to Python Programming Language

Description
The course will lead you from beginning level to advance in Python Programming Language. You do not need any prior knowledge on Python or any programming language or even programming to join the course and become an expert on the topic.

The course is begin continuously developing by adding lectures regularly.

Please see the Promo and free sample video to get to know more.

Hope you will enjoy it.

Basic knowledge
An Enthusiast Mind
A Computer
Basic Knowledge To Use Computer
Internet Connection
What will you learn
Will Be Expert On Python Programming Language
Build Application On Python Programming Language