Dijkstra's algorithm in Python: algorithms for beginners

<em>Photo by Ishan @seefromthesky on Unsplash</em>

Dijkstra's algorithm finds the shortest path between two nodes in a graph. It's a must-know for any programmer. Its <a href="https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm" target="_blank">Wikipedia page</a> has nice GIFs and some history.

In this post I'll use the time-tested implementation from Rosetta Code, changed just a bit so that it can process both weighted and unweighted graph data. We'll also be able to edit the graph on the fly. I'll explain the code block by block.

The algorithm

The algorithm is pretty simple. Dijkstra created it in 20 minutes; now you can learn to code it in the same time.

  1. Mark all nodes unvisited and store them.
  2. Set the distance to zero for our initial node and to infinity for other nodes.
  3. Select the unvisited node with the smallest distance; it becomes the current node.
  4. Find unvisited neighbors for the current node and calculate their distances through the current node. Compare the newly calculated distance to the assigned and save the smaller one. For example, if the node A has a distance of 6, and the A-B edge has length 2, then the distance to B through A will be 6 + 2 = 8. If B was previously marked with a distance greater than 8 then change it to 8.
  5. Mark the current node as visited and remove it from the unvisited set.
  6. Stop, if the destination node has been visited (when planning a route between two specific nodes) or if the smallest distance among the unvisited nodes is infinity. If not, repeat steps 3-6.
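
The update in step 4 is often called "relaxing" an edge. As a minimal sketch, here is that single step using the numbers from the example (node A at distance 6, an A-B edge of length 2). The names `distances`, `edge_length` and `candidate` are illustrative assumptions, not part of the implementation that follows.

```python
# Hypothetical state taken from the example in step 4:
# A is already known to be at distance 6, B has not been reached yet.
distances = {"A": 6, "B": float("inf")}
edge_length = 2  # length of the A-B edge

# Distance to B if we travel through A: 6 + 2 = 8.
candidate = distances["A"] + edge_length

# Keep the smaller of the old and the newly calculated distance.
if candidate < distances["B"]:
    distances["B"] = candidate

print(distances["B"])  # 8
```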

Python implementation

First, imports and data formats. The original implementation suggests using a namedtuple for storing edge data. We'll do exactly that, but we'll add a default value to the cost argument. There are many ways to do that; find what suits you best.

from collections import deque, namedtuple


# we'll use infinity as a default distance to nodes.
inf = float('inf')
Edge = namedtuple('Edge', 'start, end, cost')


def make_edge(start, end, cost=1):
    return Edge(start, end, cost)
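
As one of those other ways, here is a sketch that relies on the `defaults` argument of `namedtuple` (available since Python 3.7) instead of a helper function. The name `EdgeWithDefault` is made up for this illustration so it doesn't clash with the `Edge` used in the rest of the post.

```python
from collections import namedtuple

# `defaults` is applied to the rightmost fields, so with a single value
# only `cost` becomes optional; `start` and `end` stay required.
EdgeWithDefault = namedtuple(
    'EdgeWithDefault', 'start, end, cost', defaults=[1])

unweighted = EdgeWithDefault('a', 'b')   # cost falls back to 1
weighted = EdgeWithDefault('a', 'c', 9)  # cost given explicitly

print(unweighted)
print(weighted)
```

The helper function used in this post works on older Python versions too, which may be a reason to prefer it.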

Let's initialize our data:

class Graph:
    def __init__(self, edges):
        # let's check that the data is right
        wrong_edges = [i for i in edges if len(i) not in [2, 3]]
        if wrong_edges:
            raise ValueError('Wrong edges data: {}'.format(wrong_edges))

        self.edges = [make_edge(*edge) for edge in edges]

Let's find the vertices. In the original implementation the vertices are defined in __init__, but we need them to update when edges change, so we'll make them a property that is recomputed each time it is accessed. That's probably not the best solution for big graphs, but for small ones it'll do.

    @property
    def vertices(self):
        return set(
            # this piece of magic turns ([1, 2], [3, 4]) into [1, 2, 3, 4];
            # the enclosing set() makes its elements unique.
            sum(
                ([edge.start, edge.end] for edge in self.edges), []
            )
        )
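
The `sum(..., [])` trick builds a new list on every addition, so it gets slow as the number of edges grows. As a hedged alternative, the sketch below collects the same set with `itertools.chain.from_iterable` or with a set comprehension. The sample `edges` list and the two variable names are assumptions made for this example.

```python
from collections import namedtuple
from itertools import chain

Edge = namedtuple('Edge', 'start, end, cost')

# A small hypothetical edge list, just for the demonstration.
edges = [Edge('a', 'b', 7), Edge('b', 'c', 10), Edge('a', 'c', 9)]

# Flatten the (start, end) pairs lazily, then deduplicate with set().
vertices_via_chain = set(
    chain.from_iterable((edge.start, edge.end) for edge in edges))

# The same result with a nested set comprehension.
vertices_via_comprehension = {
    vertex for edge in edges for vertex in (edge.start, edge.end)}

print(sorted(vertices_via_chain))  # ['a', 'b', 'c']
```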

Now let's add functionality for adding and removing edges.

    def get_node_pairs(self, n1, n2, both_ends=True):
        if both_ends:
            node_pairs = [[n1, n2], [n2, n1]]
        else:
            node_pairs = [[n1, n2]]
        return node_pairs

    def remove_edge(self, n1, n2, both_ends=True):
        node_pairs = self.get_node_pairs(n1, n2, both_ends)
        edges = self.edges[:]
        for edge in edges:
            if [edge.start, edge.end] in node_pairs:
                self.edges.remove(edge)

    def add_edge(self, n1, n2, cost=1, both_ends=True):
        node_pairs = self.get_node_pairs(n1, n2, both_ends)
        for edge in self.edges:
            if [edge.start, edge.end] in node_pairs:
                raise ValueError('Edge {} {} already exists'.format(n1, n2))

        self.edges.append(Edge(start=n1, end=n2, cost=cost))
        if both_ends:
            self.edges.append(Edge(start=n2, end=n1, cost=cost))

Let's find neighbors for every node:

    @property
    def neighbours(self):
        neighbours = {vertex: set() for vertex in self.vertices}
        for edge in self.edges:
            neighbours[edge.start].add((edge.end, edge.cost))

        return neighbours

It's time for the algorithm! I renamed the variables so it would be easier to understand.

    def dijkstra(self, source, dest):
        assert source in self.vertices, 'Such source node doesn\'t exist'

        # 1. Mark all nodes unvisited and store them.
        # 2. Set the distance to zero for our initial node 
        # and to infinity for other nodes.
        distances = {vertex: inf for vertex in self.vertices}
        previous_vertices = {
            vertex: None for vertex in self.vertices
        }
        distances[source] = 0
        vertices = self.vertices.copy()

        while vertices:
            # 3. Select the unvisited node with the smallest distance;
            # it becomes the current node.
            current_vertex = min(
                vertices, key=lambda vertex: distances[vertex])

            # 6. Stop, if the smallest distance 
            # among the unvisited nodes is infinity.
            if distances[current_vertex] == inf:
                break

            # 4. Find unvisited neighbors for the current node 
            # and calculate their distances through the current node.
            for neighbour, cost in self.neighbours[current_vertex]:
                alternative_route = distances[current_vertex] + cost

                # Compare the newly calculated distance to the assigned 
                # and save the smaller one.
                if alternative_route < distances[neighbour]:
                    distances[neighbour] = alternative_route
                    previous_vertices[neighbour] = current_vertex

            # 5. Mark the current node as visited 
            # and remove it from the unvisited set.
            vertices.remove(current_vertex)


        path, current_vertex = deque(), dest
        while previous_vertices[current_vertex] is not None:
            path.appendleft(current_vertex)
            current_vertex = previous_vertices[current_vertex]
        if path:
            path.appendleft(current_vertex)
        return path

Let's use it.

graph = Graph([
    ("a", "b", 7),  ("a", "c", 9),  ("a", "f", 14), ("b", "c", 10),
    ("b", "d", 15), ("c", "d", 11), ("c", "f", 2),  ("d", "e", 6),
    ("e", "f", 9)])

print(graph.dijkstra("a", "e"))
# deque(['a', 'c', 'd', 'e'])

The whole code from above:

from collections import deque, namedtuple


# we'll use infinity as a default distance to nodes.
inf = float('inf')
Edge = namedtuple('Edge', 'start, end, cost')


def make_edge(start, end, cost=1):
    return Edge(start, end, cost)


class Graph:
    def __init__(self, edges):
        # let's check that the data is right
        wrong_edges = [i for i in edges if len(i) not in [2, 3]]
        if wrong_edges:
            raise ValueError('Wrong edges data: {}'.format(wrong_edges))

        self.edges = [make_edge(*edge) for edge in edges]

    @property
    def vertices(self):
        return set(
            sum(
                ([edge.start, edge.end] for edge in self.edges), []
            )
        )

    def get_node_pairs(self, n1, n2, both_ends=True):
        if both_ends:
            node_pairs = [[n1, n2], [n2, n1]]
        else:
            node_pairs = [[n1, n2]]
        return node_pairs

    def remove_edge(self, n1, n2, both_ends=True):
        node_pairs = self.get_node_pairs(n1, n2, both_ends)
        edges = self.edges[:]
        for edge in edges:
            if [edge.start, edge.end] in node_pairs:
                self.edges.remove(edge)

    def add_edge(self, n1, n2, cost=1, both_ends=True):
        node_pairs = self.get_node_pairs(n1, n2, both_ends)
        for edge in self.edges:
            if [edge.start, edge.end] in node_pairs:
                raise ValueError('Edge {} {} already exists'.format(n1, n2))

        self.edges.append(Edge(start=n1, end=n2, cost=cost))
        if both_ends:
            self.edges.append(Edge(start=n2, end=n1, cost=cost))

    @property
    def neighbours(self):
        neighbours = {vertex: set() for vertex in self.vertices}
        for edge in self.edges:
            neighbours[edge.start].add((edge.end, edge.cost))

        return neighbours

    def dijkstra(self, source, dest):
        assert source in self.vertices, 'Such source node doesn\'t exist'
        distances = {vertex: inf for vertex in self.vertices}
        previous_vertices = {
            vertex: None for vertex in self.vertices
        }
        distances[source] = 0
        vertices = self.vertices.copy()

        while vertices:
            current_vertex = min(
                vertices, key=lambda vertex: distances[vertex])
            vertices.remove(current_vertex)
            if distances[current_vertex] == inf:
                break
            for neighbour, cost in self.neighbours[current_vertex]:
                alternative_route = distances[current_vertex] + cost
                if alternative_route < distances[neighbour]:
                    distances[neighbour] = alternative_route
                    previous_vertices[neighbour] = current_vertex

        path, current_vertex = deque(), dest
        while previous_vertices[current_vertex] is not None:
            path.appendleft(current_vertex)
            current_vertex = previous_vertices[current_vertex]
        if path:
            path.appendleft(current_vertex)
        return path


graph = Graph([
    ("a", "b", 7),  ("a", "c", 9),  ("a", "f", 14), ("b", "c", 10),
    ("b", "d", 15), ("c", "d", 11), ("c", "f", 2),  ("d", "e", 6),
    ("e", "f", 9)])

print(graph.dijkstra("a", "e"))
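
The `dijkstra` method above picks the next vertex by calling `min` over every unvisited vertex, which is fine for small graphs. A common alternative is to keep candidates in a priority queue. The sketch below is not the Rosetta Code implementation: `dijkstra_heap` and `example` are hypothetical names, it returns only distances (no path), and it assumes an adjacency dict shaped like the `neighbours` property with non-negative costs.

```python
import heapq


def dijkstra_heap(neighbours, source):
    """Return the shortest distance from source to every vertex.

    `neighbours` maps each vertex to an iterable of (neighbour, cost)
    pairs. Costs are assumed to be non-negative.
    """
    distances = {vertex: float('inf') for vertex in neighbours}
    distances[source] = 0
    heap = [(0, source)]

    while heap:
        distance, vertex = heapq.heappop(heap)
        # A vertex can be pushed several times; skip outdated entries.
        if distance > distances[vertex]:
            continue
        for neighbour, cost in neighbours[vertex]:
            candidate = distance + cost
            if candidate < distances[neighbour]:
                distances[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))

    return distances


# The same directed graph as in the example above.
example = {
    'a': [('b', 7), ('c', 9), ('f', 14)],
    'b': [('c', 10), ('d', 15)],
    'c': [('d', 11), ('f', 2)],
    'd': [('e', 6)],
    'e': [('f', 9)],
    'f': [],
}

print(dijkstra_heap(example, 'a')['e'])  # 26
```

The distance of 26 matches the path a, c, d, e found earlier (9 + 11 + 6).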

P.S. For those of us who, like me, read more books about the Witcher than about algorithms, it's Edsger Dijkstra, not Sigismund.

By: Maria Boldyreva


A Review of Basic Algorithms and Data Structures in Python: Graph Algorithms

<strong>Originally published by </strong><a href="https://medium.com/@diogoribeiro_94486" target="_blank">Diogo Ribeiro</a> <em>at </em><a href="https://medium.com/@diogoribeiro_94486/a-review-of-basic-algorithms-and-data-structures-in-python-graph-algorithms-d73691d86211" target="_blank"><em>Medium</em></a>

Introduction

Recently, while reviewing basic graph algorithms, I decided to write down my study notes as an article in case someone else finds them useful. To verify my understanding, I wrote minimal implementations of the algorithms in Python which make up the bulk of this article. Simple unit tests accompany the code. The unit tests can also be used as examples of using the code.

I’m hoping to write at least a few follow-up posts, focusing on combinatorial algorithms, string algorithms, and maybe even one on computational geometry.

Most of the code was written to be easy to understand without having to reference much else (with a few exceptions; for example, Kruskal’s algorithm uses the disjoint set structure defined in another section). This results in some duplication, especially in the unit tests. I consider this acceptable, given that the purpose of the code is to serve as educational material and not as production code that needs day-to-day maintenance.

One last thing before we start: I wrote the article and all the code relatively quickly. Mistakes and bugs are definitely possible. Corrections are appreciated; please comment below if you find any.

Table Of Contents

Algorithms and data structures in this article:

  • Disjoint Set (Union-Find)
  • Kruskal’s Minimum Spanning Tree (MST)
  • Depth First Search (DFS)
  • Breadth First Search (BFS)
  • Kahn’s Topological Sort Algorithm
  • Dijkstra’s Shortest Path Algorithm
  • Bellman-Ford Shortest Path Algorithm

Disjoint Set (Union-Find)

The disjoint set structure is used to keep track of a partitioning of a set of objects into subsets. The main question it needs to answer is “do X and Y belong to the same subset?” and the main operation it needs to support is joining two subsets so that elements in either of the subsets will belong to the same larger subset afterward.

A quick and minimal implementation is provided below. It uses a forest to keep track of the subsets in the partition. Each tree in the forest is one subset, and the root of the tree is the “representative” element of the subset. To check if two elements belong to the same subset, we check if they have the same representative element.

Noting that the ideal tree in this implementation is a star (this minimizes the number of recursive find calls), we "compress" the paths on each call to find. That is, as the recursive calls return, we set the parent of every element on the path to the representative to be the representative itself.

class DisjointSet(object):
  def __init__(self, n):
    """
    Initializes a disjoint set structure consisting of n disjoint sets.
    """
    self.parent = list(range(n))
  def find(self, x):
    """Returns the representative element of the set x belongs to."""
    if self.parent[x] != x:
      self.parent[x] = self.find(self.parent[x])
    return self.parent[x]
  def union(self, x, y):
    """Joins the sets containing x and y."""
    self.parent[self.find(x)] = self.find(y)

And the accompanying unit test:

import unittest
from union_find import DisjointSet

class DisjointSetTest(unittest.TestCase):
  def test_initialized_state(self):
    d = DisjointSet(3)
    self.assertEqual(d.find(0), 0)
    self.assertEqual(d.find(1), 1)
    self.assertEqual(d.find(2), 2)
  def test_basic_union(self):
    d = DisjointSet(3)
    d.union(0, 1)
    self.assertEqual(d.find(0), d.find(1))
    self.assertNotEqual(d.find(1), d.find(2))
  def test_basic_union_idempotent(self):
    d = DisjointSet(2)
    d.union(0, 1)
    d.union(0, 1)
    self.assertEqual(d.find(0), d.find(1))
  def test_union_all(self):
    d = DisjointSet(100)
    for i in range(1, 100):
      d.union(i - 1, i)
    for i in range(1, 100):
      self.assertEqual(d.find(0), d.find(i))
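
Path compression is often paired with a second heuristic, union by rank, which attaches the shallower tree under the deeper one. The class above does not do this, so the following is only a sketch of how it might be added. `RankedDisjointSet` and its `rank` list are names assumed for this illustration.

```python
class RankedDisjointSet(object):
    def __init__(self, n):
        """Initializes n disjoint sets, each a single-node tree of rank 0."""
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        """Returns the representative of x's set, compressing the path."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        """Joins the sets containing x and y, keeping trees shallow."""
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return  # Already in the same set.
        # Make root_x the root with the larger (or equal) rank.
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        # The rank grows only when two trees of equal rank are merged.
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1


d = RankedDisjointSet(4)
d.union(0, 1)
d.union(2, 3)
print(d.find(0) == d.find(1))  # True
print(d.find(1) == d.find(2))  # False
```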

Kruskal’s Minimum Spanning Tree (MST)

Kruskal’s minimum spanning tree algorithm is a good example of a greedy algorithm. Starting with a forest consisting of individual disjoint vertices, at each step we pick the next best edge (one with minimal weight) provided it does not introduce a cycle into the forest, and continue until the forest becomes a tree. It’s rather easy to prove that the resulting tree is a minimum spanning tree.

Using the disjoint set structure shown above to keep track of the minimum spanning forest, the implementation below is very simple:

from collections import namedtuple
from union_find import DisjointSet

# Putting weight as the first element means Edges will sort by weight first,
# then source and target (lexicographically).
Edge = namedtuple('Edge', ['weight', 'source', 'target'])

def kruskal_mst(n, edges):
  """
  Given a positive integer n (number of vertices) and a collection of Edge
  namedtuple objects representing the undirected edges of a graph, returns a
  list of edges forming a minimal spanning tree of the graph. Assumes the
  vertices are numbers in the range 0 to n - 1.  Also assumes input is a
  valid connected undirected graph and that for two vertices v and w only one
  of (v, w) or (w, v) is an edge in the input. Output is undefined if these
  assumptions are not satisfied.
  """
  d = DisjointSet(n)
  mst_tree = []
  for edge in sorted(edges):
    if d.find(edge.source) != d.find(edge.target):
      mst_tree.append(edge)
      if len(mst_tree) == n - 1:
        break
      d.union(edge.source, edge.target)
  return mst_tree

And the accompanying unit test:

import unittest
from kruskal import kruskal_mst, Edge

class KruskalMSTTest(unittest.TestCase):
  def test_single_vertex_graph(self):
    self.assertEqual(kruskal_mst(1, []), [])
  def test_single_edge_graph(self):
    edges = [Edge(source=0, target=1, weight=10)]
    self.assertEqual(kruskal_mst(2, edges), edges)
  def test_cycle_5(self):
    edges = [
      Edge(source=0, target=1, weight=50),
      Edge(source=1, target=2, weight=30),
      Edge(source=2, target=3, weight=60),
      Edge(source=3, target=4, weight=20),
      Edge(source=4, target=0, weight=10),
    ]
    # Everything except the heaviest edge. Output sorted by weight.
    self.assertEqual(kruskal_mst(5, edges), [
      Edge(source=4, target=0, weight=10),
      Edge(source=3, target=4, weight=20),
      Edge(source=1, target=2, weight=30),
      Edge(source=0, target=1, weight=50),
    ])
  def test_complete_graph_4(self):
    edges = [
      Edge(source=0, target=1, weight=10),
      Edge(source=0, target=2, weight=30),
      Edge(source=0, target=3, weight=40),
      Edge(source=1, target=2, weight=20),
      Edge(source=1, target=3, weight=50),
      Edge(source=2, target=3, weight=60),
    ]
    self.assertEqual(kruskal_mst(4, edges), [
      Edge(source=0, target=1, weight=10),
      Edge(source=1, target=2, weight=20),
      Edge(source=0, target=3, weight=40),
    ])

Depth First Search (DFS)

Depth-first search is arguably the simplest graph traversal algorithm. It’s a simple recursive algorithm that just needs to keep track of which vertices have already been processed. In fact, many other recursive algorithms can be thought of as a DFS on some underlying graph (e.g. binary search is guided DFS on the binary search tree). DFS can be used to determine if there is a path from a vertex to another and to visit every vertex starting from a source vertex. Variations of DFS can be used for determining connected components and doing topological sorting. The code below simply uses DFS to return all vertices reachable from a starting vertex.

def dfs(graph, source):
  """
  Given a directed graph (format described below), and a source vertex,
  returns a set of vertices reachable from source.
  The graph parameter is expected to be a dictionary mapping each vertex to a
  list of vertices indicating outgoing edges. For example if vertex v has
  outgoing edges to u and w we have graph[v] = [u, w].
  """
  visited = set()
  def _recurse(v):
    if v in visited:
      return
    visited.add(v)
    for w in graph[v]:
      _recurse(w)
  _recurse(source)
  return visited

And the accompanying unit test:

import unittest
from dfs import dfs

class DFSTest(unittest.TestCase):
  def test_single_vertex(self):
    graph = {0: []}
    self.assertEqual(dfs(graph, 0), {0})
  def test_single_vertex_with_loop(self):
    graph = {0: [0]}
    self.assertEqual(dfs(graph, 0), {0})
  def test_two_vertices_no_path(self):
    graph = {
      0: [],
      1: [],
    }
    self.assertEqual(dfs(graph, 0), {0})
    self.assertEqual(dfs(graph, 1), {1})
  def test_two_vertices_with_simple_path(self):
    graph = {
      0: [1],
      1: [],
    }
    self.assertEqual(dfs(graph, 0), {0, 1})
    self.assertEqual(dfs(graph, 1), {1})
  def test_complete_graph(self):
    def _complete_graph(n):
      return {v: list(set(range(n)) - {v}) for v in range(n)}
    for n in range(2, 10):
      graph = _complete_graph(n)
      for v in range(n):
        self.assertEqual(dfs(graph, v), set(range(n)))
  def test_cycle_5(self):
    graph = {
      0: [1],
      1: [2],
      2: [3],
      3: [4],
      4: [0],
    }
    for v in range(5):
      self.assertEqual(dfs(graph, v), {0, 1, 2, 3, 4})
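
The recursive version can hit Python's recursion limit on very deep graphs, such as a long chain of vertices. As a sketch of one way around that, the variant below keeps an explicit stack instead of recursing. `dfs_iterative` is a hypothetical name; it assumes the same graph format as `dfs` above.

```python
def dfs_iterative(graph, source):
    """
    Returns the set of vertices reachable from source, using an explicit
    stack instead of recursion. Expects graph[v] to be a list of vertices
    with an edge from v, as in the recursive version.
    """
    visited = set()
    stack = [source]
    while stack:
        v = stack.pop()
        if v in visited:
            continue
        visited.add(v)
        # Push unvisited neighbours; duplicates are filtered when popped.
        stack.extend(w for w in graph[v] if w not in visited)
    return visited


cycle = {0: [1], 1: [2], 2: [3], 3: [4], 4: [0]}
print(dfs_iterative(cycle, 2) == {0, 1, 2, 3, 4})  # True
```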

Breadth First Search (BFS)

BFS is one of the simplest graph algorithms and a good algorithm to understand prior to Dijkstra’s, which is coming up next. It can be used to simply traverse a graph and visit every vertex, to search for a particular vertex, or find the shortest path (assuming edges don’t have weights) to every vertex starting from a single vertex.

from collections import deque

def bfs(graph, source, target):
  """
  Given a directed graph (format described below), and source and target
  vertices, returns a shortest unweighted path as a list of vertices going
  from source to target, or None if no such path exists. Returned path will
  not include the source vertex in it.
  The graph parameter is expected to be a dictionary mapping each vertex to a
  list of vertices indicating outgoing edges. For example if vertex v has
  outgoing edges to u and w we have graph[v] = [u, w].
  """
  q = deque([source])
  # previous_vertex[v] holds the immediate vertex before v in the shortest
  # path from source to v. This dictionary also acts as our "visited" set
  # since we set previous_vertex[v] as soon as the vertex enters our queue.
  previous_vertex = {source: source}
  while q:
    v = q.popleft()
    if v == target:
      return _construct_path(previous_vertex, source, target)
    for w in graph[v]:
      if w not in previous_vertex:
        previous_vertex[w] = v
        q.append(w)
  return None

def _construct_path(previous_vertex, source, target):
  if source == target:
    return []
  return _construct_path(previous_vertex, source,
               previous_vertex[target]) + [target]

And the accompanying unit test:

import unittest
from bfs import bfs

class BFSTest(unittest.TestCase):
  def test_single_vertex(self):
    graph = {0: []}
    self.assertEqual(bfs(graph, 0, 0), [])
  def test_single_vertex_with_loop(self):
    graph = {0: [0]}
    self.assertEqual(bfs(graph, 0, 0), [])
  def test_two_vertices_no_path(self):
    graph = {
      0: [],
      1: [],
    }
    self.assertEqual(bfs(graph, 0, 1), None)
  def test_two_vertices_with_simple_path(self):
    graph = {
      0: [1],
      1: [],
    }
    self.assertEqual(bfs(graph, 0, 1), [1])
  def test_complete_graph(self):
    def _complete_graph(n):
      return {v: list(set(range(n)) - {v}) for v in range(n)}
    for n in range(2, 10):
      graph = _complete_graph(n)
      for v in range(n):
        for w in range(n):
          self.assertEqual(bfs(graph, v, w),
                   [] if v == w else [w])
  def test_cycle_5(self):
    graph = {
      0: [4, 1],
      1: [0, 2],
      2: [1, 3],
      3: [2, 4],
      4: [3, 0],
    }
    self.assertEqual(bfs(graph, 0, 2), [1, 2])
    self.assertEqual(bfs(graph, 0, 3), [4, 3])
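
The `bfs` function above stops once it reaches a single target. To illustrate the "every vertex from a single source" use mentioned earlier, here is a sketch that records the number of edges on a shortest path to each reachable vertex. `bfs_distances` is a name assumed for this example; the graph format is the same as above.

```python
from collections import deque


def bfs_distances(graph, source):
    """
    Returns a dictionary mapping every vertex reachable from source to the
    number of edges on a shortest path from source to it.
    """
    distances = {source: 0}
    q = deque([source])
    while q:
        v = q.popleft()
        for w in graph[v]:
            # The first time we see w is via a shortest path.
            if w not in distances:
                distances[w] = distances[v] + 1
                q.append(w)
    return distances


cycle = {
    0: [4, 1],
    1: [0, 2],
    2: [1, 3],
    3: [2, 4],
    4: [3, 0],
}
print(bfs_distances(cycle, 0) == {0: 0, 1: 1, 2: 2, 3: 2, 4: 1})  # True
```

Unreachable vertices simply do not appear in the returned dictionary.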

Kahn’s Topological Sort Algorithm

Given a directed acyclic graph (DAG) representing a set of, say, tasks and their dependencies, topological sorting is the problem of finding an order of task execution that satisfies all the dependencies. This problem arises in a variety of applications. Examples include task scheduling, build systems (e.g. Bazel), parallel pipelines (e.g. Hadoop), and formula evaluation (e.g. in spreadsheets).

While a variation of DFS can be used for topological sorting, my personal favorite algorithm for doing topological sorts is Kahn’s algorithm, due to its intuitiveness. The idea behind the algorithm is simple: start with vertices with no incoming edges, process them, and then remove them and all their outgoing edges from the graph and continue until there’s nothing left in the graph.

In the code below, instead of returning a particular topological sort, the algorithm assigns a “sequence” number to each vertex, such that sorting the vertices by sequence number yields a valid topological sort of the graph: every vertex gets a larger number than all the vertices it depends on. This simplifies unit testing, and also allows for easier use of the output in cases where parallelization is possible (since all tasks with the same sequence number can be executed in parallel).

from collections import deque, namedtuple
Vertex = namedtuple('Vertex', ['name', 'incoming', 'outgoing'])

def build_doubly_linked_graph(graph):
  """
  Given a graph with only outgoing edges, build a graph with incoming and
  outgoing edges. The returned graph will be a dictionary mapping vertex to a
  Vertex namedtuple with sets of incoming and outgoing vertices.
  """
  g = {v:Vertex(name=v, incoming=set(), outgoing=set(o))
     for v, o in graph.items()}
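  # Iterate over a snapshot, since vertices may be added to g inside the loop.
  for v in list(g.values()):
    for w in v.outgoing:
      if w in g:
        g[w].incoming.add(v.name)
      else:
        # w only appears as a target; store the name of v, not the namedtuple.
        g[w] = Vertex(name=w, incoming={v.name}, outgoing=set())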
  return g

def kahn_top_sort(graph):
  """
  Given an acyclic directed graph (format described below), returns a
  dictionary mapping vertex to sequence such that sorting by the sequence
  component will result in a topological sort of the input graph. Output is
  undefined if the input is not a valid DAG.
  The graph parameter is expected to be a dictionary mapping each vertex to a
  list of vertices indicating outgoing edges. For example if vertex v has
  outgoing edges to u and w we have graph[v] = [u, w].
  """
  g = build_doubly_linked_graph(graph)
  # sequence[v] < sequence[w] implies v should be before w in the topological
  # sort.
  q = deque(v.name for v in g.values() if not v.incoming)
  sequence = {v: 0 for v in q}
  while q:
    v = q.popleft()
    for w in g[v].outgoing:
      g[w].incoming.remove(v)
      if not g[w].incoming:
        sequence[w] = sequence[v] + 1
        q.append(w)
  return sequence
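
To make the parallelization remark concrete, here is a rough sketch of how the returned dictionary could be turned into batches of tasks. The helper name `group_by_sequence` is hypothetical and is not part of the implementation above. It only assumes a dictionary mapping each vertex to its sequence number, in the shape that `kahn_top_sort` returns.

```python
from collections import defaultdict

def group_by_sequence(sequence):
  """Group vertices sharing a sequence number, lowest number first."""
  batches = defaultdict(list)
  for vertex, number in sequence.items():
    batches[number].append(vertex)
  # Sort inside each batch only to make the output deterministic.
  return [sorted(batches[number]) for number in sorted(batches)]

# Sequence numbers for two independent chains 0 -> 1 -> 2 and 3 -> 4 -> 5.
sequence = {0: 0, 3: 0, 1: 1, 4: 1, 2: 2, 5: 2}
print(group_by_sequence(sequence))  # [[0, 3], [1, 4], [2, 5]]
```

Each inner list holds tasks that do not depend on each other, so a scheduler could run one batch at a time and parallelize within it.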

And the accompanying unit test:

import unittest
from kahn import kahn_top_sort

class KahnTopSortTest(unittest.TestCase):
  def test_single_vertex(self):
    graph = {
      0: [],
    }
    self.assertEqual(kahn_top_sort(graph), {
      0: 0,
    })
  def test_total_order_2(self):
    graph = {
      0: [1],
      1: [],
    }
    self.assertEqual(kahn_top_sort(graph), {
      0: 0,
      1: 1,
    })
  def test_total_order_3(self):
    graph = {
      0: [1],
      1: [2],
      2: [],
    }
    self.assertEqual(kahn_top_sort(graph), {
      0: 0,
      1: 1,
      2: 2,
    })
  def test_two_independent_total_orders(self):
    # 0 -> 1 -> 2
    # 3 -> 4 -> 5
    graph = {
      0: [1],
      1: [2],
      2: [],
      3: [4],
      4: [5],
      5: [],
    }
    self.assertEqual(kahn_top_sort(graph), {
      0: 0,
      3: 0,
      1: 1,
      4: 1,
      2: 2,
      5: 2,
    })
  def test_simple_dag_1(self):
    # 0 -> 1 -> 2
    #   \ /
    #  3
    graph = {
      0: [1, 3],
      1: [2],
      2: [],
      3: [1],
    }
    self.assertEqual(kahn_top_sort(graph), {
      0: 0,
      3: 1,
      1: 2,
      2: 3,
    })

Dijkstra’s Shortest Path Algorithm

Dijkstra’s shortest path algorithm is very similar to BFS, except a priority queue is used instead of a regular queue. A proper implementation would use a priority queue with an “update key” operation which would reduce the redundant items in the queue. The implementation below, for the sake of simplicity, uses the built-in Python PriorityQueue which does not support "update key".

The invariant in the algorithm is that each time we get an item from the queue, we know that we have the shortest path from source to it already (this is where the guarantee of non-negative weights is key, as this invariant can fail if we have negative weights.)
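
One common workaround for the missing "update key" operation is to leave outdated entries in the queue and skip them when they are popped. The sketch below shows that idea using `heapq` from the standard library. It is a different technique from the implementation in this post, the function name `dijkstra_distances` is made up, and it assumes the graph maps each vertex to a list of `(target, weight)` pairs with non-negative weights.

```python
import heapq

def dijkstra_distances(graph, source):
  """Return a dict of shortest distances from source to reachable vertices."""
  distance = {source: 0}
  heap = [(0, source)]
  while heap:
    d, v = heapq.heappop(heap)
    # A shorter path to v was found after this entry was pushed: skip it.
    if d > distance.get(v, float('inf')):
      continue
    for target, weight in graph.get(v, []):
      alt = d + weight
      if alt < distance.get(target, float('inf')):
        distance[target] = alt
        heapq.heappush(heap, (alt, target))
  return distance

# 'b' is pushed twice: first with distance 10, then with distance 2.
graph = {'a': [('b', 10), ('c', 1)], 'c': [('b', 1)], 'b': []}
print(dijkstra_distances(graph, 'a'))  # {'a': 0, 'b': 2, 'c': 1}
```

The entry `(10, 'b')` stays in the heap but is ignored when it finally comes out, because the recorded distance to `'b'` is already 2.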

from collections import namedtuple, defaultdict
from queue import PriorityQueue  # the module was named Queue in Python 2
Edge = namedtuple('Edge', ['target', 'weight'])

def dijkstra(graph, source, target):
  """
  Given a directed graph (format described below), and source and target
  vertices, returns a (distance, path) tuple where path is a shortest path
  given as a list of vertices going from source to target, or None if no
  such path exists. Returned path will not include the source vertex in it.
  Assumes non-negative weights.
  The graph parameter is expected to be a dictionary mapping each vertex to a
  list of Edge named tuples indicating the vertex's outgoing edges. For
  example if vertex v has outgoing edges to u and w with weights 10 and 20
  respectively, we have graph[v] = [Edge(u, 10), Edge(w, 20)].
  """
  q = PriorityQueue()
  q.put((0, source))
  # previous_vertex[v] holds the immediate vertex before v in the best path
  # from source to v found so far. It is updated every time a shorter path
  # to v is discovered.
  previous_vertex = {source: source}
  # Arguably not the best way to represent infinity but it works for the sake
  # of learning the algorithm.
  shortest_distance = defaultdict(lambda: float('inf'))
  shortest_distance[source] = 0
  while not q.empty():
    (distance, v) = q.get()
    if v == target:
      return (distance, _construct_path(previous_vertex, source, target))
    for edge in graph[v]:
      alt_distance = edge.weight + distance
      if alt_distance < shortest_distance[edge.target]:
        shortest_distance[edge.target] = alt_distance
        q.put((alt_distance, edge.target))
        previous_vertex[edge.target] = v
  return None

def _construct_path(previous_vertex, source, target):
  if source == target:
    return []
  return _construct_path(previous_vertex, source,
               previous_vertex[target]) + [target]

And the accompanying unit test:

import unittest
from dijkstra import dijkstra, Edge

class DijkstraTest(unittest.TestCase):
  def test_single_vertex(self):
    graph = {0: []}
    self.assertEqual(dijkstra(graph, 0, 0), (0, []))
  def test_two_vertices_no_path(self):
    graph = {
      0: [],
      1: [],
    }
    self.assertEqual(dijkstra(graph, 0, 1), None)
  def test_two_vertices_with_path(self):
    graph = {
      0: [Edge(target=1, weight=10)],
      1: [],
    }
    self.assertEqual(dijkstra(graph, 0, 1), (10, [1]))
  def test_cycle_3(self):
    graph = {
      0: [Edge(target=1, weight=10), Edge(target=2, weight=30)],
      1: [Edge(target=0, weight=10), Edge(target=2, weight=10)],
      2: [Edge(target=0, weight=30), Edge(target=1, weight=30)],
    }
    self.assertEqual(dijkstra(graph, 0, 2), (20, [1, 2]))
  def test_clrs_example(self):
    graph = {
      's': [
        Edge(target='t', weight=3),
        Edge(target='y', weight=5),
      ],
      't': [
        Edge(target='x', weight=6),
        Edge(target='y', weight=2),
      ],
      'y': [
        Edge(target='t', weight=1),
        Edge(target='z', weight=6),
      ],
      'x': [
        Edge(target='z', weight=2),
      ],
      'z': [
        Edge(target='x', weight=7),
        Edge(target='s', weight=3),
      ],
    }
    distance, path = dijkstra(graph, 's', 'z')
    self.assertEqual(distance, 11)
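    # Every shortest path from s to z in this graph has length 11.
    self.assertIn(path, [
      ['y', 'z'],
      ['t', 'y', 'z'],
      ['t', 'x', 'z'],
    ])
    distance, path = dijkstra(graph, 's', 'x')
    self.assertEqual(distance, 9)
    # s -> t -> x is the only path of length 9 (there is no y -> x edge).
    self.assertEqual(path, ['t', 'x'])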

Bellman-Ford Shortest Path Algorithm

Bellman-Ford is another single-source shortest path algorithm. It’s very easy to implement but has a worse running time than Dijkstra’s: O(n·m) for n vertices and m edges, versus O((n + m) log n) for Dijkstra’s with a binary heap. While in Dijkstra’s we relax edges greedily based on the next closest vertex to the source, in Bellman-Ford we relax every edge n-1 times. As long as no negative cycle is reachable from the source, each such pass is guaranteed to increase the number of vertices for which we have the shortest path by at least one, and hence after n-1 passes we have the shortest path to every vertex. We then do a final loop over all the edges and try to relax further. If we succeed, we know a negative cycle exists. Handling negative weights and detecting negative cycles is the key advantage of Bellman-Ford as compared to Dijkstra’s (Dijkstra’s algorithm does not work if negative weights exist).

Here’s a basic implementation:

from collections import namedtuple, defaultdict
Edge = namedtuple('Edge', ['target', 'weight'])

def bellman_ford(graph, source, target):
  """
  Given a directed graph (format described below), and source and target
  vertices, returns a (distance, path) tuple where path is a shortest path
  given as a list of vertices going from source to target, None if no such
  path exists, or -1 if a negative loop is found. Returned path will not
  include the source vertex in it. Negative weights are allowed.
  The graph parameter is expected to be a dictionary mapping each vertex to a
  list of Edge named tuples indicating the vertex's outgoing edges. For
  example if vertex v has outgoing edges to u and w with weights 10 and 20
  respectively, we have graph[v] = [Edge(u, 10), Edge(w, 20)].
  """
  # previous_vertex[v] holds the immediate vertex before v in the best path
  # from source to v found so far.
  previous_vertex = {source: source}
  # Arguably not the best way to represent infinity but it works for the sake
  # of learning the algorithm.
  shortest_distance = defaultdict(lambda: float('inf'))
  shortest_distance[source] = 0
  # Run n - 1 times. We start by knowing the shortest path to 1 vertex
  # (source itself) and each iteration below increases the number of vertices
  # for which we have the shortest path by at least one. This means at the
  # end we have the shortest path to 1 + (n - 1) = n vertices.
  for i in range(len(graph) - 1):
    for v in graph:
      for edge in graph[v]:
        alt_distance = shortest_distance[v] + edge.weight
        if alt_distance < shortest_distance[edge.target]:
          shortest_distance[edge.target] = alt_distance
          previous_vertex[edge.target] = v
  # Final loop over all edges to check for negative loops. If at this point
  # we find a shorter alternative path it means a negative loop exists.
  for v in graph:
    for edge in graph[v]:
      alt_distance = shortest_distance[v] + edge.weight
      if alt_distance < shortest_distance[edge.target]:
        return -1
  if shortest_distance[target] < float('inf'):
    return (shortest_distance[target],
        _construct_path(previous_vertex, source, target))
  return None

def _construct_path(previous_vertex, source, target):
  if source == target:
    return []
  return _construct_path(previous_vertex, source,
               previous_vertex[target]) + [target]
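
For comparison, here is a compact sketch of the same relaxation loop written over a flat edge list. It is a variation and not the implementation above: the function name `bellman_ford_distances`, the `(start, end, weight)` edge format and the early exit when a pass changes nothing are all assumptions made for this illustration. It returns only distances, or None when a negative cycle is detected.

```python
def bellman_ford_distances(vertices, edges, source):
  """Return shortest distances from source, or None on a negative cycle."""
  distance = {v: float('inf') for v in vertices}
  distance[source] = 0
  # At most n - 1 passes over every edge.
  for _ in range(len(vertices) - 1):
    changed = False
    for start, end, weight in edges:
      if distance[start] + weight < distance[end]:
        distance[end] = distance[start] + weight
        changed = True
    if not changed:
      break  # nothing improved in this pass, so later passes cannot help
  # One more pass: any further improvement means a negative cycle exists.
  for start, end, weight in edges:
    if distance[start] + weight < distance[end]:
      return None
  return distance

vertices = ['a', 'b', 'c']
edges = [('a', 'b', 4), ('a', 'c', 5), ('c', 'b', -3)]
print(bellman_ford_distances(vertices, edges, 'a'))  # {'a': 0, 'b': 2, 'c': 5}
```

The negative edge from `'c'` to `'b'` is handled correctly here: the direct edge to `'b'` costs 4, but going through `'c'` costs 5 - 3 = 2.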

And as before, the accompanying unit test, which is a copy of the one used for Dijkstra’s with an additional test for negative cycles:

import unittest
from bellman import bellman_ford, Edge

class BellmanFordTest(unittest.TestCase):
  def test_single_vertex(self):
    graph = {0: []}
    self.assertEqual(bellman_ford(graph, 0, 0), (0, []))
  def test_two_vertices_no_path(self):
    graph = {
      0: [],
      1: [],
    }
    self.assertEqual(bellman_ford(graph, 0, 1), None)
  def test_two_vertices_with_path(self):
    graph = {
      0: [Edge(target=1, weight=10)],
      1: [],
    }
    self.assertEqual(bellman_ford(graph, 0, 1), (10, [1]))
  def test_cycle_3(self):
    graph = {
      0: [Edge(target=1, weight=10), Edge(target=2, weight=30)],
      1: [Edge(target=0, weight=10), Edge(target=2, weight=10)],
      2: [Edge(target=0, weight=30), Edge(target=1, weight=30)],
    }
    self.assertEqual(bellman_ford(graph, 0, 2), (20, [1, 2]))
  def test_negative_cycle_3(self):
    graph = {
      0: [Edge(target=1, weight=10), Edge(target=2, weight=30)],
      1: [Edge(target=0, weight=10), Edge(target=2, weight=10)],
      2: [Edge(target=0, weight=-30), Edge(target=1, weight=30)],
    }
    self.assertEqual(bellman_ford(graph, 0, 2), -1)
  def test_clrs_example(self):
    graph = {
      's': [
        Edge(target='t', weight=3),
        Edge(target='y', weight=5),
      ],
      't': [
        Edge(target='x', weight=6),
        Edge(target='y', weight=2),
      ],
      'y': [
        Edge(target='t', weight=1),
        Edge(target='z', weight=6),
      ],
      'x': [
        Edge(target='z', weight=2),
      ],
      'z': [
        Edge(target='x', weight=7),
        Edge(target='s', weight=3),
      ],
    }
    distance, path = bellman_ford(graph, 's', 'z')
    self.assertEqual(distance, 11)
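    # Every shortest path from s to z in this graph has length 11.
    self.assertIn(path, [
      ['y', 'z'],
      ['t', 'y', 'z'],
      ['t', 'x', 'z'],
    ])
    distance, path = bellman_ford(graph, 's', 'x')
    self.assertEqual(distance, 9)
    # s -> t -> x is the only path of length 9 (there is no y -> x edge).
    self.assertEqual(path, ['t', 'x'])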

Python Algorithms for Interviews

Learn about common algorithm concepts in Python and how to solve algorithm challenges you may encounter in an interview.


⭐️Contents⭐️

⌨️ (0:00:00) Big O Notation

⌨️ (0:22:08) Big O Examples

⌨️ (0:43:01) Array Sequences

⌨️ (0:53:23) Dynamic Arrays

⌨️ (1:06:26) Array Algorithms

⌨️ (1:20:40) Largest Sum

⌨️ (1:31:27) How to Reverse a String

⌨️ (1:57:32) Array Analysis

⌨️ (2:00:00) Array Common Elements

⌨️ (2:28:54) Minesweeper

⌨️ (3:08:16) Frequent Count

⌨️ (3:16:58) Unique Characters in Strings

⌨️ (3:28:35) Non-Repeat Elements in Array



Original video source: https://www.youtube.com/watch?v=p65AHm9MX80