Demystifying dynamic connectivity

WQUPC (Weighted Quick Union with Path Compression) reduces computation time from 30 years to 6 seconds.

In this story, we're going to discuss the importance of the union-find data structure (aka dis-joint sets) in our lives. We'll also discuss how we implement underlying operations and some additional tricks which can be used to further make our life easy.

Why do we care about Union Find?

Union-Find represents the array of problems that can be modeled as disjoint sets. If we consider two sets A and B such that intersection(A, B) = empty, that denotes that A and B are disjoint. The same principle applies when we deal with n number of elements. There are many possible real-life examples of disjoint sets:

Social network (Find all the users which are connected directly or indirectly)
Semiconductor conductivity (Are two points electrically connected?)
Percolation (How many sites should be open before it percolates?)
Pixels in a digital photo (Are two pixels in a digital maze connected?)
and many more ..

Are two highlighted points connected in the maze below?

Screenshot 2021-01-16 at 4.30.14 PM.png

Now, we're slightly familiar with the importance of UF. Let's look at how it can be implemented using actual code examples.

Implementation

The basic API interface for a Disjoint set with Union-Find operations can be defined as follows,

class UF
    __init__(N: int) # initialize a union-find data structure with N singleton objects (0 to N – 1)
    def union(p: int, q: int) # add a connection between p and q
    def find(p: int): int # component identifier for p (0 to N – 1)
    def connected(p: int, q: int): bool # are p and q in the same component?

To implement UF using Quick-Find approach, consider the following premise:

Integer array id[] of length N.
Interpretation: id[i] is parent of i.
Root of i is id[id[id[...id[i]...]]]

The Union Find queries can be translated to the above thought process, as

Find -> What is the root of p?
Connected -> Do p and q have the same root?
Union -> To merge components containing p and q, set the id of p's root to the id of q's root.

class UF:
    def __init__(self, n):
        self.id = [0]*n
        for i in range(n):
            self.id[i] = i

    def find(self, i):
        while(i != self.id[i]):
            i = self.id[i]
        return i

    def union(self, p, q):
        idp = self.find(p)
        idq = self.find(q)

        if idp == idq:
            return

        self.id[idp] = idq

Isn't that very easy? That's the beauty of union-find, being powerful yet simple.

So, is it the best solution?

To answer these questions, let's see how much is the time complexity of the operations in the above-mentioned approach,

Union - O(N) - includes cost of finding roots
Find - O(N)
Connected - O(N)

Hence, a series of N union-find operations on a set of N sites takes O(N^2) time. The main factors contributing to worst-case in the above approach are:

Trees can get tall (imagine a long chain of id[id[id[...id[i]...]]])
Find/connected too expensive (could be N array accesses).

Can we do better? To our surprise, yes we can use couple of interesting techniques for further optimizing union-find operations.

Weighting

In order to tackle the problem of having skewed trees, we can consider weighting.

Modify quick-union to avoid tall trees.
Keep track of size of each tree (number of objects).
Balance by linking root of smaller tree to root of larger tree.

Screenshot 2021-01-16 at 3.52.54 PM.png

Path compression

We can modify the algorithm such that, subsequent queries are faster. Just after computing the root of p, set the id[] of each examined node to point to that root.

The easier way to implement that would be to every other node in the path point to its grandparent.

id[i] = id[id[i]]; # only single line of code

Optimized implementation

Incorporating above mentioned two tricks for further optimizations, we get WQUPC (Weighted Quick Union with Path Compression).

class UF:
    def __init__(self, n):
        self.id = [0]*n
        self.weights = [1]*n
        for i in range(n):
            self.id[i] = i

    def find(self, i):
        while(i != self.id[i]):
            i = self.id[i]
            self.id[i] = self.id[self.id[i]]
        return i

    def union(self, p, q):
        idp = self.find(p)
        idq = self.find(q)

        if idp == idq:
            return

        if self.weights[idp] > self.weights[idq]:
            self.id[idq] = idp
            self.weights[idp] += self.weights[idq]
        else:
            self.id[idp] = idq
            self.weights[idq] += self.weights[idp]

Amortized analysis. Starting from an empty data structure, any sequence of M union-find ops on N objects makes ≤ c ( N + M lg* N ) array accesses. Where lg* N is an interactive log function.

Analysis can be improved to N + M α(M, N)
Simple algorithm with fascinating mathematics

Hence, In theory, WQUPC is not quite linear, however, in practice, it is linear. (iterated log function grows extremely slow, example - lg*(2^65536) = 5)

Interesting fact

For a famous example of performing 10^9 union-find operations on 10^9 sites/elements. Weighted Quick Union Find with Path Compression reduced the processing time from 30 years to 6 seconds. This unblocked solutions to a wide array of real-world challenges. It's a good example to demonstrate the importance of putting more emphasis on defining good algorithms.

Leaf Nodes