Prin Algs Part 1 Union-Find

Union and Find

dynamic connectivity

given a set of N objects
union: connect 2 objects
find/connected: is there a path connecting the two objects?
Today’s problem: is there any paths? (the exact path to find will be discussed later)

public class UF
UF(int N)  //initialize union-find data structure with N objects (0 to N – 1)
void union(int p, int q) //add connection between p and q
boolean connected(int p, int q) //are p and q in the same component?
int find(int p) //component identifier for p (0 to N – 1)
int count() //number of components

eager-approach: quick-find

Integer array id[] of size N.
Interpretation: p and q are connected iff they have the same id.
check: check if p and q have the same id.
the code:

public class QuickFindUF
{
	private int[] id;
	public QuickFindUF(int N)
	{
		id = new int[N];
		for (int i = 0; i < N;i++)
		{
		id[i] = i;
		}
	}
	public boolean connected(int p, int q)
	{
		return id[p] == id[q];
	}
	public void union(int p, int q)
	{
		int pid = id[p];
		int qid = id[q];
		id[p] = qid;
	}
}

drawbacks: Too slow
Union too expansive (Take N^2^ array accesses to process sequences of N union commands on N objects.)
(quadratic 二次的 time is too slow)
For example:

prev: 1000 times per second –> 1000 boxes of memorys
now:10^9^ times per second –> 10^9^ boxes of memorys
for union-find operations:
prev: 10^6^ operations
now: 10^18^ operations
as the data grow bigger, it is much slower
quadratic algorithm –> many times slower when problems goes bigger

lazy approach: quick-union

Integer array id[] of size N.
id[i] is parent of i.
Root of i is id[id[id[…id[i]…]]] (keep going until is doesn’t change)
Find: check if the pand q have the same root
Union: merge components containing p and q. set the id of p‘s root to the id of q‘s root.
code:

public class QuickFindUF
{
	private int[] id;
	public QuickFindUF(int N)
	{
		id = new int[N];
		for (int i = 0; i < N;i++)
		{
		id[i] = i;
		}
	}
	
	public int root(int i)
	{
		while (i != id[i]) i = id[i];
		return i;
	}

	public boolean connected(int p, int q){
		return root(p) == root(q);
	}

	public void union(int p, int q){
		int i = root(p);
		int j = root(q);
		id[i] = j;
	}
}

Too slow – the tree can be too tall.
Find can be expansive.

Improvement

weighting

keep track of all the trees

avoid tall trees to be lower (connect the root of smaller trees to the taller trees)
code:

public class QuickFindUF
{
	private int[] id;
	public QuickFindUF(int N)
	{
		id = new int[N];
		sz = new int[N];
		for (int i = 0; i < N;i++)
		{
		id[i] = i;
		sz[i] = 1;
		}
	}
	
	public int root(int i)
	{
		while (i != id[i]) i = id[i];
		return i;
	}

	public boolean connected(int p, int q){
		return root(p) == root(q);
	}

	public void union(int p, int q){
		int i = root(p);
		int j = root(q);
		if (i == j) return;
		if (sz[i] < sz[j]) {
			id[i] = j;
			sz[j] += sz[i];
		}
		else {
			id[j] = i;
			sz[i] += sz[j];
		}
	}
}

running time: Depth of any node x is at most log2N. (for CS, lg = log2)
acceptable performance: great (when the data bigger, the programs also faster)

path compression

Just after computing the root of p, set the id of each examined node to point to that root.

private int root(int i) { 
	while (i != id[i]) 
	{ 
		id[i] = id[id[i]]; // code added
		i = id[i]; 
	} 
	return i; 
}