Union and Find

dynamic connectivity

given a set of N objects
union: connect 2 objects
find/connected: is there a path connecting the two objects?
Today’s problem: is there any paths? (the exact path to find will be discussed later)

1
2
3
4
5
6
public class UF
UF(int N) //initialize union-find data structure with N objects (0 to N – 1)
void union(int p, int q) //add connection between p and q
boolean connected(int p, int q) //are p and q in the same component?
int find(int p) //component identifier for p (0 to N – 1)
int count() //number of components

eager-approach: quick-find

Integer array id[] of size N.
Interpretation: p and q are connected iff they have the same id.
check: check if p and q have the same id.
the code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
public class QuickFindUF
{
private int[] id;
public QuickFindUF(int N)
{
id = new int[N];
for (int i = 0; i < N;i++)
{
id[i] = i;
}
}
public boolean connected(int p, int q)
{
return id[p] == id[q];
}
public void union(int p, int q)
{
int pid = id[p];
int qid = id[q];
id[p] = qid;
}
}

drawbacks: Too slow
Union too expansive (Take N^2^ array accesses to process sequences of N union commands on N objects.)
(quadratic 二次的 time is too slow)
For example:

  • prev: 1000 times per second –> 1000 boxes of memorys
  • now:10^9^ times per second –> 10^9^ boxes of memorys
    for union-find operations:
  • prev: 10^6^ operations
  • now: 10^18^ operations
    as the data grow bigger, it is much slower
    quadratic algorithm –> many times slower when problems goes bigger

lazy approach: quick-union

Integer array id[] of size N.
id[i] is parent of i.
Root of i is id[id[id[…id[i]…]]] (keep going until is doesn’t change)
Find: check if the pand q have the same root
Union: merge components containing p and q. set the id of p‘s root to the id of q‘s root.
code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
public class QuickFindUF
{
private int[] id;
public QuickFindUF(int N)
{
id = new int[N];
for (int i = 0; i < N;i++)
{
id[i] = i;
}
}

public int root(int i)
{
while (i != id[i]) i = id[i];
return i;
}

public boolean connected(int p, int q){
return root(p) == root(q);
}

public void union(int p, int q){
int i = root(p);
int j = root(q);
id[i] = j;
}
}

Too slow – the tree can be too tall.
Find can be expansive.

Improvement

weighting

  • keep track of all the trees
  • avoid tall trees to be lower (connect the root of smaller trees to the taller trees)
    code:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    public class QuickFindUF
    {
    private int[] id;
    public QuickFindUF(int N)
    {
    id = new int[N];
    sz = new int[N];
    for (int i = 0; i < N;i++)
    {
    id[i] = i;
    sz[i] = 1;
    }
    }

    public int root(int i)
    {
    while (i != id[i]) i = id[i];
    return i;
    }

    public boolean connected(int p, int q){
    return root(p) == root(q);
    }

    public void union(int p, int q){
    int i = root(p);
    int j = root(q);
    if (i == j) return;
    if (sz[i] < sz[j]) {
    id[i] = j;
    sz[j] += sz[i];
    }
    else {
    id[j] = i;
    sz[i] += sz[j];
    }
    }
    }
    running time: Depth of any node x is at most log2N. (for CS, lg = log2)
    acceptable performance: great (when the data bigger, the programs also faster)

path compression

Just after computing the root of p, set the id of each examined node to point to that root.

1
2
3
4
5
6
7
8
private int root(int i) { 
while (i != id[i])
{
id[i] = id[id[i]]; // code added
i = id[i];
}
return i;
}

Application: Percolation