prev:[[61B-part-5-1-BST]]
next:

prev: BSTs

disadvantages:

comparable: need to be able to ask “X<Y”questions
O(logN) can be better

extreme approach: using data as the index

initial all values as false
when an item is added, the index is set to be true.
drawback: extreme waste of memory
suppose we want to add “cat”
problems:
what is the catth element of the list? –use cas the index
other words started with c collides with cat, and cannot store words like =equals()…
solution: use the “power of 27”
$$cat_{27} = 327^2+127^1+20*27^0=2234_{10}$$
the algs will give each lowercase English word a unique number!
ASCII Characters: >=127进制
Chinese characters: >=40959进制
However, the problem: integer overflow
if biggest number in int, it +1 = smallest number!
the code: Hashcode
challenges:
how to handle hashcoode collisions?(collision handling_)
how can we compute a hash code for arbitrary objects? (computing a hashCode)
solution: instead of storing true in position h, store a “bucket” oif these N items at the position h.
The bucket implementation: linked list, arraylist,arrayset…

separate chaining data indexed array

![[Pasted image 20240421132226.png]]

the bucket h is empty –> create a new list containing x, store it at index h.
the bucket h is already a list –> add x to this list if it is not already present.
we can use the modulus of hashcode to reduce bucket count! (downside: lists will be longer)

improvement

suppose:

fixed number of buckets M
increasing number of N
problem: even if the items are spread out evenly, the lists are of length Q=N/M, therefore the complexity is O(N).

example strategy

when N/M >= 1.5, double M
(N/M: “load factor”) how full the hast tabhle is.
Therefore, now we have:

increasing number of buckets M
increasing number of items N
O(N/M) =O(1)

Runtime

One important thing to consider: resizing operation.

resizing takes O(N) time
add takes O(1)+ time (O(N) when resizing)
contains takes O(1)+ time on average.

Ubiquity

great performance in practice
implementations are simple
python dictionary are just hash tables.
used in HashMap and HashSet, not implements Hashable (all objects in Java must implement a .hashcode() method)

Negative hash codes

unfortunately, -1%4=-1 will result in index errors.
Therefore, use Math.floorMod instead.

Warning

never store objects that can change in HashSet or HashMap!
never override equals without override hashCode!

HashCode

goal: make the table bushy
java’s HashCode() for strings: 31-based
It seemed like that 126 is better. However, the overflow is a bad problem for base 126!
major collision problem for base 126: Any string that ends in the same last 32 characters has the same hash code!
$$126^{32}=126^{33}=126^{34}=…0$$
See 61C for more.
Typical hash code: a small prime.