Professor: Arne Storjohann | Term: Fall 2024

Module 1

Much of Computer Science is problem solving: Write a program that converts the given input to the expected output.

When first learning to program, we emphasize correctness. With this course, we will also be concerned with efficiency. We will study efficient methods of storing, accessing, and organizing large collections of data.

Typical operations include: Inserting new data items, deleting data items, searching for specific data items, sorting.

We consider various abstract data types (ADTs), and how to realize them efficiently using appropriate data structures. There is a strong emphasis on mathematical analysis in this course.

Useful logarithm facts:

  • c = log_b(a) means b^c = a
  • log in this course means log₂
  • log(xy) = log(x) + log(y), log(x^y) = y·log(x), log_b(a) = log(a)/log(b)
  • a^(log_b(c)) = c^(log_b(a)), ln(x) = log_e(x)

Useful Sums:

  • ∑_{i=1}^{n} i = n(n+1)/2; more generally, ∑_{i=1}^{n} i^k ∈ Θ(n^{k+1}) if k ≥ 0.

Geometric Sequences: ∑_{i=0}^{n-1} r^i = (r^n − 1)/(r − 1) for r ≠ 1.

Harmonic Sequence: H_n = ∑_{i=1}^{n} 1/i ∈ Θ(log n).

Algorithms and Problems

Problem: Description of possible input and desired output.

Problem Instance: One possible input for the specified problem.

Algorithm: Step-by-step process for carrying out a series of computations, given an arbitrary problem instance I.

Solving a problem: An algorithm A solves a problem Π if, for every instance I of Π, A computes a valid output for I in finite time.

Program: A program is an implementation of an algorithm using a specified computer language.

Pseudocode: Communicate an algorithm to another person.

insertion-sort(A, n)
A: array of size n
for (i <- 1; i < n; i++) do
	for (j <- i; j > 0 and A[j-1] > A[j]; j--) do
		swap A[j] and A[j-1]
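As a concrete check of the pseudocode, here is a runnable Python version (the function name insertion_sort is ours, not from the notes):

```python
def insertion_sort(A):
    # Sort A in place: insert A[i] into the sorted prefix A[0..i-1]
    # by swapping it backwards until it is in position.
    n = len(A)
    for i in range(1, n):
        j = i
        while j > 0 and A[j - 1] > A[j]:
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1
    return A

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # -> [1, 2, 3, 4, 5, 6]
```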

Omits obvious details, and should be precise about exit-conditions.

To get from a problem to a program that solves it:

  • Design an algorithm that solves the problem (Algorithm Design).
  • Assess correctness and efficiency of each algorithm (Algorithm Analysis). Correctness proofs are the focus of CS245.
  • If acceptable, implement algorithms.
  • If multiple acceptable algorithms, run experiments to determine the best solution.

This course will focus on the first two points.

Question

What do we mean by efficiency?

In this course, we are primarily concerned with the amount of time a program takes to run (run time).

We also may be interested in the amount of additional memory the program requires (auxiliary space).

The amount of time and/or memory required by a program will usually depend on the given problem instance.

RAM model:

  • Each memory cell stores one datum, typically a number, character, or reference.
  • Any access to a memory location takes constant time.
  • Any primitive operation takes constant time.

We compare algorithms by considering the growth rate: what is the behaviour of an algorithm as the input size n gets large?

We use order notation (big-O and friends). Ignore constants and lower-order terms.

We study relationships between functions.

Definition

O-notation: f(n) ∈ O(g(n)) (f is asymptotically upper-bounded by g) if there exist constants c > 0 and n₀ > 0 such that f(n) ≤ c·g(n) for all n ≥ n₀.

In order to prove that f(n) ∈ O(g(n)) from first principles, we need to find c > 0 and n₀ > 0 such that the following condition is satisfied: f(n) ≤ c·g(n) for all n ≥ n₀.

Many, but not all, choices of c and n₀ will work.
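As a worked example (ours, not from the notes), here is a first-principles proof that 2n² + 3n ∈ O(n²):

```latex
% Claim: 2n^2 + 3n \in O(n^2).
% Choose c = 5 and n_0 = 1. For all n \ge n_0 we have n \le n^2, so
2n^2 + 3n \;\le\; 2n^2 + 3n^2 \;=\; 5n^2 \;=\; c\,n^2,
% hence f(n) \le c\,g(n) holds for all n \ge n_0.
```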

O gives only an upper bound: we also have, e.g., n ∈ O(n²). We want a tight asymptotic bound.

Definition

Ω-notation: f(n) ∈ Ω(g(n)) (f is asymptotically lower-bounded by g) if there exist constants c > 0 and n₀ > 0 such that f(n) ≥ c·g(n) for all n ≥ n₀.

Definition

Θ-notation: f(n) ∈ Θ(g(n)) (f is asymptotically tightly-bounded by g) if there exist constants c₁, c₂ > 0 and n₀ > 0 such that c₁·g(n) ≤ f(n) ≤ c₂·g(n) for all n ≥ n₀.

Equivalently: f(n) ∈ Θ(g(n)) if and only if f(n) ∈ O(g(n)) and f(n) ∈ Ω(g(n)).

We also say that the growth rates of f and g are the same.

How growth rates affect running time: it is instructive to see how the running time changes when the size of the problem instance doubles.

For example: if T(n) = cn then T(2n) = 2T(n); if T(n) = cn² then T(2n) = 4T(n); if T(n) = c·2ⁿ then T(2n) = T(n)²/c.

Question

How do we express that f grows strictly slower than g?

Definition

o-notation: f(n) ∈ o(g(n)) (f is asymptotically strictly smaller than g) if for all constants c > 0, there exists a constant n₀ > 0 such that f(n) ≤ c·g(n) for all n ≥ n₀.

Definition

ω-notation: f(n) ∈ ω(g(n)) (f is asymptotically strictly larger than g) if for all constants c > 0, there exists a constant n₀ > 0 such that f(n) ≥ c·g(n) for all n ≥ n₀.

Suppose that f(n) > 0 and g(n) > 0 for all n ≥ n₀. Suppose that

L = lim_{n→∞} f(n)/g(n) exists.

Then f(n) ∈ o(g(n)) if L = 0, and f(n) ∈ Θ(g(n)) if 0 < L < ∞. If the fraction goes towards ∞, then f(n) ∈ ω(g(n)).

The limit can often be computed using l'Hôpital's rule. This result gives sufficient but not necessary conditions for the stated conclusion to hold.

Many rules are easily proved from first principles.

Identity rule: f(n) ∈ Θ(f(n)).

Transitivity:

  • If f(n) ∈ O(g(n)) and g(n) ∈ O(h(n)) then f(n) ∈ O(h(n)).
  • If f(n) ∈ Ω(g(n)) and g(n) ∈ Ω(h(n)) then f(n) ∈ Ω(h(n)).
  • If f(n) ∈ Θ(g(n)) and g(n) ∈ Θ(h(n)) then f(n) ∈ Θ(h(n)).

Maximum rule: Suppose that f(n) ≥ 0 and g(n) ≥ 0 for all n ≥ n₀. Then f(n) + g(n) ∈ Θ(max{f(n), g(n)}).

Relationships between order notations:

  • f(n) ∈ Θ(g(n)) ⇔ g(n) ∈ Θ(f(n))
  • f(n) ∈ O(g(n)) ⇔ g(n) ∈ Ω(f(n))
  • f(n) ∈ o(g(n)) ⇔ g(n) ∈ ω(f(n))
  • f(n) ∈ o(g(n)) ⇒ f(n) ∈ O(g(n)), and f(n) ∈ ω(g(n)) ⇒ f(n) ∈ Ω(g(n))

Avoid doing arithmetic with asymptotic notations; do not write expressions like O(n²) − O(n).

Techniques for run-time analysis:

  • Identify primitive operations that require Θ(1) time.
  • The complexity of a loop is expressed as the sum of the complexities of each iteration of the loop.
  • Nested loops: start with the innermost loop and proceed outwards. This gives nested summations.

Let T_A(I) denote the running time of an algorithm A on instance I.

Worst-case (best-case) complexity of an algorithm: The worst-case (best-case) running time of an algorithm A is a function mapping n to the longest (shortest) running time for any input instance of size n:

T_worst(n) = max{ T_A(I) : I is an instance of size n }

Average-case complexity of an algorithm: The average-case running time of an algorithm A is a function mapping n to the average running time of A over all instances of size n:

T_avg(n) = (1 / |{I : size(I) = n}|) · ∑_{size(I)=n} T_A(I)

Question

Suppose algorithm A₁ has worst-case run-time O(n³) and algorithm A₂ has worst-case run-time O(n²), and both solve the same problem. Is A₂ more efficient?

No! O-notation is an upper bound. A₁ may well have worst-case run-time Θ(n²) too. We should always give Θ-bounds.

Question

Suppose the worst-case run-times are Θ(n²) for A₁ and Θ(n log n) for A₂, respectively. We consider A₂ to be better. But is it always more efficient?

No! The worst-case run-time of A₁ may only be achieved on some instances. Possibly A₁ is better on most instances.

Merge-sort

Step 1: Describe the overall idea

Input: Array A of n integers.

  1. We split A into subarrays A_L and A_R that are roughly half as big.
  2. Recursively sort A_L and A_R.
  3. After A_L and A_R have been sorted, use a function merge to merge them into a single sorted array.

Step 2: Give pseudo-code or detailed description

merge-sort(A, l <- 0, r <- n-1)
A[l..r]: range of the array to be sorted
if (r <= l) then return
else
	m <- (l + r)/2
	merge-sort(A, l, m)
	merge-sort(A, m+1, r)
	merge(A, l, m, r, S)

Do not pass the array by value; instead indicate the range of the array that needs to be sorted. merge uses an auxiliary array S. Allocate S only once.

merge(A,l,m,r,S)
A[0..n-1] is an array, A[l..m] is sorted, A[m+1..r] is sorted
S[0..n-1] is an array
copy A[l..r] into S[l..r]
(i_L, i_R) <- (l, m+1)
for (k <- l; k <= r; k++) do
	if (i_L > m) A[k] <- S[i_R ++]
	else if (i_R > r) A[k] <- S[i_L ++]
	else if (S[i_L] <= S[i_R]) A[k] <- S[i_L ++]
	else A[k] <- S[i_R ++]
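A runnable Python transcription of merge-sort with a single shared auxiliary array S, as the notes recommend (function name and default arguments are ours):

```python
def merge_sort(A, S=None, l=0, r=None):
    # Sort A[l..r] in place; S is the auxiliary array, allocated once.
    if S is None:
        S = [None] * len(A)
    if r is None:
        r = len(A) - 1
    if r <= l:
        return A
    m = (l + r) // 2
    merge_sort(A, S, l, m)
    merge_sort(A, S, m + 1, r)
    # merge: A[l..m] and A[m+1..r] are sorted
    S[l:r + 1] = A[l:r + 1]
    iL, iR = l, m + 1
    for k in range(l, r + 1):
        if iL > m:
            A[k] = S[iR]; iR += 1
        elif iR > r:
            A[k] = S[iL]; iL += 1
        elif S[iL] <= S[iR]:
            A[k] = S[iL]; iL += 1
        else:
            A[k] = S[iR]; iR += 1
    return A
```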

Step 3: Argue correctness. Typically state loop-invariants. Sometimes obvious enough from idea-description and comments.

Step 4: Analyze the run-time

  • First analyze work done outside recursion
  • If applicable, analyze subroutines separately. If there are recursions: how big are the subproblems? The run-time then becomes a recursive function.

The recurrence relation for the run-time T(n) is as follows:

T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + Θ(n) for n > 1, and T(1) ∈ Θ(1).

The following is the corresponding sloppy recurrence:

T(n) = 2·T(n/2) + Θ(n)

This resolves to T(n) ∈ Θ(n log n).

Module 2

Definition

Abstract Data Type (ADT): A description of information and a collection of operations on that information. (Covered in CS135).

The information is accessed only through the operations.

We can have various realizations of an ADT, which specify how the information is stored and how the operations are performed.

Recall:

Stack: An ADT consisting of a collection of items with operations:

  • push: Add an item to the stack.
  • pop: Remove and return the most recently added item.

Items are removed in LIFO order. We can have extra operations size, is-empty, and top.

Queue: An ADT consisting of a collection of items with operations:

  • enqueue: Add an item to the queue
  • dequeue: Remove and return the least recently inserted item.

Items are removed in FIFO order.

We can have extra operations size, is-empty, and peek/front.

Priority Queue generalizes both ADT Stack and ADT Queue.

It is a collection of items (each with a priority or key) with operations:

  • insert: Inserting an item tagged with a priority
  • delete-max: Removing and returning an item of highest priority.

We can have extra operations: size, is-empty, get-max.

This is a maximum-oriented priority queue. A minimum-oriented priority queue replaces operation delete-max by delete-min.

Using a priority queue to sort:

PQ-Sort(A[0..n-1])
for i <- 0 to n-1 do
	PQ.insert (an item with priority A[i])
for i <- n-1 down to 0 do
	A[i] <- priority of PQ.delete-max()

Run-time: Θ(initialization + n·insert + n·delete-max).
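The PQ-sort pattern can be sketched with Python's heapq module, which provides a min-oriented priority queue (so delete-min replaces delete-max and the result comes out in increasing order directly):

```python
import heapq

def pq_sort(A):
    # PQ-sort: n inserts followed by n delete-mins.
    pq = []
    for x in A:
        heapq.heappush(pq, x)                    # insert
    return [heapq.heappop(pq) for _ in range(len(pq))]  # delete-min
```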

Realizations of Priority Queues

Realization 1: Unsorted arrays

Run-time of operations:

  • insert: Θ(1)
  • delete-max: Θ(n)

We assume dynamic arrays, expanding by doubling as needed. Amortized over all insertions this takes Θ(1) extra time per insertion.

PQ-sort with this realization yields selection-sort.

Realization 2: Sorted arrays

Run-time of operations:

  • insert: Θ(n)
  • delete-max: Θ(1)

PQ-sort with this realization yields insertion-sort. insert can be implemented to have Θ(1) best-case run-time; insertion-sort then has Θ(n) best-case run-time.

Realization 3: Heaps

A binary heap is a certain type of binary tree. A binary tree is either empty, or consists of a node and two binary trees (left subtree and right subtree).

Level of a node: its distance from the root. The root is on level 0.

Height of a tree: the maximum number h for which level h contains nodes. A single-node tree has height 0; the empty tree has height −1.

Any binary tree with height h has at most 2^{h+1} − 1 nodes. So height h ≥ log(n+1) − 1 ∈ Ω(log n).

A heap is a binary tree with the following two properties:

  1. Structural property: All the levels of a heap are completely filled, except possibly for the last level. The filled items in the last level are left-justified.
  2. Heap-order property: For any node v, the key of the parent of v is larger than or equal to the key of v.

The full name for this is max-oriented binary heap.

Lemma: The height of a heap with n nodes is Θ(log n).

Heaps need not be stored as binary trees with pointers. Let H be a heap of n items and let A be an array of size n. Store the root in A[0] and continue with the elements level-by-level from top to bottom, within each level left-to-right.

It is easy to navigate using this array representation:

  • The root node is at index 0.
  • The last node is at index n − 1.
  • The left child of node i is node 2i + 1.
  • The right child of node i is node 2i + 2.
  • The parent of node i is node ⌊(i−1)/2⌋.
  • These nodes exist if the index falls in the range {0, ..., n−1}.

We hide implementation details using helper-functions.

insert in heaps: Place the new key at the first free leaf. Use fix-up to restore heap order.

insert(x)
l <- last() + 1
A[l] <- x
increase size
fix-up(A, l)

fix-up(A, i)
while parent(i) exists and A[parent(i)].key < A[i].key do
	swap A[i] and A[parent(i)]
	i <- parent(i)

Time: O(height of heap) = O(log n) (and this is tight).

delete-max in heaps

The maximum item of a heap is just the root node. We replace the root by the last leaf (last leaf is taken out). The heap-order property might be violated, perform a fix-down.

fix-down(A, i, n)
while i is not a leaf do
	j <- left child of i
	if (i has right child and A[right child of i].key > A[j].key)
		j <- right child of i
	if A[i].key >= A[j].key break
	swap A[j] and A[i]
	i <- j

Time: O(height of heap) = O(log n) (and this is tight).

delete-max()
l <- last()
swap A[root()] and A[l]
decrease size
fix-down(A, root(), size)
return A[l]

Time: O(height of heap) = O(log n) (and this is tight).

Binary heaps are a realization of priority queues where the operations insert and delete-max take O(log n) time.
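The array-based heap operations above can be combined into one compact Python sketch (the class name MaxHeap and method names are ours):

```python
class MaxHeap:
    # Array-based max-oriented binary heap.
    def __init__(self):
        self.A = []

    def insert(self, x):
        # Place x at the first free leaf, then fix-up.
        self.A.append(x)
        self._fix_up(len(self.A) - 1)

    def delete_max(self):
        # Swap root with last leaf, remove it, then fix-down.
        A = self.A
        A[0], A[-1] = A[-1], A[0]
        top = A.pop()
        self._fix_down(0)
        return top

    def _fix_up(self, i):
        A = self.A
        while i > 0 and A[(i - 1) // 2] < A[i]:
            p = (i - 1) // 2          # parent index
            A[i], A[p] = A[p], A[i]
            i = p

    def _fix_down(self, i):
        A, n = self.A, len(self.A)
        while 2 * i + 1 < n:          # while i is not a leaf
            j = 2 * i + 1             # left child
            if j + 1 < n and A[j + 1] > A[j]:
                j += 1                # right child is larger
            if A[i] >= A[j]:
                break
            A[i], A[j] = A[j], A[i]
            i = j
```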

Any priority queue can be used to sort in time O(initialization + n·insert + n·delete-max).

Using the binary-heaps implementation of PQs, we obtain:

PQ-sort-with-heaps(A)
for i <- 0 to n-1 do
	H.insert(A[i])
for i <- n-1 down to 0 do
	A[i] <- H.delete-max()

Both operations run in O(log n) time for heaps, so PQ-sort using heaps takes O(n log n) time.

Can improve this with two simple tricks (heap-sort):

  1. Can use the same array for input and heap, giving O(1) auxiliary space.
  2. Heaps can be built faster if we know all input in advance.

Problem: Given n items all at once (in A), build a heap containing all of them.

Solution 1: Start with an empty heap and insert items one at a time.

simple-heap-building(A)
for i <- 0 to A.size() - 1 do
	H.insert(A[i])

This corresponds to doing n fix-up's. Worst-case running time: Θ(n log n).

We can use fix-down’s instead.

heapify(A)
n <- A.size()
for i <- parent(last()) downto root() do
	fix-down(A, i, n)

This yields a worst-case complexity of Θ(n). A heap can thus be built in linear time.

Efficient sorting with heaps: Idea is to use PQ-sort with heaps.

heap-sort(A)
n <- A.size()
for i <- parent(last()) downto 0 do
	fix-down(A,i,n)
 
while n > 1
	swap items at A[root()] and A[last()]
	decrease n
	fix-down(A, root(), n)

The for-loop takes Θ(n) time and the while-loop takes O(n log n) time.
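A runnable Python sketch of heap-sort as described: heapify by fix-downs, then repeatedly move the maximum into the shrinking suffix (names are ours):

```python
def heap_sort(A):
    n = len(A)

    def fix_down(i, n):
        # Restore heap-order in A[0..n-1] below index i.
        while 2 * i + 1 < n:
            j = 2 * i + 1                      # left child
            if j + 1 < n and A[j + 1] > A[j]:
                j += 1                         # right child is larger
            if A[i] >= A[j]:
                break
            A[i], A[j] = A[j], A[i]
            i = j

    # heapify: fix-down from the last internal node up to the root
    for i in range((n - 2) // 2, -1, -1):
        fix_down(i, n)
    # repeatedly swap the max to the end of the shrinking heap
    while n > 1:
        A[0], A[n - 1] = A[n - 1], A[0]
        n -= 1
        fix_down(0, n)
    return A
```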

Problem: Find the kth smallest item in an array A of n numbers.

Solution 1: Make k passes through the array, deleting the minimum number each time. Complexity: Θ(kn).

Solution 2: Sort A, then return A[k]. Complexity: Θ(n log n).

Solution 3: Create a min-heap with heapify(A). Call delete-min(A) k + 1 times. Complexity: Θ(n + k log n).

Question

We can achieve O(n log n) worst-case time easily, but can we do better?

Yes.

Module 3

We introduce (and solve) a new problem, and then analyze the average-case run-time of our algorithm.

Recall the definition of average-case run-time:

Assume that the set I_n of size-n instances is finite, and that all instances occur equally frequently. Then we can use the following simplified formula:

T_avg(n) = (1/|I_n|) · ∑_{I ∈ I_n} T(I)

To learn how to analyze this, we will do simpler examples first.

silly-test(π, n)
π: a permutation of {0, ..., n-1}, stored as an array
if π[0] = 0 then for j = 1 to n do print 'bad case'
else print 'good case'

The (n−1)! permutations with π[0] = 0 have run-time Θ(n). The remaining n! − (n−1)! permutations have run-time Θ(1). So T_avg(n) = (1/n!)·((n−1)!·Θ(n) + (n! − (n−1)!)·Θ(1)) = Θ(1).

Another not-so-contrived example

all-0-test(w,n)
// test whether all entries of bitstring w[0..n-1] are 0
if (n=0) return true
if (w[n-1] = 1) return false
all-0-test(w,n-1)

Define T(w) = number of bit-comparisons on input w. This is asymptotically the same as the run-time.

Worst-case run-time: Always go into the recursion, until n = 0. T_worst(n) ∈ Θ(n).

Best-case run-time: Return immediately. T_best(n) ∈ Θ(1).

Average-case run-time?

Recall: T_avg(n) = (1/2ⁿ) ∑_{w ∈ {0,1}ⁿ} T(w).

Recursive formula for one non-empty bitstring w:

T(w) = 1 if w[n−1] = 1, and T(w) = 1 + T(w[0..n−2]) if w[n−1] = 0.

Natural guess for the recursive formula for T_???(n):

T_???(n) ≤ 1 + T_???(n−1)

This holds (but is useless) if ??? is 'worst'. It is not obvious if ??? is 'avg'.

Observe: w is uniquely determined by w[n−1] and w[0..n−2]. Half of all bitstrings have w[n−1] = 1 (cost 1), and the other half recurse, so T_avg(n) = 1 + (1/2)·T_avg(n−1).

This recursion resolves to T_avg(n) ∈ Θ(1).

Question

Why can't we always write 'avg' for '???' in the natural guess?

Consider

silly-all-0-test(w,n)
if (n = 0) then return true
if (w[n-1] = 1) then return false
if (n>1) then w[n-2] <- 0
silly-all-0-test(w,n-1)

Only one more line of code in each recursion, so the same guessed formula would seem to apply.

But observe that now, whenever w[n−1] = 0, every bit inspected afterwards has been set to 0, so the recursion always runs to the end: T(w) = n whenever w[n−1] = 0.

So T_avg(n) = (1/2)·1 + (1/2)·n ∈ Θ(n). The 'obvious' recursion did not hold. Average-case analysis is highly non-trivial for recursive algorithms.

We saw SELECTION: Given an array A of n numbers and 0 ≤ k < n, find the element that would be at position k of the sorted array.

SELECTION can be done with heaps in time Θ(n + k log n).

Special case: MEDIANFINDING = SELECTION with k = ⌊n/2⌋. With the previous approaches, this takes time Θ(n log n), no better than sorting.

Question

Can we do selection in linear time?

Yes, the algorithm quick-select.

quick-select and the related quick-sort rely on two subroutines:

  • choose-pivot(A): Return an index p in A. We will use the pivot-value v <- A[p] to rearrange the array.
  • partition(A,p): Rearrange A and return the pivot-index i so that
    • the pivot-value v is in A[i],
    • all items in A[0..i-1] are ≤ v, and
    • all items in A[i+1..n-1] are ≥ v.

p: index of the pivot-value before partition (we choose it); i: index of the pivot-value after partition (we have no control over it).

Partition Algorithm

partition(A,p)
v <- A[p]
for each element x in A do
	if x < v then smaller.append(x)
	else if x > v then larger.append(x)
	else equal.append(x)
i <- smaller.size
j <- equal.size
Overwrite A[0...i-1] by elements in smaller
Overwrite A[i...i+j-1] by elements in equal
Overwrite A[i+j...n-1] by elements in larger
return i

Efficient in-place partition (Hoare)

partition(A,p)
swap(A[n-1], A[p])
i <- -1
j <- n-1
v <- A[n-1]
loop
	do i <- i+1 while A[i] < v
	do j <- j-1 while j > i and A[j] > v
	if i >= j then break 
	else swap(A[i], A[j])
swap(A[n-1], A[i])
return i
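A direct Python transcription of the in-place partition above; the do-while loops become increment-then-test loops (function name is ours):

```python
def partition(A, p):
    # Hoare-style in-place partition of A around pivot-value A[p].
    # Returns the final pivot-index i.
    n = len(A)
    A[n - 1], A[p] = A[p], A[n - 1]   # move pivot to the end
    v = A[n - 1]
    i, j = -1, n - 1
    while True:
        i += 1
        while A[i] < v:               # do i++ while A[i] < v
            i += 1
        j -= 1
        while j > i and A[j] > v:     # do j-- while j > i and A[j] > v
            j -= 1
        if i >= j:
            break
        A[i], A[j] = A[j], A[i]
    A[n - 1], A[i] = A[i], A[n - 1]   # place pivot at index i
    return i
```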

Running time: Θ(n).

quick-select(A,k)
p <- choose-pivot(A)
i <- partition(A,p)
if i = k then return A[i]
else if i > k then return quick-select(A[0...i-1], k)
else if i < k then return quick-select(A[i+1...n-1], k-(i+1))
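A runnable Python sketch of quick-select. For brevity it uses a random pivot and a simpler one-directional partition (not Hoare's two-pointer version), and replaces the tail recursion by a loop; all names are ours:

```python
import random

def quick_select(A, k):
    # Returns the element at position k (0-indexed) of sorted(A).
    A = list(A)                      # work on a copy
    lo, hi = 0, len(A) - 1
    while True:
        if lo == hi:
            return A[lo]
        p = random.randint(lo, hi)   # choose-pivot
        A[p], A[hi] = A[hi], A[p]
        v = A[hi]
        i = lo
        for j in range(lo, hi):      # partition A[lo..hi] around v
            if A[j] < v:
                A[i], A[j] = A[j], A[i]
                i += 1
        A[i], A[hi] = A[hi], A[i]    # pivot lands at index i
        if i == k:
            return A[i]
        elif i > k:
            hi = i - 1
        else:
            lo = i + 1
```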

Let T(n, k) be the number of key-comparisons in a size-n array with parameter k. partition uses Θ(n) key-comparisons.

Worst-case run-time: The sub-array always gets smaller, so there are at most n recursions. Each takes O(n) comparisons, giving O(n²) time. This is tight: consider the case where the pivot-value is always the maximum and k = 0.

Best-case run-time: The first chosen pivot could be the kth element. Then there are no recursive calls; T_best(n) ∈ Θ(n).

Average case analysis is complicated. We detour into a randomized version.

A randomized algorithm is one which relies on some random numbers in addition to the input. Randomization is a good idea if an algorithm has bad worst-case time but seems to perform much better on most instances.

The run-time will depend on the input and the random numbers used. We assume that there exists a pseudo-random number generator.

Goal: Shift the dependency of run-time from what we can’t control (input) to what we can control (random numbers).

randomized-all-0-test(w,n)
if n = 0 return true
if (random(2)=0) then
	w[n-1] = 1 - w[n-1]
if w[n-1] = 1 return false
randomized-all-0-test(w,n-1)

This is all-0-test, except that we flip the last bit based on a coin toss. random(n) returns an integer chosen uniformly from {0, 1, ..., n−1}.

The run-time of the algorithm now depends on the random numbers. Define T(I, R) to be the run-time of a randomized algorithm for an instance I and the sequence R of outcomes of random trials.

The expected run-time for instance I is the expected value:

T_exp(I) = E[T(I, R)] = ∑_R Pr[R] · T(I, R)

Now take the maximum over all instances of size n to define the expected run-time (formally: worst-instance expected-luck run-time) T_exp(n).

Expected run-time of randomized-all-0-test.

Define T(w, R) = bit-comparisons used on input w if the random outcomes are R.

The random outcomes consist of two parts, R = (R₁, R′):

  • R₁: outcome of the first coin toss
  • R′: random outcomes (if any) for the recursion

We have T(w, R) = 1 if the (possibly flipped) last bit is 1, and T(w, R) = 1 + T(w[0..n−2], R′) otherwise.

Recursive formula for one instance (the flipped bit is 1 with probability 1/2, independent of w):

T_exp(w) = 1 + (1/2)·T_exp(w[0..n−2])

Natural guess for the recursive formula for T_exp(n):

T_exp(n) = 1 + (1/2)·T_exp(n−1)

In contrast to average-case analysis, the natural guess usually is correct for the expected run-time.

Proof for randomized-all-0-test.

Therefore T_exp(n) ∈ Θ(1).

Question

Does the expected time of a randomized version always have something to do with the average-case time?

Yes, if the randomization is a shuffle (i.e., it transforms the given instance into a uniformly random instance).

Average-case run-time and expected run-time are not the same. Average-case is the average over instances usually applied to a deterministic algorithm. Expected is the weighted average over random outcomes, applied only to a randomized algorithm.

For analyzing the average-case run-time of quick-select, we make two assumptions:

  • All input-items are distinct.
  • All possible orders of the input-items occur equally often.

Goal is to create a randomized version of quick-select.

First idea: Shuffle the input, then do quick-select.

shuffled-quick-select(A,k)
for (j <- 1 to n-1) do swap(A[j], A[random(j+1)])
quick-select(A,k)

We know that T_exp of shuffled-quick-select equals T_avg of quick-select.

Second idea: Do the shuffling inside the recursion.

randomized-quick-select(A,k)
swap A[n-1] with A[random(n)]
i <- partition(A,n-1)
if i = k then return A[i]
else if i > k then
	return randomized-quick-select(A[0 ... i-1], k)
else if i < k then
	return randomized-quick-select(A[i+1 ... n-1], k-(i+1))

Let T(A, k, R) = key-comparisons of randomized-quick-select on input (A, k) if the random outcomes are R.

Write the random outcomes as R = (R₁, R′). Observe: each pivot-index i occurs with probability 1/n. We recurse on an array of size i or n − i − 1.

As for randomized-all-0-test: averaging over all pivot-indices i, the recursions can use T_exp:

T_exp(n) ≤ cn + (1/n) · ∑_{i=0}^{n−1} max{T_exp(i), T_exp(n−i−1)}

Tedious but straightforward: this recursion resolves to T_exp(n) ∈ Θ(n).

randomized-quick-select is generally the fastest solution to the selection problem.

Hoare developed partition and quick-select. He then used them to sort based on partitioning.

quick-sort(A)
if n <= 1 then return
p <- choose-pivot(A)
i <- partition(A,p)
quick-sort(A[0, 1, ..., i-1])
quick-sort(A[i+1, ..., n-1])
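A Python sketch of quick-sort with a random pivot; for brevity it uses a simpler one-directional partition rather than Hoare's (names are ours):

```python
import random

def quick_sort(A, lo=0, hi=None):
    # Sort A[lo..hi] in place.
    if hi is None:
        hi = len(A) - 1
    if hi <= lo:
        return A
    p = random.randint(lo, hi)       # random pivot avoids the sorted-input worst case
    A[p], A[hi] = A[hi], A[p]
    v = A[hi]
    i = lo
    for j in range(lo, hi):          # partition A[lo..hi] around v
        if A[j] < v:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[hi] = A[hi], A[i]        # pivot lands at index i
    quick_sort(A, lo, i - 1)
    quick_sort(A, i + 1, hi)
    return A
```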

Worst-case run-time: Θ(n²).

Best-case run-time: Θ(n log n).

Average-case run-time we prove via randomization:

T_exp(n) ∈ Θ(n log n), and hence T_avg(n) ∈ Θ(n log n).

We have seen many sorting algorithms.

Question

Can one do better than Θ(n log n) running time?

Yes and no, depends on what we allow.

No: Comparison-based sorting has an Ω(n log n) lower bound.

Yes: Non-comparison-based sorting can achieve O(n) (under restrictions).

All algorithms so far are comparison-based. Data is accessed by comparing two elements, and moving elements around.

Theorem

Any comparison-based sorting algorithm requires Ω(n log n) comparisons in the worst case to sort n distinct items.

Easy to understand via a decision tree.

Non-Comparison-Based Sorting

Assume keys are numbers in base R (R: the radix). All digits are in {0, ..., R−1}.

Assume all keys have the same number m of digits. This can be achieved by padding with leading 0s.

Can sort based on individual digits.

bucket-sort(A, n, sort-key())
sort-key(): maps items of A to {0,...,R-1}
Initialize an array B[0..R-1] of empty queues (buckets)
for i <- 0 to n-1 do
	Append A[i] at the end of B[sort-key(A[i])]
i <- 0
for j <- 0 to R-1 do
	while B[j] is non-empty do
		move front element of B[j] to A[i++]

bucket-sort is stable, equal items stay in original order.

Run-time is Θ(n + R), auxiliary space is Θ(n + R). It is possible to replace the lists by arrays (count-sort).

Most-significant-digit(MSD)-radix-sort

Sort array of -digit radix- numbers recursively: sort by 1st digit, then each group by 2nd digit etc.

MSD-radix-sort(A,n,d<-1)
A: array of size n, contains m-digit radix-R numbers
if (d <= m and (n > 1))
	bucket-sort(A, n, return dth digit of A[i])
	l <- 0
	for j <- 0 to R-1
		Let r >= l - 1 be maximal s.t. A[l..r] have dth digit j
		MSD-radix-sort(A[l..r], r - l + 1, d+1)
		l <- r + 1
  • O(m) levels of recursion in the worst case
  • up to Θ(n) subproblems on most levels in the worst case
  • Θ(size + R) time for each bucket-sort call.

Run-time can be as bad as Θ(m·n·R) — slow, with many recursions and allocated bucket arrays.

LSD-radix-sort(A,n)
A: array of size n, contains m-digit radix-R numbers
for d <- least significant to most significant do
	bucket-sort(A,n,return dth digit of A[i])

Time cost: Θ(m(n + R)), with auxiliary space Θ(n + R).
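A Python sketch of LSD-radix-sort for non-negative integers: one stable bucket-sort pass per digit, least significant first (the name and the digit extraction by integer division are ours):

```python
def lsd_radix_sort(A, m, R=10):
    # Sort m-digit radix-R non-negative integers with m stable bucket passes.
    for d in range(m):
        buckets = [[] for _ in range(R)]
        for x in A:
            buckets[(x // R ** d) % R].append(x)   # stable: appends keep order
        A = [x for b in buckets for x in b]
    return A
```

Stability of each pass is what makes the result correct: after pass d, the array is sorted by the d+1 least significant digits.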

Module 4

Definition

Dictionary: An ADT consisting of a collection of items, each of which contains a key and some data; this pairing is called a key-value pair (KVP). Keys can be compared and are typically unique.

Common assumptions:

The dictionary has n KVPs, each KVP uses constant space, and keys can be compared in constant time.

In an unordered array or linked list: search is Θ(n), insertion is Θ(1), deletion is Θ(n).

In an ordered array: search is Θ(log n), insertion is Θ(n), deletion is Θ(n).

Recall binary search only applies to a sorted array.

Binary search trees: All nodes have two (possibly empty) subtrees. Every node stores a KVP; empty subtrees usually aren't shown.

Every key in the left subtree is less than the root key, and every key in the right subtree is greater than the root key.

BST as a realization of ADT dictionary.

BST::search(k): Start at root, compare to current node’s key. Stop if found or subtree empty, else recurse at subtree.

Deletion in a BST: First search for the node z that contains the key. If z is a leaf, delete it. If z has one non-empty subtree, move the child up. Otherwise, swap the key at z with the key at the successor node and then delete that node. The successor is the next-largest among all keys in the dictionary (the leftmost node of the right subtree).

BST::search, BST::insert, BST::delete all have cost Θ(h), where h = height of the tree = maximum path length from root to leaf.

Question

If n items are inserted one-at-a-time, how big is the height h?

Worst-case: n − 1 ∈ Θ(n), e.g., when items are inserted in sorted order.

Best-case: Θ(log n). Any binary tree with n nodes has height ≥ log(n+1) − 1: level d has at most 2^d nodes, so n ≤ 2^{h+1} − 1.

Goal is to create a subclass of BSTs where the height is always good. We need to impose a structural property, argue that the property implies logarithmic height, and discuss how to maintain the property during operations.

AVL Trees: An AVL tree is a BST with an additional height-balance property at every node: the heights of the left and right subtree differ by at most 1.

If a node v has left subtree L and right subtree R, then |height(L) − height(R)| ≤ 1.

We need to store at each node the height of the subtree rooted at it.

Theorem

An AVL tree on n nodes has Θ(log n) height. AVL::search, AVL::insert, AVL::delete all cost O(log n) in the worst case.

AVL Insertion (AVL::insert(k,v))

First, insert (k,v) with the usual BST insertion. We assume that this returns the new leaf z where the key was stored. Then, move up the tree from z, updating heights (we can do this in constant time per node). If the height difference becomes ±2 at a node z, then z is unbalanced, and we must re-structure the tree to rebalance.

There are many different BSTs with the same keys.

The goal is to change the structure locally without changing the order. Longterm goal is to restructure so the subtree becomes balanced.

Right rotation on node z

flowchart TD
z((z)) --> c((c))
z --> D[/D\]
c --> g
c --> C[/C\]
g((g)) --> A[/A\]
g --> B[/B\]

Becomes

flowchart TD
c((c)) --> g((g))
c --> z((z))
g --> A[/A\]
g --> B[/B\]
z --> C[/C\]
z --> D[/D\]
rotate-right(z)
c <- z.left, z.left <- c.right, c.right <- z
setHeightFromSubtrees(z), setHeightFromSubtrees(c)
return c
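A minimal Python sketch of rotate-right on a bare node structure (heights are omitted for brevity; Node and rotate_right are our names):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(z):
    # z's left child c becomes the new subtree root;
    # c's old right subtree becomes z's left subtree.
    c = z.left
    z.left = c.right
    c.right = z
    return c
```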

Left rotation is used to fix a right-right imbalance.

A double right rotation on node z first performs a left rotation at z's left child c, then a right rotation at z.

flowchart TD
z((z)) --> c((c))
z --> D[/D\]
c --> A[/A\]
c --> g((g))
g --> B[/B\]
g --> C[/C\]

Becomes

flowchart TD
z((z)) --> c((c))
z --> D[/D\]
c --> g
c --> C[/C\]
g((g)) --> A[/A\]
g --> B[/B\]

Which then becomes

flowchart TD
g((g)) --> c((c))
g --> z((z))
c --> A[/A\]
c --> B[/B\]
z --> C[/C\]
z --> D[/D\]

Symmetrically, there is a double left rotation.

AVL insertion revisited. When there is an imbalance at z, do a (single or double) rotation. Choose c as the child of z whose subtree has bigger height, and g as the child of c whose subtree has bigger height.

AVL::insert(k,v)
z <- BST::insert(k,v)
while(z is not NULL)
	if (|z.left.height - z.right.height| > 1) then
		Let c be taller child of z
		Let g be taller child of c
		restructure(g,c,z)
		break
	setHeightFromSubtrees(z)
	z <- z.parent

We can argue that for insertion, one rotation restores all heights of subtrees.

restructure(g,c,z)
node g is a child of c, which is a child of z
p <- z.parent
perform the appropriate single or double rotation; let u be the new subtree root
make u the appropriate child of p and return u

The middle key of {g, c, z} becomes the new root of the subtree.

AVL Deletion (AVL::delete(k))

Remove the key with BST::delete. Find node where structure change happened. Go back up to the root, update heights, and rotate if needed.

AVL::delete(k)
z <- BST::delete(k)
Assume z is the parent of the BST node that was removed
while (z is not NULL)
	if (|z.left.height - z.right.height| > 1) then
		Let c be taller child of z
		Let g be taller child of c
		z <- restructure(g,c,z)
	setHeightFromSubtrees(z)
	z <- z.parent

Ties must be broken to avoid double rotation.

AVL Tree Summary

  • Search: Θ(log n).
  • Insert: Θ(log n). restructure will be called at most once.
  • Delete: Θ(log n). restructure may be called Θ(log n) times.

Worst-case cost for all operations is Θ(log n).

Module 5

So far we’ve seen unordered arrays, ordered arrays, binary search trees, and balanced binary search trees.

If the KVPs were inserted in random order, then the expected height of the binary search tree would be .

We did not consider an ordered linked list as a realization of the ADT Dictionary. insert and delete take Θ(1) time in an ordered list once we know where to make the change; the bottleneck is search.

In an ordered array, we can do binary search to achieve Θ(log n). In an ordered list, we cannot skip to the middle, so we cannot do binary search. Search takes Θ(n) time in an ordered list — too slow.

To speed up search in an ordered list, add links to help us move forward quicker. Choose randomly when to add links.

Skip Lists

A hierarchy of ordered linked lists (levels) S₀, S₁, ..., S_h.

Each list contains the special keys −∞ and +∞ (sentinels). List S₀ contains the KVPs of the dictionary in non-decreasing order. Each list is a subsequence of the previous one, i.e., S₀ ⊇ S₁ ⊇ ... ⊇ S_h. List S_h contains only the sentinels.

A node is an entry in one list; a KVP is one non-sentinel entry in S₀.

There are usually more nodes than KVPs. Each node p has references p.after and p.below. Each key k belongs to a tower of nodes. Height of the tower of k: the maximal index i such that k is in S_i. Height of the skip list: the maximal index h such that S_h exists.

Search in skip list. For each list, find predecessor.

get-predecessors(k)
p <- root
P <- stack of nodes, initially containing p
while p.below != NULL do
	p <- p.below
	while p.after.key < k do p <- p.after
	P.push(p)
return P
skipList::search(k)
P <- get-predecessors(k)
p_0 <- P.top()
if p_0.after.key = k return KVP at p_0.after
else return "not found, but would be after p_0"

It is easy to remove a key since we can find all predecessors. We eliminate lists if there are multiple ones with only sentinels.

skipList::delete(k)
P <- get-predecessors(k)
while P is non-empty
	p <- P.pop()
	if p.after.key = k
		p.after <- p.after.after
	else break
 
p <- left sentinel of the root-list
while p.below.after is the infinity-sentinel
	// the two top lists are both only sentinels, remove one
	p.below <- p.below.below
	p.after.below <- p.after.below.below

Inserting in skip lists (skipList::insert(k,v)): There is no choice as to where to put the tower of k. The only choice: how tall should we make the tower of k? We choose its height randomly.

Repeatedly toss a coin until you get tails. Let i be the number of times the coin came up heads. We want key k to be in lists S₀, ..., S_i, so the height of the tower of k is i.

Before we can insert, we must check that these lists exist. Then we do the actual insertion. Use get-predecessors(k) to get stack P. The top i+1 items of P are the predecessors of where k should be in each of the lists S_i, ..., S₀. Insert (k,v) after the corresponding predecessor in S₀, and a tower node for k after the corresponding predecessor in S_j for 1 ≤ j ≤ i.

skipList::insert(k,v)
for (i <- 0; random(2) = 1; i++) {}
for (h <- 0, p <- root.below, p != NULL; p <- p.below, h++) {}
while i >= h
	create new sentinel-only list; link it in below topmost list
	h++
P <- get-predecessors(k)
p <- P.pop()
zbelow <- new node with (k,v)
zbelow.after <- p.after; p.after <- zbelow
while i > 0
	p <- P.pop()
	z <- new node with k
	z.after <- p.after; p.after <- z; z.below <- zbelow; zbelow <- z

Analysis of skip lists.

Expected space usage: O(n). Expected height: O(log n).

skipList::get-predecessors: O(log n) expected time.

Search, insert, delete: O(log n) expected time.

Recall unsorted array realization:

  • Search: Θ(n)
  • Insert: Θ(1)
  • Delete: Θ(n)

Question

Can we do something to make search more effective in practice?

No, not if items are accessed with equal likelihood: we can show that the average-case cost for search is then Θ(n).

We can make search more effective if search requests are biased (some items are accessed much more frequently than others). Intuition is that frequently accessed items should be in the front. Two scenarios - do we know the access distribution beforehand or not?

Optimal Static Ordering

Scenario: We know access distribution, and want the best order of a list.

Key:                  A   B   C   D   E
Frequency of access:  2   8   1   10  5

Recall:

The cost of a search is i if the search-key is at the ith position of the list.

Expected access cost: ∑_i i · (probability that the item at position i is requested).

Order A, B, C, D, E has expected access cost (1·2 + 2·8 + 3·1 + 4·10 + 5·5)/26 = 86/26 ≈ 3.31.

Order D, B, E, A, C is better, with access cost (1·10 + 2·8 + 3·5 + 4·2 + 5·1)/26 = 54/26 ≈ 2.08.

Claim: Over all possible static orderings, the one that sorts items by non-increasing access-probability minimizes the expected access cost.

Dynamic Ordering: MTF

Scenario: We do not know the access probabilities ahead of time.

Idea: Modify the order dynamically, i.e., while we are accessing. Rule of thumb (temporal locality), a recently accessed item is likely to be used again.

Move-To-Front heuristic (MTF): Upon a successful search, move the accessed item to the front of the list.
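A minimal Python sketch of the MTF heuristic on a list (mtf_search is our name):

```python
def mtf_search(L, k):
    # Linear search for k; on success, move the item to the front.
    for i, x in enumerate(L):
        if x == k:
            L.insert(0, L.pop(i))   # move-to-front
            return True
    return False
```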

We can also do MTF on an array, but should then insert and search from the back so that we have room to grow.

There are other heuristics we can also use. Upon a successful search, swap the accessed item with the item immediately preceding it (transpose heuristic). Or keep counters of how often items were accessed, and keep the list sorted by non-increasing access count (frequency-count heuristic).

Module 6

So far we've seen balanced binary search trees and skip lists. Various other realizations are sometimes faster on insert, but search always takes Ω(log n) time.

Interpolation Search

Scenario: Numbers in sorted array [40, 50, 70, 90, 100, 110, 120].

binary-search(A,n,k)
l <- 0, r <- n-1
while (l <= r)
	m <- floor((l + r) / 2)
	if (A[m] equals k) then return "found at A[m]"
	else if (A[m] < k) then l <- m + 1
	else r <- m - 1
return "not found"

Binary search compares at index $m = \lfloor (l + r)/2 \rfloor$. If keys are numbers, where would you expect key $k$ to be?

Interpolation search is similar to binary search, but compares at index $m = l + \left\lfloor \frac{k - A[l]}{A[r] - A[l]} \cdot (r - l) \right\rfloor$.

interpolation-search(A,n,k)
l <- 0, r <- n-1
while (l <= r)
	if (k < A[l] or k > A[r]) return "not found"
	if (k equals A[r]) then return "found at A[r]"
	m <- l + floor((k - A[l]) / (A[r] - A[l]) * (r - l))
	if (A[m] equals k) then return "found at A[m]"
	else if (A[m] < k) then l <- m + 1
	else r <- m - 1

In the worst case, this can be very slow: $\Theta(n)$ time. But it works well on average: if the keys are uniformly distributed, we can show the recurrence $T(n) = T(\sqrt{n}) + \Theta(1)$, which resolves to $T(n) \in \Theta(\log \log n)$.
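A runnable Python version of the pseudocode above, assuming a sorted array of numeric keys (it returns an index, or -1 for "not found"; note the range checks fire before the interpolation formula, so $A[r] - A[l]$ is never zero when we divide):

```python
def interpolation_search(A, k):
    """Search sorted numeric array A for key k; return its index or -1."""
    l, r = 0, len(A) - 1
    while l <= r:
        if k < A[l] or k > A[r]:
            return -1          # k cannot lie in A[l..r]
        if k == A[r]:
            return r
        # Interpolate where k "should" be between A[l] and A[r].
        m = l + (k - A[l]) * (r - l) // (A[r] - A[l])
        if A[m] == k:
            return m
        elif A[m] < k:
            l = m + 1
        else:
            r = m - 1
    return -1

print(interpolation_search([40, 50, 70, 90, 100, 110, 120], 110))  # 5
```

On the example array, searching for 110 interpolates directly to index 5 on the first probe, where binary search would first probe the middle.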

Scenario: Keys in the dictionary are words. We need a brief review of strings.

Words (= strings): sequences of characters over an alphabet $\Sigma$.

Convention: Words have an end-sentinel \$. $w.size$ = number of non-sentinel characters: $|\text{be\$}| = 2$.

Trie (a radix tree): A dictionary for bit-strings. The name comes from retrieval. A tree based on bitwise comparisons, with each edge labelled with the corresponding bit. Similar to radix sort: use individual bits, not the whole key. Due to end-sentinels, all key-value pairs are at leaves.

Tries: Search

Follow the links that correspond to the current bit of $w$. Repeat until there is no such link, or $w$ is found at a leaf. As for skip lists, we find the search-path separately first.

Trie::get-path-to(w)
P <- empty stack; z <- root; d <- 0; P.push(z)
while d <= |w|
	if z has a child-link labelled with w[d]
		z <- child at this link; d++; P.push(z)
	else break
return P
Trie::search(w)
P <- get-path-to(w), z <- P.top
if (z is not a leaf) then
	return "not found, would be in sub-trie of z"
return key-value pair at z
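A minimal Python sketch of an uncompressed binary trie, folding get-path-to into the operations (leaf-references and prefix-search are omitted; the names `TrieNode`, `trie_insert`, and `trie_search` are my own):

```python
# Uncompressed trie for bit-strings with an end-sentinel '$':
# each word is stored along the path of its bits, and the
# key-value pair sits at the leaf below the '$' link.
class TrieNode:
    def __init__(self):
        self.children = {}   # maps '0', '1', or '$' to a child node
        self.value = None    # set only at the leaf for a stored word

def trie_insert(root, word, value):
    z = root
    for c in word + "$":     # the sentinel ends every stored word
        z = z.children.setdefault(c, TrieNode())
    z.value = value

def trie_search(root, word):
    z = root
    for c in word + "$":
        if c not in z.children:
            return None      # no link for the current bit: not found
        z = z.children[c]
    return z.value
```

The sentinel is what lets one stored word be a prefix of another: "0" and "011" end at different leaves because their '$' links branch off at different depths.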

For later applications of tries, we want another search-operation. prefix-search(w): find a word $w'$ in the trie for which $w$ is a prefix. Testing whether such a $w'$ exists is easy.

To find $w'$ quickly, we need leaf-references: every node stores a reference to a leaf in its subtree. Convention: store the leaf with the longest word.

Trie::prefix-search(w)
P <- get-path-to(w), p <- P.top()
if number of nodes on P is w.size or less
	return "no extension of w found"
return p.leaf

Trie::insert(w): get-path-to(w) gives the ancestors that already exist. Expand the trie from P.top by adding the nodes that correspond to the remaining bits of $w$. Update leaf-references along the path.

search(w), prefix-search(w), insert(w), and delete(w) all take $\Theta(|w|)$ time. Search time is independent of the number of words stored in the trie! Search-time is small for short words.

The trie for a given set of words is unique (except for the order of children and ties among leaf-references).

Tries can be wasteful with respect to space. Worst-case space is $\Theta(\text{total length of all words})$.

Question

What can we do to save space?

Pruned Tries: Stop adding nodes to trie as soon as the key is unique.

A node has a child only if it has at least two descendant leaves. This saves space if only a few of the bit-strings are long. Pruned tries are a more space-efficient version of tries, but the operations get a bit more complicated.

We have (implicitly) seen pruned tries before: For equal-length bit-strings, pruned trie equals recursion tree of MSD radix-sort.

Compressed Tries: Compress paths of nodes with only one child. Each node stores an index, corresponding to the level of the node in the uncompressed trie. These are known as Patricia-Tries (Practical Algorithm to Retrieve Information Coded in Alphanumeric)

As for tries, follow the links that correspond to the current bit of $w$. Main difference: the stored indices say which bits to compare.

CompressedTrie::get-path-to(w)
P <- empty stack; z <- root; P.push(z)
while z is not a leaf and (d <- z.index) <= w.size do
	if (z has a child-link labelled with w[d]) then
		z <- child at this link; P.push(z)
	else break
return P
CompressedTrie::search(w)
P <- get-path-to(w), z <- P.top
if (z is not a leaf or word stored at z is not w) then
	return "not found"
return key-value pair at z
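To make the stored-index idea concrete, here is a hand-built compressed trie for four bit-strings and a search that follows only the stored indices, then verifies the full word at the leaf (the class and function names are illustrative, not from the course; leaf-references are omitted):

```python
# Search in a hand-built compressed (Patricia-style) binary trie.
# Internal nodes store the index of the bit to test; leaves store the word.
class Node:
    def __init__(self, index, children):
        self.index, self.children = index, children

class Leaf:
    def __init__(self, word):
        self.word = word

# Compressed trie for the bit-strings {"000", "001", "0110", "1101"}.
# Single-child paths of the uncompressed trie are collapsed: e.g. all
# words starting with '1' sit directly below the root as one leaf.
root = Node(0, {
    "0": Node(1, {
        "0": Node(2, {"0": Leaf("000"), "1": Leaf("001")}),
        "1": Leaf("0110"),
    }),
    "1": Leaf("1101"),
})

def search(z, w):
    # Descend using only the stored bit-indices...
    while isinstance(z, Node):
        if z.index >= len(w) or w[z.index] not in z.children:
            return False
        z = z.children[w[z.index]]
    # ...then verify: skipped bits were never compared on the way down.
    return z.word == w
```

The final comparison at the leaf is essential: because compressed paths skip bits, the descent alone can reach a leaf whose word disagrees with $w$ on a skipped bit.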

search(w) and prefix-search(w) are easy. insert(w) and delete(w) are conceptually simple. Search for path to word, uncompress this path, insert/delete as in an uncompressed trie, compress path from root to where change happened.

All operations take $O(|w|)$ time for a word $w$. Compressed tries storing $n$ keys use $\Theta(n)$ space: there are $n$ leaves, every internal node has two or more children, and one can show that there are then more leaves than internal nodes.

To represent strings over any fixed alphabet $\Sigma$, any node will have at most $|\Sigma| + 1$ children (one per character, plus one for the end-sentinel \$).

Variation: Compressed multi-way tries: compress paths as before.

Operations search(w), prefix-search(w), insert(w), and delete(w) are exactly as for tries of bit-strings. Run-time is $O(|w|)$. Each node now has up to $|\Sigma| + 1$ children.

Question

How should they be stored?

Time/space tradeoffs: arrays of child-links are fast but wasteful, lists are space-efficient but slow. A dictionary of child-links is best in theory, but not worth it in practice unless $|\Sigma|$ is huge. In practice, we use hashing.

Module 7