Notes for VE477 (Introduction to Algorithms) | FA2020 @UM-SJTU JI, Shanghai Jiao Tong University.

Computational Problem

  • a computational problem is a question or a set of questions that a computer might be able to solve. (what is a computer?)
  • study of the solutions to the computational problem composes the field of Algorithms
  • computational complexity attempts to classify the algorithm depending on their speed or memory usage.
  1. decision problem
  2. search problem
  3. counting problem
  4. optimization problem
  5. function problem

Turing Machine

  • a function f:f: \sum^* \rightarrow \sum^* is said to be Turing computable if there exists a Turing machine MM which returns f(x)f(x) for any input xx.
  1. deterministic polynomial algorithm: TM(x)P(x)T_M(x)\leq P(x)
  2. LL a language, PP decision problem PP
  • The set of the decision problems which could be solved by a deterministic polynomial algorithm defines the class P.
  • class NP: a decision problem is computable by a non-deterministic polynomial algorithm.
    • be polynomial with the certificate
  • class co-NP
  • NP-Complete: NP-Complete is a complexity class which represents the set of all problems X in NP for which it is possible to reduce any other NP problem Y to X in polynomial time.
  • NP-hard: do not have to be in NP, do not have to be decision problem. a problem X is NP-hard, if there is an NP-complete problem Y, such that Y is reducible to X in polynomial time.

P问题是在多项式时间内可以被解决的问题,而NP问题是在多项式时间内可以被验证其正确性的问题。 NP困难问题是计算复杂性理论中最重要的复杂性类之一。如果所有NP问题都可以多项式时间归约到某个问题,则称该问题为NP困难。

NPNP contains all problems in PP

  • PSPACE completeness

    SPACE: denote the set of all the decision problems which can be solved by a Turing machien in O(s(n))\mathcal{O}(s(n)) space for some function ss of the input size nn. PSPACE is defined as

    PSPACE=kSPACE(nk)PSPACE = \cup_{k} SPACE(n^k)

    PSPACE-complete: 1. in PSPACE 2. for all PP in PSPACE, PP can be reduced in polynomial space the to PSPACEcompletePSPACE-complete

halting problem: undecidable. Do not belong to NPNP. NPhardNP-hard. reduce SAT to Halting problem

  1. SAT

    boolean satisfiability problem: NP-complete

  2. TQBF: PSPACE-complete and NP-hard

    True Quantiffied Boolean Formula: can be solved in exponential time and polynomial space.

    • when quantifier \forall, the both xi=0or1x_i=0or1 should be evaluted true
    • \exists, only one of them

    Then each time recursive call will give at most 2 way, then O(n2)\mathcal{O}(n^2)

  3. Hilbert’s tenth problem

    Given a Diophantine equation, with any number of unkown quantities and with rational integral numerical coefficients, decide whether the equation is solvable in rational numbers (decision problem).

  4. 3-SAT

    example of how to proceed to evaluate the complexity class of a given problem

    cnf: conjunctive normal form: if a boolean formula is written as the conjunction of disjunctive clauses.

    3-SAT is NP-complete:

    • 3-SAT is in NPNP (clearly as it is a particular case of SAT)

    • in NPcompleteNP-complete that being able to solve it means being able to solve SAT, the proposed transformation is applicable in polynomial time

      Convert a cnf-formula FF to a 3cnf-formula FF’ (then could transform any instance of SAT into an instance of 3-SAT)

      Denote each clause in FF as C1,C2,...,CkC_1, C_2, ...,C_k
      a. For clause has one literal as x1x_1, change to (x1x1x1)(x_1\lor x_1 \lor x_1)
      b. For clause has two literals as (x1x2)(x_1\lor x_2), change to (x1x2x1)(x_1\lor x_2 \lor x_1)
      c. For clause has more than three literals as (x1x2...xm)(x_1\lor x_2\lor ... \lor x_m), introduce new variable ziz_i change to (x1x2z1)(¬z1x3z2)(¬z2x4z3)...(¬zm3xm1xm)(x_1\lor x_2 \lor z_1)\land (\neg z_1\lor x_3 \lor z_2)\land (\neg z_2\lor x_4 \lor z_3)\land ... \land (\neg z_{m-3}\lor x_{m-1} \lor x_m)

      an example:
      image-20201023132044585

      ​ Then we could rewrite it as

      ​$$
      \begin{align*}
      &(x_1\lor x_2 \lor z_1)\land (\neg z_1\lor \neg x_3 \lor z_2)\land (\neg z_2\lor x_4 \lor z_3)\land (\neg z_3\lor x_5 \lor\neg x_6)\land \
      &(\neg x_1\lor \neg x_2 \lor z_1)\land (\neg z_1\lor x_3 \lor z_2)\land (\neg z_2\lor\neg x_4 \lor z_3)\land (\neg z_3\lor x_5 \lor x_6)\land \
      &( x_1\lor \neg x_2 \lor z_1)\land (\neg z_1\lor \neg x_3 \lor z_2)\land (\neg z_2\lor x_4 \lor z_3)\land (\neg z_3\lor x_5 \lor \neg x_6)\land \
      &(x_1 \lor \neg x_2 \lor x_1)
      \end{align*}

  • stirling’s formula n!2πn(ne)nn! \approx \sqrt{2\pi n}(\frac{n}{e} )^n
  • logn!Θ(nlogn),lognΘ(logn)logn! \in \Theta(n \cdot logn), \lceil logn\rceil \in \Theta(logn)
image-20200915132328058

1+2+4+8+…+2^{log n} \approx 2n-1. Thus, the time complexity is Θ(n)\Theta(n).

—Running time is expressed as T(n) for some function T on input size n.

T(n)O(f(n))T(n)\in O(f(n)) also T(n)=O(f(n))T(n)= O(f(n))

complexity theory

  1. RAM model

    image-20200924095457385

The complexity of an algorithm is defined by a numerical function

  1. O\mathcal{O}
image-20200925120901038

A sufficient condition of Big-Oh

image-20200925130630096

image-20200925130715821image-20200925130730523

image-20200925165500558
  1. Ω\Omega
image-20200925120943159
  1. Θ\Theta
image-20200925121016271

Solve Recurrence

Master method: T(n)aT(nb)+O(nd)T(n) \leq aT(\frac{n}{b}) + O(n^d)

  • base case: for sufficiently small nn, T(n)constantT(n) \leq constant
  • a = number of recursive calls (positive integer)
  • b = input size shrinking factor (positive integer)
  • O(nd)O(n^d): the runtime of merging solutions. d is a real value $\geq $ 0
  • a, b, d : independent of n
  • image-20200917083956683
    1. In merge sort, a=2, b=2, d=1
    2. in quick sort, if choose the median as the pivot, a=2,b=2,d=1
    3. in binary search, a=1, b=2, d=0

Divide-and-conquer Approach

  • merge sort
  • Quick sort

Counting inversions

  1. divide into 2 sets
  2. in each set, recursively count the number of inversions
  3. sort each sort
  4. when merge, as need to compre the number, could get the number

Sorting

A sorting algorithm that is based on pairwise comparisons mush use Ω(NlogN)\Omega (NlogN) operations to sort in the worst case

Reorder array A of size N with consistent comparison function

  • in place: requires O(1)\mathcal{O}(1) additional memory;不占用额外内存,只利用原来存储待排数据的存储空间进行比较和交换。

    ​ “ Space usage affects the constant factor, because the memory access takes time. Modern computers are based on memory hierarchy. The top is small cache, but extremely fast. If O(1) additional space, it can be fit inside the cache and then, the program is fast; otherwise, it cannot be fit inside the cache, which needs to read the secondary memory, which takes time. ”

  • Stability: whether the algorithm maintains the relative order of records with equal keys

Best Case Time Worst Case Time Average Case Time In Place Stable
Insertion O(N) sorted array O(N2N^2) 逆序array O(N2N^2) Yes Yes
Selection О(n2) comparisons, O(1) Swap O(N2N^2) O(N2N^2) Yes No
Bubble O(N2N^2) O(N2N^2) Yes Yes
Merge Sort O(NlogNN log⁡N) O(NlogNN log⁡N) No Yes
Quick Sort O(N2N^2) O(NlogNN log⁡N) Weakly No

Comparison Sort

Each item is compared against others to determine its order

img
img

Simple sorts

  1. insertion sort
  2. selection sort
  3. bubble sort

Fast sort: quick sort and merge

Quick Sort

select a pivot randomly

  1. worst case O(n2)\mathcal{O}(n^2): 每次都选到了最大/最小的元素

出现可能2n1n!\frac{2^{n-1}}{n!} - extremely samll

image-20200925090935239
  1. On average θ(n)\mathbb{\theta}(n)
image-20200925140748192

how to do partition? Not in place: with another array B

in-place partition

1. once select a pivot, swap the pivot with the first element in the array 
2. set two pointer i,j; i points to the second element in the array, i = 1; j points to the last elemtn j = N - 1
3. Then increment i until find A[i] >= pivot
4. decrease j until find A[j] < pivot
5. if i < j : swap A[i] and A[j] the go back to 3.  
6. otherwise swap the first element (the pivot) with A[j]

Non-comparison Sort / distribution-based sort

  • each item is put into predifined “bins”, independent of other items; no comparison with other items needed
  • Counting sort / bucket sort / radix sort

Counting sort

Assume that the input consists of data in a small range

给定array,已知其中data range为(0,k) + length of the array AA N (k and N are both parameters, although known, not treated as constant)

  1. allocate a new array countcount with size k+1
  2. store the number of each number in the original array AA in countcount
  3. Sum all number from 00 to kk in countcount as count[i]=count[i]+count[i1]count[i] = count[i] + count[i-1]
  4. from the end of AA, put each element in A[n]A[n] in the new array at the position count[A[n]]count[A[n]], then $count[A[n]] -=1 $

Bucket sort

Assume that the input is drawn from a uniform distribution, then linear time complexity  O(n) \mathcal{O}(n)could be obtained. (the time complecity is relevant to the input)

  1. set an array as an initially empty bucket
  2. Go over the array, put each item into corresponding bucket
  3. in each bucket, do comparison sort
  4. visit all the buckets in order and put all items back to the original array
image-20200918182720630

Radix sort

比如sort name, 因为姓氏的集中性,not good for it

each element in the n-element array AA has dd digits, where digit 1 is the lowest-order digit and digit dd is the highest order.

image-20200918184344405

In-place and not in-place both merge sort each subarray recursively, the difference is in the merge function. For the marge sort discussed in the class, $$\leq $$ is used to compare so that it is stable.

In-place merge sort need to shift all the element because no additional memory (like the additional array in slides), so time complexity $$O(n^2)$$ not $$O(nlogn)$$ any more.

LinearTime Selection

Randomized selection algorithm

image-20200925084333747

Average :

Input array size n && random pivot choice

Rselect is in phase j: current array size is between (34)j+1n(\frac{3}{4})^{j+1}n and (34)jn(\frac{3}{4})^{j}n.

XjX_j: the number of recursive calls in phase jj

For example, phase 0 is size between [3/4n, n]. Depends on what the pivot you choose, the array may enter a new phase or remain in the current phase

Good pivot: make the left sub-array size is am, i.e. 1\4<a<3\4. Probability: 0.5. 因为只要在old array中间50%的位置取即可获得

20200925093853343.png

—E[XjX_j]≤ Expected number of times you need to get a good pivot, p=0.5, == flip coin

小于:因为即使无good pivot也可能进入new phase

N: the number of chosen needed to get a good pivot. P[N=k] = 12k\frac{1}{2^k}; E[N]=2.

E[XjX_j] <= E[N] = 2

image-20200925095103847

第一层循环中将pivot与n个元素进行比较,时间为cncn.: 第一步时间 T(n)=cn+T(n/2)T(n)=cn+T(n/2)

相似的,在递归过程中T(n/2)=c(n/2)+T(n/4); T(n/4)=c(n/4)+T(n/8) T(2)=2c+T(1); T(1)=1cT(n/2)=c(n/2) + T(n/4); \ T(n/4)=c(n/4) + T(n/8)…\ T(2) = 2*c+T(1); \ T(1)=1*c

则有T(n)=c(n+n/2+n/4++2+1)=2n=O(n)T(n) = c(n+n/2+n/4+…+2+1) = 2n = O(n)

空间复杂度:无额外的空间,inplace O(1)\mathcal{O}(1)

The runtime depends on the input pivot

When i = n/2 ,the worst case runtime is Θ(n2)\mathcal{\Theta}(n^2), such that example array 1,2,3,4,5,6,7, choose pivot sequence 1,2,3,7,6,5,4 then comparison time: c(n-1 + n-2 + … + n/2 + … 1). However, if choose 1,2,3,4,5,6,7, only c(n-1 + n-2 + … + n/2 )

Best:

Best case happens when your random selection of pivot directly gives you the i-th smallest item (i.e., a pivot with index as i). However, the pivot index can only be known after the partition. Thus, the runtime is Θ(n)\mathcal{\Theta}(n).

Deterministic selection algorithm

—Idea: use “median of medians”

ChoosePivot(A, n)

  • —A subroutine called by the deterministic selection algorithm

  • —Steps

​ 1.Break A into n/5 groups of size 5 each

​ 2.Sort each group (e.g., use insertion sort)

​ 3.Copy n/5 medians into new array C

​ 4.Recursively compute median of C

— By calling the deterministic selection algorithm!

​ 5.Return the median of C as pivot

Dselect

image-20200925095542348

Intpu array size n; Runs on O(n)\mathcal{O}(n) time

but not in-place: need an additional array of 5/n medians

  • 对于长度为5的array排序,时间为constant,

  • why size <= 0.7n

  • image-20200925102033806

于是可以看出在图中T(?)处的size不会大于0.7n

因此对于整个dselect来说,

—Hope: there is a constant a (independent of n) such that T(n)≤an for all n>1

​ —Then T(n)=O(n)

—We choose a=10c, so to prove ->

Proof by induction:

  • base case: T(1)10cT(1) \leq 10c

  • inductive step: inductive hypothesis T(k)10ck, k<nT(k) \leq 10ck, \forall\ k<n

    ​ Then prove it also true for T(n)T(n) : T(n)cn+T(5n)+T(7n10)cn+2cn+7cn=10cnT(n) \leq cn+T(\frac{5}{n}) + T(\frac{7n}{10}) \leq cn+2cn+7cn = 10cn

Dynamic Programming

idea behind DP:

  • break a complex problem into simpler subproblems
  • store the result of the overlapping subproblems
  • do not recompute the same information again and again
  • do not waste memory because of recursion

saves both time and space

  • simple idea dramatically improving space-time complexity
  • cover all the possibilities at the subproblem level
  • efficient as long as the number of subproblems remains polynomial

eg: fibnonacci number naive way: do recursion

  1. Memorized DP: recursion + memorization

​ **time = #of sub problems \cdot time/subproblem **

mem={}
fib(n):
	if n in mem: return mem[n]
	else:
		if n <= 2: f = 1
		else: f = fib(n-1) + fib(n-2)
		mem[n] = f
		return f
// # of subproblem:n & time/subproblem is \Theta(1)
  • fib(k) only recurses when it is first called
  • only n nonmemorized calls: k=n, n-1, … 1
  • memorized call free (Θ(1)\Theta(1) time)
  • could ignore recursion, Θ(1)\Theta(1) time per call

https://www.zhihu.com/question/23995189

  1. Bottom-up DP

    fib = {}
    for k in [1,2, ...,n]: # n loops, each call is \Theta(1)
    		if k <= 2: f = 1
    		else: f = fib(n-1) + fib(n-2)
    		fib[k] = f
    return fib[n]
    // same computation as do not need to count for recursion but save space 
    

intro

​ 用尽量少的1,5,11凑出15;贪心策略15=111+4115 = 1 * 11 + 4 *1, 五张。

​ 正确的为15=3515 = 3 * 5, 三张。 贪心策略为尽量使之后面对的w更少,只考虑眼前情况

​ 用f(n)f(n) 表示凑出n所需要的最少数量。与1,5,11各用了多少无关。

​ Cost = f(4) + 11 / f(10) + 5 / f(14) + 1 -> $f(n) = min{ f(n-1),f(n-5),f(n-11)} + 1 $

​ 以O(n)\mathcal{O}(n) 复杂度解决

  • 将一个问题拆成几个子问题,分别求解这些子问题,即可推断出大问题的解

  • Definitions: (DP 需要满足的前提)

    • 无后效性:确定f(n)后,如何凑出f(n)即无关
    • 最优子结构: f(n)即为小问题的最优解, 因此可以得出大问题的最优解

    DP: 枚举有可能成为答案的解,自带剪枝 - 尽量缩小可能解空间

knapsack problem

subset sum problem

time complexity O(Nsum)\mathcal{O}(N*sum)

  • Use a boolean subset[i][j] to denote if there is a subset of sum j with element at index i1i-1 as the last element.
subset size: (A.size() + 1) * (target + 1)
subset[i][j] = True
## sum of a subset with the i-1 th element as the last element, equal to j

if (A[i] > j)
	subset[i][j] = subset[i - 1][j]  // copy the answer for previous cases
else 
	subset[i][j] = subset[i - 1][j] OR subset[i - 1][sum - A[i]]
	// if any previous states have already experinced the sum=j OR 
	// OR any previous state experinced a value 'j - A[i]'

Shortest path

Implementation: dij & bellmanford in VE477 lab5 (lab4: fib heap)

shortest path in weighted graph: Given a connected, simple, weighted graph, and two vertices ss and tt, find the shortest path that joins ss to tt.

  • Egdes only have positive weights
  • Edges have positive and negative egdes
  1. Dijkastrs’s algorithm: only for solving the case 1, could not deal with negative weights. (no DP involved)
  2. Bellman-Ford: take advantage of DP

Subproblem dependency should be acyclic

String

when subproblem for strings: (all polynomial)

  • Suffixes x[i:],ix[i:], \forall i Θ(n)\Theta(n)
  • prefixes x[:i,],ix[:i,], \forall i Θ(n)\Theta(n)
  • substrings x[i:j], ijx[i:j], \forall\ i\leq j Θ(n2)\Theta(n^2)
  1. Edit Distance:

    given two string xx and yy, the cheapest possible sequence of character edits to turn xyx\rightarrow y.

    ​ ALLOW:

    • Insert cc
    • Delete cc
    • replace cc with cc'

    SUBPROBLEM: x[i:]&y[j:] i,jx[i:]\&y[j:]\ \forall i,j then Θ(xy)\Theta(|x|\cdot|y|)

    GUESS: one of three allowed possibilities

    RECURRENCE: DP(i, j) = min{ (cost of repalce x[i]->y[j] + DP(i+1, j+1) , cost of insert y[j] + DP(i, j+1), cost of delete x[i] + DP(i+1, j)}

    TOPO ORDER: i: |x| -> 0

    ​ j: |y| ->0

    DP(\emptyset, \O)

    Time comlexity: subproblem Θ(1)\Theta(1). Total: Θ(xy)\Theta(|x|\cdot|y|)

longest common sequence: HELLOHELLO for these two words

x:HIEROGLYPHOLOGYx: HIEROGLYPHOLOGY

y:MICHAELANGELOy: MICHAELANGELO

Union-Find

A disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:

并查集是一种树型的数据结构,用于处理一些不交集的合并及查询问题; 不能将在同一组的元素拆开

two operations

  • Find(v): find the root of the tree for vertex vv
  • Union(v, w): link the root of the tree containg vertex vv to the root of tree containing vertex ww

两处优化

  • When Find, update all the nodes along the path (all link them to the root directly)
  • When Union, set the root with higher depth as the parent of another root (minimize the depth of the tree)
image-20200927102034381

Complexity

  • Lemma image-20200927102643009

  • iterated logarithm function image-20200927103159560

    such that, the iterated algorithm of n is the number of time that the function log need to be applied to obtain a number smaller than 2

  • The cost for one find operation is O(logn)\mathcal{O}(logn)

    because for each find, the path will be compressed,

  • The amortized time for a sequence of mm GenSet Union Find operations, nn of which are GenSet can be performed in time O(mα(n))\mathcal{O}(m\alpha(n)).

The complexity of Union-Find structure is Ω(α(n))\Omega(\alpha(n))

Ackerman’s function

inverse Ackerman’s function

MST

Implement: VE477 lab2 (Minimum Spnning Tree)

  1. Definitions
  • tree: acyclic, connected undirected graph; E=V1|E| = |V| - 1
  • any connected graph with NN nodes and N1N-1 edges is a tree
  • graph G=(V,E)G=(V,E), subgraph G=(V,E)G’ = (V’,E’) iff VV&UUV’\subseteq V \& U’\subseteq U
  • spanning tree of a connected undirected graph GG is a subgraph of GG that contains all the nodes and is a tree.
  • weighted, connected and undirected graph -> MST, which is a spanning tree with the minimum sum of weights.
  • TT is a tree: if it contained a cycle, at least one edge could be removed, allowing a lower weight while preserving the connected property of TT
  1. Algorithms

    a. Prim’s Algorithm

    • V=TTV = T \cup T’ , T:T: nodes added to MST; T:T’: nodes not in MST

    • For each node vTv \in T’, keep a measure D(v)D(v) storing current smallest weight over all edges that connect vv to a node in TT

    • Keep P(v)P(v) for each node vv: (P(v),v)(P(v), v) is the edge chosen in MST

    Full version:

    1. Pick one node ss, set D(s)=0D(s) = 0, for any other node vv, set D(v)=D(v)=\infty and P(v)P(v) unknown

    2. Set T=VT’ = V

    3. While $T’ \neq \emptyset $

    • For all nodes in TT’, choose a node vv that has the smallest D(v)D(v).
    • Remove vv from set TT'
    • For vsv’s neighbors uu that still in TT’, if exist D(u)>w(v,u)D(u)>w(v,u), then update D(u)D(u) as w(v,u)w(v,u) and P(u)P(u) as vv

    b. Kruskal’s Algorithm

    Greedy Algorithm, union-find data structure

    • sort all the edges G.EG.E by weight
    • for edges in G.EG.E in non-decreasing order, adding them into TT if no cycle would be created

    To check whether a cycle will be created, union find: whether two edges have the same root Find ; add them: Union

Network flow

网络流问题

image-20201030194352143
  • 对flow的限制:容量限制和流量守恒
  • Antiparallel: (v1,v2)E and (v2,v1)E(v_1,v_2)\in E \ and \ (v_2,v_1)\in E
  • multiple source and/or sink nodes: super node

Maximum Network Flow Problem

  • Residual graph GfG_f. (Ef2E|E_f|\leq 2|E|) 残存图

    image-20201030200750115
  • Argumenting path 增广路径. Residual capacity: on an argumenting path pp, the maximum flow that could be added to each edge s. 将沿着增广路径重复增加路径上的流量直到找到一个最大流. How to know we find a maximum flow? A flow is the maximum flow if and only if no argumenting path in the residual networks.

  • cuts of flow networks: One cut (S,T)(S,T) of flow network G=(V,E)G=(V,E), cut the VV as S and  T=VS,sS,tTS \ and\ \ T=V-S, s\in S,t\in T.

    If ff is a flow, then the net flow f(S,T)f(S,T) across the cut (S,T)(S, T) is defined to be

    f(S,T)=uSvTf(u,v)uSvTf(v,u)f(S,T)=\sum_{u\in S}\sum_{v \in T} f(u,v) - \sum_{u\in S}\sum_{v \in T} f(v,u)

    The capacity of a cut S(S,T)S(S,T) is: c(S,T)=uSvTc(u,v)c(S,T)=\sum_{u\in S}\sum_{v \in T} c(u,v).

  • Minimum cut: A minimum cut of a network is a cut whose capacity is minimum over all cuts of the network.

Max-flow Min-cun Theorem

Algo. Examples

counting inversions

of low implement: VE477 lab3.1.1

数逆序对 application: voting theory / analysis of search engines ranking / collaborative filtering

Divide and conquer approach 将list分为两半,recursively sort list并记录count (sort and count). 接着将两个list (merge and count)

Algorithm. (Merge and count)
Input : Two sorted lists: L1 = (l1,1,··· ,l1,n1), L2 = (l2,1,··· ,l2,n2) 
OutpuT: Number of inversions count, and L1 and L2 merged into L
Function MergeCount(L1 , L2 ):
  count ← 0; L ← ∅; i ← 1; j ← 1; 
  while i≤n1 andj≤n2 do
  	if l1,i ≤ l2,j then
  		append l1,i to L; i++;
  	else
  		append l2,j to L; count←count + n1 − i + 1; j++; end if
  end while
  if i > n1 then append l2,j,··· ,l2,n2 to L; 
  else append l1,i,··· ,l1,n1 to counL;
  return count and L
end

Algorithm. (Sort and count)
Input : A list L = (l1,··· ,ln)
Output : The number of inversions count and L sorted

Function SortCount(L):
  if n=1 then return 0 and L; 
  else
  	Split L into L1 = (l1,··· ,l⌈n/2⌉) and L2 = (l⌈n/2⌉+1,··· ,ln); 
  	count1, L1 ← SortCount(L1);
  	count2, L2 ← SortCount(L2);
  	count, L ←MergeCount(L1, L2);
  end if
  count ← count1 + count2 + count;
  return count and L 
end

time complexity: For merge O(n)

sort part: every time spilt it into two equal parts, a=2, b=2, f(n) = O(n), T(n)=Θ(nlogbalogn)=O(nlogn)T(n) = \Theta(n^{log_ba}logn) = \mathcal{O}(nlogn)

So total for Sort and Count is O(nlogn)\mathcal{O}(n \rm{log} n)

Stable Marriage Problem

Algorithm: Gale-Shapley

Implements 477 Lab3.1.2

Time complexity O(n2)\mathcal{O}(n^2)

N Queens Puzzle

Implementation: Leetcode #52

8*8棋盘上,皇后可以横直斜走,格数不限。皇后之间不可以相互攻击->任何两个皇后不可以在同一行,同一列以及同一斜线上。

backtracking:

image-20201017141331951

Data Structure

A bit comparison of different data structure.

Array has better memory locality and cache performance, arrays are contiguous memory blocks, so large chunks of them will be loaded into the cache upon first access.

Dictionary
Array (Unsorted/Sorted)
search(D,k)
insert(D,k)
delete(D,k)
predecessor(D,k)
successor(D,k)
minimum(D)
maximum(D)

Dictionary using array

image-20201109091428834

Dictionary using linked structure

Hashing

Basics

  • Setup: a universe UU of objects (eg. all names, .., very big)
  • Goal: maintain an evovling set SUS\subseteq U (eg. 100 names, … ,reasonable size)
  • original solution: array based / linked list based

Better solution

  • Hash table, an array A of n buckets n=cSn=c|S|
  • hashing function h:U0,1,,n1h: U \rightarrow {0,1,…,n-1}
  • store item kk in A[h(k)]

Collision: item with differnet search keys hashed into the same buckets (to solve: seperate chaining / open addressing)

Hash Function Design Criteria:

image-20200928093101940