Existence of Code

A source code $C$ for a random variable $X$ is a mapping from $\mathcal{X}$ to $\mathcal{D}^*$, the set of finite-length strings over a $D$-ary alphabet

• $\mathcal{X}$: the range of $X$
• $D$-ary alphabet: $\mathcal{D}=\{0,1,\cdots,D-1\}$
• $C(x)$: codeword for $x$
• $l(x)$: length of $C(x)$
• $L(C)=\sum_{x\in\mathcal{X}}p(x)l(x)$: the expected length of the code $C$ for a random variable $X$ with distribution $p(x)$
• rate $R$: average length per source symbol
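A minimal sketch of the expected-length computation $L(C)=\sum_x p(x)l(x)$; the distribution and codewords below are illustrative, not from the text.

```python
# Illustrative binary code for a three-symbol source (assumed example).
p = {"a": 0.5, "b": 0.25, "c": 0.25}
C = {"a": "0", "b": "10", "c": "11"}

# Expected length L(C) = sum over x of p(x) * l(x), where l(x) = len(C(x)).
L = sum(p[x] * len(C[x]) for x in p)
print(L)  # 1.5 code symbols per source symbol
```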

Nonsingular code

• every element of the range $\mathcal{X}$ maps into a different string in $\mathcal{D}^*$
• $C^*$: the extension of a code $C$ is the mapping from finite-length strings of $\mathcal{X}$ to finite-length strings of $\mathcal{D}$ defined by $C^*(x_1x_2\cdots x_n)=C(x_1)C(x_2)\cdots C(x_n)$
• Uniquely decodable: the extension $C^*$ is nonsingular

Prefix Code (instantaneous code)

• no codeword is a prefix of any other codeword
• Tree representation
• Kraft Inequality (1949): For any instantaneous code over an alphabet of size $D$, the codeword lengths $l_1,l_2,\cdots,l_m$ must satisfy the inequality $\sum_{i=1}^mD^{-l_i}\leq 1$
• Conversely, given codeword lengths satisfying this inequality, an instantaneous code with these lengths exists.
• Interval representation
• Extended Kraft Inequality: the inequality (and its converse) also holds for any countably infinite set of codewords of an instantaneous code
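The converse direction of the Kraft inequality is constructive: sorting the lengths and assigning codewords in order (the usual "canonical code" construction, sketched here under that assumption) always yields a prefix code.

```python
def kraft_sum(lengths, D=2):
    # Kraft sum: sum of D^(-l_i); a prefix code exists iff this is <= 1.
    return sum(D ** (-l) for l in lengths)

def prefix_code_from_lengths(lengths):
    """Canonical binary prefix code: assign codewords in order of
    nondecreasing length, incrementing a counter at each step."""
    assert kraft_sum(lengths) <= 1, "lengths violate the Kraft inequality"
    code, val, prev_len = [], 0, 0
    for l in sorted(lengths):
        val <<= (l - prev_len)            # pad out to the new length
        code.append(format(val, f"0{l}b"))
        val += 1                          # next codeword cannot extend this one
        prev_len = l
    return code

print(prefix_code_from_lengths([1, 2, 2]))  # ['0', '10', '11']
```

The increment-then-extend step guarantees that no assigned codeword is a prefix of a later one, which is exactly the interval-representation argument.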

Optimal Codes

• optimal: $\sum p_il_i$ is minimal
• optimization problem (Lagrange multipliers): $\min_{l_i} L=\sum p_il_i$ subject to $\sum D^{-l_i}\leq 1$
• solution: $p_i=D^{-l_i}$, i.e. $l_i^*=-\log_D p_i$, hence $L^*\geq H_D(X)$
• Bounds: $H_D(X)\leq L^*<H_D(X)+1$
• Shannon codes: $l_i=\lceil\log_D\frac{1}{p_i}\rceil$
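A quick numerical check of the Shannon lengths $l_i=\lceil\log_D\frac{1}{p_i}\rceil$ and the bound $H_D(X)\leq L^*<H_D(X)+1$, for $D=2$ and an illustrative distribution.

```python
import math

def shannon_lengths(p, D=2):
    # Shannon code lengths: l_i = ceil(log_D(1 / p_i)).
    return [math.ceil(math.log(1.0 / pi, D)) for pi in p]

p = [0.4, 0.3, 0.2, 0.1]                     # assumed example distribution
lengths = shannon_lengths(p)                 # [2, 2, 3, 4]
L = sum(pi * li for pi, li in zip(p, lengths))
H = -sum(pi * math.log2(pi) for pi in p)     # binary entropy H(X)
print(lengths, L, H)                         # H <= L < H + 1
```

These lengths automatically satisfy the Kraft inequality, since $D^{-\lceil\log_D 1/p_i\rceil}\leq p_i$ and the $p_i$ sum to 1.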

Encode $n$ symbols together (block coding amortizes the $+1$ in $H_D(X)+1$)

• $\frac{H(X_1,X_2,\cdots,X_n)}{n}\leq L_n^*<\frac{H(X_1,X_2,\cdots,X_n)}{n}+\frac{1}{n}$; for a stationary stochastic process, $L_n^*\rightarrow H(\mathcal{X})$, the entropy rate
• shortcoming: the block alphabet has size $|\mathcal{X}|^n$, which grows exponentially in $n$

Wrong code

• The expected length under $p(x)$ of the code assignment $l(x)=\lceil\log\frac{1}{q(x)}\rceil$ satisfies $H(p)+D(p\|q)\leq E_pl(X)<H(p)+D(p\|q)+1$
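A numerical check of the wrong-code penalty, with illustrative distributions $p$ and $q$: designing for $q$ while the truth is $p$ costs an extra $D(p\|q)$ bits on average.

```python
import math

p = [0.5, 0.25, 0.25]   # true distribution (assumed example)
q = [0.25, 0.25, 0.5]   # mismatched distribution used to design the code

lengths = [math.ceil(math.log2(1 / qi)) for qi in q]       # l(x) = ceil(log 1/q(x))
E_l = sum(pi * li for pi, li in zip(p, lengths))           # expected length under p
H = -sum(pi * math.log2(pi) for pi in p)                   # H(p)
KL = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))  # D(p || q)
print(E_l, H + KL)   # bounds: H + KL <= E_l < H + KL + 1
```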

Kraft Inequality for Uniquely Decodable Codes (McMillan): The codeword lengths of any uniquely decodable $D$-ary code must satisfy the Kraft inequality $\sum_i D^{-l_i}\leq 1$. Conversely, given a set of codeword lengths satisfying this inequality, it is possible to construct a uniquely decodable code with these codeword lengths. (Hence prefix codes lose nothing: any set of lengths achievable by a uniquely decodable code is also achievable by a prefix code.)

Codes

Huffman Codes

• $D$-ary Huffman codes (prefix code) for a given distribution: Each time combine $D$ symbols with the lowest probabilities into a single source symbol, until there is only one symbol
• if $|\mathcal{X}|$ is not of the form $1+k(D-1)$, add dummy symbols of probability 0 so that the total number of symbols is $1+k(D-1)$
• optimality: if $C'$ is any other uniquely decodable code, then $L(C)\leq L(C')$
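A minimal sketch of binary ($D=2$) Huffman coding: repeatedly merge the two least probable nodes, then read codewords off the resulting tree. The heap entries carry an insertion counter as a tie-breaker, an implementation detail not fixed by the algorithm itself.

```python
import heapq

def huffman_code(probs):
    """Binary Huffman code for probs = {symbol: probability}.
    Returns {symbol: codeword}."""
    # Heap entries: (probability, tie-break counter, symbol or subtree).
    heap = [(p, i, sym) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # two lowest-probability nodes
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, counter, (left, right)))
        counter += 1
    code = {}
    def walk(node, prefix):
        # Internal nodes are (left, right) tuples; leaves are symbols.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"       # single-symbol edge case
        return code
    return walk(heap[0][2], "")

print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.25}))
```

For a $D$-adic distribution such as this one, the resulting expected length equals the entropy exactly.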

Shannon Codes

• $D$-adic distribution: $P(X=x_i)=D^{-n_i}$ for integers $n_i$; then $l_i=\log_D\frac{1}{p_i}=n_i$
• Shannon codes attain the optimal expected length to within 1
• $D$-adic: Shannon codes are optimal
• $p_i\rightarrow0$: Shannon codes may be much worse than optimal, since $\lceil\log\frac{1}{p_i}\rceil$ blows up
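A concrete instance of the $p_i\rightarrow0$ failure, with an assumed two-symbol distribution: the Shannon code spends 14 bits on the rare symbol, while any binary code for two symbols (Huffman in particular) needs only 1 bit each.

```python
import math

p = [0.9999, 0.0001]                                    # assumed example
shannon = [math.ceil(math.log2(1 / pi)) for pi in p]    # [1, 14]
huffman = [1, 1]            # two symbols: Huffman assigns '0' and '1'

L_shannon = sum(pi * li for pi, li in zip(p, shannon))
L_huffman = sum(pi * li for pi, li in zip(p, huffman))
print(shannon, L_shannon, L_huffman)
```

Both codes still satisfy $L<H(X)+1$; the point is only that the Shannon lengths can be far from the optimal ones symbol by symbol.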

For any optimal coding scheme

• lengths are ordered inversely with probability (if $p_j>p_k$ then $l_j\leq l_k$)
• the two longest codewords have the same length
• two of the longest codewords differ only in the last bit

Shannon-Fano-Elias Coding

• $F(x)=\sum_{a\leq x}p(a)$
• modified cumulative distribution function: $\overline F(x)=\sum_{a<x}p(a)+\frac{1}{2}p(x)=F(x)+\frac{1}{2}p(x)$
• Truncate $\overline{F}(x)$ to $l(x)$ bits: $\overline F(x)-\lfloor\overline{F}(x)\rfloor_{l(x)}<\frac{1}{2^{l(x)}}$
• with $l(x)=\lceil\log\frac{1}{p(x)}\rceil+1$: $\frac{1}{2^{l(x)}}\leq\frac{p(x)}{2}=\overline{F}(x)-F(x-1)$, so the truncated value still lies within the step of $x$ and the code is prefix-free
• $L=\sum p(x)l(x)<H(X)+2$
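A minimal sketch of Shannon-Fano-Elias coding following the steps above: compute $\overline F(x)$, then emit its first $l(x)=\lceil\log\frac{1}{p(x)}\rceil+1$ binary digits. The example distribution is assumed for illustration.

```python
import math

def sfe_code(probs):
    """Shannon-Fano-Elias code: the codeword for x is the modified CDF
    Fbar(x) = F(x-1) + p(x)/2 truncated to l(x) = ceil(log2 1/p(x)) + 1 bits.
    `probs` is a list of (symbol, probability) pairs in a fixed order."""
    code = {}
    F = 0.0
    for sym, p in probs:
        Fbar = F + p / 2                       # midpoint of the step of x
        l = math.ceil(math.log2(1 / p)) + 1
        # k-th bit of the binary expansion of Fbar, for k = 1..l
        code[sym] = "".join(str(int(Fbar * 2 ** k) % 2) for k in range(1, l + 1))
        F += p
    return code

print(sfe_code([("a", 0.25), ("b", 0.5), ("c", 0.125), ("d", 0.125)]))
```

Note the codeword for "b" is 2 bits even though one bit would suffice; the extra bit per symbol is where the $+2$ in $L<H(X)+2$ comes from.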

Random Variable Generation

• input: a sequence of fair coin tosses $Z_1,\cdots,Z_n$
• $E(T)$: expected number of fair bits used to generate one sample of $X$
• generation tree: a binary tree whose leaves are labeled with symbols of $X$; coin tosses select a path from the root to a leaf
• $E(T)\geq H(X)$
• dyadic distribution: $E(T)=H(X)$
• otherwise: use the binary expansions of the probabilities, giving $H(X)\leq E(T)<H(X)+2$
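A minimal sketch of generation from fair bits for a dyadic distribution: walk a binary tree whose leaves carry the symbols, tossing one fair coin per level. The tree below encodes the assumed distribution $p=(\frac12,\frac14,\frac14)$.

```python
import random

def generate(tree, coin=lambda: random.randint(0, 1)):
    """Walk a binary tree whose leaves are symbols of X; each fair coin
    toss picks a branch. Returns (symbol, number of tosses T).
    For a dyadic distribution, E(T) = H(X)."""
    node, tosses = tree, 0
    while isinstance(node, tuple):   # internal node: (left, right)
        node = node[coin()]
        tosses += 1
    return node, tosses

# dyadic p = (1/2, 1/4, 1/4): leaf depths 1, 2, 2, so E(T) = 1.5 = H(X)
tree = ("a", ("b", "c"))
print(generate(tree))
```

Leaf depth equals the number of tosses spent, so $E(T)=\sum_x p(x)\,\mathrm{depth}(x)$, which for a dyadic distribution is exactly $H(X)$.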

Universal Source Coding

• minimax redundancy: $R^*=\min_q\max_{p_\theta}R=\min_q\max_{p_\theta}D(p_\theta\|q)$
• Arithmetic Coding
• LZ77: sliding window
• LZ78: Trie
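A minimal sketch of the LZ78 parsing step (the dictionary is shown as a flat dict for brevity; a real implementation would use the trie the note mentions):

```python
def lz78_parse(s):
    """LZ78 parsing: split s into phrases, each equal to a previously seen
    phrase plus one new symbol. Output is a list of (phrase index, symbol)
    pairs, where index 0 denotes the empty phrase."""
    dictionary = {"": 0}
    out, phrase = [], ""
    for ch in s:
        if phrase + ch in dictionary:
            phrase += ch                      # extend the current match
        else:
            out.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                                # flush a trailing repeated phrase
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out

print(lz78_parse("ABAABA"))   # [(0, 'A'), (0, 'B'), (1, 'A'), (2, 'A')]
```

Each emitted pair adds one node to the phrase trie, which is why the number of phrases (and hence the code length) grows slowly on compressible sources.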