Preliminary

• A Mathematical Theory of Communication - Shannon 1948
• Jensen's inequality (convex $f$): $f(\sum_ip_ix_i)\leq \sum_ip_if(x_i)$
• $f(E_p x_i)\leq E_pf(x_i)$
• convex: $f''(x)\geq 0$
• $f(x)=-x\log x$ is concave
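Jensen's inequality can be checked numerically. A minimal sketch for the concave $f(x)=-x\log x$ (the weights and sample points below are arbitrary choices), where the inequality flips to $f(E_p x_i)\geq E_pf(x_i)$:

```python
import math

def f(x):
    # f(x) = -x log2 x, concave on (0, 1]
    return -x * math.log2(x)

p = [0.2, 0.3, 0.5]   # probability weights (arbitrary example)
xs = [0.1, 0.4, 0.9]  # sample points in (0, 1]

mean = sum(pi * xi for pi, xi in zip(p, xs))
lhs = f(mean)                                    # f(E[X])
rhs = sum(pi * f(xi) for pi, xi in zip(p, xs))   # E[f(X)]
print(lhs >= rhs)  # True: Jensen for a concave function
```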

Entropy

Entropy: $H(X)=-\sum_{x\in\mathcal{X}}p(x)\log p(x)=E_{p}\log\frac{1}{p(X)}$

• convention: $0\log 0=0$
• $0\leq H(X)\leq \log|\mathcal{X}|$
• uniform $X$: $H(X)=\log|\mathcal{X}|$
• change of base: $H_b(X)=(\log_b a)H_a(X)$
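The bounds $0\leq H(X)\leq\log|\mathcal{X}|$ can be illustrated with a small sketch; the three example distributions below are assumed for illustration:

```python
import math

def entropy(p):
    # H(X) = -sum p(x) log2 p(x), with the convention 0 log 0 = 0
    return -sum(px * math.log2(px) for px in p if px > 0)

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]
deterministic = [1.0, 0.0, 0.0, 0.0]

print(entropy(uniform))        # log2 4 = 2.0, the maximum
print(entropy(skewed))         # strictly between 0 and 2
print(entropy(deterministic))  # 0.0, no uncertainty
```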

Joint Entropy: $H(X,Y)=-E\log p(X,Y)=-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log p(x,y)$

• $H(X,X) = H(X)$
• $H(X,Y) = H(Y,X)$

Conditional Entropy: $H(Y|X)=\sum_{x\in\mathcal{X}}p(x)H(Y|X=x)=-\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}}p(x,y)\log p(y|x)=-E\log p(Y|X)$

• $H(Y|X)\leq H(Y)$, equality iff $X,Y$ independent (conditioning reduces entropy)
• $H(Y|X)$: remaining uncertainty about $Y$ when $X$ is known
• $H(X|Y)\neq H(Y|X)$
• $H(X|Y)+H(Y)=H(Y|X)+H(X)=H(X,Y)$
• Chain rule: $H(X_1,X_2,\cdots,X_n)=H(X_1)+H(X_2|X_1)+\cdots+H(X_n|X_{n-1},\cdots,X_1)$
• Zero conditional entropy: if $H(Y|X)=0$, then $Y=f(X)$ for some function $f$
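The chain rule $H(X,Y)=H(X)+H(Y|X)$ can be verified numerically; a minimal sketch with an assumed joint distribution on $\{0,1\}^2$:

```python
import math

def H(dist):
    # entropy in bits; dist maps outcomes to probabilities
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# an assumed joint distribution p(x, y) on {0,1} x {0,1}
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# marginal p(x)
px = {x: sum(p for (a, _), p in pxy.items() if a == x) for x in (0, 1)}

# H(Y|X) = sum_x p(x) H(Y | X = x)
HY_given_X = sum(
    px[x] * H({y: pxy[(x, y)] / px[x] for y in (0, 1)})
    for x in (0, 1)
)

# chain rule: H(X, Y) = H(X) + H(Y|X)
print(abs(H(pxy) - (H(px) + HY_given_X)) < 1e-12)  # True
```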

Relative Entropy (Kullback-Leibler divergence): $D(p\|q)=\sum_{x\in \mathcal{X}}p(x)\log\frac{p(x)}{q(x)}=E_p\log\frac{p(X)}{q(X)}=E_p(-\log q(X))-H(p)$

• $D(p\|q)\geq 0$, equality iff $p(x)=q(x)$ for all $x$ (prove via Jensen's inequality)
• variational distance $V(p,q)=\sum_{x\in\mathcal{X}}|p(x)-q(x)|$
• Pinsker's inequality: $D(p\|q)\geq\frac{1}{2\ln 2}V^2(p,q)$
• $D(p\|u)=\log|\mathcal{X}|-H(X)$, where $u$ is the uniform distribution on $\mathcal{X}$
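Both the uniform-reference identity and Pinsker's inequality can be spot-checked; a sketch with an assumed dyadic distribution $p$ on a 4-symbol alphabet:

```python
import math

def kl(p, q):
    # D(p || q) in bits, assuming q(x) > 0 wherever p(x) > 0
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.125, 0.125]
u = [0.25] * 4  # uniform on the same alphabet

H_p = -sum(px * math.log2(px) for px in p if px > 0)
# D(p||u) = log|X| - H(p)
print(abs(kl(p, u) - (math.log2(4) - H_p)) < 1e-12)  # True

# Pinsker: D(p||q) >= V(p,q)^2 / (2 ln 2)
V = sum(abs(px - qx) for px, qx in zip(p, u))
print(kl(p, u) >= V * V / (2 * math.log(2)))  # True
```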

Mutual Information: $I(X;Y)=\sum_x\sum_yp(x,y)\log\frac{p(x,y)}{p(x)p(y)}=D(p(x,y)\|p(x)p(y))=E_{p(x,y)}\log\frac{p(X,Y)}{p(X)p(Y)}$

• $I(X;Y)=0$ iff $X$ and $Y$ are independent
• $I(X;X)=H(X)$
• $I(X;Y)=H(X)-H(X|Y)=H(X)+H(Y)-H(X,Y)$
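The two characterizations of $I(X;Y)$, as a KL divergence and as $H(X)+H(Y)-H(X,Y)$, can be checked against each other; a sketch with an assumed joint distribution:

```python
import math

def H(probs):
    # entropy in bits of an iterable of probabilities
    return -sum(p * math.log2(p) for p in probs if p > 0)

# an assumed joint distribution on {0,1} x {0,1}
pxy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
px = {x: sum(p for (a, _), p in pxy.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in pxy.items() if b == y) for y in (0, 1)}

# I(X;Y) as D(p(x,y) || p(x)p(y))
I_kl = sum(p * math.log2(p / (px[x] * py[y]))
           for (x, y), p in pxy.items() if p > 0)

# I(X;Y) as H(X) + H(Y) - H(X,Y)
I_ent = H(px.values()) + H(py.values()) - H(pxy.values())

print(abs(I_kl - I_ent) < 1e-12)  # True: the two expressions agree
```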

Conditional Mutual Information: $I(X;Y|Z)=H(X|Z)-H(X|Y,Z)=E_{p(x,y,z)}\log\frac{p(X,Y|Z)}{p(X|Z)p(Y|Z)}$

• Chain rule: $I(X_1,X_2,\cdots,X_n;Y)=\sum_{i=1}^nI(X_i;Y|X_{i-1},\cdots,X_1)$

Conditional Relative Entropy: $D(p(y|x)\|q(y|x))=E_{p(x,y)}\log \frac{p(Y|X)}{q(Y|X)}$

Inequality

• $H(X_1,X_2,\cdots,X_n)\leq\sum_{i=1}^nH(X_i)$ equality iff $X_i$ are independent
• no general ordering between $I(X;Y|Z)$ and $I(X;Y)$:
• $X\rightarrow Y\rightarrow Z:I(X;Z|Y)=0$
• independent fair bits $X,Y$ with $Z=X+Y\bmod 2$: $I(X;Y|Z)>I(X;Y)=0$
• Data Processing Inequality: $X\rightarrow Y\rightarrow Z,I(X;Y)\geq I(X;Z)$
• $X\rightarrow Y\rightarrow Z,I(X;Y)=I(X;Z)+I(X;Y|Z)$
• $I(X;Y;Z)=I(X;Y)-I(X;Y|Z)$
• Perfect Secrecy: $H(X)\leq H(Z)$, i.e. the message entropy is at most the key entropy
• Fano’s Inequality: relationship between $P_e$ and $H(X|Y)$
• $X\rightarrow Y\rightarrow \hat X,P_e=P(\hat X\neq X)$
• $H(P_e)+P_e\log(|\mathcal{X}|-1)\geq H(X|\hat X)\geq H(X|Y)$
• weaken: $P_e\geq\frac{H(X|Y)-1}{\log |\mathcal{X}|}$
• Log sum inequality: for nonnegative numbers, $\sum_{i=1}^na_i\log\frac{a_i}{b_i}\geq\left(\sum_{i=1}^na_i\right)\log\frac{\sum_{i=1}^na_i}{\sum_{i=1}^nb_i}$, equality iff $\frac{a_i}{b_i}=c$ for all $i$
• $H(p)$ is concave in $p$
• $D(p\|q)$ is convex in the pair $(p,q)$
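The XOR example above, where conditioning increases mutual information, can be computed exactly; a sketch with $X,Y$ independent fair bits and $Z=X\oplus Y$ (the identity $I(X;Y|Z)=H(X,Z)+H(Y,Z)-H(Z)-H(X,Y,Z)$ follows from the definition via the chain rule):

```python
import math
from itertools import product

def H(probs):
    # entropy in bits of an iterable of probabilities
    return -sum(p * math.log2(p) for p in probs if p > 0)

# X, Y independent fair bits, Z = X xor Y
pxyz = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def marginal(keep):
    # sum out all coordinates whose index is not in `keep`
    out = {}
    for outcome, p in pxyz.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

# I(X;Y) = H(X) + H(Y) - H(X,Y), zero since X, Y are independent
I_xy = (H(marginal((0,)).values()) + H(marginal((1,)).values())
        - H(marginal((0, 1)).values()))

# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
I_xy_given_z = (H(marginal((0, 2)).values()) + H(marginal((1, 2)).values())
                - H(marginal((2,)).values()) - H(pxyz.values()))

print(I_xy)          # 0.0
print(I_xy_given_z)  # 1.0: conditioning on Z creates dependence
```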