Entropy

• (continuous) $X$ with cumulative distribution function $F(x)=Pr(X\leq x)$
• support set $S$ of $X$: $S=\{x:f(x)>0\}$, where $f=F'$ is the density
• differential entropy $h(X)$: $h(X)=-\int_Sf(x)\log f(x)dx$
• $h(X+c) = h(X)$
• $h(aX)=h(X)+\log|a|$
• $h(AX)=h(X)+\log|\det A|$
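• $h(X)$ may be negative (the density $f(x)$ may exceed 1)
• uniform on $[0,a]$: $h(X)=\log a$
• Gaussian $\mathcal{N}(\mu,\sigma^2)$: $h(X)=\frac{1}{2}\log(2\pi e\sigma^2)$
• a continuous $X$ carries infinite information: describing it exactly takes infinitely many bits
• so $h(X)$ does not serve as a measure of the average amount of information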
• $h(X_1,X_2,\cdots,X_n)=-\int f(x^n)\log f(x^n)dx^n$
• $h(X|Y)=-\int f(x,y)\log f(x|y)dxdy$
• Relative Entropy: $D(f\|g)=\int f\log\frac{f}{g}\geq 0$
• mutual information: $I(X;Y)=\int f(x,y)\log\frac{f(x,y)}{f(x)f(y)}dxdy\geq 0$
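The closed forms above can be checked numerically. The following is a minimal sketch, not a reference implementation: it estimates $-\int f\log f$ with a midpoint rule in nats, and the helper names (`differential_entropy`, `uniform_pdf`, `gaussian_pdf`) and the chosen parameters are illustrative assumptions.

```python
import math

def differential_entropy(pdf, lo, hi, steps=200_000):
    """Midpoint-rule estimate of -integral of f(x) ln f(x) over [lo, hi], in nats."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0.0:  # points outside the support contribute nothing
            total -= fx * math.log(fx) * dx
    return total

def uniform_pdf(a):
    """Density of the uniform distribution on [0, a]."""
    return lambda x: 1.0 / a if 0.0 <= x <= a else 0.0

def gaussian_pdf(mu, sigma):
    """Density of N(mu, sigma^2)."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda x: norm * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Uniform on [0, 0.5]: the density equals 2 > 1, so the entropy is negative.
h_uniform = differential_entropy(uniform_pdf(0.5), 0.0, 0.5)

# Gaussian with sigma = 2, integrated over +/- 20 standard deviations.
h_gauss = differential_entropy(gaussian_pdf(0.0, 2.0), -40.0, 40.0)

print(h_uniform, math.log(0.5))
print(h_gauss, 0.5 * math.log(2.0 * math.pi * math.e * 4.0))
```

Each printed pair should agree closely, and the uniform case illustrates that $h(X)$ can be negative.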

Relation to discrete entropy

• $X^\Delta=x_i$ if $i\Delta\leq X<(i+1)\Delta$
• $p_i=Pr(X^{\Delta}=x_i)=\int_{i\Delta}^{(i+1)\Delta}f(x)dx=f(x_i)\Delta$, with $x_i$ chosen inside the bin by the mean value theorem
• $H(X^{\Delta})=-\sum_i\Delta f(x_i)\log f(x_i)-\log \Delta$
• as $\Delta\rightarrow 0,H(X^\Delta)+\log \Delta\rightarrow h(f)=h(X)$
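The limit can be observed numerically for a standard normal. This is a sketch under stated assumptions: it uses exact bin probabilities from the normal CDF (rather than the approximation $f(x_i)\Delta$), truncates the support to $[-10,10)$, and works in nats; the function names are illustrative.

```python
import math

def std_normal_cdf(x):
    """CDF of N(0, 1) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def quantized_entropy(delta, span=10.0):
    """H(X^Delta) in nats for X ~ N(0,1), with bins of width delta on [-span, span)."""
    n_bins = int(round(2.0 * span / delta))
    entropy = 0.0
    for i in range(n_bins):
        left = -span + i * delta
        p = std_normal_cdf(left + delta) - std_normal_cdf(left)
        if p > 0.0:  # far-tail bins can underflow to zero probability
            entropy -= p * math.log(p)
    return entropy

h_exact = 0.5 * math.log(2.0 * math.pi * math.e)

# Gap between H(X^Delta) + ln(Delta) and h(X) for shrinking bin widths.
gaps = {d: quantized_entropy(d) + math.log(d) - h_exact for d in (1.0, 0.5, 0.1, 0.01)}
for d, gap in gaps.items():
    print(d, gap)
```

The gap shrinks as $\Delta$ decreases, while $H(X^\Delta)$ itself grows like $-\log\Delta$.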

AEP

• for $X_1,X_2,\cdots,X_n$ i.i.d. with density $f$: $-\frac{1}{n}\log f(X_1,X_2,\cdots,X_n)\rightarrow E(-\log f(X))=h(f)$ in probability
• $A_\epsilon^{(n)}=\{(x_1,x_2,\cdots,x_n)\in S^n:|-\frac{1}{n}\log f(x_1,\cdots, x_n)-h(X)|\leq\epsilon\}$
• $\text{Vol}(A)=\int_Adx_1dx_2\cdots dx_n$
• Properties (with $h(X)$ measured in bits)
• $Pr(A_\epsilon^{(n)})>1-\epsilon$ for $n$ sufficiently large
• $\text{Vol}(A_\epsilon^{(n)})\leq 2^{n(h(X)+\epsilon)}$ for all $n$
• $\text{Vol}(A_\epsilon^{(n)})\geq (1-\epsilon)2^{n(h(X)-\epsilon)}$ for $n$ sufficiently large
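The convergence behind the AEP can be illustrated by simulation. A minimal sketch, assuming $X_i\sim\mathcal{N}(0,1)$ i.i.d., natural logarithms (nats rather than bits), and a fixed seed for repeatability; `sample_entropy_rate` is an illustrative name.

```python
import math
import random

def sample_entropy_rate(n, rng):
    """-(1/n) ln f(X_1, ..., X_n) for X_i i.i.d. N(0,1), in nats."""
    log_norm = 0.5 * math.log(2.0 * math.pi)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        total += log_norm + 0.5 * x * x  # -ln f(x) for the standard normal
    return total / n

rng = random.Random(0)  # fixed seed so the run is repeatable
h_exact = 0.5 * math.log(2.0 * math.pi * math.e)

# The sample average of -ln f(X_i) approaches h(X) as n grows.
estimates = {n: sample_entropy_rate(n, rng) for n in (10, 1_000, 100_000)}
for n, est in estimates.items():
    print(n, est, h_exact)
```

For large $n$ the estimate falls within a small $\epsilon$ of $h(X)$, which is exactly the event that the sample lies in $A_\epsilon^{(n)}$.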

Covariance Matrix

• $\text{cov}(X,Y)=E(X-EX)(Y-EY)=E(XY)-(EX)(EY)$
• random vector $\vec X$: $K_X=E(X-EX)(X-EX)^T=[\text{cov}(X_i,X_j)]$
• correlation matrix: $\widetilde K_X=EXX^T=[EX_iX_j]$
• both are symmetric and positive semidefinite
• $K_X=\widetilde K_X-(EX)(EX)^T$
• $Y=AX$
• $K_Y=AK_XA^T$
• $\widetilde K_Y=A\widetilde K_XA^T$
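The transformation rule $K_Y=AK_XA^T$ also holds exactly for empirical covariances, which gives a quick numerical check. The sketch below uses plain Python lists; the matrix `A`, the sample distribution, and the helper names are arbitrary illustrative choices.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def covariance(samples):
    """Empirical covariance matrix of a list of equal-length vectors (divides by N)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    return [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
             for j in range(d)] for i in range(d)]

rng = random.Random(1)
A = [[2.0, 1.0], [0.0, -3.0]]  # arbitrary illustrative matrix

# Samples of X (non-zero mean, non-Gaussian second component) and of Y = A X.
xs = [[rng.gauss(1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(2_000)]
ys = [[sum(A[i][k] * x[k] for k in range(2)) for i in range(2)] for x in xs]

K_X = covariance(xs)
K_Y = covariance(ys)
K_Y_formula = matmul(matmul(A, K_X), transpose(A))

print(K_Y)
print(K_Y_formula)
```

The two printed matrices should agree up to floating-point rounding, regardless of the distribution of $X$.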

Multivariate Normal Distribution

$f(x)=\frac{1}{(2\pi)^{\frac{n}{2}}|K|^{\frac{1}{2}}}\exp(-\frac{1}{2}(x-\mu)^TK^{-1}(x-\mu))$

• jointly Gaussian components that are uncorrelated are independent
• $h(X_1,X_2,\cdots,X_n)=h(\mathcal{N}(\mu, K))=\frac{1}{2}\log\left((2\pi e)^n|K|\right)$
• the mutual information between $X$ and $Y$ is $I(X;Y)=\sup_{P,Q}I([X]_P;[Y]_Q)$ over all finite partitions $P$ and $Q$
• Correlated Gaussian $(X,Y)\sim\mathcal{N}(0,K)$ $$K=\begin{bmatrix}\sigma^2 & \rho\sigma^2\\ \rho \sigma^2 & \sigma^2\end{bmatrix}$$
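For the correlated Gaussian pair, the entropy formula gives the mutual information in closed form. A short derivation, using $I(X;Y)=h(X)+h(Y)-h(X,Y)$ and $|K|=\sigma^4(1-\rho^2)$, and assuming $|\rho|<1$:

$$I(X;Y)=\frac{1}{2}\log(2\pi e\sigma^2)+\frac{1}{2}\log(2\pi e\sigma^2)-\frac{1}{2}\log\left((2\pi e)^2\sigma^4(1-\rho^2)\right)=-\frac{1}{2}\log(1-\rho^2)$$

So $I(X;Y)=0$ when $\rho=0$, and $I(X;Y)\rightarrow\infty$ as $|\rho|\rightarrow 1$.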

Maximum Entropy

• if $X\in\mathbb{R}$ has mean $\mu$ and variance $\sigma^2$, then $h(X)\leq\frac{1}{2}\log(2\pi e\sigma^2)$ with equality iff $X\sim\mathcal{N}(\mu, \sigma^2)$
• if $X\in\mathbb{R}$ satisfies $EX^2\leq \sigma^2$, then $h(X)\leq\frac{1}{2}\log(2\pi e\sigma^2)$
• Problem: find the density $f$ over $S$ that maximizes $h(f)$ subject to $m$ moment constraints with values $\alpha_1,\cdots,\alpha_m$
• $f(x)\geq 0$
• $\int_S f(x)dx=1$
• $\int_S f(x)r_i(x)dx=\alpha_i$
• Maximum entropy distribution: $f^*(x)=f_\lambda(x)=e^{\lambda_0+\sum_{i=1}^m\lambda_ir_i(x)}$, with $\lambda_0,\cdots,\lambda_m$ chosen so that $f_\lambda$ satisfies the constraints
• $S=[a,b]$ with no other constraints: uniform distribution over this range
• $S=[0,\infty), EX=\mu$, then $f(x)=\frac{1}{\mu}e^{-\frac{x}{\mu}}$
• $S=(-\infty, \infty), EX=\alpha_1,EX^2=\alpha_2$, then $\mathcal{N}(\alpha_1,\alpha_2-\alpha_1^2)$
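To make the Gaussian maximum-entropy claim concrete, one can compare closed-form entropies of a few densities with the same variance. This is an illustrative sketch in nats; the choice of the uniform and Laplace distributions as comparison points, and the value of `sigma`, are assumptions made only for the example.

```python
import math

def h_gaussian(sigma):
    """Entropy of N(mu, sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

def h_uniform(sigma):
    """Entropy of a uniform density with variance sigma^2."""
    # A uniform density of width a has variance a^2 / 12, so a = sigma * sqrt(12).
    return math.log(sigma * math.sqrt(12.0))

def h_laplace(sigma):
    """Entropy of a Laplace density with variance sigma^2."""
    # A Laplace density with scale b has variance 2 b^2 and entropy 1 + ln(2b).
    b = sigma / math.sqrt(2.0)
    return 1.0 + math.log(2.0 * b)

sigma = 1.5  # illustrative value; the ordering does not depend on it
entropies = {
    "gaussian": h_gaussian(sigma),
    "laplace": h_laplace(sigma),
    "uniform": h_uniform(sigma),
}
for name, h in entropies.items():
    print(name, h)
```

The Gaussian value is the largest of the three, consistent with the bound $h(X)\leq\frac{1}{2}\log(2\pi e\sigma^2)$.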

Inequality

• $K$ is a nonnegative definite symmetric $n\times n$ matrix
• (Hadamard) $|K|\leq\prod K_{ii}$ with equality iff $K_{ij}=0,i\neq j$
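One way to see Hadamard's inequality from the entropy results in these notes is to take $X\sim\mathcal{N}(0,K)$ and apply subadditivity, $h(X_1,\cdots,X_n)\leq\sum_i h(X_i)$. A sketch, assuming $K$ is positive definite so that the density exists:

$$\frac{1}{2}\log\left((2\pi e)^n|K|\right)=h(X_1,\cdots,X_n)\leq\sum_{i=1}^n h(X_i)=\sum_{i=1}^n\frac{1}{2}\log(2\pi eK_{ii})=\frac{1}{2}\log\left((2\pi e)^n\prod_{i=1}^nK_{ii}\right)$$

Equality holds iff the $X_i$ are independent, which for a Gaussian vector means $K_{ij}=0$ for $i\neq j$.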

Balanced Information Inequality

• $h(X,Y)\leq h(X)+h(Y)$
• in general neither $h(X,Y)\geq h(X)$ nor $h(X,Y)\leq h(X)$ holds
• $[n]=\{1,2,\cdots,n\}$; for $\alpha\subseteq[n]$, $X_\alpha=(X_i:i\in\alpha)$
• a linear continuous inequality $\sum_\alpha w_\alpha h(X_\alpha)\geq 0$ is valid iff its corresponding discrete counterpart $\sum_\alpha w_\alpha H(X_\alpha)\geq 0$ is valid and balanced
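A sketch of why balance is needed, assuming "balanced" means that for every $i$ the weights of the subsets containing $i$ sum to zero (the notes do not spell out the definition). Replacing $X_i$ by $aX_i$ adds $\log|a|$ to every $h(X_\alpha)$ with $i\in\alpha$, so

$$\sum_\alpha w_\alpha h(X_\alpha)\;\longmapsto\;\sum_\alpha w_\alpha h(X_\alpha)+\log|a|\sum_{\alpha\ni i}w_\alpha$$

Unless the last sum is zero, a suitable $a$ drives the expression below zero. For example, $h(X)+h(Y)-h(X,Y)\geq 0$ has weights $1,1,-1$ and is balanced, while $h(X,Y)-h(X)\geq 0$ leaves a net weight of $1$ on $Y$ and fails for continuous variables.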

Han’s Inequality

• $h_k^{(n)}=\frac{1}{\binom{n}{k}}\sum_{S:|S|=k}\frac{h(X(S))}{k}$
• $g_k^{(n)}=\frac{1}{\binom{n}{k}}\sum_{S:|S|=k}\frac{h(X(S)|X(S^c))}{k}$
• Han’s Inequality: $h_1^{(n)}\geq h_2^{(n)}\geq\cdots\geq h_n^{(n)}=h(X_1,\cdots,X_n)/n=g_n^{(n)}\geq\cdots\geq g_2^{(n)}\geq g_1^{(n)}$
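As a worked special case, take $n=2$. The definitions give $h_1^{(2)}=\frac{1}{2}(h(X_1)+h(X_2))$, $h_2^{(2)}=g_2^{(2)}=\frac{1}{2}h(X_1,X_2)$ and $g_1^{(2)}=\frac{1}{2}(h(X_1|X_2)+h(X_2|X_1))$, so the chain reads

$$\frac{h(X_1)+h(X_2)}{2}\geq\frac{h(X_1,X_2)}{2}\geq\frac{h(X_1|X_2)+h(X_2|X_1)}{2}$$

Both inequalities are equivalent to $I(X_1;X_2)\geq 0$: the first is subadditivity, and the second follows from $h(X_1|X_2)+h(X_2|X_1)=2h(X_1,X_2)-h(X_1)-h(X_2)$.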

Information of Heat

• Heat equation (Fourier's heat conduction equation): $x$ is position and $t$ is time, $\frac{\partial}{\partial t}f(x, t)=\frac{1}{2}\frac{\partial^2}{\partial x^2}f(x,t)$
• $Y_t=X+\sqrt{t}Z,Z\sim\mathcal{N}(0,1)$, then $f(y;t)=\int f(x)\frac{1}{\sqrt{2\pi t}}e^{-\frac{(y-x)^2}{2t}}dx$
• Gaussian channel and heat equation: the density $f(y;t)$ of $Y_t$ solves the heat equation
• Fisher Information: $I(X)=\int_{-\infty}^{+\infty}f(x)[\frac{\frac{\partial}{\partial x}f(x)}{f(x)}]^2dx$
• De Bruijn’s Identity: $\frac{\partial}{\partial t}h(Y_t)=\frac{1}{2}I(Y_t)$
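De Bruijn's identity can be checked in the one case where everything is available in closed form. A minimal sketch, assuming $X\sim\mathcal{N}(0,s^2)$ so that $Y_t\sim\mathcal{N}(0,s^2+t)$, with entropies in nats; the numeric values and function names are illustrative.

```python
import math

def h_gaussian(variance):
    """Differential entropy of N(0, variance), in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)

def fisher_gaussian(variance):
    """Fisher information of N(0, variance)."""
    # For N(0, v): (f'/f)^2 = x^2 / v^2, whose expectation is 1 / v.
    return 1.0 / variance

s2, t, eps = 2.0, 0.7, 1e-6  # Var(X), time, finite-difference step (illustrative)

# Left side: central finite difference of h(Y_t) in t, with Var(Y_t) = s2 + t.
dh_dt = (h_gaussian(s2 + t + eps) - h_gaussian(s2 + t - eps)) / (2.0 * eps)

# Right side: half the Fisher information of Y_t.
rhs = 0.5 * fisher_gaussian(s2 + t)

print(dh_dt, rhs)
```

Both sides equal $\frac{1}{2(s^2+t)}$ here; the identity itself holds for general $X$, but checking that would need numerical convolution.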

Entropy power inequality

• EPI (entropy power inequality): for independent $n$-dimensional $X$ and $Y$, $e^{\frac{2}{n}h(X+Y)}\geq e^{\frac{2}{n}h(X)}+e^{\frac{2}{n}h(Y)}$ (entropies in nats), described as “the most powerful tool”
• Uncertainty principle
• Young’s inequality
• Nash’s inequality
• Cramér-Rao bound: for an unbiased estimator $\hat\theta$ of $\theta$, $V(\hat \theta)\geq\frac{1}{I(\theta)}$
• FII (Fisher information inequality): for independent $X$ and $Y$, $\frac{1}{I(X+Y)}\geq\frac{1}{I(X)}+\frac{1}{I(Y)}$
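The EPI can be checked numerically in a simple scalar case ($n=1$). The sketch below assumes $X$ and $Y$ are independent and uniform on $[0,1]$, so $X+Y$ has the triangular density on $[0,2]$; entropies are estimated in nats with a midpoint rule, and all names are illustrative.

```python
import math

def entropy_nats(pdf, lo, hi, steps=100_000):
    """Midpoint-rule estimate of -integral of f ln f over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0.0:
            total -= fx * math.log(fx) * dx
    return total

def uniform_pdf(x):
    """Density of Uniform[0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def triangular_pdf(y):
    """Density of the sum of two independent Uniform[0, 1] variables."""
    if 0.0 <= y <= 1.0:
        return y
    if 1.0 < y <= 2.0:
        return 2.0 - y
    return 0.0

h_x = entropy_nats(uniform_pdf, 0.0, 1.0)       # h(X) = h(Y)
h_sum = entropy_nats(triangular_pdf, 0.0, 2.0)  # h(X + Y)

lhs = math.exp(2.0 * h_sum)                     # entropy power of the sum
rhs = math.exp(2.0 * h_x) + math.exp(2.0 * h_x)

print(lhs, rhs)
```

The inequality is strict here. For independent Gaussians it holds with equality, since $e^{2h}=2\pi e\sigma^2$ and variances add.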