• (continuous) $X$ with cumulative distribution function $F(x)=Pr(X\leq x)$
  • support set $S$ of $X$: $S=\{x:f(x)>0\}$, where $f(x)=F'(x)$ is the density
  • differential entropy $h(X)$: $h(X)=-\int_Sf(x)\log f(x)dx$
    • $h(X+c) = h(X)$
    • $h(aX)=h(X)+\log|a|$
    • $h(AX)=h(X)+\log|\det A|$
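    • $h(X)$ may be negative (the density $f(x)$ may exceed $1$)
  • uniform on $[0,a]$: $h(X)=\log a$
  • Gaussian $\mathcal{N}(\mu,\sigma^2)$: $h(X)=\frac{1}{2}\log 2\pi e\sigma^2$
  • $h(X)$ and infinite information: a continuous random variable generally contains an infinite amount of information
    • so $h(X)$ does not serve as a measure of the average amount of information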
  • $h(X_1,X_2,\cdots,X_n)=-\int f(x^n)\log f(x^n)dx^n$
  • $h(X|Y)=-\int f(x,y)\log f(x|y)dxdy$
  • Relative Entropy: $D(f\|g)=\int f\log\frac{f}{g}\geq 0$
  • mutual information: $I(X;Y)=\int f(x,y)\log\frac{f(x,y)}{f(x)f(y)}dxdy\geq 0$
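As a quick numerical check of the properties above, the following sketch estimates $-\int f\log f$ with a midpoint rule, using natural logarithms so that entropies are in nats. The helper names `differential_entropy`, `uniform_pdf` and `gaussian_pdf` are illustrative and not part of the notes.

```python
import math

def differential_entropy(pdf, lo, hi, steps=100_000):
    """Midpoint-rule estimate of -∫ f log f over [lo, hi], in nats."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0.0:  # only the support contributes
            total -= fx * math.log(fx) * dx
    return total

def uniform_pdf(a):
    """Density of the uniform distribution on [0, a]."""
    return lambda x: 1.0 / a if 0.0 <= x <= a else 0.0

def gaussian_pdf(sigma):
    """Density of N(0, sigma^2)."""
    return lambda x: math.exp(-x * x / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Uniform on [0, 0.5]: h = log 0.5, which is negative.
h_uniform = differential_entropy(uniform_pdf(0.5), 0.0, 0.5)
# Standard Gaussian: h = 0.5 * log(2*pi*e).
h_gauss = differential_entropy(gaussian_pdf(1.0), -12.0, 12.0)
# 3X has density N(0, 9): h(3X) should equal h(X) + log 3.
h_scaled = differential_entropy(gaussian_pdf(3.0), -36.0, 36.0)
```

The integration range of 12 standard deviations is an arbitrary cutoff chosen so that the neglected tails are far below floating-point precision.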

Relation to discrete

  • $X^\Delta=x_i$ if $i\Delta\leq X<(i+1)\Delta$
  • $p_i=Pr(X^{\Delta}=x_i)=\int_{i\Delta}^{(i+1)\Delta}f(x)dx=f(x_i)\Delta$, with $x_i$ chosen inside the bin by the mean value theorem
  • $H(X^{\Delta})=-\sum\Delta f(x_i)\log f(x_i)-\log \Delta$
  • as $\Delta\rightarrow 0,H(X^\Delta)+\log \Delta\rightarrow h(f)=h(X)$
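The limit $H(X^\Delta)+\log\Delta\rightarrow h(X)$ can be observed numerically. The sketch below quantizes a standard Gaussian with exact bin probabilities taken from the CDF and reports the gap to $h(X)=\frac{1}{2}\log 2\pi e$ in nats. The function names and the truncation at 12 standard deviations are assumptions made for illustration.

```python
import math

def gaussian_cdf(x, sigma=1.0):
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def quantized_entropy(delta, sigma=1.0, span=12.0):
    """H(X^Δ) in nats for X ~ N(0, σ²), bins [iΔ, (i+1)Δ) truncated to ±span·σ."""
    n_bins = int(round(span * sigma / delta))
    entropy = 0.0
    for i in range(-n_bins, n_bins):
        p = gaussian_cdf((i + 1) * delta, sigma) - gaussian_cdf(i * delta, sigma)
        if p > 0.0:
            entropy -= p * math.log(p)
    return entropy

h_exact = 0.5 * math.log(2 * math.pi * math.e)  # h(X) for σ = 1
# |H(X^Δ) + log Δ - h(X)| for successively finer quantization
gaps = [abs(quantized_entropy(d) + math.log(d) - h_exact) for d in (1.0, 0.1, 0.01)]
```

The list `gaps` shrinks as $\Delta$ decreases, which is the statement of the last bullet.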


  • AEP: for $X_1,X_2,\cdots,X_n$ i.i.d. $\sim f$, $-\frac{1}{n}\log f(X_1,X_2,\cdots,X_n)\rightarrow E(-\log f(X))=h(f)$ in probability
  • typical set: $A_\epsilon^{(n)}=\{(x_1,x_2,\cdots,x_n)\in S^n:|-\frac{1}{n}\log f(x_1,\cdots, x_n)-h(X)|\leq\epsilon\}$
  • $\text{Vol}(A)=\int_Adx_1dx_2\cdots dx_n$
  • Properties
    • $Pr(A_\epsilon^{(n)})>1-\epsilon$ for $n$ sufficiently large
    • $\text{Vol}(A_\epsilon^{(n)})\leq 2^{n(h(X)+\epsilon)}$ for all $n$
    • $\text{Vol}(A_\epsilon^{(n)})\geq (1-\epsilon)2^{n(h(X)-\epsilon)}$ for $n$ sufficiently large
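The convergence $-\frac{1}{n}\log f(X_1,\cdots,X_n)\rightarrow h(X)$ can be illustrated by simulation. This is a minimal sketch for i.i.d. $\mathcal{N}(0,\sigma^2)$ samples, in nats; the function name, seed and sample size are arbitrary choices for the illustration.

```python
import math
import random

def empirical_entropy_rate(n, sigma=1.0, seed=0):
    """Return -(1/n) log f(X_1, ..., X_n) for i.i.d. N(0, σ²) samples, in nats."""
    rng = random.Random(seed)
    log_norm = 0.5 * math.log(2 * math.pi * sigma**2)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sigma)
        total += log_norm + x * x / (2 * sigma**2)  # this term is -log f(x)
    return total / n

h_true = 0.5 * math.log(2 * math.pi * math.e)  # h(X) for σ = 1
estimate = empirical_entropy_rate(200_000)
```

Since the joint density of i.i.d. samples factorizes, the quantity is a sample mean of $-\log f(X_i)$, so `estimate` should land close to `h_true` by the law of large numbers.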

Covariance Matrix

  • cov($X$, $Y$)=$E(X-EX)(Y-EY)=E(XY)-(EX)(EY)$
  • $\vec X$: $K_X=E(X-EX)(X-EX)^T=[\text{cov}(X_i,X_j)]$
  • correlation matrix: $\widetilde K_X=EXX^T=[EX_iX_j]$
    • symmetric and positive semidefinite (as is $K_X$)
  • $K_X=\widetilde K_X-(EX)(EX)^T$
  • $Y=AX$
    • $K_Y=AK_XA^T$
    • $\widetilde K_Y=A\widetilde K_XA^T$
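The identities $K_X=\widetilde K_X-(EX)(EX)^T$ and $K_Y=AK_XA^T$ also hold exactly for empirical moments (with $1/n$ normalization), which makes them easy to check on a small data set. The sketch below uses plain Python lists; the data, the matrix `A` and all helper names are made up for the example.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def mean_vector(samples):
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]

def correlation_matrix(samples):
    """Empirical E[X X^T]."""
    n, d = len(samples), len(samples[0])
    return [[sum(s[i] * s[j] for s in samples) / n for j in range(d)] for i in range(d)]

def covariance_matrix(samples):
    """Empirical E[(X - EX)(X - EX)^T]."""
    mu = mean_vector(samples)
    centered = [[s[i] - mu[i] for i in range(len(mu))] for s in samples]
    return correlation_matrix(centered)

samples = [[1.0, 2.0], [2.0, 0.0], [4.0, 1.0], [5.0, 5.0]]  # made-up observations of X
A = [[1.0, 1.0], [2.0, -1.0]]                               # made-up linear map
transformed = [[sum(A[i][k] * s[k] for k in range(2)) for i in range(2)] for s in samples]

K_X = covariance_matrix(samples)
K_Y = covariance_matrix(transformed)                    # covariance of Y = AX from data
K_Y_from_formula = matmul(matmul(A, K_X), transpose(A)) # A K_X A^T
```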

Multivariate Normal Distribution


  • for jointly Gaussian variables, uncorrelated implies independent
  • $h(X_1,X_2,\cdots,X_n)=h(\mathcal{N}(\mu, K))=\frac{1}{2}\log(2\pi e)^n|K|$
  • the mutual information between $X$ and $Y$ is $I(X;Y)=\sup_{P,Q}I([X]_P;[Y]_Q)$ over all finite partitions $P$ and $Q$
  • Correlated Gaussian $(X,Y)\sim\mathcal{N}(0,K)$ with $$K=\begin{bmatrix}\sigma^2 & \rho\sigma^2\\ \rho \sigma^2 & \sigma^2\end{bmatrix}$$ has $I(X;Y)=h(X)+h(Y)-h(X,Y)=-\frac{1}{2}\log(1-\rho^2)$
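For the correlated Gaussian pair, combining $I(X;Y)=h(X)+h(Y)-h(X,Y)$ with the Gaussian entropy formula gives $I(X;Y)=-\frac{1}{2}\log(1-\rho^2)$. The sketch below evaluates both sides in nats; the function names and the values of $\sigma^2$ and $\rho$ are illustrative.

```python
import math

def gaussian_entropy(cov):
    """h(N(μ, K)) = ½ log((2πe)^n |K|) in nats, for a 1x1 or 2x2 covariance K."""
    n = len(cov)
    det = cov[0][0] if n == 1 else cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    return 0.5 * math.log((2 * math.pi * math.e) ** n * det)

def gaussian_mutual_information(sigma2, rho):
    """I(X;Y) = h(X) + h(Y) - h(X,Y) for the covariance matrix K above."""
    joint = [[sigma2, rho * sigma2], [rho * sigma2, sigma2]]
    marginal = [[sigma2]]  # X and Y share the same variance
    return 2 * gaussian_entropy(marginal) - gaussian_entropy(joint)

mi = gaussian_mutual_information(sigma2=2.0, rho=0.6)
closed_form = -0.5 * math.log(1 - 0.6**2)
```

Note that the result does not depend on $\sigma^2$, and it vanishes at $\rho=0$, consistent with uncorrelated Gaussians being independent.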

Maximum Entropy

  • if $X\in R$ has mean $\mu$ and variance $\sigma^2$, then $h(X)\leq\frac{1}{2}\log 2\pi e\sigma^2$ with equality iff $X\sim\mathcal{N}(\mu, \sigma^2)$
  • if $X\in R$ satisfies $EX^2\leq \sigma^2$, then $h(X)\leq\frac{1}{2}\log 2\pi e\sigma^2$ with equality iff $X\sim\mathcal{N}(0, \sigma^2)$
  • Problem: find the density $f$ over $S$ that maximizes $h(f)$ subject to the moment constraints $\alpha_1,\cdots,\alpha_m$
    • $f(x)\geq 0$
    • $\int_S f(x)dx=1$
    • $\int_S f(x)r_i(x)dx=\alpha_i$
  • Maximum entropy distribution: $f^*(x)=f_\lambda(x)=e^{\lambda_0+\sum_{i=1}^m\lambda_ir_i(x)}$, $x\in S$, with $\lambda_0,\cdots,\lambda_m$ chosen so that $f_\lambda$ satisfies the constraints
    • $S=[a,b]$ with no other constraints: uniform distribution over this range
    • $S=[0,\infty), EX=\mu$, then $f(x)=\frac{1}{\mu}e^{-\frac{x}{\mu}}$
    • $S=(-\infty, \infty), EX=\alpha_1,EX^2=\alpha_2$, then $\mathcal{N}(\alpha_1,\alpha_2-\alpha_1^2)$
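To make the maximum entropy claims concrete, the sketch below compares closed-form differential entropies (in nats) of a few familiar densities under the same constraint: equal variance on the real line, and equal mean on $[0,\infty)$. The Laplace and uniform competitors are chosen only for illustration and are not taken from the notes.

```python
import math

def entropy_gaussian(var):
    return 0.5 * math.log(2 * math.pi * math.e * var)

def entropy_uniform(var):
    width = math.sqrt(12.0 * var)   # a uniform of this width has variance width² / 12
    return math.log(width)

def entropy_laplace(var):
    scale = math.sqrt(var / 2.0)    # a Laplace with this scale has variance 2·scale²
    return 1.0 + math.log(2.0 * scale)

# Same variance on (-∞, ∞): the Gaussian should have the largest entropy.
var = 1.0
entropies = {
    "gaussian": entropy_gaussian(var),
    "laplace": entropy_laplace(var),
    "uniform": entropy_uniform(var),
}

def entropy_exponential(mean):
    return 1.0 + math.log(mean)     # density (1/μ) e^{-x/μ} on [0, ∞)

def entropy_uniform_from_zero(width):
    return math.log(width)          # uniform on [0, width], mean width / 2

# Same mean μ = 3 on [0, ∞): the exponential should beat the uniform on [0, 6].
h_exp, h_unif = entropy_exponential(3.0), entropy_uniform_from_zero(6.0)
```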


Hadamard’s Inequality

  • $K$ is a nonnegative definite symmetric $n\times n$ matrix
  • (Hadamard) $|K|\leq\prod_{i=1}^n K_{ii}$ with equality iff $K_{ij}=0$ for all $i\neq j$
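Hadamard's inequality can be checked directly on a small example. The matrix below is a made-up symmetric positive definite matrix, and `det` is a simple cofactor expansion that is adequate only for small $n$.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

K = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]  # made-up symmetric positive definite matrix

det_K = det(K)                               # |K|
diag_product = K[0][0] * K[1][1] * K[2][2]   # product of the diagonal entries

# A diagonal matrix attains equality.
D = [[4.0, 0.0], [0.0, 3.0]]
det_D = det(D)
```

In entropy terms, the inequality is $h(X_1,\cdots,X_n)\leq\sum_i h(X_i)$ applied to $X\sim\mathcal{N}(0,K)$.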

Balanced Information Inequality

  • $h(X,Y)\leq h(X)+h(Y)$
  • in general neither $h(X,Y)\geq h(X)$ nor $h(X,Y)\leq h(X)$ holds
  • $[n]=\{1,2,\cdots,n\}$, for $\alpha\subseteq[n]$, $X_\alpha=(X_i:i\in\alpha)$
  • a linear continuous inequality $\sum_\alpha w_\alpha h(X_\alpha)\geq 0$ is valid iff its corresponding discrete counterpart $\sum_\alpha w_\alpha H(X_\alpha)\geq 0$ is valid and balanced
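The notes do not spell out what "balanced" means. The sketch below assumes the usual definition, that for every variable $X_i$ the net weight $\sum_{\alpha\ni i}w_\alpha$ is zero, and tests two inequalities against it. Both the definition and the helper `is_balanced` are assumptions added for illustration.

```python
def is_balanced(weights, n):
    """weights maps a frozenset of indices (a subset of {1..n}) to w_alpha.

    Assumed definition: balanced means the net weight of every variable is zero.
    """
    return all(
        abs(sum(w for alpha, w in weights.items() if i in alpha)) < 1e-12
        for i in range(1, n + 1)
    )

# h(X1) + h(X2) - h(X1, X2) >= 0
subadditivity = {frozenset({1}): 1.0, frozenset({2}): 1.0, frozenset({1, 2}): -1.0}
# h(X1, X2) - h(X1) >= 0: valid for discrete H, but X2 carries net weight 1
monotonicity = {frozenset({1, 2}): 1.0, frozenset({1}): -1.0}

balanced_sub = is_balanced(subadditivity, 2)
balanced_mono = is_balanced(monotonicity, 2)
```

Under this definition the first inequality is balanced and carries over to differential entropy, while the second is not, matching the bullets above.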

Han’s Inequality

  • $h_k^{(n)}=\frac{1}{\binom{n}{k}}\sum_{S:|S|=k}\frac{h(X(S))}{k}$
  • $g_k^{(n)}=\frac{1}{\binom{n}{k}}\sum_{S:|S|=k}\frac{h(X(S)|X(S^c))}{k}$
  • Han’s Inequality: $h_1^{(n)}\geq h_2^{(n)}\geq\cdots\geq h_n^{(n)}=h(X_1,\cdots,X_n)/n=g_n^{(n)}\geq\cdots\geq g_2^{(n)}\geq g_1^{(n)}$
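For a Gaussian vector every subset entropy is available in closed form, so the chain in Han's inequality can be evaluated directly. The sketch below does this for a made-up $3\times 3$ covariance matrix, in nats, using $h(X(S)|X(S^c))=h(X_1,\cdots,X_n)-h(X(S^c))$. All names are illustrative.

```python
import math
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * e * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j, e in enumerate(m[0]))

def gaussian_entropy(K, subset):
    """h(X(S)) in nats for X ~ N(0, K); the empty subset contributes 0."""
    if not subset:
        return 0.0
    sub = [[K[i][j] for j in subset] for i in subset]
    return 0.5 * math.log((2 * math.pi * math.e) ** len(subset) * det(sub))

def han_averages(K):
    """Return the lists [h_1, ..., h_n] and [g_1, ..., g_n]."""
    n = len(K)
    full = tuple(range(n))
    h_full = gaussian_entropy(K, full)
    h_avg, g_avg = [], []
    for k in range(1, n + 1):
        subsets = list(combinations(full, k))
        joint = [gaussian_entropy(K, s) for s in subsets]
        # conditional entropy of X(S) given the complementary coordinates
        cond = [h_full - gaussian_entropy(K, tuple(i for i in full if i not in s))
                for s in subsets]
        h_avg.append(sum(joint) / (len(subsets) * k))
        g_avg.append(sum(cond) / (len(subsets) * k))
    return h_avg, g_avg

K = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]  # made-up covariance
h_avg, g_avg = han_averages(K)
```

Here `h_avg` is non-increasing in $k$, `g_avg` is non-decreasing, and the two lists meet at $k=n$.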

Information of Heat

  • Heat equation (Fourier): $x$ is position and $t$ is time, $\frac{\partial}{\partial t}f(x, t)=\frac{1}{2}\frac{\partial^2}{\partial x^2}f(x,t)$
  • $Y_t=X+\sqrt{t}Z,Z\sim\mathcal{N}(0,1)$ independent of $X$, then the density of $Y_t$ is $f(y;t)=\int f(x)\frac{1}{\sqrt{2\pi t}}e^{-\frac{(y-x)^2}{2t}}dx$
  • Gaussian channel and heat equation: the output density $f(y;t)$ of the Gaussian channel solves the heat equation
  • Fisher Information: $I(X)=\int_{-\infty}^{+\infty}f(x)[\frac{\frac{\partial}{\partial x}f(x)}{f(x)}]^2dx$
  • De Bruijn’s Identity: $\frac{\partial}{\partial t}h(Y_t)=\frac{1}{2}I(Y_t)$
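De Bruijn's identity is easy to verify when $X$ itself is Gaussian, because then $Y_t\sim\mathcal{N}(0,\sigma^2+t)$ and both sides have closed forms. The sketch below compares a central finite difference of $h(Y_t)$ with $\frac{1}{2}I(Y_t)$, in nats. The Gaussian choice of $X$ and the parameter values are assumptions for the illustration.

```python
import math

def entropy_of_Y(t, var_x):
    """h(Y_t) in nats when X ~ N(0, var_x), so that Y_t ~ N(0, var_x + t)."""
    return 0.5 * math.log(2 * math.pi * math.e * (var_x + t))

def fisher_information_of_Y(t, var_x):
    """The Fisher information of N(0, v) is 1 / v."""
    return 1.0 / (var_x + t)

var_x, t, eps = 2.0, 0.5, 1e-5
# Central finite difference approximating d/dt h(Y_t)
derivative = (entropy_of_Y(t + eps, var_x) - entropy_of_Y(t - eps, var_x)) / (2 * eps)
half_fisher = 0.5 * fisher_information_of_Y(t, var_x)
```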

Entropy power inequality

  • EPI (Entropy power inequality): for independent $n$-dimensional $X$ and $Y$, $e^{\frac{2}{n}h(X+Y)}\geq e^{\frac{2}{n}h(X)}+e^{\frac{2}{n}h(Y)}$, “the most powerful tool”
    • Uncertainty principle
    • Young’s inequality
    • Nash’s inequality
    • Cramér-Rao bound: $V(\hat \theta)\geq\frac{1}{I(\theta)}$ for an unbiased estimator $\hat\theta$
  • FII (Fisher information inequality): for independent $X$ and $Y$, $\frac{1}{I(X+Y)}\geq\frac{1}{I(X)}+\frac{1}{I(Y)}$
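A small one-dimensional example shows the EPI at work. For independent $X,Y\sim\text{Uniform}[0,1]$, the sum has a triangular density on $[0,2]$ with $h(X+Y)=\frac{1}{2}$ nat, so the inequality reads $e\geq 2$. For independent Gaussians it holds with equality. The sketch below computes both cases numerically; the function names and parameter values are illustrative.

```python
import math

def entropy_power(h, n=1):
    """e^{(2/n) h}, the quantity compared in the EPI (h in nats)."""
    return math.exp(2.0 * h / n)

def triangular_entropy(steps=200_000):
    """Midpoint-rule estimate of h(X + Y) for independent Uniform[0, 1] X and Y."""
    dx = 2.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        fx = x if x < 1.0 else 2.0 - x  # triangular density on [0, 2]
        total -= fx * math.log(fx) * dx
    return total

h_x = h_y = 0.0                 # h(Uniform[0, 1]) = log 1 = 0
h_sum = triangular_entropy()    # analytically 1/2 nat
lhs = entropy_power(h_sum)
rhs = entropy_power(h_x) + entropy_power(h_y)

def gaussian_entropy(var):
    return 0.5 * math.log(2 * math.pi * math.e * var)

# Independent Gaussians: variances add, and the EPI holds with equality.
var_x, var_y = 1.5, 2.5
gauss_lhs = entropy_power(gaussian_entropy(var_x + var_y))
gauss_rhs = entropy_power(gaussian_entropy(var_x)) + entropy_power(gaussian_entropy(var_y))
```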