The matrix Laplace transform method

The Laplace transform technique for scalar concentration

The Laplace transform technique is the standard proof technique for showing concentration inequalities. First, let’s see see it in action by using it to show concentration inequalities for (independent) sums of random scalars (eg Hoeffding’s inequality). Let

\[\textstyle S_n\triangleq\sum_{i=1}^nX_i,\]

where $X_i$’s are independent random scalars. The Laplace transform method bounds the tail probability of $S_n$ by

\[\begin{aligned} \Pr\{S_n\ge s\} &= \Pr\{e^{tS_n}\ge e^{st}\} &\text{($t>0$)}\\ &\le e^{-st}\Ex\big[e^{tS_n}\big] &\text{(Markov's inequality)} \\ &=\textstyle e^{-st}\Ex\big[\prod_{i=1}^ne^{tX_i}\big] \\ &= e^{-st}\prod_{i=1}^n\Ex\big[e^{tX_i}\big] &\text{($X_i$'s are independent)} \\ \end{aligned}\]

for any $t > 0$. Such tail probability bounds are called Chernoff bounds, and they are precursors to more familiar concentration inequalities. For example, if $X_i\sim N(0,1)$, then its MGF is $e^{t^2/2}$. We plug the MGF into the Chernoff bound to obtain

\[\Pr\{S_n\ge s\}\le e^{-st}e^{\frac12nt^2} \le e^{\frac12nt^2 - st}.\]

The right side is minimized at $t = \frac{s}{n}$, which implies the tail (probability) bound

\[\Pr\{S_n\ge s\}\le e^{-\frac{s^2}{2n}}.\]

This is the standard tail bound for independent sums of $N(0,1)$ random scalars; it shows that $S_n$ is $O(n^{\frac12})$ with high probability:

\[\textstyle \Pr\{S_n\ge(2n\log\frac1\delta)^{\frac12}\} \le \delta.\]

Lieb’s concavity and the Laplace transform technique for matrix concentration

We wish to generalize the Laplace transform method to prove concentration inequalities for (independent) sums of random matrices. As we shall see, the main issue with a naive application of the Laplace transform method is that the MGF of $S_n$ is not simple product of the MGFs of the $X_i$’s because

\[e^{A + B}\ne e^{A}e^{B}\text{ (unless $A$ and $B$ commute)}.\]

For matrices, we study tail (probability) bounds on $\lambda_{\max}(S_n)$, where $S_n$ is defined exactly as above except the $X_i$’s are random matrices instead of scalars. Similar to the Chernoff bound proof, we have

\[\begin{aligned} \Pr\{\lambda_{\max}(S_n)\ge s\} &= \Pr\{e^{t\lambda_{\max}(S_n)}\ge e^{st}\} &\text{($t>0$)}\\ &\le e^{-st}\Ex\big[e^{t\lambda_{\max}(S_n)}\big] &\text{(Markov's inequality)} \\ &=\textstyle e^{-st}\Ex\big[\max_{j\in[d]}e^{t\lambda_j(S_n)}\big] &\text{($\lambda\mapsto e^{t\lambda}$ is monotone)}\\ &= e^{-st}\Ex\big[\lambda_{\max}(e^{tS_n})\big] &\text{(matrix exponential def)}\\ &\le\textstyle e^{-st}\Ex\big[\sum_{j=1}^d\lambda_j(e^{tS_n})\big] \\ &\le e^{-st}\Ex\big[\tr(e^{tS_n})\big] &\textstyle\text{($\tr(A) = \sum_{j=1}^d\lambda_j(A)$ on $\symm_+^d$)}. \end{aligned} \tag{Chr}\]

The matrix exponential that appears in the preceding calculation is the same matrix exponential that provides solutions to linear ODEs. At this point, we wish to continue following the steps in Chernoff bound proof by claiming

\[\begin{aligned}\textstyle e^{tS_n} = \prod_{i=1}^ne^{tX_i}\text{ or,}\\\textstyle \tr(e^{tS_n}) \le \tr(\prod_{i=1}^ne^{tX_i}). \end{aligned}\]

Unfortunately, neither claim is generally true. A special case of the claimed trace inequality is true—the $n=2$ case is a result from statistical physics called the Golden-Thompson inequality—but it is not true for $n > 2$. There is a proof technique first developed by Ahlswede and Winter and later refined by Olivera that carefully applies the Golden-Thompson inequality repeatedly, but this technique seems to be less general than the technique based on we present below.

Recently, Tropp developed a new technique to extend the Laplace transform method to matrix concentration based on Lieb’s concavity theorem.

Lieb’s concavity theorem. The map $A\mapsto\tr e^{\log A + B}$ is concave on $\symm_+^d$ for any fixed $B\in\symm_+^d$.

If $A$ and $B$ are scalars, then $a\mapsto e^{\log a + b}$ is linear in $a$; Lieb’s concavity theorem generalizes this simple fact in the scalar case to the matrix case. It was proved by Elliott Lieb on the way to proving a basic result in quantum information theory (the von Neumann entropy is strongly subadditive).

Armed with Lieb’s concavity theorem, we can now bound $\Ex\big[\tr(e^{tS_n})\big]$ with a careful conditioning argument:

\[\begin{aligned} \Ex\big[\tr(e^{tS_n})\big] &= \Ex\big[\Ex\big[\tr(e^{tS_{n-1} + \log e^{tX_n}})\mid S_{n-1}\big]\big]\\ &\le \Ex\big[\tr(e^{tS_{n-1}} + \log\Ex\big[e^{tX_n}\mid S_{n-1}\big])\big] &\text{(Lieb's concavity thm)} \\ &= \Ex\big[\tr(e^{tS_{n-1}} + \log\Ex\big[e^{tX_n}\big])\big] &\text{($S_{n-1}$ & $X_n$ are independent)} \\ &\;\;\vdots \\ &\le\textstyle\tr e^{\sum_{i=1}^n\log\Ex\big[e^{tX_i}\big]}, \end{aligned}\]

where we iterated the first three steps $n-1$ times, “peeling off” the $X_i$’s from the sum one-by-one. We combine the preceding bound on $\Ex\big[\tr(e^{tS_n})\big]$ with (Chr) to obtain a matrix Chernoff bound:

\[\textstyle \Pr\{\lambda_{\max}(S_n)\ge s\} \le e^{-st}\tr e^{\sum_{i=1}^n\log\Ex\big[e^{tX_i}\big]}.\]

To obtain matrix versions of familiar (scalar) concentration inequalities, we combine the preceding matrix Chernoff bound with bounds on the MGFs of the $X_i$’s. We refer to Tropp’s monograph for a comprehensive account of matrix concentration inequalities from the (matrix) Chernoff bound.

Posted on January 22, 2025 from Ann Arbor, MI.