STATS 606

Smoothing proofs

This post accompanies the lecture video on smoothing. Please see the video for the problem setup and definitions.

Properties of the Moreau envelope

First, we show that the gradient of the Moreau envelope is

\[\partial\underline{f}_\mu(x) = \frac{1}{\mu}(x - \prox_{\mu f}(x)).\]

We have

\[\begin{aligned} f_\mu(x) &= \frac{1}{2\mu}\|x\|_2^2 - \frac{1}{\mu}\sup_y\{\langle x,y\rangle - \mu f(y) - \frac12\|y\|_2^2\} \\ &= \frac{1}{2\mu}\|x\|_2^2 - \frac{1}{\mu}(\mu f + \frac12\|\cdot\|_2^2)^*(x) \end{aligned}\]

Recall \(\partial f^*(y) = \argmax_x\{\langle x,y\rangle - f(x)\}\), we differentiate the conjugate representation of the Moreau envelope to obtain

\[\begin{aligned} \partial\underline{f}_\mu(x) &= \frac{x}{\mu} - \frac{1}{\mu}\argmax_y\{\langle x,y\rangle - \mu f(y) - \frac12\|y\|_2^2\} \\ &= \frac{x}{\mu} - \frac{1}{\mu}\argmax_y\{- \mu f(y) - \frac12\|x-y\|_2^2\} \\ &= \frac{1}{\mu}(x - \prox_{\mu f}(x)), \end{aligned}\]

where we recalled the definition of \(\prox_{\mu f}\) in the third step. Second, we show that \(\underline{f}_\mu\) is \(\frac{1}{\mu}\)-strongly smooth. This is a consequence of

  1. \(\underline{f}_\mu = (f^* + \frac{\mu}{2}\|\cdot\|_2^2)^*\);
  2. the conjugate of a closed, proper, \(\mu\)-strongly convex function is \(\frac{1}{\mu}\)-strongly smooth.

The first fact follows from (i) the conjugate of a sum is the infimal convolution of the conjugates

\[(f_1 + f_2)^*(y) = \inf\{f_1^*(v) + f_2^*(y-v)\}\]

and (ii) the conjugate of the \(\frac{1}{2\mu}\|\cdot\|_2^2\) is \(\frac{\mu}{2}\|\cdot\|_2^2\). It remains to show the conjugate of a \(\mu\)-strongly convex function is \(\frac{1}{\mu}\)-strongly smooth. Recall

\[\partial f^*(y) \in \argmax_x\{\langle x,y\rangle - f(x)\}\]

Thus \(\partial f^*(y)\) satisfies \(y\in\partial f(\partial f^*(y))\). Also recall the \(\mu\)-strong convexity of \(f\) implies its gradient is \(\mu\)-strongly monotone:

\[\langle\partial f(x_1) - \partial f(x_2),x_1-x_2\rangle \ge \mu\|x_1 - x_2\|_2^2.\]

Letting \(x_1 = \partial f^*(x_1)\) and \(x_2 = \partial f^*(x_2)\), we have

\[\langle y_1 - y_2,\partial f^*(y_1) - \partial f^*(y_2)\rangle \ge \mu\|\partial f^*(y_1) - \partial f^*(y_2)\|_2^2.\]

We rearrange and appeal to the Cauchy-Schwarz inequality to arrive at the stated result.

Posted on March 07, 2021 from San Francisco, CA.