STATS 606: Computation and Optimization Methods in Statistics
University of Michigan
including slides by Stephen Boyd and Lieven Vandenberghe
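Given probability distributions $P$ on $\cX$ and $Q$ on $\cY$ (with densities $p$ and $q$) and a transport cost $c:\cX\times\cY\to\reals$,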
find a transport map $\pi:\cX\times\cY\to\reals_+$ ($\pi(x,y)$ is the mass transported from $x$ to $y$) that
transports $P$ to $Q$:
$$\textstyle\int_{\cY}\pi(x,y)dy = p(x),\quad\int_{\cX}\pi(x,y)dx = q(y)$$
and minimizes the total transport cost $\int_{\cX\times\cY}c(x,y)\pi(x,y)dxdy$.
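To make the two requirements concrete, the sketch below takes $\cX$ and $\cY$ to be finite, so that $\pi$ becomes a matrix and the two integrals become row and column sums. The helper names (`is_transport_plan`, `transport_cost`) and the toy numbers are assumptions made for illustration only.

```python
# Sketch: on finite sets a transport plan is a nonnegative matrix Pi whose
# row sums equal p and whose column sums equal q. All numbers are made up.

def is_transport_plan(Pi, p, q, tol=1e-12):
    """Check Pi >= 0, row sums == p and column sums == q (up to tol)."""
    nonneg = all(v >= -tol for row in Pi for v in row)
    rows_ok = all(abs(sum(row) - p_i) <= tol for row, p_i in zip(Pi, p))
    cols_ok = all(abs(sum(col) - q_j) <= tol for col, q_j in zip(zip(*Pi), q))
    return nonneg and rows_ok and cols_ok

def transport_cost(C, Pi):
    """Total transport cost: sum over (i, j) of C[i][j] * Pi[i][j]."""
    return sum(c * v for c_row, pi_row in zip(C, Pi) for c, v in zip(c_row, pi_row))

# Toy instance: sources x in {0, 1}, targets y in {0, 1}, cost c(x, y) = |x - y|.
p = [0.5, 0.5]
q = [0.25, 0.75]
C = [[0.0, 1.0], [1.0, 0.0]]

product_plan = [[p_i * q_j for q_j in q] for p_i in p]  # independent coupling
better_plan = [[0.25, 0.25], [0.0, 0.5]]                # keeps more mass in place

for name, Pi in [("product", product_plan), ("better", better_plan)]:
    print(name, is_transport_plan(Pi, p, q), transport_cost(C, Pi))
```

Both candidates satisfy the marginal constraints, but they have different costs (0.5 and 0.25). In this toy instance 0.25 units of mass must move from $x = 0$ to $y = 1$, so the second plan cannot be improved.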
If $\cX$ and $\cY$ are finite sets, then the optimal transport problem is an LP:
$$ \begin{aligned} &\min\nolimits_{\Pi\in\reals_+^{m\times n}} &&\Tr(C^\top\Pi) \\ &\subjectto &&\Pi1_n = p \\ & &&\Pi^\top 1_m = q \end{aligned} \tag{OT} $$
Given $\cD\triangleq\{(X_i,Y_i)\}_{i=1}^n\subset\reals^p\times\{-1,1\}$, find the max margin (linear) classifier.
To (strictly) separate $$ \begin{aligned} \cD_1\triangleq\{X_i\mid(X_i,Y_i)\in\cD,Y_i = 1\},\\ \cD_{-1}\triangleq\{X_i\mid(X_i,Y_i)\in\cD,Y_i = -1\} \end{aligned} $$
with a hyperplane, we require
$$\{Y_i(\beta_0 + \beta^\top X_i) > 0\}_{i=1}^n.$$Since scaling $\beta_0$, $\beta$ does not change the hyperplane, the preceding constraints are equivalent to
$$\{Y_i(\beta_0 + \beta^\top X_i) \ge 1\}_{i=1}^n.$$Fact: The (Euclidean) distance between the hyperplanes
$$ \begin{aligned} \cH_1\triangleq\{x\mid\beta_0 + \beta^\top x = 1\},\\ \cH_{-1}\triangleq\{x\mid\beta_0 + \beta^\top x = -1\} \end{aligned} $$is $\dist(\cH_1,\cH_{-1}) = \frac{2}{\|\beta\|_2}$.
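One way to see this (a short derivation added for completeness, using only the definitions above): the vector $\beta/\|\beta\|_2$ is a unit normal to both hyperplanes, so starting from any $x_{-1}\in\cH_{-1}$ and moving a distance $t$ along it gives
$$ \beta_0 + \beta^\top\Big(x_{-1} + t\frac{\beta}{\|\beta\|_2}\Big) = -1 + t\|\beta\|_2, $$
which equals $1$ (so the point lies on $\cH_1$) exactly when $t = \frac{2}{\|\beta\|_2}$.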
Thus to separate $\cD_1$ and $\cD_{-1}$ with the maximum margin, we solve the (hard-margin) SVM problem:
$$ \begin{aligned} &\min\nolimits_{\beta_0\in\reals,\beta\in\reals^p} &&\textstyle\frac12\|\beta\|_2^2 \\ &\subjectto && \{Y_i(\beta_0 + \beta^\top X_i) \ge 1\}_{i=1}^n \end{aligned}. \tag{SVM} $$If $\cD_1$ and $\cD_{-1}$ are not linearly separable, then (SVM) is infeasible!
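As a small illustration of the constraints in (SVM), the sketch below checks whether a candidate $(\beta_0,\beta)$ is feasible and reports the margin width $2/\|\beta\|_2$. It does not solve (SVM); the function names and the toy data are hypothetical.

```python
import math

def functional_margins(beta0, beta, X, Y):
    """Return Y_i * (beta0 + beta^T X_i) for every sample."""
    return [y * (beta0 + sum(b * x for b, x in zip(beta, x_i)))
            for x_i, y in zip(X, Y)]

def satisfies_svm_constraints(beta0, beta, X, Y, tol=1e-12):
    """True if every sample satisfies Y_i * (beta0 + beta^T X_i) >= 1."""
    return all(m >= 1.0 - tol for m in functional_margins(beta0, beta, X, Y))

def margin_width(beta):
    """Distance 2 / ||beta||_2 between the two margin hyperplanes."""
    return 2.0 / math.sqrt(sum(b * b for b in beta))

# Made-up, linearly separable data in the plane.
X = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
Y = [1, 1, -1, -1]

# Feasible candidate: the closest points sit exactly on the margin hyperplanes.
print(satisfies_svm_constraints(0.0, (0.5, 0.0), X, Y), margin_width((0.5, 0.0)))
# Same hyperplane direction, rescaled: still separates the classes, but
# violates the normalization ">= 1", so it is not feasible for (SVM).
print(satisfies_svm_constraints(0.0, (0.25, 0.0), X, Y))
```

The second candidate shows why the normalization matters: rescaling $(\beta_0,\beta)$ leaves the separating hyperplane unchanged but changes whether the constraints $\ge 1$ hold.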
To restore feasibility, we add slack variables $\{\xi_i\}_{i=1}^n\subset\reals_+$ to (SVM):
$$ \begin{aligned} &\min\nolimits_{\beta_0\in\reals,\beta\in\reals^p,\xi\in\reals_+^n} &&\textstyle\frac12\|\beta\|_2^2 + C\sum_{i=1}^n\xi_i \\ &\subjectto && \{Y_i(\beta_0 + \beta^\top X_i) \ge 1 - \xi_i\}_{i=1}^n \end{aligned}. $$This is the soft-margin SVM problem. It is often written as a regularized (empirical) risk minimization problem
$$ \begin{aligned} \textstyle\min_{\beta_0\in\reals,\beta\in\reals^p}\frac12\|\beta\|_2^2 + C\sum_{i=1}^n\ell(Y_i(\beta_0 + \beta^\top X_i)), \\ \ell(z) \triangleq \max\{0,1-z\}. \end{aligned} $$
Given $\{x_i\}_{i=1}^n$, find the $k$-dimensional subspace that best approximates $\{x_i\}_{i=1}^n$:
$$ \begin{aligned} &\min\nolimits_{P\in\symm^p}&&\textstyle\frac12\|X - XP\|_F^2 = \frac12\sum_{i=1}^n\|x_i^\top - x_i^\top P\|_2^2\\ &\subjectto &&P\text{ is a projector} \\ & &&\rank(P) = k \end{aligned} $$
Here $X\in\reals^{n\times p}$ is the matrix whose $i$-th row is $x_i^\top$. This problem (despite its non-convexity) has a closed-form solution in terms of the singular value decomposition (SVD) of $X$:
$$P_* = V_kV_k^\top,$$
where $X = U\Sigma V^\top$ is the SVD of $X$ and $V_k$ consists of the first $k$ columns of $V$ (the top $k$ right singular vectors).
The feasible set is the (non-convex) set
$$\{P\in\symm^p\mid \lambda_i(P)\in\{0,1\},\Tr(P) = k\}.$$
We relax the PCA problem by replacing the feasible set with its convex hull:
$$ \begin{aligned} \cF_k &\triangleq \{P\in\symm^p\mid \lambda_i(P)\in{\color{red}[0,1]},\Tr(P) = k\} \\ &= \{P\in\symm^p\mid 0\preceq P\preceq I_p,\Tr(P) = k\}. \end{aligned} $$
This set is called the $k$-th order Fantope.
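Remarkably, the relaxed problem has the same optimal solution as the original PCA problem, provided the objective is written in its equivalent linear form: for any projector $P$, $\frac12\|X - XP\|_F^2 = \frac12\Tr(X^\top X) - \frac12\Tr(X^\top XP)$, and the maximum of $\Tr(X^\top XP)$ over $\cF_k$ is attained at $P_* = V_kV_k^\top$.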
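The PCA problem can be checked by brute force in a tiny case. The sketch below is not an SVD routine: it takes made-up data in $\reals^2$ with $k = 1$, scans rank-one projectors $uu^\top$ over a grid of directions, and reports the one with the smallest objective. The data are chosen so that $X^\top X = \mathrm{diag}(18, 2)$, whose top right singular vector is $(1,0)$; the function names are hypothetical.

```python
import math

def pca_objective(X, P):
    """0.5 * ||X - X P||_F^2 for data rows X (n x d) and a d x d matrix P."""
    d = len(P)
    total = 0.0
    for x in X:
        xP = [sum(x[i] * P[i][j] for i in range(d)) for j in range(d)]
        total += sum((x[j] - xP[j]) ** 2 for j in range(d))
    return 0.5 * total

def rank_one_projector(theta):
    """Projector u u^T onto the line spanned by u = (cos theta, sin theta)."""
    u = (math.cos(theta), math.sin(theta))
    return [[u[i] * u[j] for j in range(2)] for i in range(2)]

# Made-up data: most of the variation is along the first coordinate axis.
X = [(3.0, 0.0), (-3.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

# Scan directions at 0, 1, ..., 179 degrees (u and -u give the same projector).
thetas = [k * math.pi / 180 for k in range(180)]
values = [pca_objective(X, rank_one_projector(t)) for t in thetas]
best = min(range(len(thetas)), key=lambda k: values[k])
print(thetas[best], values[best])
```

On this data the scan selects $\theta = 0$, i.e. the projector onto the first coordinate axis, which agrees with $P_* = V_1V_1^\top$.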
Given a graph $\cG\triangleq\{\cV,\cE\}$, find symmetric (edge) weights $W_{i,j}\in[0,1]$ so that the weighted random walk $(X_t)_{t=0}^\infty$
$$ \Pr\{X_{t+1} = v_j\mid X_t = v_i\} = W_{i,j} $$mixes as quickly as possible.
The matrix $W\in[0,1]^{n\times n}$ ($n\triangleq|\cV|$) satisfies
$$ \begin{aligned} W1_n = 1_n\text{ (it is stochastic)}, \\ W = W^\top\text{ (it is symmetric, hence doubly stochastic)}, \\ W_{i,j} = 0\text{ whenever }(v_i,v_j)\notin\cE. \end{aligned} $$
The mixing time of $X_t$ depends on the second largest eigenvalue modulus (SLEM) of $W$:
$$\mu(W) \triangleq \max\nolimits_{i = 2,\dots,n}|\lambda_i(W)|,$$
where $1 = \lambda_1(W)\ge\lambda_2(W)\ge\dots\ge\lambda_n(W)\ge -1$ are the eigenvalues of $W$. Let $\pi_t\in\Delta^{n-1}$ be the distribution of $X_t$ (i.e. $\Pr\{X_t = v_i\} = [\pi_t]_i$). Then $\pi_t$ satisfies the recursion
$$\pi_t^\top = \pi_{t-1}^\top W = \dots = \pi_0^\top W^t$$The smaller the SLEM, the faster the random walk mixes:
$$\textstyle\frac12\|\pi_T - \frac1n1_n\|_1 \le \frac12\sqrt{n}\mu(W)^T.$$Fastest mixing Markov chain (FMMC) problem [[Boyd et al](https://epubs.siam.org/doi/10.1137/S0036144503423264)]:
$$ \begin{aligned} &\min\nolimits_{W\in[0,1]^{n\times n}} &&\mu(W) \\ &\subjectto && W1_n = 1_n,\quad W = W^\top \\ & &&W_{i,j} = 0\text{ for any }(v_i,v_j)\notin\cE. \end{aligned} $$
On the feasible set, $\mu$ is a convex function of $W$ because $\textstyle\mu(W) = \|W - \frac1n1_n1_n^\top\|_2$ is the spectral norm of an affine function of $W$.
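The identity $\mu(W) = \|W - \frac1n1_n1_n^\top\|_2$ and the mixing bound can be checked numerically on a small chain. The sketch below is illustrative only: it estimates the spectral norm by power iteration (a stand-in for a proper eigenvalue routine), and the 3-node path graph with self-loops and its weights are made-up assumptions. By hand, this $W$ has eigenvalues $1$, $0.6$ and $-0.2$.

```python
import math

def slem(W, iters=200):
    """Estimate ||W - (1/n) 1 1^T||_2 for symmetric stochastic W by power iteration."""
    n = len(W)
    M = [[W[i][j] - 1.0 / n for j in range(n)] for i in range(n)]
    # Arbitrary start; it must not be orthogonal to the dominant eigenvector.
    v = [float(i + 1) for i in range(n)]
    norm = 0.0
    for _ in range(iters):
        scale = math.sqrt(sum(x * x for x in v))
        if scale == 0.0:          # M annihilated v, e.g. W = (1/n) 1 1^T
            return 0.0
        v = [x / scale for x in v]
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
    return norm

def step(pi, W):
    """One step of the recursion pi_t^T = pi_{t-1}^T W."""
    n = len(W)
    return [sum(pi[i] * W[i][j] for i in range(n)) for j in range(n)]

# Symmetric stochastic weights on the path v1 - v2 - v3 (self-loops allowed).
W = [[0.6, 0.4, 0.0],
     [0.4, 0.2, 0.4],
     [0.0, 0.4, 0.6]]
n = len(W)
mu = slem(W)

pi = [1.0, 0.0, 0.0]              # start the walk at v1
bound_holds = True
for T in range(21):
    tv = 0.5 * sum(abs(x - 1.0 / n) for x in pi)   # total variation distance
    bound = 0.5 * math.sqrt(n) * mu ** T
    bound_holds = bound_holds and tv <= bound
    pi = step(pi, W)
print(mu, bound_holds)
```

The estimate of $\mu(W)$ is approximately $0.6$, matching the largest eigenvalue modulus other than $1$, and the total variation distance stays below the bound for every $T$ checked.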
SDP form of FMMC problem:
$$ \begin{aligned} &\min\nolimits_{t\in\reals,W\in[0,1]^{n\times n}} &&t \\ &\subjectto &&\textstyle-tI_n \preceq W - \frac1n1_n1_n^\top \preceq tI_n \\ & && W1_n = 1_n,\quad W = W^\top \\ & &&W_{i,j} = 0\text{ for any }(v_i,v_j)\notin\cE. \end{aligned} $$
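The equivalence with the previous formulation rests on a standard fact, sketched here for completeness: for a symmetric matrix $A\in\symm^n$,
$$ -tI_n\preceq A\preceq tI_n \iff |\lambda_i(A)|\le t\text{ for all }i \iff \|A\|_2\le t, $$
so minimizing $t$ subject to the two matrix inequalities with $A = W - \frac1n1_n1_n^\top$ minimizes $\|W - \frac1n1_n1_n^\top\|_2 = \mu(W)$.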