STATS 606: Computation and Optimization Methods in Statistics
University of Michigan
including slides from Stanford's EE364b
Consider a parameterized (convex) optimization problem:
$$ \begin{aligned} &\min\nolimits_x &&f_0(x) \\ &\subjectto &&\{f_i(x,u) \leq 0\}_{i=1}^m \end{aligned} \tag{OP} $$The parameters $u$ are often data-dependent (and thus uncertain).
Let $\cU$ be an uncertainty set of plausible values of $u$. Consider the robust form of (OP):
$$ \begin{aligned} &\min\nolimits_x &&f_0(x) \\ &\subjectto &&\{f_i(x,u) \leq 0\text{ for all }u\in\cU\}_{i=1}^m \end{aligned} \tag{ROP} $$(ROP) is equivalent to
$$ \begin{aligned} &\min\nolimits_x &&f_0(x) \\ &\subjectto &&\{\sup\nolimits_{u\in\cU}f_i(x,u) \leq 0\}_{i=1}^m \end{aligned} $$As long as $x\mapsto f_i(x,u)$ is convex for every $u\in\cU$, each robust constraint function $x\mapsto\sup_{u\in\cU}f_i(x,u)$ is a pointwise supremum of convex functions and hence convex, so the robust constraints are also convex.
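The pointwise-supremum step can be written out in one line. For any $x, y$ and $\theta\in[0,1]$, convexity of each $x\mapsto f_i(x,u)$ gives
$$ \begin{aligned} \sup\nolimits_{u\in\cU}f_i(\theta x + (1-\theta)y,u) &\le \sup\nolimits_{u\in\cU}\{\theta f_i(x,u) + (1-\theta)f_i(y,u)\} \\ &\le \theta\sup\nolimits_{u\in\cU}f_i(x,u) + (1-\theta)\sup\nolimits_{u\in\cU}f_i(y,u). \end{aligned} $$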
Q's:
How should we pick $\cU$?
Can we solve (ROP) (efficiently)?
Is the robust formulation useful?
Consider $u$ as a random variable $U$, and choose $\cU$ to be a $1-\alpha$ confidence set for $U$:
$$ \Pr\{U\in\cU\} \ge 1-\alpha. $$Enforcing the robust constraint $\sup_{u\in\cU}f_i(x,u)\le 0$ implies
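$$ \begin{aligned} &\Pr\{f_i(x,U)> 0\} \\ &= \cancel{\Pr\{f_i(x,U)> 0, U\in\cU\}} + \Pr\{f_i(x,U)> 0, U\notin\cU\} \\ &\le\Pr\{U\notin\cU\}\le \alpha. \end{aligned} $$The first term vanishes because the robust constraint forces $f_i(x,u)\le 0$ for every $u\in\cU$, so a robust solution satisfies the chance constraint $\Pr\{f_i(x,U)>0\}\le\alpha$. There are many possible confidence sets, and some choices lead to intractable robust optimization problems.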
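A small Monte Carlo sketch of this argument. It assumes, for illustration only, that $U\sim N(0,I_2)$, that the constraint is linear, $f(x,u) = (a+u)^\top x - b$, and that $\cU$ is a Euclidean ball; the data `a`, `x` and the helper `robust_rhs` are hypothetical. For this distribution $\|U\|_2^2$ is chi-square with 2 degrees of freedom, whose $(1-\alpha)$-quantile is $-2\log\alpha$.

```python
import math
from statistics import NormalDist

import numpy as np


def robust_rhs(a, x, radius):
    """Smallest b with (a + u)^T x <= b for all ||u||_2 <= radius (hypothetical helper)."""
    return a @ x + radius * np.linalg.norm(x)


alpha = 0.05
# ||U||_2^2 ~ chi-square(2) when U ~ N(0, I_2); its (1 - alpha)-quantile is -2 log(alpha),
# so the ball of this radius is a 1 - alpha confidence set for U.
radius = math.sqrt(-2.0 * math.log(alpha))

rng = np.random.default_rng(0)
a = np.array([1.0, -0.5])  # illustrative nominal coefficients
x = np.array([0.3, 0.8])   # illustrative decision
b = robust_rhs(a, x, radius)  # makes x exactly robust-feasible

U = rng.standard_normal((200_000, 2))
coverage = np.mean(np.sum(U**2, axis=1) <= radius**2)  # empirical Pr{U in ball}
violation_freq = np.mean((a + U) @ x > b)               # empirical Pr{f(x, U) > 0}

# U^T x ~ N(0, ||x||^2), so the exact violation probability is 1 - Phi(radius).
exact_violation = 1.0 - NormalDist().cdf(radius)
print(coverage, violation_freq, exact_violation)
```

The violation frequency comes out well below $\alpha$, which illustrates that this confidence-set construction is a safe (and possibly conservative) approximation of the chance constraint.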
Consider the chance constraints $\Pr\{a_i^\top x > b_i\}\le\eps$, $i\in[m]$, where the $a_i$ are independent $N(\bar{a}_i,\Sigma)$ random vectors and $\eps\in(0,1)$ is a failure probability.
Marginally, $a_i^\top x\sim N(\bar{a}_i^\top x,x^\top\Sigma x)$, so
$$ \begin{aligned} &\{x\mid\Pr\{a_i^\top x > b_i\}\le \eps\} \\ &\quad= \{x\mid\bar{a}_i^\top x - b_i - \Phi^{-1}(\eps)(x^\top\Sigma x)^{\frac12} \le 0\}, \end{aligned} $$where $\Phi$ is the standard normal CDF. For $\eps = 0.5$, $\Phi^{-1}(\eps) = 0$ and the constraints are (conveniently) linear, but
for $\eps < 0.5$, $\Phi^{-1}(\eps) < 0$, so the constraints are second-order cone constraints, which are convex;
for $\eps > 0.5$, $\Phi^{-1}(\eps) > 0$, so the constraints are nonconvex.
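A sketch that checks the Gaussian reformulation numerically, assuming randomly generated illustrative data (`a_bar`, `Sigma`, `x`) and hypothetical helper names `chance_margin` and `violation_prob`. It picks $b_i$ so the reformulated constraint is tight and confirms that the violation probability then equals $\eps$, both exactly and by sampling.

```python
from statistics import NormalDist

import numpy as np


def chance_margin(x, a_bar, Sigma, b, eps):
    """a_bar^T x - b - Phi^{-1}(eps) sqrt(x^T Sigma x); the chance constraint holds iff <= 0."""
    return a_bar @ x - b - NormalDist().inv_cdf(eps) * np.sqrt(x @ Sigma @ x)


def violation_prob(x, a_bar, Sigma, b):
    """Exact Pr{a^T x > b} for a ~ N(a_bar, Sigma), using a^T x ~ N(a_bar^T x, x^T Sigma x)."""
    return 1.0 - NormalDist(mu=a_bar @ x, sigma=np.sqrt(x @ Sigma @ x)).cdf(b)


rng = np.random.default_rng(1)
n, eps = 4, 0.05
G = rng.standard_normal((n, n))
Sigma = G @ G.T + 0.1 * np.eye(n)  # illustrative positive definite covariance
a_bar = rng.standard_normal(n)
x = rng.standard_normal(n)

# Choose b so that the reformulated constraint is tight (margin exactly 0).
b = a_bar @ x - NormalDist().inv_cdf(eps) * np.sqrt(x @ Sigma @ x)

# Monte Carlo estimate of Pr{a^T x > b} at this b.
samples = rng.multivariate_normal(a_bar, Sigma, size=200_000)
mc_violation = np.mean(samples @ x > b)
print(violation_prob(x, a_bar, Sigma, b), mc_violation)
```

With $\eps < 0.5$ the term $-\Phi^{-1}(\eps)\sqrt{x^\top\Sigma x}$ is a positive multiple of a norm, which is why the constraint is a convex second-order cone constraint in that regime.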
Can we always solve (ROP) efficiently? Unfortunately not. Consider the robust norm constraint
$$ \|Ax + Bu\|_2 \le 1\text{ for all }\|u\|_\infty\le 1. $$The robust constraint is
$$\textstyle \max_{\|u\|_\infty\le 1}\|Ax + Bu\|_2 \le 1, $$which involves maximizing the convex quadratic $u\mapsto\|Ax+Bu\|_2^2$ over the $\ell_\infty$-ball.
Maximizing convex quadratics over the $\ell_\infty$-ball (even approximately) is computationally intractable (i.e., NP-hard).
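To see where the difficulty comes from: a convex function attains its maximum over a polytope at a vertex, so the worst case can be computed exactly by enumerating the $2^k$ sign vectors of the box, at a cost exponential in $k$. The sketch below does exactly that; the function name and the random data are illustrative assumptions.

```python
import itertools

import numpy as np


def worst_case_norm_bruteforce(A, B, x):
    """max over ||u||_inf <= 1 of ||A x + B u||_2 by enumerating all 2^k box vertices.

    Exact because the objective is convex in u, but the loop runs 2^k times.
    """
    r = A @ x
    k = B.shape[1]
    best = -np.inf
    for signs in itertools.product((-1.0, 1.0), repeat=k):
        best = max(best, np.linalg.norm(r + B @ np.array(signs)))
    return best


rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((5, 6))
x = rng.standard_normal(3)
wc = worst_case_norm_bruteforce(A, B, x)

# Random points inside the box never exceed the vertex maximum.
U = rng.uniform(-1.0, 1.0, size=(10_000, 6))
sampled = np.linalg.norm(A @ x + U @ B.T, axis=1)
print(wc, sampled.max())
```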
Q: When is a robust LP still an LP (a robust SOCP still an SOCP, a robust SDP still an SDP, etc.)?
$$ \begin{aligned} &\min\nolimits_x &&c^\top x \\ &\subjectto &&(A + U)x \preceq b\text{ for all }U\in\cU \end{aligned} $$Consider one (robust) inequality constraint:
$$ (a + u)^\top x \le b \text{ for all } u \in \cU. $$Ex: If $\cU = \{u\in\reals^n\mid\|u\|_\infty \le\delta\}$, then the constraint is equivalent to
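$$ a^\top x + \delta \|x\|_1 \le b, $$because $\max_{\|u\|_\infty\le\delta}u^\top x = \delta\|x\|_1$ (attained at $u = \delta\operatorname{sign}(x)$). In general, the robust constraint is semi-infinite: it imposes one linear inequality for each $u\in\cU$. Let's reformulate it in a tractable way with duality.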
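A quick numerical check of the box case, as a sketch with random illustrative data (the helper name `worst_case_box` is hypothetical): the maximizer of $u^\top x$ over $\|u\|_\infty\le\delta$ pushes every coordinate to the bound with the sign of $x_j$.

```python
import numpy as np


def worst_case_box(x, delta):
    """Maximizer and value of u^T x over the box ||u||_inf <= delta."""
    u_star = delta * np.sign(x)  # each coordinate at the bound, matching the sign of x_j
    return u_star, u_star @ x


rng = np.random.default_rng(3)
x = rng.standard_normal(5)
delta = 0.3
u_star, value = worst_case_box(x, delta)

# Any other point of the box gives a value no larger than delta * ||x||_1.
U = rng.uniform(-delta, delta, size=(10_000, 5))
print(value, delta * np.abs(x).sum(), (U @ x).max())
```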
For polyhedral uncertainty $\cU = \{u\mid Fu + g\succeq 0\}$, the robust constraint is
$$\textstyle b\ge a^\top x + \max_{u\in\cU}u^\top x\equiv a^\top x + \left\{\begin{aligned} &\max\nolimits_u &&u^\top x \\ &\subjectto &&Fu + g \succeq 0 \end{aligned}\right\}. $$The Lagrangian of the maximization in the robust constraint is
$$ \begin{aligned} L(u,\lambda) &= x^\top u + \lambda^\top(Fu + g),\quad\lambda\succeq 0 \\ &= (x + F^\top\lambda)^\top u + \lambda^\top g. \end{aligned} $$We maximize with respect to $u$ to obtain the dual function
$$ \begin{aligned} h(\lambda) &\triangleq \max\nolimits_u(x + F^\top\lambda)^\top u + \lambda^\top g,\quad\lambda\succeq 0\\ &= \begin{cases} \infty & \text{if } F^\top\lambda + x \neq 0 \\ \lambda^\top g & \text{if } F^\top\lambda + x = 0 \end{cases}. \end{aligned} $$
The dual function upper bounds the primal value for every $\lambda\succeq 0$, and by LP strong duality (assuming $\cU$ is nonempty) the bound is sharp, $\min_{\lambda\succeq 0}h(\lambda) = \max_{u\in\cU}u^\top x$, so the robust constraint is equivalent to
$$ \begin{aligned} &\{x\in\reals^n\mid a^\top x + \max\nolimits_{u\in\cU}u^\top x\le b\} \\ &\quad\equiv \{x\in\reals^n\mid a^\top x + \min\nolimits_{\lambda\succeq 0} h(\lambda)\le b\},\\ &\quad\equiv\{x\in\reals^n\mid\left.\text{there is $\lambda\succeq 0$ such that }\begin{aligned} a^\top x + \lambda^\top g \le b,\\ F^\top\lambda + x = 0.\\ \end{aligned}\right\} \end{aligned} $$For norm-ball uncertainty, where the coefficient vector is $a + Pu$ with $\cU = \{u\mid\|u\|\le 1\}$, the robust constraint is
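$$ \begin{aligned} b &\ge\max\nolimits_{u\in\cU}(a + Pu)^\top x \\ &= a^\top x + \max\nolimits_{u\in\cU}u^\top P^\top x \\ &= a^\top x + \|P^\top x\|_* &\text{(dual norm def)}\\ \end{aligned} $$For conic uncertainty $\cU = \{u\mid Fu + g\succeq_K 0\}$ in the constraint $(a+u)^\top x\le b$, where $K$ is a proper cone with dual cone $K^*$, the argument is similar to the polyhedral uncertainty case: provided $\cU$ is strictly feasible (so that conic strong duality holds), the robust constraint is equivalent to the constraints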
$$ \begin{aligned} a^\top x + \lambda^\top g \le b,\\ F^\top\lambda + x = 0,\\ \lambda {\color{red}\succeq_{K^*}} 0. \end{aligned} $$We leave the details as an exercise.
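A sketch of the duality certificates in two concrete cases, with random illustrative data and hypothetical helper names. First, the box $\|u\|_\infty\le\delta$ written as a polyhedron $\{u\mid Fu + g\succeq 0\}$ with $F = [I; -I]$ and $g = \delta\mathbf{1}$; an explicit $\lambda\succeq 0$ with $F^\top\lambda + x = 0$ attains $\lambda^\top g = \delta\|x\|_1$, matching the primal worst case. Second, the Euclidean norm ball (whose norm is self-dual), where $\max_{\|u\|_2\le 1}(Pu)^\top x = \|P^\top x\|_2$.

```python
import numpy as np


def box_as_polyhedron(n, delta):
    """Write {u : ||u||_inf <= delta} as {u : F u + g >= 0} with F = [I; -I], g = delta * 1."""
    F = np.vstack([np.eye(n), -np.eye(n)])
    g = delta * np.ones(2 * n)
    return F, g


def box_dual_certificate(x):
    """A dual point lam >= 0 with F^T lam + x = 0 for the box polyhedron.

    With lam = (lam_plus, lam_minus), F^T lam = lam_plus - lam_minus must equal -x;
    lam_plus = max(-x, 0), lam_minus = max(x, 0) is feasible (and optimal).
    """
    return np.concatenate([np.maximum(-x, 0.0), np.maximum(x, 0.0)])


rng = np.random.default_rng(4)
n, delta = 5, 0.3
x = rng.standard_normal(n)
F, g = box_as_polyhedron(n, delta)
lam = box_dual_certificate(x)
dual_value = lam @ g  # equals max_{u in U} u^T x = delta * ||x||_1

# Euclidean norm-ball case: the maximizer of (P u)^T x is u = P^T x / ||P^T x||_2.
P = rng.standard_normal((n, 3))
v = P.T @ x
u_ball = v / np.linalg.norm(v)
print(dual_value, delta * np.abs(x).sum(), (P @ u_ball) @ x, np.linalg.norm(v))
```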
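Consider the robust SOC constraint $\|(A+\Delta)x + b\|_2\le (c+u)^\top x + d$ for all $\|\Delta\|_\infty\le\delta$ and $u\in\cU$, where $\|\Delta\|_\infty$ is the largest absolute entry of $\Delta$. Introducing a variable $t$, this is equivalent to two robust constraints: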
$$ \begin{aligned} \|(A+\Delta)x + b\|_2 \le t\text{ for all }\|\Delta\|_\infty\le\delta, \\ t\le (c + u)^\top x + d\text{ for all }u\in\cU. \end{aligned} $$The 2nd constraint is a robust linear inequality, which we handled above.
The 1st constraint is equivalent to
$$ \begin{aligned} \|z\|_2 \le t, \\ \{|A_i^\top x + b_i| + \delta\|x\|_1 \le z_i\}_{i=1}^m \end{aligned} $$for some $z\in\reals^m$, where $A_i^\top$ is the $i$-th row of $A$. Proof: The 1st (robust) constraint is equivalent to
$$ \max\nolimits_{\|\Delta\|_\infty\le\delta}\|(A+\Delta)x + b\|_2 \le t. $$We simplify the constraint function to obtain
$$ \begin{aligned} &\max\nolimits_{\|\Delta\|_\infty\le\delta}\|(A+\Delta)x + b\|_2 \\ &\quad=\textstyle \max\nolimits_{\|\Delta\|_\infty\le\delta}(\sum_{i=1}^m[(A_i+\Delta_i)^\top x + b_i]^2)^{\frac12} \\ &\quad=\max\nolimits_{\|\Delta\|_\infty\le\delta}\{\|z\|_2\mid z_i = (A_i + \Delta_i)^\top x+ b_i,i\in[m]\} \\ &\quad=\min\{\|z\|_2\mid z_i \ge |A_i^\top x + b_i| + \delta\|x\|_1\}. \end{aligned} $$Thus the 1st constraint is equivalent to
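$$\min\{\|z\|_2\mid z_i \ge |A_i^\top x + b_i| + \delta\|x\|_1\} \le t.$$The last equality above holds because the rows $\Delta_i$ vary independently, $\max_{\|\Delta_i\|_\infty\le\delta}|(A_i+\Delta_i)^\top x + b_i| = |A_i^\top x + b_i| + \delta\|x\|_1$, and $\|z\|_2$ is nondecreasing in each $|z_i|$.
Next, consider row-wise norm-bounded uncertainty: the $i$-th row is $a_i + P_iu_i$, where the $u_i$ vary independently over $\|u_i\|\le 1$, $i\in[m]$. We simplify the constraint function to obtain
$$ \begin{aligned} &\textstyle\max\nolimits_{\{\|u_i\|\le 1\}_{i=1}^m}(\sum_{i=1}^m[(a_i+P_iu_i)^\top x + b_i]^2)^{\frac12} \\ &\quad=\max\nolimits_{\{\|u_i\|\le 1\}_{i=1}^m}\{\|z\|_2\mid z_i = (a_i + P_iu_i)^\top x+ b_i,i\in[m]\} \\ &\quad=\min\{\|z\|_2\mid z_i \ge |a_i^\top x + b_i| + \|P_i^\top x\|_*\}. \end{aligned} $$Thus the (robust) constraint is equivalent to
$$ \begin{aligned} \|z\|_2 \le t, \\ \{|a_i^\top x + b_i| + \|P_i^\top x\|_* \le z_i\}_{i=1}^m. \end{aligned} $$Finally, consider the robust constraint $\|(A+P\Delta)x + b\|_2\le t$ for all $\|\Delta\|_\sp\le 1$, where $\|\Delta\|_\sp$ is the spectral norm.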
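By the Schur complement ($\|v\|_2\le t$ iff $\begin{bmatrix}t & v^\top\\ v & tI_m\end{bmatrix}\succeq 0$), the LMI representation of the robust constraint is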
$$ \begin{bmatrix}t & ((A+P\Delta)x + b)^\top \\ (A+P\Delta)x + b& tI_m\end{bmatrix}\succeq 0\text{ for all }\|\Delta\|_\sp\le 1. $$Equivalently,
$$ \begin{aligned} 0 &\le \begin{bmatrix}s \\ y\end{bmatrix}^\top\begin{bmatrix}t & ((A+P\Delta)x + b)^\top \\ (A+P\Delta)x + b& tI_m \end{bmatrix}\begin{bmatrix}s \\ y\end{bmatrix} \\ &= s^2t + 2sy^\top((A+P\Delta)x + b) + t\|y\|_2^2 \end{aligned} $$ for all $(s,y)\in\reals^{m+1}$ and $\|\Delta\|_\sp\le 1$. We minimize the right side with respect to $\Delta$, using $\min_{\|\Delta\|_\sp\le 1}y^\top P\Delta x = -\|P^\top y\|_2\|x\|_2$, to obtain
$$ 0\le s^2t + 2sy^\top(Ax + b) + t\|y\|_2^2 - 2\|sx\|_2\|P^\top y\|_2. $$for all $(s,y)\in\reals^{m+1}$.
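The $\Delta$-minimization step can be checked numerically. Since $y^\top P\Delta x = \langle P^\top y\,x^\top,\Delta\rangle$ and the spectral norm is dual to the nuclear norm, the minimum over $\|\Delta\|_\sp\le 1$ is $-\|P^\top y\|_2\|x\|_2$, attained by a rank-one $\Delta$. The sketch below assumes illustrative dimensions $P\in\reals^{m\times p}$ and $\Delta\in\reals^{p\times n}$ (the inner dimension $p$ and the helper name are hypothetical) and random data.

```python
import numpy as np


def min_bilinear_spectral(y, P, x):
    """Minimize y^T P Delta x over ||Delta||_sp <= 1.

    The minimum is -||P^T y||_2 ||x||_2, attained at the rank-one matrix
    Delta = -(P^T y) x^T / (||P^T y||_2 ||x||_2), whose spectral norm is 1.
    """
    v = P.T @ y
    scale = np.linalg.norm(v) * np.linalg.norm(x)
    return -scale, -np.outer(v, x) / scale


rng = np.random.default_rng(5)
m, p, n = 4, 3, 5
P = rng.standard_normal((m, p))
y = rng.standard_normal(m)
x = rng.standard_normal(n)
value, Delta = min_bilinear_spectral(y, P, x)

# Random feasible Delta (rescaled to spectral norm 1) never go below the minimum.
trials = []
for _ in range(1000):
    D = rng.standard_normal((p, n))
    D /= np.linalg.norm(D, 2)  # ord=2 on a matrix is its largest singular value
    trials.append(y @ P @ D @ x)
print(value, min(trials))
```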
The preceding constraints are equivalent to
$$ 0\le s^2t + 2sy^\top(Ax + b) + t\|y\|_2^2 + 2sx^\top z $$for all $(s,y,z)\in\reals^{m+n+1}$ with $\|z\|_2\le\|P^\top y\|_2$ because
$$\textstyle \|sx\|_2\|P^\top y\|_2 = \max_{\|z\|_2\le\|P^\top y\|_2}sx^\top z = -\min_{\|z\|_2\le\|P^\top y\|_2}sx^\top z. $$We restate the preceding constraints as follows: $\|z\|_2\le\|P^\top y\|_2$ implies
$$\textstyle 0\le s^2t + 2s(y^\top(Ax + b) + x^\top z) + t\|y\|_2^2 $$for all $(s,y,z)\in\reals^{m+n+1}$. Equivalently,
$$ \begin{aligned} \begin{bmatrix}s \\ y \\ z\end{bmatrix}^\top\begin{bmatrix}0 & 0_m^\top & 0_n^\top \\ 0_m & PP^\top & 0_{m\times n} \\ 0_n & 0_{n\times m} & -I_n\end{bmatrix}\begin{bmatrix}s \\ y \\ z\end{bmatrix}\ge 0\text{ implies } \\ \begin{bmatrix}s \\ y \\ z\end{bmatrix}^\top\begin{bmatrix}t & (Ax + b)^\top & x^\top \\ Ax + b & tI_m & 0 \\ x & 0 & 0\end{bmatrix}\begin{bmatrix}s \\ y \\ z\end{bmatrix}\ge 0. \end{aligned} $$$S$-lemma: if $\bar w^\top Q\bar w > 0$ for some $\bar w$, then $w^\top Qw\ge 0$ implies $w^\top Rw\ge 0$ for all $w$ iff $R - \lambda Q \succeq 0$ for some $\lambda\ge 0$. Here $Q$ and $R$ are the two matrices above, and the condition on $Q$ holds whenever $P\neq 0$.
The $S$-lemma allows us to restate the robust constraint as
$$ \begin{bmatrix}t & (Ax + b)^\top & x^\top \\ Ax + b & tI_m - \lambda PP^\top & 0 \\ x & 0 & \lambda I_n\end{bmatrix}\succeq 0\text{ for some }\lambda\ge 0. $$Safe approximation: Consider $u$ as a random variable $U$, and choose $\cU$ to be a $1-\alpha$ confidence set for $U$:
$$ \Pr\{U\in\cU\} \ge 1-\alpha. $$Chance constraints: (directly) work with $\Pr\{f_i(x,U) > 0\}$ (instead of $\Pr\{U\in\cU\}$)
convex in some cases
common $\alpha$ values: 0.1, 0.05, 0.01
smaller $\alpha$ values (e.g., 0.001) are usually not meaningful because the tails of the distribution of $U$ are generally unknown
VaR of a random scalar $z$ at level $\alpha$:
$$ \begin{aligned} \VaR(z;\alpha) &\triangleq\inf\{\gamma\mid\Pr\{z \le \gamma\} \ge 1-\alpha\} \\ &=\inf\{\gamma\mid\Pr\{z> \gamma\} \le \alpha\}, \end{aligned} $$i.e., the $(1-\alpha)$-quantile of $z$.
$\VaR(z;\alpha)$ is the worst possible outcome excluding the worst outcomes with total probability at most $\alpha$.
chance constraints are VaR constraints:
$$ \Pr\{f_i(x,U) \le 0\} \ge 1-\alpha\iff\VaR(f_i(x,U);\alpha) \le 0 $$
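A sketch of the empirical VaR, taken directly from the definition $\inf\{\gamma\mid\Pr\{z>\gamma\}\le\alpha\}$ applied to the empirical distribution of samples, together with a check that the chance constraint and the VaR constraint agree. The helper names and the simulated samples are illustrative assumptions, and $\alpha\in(0,1)$ is assumed.

```python
import numpy as np


def empirical_var(z, alpha):
    """VaR(z; alpha) = inf{gamma : Pr{z > gamma} <= alpha} under the empirical distribution.

    The infimum is attained at a sample point, so scan the sorted samples and return the
    first one whose exceedance count is at most alpha * n (ties handled by searchsorted).
    """
    zs = np.sort(np.asarray(z, dtype=float))
    n = zs.size
    n_greater = n - np.searchsorted(zs, zs, side="right")  # #{samples > zs[j]}
    return zs[np.argmax(n_greater <= alpha * n)]


def chance_constraint_holds(z, alpha):
    """Empirical check of Pr{z <= 0} >= 1 - alpha, written as #{z > 0} <= alpha * n."""
    z = np.asarray(z, dtype=float)
    return np.sum(z > 0) <= alpha * z.size


print(empirical_var(np.arange(1, 101), 0.05))  # exceedances above 95 number exactly 5
```

For the samples $1,\dots,100$ at $\alpha = 0.05$, the smallest $\gamma$ with at most 5 samples strictly above it is 95.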