Robust optimization

STATS 606: Computation and Optimization Methods in Statistics

University of Michigan

including slides from Stanford's EE364b

Robust optimization

parameterized (convex) optimization problem

$$ \begin{aligned} &\min\nolimits_x &&f_0(x) \\ &\subjectto &&\{f_i(x,u) \leq 0\}_{i=1}^m \end{aligned} \tag{OP} $$

The parameters $u$ are often data-dependent (and thus uncertain).

Let $\cU$ be an uncertainty set of plausible values of $u$. Consider the robust form of (OP):

$$ \begin{aligned} &\min\nolimits_x &&f_0(x) \\ &\subjectto &&\{f_i(x,u) \leq 0\text{ for all }u\in\cU\}_{i=1}^m \end{aligned} \tag{ROP} $$

(ROP) is equivalent to

$$ \begin{aligned} &\min\nolimits_x &&f_0(x) \\ &\subjectto &&\{\sup\nolimits_{u\in\cU}f_i(x,u) \leq 0\}_{i=1}^m \end{aligned} $$

As long as $x\mapsto f_i(x,u)$ is convex for every $u\in\cU$, the robust constraint functions $x\mapsto\sup_{u\in\cU}f_i(x,u)$ are also convex (a pointwise supremum of convex functions is convex), and thus so are the robust constraints.

Q's:

  1. How should we pick $\cU$?

  2. Can we solve (ROP) (efficiently)?

  3. Is the robust formulation useful?

How to choose uncertainty sets?

Consider $u$ as a random variable, and choose $\cU$ to be a $1-\alpha$ confidence set for $u$:

$$ \Pr\{U\in\cU\} \ge 1-\alpha. $$

Enforcing the robust constraint $\sup_{u\in\cU}f_i(x,u)\le 0$ implies

$$ \begin{aligned} &\Pr\{f_i(x,U)> 0\} \\ &= \cancel{\Pr\{f_i(x,U)> 0, U\in\cU\}} + \Pr\{f_i(x,U)> 0, U\notin\cU\} \\ &\le\Pr\{U\notin\cU\}\le \alpha. \end{aligned} $$

There are many possible confidence sets, and some choices lead to intractable robust optimization problems.

LPs with Gaussian uncertainty

$$ \begin{aligned} &\min\nolimits_x &&c^\top x \\ &\subjectto &&\{\Pr\{a_i^\top x > b_i\}\le \eps\}_{i=1}^m \end{aligned} $$

where the $a_i$ are independent $N(\bar{a}_i,\Sigma)$ random vectors and $\eps\in(0,1)$ is a failure probability.

Marginally, $a_i^\top x\sim N(\bar{a}_i^\top x,x^\top\Sigma x)$, so

$$ \begin{aligned} &\{x\mid\Pr\{a_i^\top x > b_i\}\le \eps\} \\ &\quad= \{x\mid\bar{a}_i^\top x - b_i - \Phi^{-1}(\eps)(x^\top\Sigma x)^{\frac12} \le 0\}, \end{aligned} $$

where $\Phi^{-1}$ is the standard normal quantile function.

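For $\eps = 0.5$, the constraints remain (conveniently) linear because the coefficient of $(x^\top\Sigma x)^{\frac12}$ vanishes, but

  • for $\eps < 0.5$, the coefficient of $(x^\top\Sigma x)^{\frac12}$ is positive, so the constraints are (quadratic) second-order cone constraints, which remain convex

  • for $\eps > 0.5$, the coefficient is negative, so the constraints are non-convex quadratic constraints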
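
The reformulation of a single Gaussian chance constraint can be sanity-checked numerically. The sketch below is only an illustration: it assumes $\Sigma = LL^\top$ for a factor `L`, and the helper names and all numbers in the usage note are made up. It compares the deterministic constraint value with the exact violation probability and with a Monte Carlo estimate.

```python
import math
import random
from statistics import NormalDist

def mean_and_std(a_bar, L, x):
    """Mean and standard deviation of a^T x when a = a_bar + L w, w ~ N(0, I)."""
    mean = sum(ai * xi for ai, xi in zip(a_bar, x))
    # Sigma = L L^T, so x^T Sigma x = ||L^T x||_2^2.
    Ltx = [sum(L[i][j] * x[i] for i in range(len(x))) for j in range(len(L[0]))]
    return mean, math.sqrt(sum(v * v for v in Ltx))

def constraint_value(a_bar, L, x, b, eps):
    """a_bar^T x - b - Phi^{-1}(eps) * sqrt(x^T Sigma x); feasible iff <= 0."""
    mean, std = mean_and_std(a_bar, L, x)
    return mean - b - NormalDist().inv_cdf(eps) * std

def violation_probability(a_bar, L, x, b):
    """Exact Pr{a^T x > b} under the Gaussian model."""
    mean, std = mean_and_std(a_bar, L, x)
    return 1.0 - NormalDist().cdf((b - mean) / std)

def monte_carlo_violation(a_bar, L, x, b, n_samples=50_000, seed=0):
    """Monte Carlo estimate of Pr{a^T x > b}."""
    rng = random.Random(seed)
    k = len(L[0])
    hits = 0
    for _ in range(n_samples):
        w = [rng.gauss(0.0, 1.0) for _ in range(k)]
        a = [a_bar[i] + sum(L[i][j] * w[j] for j in range(k)) for i in range(len(x))]
        hits += sum(ai * xi for ai, xi in zip(a, x)) > b
    return hits / n_samples
```

For instance, with the illustrative values `a_bar = [1.0, 2.0]`, `L = [[0.5, 0.0], [0.2, 0.3]]`, `x = [1.0, 1.0]`, `b = 4.0`, the sign of `constraint_value(..., eps)` should agree with whether `violation_probability(...)` is at most `eps`.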

Is convexity enough?

Unfortunately not. Consider the quadratic constraint

$$ \|Ax + Bu\|_2 \le 1\text{ for all }\|u\|_\infty\le 1. $$

The robust constraint

$$\textstyle \max_{\|u\|_\infty\le 1}\|Ax + Bu\|_2 \le 1 $$

requires solving a convex quadratic maximization problem.

Maximizing convex quadratics on the $\ell_\infty$-ball (even approximately) is computationally intractable (i.e., NP-hard).
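
To see where the difficulty comes from: a convex function attains its maximum over a polytope at a vertex, and the $\ell_\infty$-ball in $\reals^n$ has $2^n$ vertices. The following brute-force sketch (the function name `worst_case_norm` is hypothetical) evaluates the robust constraint function exactly by enumerating all vertices, which is only practical for very small $n$.

```python
import itertools
import math

def worst_case_norm(Ax, B):
    """max over u in {-1, +1}^n of ||Ax + B u||_2, enumerating all 2^n vertices.

    Ax is the vector A x (already multiplied out); B is a list of rows.
    """
    n = len(B[0])
    best = 0.0
    for u in itertools.product((-1.0, 1.0), repeat=n):
        r = [Ax[i] + sum(B[i][j] * u[j] for j in range(n)) for i in range(len(Ax))]
        best = max(best, math.sqrt(sum(v * v for v in r)))
    return best
```

The loop runs $2^n$ times, so the cost doubles with every additional uncertain parameter.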

Robust LPs

Q: When is a robust LP still an LP (a robust SOCP still an SOCP, a robust SDP still an SDP, etc.)?

$$ \begin{aligned} &\min\nolimits_x &&c^\top x \\ &\subjectto &&(A + U)x \preceq b\text{ for all }U\in\cU \end{aligned} $$

Consider one (robust) inequality constraint:

$$ (a + u)^T x \le b \text{ for all } u \in \mathcal{U}. $$

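Ex: If $\mathcal{U} = \{u\in\reals^n\mid\|u\|_\infty \le\delta\}$, then $\max_{u\in\mathcal{U}}u^T x = \delta\|x\|_1$ (attained at $u_j = \delta\,\mathrm{sign}(x_j)$), so the constraint is equivalent to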

$$ a^T x + \delta \|x\|_1 \le b. $$
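
A quick numerical illustration of this example, as a sketch with hypothetical helper names: the worst-case perturbation sets each $u_j$ to $\pm\delta$ according to the sign of $x_j$, and no randomly sampled $u$ in the box should exceed the closed-form value.

```python
import random

def worst_case_u(x, delta):
    """A maximizer of u^T x over ||u||_inf <= delta: u_j = delta * sign(x_j)."""
    return [delta if xj >= 0 else -delta for xj in x]

def robust_lhs(a, x, delta):
    """Closed form a^T x + delta * ||x||_1."""
    return sum(aj * xj for aj, xj in zip(a, x)) + delta * sum(abs(xj) for xj in x)

def sampled_lhs(a, x, delta, n_samples=10_000, seed=0):
    """Largest (a + u)^T x over random u in the box (a lower bound on the max)."""
    rng = random.Random(seed)
    best = -float("inf")
    for _ in range(n_samples):
        u = [rng.uniform(-delta, delta) for _ in x]
        best = max(best, sum((aj + uj) * xj for aj, uj, xj in zip(a, u, x)))
    return best
```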

LPs with polyhedral uncertainty

$$ (a + u)^\top x \le b\text{ for all }u\in\cU\triangleq\{u\in\reals^n\mid Fu + g \succeq 0 \}. $$

This is a semi-infinite (set of) constraint(s). Let's reformulate it in a tractable way with duality.

The robust constraint is

$$\textstyle b\ge a^\top x + \max_{u\in\cU}u^\top x\equiv a^\top x + \left\{\begin{aligned} &\max\nolimits_u &&u^\top x \\ &\subjectto &&Fu + g \succeq 0 \end{aligned}\right\}. $$

The Lagrangian of the maximization in the robust constraint is

$$ \begin{aligned} L(u,\lambda) &= x^\top u + \lambda^\top(Fu + g),\quad\lambda\succeq 0 \\ &= (x + F^\top\lambda)^\top u + \lambda^\top g. \end{aligned} $$

We maximize with respect to $u$ to obtain the dual function

$$ \begin{aligned} g(\lambda) &\triangleq \max\nolimits_u(x + F^\top\lambda)^\top u + \lambda^\top g,\quad\lambda\succeq 0\\ &= \begin{cases} \infty & \text{if } F^\top\lambda + x \neq 0 \\ \lambda^\top g & \text{if } F^\top\lambda + x = 0 \end{cases}. \end{aligned} $$

By strong duality for LPs (assuming $\cU$ is nonempty), the minimum of the dual function over $\lambda\succeq 0$ equals the optimal value of the inner maximization, so the robust constraint is equivalent to

$$ \begin{aligned} &\{x\in\reals^n\mid a^\top x + \max\nolimits_{u\in\cU}u^\top x\le b\} \\ &\quad\equiv \{x\in\reals^n\mid a^\top x + \min\nolimits_{\lambda\succeq 0} g(\lambda)\le b\} \\ &\quad\equiv\{x\in\reals^n\mid\left.\text{there is $\lambda\succeq 0$ such that }\begin{aligned} a^\top x + \lambda^\top g \le b,\\ F^\top\lambda + x = 0\\ \end{aligned}\right\}. \end{aligned} $$
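
As a small worked instance of this duality argument, take the box $\|u\|_\infty\le\delta$ written as a polyhedron with $F = [I; -I]$ and $g = \delta\mathbf{1}$. The sketch below (the function name is hypothetical, and the multipliers are written down by hand rather than computed by a solver) builds a feasible $u$ and a feasible $\lambda$ with equal objective values, which certifies that both are optimal.

```python
def box_certificate(x, delta):
    """Primal/dual pair for max{u^T x : F u + g >= 0}, F = [I; -I], g = delta * 1.

    Returns (primal_value, dual_value, residual), where residual is
    F^T lambda + x and must be zero for dual feasibility.
    """
    u = [delta if xj >= 0 else -delta for xj in x]   # primal feasible point
    lam_plus = [max(-xj, 0.0) for xj in x]           # multipliers for  u + delta*1 >= 0
    lam_minus = [max(xj, 0.0) for xj in x]           # multipliers for -u + delta*1 >= 0
    primal_value = sum(uj * xj for uj, xj in zip(u, x))       # u^T x
    dual_value = delta * (sum(lam_plus) + sum(lam_minus))     # lambda^T g
    residual = [lp - lm + xj for lp, lm, xj in zip(lam_plus, lam_minus, x)]
    return primal_value, dual_value, residual
```

Both values equal $\delta\|x\|_1$, matching the box-uncertainty example.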

LPs with norm uncertainty

$$ (a + Pu)^\top x\le b\text{ for all }u\in\cU\triangleq\{u\in\reals^n\mid\|u\|\le 1\}. $$

The robust constraint is

$$ \begin{aligned} b &\ge\max\nolimits_{u\in\cU}(a + Pu)^\top x \\ &= a^\top x + \max\nolimits_{u\in\cU}u^\top P^\top x \\ &= a^\top x + \|P^\top x\|_* &&\text{(definition of the dual norm)} \end{aligned} $$
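
For common norms the dual norm has a closed form, so the robust constraint can be evaluated directly. The sketch below handles only $\ell_1$, $\ell_2$, and $\ell_\infty$ uncertainty (whose duals are $\ell_\infty$, $\ell_2$, and $\ell_1$); the function names are hypothetical. Since $(Pu)^\top x = u^\top(P^\top x)$, the dual norm is applied to $P^\top x$.

```python
import math

def dual_norm(v, p):
    """Dual of the l_p norm for p in {1, 2, inf}: l_inf, l_2, l_1 respectively."""
    if p == 1:
        return max(abs(t) for t in v)
    if p == 2:
        return math.sqrt(sum(t * t for t in v))
    if p == math.inf:
        return sum(abs(t) for t in v)
    raise ValueError("only p in {1, 2, inf} is handled in this sketch")

def robust_lhs(a, P, x, p):
    """Worst case of (a + P u)^T x over ||u||_p <= 1, i.e. a^T x + ||P^T x||_*."""
    Ptx = [sum(P[i][j] * x[i] for i in range(len(x))) for j in range(len(P[0]))]
    return sum(ai * xi for ai, xi in zip(a, x)) + dual_norm(Ptx, p)
```

With $P = \delta I$ and `p = math.inf`, this reduces to $a^\top x + \delta\|x\|_1$, the box-uncertainty case.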

LPs with conic uncertainty

$$ (a + u)^\top x \le b\text{ for all }u\in\cU\triangleq\{u\in\reals^n\mid Fu + g {\color{red}\succeq_K} 0 \}. $$

Similar to the polyhedral uncertainty case, the robust constraint is equivalent to the constraints

$$ \begin{aligned} a^\top x + \lambda^\top g \le b,\\ F^\top\lambda + x = 0,\\ \lambda {\color{red}\succeq_{K^*}} 0. \end{aligned} $$

We leave the details as an exercise.

SOCPs with box uncertainty

$$ \|(A+\Delta)x + b\|_2 \le (c + u)^\top x + d\text{ for all }\|\Delta\|_\infty\le\delta, u\in\cU. $$

This is equivalent to two robust constraints:

$$ \begin{aligned} \|(A+\Delta)x + b\|_2 \le t\text{ for all }\|\Delta\|_\infty\le\delta, \\ t\le (c + u)^\top x + d\text{ for all }u\in\cU. \end{aligned} $$

The 2nd (robust) constraint is an uncertain LP constraint (which we saw how to handle).

The 1st constraint is equivalent to

$$ \begin{aligned} \|z\|_2 \le t, \\ \{|A_i^\top x + b_i| + \delta\|x\|_1 \le z_i\}_{i=1}^m. \end{aligned} $$

Pf: The 1st (robust) constraint is equivalent to

$$ \max\nolimits_{\|\Delta\|_\infty\le\delta}\|(A+\Delta)x + b\|_2 \le t. $$

We simplify the constraint function to obtain

$$ \begin{aligned} &\max\nolimits_{\|\Delta\|_\infty\le\delta}\|(A+\Delta)x + b\|_2 \\ &\quad=\textstyle \max\nolimits_{\|\Delta\|_\infty\le\delta}(\sum_{i=1}^m[(A_i+\Delta_i)^\top x + b_i]^2)^{\frac12} \\ &\quad=\max\nolimits_{\|\Delta\|_\infty\le\delta}\{\|z\|_2\mid z_i = (A_i + \Delta_i)^\top x+ b_i,i\in[m]\} \\ &\quad=\min\{\|z\|_2\mid z_i \ge |A_i^\top x + b_i| + \delta\|x\|_1\}. \end{aligned} $$

Thus the 1st constraint is equivalent to

$$\min\{\|z\|_2\mid z_i \ge |A_i^\top x + b_i| + \delta\|x\|_1\} \le t.$$
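
The closed form can be checked against brute force on a tiny instance. The sketch below (hypothetical function names; $\|\Delta\|_\infty\le\delta$ is read entrywise, $|\Delta_{ij}|\le\delta$) enumerates all $2^{mn}$ vertices of the box, which suffices because the norm is convex in $\Delta$.

```python
import itertools
import math

def closed_form(A, b, x, delta):
    """|| (|A_i^T x + b_i| + delta * ||x||_1)_i ||_2, the claimed worst case."""
    l1 = sum(abs(t) for t in x)
    z = [abs(sum(A[i][j] * x[j] for j in range(len(x))) + b[i]) + delta * l1
         for i in range(len(A))]
    return math.sqrt(sum(t * t for t in z))

def brute_force(A, b, x, delta):
    """max of ||(A + Delta) x + b||_2 over the vertices Delta_ij in {-delta, +delta}."""
    m, n = len(A), len(x)
    best = 0.0
    for d in itertools.product((-delta, delta), repeat=m * n):
        r = [sum((A[i][j] + d[i * n + j]) * x[j] for j in range(n)) + b[i]
             for i in range(m)]
        best = max(best, math.sqrt(sum(t * t for t in r)))
    return best
```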

SOCPs with ellipse-type uncertainty

$$\textstyle (\sum_{i=1}^m[(a_i + P_iu_i)^\top x + b_i]^2)^{\frac12} \le t\text{ for all }\|u_i\|\le 1,\ i\in[m], $$

where each row $i$ has its own perturbation $u_i$.

We simplify the constraint function to obtain

$$ \begin{aligned} &\textstyle\max\nolimits_{\|u_i\|\le 1}(\sum_{i=1}^m[(a_i+P_iu_i)^\top x + b_i]^2)^{\frac12} \\ &\quad=\max\nolimits_{\|u_i\|\le 1}\{\|z\|_2\mid z_i = (a_i + P_iu_i)^\top x+ b_i,i\in[m]\} \\ &\quad=\min\{\|z\|_2\mid z_i \ge |a_i^\top x + b_i| + \|P_i^\top x\|_*,i\in[m]\}. \end{aligned} $$

Thus the (robust) constraint is equivalent to

$$ \begin{aligned} \|z\|_2 \le t, \\ \{|a_i^\top x + b_i| + \|P_i^\top x\|_* \le z_i\}_{i=1}^m. \end{aligned} $$

SOCPs with matrix uncertainty

$$ \def\sp{\text{sp}} \textstyle \|(A+P\Delta)x + b\|_2 \le t\text{ for all }\|\Delta\|_\sp\le 1, $$

where $\|\Delta\|_\sp$ is the spectral norm.

The LMI representation of the robust constraint is

$$ \begin{bmatrix}t & ((A+P\Delta)x + b)^\top \\ (A+P\Delta)x + b& tI_m\end{bmatrix}\succeq 0\text{ for all }\|\Delta\|_{\text{sp}}\le 1. $$

Equivalently,

$$ \begin{aligned} 0 &\le \begin{bmatrix}s \\ y\end{bmatrix}^\top\begin{bmatrix}t & ((A+P\Delta)x + b)^\top \\ (A+P\Delta)x + b& tI_m \end{bmatrix}\begin{bmatrix}s \\ y\end{bmatrix} \\ &= s^2t + 2sy^\top((A+P\Delta)x + b) + t\|y\|_2^2 \end{aligned} $$ for all $(s,y)\in\reals^{m+1}$ and $\|\Delta\|_\sp\le 1$.

We minimize the right side with respect to $\Delta$, using $\min_{\|\Delta\|_{\text{sp}}\le 1}y^\top P\Delta(sx) = -\|P^\top y\|_2\|sx\|_2$, to obtain

$$ 0\le s^2t + 2sy^\top(Ax + b) + t\|y\|_2^2 - 2\|sx\|_2\|P^\top y\|_2 $$

for all $(s,y)\in\reals^{m+1}$.

The preceding constraints are equivalent to

$$ 0\le s^2t + 2sy^\top(Ax + b) + t\|y\|_2^2 + 2sx^\top z $$

for all $(s,y,z)\in\reals^{m+n+1}$ and $\|z\|_2\le\|P^\top y\|_2$ because

$$\textstyle \|sx\|_2\|P^\top y\|_2 = \max_{\|z\|_2\le\|P^\top y\|_2}sx^\top z. $$

We restate the preceding constraints as $\|z\|_2\le\|P^\top y\|_2$ implies

$$\textstyle 0\le s^2t + 2s(y^\top(Ax + b) + x^\top z) + t\|y\|_2^2 $$

for all $(s,y,z)\in\reals^{m+n+1}$. Equivalently,

$$ \begin{aligned} \begin{bmatrix}s \\ y \\ z\end{bmatrix}^\top\begin{bmatrix}0 & 0_m^\top & 0_n^\top \\ 0_m & PP^\top & 0_{m\times n} \\ 0_n & 0_{n\times m} & -I_n\end{bmatrix}\begin{bmatrix}s \\ y \\ z\end{bmatrix}\ge 0\text{ implies } \\ \begin{bmatrix}s \\ y \\ z\end{bmatrix}^\top\begin{bmatrix}t & (Ax + b)^\top & x^\top \\ Ax + b & tI_m & 0 \\ x & 0 & 0\end{bmatrix}\begin{bmatrix}s \\ y \\ z\end{bmatrix}\ge 0. \end{aligned} $$

$S$-lemma: Let $A$ and $B$ be symmetric matrices with $x^\top Ax > 0$ for some $x$. Then $x^\top Ax\ge 0$ implies $x^\top Bx\ge 0$ (for all $x$) iff $B - \lambda A \succeq 0$ for some $\lambda\ge 0$.

The $S$-lemma allows us to restate the robust constraint as

$$ \begin{bmatrix}t & (Ax + b)^\top & x^\top \\ Ax + b & tI_m - \lambda PP^\top & 0 \\ x & 0 & \lambda I_n\end{bmatrix}\succeq 0\text{ for some }\lambda\ge 0. $$

Chance constraints

$$ \begin{aligned} &\min\nolimits_x &&f_0(x)\\ &\subjectto &&\{f_i(x,u) \le 0\}_{i=1}^m \end{aligned} $$

Safe approximation: Consider $u$ as a random variable, and choose $\cU$ to be a $1-\alpha$ confidence set for $u$:

$$ \Pr\{U\in\cU\} \ge 1-\alpha. $$

Chance constraints: (directly) work with $\Pr\{f_i(x,U) > 0\}$ (instead of $\Pr\{U\in\cU\}$)

$$ \Pr\{f_i(x,U) \le 0\} \ge 1-\alpha \equiv \Pr\{f_i(x,U) > 0\} \le \alpha $$
  • convex in some cases

  • common $\alpha$ values: 0.1, 0.05, 0.01

  • smaller $\alpha$ values (e.g., 0.001) are rarely meaningful because the tails of the distribution of $U$ are generally unknown

Value-at-Risk (VaR)

VaR of a random scalar $z$ at level $\alpha$:

$$ \begin{aligned} \VaR(z;\alpha) &\triangleq\inf\{\gamma\mid\Pr\{z \le \gamma\} \ge 1-\alpha\} \\ &=\inf\{\gamma\mid\Pr\{z> \gamma\} \le \alpha\} \end{aligned} $$
  • the $(1-\alpha)$-quantile of $z$

  • $\VaR(z;\alpha)$ is the worst possible outcome excluding the worst outcomes with total probability at most $\alpha$.

  • chance constraints are VaR constraints:

    $$ \Pr\{f_i(x,U) \le 0\} \ge 1-\alpha\equiv\VaR(f_i(x,U);\alpha) \le 0 $$
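
When only samples of the constraint function are available, the VaR definition can be applied to the empirical distribution. The following is a minimal sketch under that assumption; the function names are hypothetical, and the small tolerance only guards against floating-point round-off in $(1-\alpha)n$.

```python
import math

def empirical_var(samples, alpha):
    """inf{gamma : empirical Pr{z <= gamma} >= 1 - alpha}."""
    s = sorted(samples)
    n = len(s)
    # Smallest k with k / n >= 1 - alpha; the k-th order statistic is the VaR.
    k = max(math.ceil((1 - alpha) * n - 1e-12), 1)
    return s[k - 1]

def chance_constraint_holds(samples, alpha):
    """Empirical version of Pr{f <= 0} >= 1 - alpha, checked as VaR <= 0."""
    return empirical_var(samples, alpha) <= 0
```

For the samples $1,\dots,10$ and $\alpha = 0.25$, the empirical VaR is $8$: a fraction $0.8\ge 0.75$ of the samples is at most $8$, while only $0.7$ is at most $7$.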