Constraints and Duality

So far the variable $\xv$ has ranged freely over $\R^n$. Most real problems instead minimize $f$ over a feasible set carved out by equality and inequality constraints. The price of admission is a small piece of new machinery, the Lagrangian. It turns a constrained problem back into an unconstrained one and exposes the deep symmetry between the primal and dual problems.

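The primal problem

The constrained problem, called the primal, is

p^\star \;=\; \inf_{\xv}\ f(\xv) \quad \text{subject to} \quad g_i(\xv) \le 0,\ i = 1,\dots,m, \qquad h_j(\xv) = 0,\ j = 1,\dots,p.

A point $\xv$ that satisfies every constraint is feasible, and $p^\star$ is the optimal value ($p^\star = +\infty$ if no feasible point exists).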

The Lagrangian

Attach a multiplier $\lambda_i$ to each inequality and a multiplier $\nu_j$ to each equality, and package them into the Lagrangian:

L(\xv, \lambdav, \nuv) \;=\; f(\xv) + \sum_{i=1}^m \lambda_i\,g_i(\xv) + \sum_{j=1}^p \nu_j\,h_j(\xv), \qquad \lambdav \ge \mathbf{0}.

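Take any feasible $\xv$ and any $\lambdav \ge \mathbf{0}$. Each term $\lambda_i\,g_i(\xv)$ is $\le 0$ and each $h_j(\xv) = 0$, so $L(\xv, \lambdav, \nuv) \le f(\xv)$. Equality holds exactly when $\lambda_i\,g_i(\xv) = 0$ for every $i$, meaning each constraint is either active ($g_i(\xv) = 0$) or carries a zero multiplier.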
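The sketch below checks this inequality numerically on a hypothetical one-dimensional problem: minimize $x^2$ subject to $g(x) = 1 - x \le 0$, with no equality constraints. The names `f`, `g`, and `lagrangian` are illustrative, not taken from any library.

```python
# Hypothetical toy problem: minimize f(x) = x**2 subject to g(x) = 1 - x <= 0
# (that is, x >= 1). There are no equality constraints.

def f(x):
    return x ** 2


def g(x):
    return 1.0 - x


def lagrangian(x, lam):
    """L(x, lam) = f(x) + lam * g(x), meaningful for lam >= 0."""
    return f(x) + lam * g(x)


# On feasible points g(x) <= 0, so with lam >= 0 the term lam * g(x) is <= 0
# and the Lagrangian never exceeds the objective.
feasible_xs = [1.0, 1.5, 2.0, 3.0]
multipliers = [0.0, 0.5, 2.0, 10.0]
gaps = [f(x) - lagrangian(x, lam) for x in feasible_xs for lam in multipliers]
print(min(gaps) >= 0)  # True

# Equality holds when lam * g(x) == 0: an active constraint (x = 1)
# or a zero multiplier.
print(lagrangian(1.0, 2.0) == f(1.0), lagrangian(3.0, 0.0) == f(3.0))  # True True
```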

The dual function is the infimum of the Lagrangian over $\xv$:

q(\lambdav, \nuv) \;=\; \inf_{\xv} L(\xv, \lambdav, \nuv).

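For each fixed $\xv$, $L$ is affine in $(\lambdav, \nuv)$. As a pointwise infimum of affine functions, the dual function $q$ is therefore concave (possibly taking the value $-\infty$), no matter how nasty $f$ or the constraints are.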
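The sketch below shows concavity surviving a nonconvex objective. It uses a hypothetical example, $f(x) = x^4 - 3x^2 + x$ with $g(x) = x - 1 \le 0$, and replaces the infimum over $\R$ by a minimum over a finite grid, an approximation chosen for simplicity. For each grid point the Lagrangian is affine in $\lambda$. The grid version of $q$ is therefore itself a minimum of affine functions and passes a midpoint-concavity test.

```python
# Hypothetical nonconvex objective with one inequality constraint:
#   f(x) = x**4 - 3*x**2 + x,  g(x) = x - 1 <= 0.
# inf over x is approximated by a min over a finite grid (an assumption made
# for simplicity). For each grid point x_k, lam -> f(x_k) + lam * g(x_k) is
# affine, so q_grid is a pointwise minimum of affine functions: concave.

def f(x):
    return x ** 4 - 3 * x ** 2 + x


def g(x):
    return x - 1.0


GRID = [-3.0 + 6.0 * k / 2000 for k in range(2001)]  # x from -3 to 3


def q_grid(lam):
    """Grid approximation of the dual function q(lam) = inf_x L(x, lam)."""
    return min(f(x) + lam * g(x) for x in GRID)


# Midpoint concavity: q((a + b) / 2) >= (q(a) + q(b)) / 2, up to rounding.
pairs = [(0.0, 1.0), (0.5, 4.0), (2.0, 9.0), (0.0, 20.0)]
results = [q_grid(0.5 * (a + b)) >= 0.5 * (q_grid(a) + q_grid(b)) - 1e-9 for a, b in pairs]
print(all(results))  # True
```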

KKT conditions

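At a primal optimum $\xv^\star$, the gradients of $f$ and of the active constraints balance each other. This requires mild regularity: Slater’s condition for convex problems, or in general a constraint qualification such as linear independence of the active constraint gradients. Under it, the four KKT conditions are necessary for optimality. For convex problems (convex $f$ and $g_i$, affine $h_j$) they are also sufficient.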


If $\xv^\star$ is a regular local minimum (one at which a constraint qualification holds), there exist multipliers $\lambdav^\star \ge \mathbf{0}$ and $\nuv^\star$ satisfying:

Stationarity: $\nabla_\xv L(\xv^\star, \lambdav^\star, \nuv^\star) = \mathbf{0}$, i.e. $\nabla f(\xv^\star) + \sum_i \lambda_i^\star \nabla g_i(\xv^\star) + \sum_j \nu_j^\star \nabla h_j(\xv^\star) = \mathbf{0}$.
Primal feasibility: $g_i(\xv^\star) \le 0$ and $h_j(\xv^\star) = 0$.
Dual feasibility: $\lambda_i^\star \ge 0$.
Complementary slackness: $\lambda_i^\star\,g_i(\xv^\star) = 0$ for every $i$.
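The four conditions can be checked mechanically at a candidate point. The sketch below uses a hypothetical problem: minimize $(x_1-2)^2 + (x_2-1)^2$ subject to $x_1 + x_2 \le 2$ and $x_1 \ge 0$ (written as $-x_1 \le 0$). Its solution, $\xv^\star = (1.5, 0.5)$ with $\lambdav^\star = (1, 0)$, was worked out by hand by projecting $(2, 1)$ onto the half-plane. The helper `check_kkt` and its tolerance `tol` are illustrative choices.

```python
# Hypothetical problem: minimize f(x) = (x1 - 2)**2 + (x2 - 1)**2
# subject to g1(x) = x1 + x2 - 2 <= 0 and g2(x) = -x1 <= 0 (no equalities).
# Hand-derived candidate: x* = (1.5, 0.5), lam* = (1, 0).

def grad_f(x):
    return [2 * (x[0] - 2), 2 * (x[1] - 1)]


def constraints(x):
    return [x[0] + x[1] - 2, -x[0]]


def constraint_grads(x):
    # Both constraints are affine, so their gradients do not depend on x.
    return [[1.0, 1.0], [-1.0, 0.0]]


def check_kkt(x, lam, tol=1e-9):
    """Report which of the four KKT conditions hold at (x, lam)."""
    g = constraints(x)
    grads = constraint_grads(x)
    gf = grad_f(x)
    # Stationarity: grad f(x) + sum_i lam_i * grad g_i(x) = 0, componentwise.
    residual = [gf[d] + sum(l * gi[d] for l, gi in zip(lam, grads)) for d in range(len(x))]
    return {
        "stationarity": all(abs(r) <= tol for r in residual),
        "primal_feasibility": all(gi <= tol for gi in g),
        "dual_feasibility": all(l >= -tol for l in lam),
        "complementary_slackness": all(abs(l * gi) <= tol for l, gi in zip(lam, g)),
    }


print(check_kkt([1.5, 0.5], [1.0, 0.0]))  # all four conditions True
print(check_kkt([1.0, 1.0], [1.0, 0.0])["stationarity"])  # False
```

At the second point the constraint $x_1 + x_2 \le 2$ is still active, but the gradients no longer balance, so the point is not optimal. The inactive constraint $-x_1 \le 0$ at the optimum carries a zero multiplier, as complementary slackness requires.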

Weak and strong duality

Because $L(\xv, \lambdav, \nuv) \le f(\xv)$ for every feasible $\xv$ whenever $\lambdav \ge \mathbf{0}$, we have $q(\lambdav, \nuv) \le L(\xv, \lambdav, \nuv) \le f(\xv)$. Taking the infimum of the right side over feasible $\xv$ gives

q(\lambdav, \nuv) \;\le\; p^\star \quad \text{for every } \lambdav \ge \mathbf{0}, \nuv.

Maximizing the left side defines the dual problem

d^\star \;=\; \max_{\lambdav \ge \mathbf{0},\,\nuv} q(\lambdav, \nuv).

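This is weak duality, $d^\star \le p^\star$, and it holds for every problem, convex or not. The difference $p^\star - d^\star \ge 0$ is the duality gap. The central fact of convex optimization is that, under mild conditions such as Slater’s condition (a feasible point with $g_i(\xv) < 0$ for every $i$), the gap vanishes: $d^\star = p^\star$. This is strong duality.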
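Weak and strong duality can be observed on a hypothetical one-dimensional problem: minimize $x^2$ subject to $1 - x \le 0$, so $p^\star = 1$ at $x^\star = 1$. The Lagrangian $x^2 + \lambda(1 - x)$ is minimized at $x(\lambda) = \lambda/2$, which gives $q(\lambda) = \lambda - \lambda^2/4$ and $q'(\lambda) = g(x(\lambda))$.

The sketch below runs projected gradient ascent on $q$. This dual ascent is just one simple way to solve the dual, chosen here for illustration. The sketch checks weak duality on a few multipliers and confirms that the gap closes at $\lambda^\star = 2$.

```python
# Hypothetical toy problem: minimize x**2 subject to g(x) = 1 - x <= 0,
# with p* = 1 at x* = 1. For lam >= 0 the Lagrangian x**2 + lam * (1 - x)
# is minimized at x(lam) = lam / 2, giving q(lam) = lam - lam**2 / 4.

def x_of(lam):
    """Minimizer of the Lagrangian over x for a fixed multiplier."""
    return lam / 2.0


def q(lam):
    x = x_of(lam)
    return x ** 2 + lam * (1.0 - x)


def dual_ascent(step=0.5, iters=100):
    """Projected gradient ascent on q over lam >= 0; q'(lam) = g(x(lam))."""
    lam = 0.0
    for _ in range(iters):
        lam = max(0.0, lam + step * (1.0 - x_of(lam)))
    return lam


p_star = 1.0
print(all(q(l) <= p_star for l in [0.0, 0.5, 1.0, 3.0, 10.0]))  # True: weak duality

lam = dual_ascent()
d_star = q(lam)
print(round(lam, 6), round(x_of(lam), 6), abs(p_star - d_star) < 1e-9)  # 2.0 1.0 True
```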

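Strong duality means a convex problem can be attacked by either the primal or the dual route. It also means the optimal multipliers $\lambdav^\star$ are the shadow prices of the constraints: relaxing $g_i(\xv) \le 0$ to $g_i(\xv) \le u_i$ changes $p^\star$ at rate $-\lambda_i^\star$ (when $p^\star$ is differentiable in $u$). Primal-dual interior-point solvers exploit this directly. They update primal and dual iterates together and use the duality gap as a certificate of how far they are from optimal.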
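The shadow-price interpretation can be checked by finite differences on a hypothetical problem: minimize $x^2$ subject to $1 - x \le u$. At $u = 0$ the solution is $x^\star = 1$, with multiplier $\lambda^\star = 2$ from stationarity $2x^\star - \lambda^\star = 0$. The closed form $p^\star(u) = (1-u)^2$ for $u < 1$ is specific to this example.

```python
# Hypothetical perturbed problem: minimize x**2 subject to 1 - x <= u.
# For u < 1 the constraint is active, so x*(u) = 1 - u and p*(u) = (1 - u)**2;
# for u >= 1 the unconstrained minimizer x = 0 is feasible and p*(u) = 0.

def p_star(u):
    return 0.0 if u >= 1 else (1.0 - u) ** 2


lam_star = 2.0  # from stationarity 2 * x* - lam* = 0 at x* = 1
h = 1e-4
slope = (p_star(h) - p_star(-h)) / (2 * h)  # central difference of p* at u = 0
print(round(slope, 6), -lam_star)  # -2.0 -2.0
```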

Linear programming

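A linear program is the special case where $f$ and all the constraints are affine. In standard form, the primal and its dual read

\min_{\xv}\ \mathbf{c}^{\rm T}\xv \quad \text{s.t.} \quad \Av\xv = \mathbf{b},\ \xv \ge \mathbf{0} \qquad\qquad \max_{\yv}\ \mathbf{b}^{\rm T}\yv \quad \text{s.t.} \quad \Av^{\rm T}\yv \le \mathbf{c}.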
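Assuming the standard form $\min \mathbf{c}^{\rm T}\xv$ subject to $\Av\xv = \mathbf{b}$, $\xv \ge \mathbf{0}$, the dual follows from the Lagrangian recipe. Write the sign constraints as $-\xv \le \mathbf{0}$ with multipliers $\mathbf{s} \ge \mathbf{0}$ (a name chosen for this derivation), and give the equalities multipliers $\nuv$:

```latex
\begin{aligned}
L(\xv, \mathbf{s}, \nuv) &= \mathbf{c}^{\rm T}\xv - \mathbf{s}^{\rm T}\xv + \nuv^{\rm T}(\Av\xv - \mathbf{b})
  = (\mathbf{c} - \mathbf{s} + \Av^{\rm T}\nuv)^{\rm T}\xv - \mathbf{b}^{\rm T}\nuv,\\
q(\mathbf{s}, \nuv) &= \inf_{\xv} L =
  \begin{cases} -\mathbf{b}^{\rm T}\nuv, & \mathbf{c} - \mathbf{s} + \Av^{\rm T}\nuv = \mathbf{0},\\
  -\infty, & \text{otherwise,} \end{cases}\\
d^\star &= \max_{\yv}\ \mathbf{b}^{\rm T}\yv \quad \text{s.t.} \quad \Av^{\rm T}\yv \le \mathbf{c}
  \qquad (\yv = -\nuv,\ \ \mathbf{s} = \mathbf{c} - \Av^{\rm T}\yv \ge \mathbf{0}).
\end{aligned}
```

The dual slack $s_i = c_i - \Av_{:,i}^{\rm T}\yv$ is the factor that appears in the LP form of complementary slackness. Here $\lambda_i g_i$ becomes $-s_i x_i$.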

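LP duality is strong whenever the primal or the dual is feasible. No Slater condition is needed, because all constraints are affine and the feasible sets are polyhedra. The only failure is when both problems are infeasible, giving $p^\star = +\infty$ and $d^\star = -\infty$. The complementary slackness condition becomes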

x_i^\star\,(c_i - \Av_{:,i}^{\rm T}\yv^\star) = 0 \quad \text{for every } i,

It says that any primal variable that is positive at the optimum forces its dual constraint $\Av_{:,i}^{\rm T}\yv^\star \le c_i$ to be tight. Conversely, any dual constraint with slack forces its primal variable to zero. The classical simplex method walks along edges from vertex to vertex of the feasible polyhedron; interior-point methods follow a central path through its interior.
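For a tiny LP, both sides can be solved by brute force, which also shows why the simplex method can restrict attention to vertices. The sketch below assumes the standard form $\min \mathbf{c}^{\rm T}\xv$ s.t. $\Av\xv = \mathbf{b}$, $\xv \ge \mathbf{0}$, whose dual is $\max \mathbf{b}^{\rm T}\yv$ s.t. $\Av^{\rm T}\yv \le \mathbf{c}$. It further assumes exactly two equality constraints, $\Av$ of full row rank, and a finite optimum.

It enumerates basic solutions for the primal and intersections of two tight constraints for the dual, using exact `Fraction` arithmetic. This is exhaustive enumeration, not the simplex method itself. The data encode a hypothetical problem: maximize $x_1 + 2x_2$ subject to $x_1 + x_2 \le 4$ and $x_1 + 3x_2 \le 6$, with slack variables $x_3, x_4$.

```python
from fractions import Fraction
from itertools import combinations


def solve2(M, rhs):
    """Solve the 2x2 system M z = rhs by Cramer's rule; None if M is singular."""
    (p, q), (r, s) = M
    det = p * s - q * r
    if det == 0:
        return None
    return ((rhs[0] * s - q * rhs[1]) / det, (p * rhs[1] - rhs[0] * r) / det)


def primal_by_vertices(A, b, c):
    """min c^T x s.t. Ax = b, x >= 0, with A having exactly 2 rows."""
    n = len(c)
    best = None
    for i, j in combinations(range(n), 2):          # candidate basis {i, j}
        basis = [[A[0][i], A[0][j]], [A[1][i], A[1][j]]]
        sol = solve2(basis, b)
        if sol is None or min(sol) < 0:              # singular or infeasible
            continue
        x = [Fraction(0)] * n
        x[i], x[j] = sol
        value = sum(ck * xk for ck, xk in zip(c, x))
        if best is None or value < best[0]:
            best = (value, x)
    return best


def dual_by_vertices(A, b, c):
    """max b^T y s.t. A^T y <= c, y in R^2: try every pair of tight constraints."""
    cols = [(A[0][k], A[1][k]) for k in range(len(c))]
    best = None
    for i, j in combinations(range(len(c)), 2):
        y = solve2([cols[i], cols[j]], [c[i], c[j]])
        if y is None or any(a0 * y[0] + a1 * y[1] > ck for (a0, a1), ck in zip(cols, c)):
            continue
        value = b[0] * y[0] + b[1] * y[1]
        if best is None or value > best[0]:
            best = (value, list(y))
    return best


F = Fraction
# Hypothetical data: maximize x1 + 2*x2 s.t. x1 + x2 <= 4, x1 + 3*x2 <= 6,
# written in standard form with slack variables x3, x4 and costs c = -(1, 2, 0, 0).
A = [[F(1), F(1), F(1), F(0)],
     [F(1), F(3), F(0), F(1)]]
b = [F(4), F(6)]
c = [F(-1), F(-2), F(0), F(0)]

p_star, x_star = primal_by_vertices(A, b, c)
d_star, y_star = dual_by_vertices(A, b, c)
print(p_star, d_star)  # -5 -5

# Complementary slackness: x_i * (c_i - A[:, i]^T y) must vanish for every i.
reduced = [c[k] - (A[0][k] * y_star[0] + A[1][k] * y_star[1]) for k in range(len(c))]
print(all(xk * rk == 0 for xk, rk in zip(x_star, reduced)))  # True
```

At the optimum $x_1 = 3$ and $x_2 = 1$ are positive, and their dual constraints are tight. The slack columns have positive reduced cost $1/2$, so $x_3 = x_4 = 0$.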

Two-person zero-sum games and the minimax theorem

A finite zero-sum game is given by a real payoff matrix $\Av \in \R^{m \times n}$. The row player picks a row $i$, the column player picks a column $j$, and the row player pays the column player $A_{ij}$. In mixed strategies each player randomizes: the row player picks $\xv \in \Delta_m$ and the column player picks $\yv \in \Delta_n$. Here $\Delta_k = \{\mathbf{z} \in \R^k : \mathbf{z} \ge \mathbf{0},\ \mathbf{1}^{\rm T}\mathbf{z} = 1\}$ is the probability simplex, and the expected payment is $\xv^{\rm T}\Av\yv$.

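The row player wants to minimize the worst-case payment; the column player wants to maximize the worst-case gain. The remarkable fact, proved by von Neumann, is that these two values coincide:

\min_{\xv \in \Delta_m} \max_{\yv \in \Delta_n} \xv^{\rm T}\Av\yv \;=\; \max_{\yv \in \Delta_n} \min_{\xv \in \Delta_m} \xv^{\rm T}\Av\yv.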
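When each player has two pure strategies, the two values can be computed exactly and compared. Fix a mixing probability $t \in [0,1]$. Against each opponent pure strategy the expected payment is affine in $t$, so the worst case is a piecewise-linear envelope whose extremum lies at $t = 0$, at $t = 1$, or at a crossing of two lines.

The sketch below evaluates the envelope at those candidates. The helpers `row_minimax` and `col_maximin` are hypothetical; the first needs two rows and the second two columns. This is a direct computation, not the LP-based proof.

```python
from fractions import Fraction
from itertools import combinations


def _candidates(lines):
    """t = 0, t = 1, and every crossing in [0, 1] of two lines b + s * t."""
    ts = {Fraction(0), Fraction(1)}
    for (b1, s1), (b2, s2) in combinations(lines, 2):
        if s1 != s2:
            t = (b2 - b1) / (s1 - s2)
            if 0 <= t <= 1:
                ts.add(t)
    return ts


def row_minimax(A):
    """Row player has 2 rows and pays A[i][j]; plays row 0 with probability t.

    Against column j the expected payment is A[1][j] + t * (A[0][j] - A[1][j]);
    the row player minimizes the worst (largest) of these lines.
    """
    lines = [(Fraction(A[1][j]), Fraction(A[0][j] - A[1][j])) for j in range(len(A[0]))]
    return min(max(b + s * t for b, s in lines) for t in _candidates(lines))


def col_maximin(A):
    """Column player has 2 columns; plays column 0 with probability t.

    Against row i the expected payment is A[i][1] + t * (A[i][0] - A[i][1]);
    the column player maximizes the worst (smallest) of these lines.
    """
    lines = [(Fraction(row[1]), Fraction(row[0] - row[1])) for row in A]
    return max(min(b + s * t for b, s in lines) for t in _candidates(lines))


matching_pennies = [[1, -1], [-1, 1]]
skewed = [[2, -1], [-1, 1]]          # hypothetical payoff matrix
for A in (matching_pennies, skewed):
    print(row_minimax(A), col_maximin(A))
# 0 0
# 1/5 1/5
```

In the skewed game both players mix with probability $2/5$ on their first strategy, and the common value is $1/5$.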

A short proof is LP duality. The row player’s minimax problem is a linear program in $\xv$ and an auxiliary variable $v$ (the worst-case payment), and its dual is the column player’s maximin. Strong duality for LPs forces the two optimal values to be equal. Structurally, the minimax theorem is strong LP duality (equivalently, KKT) applied to a particularly clean polyhedral feasible set.
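One way to write out the LP pair is shown below. Here $w$, the column player’s guaranteed payment, is an auxiliary variable introduced for this formulation. The constraint $\Av^{\rm T}\xv \le v\,\mathbf{1}_n$ bounds the payment against every pure column. That is enough, because a linear function over the simplex attains its maximum at a vertex. Applying the Lagrangian recipe to the first problem yields the second.

```latex
\begin{aligned}
&\text{row player:} && \min_{\xv,\,v}\ v
  && \text{s.t.}\ \ \Av^{\rm T}\xv \le v\,\mathbf{1}_n,\ \ \mathbf{1}^{\rm T}\xv = 1,\ \ \xv \ge \mathbf{0},\\
&\text{column player:} && \max_{\yv,\,w}\ w
  && \text{s.t.}\ \ \Av\yv \ge w\,\mathbf{1}_m,\ \ \mathbf{1}^{\rm T}\yv = 1,\ \ \yv \ge \mathbf{0}.
\end{aligned}
```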

What this enables

Duality underlies much of machine learning. Support vector machines recast geometric margin maximization as a quadratic program whose dual involves the data only through inner products, which is the gateway to the kernel trick. In the lasso and related regularized regressions, the optimality conditions cap each feature’s inner product with the residual at the penalty level. Only features that reach the cap can carry nonzero coefficients, so the dual variables act as feature selectors. Game-theoretic formulations, such as multi-agent reinforcement learning and the minimax objective of GANs, are saddle-point problems typically attacked with alternating gradient methods. In the convex-concave case, duality and the minimax theorem tell us what equilibria to expect. Each of these is a constrained or saddle-point optimization built on the same machinery: the Lagrangian, KKT, and strong duality.