Skip to content

Constraints and Duality

So far the variable x\xv has ranged freely over Rn\R^n. Most real problems instead minimize ff over a feasible set defined by constraints, equalities and inequalities. The price of admission is a small piece of new machinery, the Lagrangian, which turns constrained problems back into unconstrained ones and exposes the deep symmetry between primal and dual.

Attach a multiplier to each constraint, packaged into the Lagrangian:

L(x,λ,ν)  =  f(x)+i=1mλigi(x)+j=1pνjhj(x),λ0.L(\xv, \lambdav, \nuv) \;=\; f(\xv) + \sum_{i=1}^m \lambda_i\,g_i(\xv) + \sum_{j=1}^p \nu_j\,h_j(\xv), \qquad \lambdav \ge \mathbf{0}.

For any feasible x\xv, gi(x)0g_i(\xv) \le 0 and hj(x)=0h_j(\xv) = 0, so L(x,λ,ν)f(x)L(\xv, \lambdav, \nuv) \le f(\xv), with equality when λigi(x)=0\lambda_i\,g_i(\xv) = 0 (the inactive constraints contribute nothing).

The dual function is the infimum over x\xv,

q(λ,ν)  =  infxL(x,λ,ν).q(\lambdav, \nuv) \;=\; \inf_{\xv} L(\xv, \lambdav, \nuv).

As an infimum of affine functions of (λ,ν)(\lambdav, \nuv), the dual is concave, no matter how nasty ff is.

At a primal optimum x\xv^\star, the gradients of ff and the constraints obey a balance equation. Under mild regularity (Slater’s condition for convex problems; constraint qualifications in general) the four KKT conditions are necessary, and for convex problems sufficient.

Because L(x,λ,ν)f(x)L(\xv, \lambdav, \nuv) \le f(\xv) on the feasible set, taking the infimum on the left and any feasible x\xv on the right gives

q(λ,ν)    pfor every λ0,ν.q(\lambdav, \nuv) \;\le\; p^\star \quad \text{for every } \lambdav \ge \mathbf{0}, \nuv.

Maximizing the left side defines the dual problem

d  =  maxλ0,νq(λ,ν).d^\star \;=\; \max_{\lambdav \ge \mathbf{0},\,\nuv} q(\lambdav, \nuv).

This is the weak duality inequality: dpd^\star \le p^\star. The gap pdp^\star - d^\star is the duality gap, and the central fact of convex optimization is that under mild conditions it vanishes.

Strong duality means a convex problem can be solved by either primal or dual route, and that the dual multipliers λ\lambdav^\star are themselves the shadow prices of the constraints (the sensitivity of pp^\star to relaxing each gig_i). Almost every interior-point and primal-dual solver runs on this duality.

A linear program is the special case where ff and the constraints are all linear.

LP duality is strong unconditionally (no Slater needed, as both feasible sets are polyhedra). The complementary slackness condition becomes

xi(ciA:,iTy)=0for every i,x_i^\star\,(c_i - \Av_{:,i}^{\rm T}\yv^\star) = 0 \quad \text{for every } i,

saying that each primal variable that is positive in the optimum corresponds to a dual constraint that is tight, and vice versa. The classical simplex method walks along the vertices of the feasible polyhedron; interior-point methods follow a central path inside it.

Two-person zero-sum games and the minimax theorem

Section titled “Two-person zero-sum games and the minimax theorem”

A finite zero-sum game is a real payoff matrix ARm×n\Av \in \R^{m \times n}: the row player picks a row ii, the column player picks a column jj, and the row player pays AijA_{ij}. In mixed strategies each player chooses a distribution; the row player picks xΔm\xv \in \Delta_m (Δ\Delta = simplex, x0\xv \ge \mathbf{0}, 1Tx=1\mathbf{1}^{\rm T}\xv = 1), the column player picks yΔn\yv \in \Delta_n, and the expected payoff is xTAy\xv^{\rm T}\Av\yv.

The row player wants to minimize the worst-case loss; the column player wants to maximize the worst-case gain. The remarkable fact, proved by von Neumann, is that these match.

The proof is exactly LP duality: the row player’s minimax is a linear program in x\xv and an auxiliary variable vv (the worst-case loss), and its dual is the column player’s maximin. Strong duality for LPs collapses the two onto the same number. The minimax theorem is, structurally, the same theorem as KKT applied to a particularly clean polyhedral feasible set.

Duality is the engine behind a lot of machine learning. Support vector machines rewrite the geometric margin maximization as a quadratic program; its dual involves only inner products of data, which is the gateway to the kernel trick. Lasso and other regularized regressions have dual variables that act as feature selectors. Reinforcement learning game-theoretic formulations (multi-agent training, GAN minimax objectives) are minimax problems solved by alternating gradient methods, where the duality theory tells us what equilibria to expect. Each of these is a constrained optimization sitting on the same machinery: the Lagrangian, KKT, and strong duality.