Constraints and Duality
So far the variable has ranged freely over . Most real problems instead minimize over a feasible set defined by constraints, equalities and inequalities. The price of admission is a small piece of new machinery, the Lagrangian, which turns constrained problems back into unconstrained ones and exposes the deep symmetry between primal and dual.
The primal problem
Section titled “The primal problem”The Lagrangian
Section titled “The Lagrangian”Attach a multiplier to each constraint, packaged into the Lagrangian:
For any feasible , and , so , with equality when (the inactive constraints contribute nothing).
The dual function is the infimum over ,
As an infimum of affine functions of , the dual is concave, no matter how nasty is.
KKT conditions
Section titled “KKT conditions”At a primal optimum , the gradients of and the constraints obey a balance equation. Under mild regularity (Slater’s condition for convex problems; constraint qualifications in general) the four KKT conditions are necessary, and for convex problems sufficient.
Weak and strong duality
Section titled “Weak and strong duality”Because on the feasible set, taking the infimum on the left and any feasible on the right gives
Maximizing the left side defines the dual problem
This is the weak duality inequality: . The gap is the duality gap, and the central fact of convex optimization is that under mild conditions it vanishes.
Strong duality means a convex problem can be solved by either primal or dual route, and that the dual multipliers are themselves the shadow prices of the constraints (the sensitivity of to relaxing each ). Almost every interior-point and primal-dual solver runs on this duality.
Linear programming
Section titled “Linear programming”A linear program is the special case where and the constraints are all linear.
LP duality is strong unconditionally (no Slater needed, as both feasible sets are polyhedra). The complementary slackness condition becomes
saying that each primal variable that is positive in the optimum corresponds to a dual constraint that is tight, and vice versa. The classical simplex method walks along the vertices of the feasible polyhedron; interior-point methods follow a central path inside it.
Two-person zero-sum games and the minimax theorem
Section titled “Two-person zero-sum games and the minimax theorem”A finite zero-sum game is a real payoff matrix : the row player picks a row , the column player picks a column , and the row player pays . In mixed strategies each player chooses a distribution; the row player picks ( = simplex, , ), the column player picks , and the expected payoff is .
The row player wants to minimize the worst-case loss; the column player wants to maximize the worst-case gain. The remarkable fact, proved by von Neumann, is that these match.
The proof is exactly LP duality: the row player’s minimax is a linear program in and an auxiliary variable (the worst-case loss), and its dual is the column player’s maximin. Strong duality for LPs collapses the two onto the same number. The minimax theorem is, structurally, the same theorem as KKT applied to a particularly clean polyhedral feasible set.
What this enables
Section titled “What this enables”Duality is the engine behind a lot of machine learning. Support vector machines rewrite the geometric margin maximization as a quadratic program; its dual involves only inner products of data, which is the gateway to the kernel trick. Lasso and other regularized regressions have dual variables that act as feature selectors. Reinforcement learning game-theoretic formulations (multi-agent training, GAN minimax objectives) are minimax problems solved by alternating gradient methods, where the duality theory tells us what equilibria to expect. Each of these is a constrained optimization sitting on the same machinery: the Lagrangian, KKT, and strong duality.