Karush-Kuhn-Tucker Conditions

The Karush-Kuhn-Tucker (KKT) conditions, sometimes referred to as the Kuhn-Tucker conditions, are the conditions that a Nonlinear Programming problem need to meet in order to be optimal.  The KKT conditions extends the method of Lagrangian multipliers by allowing for inequality constraints, as opposed to the Lagrange Multipliers which only allow equality constraints.

Let us consider a nonlinear optimization problem:

Minimize f(x)
subject to:g_{i}(x)\leq 0 \\ h_{j}(x) = 0

We will let x^{*} represent a relative minimum point for our problem, which also satisfies some constraint qualification. With this, then we can assume that for every element there exists a vector \lambda_j \left ( j = 1, \ldots, l \right ), where l represents the number of equality constraints, and \mu_i\left ( i = 1, \ldots, m \right ), where m represents the number of inequality constraints. These constants, \lambda_j and \mu_i, are called KKT multipliers.

In order for the KKT conditions to be held in a nonlinear programming problem (NLP), then each of three conditions must be met [1-4]. The Primal Feasibility restates what the problem states, that the inequality and equality constraints on x^* must be met in order for the problem to be optimal:

\left.\begin{matrix}  h\left ( x^* \right )=0\\  g\left ( x^* \right )\leq0  \end{matrix}\right\} \textrm{primal feasibility (PF)}

The second condition is known as the dual feasibility condition.  In this condition states, rather verbosely, that every element in \mu_i must be greater than zero, and that the stationarity of the problem must be equal to 0.

\left.\begin{matrix}  \bigtriangledown f\left ( x^* \right ) + \sum_{i=1}^{m}\lambda_{i}\bigtriangledown h_{i}\left ( x^* \right ) + \sum_{j=1}^{l}\mu_{j}\bigtriangledown g_{j}\left ( x^* \right ) = 0\\  \lambda_i \in \mathbb{R}\\  \mu_i \geq 0  \end{matrix}\right\}\textrm{dual feasibility (DF)}

The stationarity of the problem is:

\bigtriangledown f\left ( x^* \right ) + \sum_{i=1}^{m}\lambda_{i}\bigtriangledown h_{i}\left ( x^* \right ) + \sum_{j=1}^{l}\mu_{j}\bigtriangledown g_{j}\left ( x^* \right ) = 0

While the other two are simply conditions that λ and μ must meet in order for x^* to be optimal.

The third condition that must be met is known as complementary slackness.  This condition simply states that for each mu and its respective inequality constraint, the product of the two should result in zero:

\mu_i g_{i} \left ( x^* \right ) = 0

When these three conditions are met, we have met the KKT conditions and our solution, x^*, is an optimal solution for the NLP problem.  There maybe more than one x in the space which meet the conditions.  Any point in the problem space where every element of λ and μ, such that the tuple (x, λ, μ) satisfy the KKT conditions are called KKT points. Derivation of these constraints can be found in [1,2]

Constraint Qualifications

As mentioned previously, the points that we are testing need to meet some qualification in order for the point to be considered. The most well known constraint Qualification is the Linear Independence constraint Qualification (LICQ), which simply states that h_{j}(x^*) and g_{i}(x^*) are linearly independent from the other at point x^*. The Mangasarian-Fromovitz constraint qualification (MFCQ) states similarly the LICQ with the addition of being positive-lineraly independent at x^*. [5]

There are however other constraint qualifiers that relax the LICQ. The Slater constraint qualifier can be used in convex problems. If there exists a point x such that h_{j}(x^*) = and g_{i}(x^*) < 0 for all i,j active in x^*, then the slate condition holds. [5,6]

Other types of constraint qualifiers do exist, but these three seem to be the most commonly used in KKT qualification.

[1] Kuhn, H. and Tucker, A., “Nonlinear Programming” Proceedings of the 2nd Berkeley Symposium 1951, pp. 481-492.
[2] Karush, W., “Minima of Functions of Several Variables with Inequalities as Side Constraints”. M.Sc. Dissertation, Univ. of Chicago, Chicago, Il, 1939.
[3] Kuhn, M. “The Karush-Kuhn-Tucker Theorem”, Internet: http://smp.if.uj.edu.pl/~kopiec/MT/Materialy/KarushKuhnTucker.pdf, CDSEM Uni. Mannheim, 2006.
[4]McCarl, B. and Spreen, T., “Nonlinear Optimization Conditions”, Ch. 12, Applied Mathematical Programming Using Algebraic Systems. Internet: http://agecon2.tamu.edu/people/faculty/mccarl-bruce/mccspr/thebook.pdf
[5]Eustaquio, R. Karas, E. and Ribeiro, A. Constraint Qualification for Nonlinear Programming, Tech Report, Univ. of Parana.
[6] Tiba, D. and Zalinescu, C. “On the Necessity of some Constraint Qualification Conditions in Convex Programming”, Journal of Convex Analysis, 11 (1), 2004. pp 95-110.

Isoperimetric problem (Dido’s Problem)

History

Dido (or Elissa, depending on the reference) is the legendary founder and first Queen of Carthage. Her accounts are mentioned by the historian Justin and later as a part of Virgil’s “Aeneid” poem. In this story, Elissa is said to have escaped Tyre with groups of her brother Pygmalion’s attendants and senators. Eventually Elissa’s party arrived on the coast of Northern Africa where Elissa had asked the local inhabitants for a small area of land for a refuge. The isoparametric problem came into existence. Elissa asked for only as much land as could be encompassed by an oxhide. She used a stroke of mathematical genius, cutting the hide into long, thin strips in order to encircle the entirety of the nearby hill. This land became what is Carthage, and Elissa (or Dido) became the Queen of the city.

The history of Elissa is mentioned by Justin, however, the problem is formulated from a passage in Virgil’s “Aeneid”:

The Kingdom you see is Carthage, the Tyrians, the town of Agenor;
But the country around is Libya, no folk to meet in war.
Dido, who left the city of Tyre to escape her brother,
Rules here – a long and labyrinthine tale of wrong
Is hers, but I will touch on its salient points in order…
Dido, in great disquiet, organized her friends for escape.
They met together, all those who harshly hated the tyrant
Or keenly feared him: they seized some ships which chanced to be ready…
They came to this spot, where today you can behold the mighty
Battlements and rising citadel of New Carthage,
And purchased a site, which was named “Bull’s Hide” after the bargain
By which they should get as much land as they could enclose with a bull’s hide. [1]

Mathematical Formulation

Knowing the history helps us to see how this might be useful, but it is not formalized in any mathematical sense. If we think of the problem, the problem becomes similar to what is known in mathematics as the isoperimetric problem. In Dido’s problem, Dido cuts the pieces of the ox-hide into strips, effectively using the area of the hide as the perimeter. From this point, the goal of the Dido problem then is to find the closed curve that has maximal area for a given perimeter. The problem is at this point, the isoperimetric problem.

Isoperimetric, literally means “having the same perimeter”, therefore, we know that the perimeters are fixed and our goals is to simply maximize the area. The isoperimetric problem requires the use of the isoperimetric inequality (and quotient) where if we think planar geometry, we can define a figure to have an area A and a perimeter p. The quotient then is [2]:

Q\equiv \frac{4\pi A}{p^2} \leq 1

This quotient is derived from the ratio of the curve area to the area of a circle (A = \pi r_A^2) with the same perimeter as the curve (p=2 \pi r_p). We can therefore derive the quotient:

\begin{matrix}  Q & \equiv & \frac{r_A}{r_p^2}\\  & = & \frac{\left ( \frac{A}{\pi}\right )}{ \left ( \frac{p}{2 \pi} \right )^2} \\  & = & \frac{4 \pi A}{p^2}  \end{matrix}

Mathematics tells us that this inequality holds for the shape of a circle only. Mathematical Proofs for this concept representing a can be found at [3-6]

Usefulness in Computer Science

In combinatorics and computer science, the use of Isoperimetric inequalities (the basis of the Dido problem) plays a role in designing robust computer networks, several applications in complexity theory, and error-correcting codes, through the use of expander graphs.[7] Expander graphs are a sparse graph with strong connectivity properties. The isoperimetric problems in graph theory (described on page 470 of [7]) then have usefulness in graph optimization problems for expander graphs.

There are perhaps many other applications of the Isoperimetric problem and the Dido problem, as the Dido problem is in itself an optimization of maximal area given a perimeter.

Works Cited:

[1] Virgil. Trans. by C.D. Lewis. Book 1, lines 307-372 in The Aeneid. New York: Doubleday, pp 22-23. 1953.
[2] Osserman, R. “Isoperimetric Inequalities.” Appendix 3, Sec 3 in A Survey of Minimal Surfaces. New York: Dover, pp. 147-148, 1986.
[3] Bonnesen, Les Problèmes des Isopérîmètres et des Isépîphanes. Paris: Gauthier-Villars, pp 59-61, 1929.
[4] Bonnesen, T. and Fenchel, W., Theorie der Convexen Körper. Chelsea Publishing, New York, S.111-112, 1948.
[5] Magnani, C. “The Problem of Dido” Internet: http://mathematicalgarden.wordpress.com/2008/12/21/the-problem-of-dido/, December 2008 [September 5, 2011].
[6] Luthy, P. “Two Cute Proofs of the Isoperimetric Inequality” Internet: http://cornellmath.wordpress.com/2008/05/16/two-cute-proofs-of-the-isoperimetric-inequality/, May 16, 2008 [September 5, 2011].
[7]Hoory, S., Linial, N., and Wigderson, A. “Expander Graphs and Their Applications” Bulletin of the American Mathematical Society 43 (4): 439-561, 2006. dpi: 10.1090/S0273-0979-06-01126-8.

Lagrangian Optimization

The Lagrangian multiplier is a method of mathematical optimization that provides a means of finding the maxima and minima of a mathematical function, f, subject to constraints, g. The contour map below shows an optimization problem. We want to consider the problem where we maximize f \left ( x, y \right ) subject to the constraints g \left ( x, y \right ) = c. The solution to this problem then is the point where the red line tangentially touches a blue contour. This answer can be found using Lagrangian multipliers.

Lagrangian Optimization

Contour Map where the red line shows the constraint g(x,y) = c and the blue times are contours of f(x,y).

We can construct a Lagrangian function as shown below. [1] We assume that function, f, is the problem for which an extremum of a function is desired:
f \left ( x_1, x_2, \ldots, x_n \right )

with a set of any number of constraint functions:
\begin{matrix}  g_i \left (x_1, x_2, \ldots, x_n \right ) = b_i, & i = 1,2,\ldots, m, & m < n  \end{matrix}

Then our Lagrangian function is defined as:
\Lambda \left ( x_1, x_2, \ldots, x_n,\lambda  \right ) = f\left ( x_1, x_2, \ldots, x_n \right ) + \sum_{i=1}^{m}\lambda _i\cdot \left ( b_i - g_i(x_1, x_2, \ldots, x_n) \right )

The Lagrangian multiplier can be used in any multidimensional space. The example below is given in [2].

A box is to be constructed out of various materials. The material to be used for the front and back sides costs $1 per square meter. The material to be used for the left and right sides costs $2 per square meter. The material to be used for the top and bottom costs $4 per square meter. What is the maximum volume that can be enclosed for a total material cost of $192?

We know the constraint, C. We should also know that the Volume of a cube (box) is V = LWH. We must find the constraint function…

Since we have a front and back side that cost $1 each, we have 2 planes that are length times width. We have a left and right side that cost $2 each at width time height. And lastly, we have a top and bottom side that cost $4 each at length times width:

2\left ( 1 L H \right ) + 2\left ( 2WH \right ) + 2\left ( 4LW \right ) = C \\  2LH+4WH+8LW - C = 0

We can then write our Lagrangian function:
\Lambda (L,W,H,\lambda ) = LWH + \lambda \left ( 2LH+4WH+8LW - C \right )

If we continue solving this out using partial derivatives and some algebraic techniques, as shown in [2], we eventually determine the maximal volume for the cost, C, of $192, is a Volume of 64 meters3. This is an example of how we can use the Lagrangian multiplier in a 3-dimensional space. The multiplier can work in any higher dimensional space as well.

What does the Lagrangian multiplier, λ, represent then? Well in the problems above, the Lagrangian multiplier represents a scalar value that allows us to change the size of the values used to scale the shape (circle in the contour map, and cube in the Volume problem). However, the Lagrangian multiplier can be used to represent actual values as well. [3] shares some examples of what the λ functions can represent in both physics and economics.

In the case of the physics of forces and potential energy, we might think of the constraint functions as representing forces that are helping to bring a point to it minimum or maximum. The Multiplier can then be thought of as a measure of how hard the constraint has to pull in order to balance our the forces on the constraint surface. In economics, we might think of maximizing profits based on some limited number of resources. The multiplier then represents the marginal value of the resource, or the rate at which the function changes if we change the constraints.

Therefore, understanding what the multiplier means takes an understanding of the problem that it is being applied. In some cases, it may only be a scaling value; in others, it may have a more significant meaning.

Impact on Computers

The Lagrangian Multiplier is useful in the field of optimal control theory. The multipliers can be interpreted as a cost ate variables. In Pontryagin’s maximum principle (or minimum principle), applied to deterministic problems, yields the same solution as many of the Bellman principles of Dynamic Programming, without the curse of dimensionality. The multiplier then works as a means of deriving the optimal minimum or maximum in control theory in a quicker means than Bellman’s equations for dynamic programming in higher dimensional spaces. [4]

Works Cited:

[1] Vapnyarkii, I.B., “Lagrange multipliers” in Encyclopaedia of Mathematics, New York: Springer, 2001.
[2] Hahn, K. Lagrange Multiplier Method for finding Optimums Internet: http://www.karlscalculus.org/pdf/lagrange.pdf [September 5, 2011]
[3] Jensen, S. An Introduction to Lagrange Multipliers Internet: http://www.slimy.com/~steuard/teaching/tutorials/Lagrange.html [September 5, 2011]
[4] Todorov, E. “Optimal Control Theory” in Bayesian Brain, Boston: MIT Press, 2006.