Karush-Kuhn-Tucker Conditions

The Karush-Kuhn-Tucker (KKT) conditions, sometimes referred to as the Kuhn-Tucker conditions, are the conditions that a Nonlinear Programming problem need to meet in order to be optimal.  The KKT conditions extends the method of Lagrangian multipliers by allowing for inequality constraints, as opposed to the Lagrange Multipliers which only allow equality constraints.

Let us consider a nonlinear optimization problem:

Minimize f(x)
subject to:g_{i}(x)\leq 0 \\ h_{j}(x) = 0

We will let x^{*} represent a relative minimum point for our problem, which also satisfies some constraint qualification. With this, then we can assume that for every element there exists a vector \lambda_j \left ( j = 1, \ldots, l \right ), where l represents the number of equality constraints, and \mu_i\left ( i = 1, \ldots, m \right ), where m represents the number of inequality constraints. These constants, \lambda_j and \mu_i, are called KKT multipliers.

In order for the KKT conditions to be held in a nonlinear programming problem (NLP), then each of three conditions must be met [1-4]. The Primal Feasibility restates what the problem states, that the inequality and equality constraints on x^* must be met in order for the problem to be optimal:

\left.\begin{matrix}  h\left ( x^* \right )=0\\  g\left ( x^* \right )\leq0  \end{matrix}\right\} \textrm{primal feasibility (PF)}

The second condition is known as the dual feasibility condition.  In this condition states, rather verbosely, that every element in \mu_i must be greater than zero, and that the stationarity of the problem must be equal to 0.

\left.\begin{matrix}  \bigtriangledown f\left ( x^* \right ) + \sum_{i=1}^{m}\lambda_{i}\bigtriangledown h_{i}\left ( x^* \right ) + \sum_{j=1}^{l}\mu_{j}\bigtriangledown g_{j}\left ( x^* \right ) = 0\\  \lambda_i \in \mathbb{R}\\  \mu_i \geq 0  \end{matrix}\right\}\textrm{dual feasibility (DF)}

The stationarity of the problem is:

\bigtriangledown f\left ( x^* \right ) + \sum_{i=1}^{m}\lambda_{i}\bigtriangledown h_{i}\left ( x^* \right ) + \sum_{j=1}^{l}\mu_{j}\bigtriangledown g_{j}\left ( x^* \right ) = 0

While the other two are simply conditions that λ and μ must meet in order for x^* to be optimal.

The third condition that must be met is known as complementary slackness.  This condition simply states that for each mu and its respective inequality constraint, the product of the two should result in zero:

\mu_i g_{i} \left ( x^* \right ) = 0

When these three conditions are met, we have met the KKT conditions and our solution, x^*, is an optimal solution for the NLP problem.  There maybe more than one x in the space which meet the conditions.  Any point in the problem space where every element of λ and μ, such that the tuple (x, λ, μ) satisfy the KKT conditions are called KKT points. Derivation of these constraints can be found in [1,2]

Constraint Qualifications

As mentioned previously, the points that we are testing need to meet some qualification in order for the point to be considered. The most well known constraint Qualification is the Linear Independence constraint Qualification (LICQ), which simply states that h_{j}(x^*) and g_{i}(x^*) are linearly independent from the other at point x^*. The Mangasarian-Fromovitz constraint qualification (MFCQ) states similarly the LICQ with the addition of being positive-lineraly independent at x^*. [5]

There are however other constraint qualifiers that relax the LICQ. The Slater constraint qualifier can be used in convex problems. If there exists a point x such that h_{j}(x^*) = and g_{i}(x^*) < 0 for all i,j active in x^*, then the slate condition holds. [5,6]

Other types of constraint qualifiers do exist, but these three seem to be the most commonly used in KKT qualification.

[1] Kuhn, H. and Tucker, A., “Nonlinear Programming” Proceedings of the 2nd Berkeley Symposium 1951, pp. 481-492.
[2] Karush, W., “Minima of Functions of Several Variables with Inequalities as Side Constraints”. M.Sc. Dissertation, Univ. of Chicago, Chicago, Il, 1939.
[3] Kuhn, M. “The Karush-Kuhn-Tucker Theorem”, Internet: http://smp.if.uj.edu.pl/~kopiec/MT/Materialy/KarushKuhnTucker.pdf, CDSEM Uni. Mannheim, 2006.
[4]McCarl, B. and Spreen, T., “Nonlinear Optimization Conditions”, Ch. 12, Applied Mathematical Programming Using Algebraic Systems. Internet: http://agecon2.tamu.edu/people/faculty/mccarl-bruce/mccspr/thebook.pdf
[5]Eustaquio, R. Karas, E. and Ribeiro, A. Constraint Qualification for Nonlinear Programming, Tech Report, Univ. of Parana.
[6] Tiba, D. and Zalinescu, C. “On the Necessity of some Constraint Qualification Conditions in Convex Programming”, Journal of Convex Analysis, 11 (1), 2004. pp 95-110.

Isoperimetric problem (Dido’s Problem)

History

Dido (or Elissa, depending on the reference) is the legendary founder and first Queen of Carthage. Her accounts are mentioned by the historian Justin and later as a part of Virgil’s “Aeneid” poem. In this story, Elissa is said to have escaped Tyre with groups of her brother Pygmalion’s attendants and senators. Eventually Elissa’s party arrived on the coast of Northern Africa where Elissa had asked the local inhabitants for a small area of land for a refuge. The isoparametric problem came into existence. Elissa asked for only as much land as could be encompassed by an oxhide. She used a stroke of mathematical genius, cutting the hide into long, thin strips in order to encircle the entirety of the nearby hill. This land became what is Carthage, and Elissa (or Dido) became the Queen of the city.

The history of Elissa is mentioned by Justin, however, the problem is formulated from a passage in Virgil’s “Aeneid”:

The Kingdom you see is Carthage, the Tyrians, the town of Agenor;
But the country around is Libya, no folk to meet in war.
Dido, who left the city of Tyre to escape her brother,
Rules here – a long and labyrinthine tale of wrong
Is hers, but I will touch on its salient points in order…
Dido, in great disquiet, organized her friends for escape.
They met together, all those who harshly hated the tyrant
Or keenly feared him: they seized some ships which chanced to be ready…
They came to this spot, where today you can behold the mighty
Battlements and rising citadel of New Carthage,
And purchased a site, which was named “Bull’s Hide” after the bargain
By which they should get as much land as they could enclose with a bull’s hide. [1]

Mathematical Formulation

Knowing the history helps us to see how this might be useful, but it is not formalized in any mathematical sense. If we think of the problem, the problem becomes similar to what is known in mathematics as the isoperimetric problem. In Dido’s problem, Dido cuts the pieces of the ox-hide into strips, effectively using the area of the hide as the perimeter. From this point, the goal of the Dido problem then is to find the closed curve that has maximal area for a given perimeter. The problem is at this point, the isoperimetric problem.

Isoperimetric, literally means “having the same perimeter”, therefore, we know that the perimeters are fixed and our goals is to simply maximize the area. The isoperimetric problem requires the use of the isoperimetric inequality (and quotient) where if we think planar geometry, we can define a figure to have an area A and a perimeter p. The quotient then is [2]:

Q\equiv \frac{4\pi A}{p^2} \leq 1

This quotient is derived from the ratio of the curve area to the area of a circle (A = \pi r_A^2) with the same perimeter as the curve (p=2 \pi r_p). We can therefore derive the quotient:

\begin{matrix}  Q & \equiv & \frac{r_A}{r_p^2}\\  & = & \frac{\left ( \frac{A}{\pi}\right )}{ \left ( \frac{p}{2 \pi} \right )^2} \\  & = & \frac{4 \pi A}{p^2}  \end{matrix}

Mathematics tells us that this inequality holds for the shape of a circle only. Mathematical Proofs for this concept representing a can be found at [3-6]

Usefulness in Computer Science

In combinatorics and computer science, the use of Isoperimetric inequalities (the basis of the Dido problem) plays a role in designing robust computer networks, several applications in complexity theory, and error-correcting codes, through the use of expander graphs.[7] Expander graphs are a sparse graph with strong connectivity properties. The isoperimetric problems in graph theory (described on page 470 of [7]) then have usefulness in graph optimization problems for expander graphs.

There are perhaps many other applications of the Isoperimetric problem and the Dido problem, as the Dido problem is in itself an optimization of maximal area given a perimeter.

Works Cited:

[1] Virgil. Trans. by C.D. Lewis. Book 1, lines 307-372 in The Aeneid. New York: Doubleday, pp 22-23. 1953.
[2] Osserman, R. “Isoperimetric Inequalities.” Appendix 3, Sec 3 in A Survey of Minimal Surfaces. New York: Dover, pp. 147-148, 1986.
[3] Bonnesen, Les Problèmes des Isopérîmètres et des Isépîphanes. Paris: Gauthier-Villars, pp 59-61, 1929.
[4] Bonnesen, T. and Fenchel, W., Theorie der Convexen Körper. Chelsea Publishing, New York, S.111-112, 1948.
[5] Magnani, C. “The Problem of Dido” Internet: http://mathematicalgarden.wordpress.com/2008/12/21/the-problem-of-dido/, December 2008 [September 5, 2011].
[6] Luthy, P. “Two Cute Proofs of the Isoperimetric Inequality” Internet: http://cornellmath.wordpress.com/2008/05/16/two-cute-proofs-of-the-isoperimetric-inequality/, May 16, 2008 [September 5, 2011].
[7]Hoory, S., Linial, N., and Wigderson, A. “Expander Graphs and Their Applications” Bulletin of the American Mathematical Society 43 (4): 439-561, 2006. dpi: 10.1090/S0273-0979-06-01126-8.

Lagrangian Optimization

The Lagrangian multiplier is a method of mathematical optimization that provides a means of finding the maxima and minima of a mathematical function, f, subject to constraints, g. The contour map below shows an optimization problem. We want to consider the problem where we maximize f \left ( x, y \right ) subject to the constraints g \left ( x, y \right ) = c. The solution to this problem then is the point where the red line tangentially touches a blue contour. This answer can be found using Lagrangian multipliers.

Lagrangian Optimization

Contour Map where the red line shows the constraint g(x,y) = c and the blue times are contours of f(x,y).

We can construct a Lagrangian function as shown below. [1] We assume that function, f, is the problem for which an extremum of a function is desired:
f \left ( x_1, x_2, \ldots, x_n \right )

with a set of any number of constraint functions:
\begin{matrix}  g_i \left (x_1, x_2, \ldots, x_n \right ) = b_i, & i = 1,2,\ldots, m, & m < n  \end{matrix}

Then our Lagrangian function is defined as:
\Lambda \left ( x_1, x_2, \ldots, x_n,\lambda  \right ) = f\left ( x_1, x_2, \ldots, x_n \right ) + \sum_{i=1}^{m}\lambda _i\cdot \left ( b_i - g_i(x_1, x_2, \ldots, x_n) \right )

The Lagrangian multiplier can be used in any multidimensional space. The example below is given in [2].

A box is to be constructed out of various materials. The material to be used for the front and back sides costs $1 per square meter. The material to be used for the left and right sides costs $2 per square meter. The material to be used for the top and bottom costs $4 per square meter. What is the maximum volume that can be enclosed for a total material cost of $192?

We know the constraint, C. We should also know that the Volume of a cube (box) is V = LWH. We must find the constraint function…

Since we have a front and back side that cost $1 each, we have 2 planes that are length times width. We have a left and right side that cost $2 each at width time height. And lastly, we have a top and bottom side that cost $4 each at length times width:

2\left ( 1 L H \right ) + 2\left ( 2WH \right ) + 2\left ( 4LW \right ) = C \\  2LH+4WH+8LW - C = 0

We can then write our Lagrangian function:
\Lambda (L,W,H,\lambda ) = LWH + \lambda \left ( 2LH+4WH+8LW - C \right )

If we continue solving this out using partial derivatives and some algebraic techniques, as shown in [2], we eventually determine the maximal volume for the cost, C, of $192, is a Volume of 64 meters3. This is an example of how we can use the Lagrangian multiplier in a 3-dimensional space. The multiplier can work in any higher dimensional space as well.

What does the Lagrangian multiplier, λ, represent then? Well in the problems above, the Lagrangian multiplier represents a scalar value that allows us to change the size of the values used to scale the shape (circle in the contour map, and cube in the Volume problem). However, the Lagrangian multiplier can be used to represent actual values as well. [3] shares some examples of what the λ functions can represent in both physics and economics.

In the case of the physics of forces and potential energy, we might think of the constraint functions as representing forces that are helping to bring a point to it minimum or maximum. The Multiplier can then be thought of as a measure of how hard the constraint has to pull in order to balance our the forces on the constraint surface. In economics, we might think of maximizing profits based on some limited number of resources. The multiplier then represents the marginal value of the resource, or the rate at which the function changes if we change the constraints.

Therefore, understanding what the multiplier means takes an understanding of the problem that it is being applied. In some cases, it may only be a scaling value; in others, it may have a more significant meaning.

Impact on Computers

The Lagrangian Multiplier is useful in the field of optimal control theory. The multipliers can be interpreted as a cost ate variables. In Pontryagin’s maximum principle (or minimum principle), applied to deterministic problems, yields the same solution as many of the Bellman principles of Dynamic Programming, without the curse of dimensionality. The multiplier then works as a means of deriving the optimal minimum or maximum in control theory in a quicker means than Bellman’s equations for dynamic programming in higher dimensional spaces. [4]

Works Cited:

[1] Vapnyarkii, I.B., “Lagrange multipliers” in Encyclopaedia of Mathematics, New York: Springer, 2001.
[2] Hahn, K. Lagrange Multiplier Method for finding Optimums Internet: http://www.karlscalculus.org/pdf/lagrange.pdf [September 5, 2011]
[3] Jensen, S. An Introduction to Lagrange Multipliers Internet: http://www.slimy.com/~steuard/teaching/tutorials/Lagrange.html [September 5, 2011]
[4] Todorov, E. “Optimal Control Theory” in Bayesian Brain, Boston: MIT Press, 2006.

Propositional Logic: Basics

We can take these concepts from the Boolean Algebra chapter and conduct the same types of operations within propositional logic.  In Boolean Algebra, we said that variables could only result in true or false values.  Propositional Logic is similar, but we add another element and use the English language to define our propositions.  So we must first define what a proposition is. A proposition is a statement such as:  “It is sunny outside” or “2 + 3 = 5″.  Therefore, our propositions share one fundamental property.  The statements can either be true or false, but never both.

We see then that propositional logic is simply an extension of Boolean algebra where we are now using propositions to determine whether a variable is true or not.  We can now read Boolean algebra in almost English language by filling in our variable with propositions, or we can represent a English sentence based on Logic as Boolean algebra.  Let us take a look at how this might work:

Let s be “It is sunny” and let c be “It is cold”.

We can use this statement to write Boolean Algebra equations for English sentences:

  • It is cold and sunny outside:  c \wedge s
  • It isn’t true that it is sunny or not cold outside: \lnot \left ( s \vee \lnot c \right )

We can do the same with just english sentences where we haven’t yet defined our variables.  Let’s look:

She is tall and beautiful

First we must define our variables:

  • t is “She is tall” — since this is a proposition, we can define tall being some variable.
  • b is “She is beautiful” — another proposition since it is either a true or false statement.

Now we can write this statement in Boolean Algebra by saying:  t \wedge b

We can further extend Propositional Algebra by reading into sentences to determine truths and falses from sequences of statements.  Example:

  1. If I study I will pass my Computer Test
  2. I did not study

Will I pass the computer test? Well, lets write this in Boolean Algebra.

  • s is “study”
  • p is “pass the computer test”
s \rightarrow p \\ s \Leftrightarrow F \therefore p \Leftrightarrow F

Which is basically saying, s implies that I will pass, but since I didn’t study I will therefore fail the test.  (Note:  Any then statement generally refers to an implication (\rightarrow))

Boolean Algebra: Truth tables

The last set of notes discussed the symbology of Boolean Algebra and this post will discuss what those symbols actually mean.  We will ignore several of the symbols, including: Implication, Equivalence (IFF), and Exclusive OR (XOR).  So this post will focus on AND, OR, and NOT.  (A later one might be written to discuss the more complicated ones for those who are interested).

Let us first take a look at the AND statement.  In the AND statement, both variables of the statement must be true in order for the statement to resolve to a truth.  Any other combination of variables results in a false.  If we want to visualize an AND operator, we can use Venn Diagrams:

Venn Diagram for Boolean Algebra AND

Venn Diagram for Boolean Algebra AND operator

Seeing this, we know that the and is where both a and b meet.  If we look at this in a tabular format, we can build out a truth statement as:

a b a \wedge b
T T T
T F F
F T F
F F F

In the AND operator, we can see that any combination other than a T and T results in a F.

With the OR operator, we only need to have one truth to resolve the statement to a truth.  Only when both variables are  false do we get a false.  We can visualize the OR operator, we can again, use Venn Diagrams:

Venn Diagram for Boolean Algebra OR

Venn Diagram for Boolean Algebra OR operator

Again, we see that anything that is true (blue) means that we get a truth back. Here’s what the truth tables will look like:

a b a \wedge b
T T T
T F T
F T T
F F F

The NOT statement only requires a single variable.  The NOT operator switches the truth to a false and a false to a truth.  Basically anything outside of A becomes true, as shown in the Diagram below:

Venn Diagram for Boolean Algebra NOT operator

Venn Diagram for Boolean Algebra NOT operator

The truth table for the not would appear as:

A \lnot a
1 0
0 1

Building out Larger Truth tables

This post has shown truth tables for 2 variables. We noticed that in this case, we always had 4 rows. If we had 3 variables we would have 2^{3} = 8 rows, 4 variables would be 2^{4} = 16 rows. We notice then, that depending on the number of variables, we can build out truth tables with 2^n rows in the table. From this row, since we have only 2 possible values, true and false, half of the rows in each variable should be true and the other half false. We want every possible combination of the variables that we can make. The easiest way to achieve this is by alternating how many true and falses are being used in each column.

Here’s an example of a Boolean Algebra equation we might want to make a truth table for:
\left ( a \wedge b \right ) \vee \lnot \left ( c \vee d \right ) \wedge \left ( a \wedge b\right )

In this equation, we have 4 variables (a,b,c,d) so we have 16 rows in our truth table. We then want to break down our truth table so that we are only attempting one operation per column. If you get good at it, you might be able to skip some operations if you get good at boolean algebra (and your teacher doesn’t want you to show all of your work). Here’s how the truth table looks for the equation above.

a b c d a \wedge b c \vee d \lnot \left ( c \vee d\right ) \left ( a \wedge b \right ) \vee \lnot \left ( c \vee d \right ) \left ( a \wedge b \right ) \vee \lnot \left ( c \vee d \right ) \wedge \left ( a \wedge b\right )
T T T T T T F T T
T T T F T T F T T
T T F T T T F T T
T T F F T F T T T
T F T T F T F F F
T F T F F T F F F
T F F T F T F F F
T F F F F F T T F
F T T T F T F F F
F T T F F T F F F
F T F T F T F F F
F T F F F F T T F
F F T T F T F F F
F F T F F T F F F
F F F T F T F F F
F F F F F F T T F

Hopefully you see how this table works out. We simply break down the equation to smaller parts and calculate those based on which combination we are currently on in the row.

Boolean Algebra: Basics and Laws

Boolean Algebra, or Boolean Logic depending on the book you read, deals solely with the logic of true and false.  As such, there are very few things that are necessary to know in Boolean algebra.  The symbology is fairly straightforward:

English Name Symbol
AND \wedge
OR \vee
NOT \lnot
Exclusive OR \oplus
Implies \to
If and only If \Leftrightarrow

For the purpose of this class, our primary focus will be on the AND, OR, and NOT. I highly recommend reading a discrete mathematics book to determine what the XOR (Exclusive Or), Implication, and equivalence (if and only if [IFF]) mean in the context of Boolean algebra. The book will also go further to discuss Inferences and deduction using boolean logic. In the context of introductory programming Boolean Algebra represents compound logical statements for decision structures (if statements). We will extend this model slightly in a future post.

Now that we know the basic rules before we continue building truth tables lets look at some laws that we should remember to help us possibly logical statements. However, if your logic is sound, then the only real benefit to simplifying the statements is to optimize the way in which the computer interprets the decision. Here are some laws that might be useful:

Law Symbology
Associativity a \vee \left ( b \vee c\right ) = \left ( a \vee b\right ) \vee c
a \wedge \left ( b \wedge c\right ) = \left ( a \wedge b\right ) \wedge c
Commutativity a \vee b = b \vee a
a \wedge b = b \wedge a
Absorption a \vee \left ( a \wedge b\right ) = a
a \wedge \left ( a \vee b\right ) = a
Distributivity a \vee \left ( b \wedge c\right ) = \left (a \wedge b \right ) \vee \left ( a \wedge c \right )
a \wedge \left ( b \vee c\right ) = \left (a \vee b \right ) \wedge \left ( a \vee c \right )
Complements a \vee \lnot a = 1
a \wedge \lnot a = 0

We’ll end this post here and pick back up with Truth tables in the next post.

Concepts of Number Theory: Modular Arithmetic

Not everything in mathematics is basic arithmetic!  There are frequently times that we do what is known as modular arithmetic.  What day will it be 4 days from now?  What time will it be 13 hours from now?  What day will that package get to your house if it arrives 3-5 business days from the order date?  All of these are examples of modular arithmetic!  This post does assume that you understand basic arithmetic.

Since time is already a number, and we don’t have to associate words with numbers, we’ll focus on this example.  Let’s assume it is 4, what time is 13 hours away?  Most people will say 5, and depending on some other information you might have, you might include A.M. or P.M.  How do we reach this conclusion?  We know that time is cyclical, and that it is broken into 12 hour segments of time.  We can add the two numbers together and divide by 12, but we are only concerned with the remainder of the division.

4 + 13 = 17 = 1 \cdot 12 + 5

We don’t care about the 12s, so we simply say the time is 5 o’clock.  How many people could easily figure out what time it would be in 67 hours?  We can use this same equation to find the time is 11 o’clock.

4+67 = 71 = 5 \cdot 12 + 11

The math that we’re doing using this clock face concept is carried out modulo 12.  Using the week concept would be carried out modulo 7, while the work week would be carried out modulo 5.

In mathematics, the more common notation that is used when doing modular arithmetic is to avoid the use the equality sign (=) and use the equivalence sign (≡).  Therefore, we would write the equation above as:

4 + 67 \equiv 11 \mbox{(mod 12)}

and we say that the addition above is congruent to 11, and we display the base, or modulo, in paranthesis.

Now lets abstract this a little more.  What if we don’t have a 12 hour clock?  You might say, ok, we have a military time clock which uses 24 hours.  But what else is cyclical?  Perhaps business days in a week, of which we know there are 5 (assuming no holidays of course).  Days of the week is 7, Days in a year is 365 (assuming not a leap year of course).  So our modulus can be of any number.  Let’s say we have a 4 hour clock and it’s 2 right now.  What time would our clock read in 67 hours?

2 + 67 = 69 \div 4 = 17 \cdot 4 + 1 \equiv 1 \mbox{(mod 4)}

So we are left with the answer of 1 (congruent to 4).  To conclude, modular arithmetic starts first with basic arithmetic and ends with a division of the “length of the clock” — so to speak — to determine the remainder.

Concepts of Number Theory: Bases

Many people are familiar with Numbers, as you’ve been using them since you started school.  Much of the mathematics that are discussed during these deal with the continuous properties of numbers.  How many numbers exist between the number 3 and 4?  How many numbers exist between the numbers 3.5 and 3.6?  In both cases, there are an infinite number of possible numbers between them.  One property of real numbers is that between any two, we can always find another number.  In fact, Calculus depends on the continuous nature of the numbering line that you are likely familiar with.

But discrete nature of number representations are just as important as this continuous nature, and this is especially true in the field of computer science.  All of the representations that we are familiar with in mathematics deal with ten numbers (0, 1, 2, 3, 4, 5, 6, 7, 8, and 9).  If we want to represent any number on the number line, this can be done with those ten numbers.  But what happens when we take away eight of those numbers and are left with two numbers (0 and 1), or when we add six more numbers for a total of sixteen numbers (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F)?  No matter which notation we are using, we are still able to represent all numbers on that number line.  We will discuss primitive data types in a later class (an earlier blog post), and this concept will be shown in even more detail.

Since we understand that our representations of numbers are now discrete, let us focus our attention on what these new numbering systems, or bases, really mean.  I do assume in this blog post that you have a concept of adding numbers.  I also assume that everyone understands how to count to sixteen in this blog post.

We now know that there are ten numbers in our decimal number system (and I have avoided the writing of the number to try to avoid confusion).  So what does the number 10 really mean?  We know the number 10 subsequently follows the number 9 because our first grade teacher told us so, and it must therefore be true.  But what the number 10 means is that there is one part with 10 objects and zeros parts of one objects.  Let’s count!

Objects with a count of 24

Countable objects (24 total squares)

The above image has a countable number of objects, totaling 24 squares.  But what does this number 24 mean?  2 parts of 10, and 4 parts of ones.  Lets look at this counting a little more closely.  When we count, we can count the total number of groups of tens that we have.  This is the tens place on the number 24.  Basically, we have 2 groups of ten objects.  The remainder of the objects cannot be grouped together, so we count those separately in the singles place (4).

Counting in 10s

24 objects counted in 2 groups of 10 with 4 singletons

So if we look at this more visually, we see two groups of ten objects (assuming I counted right).  That accounts for 20 of the total number of objects in the picture.  We cannot group the last objects into a group of ten and therefore use the singles place to account for those numbers.  We have 4 objects leftover, therefore we have 24 total objects in the decimal number system, more commonly written in number theory as 24_{10}.

What happens if we change the number system?  Lets use the hexadecimal numbering system, or a base of 16.  In this case, we can conceptualize the number in very much the same way, except instead of stopping at ten, we count on to groups of 16 (hexadecimal).  Using the same number of blocks, what number of objects exist, for a person who thinks naturally in hexadecimal?  Let’s count!

Counting in Hexadecimal

Counting in Hexadecimal, we have 1 group of 16 and 8 single objects

We see now that the number that represents this is a lot different than using the decimal numbering system.  We have 18 objects with a base of sixteen, more commonly written 18_{16}.  So if we wanted all kinds of bases, we could easily represent them in this very visual way.  Lets count in binary now, which leaves us with representing all of the objects with 0s and 1s!

This gets a little nastier, and answers the question of what happens with the numbers as we get groups of groups, 100_{10}, 1000_{10}, etc.  In this case, we start grouping those groups together.  Unfortunately, this makes drawing a little nastier to visualize, so I have broken this down into 2 sections of the image.  The first represents the 24_{10} objects.  The second represents the 3rd and 4th groupings of the objects.

Counting in Binary

The top section represents the 24 (base 10) objects and the groupings there. The bottom portion represents the third and fourth groupings of the objects

In the above image, we can start interpreting the number representation of binary from left to right.  We have no single objects left over, after grouping the groups, we have no groups of 4 objects left over after regrouping, we have 1 group of 8 left after regrouping, and 1 group of 16 at the end of groupings.  Therefore, we can represent this binary number as 11000_{2}.  Simple right?  Now if we had to visualize this to convert numbers, this would be a daunting task, especially when we start discussing numbers that range in the millions or billions!

There are some pretty nice mathematical equations, and methods, that can be use to convert numbers using this same concept.  Let’s convert 543_{8} into a decimal number (base 10) using this concept.  We know that the first number represents a grouping of groups of 8, so the number of objects must be 8^2, we also know that the second number represents a group of 8 and the 3rd position represents single numbers.  Therefore, we can write the base ten number as:

543_{8} = 5\cdot 8^{2} + 4 \cdot 8^{1} + 3\cdot 8^{0}

Doing the math, we know the number is 355_{10}.  This can be done with numbers from any base to the decimal numbering system (base 10).  So we can remember then that the conversion can be written such that each place is multiplied into the base of the number to its power in the place of the number starting at 0.  (e.g.)

2423_{b} = 2\cdot b^{3} + 4 \cdot b^{2} + 2 \cdot b + 3

Converting from the decimal numbering system (base 10) to any other base can be done by consistently dividing by the base of the number.  So let’s convert 957_{10} to base 4.

957 \div 4 = 239 \mbox{ R } 1 \\  239 \div 4 = 59 \mbox{ R } 3 \\  59 \div 4 = 14 \mbox{ R } 3 \\  14 \div 4 = 3 \mbox{ R } 2 \\  3 \div 4 = 0 \mbox{ R } 3

The remainders of the numbers represent the number in base 4.  The last number represents the most significant digit in the number and the first number represents the singles digit.  Therefore, 957_{10} = 32331_{4}.  This same concept can be used for binary, octal, hexadecimal, etc.

Concepts of Set Theory: Membership, Cardinality, Equivalence, and Equality

Membership

In most instances, we are pretty capable of determining whether an object is a member of a set of not.  We know that April is an element of the set “Months of the Year”, simply because we know April as a month in the year.  We denote membership, or non-membership, in a set using the symbols ∈ and ∉ respectively.  We can then write that April is an element of “Months of the Year” mathematically as:

W = \left \{x | x\textrm{ is a month of the year} \right \} \\ April \in W

Using the other symbol, the statement would be false.  The statements above return a true value.  We can continue to give more examples where all of the statements below are true:

 1.03 \notin \mathbb{N} \\ 4 \in \mathbb{R} \\ -4 \notin \mathbb{N} \\ -5 \in \mathbb{Z}

With this, we can mathematically represent any elements which exist in a set, or do not exist in a set.  Determining whether they exist or not can be done by going over each element in the set until there are no more elements or you find the element for which you are searching.

Cardinality

The Cardinality of a set, is simply the number of unique elements for which a set contains. As an example, our months set would contain 12 elements (for all 12 months in the year) giving it a cardinal number, or cardinality or 12. We might initially think that the cardinality of the following set is also 12, but look closer at the definition:

U = \left \{ 1, 2, 3, 5, 5, 5, 6, 8, 8, 9, 10, 12 \right \}
In this example, we have several repeating values.  The cardinality of U might appear to be 12, if we just counted, but the cardinality deals with unique values in the set.  If we simply write the unique values in the set, we see that the cardinality of U is really 9.

U = \left \{ 1, 2, 3, 5, 6, 8, 9, 10, 12 \right \}

We denote the desire for the cardinal number by using two lines surrounding the variable name.  Therefore, we can write the cardinality of U mathematically as:

\left | U \right | = 9

You may notice that this is the same notation representing absolute values in algebraic/arithmetic math.  In the case of Set Theory, and Database Design, this notation means cardinality.

But what happens when the cardinality of a set is not a finite value?  For example, can we say that the cardinality of the set of natural numbers can be represented in a finite size? In the instances where the size of the set cannot be represented using a finite size, we simply call the sets infinite sets.

Equivalence

When discussing equivalence in set theory, we are only discussing the cardinalities of the sets. If we want to test for equivalence, we need to ensure that set A and set B contain the same number of elements. Below is an example of two sets that are equivalent:

A = \left \{x | x\textrm{ is a vowel} \right \} \\  B = \left \{x | x \in \mathbb{N} \textrm{ and } 3 \leq x \leq 7 \right \}

In this instance, the sets will contain the values:

A = \left \{a,e,i,o,u \right \} \\  B = \left \{3,4,5,6,7\right \}

These two sets are therefore equivalent, because the cardinalities of both A and B are 5.  Again, if we have repetition, we need to be aware of it and change the cardinalities accordingly.  If we change the set B to:

B = \left \{3,4,5,5,6\right \}

we no longer have equivalence between A and B, because B has a cardinality of 4.

What happens if we compare infinite sets?  Can we say that natural numbers and Real numbers are equivalent?  No, there’s an interesting thing that goes on when playing with infinite sets.  We know that there are infinitely more numbers in Real numbers than there are in Natural Numbers, simply because the real numbers considers all numbers in between each natural number.  So in the case of infinite sets, we run into different sized infinite sets.  (I won’t go into more detail about this, as in databases, we are always working with finite sets of tuples).

Equality

The last thing to discuss in this entry is Equality of sets. Equality of sets means that both sets contain the same elements, without regard to the ordering or repetitions of the elements within each set. For example, the sets:
A = \left \{ 1,3,4,5 \right \} \\  B = \left \{ 1,4,5,3 \right \}

are equal to one another because all of the values of set A appear in set B. The sets:

A = \left \{ 1,3,4,5 \right \} \\  B = \left \{ 1,4,5,3,3,4,5,1,1,1,5,4,3 \right \}

are also equal sets, because we ignore repetition for equality tests as well.

But what happens when we deal with a null set to set A?  Well, since none of the values in set A appear in the null set, the null set and set A are not equal.  Remember that ALL of the values in both sets have to appear in the other.

Concepts of Set Theory: Basics

Set Theory plays a major role in Computer and Computational Sciences.  When discussing the concepts of Database Design through the use of Relational Algebra, it is helpful to first have an understanding of the mathematics that make up Relational Algebra, First Order Logic, and Set Theory.  This series of Blog entries is designed to familiarize you with Set Theory from a very mathematical perspective.

The first question that comes to most peoples minds is what is a mathematical set?  Humans have a tendency to place objects into categories about the world.  If we think of the scientific concepts of classification of animals into various genus, families, or orders, we are thinking in sets.  Our friends make up a set of people that we know, we might also have a set for family members, or for coworker.  We can think of any number of sets of objects in the world.  A set can therefore be defined as a “collection of objects whose contents can be clearly determined”.  We can define each object in a set as an element of that particular set for which they are associated.

If you are familiar with programming then representing sets mathematically is very similar to the creation of Arrays or Lists in programming.  However, in mathematics, we generally use a Capital Letter to name each set.  We can use one of three methods to designate a set in mathematics.  The first is through the use of a word description.  An example of this might be “Months of the Year”.  We can also define the set using a more programmatic approach by writing the elements of the sets in braces, { }, separated by commas.  e.g.

 W = \left \{January, February, March, April, May, June, July, August, September, October, November, December \right \}

Use of other types of grouping symbols, paranthesis, ( ) or square brackets, [ ], are not generally used to represent sets in Mathematics.  We can also shorten the above example through the use of an ellipsis (you know those three dots that people use!).  If we shortened this list, we could rewrite the set as:

 W = \left \{January, February, March, \ldots, December\right \}

The third way of generating a set is by using set-builder notation.  e.g.

W = \left \{x | x \textrm{ is a month of the year} \right \}

We can read this notation as “Set W is the set of all elements x such that x is a month of the year”.  Therefore, before the line in the braces, we list the variable which represents an element in general.  Following the line, we write the condition that x must meet in order for the variable to be an element of the set.

What happens when a set contains no elements?  There are instances where we might create sets that have no elements in the set.  In this case, we have what is called an null set.  This can happen frequently by making a mistake, or in databases, returning a query that has no conditions met that we might be looking for.  In mathematics, a null set can be represented one of two ways.  The first is by using empty braces, { }.  Mathematics also has a null symbol or ∅.  In database theory you may also see the symbol ω as a representation for a null as well.

You should note that you cannot use these symbols together.  Using { ∅ } is not the same as a null set.  In fact what is written is saying that there is a set that contains the null element.

Sets you already know

In mathematics and programming, there are some sets that you might not realize that you have been using frequently.  For example, all of the natural numbers (numbers representing ordinary counting numbers) make up a mathematical set.

\mathbb{N} = \left \{ 1, 2, 3, \ldots \right \}

The table below shows all of the other sets you may have used before, or you may see written again. There are several others, but they are not used as frequently.

Prime Numbers \mathbb{P} = \left \{ 1, 2, 3, 5, 7, 11, \ldots \right \}
Natural Numbers \mathbb{N} = \left \{ 1, 2, 3, 4, 5, \ldots \right \}
Integer Numbers \mathbb{Z} = \left \{ \ldots, -2, -1, 0, 1, 2, \ldots \right \}
Real Numbers \mathbb{R} = \left \{ x | x \leq 0 \textrm{ or } x \geq 0 \right \}