Complex Data Type

Another primitive data type that is implemented in a handful of languages is the complex data type which can be used to represent complex numbers in mathematics.  Complex numbers are numbers in mathematics which contain a real and imaginary part.  If you recall the imaginary part occurs on any square root of a negative number.  Complex (or imaginary) numbers take on the form a+bi where i represents \sqrt{-1}.

In languages that implement this, the complex number is often translated by the compiler as a series of two individual floating point numbers representing a and b.  These languages which implement complex numbers often implement the basic operators (addition, subtraction, multiplication, and division) needed to do mathematics with the complex numbers.

This is good to know about in the rare case that you may need the data type, but we will not be using this data type in this class.

Character Data Type

Character Data Types are the most primitive data type used for writing.  Multiple characters are combined to make up words, sentences, paragraphs, and everything in any language.  Characters represent any symbol which may be important to someone writing, whether they are mathematical symbols, punctuation, letters, numbers, or even the hidden characters that you don’t think of, such as the newline or tab characters.

Characters are always stored in some form of encoding.  This simply means that the symbols that are displayed on the screen are represented using an integer value and then mapped to the defined encoding of the document or program.  The simplest form of encoding used on the computer is ASCII (American Standard Code for Information Interchange).  This standard uses 8 bits [technically 7 with a parity bit is desired] to store character data.  Therefore, the maximum number of characters that can be represented using this information was 2^{7} or 128 characters, enough to represent any symbol necessary in the English Language.

This remained the most popular encoding used on the web until December 2007 when UTF-8 surpassed ASCII.  UTF-8 is now the most popular character encoding used on computers.  UTF (UCS [Universal Character Set] Transformation Format) was designed to be backwards compatible with ASCII by keeping the original ASCII characters as the first 128 of the Unicode character set.  UTF in its current form (UTF-32) uses exactly 32 bits to represent characters, allowing it the ability to encode 4,294,967,296 characters.  This encoding is enough to represent characters in most languages and mathematics.

Because UTF encoding is so big, for this class when we do anything with characters we will assume ASCII.  An ASCII table can be found at <http://www.asciitable.com/>.

Because characters are encoded by storing integers, basic arithmetic can be done using character data types in most programming languages.  Therefore, if we have the character “A” and the character “?”, we can do addition of the characters by looking at the ASCII table:

 [A] + [?] = 128

In this case, if we attempted to store this as a different character in ASCII, we would get an overflow error.  If however we store it as an integer, we would receive 128 as the number.  It is rare that you will do this when programming, but because of the way it is encoded, you should be aware in case you move between languages that have varying support for this and receive an answer you were not expecting.

So how do we store whole words?

In strings, but strings are an abstract data type which we’ll discuss later in class.  In languages such as C, you create strings simply as arrays of characters.  This will make more sense as we get further into the book.

Boolean Data Type

Boolean Data Types are the most primitive type that exists on the computer.  The boolean type consist of 2 states stored in a bit representing true and false.  The type was initially used to represent truth values of logic in boolean algebra, which was designed by George Boole in the 19th century.

Many languages today contain support boolean algebra, in some form, by implementing operators such as conjunction (AND), disjunction (OR), equivalence (==), exclusive or (XOR), and not (!).  Many of these languages, such as Java and C/C++, contain an explicit boolean data type which can be used with any of the operators.  However, the explicit data type is not necessary for use of these operators.  Since all primitive data types are stored using a binary number system, any of the data types can be used with the boolean (bitwise) operators.  A boolean value can best be thought of as a single binary bit which can store only two states.  In most languages true is mapped to a 1 and false is mapped to a 0.

These boolean operators and boolean data types are important for understanding decision structures which will be discussed later in class.  In many languages, boolean types and boolean algebra make up the majority of the testing done in while, for, if, etc. statements.  However, many of these statements in most languages regard any non-zero value as true.

Floating Point & Decimal Data Types

Another of the most popular Data Types used is the Floating Point data type which can be used to model real numbers on the computer. However, due to hardware limitations, floating point numbers can only approximate most real numbers. This includes the fundamental numbers of mathematics, such as π and e. Floating point numbers are limited to a finite area of space on the computer. However, in most cases, the floating point representation is accurate enough to make most of the complex computations needed to solve a problem.

The floating point data type is represented on modern computers using the IEEE Floating Point Standard 754. This standard uses a fractional portion as well as an exponent, structuring the representation after scientific notation. The first bit is always a sign bit, representing whether the number is positive or negative. The image below shows the IEEE floating point standard using single and double precision:

IEEE 754 Floating Point images

Visual Representation of the IEEE 754 Floating Point Decimal numbers in both (A) single precision and (B) double precision.

One term used above, precision, is a term that you should be familiar with when discussing floating point numbers. Precision is the accuracy of the fractional part of the value, which is calculated by counting the number of bits used for the fractional portion. The image above shows a single precision floating point (23 bit precision) and double precision floating point (52 bit precision). It is obvious from the precision that the double precision would be more accurate than the single precision. Interpreting the above image, it can be better described how the floating point is stored by looking closer at how it is related to scientific notation. If we break down the floating point, we can think of the number that we can store as the equation below.

(-1)^{S}(mantessa_{10})\times 2^{exponent_{10}}

Knowing this, if we convert the fractional portion and exponent portion into their respective decimal forms and use them in the equation above, we can understand how to store a decimal number in the computer. We use the sign bit to determine whether that number is positive or negative. Also notice that because the floating point representation uses a base of 2. This is done to save some storage space as representing base 10 is harder and requires specialized devices. Because of this, there is loss in the accuracy of the floating point number as well.

Converting the mantessa, or the fractional portion of the floating point number is not something that is completely obvious right away.  Unlike the exponent portion which can be converted directly to decimal numbers, the mantessa is representative of fractional numbers.  This means that instead of the typical conversion, we must think about them slightly differently.  If we have a single precision mantessa of 11000000000000000000000, we work left to right and add fractions as below:

M=1+\frac{1}{2}+\frac{1}{4}+\frac{0}{8}+\frac{0}{16}+\cdots+\frac{0}{8388608}

Therefore, to convert the mantessa, we can use a simple mathematical equation for conversion.  The variable n in this equation refers to the size of the mantessa, 23 bits in single precision, 52 bits in double precision, etc.

1 + \sum_{i=0}^{n-1}\frac{b_{n-i}}{2^{i}}

You may notice that there is an addition of a 1 when converting the mantessa.  This is due to the fact that floating point numbers are normalized in the same way as numbers in scientific notation.  This means that a single number must be placed to the left of the number, and since there is only one binary digit which is not zero we add this number to the left, which is equal to 2^{0} = 1.  Therefore we always add 1 to the mantessa.

The decimal floating point number is similar to the floating point, but instead of using the base 2 number system, it uses a base 10 number system.  The conversion is slightly different and beyond the scope of this class.  The decimal data type is also defined in the IEEE-754 2008 standard, and what you should know about the decimal number system is that this data type is meant to be more accurate than the floating point numbers described above.  You may notice that the fractional portions may not be able to represent every real number, the decimal type alleviates these issues.

For more information on Floating Points and Decimal data types, the following resources may be useful:

[1] Sebesta, Robert W.  “Concepts of Programming Languages”, Boston: Addison Wesley, 2008.  ISBN:978-0-321-49362-0

[2] Stallings, William.  “Computer Organization and Architecture”, Boston: Prentice Hall, 2010.  ISBN: 978-0-13-607373-4

[3] IEEE Standard.  <http://standards.ieee.org/findstds/standard/754-2008.html>  (Must have IEEE Xplorer access.)

Integer Data Type

Integers are perhaps the most commonly used data type in programming, and even the integer data type is often supported in different sizes by various programming languages.  For example, the Java programming language breaks down this information into four types of signed integers:  byte, short, int, and long.  Other languages, such as C based languages, include the unsigned integer type.  Unsigned integers are exactly that, they do not contain a positive or negative sign, while a signed integer type does.

To further explain the types of integers supported, a byte is an 8 bit binary string that allows for 2^{8}-1 unsigned integers to be stored in it, or if the integer is signed it can store between the range of -2^{7} to 2^{7}-1.  A short is a 16 bit binary string that can also be either signed or unsigned.  The Integer type, int is a standard integer which is represented using a 32 bit string.  And lastly, a long type is twice that of an integer, or a 64 bit binary string.  Conversion between the integers is easy to do.  Conversion backwards, for example int to short could result in errors unless the integer that stored in the int type is small enough to be represented using a smaller number of bits.  However, conversion to a larger integer type can be done without any errors at all.  In fact, if integer arithmetic operations result in values that are too large for the original type, they will be stored as larger integer types.

The maximum number of integers than an n-bit integer can store can be determined using simple math:

Unsigned max: 2^{n} - 1
Signed min: -2^{n-1}
max: 2^{n-1} - 1

Most of these integer types are supported directly by the hardware, however some languages exist that have integer types which are not directly supported by hardware. [1]  One example is the Python’s long integer type.  A value stored in Python’s long integer type can have an unlimited length.  This is achieved by specifying the numbers as string literals, such as:  2456785357321546L.

The last necessary information to understand the integer data type is the differences between unsigned and signed integers, and how they are stored.  An unsigned integer is by far the easier of the two to explain.  An unsigned integer is a number that represents a natural number (although it is stored as a positive number).  This is stored in binary form as described above.  The conversion of a natural number to its binary form should have been discussed in class.

Signed integers pose different challenges, and because of this, several methodologies exist to add a signed bit to a natural number.  The simplest for us to understand is the Sign-Magnitude representation.  Sign-magnitude representation simply uses the most significant bit (first bit) to store whether the number is positive or negative.  The following bits are used to represent the number as it would in an unsigned integer.  This method has many drawback including unnecessary computation to calculate addition and subtraction, due to the need of both the sign bit and the magnitude of the natural number to do the arithmetic.  It also becomes more difficult to test for a zero in a problem, as zero is simply a magnitude, and by using this method both signs must be tested.

There is a method which alleviates these issues by using an entirely different methodology, known as “Two’s Complement” representation.  This methodology, though harder for an average human to understand, allows the computer to do basic arithmetic without extra computations and since there is only a single zero value, simplifies any tests for zero.  In this representation, the most significant bit is once again used to represent the sign (positive being 0 and negative being 1).  We begin by counting normally, as with an unsigned integer.  However, when the most significant bit switches to negative, we begin counting the magnitude of the number backwards.  So the magnitude is increment by 1 initially, and that number is then made negative.  All subsequent numbers are decremented as the binary digit become larger.  Therefore if we use a nibble (4 bits) to count, our counting would go as follows:

Decimal Binary
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
-8 1000
-7 1001
-6 1010
-5 1011
-4 1100
-3 1101
-2 1110
-1 1111

Try playing around with some binary arithmetic to see how easy using two’s complement is with addition and how difficult it would be for a computer to do binary arithmetic using the Sign-Magnitude representation.

For further reading on the Integer Data Type, I would recommend (especially the latter):

[1] Sebesta, Robert W.  “Concepts of Programming Languages”, Boston: Addison Wesley, 2008.  ISBN:978-0-321-49362-0

[2] Stallings, William.  “Computer Organization and Architecture”, Boston: Prentice Hall, 2010.  ISBN: 978-0-13-607373-4