May 21st, 2010 by Kristopher Reese.
Topic Modeling is a classic problem in Information Retrieval, and despite the extensive amount of research, the idea of clustering documents together under a broad range of topics is still relatively uncommon in Content Management Systems on the web. With algorithms such as Latent Dirichlet Allocation, Latent Semantic Analysis, and Gamma-Poisson, one would expect to see more of these data clustering algorithms implemented for finding potentially related documents in a system and even inferring the relevance that other documents might have to a document being currently viewed by a user.
One of the most robust algorithms for Topic Modeling is Latent Dirichlet Allocation (LDA). It’s a statistical method that uses probabilistic principles to generate a list of topics with which a document can be associated. The LDA model makes several distributional assumptions about how each document is generated. These are [1]:
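- Choose $N \sim \text{Poisson}(\xi)$

- Choose $\theta \sim \text{Dir}(\alpha)$

- For each of the $N$ words $w_n$:
  - Choose a topic $z_n \sim \text{Multinomial}(\theta)$
  - Choose a word $w_n$ from $p(w_n \mid z_n, \beta)$, a multinomial probability conditioned on the topic $z_n$.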

Mathematical Model for Latent Dirichlet Allocation
Knowing these assumptions, we can generate a mathematical model, shown in figure 1, using the hyperparameters α and β, which help us sample the distributions θ and φ (both are multinomial distributions with Dirichlet priors), where θ is the distribution of topics in documents and φ is the distribution of words in topics. Ultimately, these parameters will be marginalized out and never need to be computed explicitly; they serve only to help us understand the model. By setting α and β arbitrarily, we simply make the assumption that the distributions θ and φ exist.
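To calculate the distributions, we will use a Gibbs Sampling algorithm to sample the topic z for each of the words in the document. The Gibbs Sampler draws each topic assignment from the following conditional probability:
$$P(z_i = j \mid z_{-i}, w) \propto \frac{n_{-i,j}^{(w_i)} + \beta}{n_{-i,j}^{(\cdot)} + W\beta} \cdot \frac{n_{-i,j}^{(d_i)} + \alpha}{n_{-i,\cdot}^{(d_i)} + T\alpha}$$
where $n_{-i,j}^{(w_i)}$ is the number of times the word $w_i$ is assigned to topic $j$ and $n_{-i,j}^{(d_i)}$ is the number of times topic $j$ is used in document $d_i$, both counted without the current word $i$. $n_{-i,j}^{(\cdot)}$ is the total number of words assigned to topic $j$, $n_{-i,\cdot}^{(d_i)}$ is the total number of words in document $d_i$, $W$ is the size of the vocabulary, and $T$ is the number of topics.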
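As a rough illustration of the sampling step described above, the sketch below computes the conditional topic probabilities for a single word from two count tables. It assumes the standard collapsed Gibbs update for LDA with symmetric α and β; the class name, method name, and array layout are hypothetical and not taken from any particular toolbox.

```java
// Hypothetical helper: conditional topic distribution for one word token in a
// collapsed Gibbs sampler for LDA (symmetric alpha and beta are assumed).
final class LdaGibbsStep {
    /**
     * wordTopic[w][j]: times word w is assigned to topic j (current token excluded).
     * docTopic[d][j]:  times topic j is used in document d (current token excluded).
     */
    static double[] topicProbabilities(int word, int doc,
                                       int[][] wordTopic, int[][] docTopic,
                                       double alpha, double beta) {
        int vocabSize = wordTopic.length;
        int topics = docTopic[doc].length;
        double[] p = new double[topics];
        double sum = 0.0;
        for (int j = 0; j < topics; j++) {
            // Total number of words currently assigned to topic j.
            int topicTotal = 0;
            for (int w = 0; w < vocabSize; w++) {
                topicTotal += wordTopic[w][j];
            }
            double wordTerm = (wordTopic[word][j] + beta) / (topicTotal + vocabSize * beta);
            // The document-side denominator is the same for every topic, so it is dropped.
            double docTerm = docTopic[doc][j] + alpha;
            p[j] = wordTerm * docTerm;
            sum += p[j];
        }
        for (int j = 0; j < topics; j++) {
            p[j] /= sum; // normalise so the weights form a probability distribution
        }
        return p;
    }
}
```

A full sampler would remove the current word from the counts, draw a topic from these probabilities, and add the word back under its new topic, repeating this for every word over many iterations.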
Ultimately, this model can be viewed as a decomposition of a document-word matrix into a topic-word matrix and a topic-document matrix. It’s a powerful tool for information retrieval. I am currently working (in the little bit of spare time that I have) on an automated clustering algorithm using LDA for WordPress blogs. When this is completed, it will automatically cluster related posts into their respective categories without the author needing to worry about posting the entry in a category. This could later be expanded into a full-blown data clustering and relation engine which would use some unsupervised Bayesian principles to infer the relevance of potential documents to current documents.
May 11th, 2010 by Kristopher Reese.
The PDF below is a presentation that I gave to Dr. Roman Yampolskiy’s CyberSecurity Lab at the University of Louisville. Though the topic has little to nothing to do with cyber-security, it raised a lot of interesting questions and drew insightful suggestions from the audience. Two questions that were raised during the presentation, but that I was unable to answer at the time, are answered in this blog entry. The answer to the second question goes into brief detail on how the Markov Decision Process (MDP) works.
What additions to Computer Science and Computational Music does your work bring to the fields?
Yes, much of my work is not new to this field. I am a Master’s student, and my thesis does not need to contribute conclusively new work to the field. Despite this, a portion of my work is new to both Computer Science and Music Theory in general. My work on the chord progression algorithm using Markov Decision Processes would help to solidify Dr. Dmitri Tymoczko’s recent development of Geometric Music Theory (more on his website: http://www.music.princeton.edu/~dmitri/).
It does this by showing that the decision making process that composers go through can be replicated through complex decision making algorithms such as the Markov Decision Process discussed in this presentation. I have also seen very little research in the use of MDPs in algorithmic music generation.
Why did you decide to use Markov Chains for your thesis? This has been tried and was moved away from because it wasn’t robust enough to capture chord progressions.
I think you are confusing Markov Decision Processes with Markov Chains. Although both rely on the Markov property, a Markov Decision Process is used very differently from a Markov Chain. A Markov Decision Process is a mathematical framework for decision making in situations which are partially stochastic (random) and partially under the control of a decision maker.
If you are familiar with chord progressions, you know that there is an ultimate goal in mind, but in most music the way that you reach that goal is not explicitly defined; there are exceptions in Blues and Rock, where the I-IV-V and I-ii-V progressions prevail. Since Dr. Tymoczko’s geometric model for chord progressions captures the implicit definition of the movement of chords in the progression, we can weight certain chords as a goal that we want to reach and use MDPs to decide on the best path, the one that maximizes utility in the model. By doing this, we leave a bit of randomness in the chord progressions, keep the implicit definition of chords through Dr. Tymoczko’s model, and give the decision maker (the computer) the ability to decide where to go given its current location.
This is different from Markov Chains, where we would define a limited set of actions and make probabilistic movements to one or the other leaving most of the model as stochastic. This would be similar to Xenakis’ work in stochastic music, which is respectable in its own right. However, I am attempting something far different than stochastic music. My hope is that I will be able to define tonality in chord progressions as both a complex decision process and through Dr. Tymoczko’s work in Geometric Music Theory.
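To make the idea of "deciding on the best path that maximizes utility" a little more concrete, here is a minimal value iteration sketch for a generic MDP. It is not the thesis algorithm: the class name, the three-chord state space used in the usage note, the transition probabilities, and the rewards are all hypothetical and chosen only for illustration.

```java
// Hypothetical toy MDP solver; states, actions and rewards are illustrative only.
final class ChordMdp {
    // Expected utility of one action: outcome[t] is the probability of landing on chord t.
    static double actionValue(double[] outcome, double[] reward,
                              double discount, double[] value) {
        double q = 0.0;
        for (int t = 0; t < outcome.length; t++) {
            q += outcome[t] * (reward[t] + discount * value[t]);
        }
        return q;
    }

    // transition[s][a][t] = probability of moving from chord s to chord t under action a.
    static double[] valueIteration(double[][][] transition, double[] reward,
                                   double discount, int sweeps) {
        double[] value = new double[transition.length];
        for (int sweep = 0; sweep < sweeps; sweep++) {
            double[] next = new double[value.length];
            for (int s = 0; s < value.length; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (double[] outcome : transition[s]) {
                    best = Math.max(best, actionValue(outcome, reward, discount, value));
                }
                next[s] = best;
            }
            value = next;
        }
        return value;
    }

    // Index of the action with the highest expected utility in state s.
    static int greedyAction(double[][][] transition, double[] reward,
                            double discount, double[] value, int s) {
        int bestAction = 0;
        for (int a = 1; a < transition[s].length; a++) {
            if (actionValue(transition[s][a], reward, discount, value)
                    > actionValue(transition[s][bestAction], reward, discount, value)) {
                bestAction = a;
            }
        }
        return bestAction;
    }
}
```

For example, with three chords where chord 2 is the weighted goal (reward 1 on arrival) and each "move" action succeeds with probability 0.8, the greedy action from every other chord is the one that heads towards the goal, while the remaining 0.2 keeps some randomness in the progression.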
May 9th, 2010 by Kristopher Reese.
Here’s another paper that I worked on this last semester at UofL for my Parallel Programming class. I took the A* Heuristic Search Algorithm and attempted to pipeline the algorithm using Open MPI in C++. The experiments and comparisons were run on the Cardinal Research Cluster.
The paper discusses the implementation in a fairly straightforward manner. The implementation is ultimately not very cost effective, but it taught me a lot about parallel techniques for pipelining. It does not include any sort of decomposition of the data. Since the algorithm uses a priority queue, we can simply pop items off the front of the queue and pass this information to slave processes, which will do most of the heavy lifting.
One thing to especially take away from this paper is a method for polling on processes. The sample below is an example of how you would poll on processes, written in C++:
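// Assumes #include <mpi.h> and that MPI has already been initialized;
// process_id, source_id and message_tag are defined elsewhere.
if (process_id == 0)
{
    bool keep_alive = true;
    // Keep the process alive for the life of the program.
    while (keep_alive)
    {
        // Nonblocking test: true once a matching message is waiting.
        if (MPI::COMM_WORLD.Iprobe(source_id, message_tag))
        {
            // Main Code Here (set keep_alive = false on a termination message)
        }
    }
}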
The main loop here keeps the process alive during the life of the program. This is necessary since the process would otherwise exit once it reaches the end of the code. The conditional inside the loop uses the MPI_Iprobe function (Iprobe in the C++ bindings), which is a nonblocking test for a message. It takes two parameters: the id of the source process and the numeric tag of the message that we are looking for.
One issue that this leads to is the processes staying alive forever, since we are essentially looping infinitely. We can use the same MPI_Iprobe test to end the process: the master node sends a message that tells the process the program has reached its end. We do this with the MPI_Send function, simply sending a message, testing for it, and ending the loop when the process receives the message.
Below are the paper and the powerpoint for the project. Source code will be given in a later post.

- Title: Pipelining A star searching
Caption: Parallel Programming - UofL
Description: The presentation given for the Parallel Programming class on the implementation of the A* search algorithm.
File: Pipelining-A-star-searching.pptx

- Title: Pipelining the A* Heuristic Search Algorithm
Caption: Parallel Programming - UofL
Description: A paper discussing the implementation of the A* Heuristic Search Algorithm. It uses a Master-Slave model in the implementation and polls on each of the slave processes to gather information. It also covers the mathematics for the Speedup and Efficiency of the parallel algorithm.
File: final_paper.pdf
May 8th, 2010 by Kristopher Reese.
During this last semester at UofL, I took a Computational Cognitive Science class. It focused on topics in Machine Learning and how we can use ideas from psychology in Machine Learning. Near the end of the semester we had to complete a project on a topic of interest to us. I chose an addition to the Latent Dirichlet Allocation (LDA) to attempt to capture temporal shifts in topics.
Though most topics remain stationary over an infinitely long period of time, there are examples where topics and words within topics change. For example, a medical topic in the 18th century might include keywords such as bleeding, leeching, etc. while the topic today might include cancer, medicines, or other such topics. We could capture these as two separate topics using LDA, but if we wanted to capture this as a single topic that changed over time we need to modify LDA. This paper proposes an addition to the LDA model that captures a complete shift in the keywords of topics.
You can also find the source code for my Temporal LDA model, written in MATLAB. In order to use this, you need the Topic Modeling Toolbox which can be found at The University of California Irvine Cognitive Science Research Website. This is a free download for scientific use. Once you get the Topic Modeling Toolbox up and running, you can simply extract the zip files into the same folder as the toolbox.
When this is set up, you can run the following commands in MATLAB.
testData
[Sa, Sb] = TLDA(WS, DS, WO, 2, 1, 0.01, 50)
testData will randomly generate 2 topics and 30 documents, with a split at about document number 10 where the keywords of each topic change completely. testData returns 3 vectors: WS, DS, and WO. WS holds the words in the documents. It lines up with DS, where DS(i) is the document that the word WS(i) belongs to. The vector WO lists the vocabulary in order, so that the word at position i is WO(WS(i)).
The next line is the actual call to the Temporal Latent Dirichlet Allocation model. It takes 7 parameters. The first three are the values returned from the testData call, WS, DS, and WO, in that order. The next parameter is the total number of topics. Parameters five and six are the alpha and beta hyperparameters, respectively. The final parameter is the total number of iterations to run the TLDA Gibbs Sampler.
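The relationship between the three vectors can be easier to see in code. The sketch below is only an illustration of the encoding described above, not part of the toolbox: the class and method names are hypothetical, and the indices are kept 1-based as they are in MATLAB.

```java
// Hypothetical illustration of the WS / DS / WO encoding (1-based indices, as in MATLAB).
final class TokenStream {
    // Builds counts[d][w]: how often vocabulary entry w+1 occurs in document d+1.
    static int[][] documentWordCounts(int[] ws, int[] ds, int vocabSize) {
        int docs = 0;
        for (int d : ds) {
            docs = Math.max(docs, d);
        }
        int[][] counts = new int[docs][vocabSize];
        for (int i = 0; i < ws.length; i++) {
            counts[ds[i] - 1][ws[i] - 1]++;
        }
        return counts;
    }

    // Looks up the actual word for position i (0-based position in the stream).
    static String wordAt(String[] wo, int[] ws, int i) {
        return wo[ws[i] - 1];
    }
}
```

Each position i in WS and DS describes one word occurrence, so the two vectors always have the same length, while WO has one entry per distinct word.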
I hope to get a chance to implement Latent Dirichlet Allocation as generic PHP classes and to build a version for automatic classification and clustering of WordPress topics. I’ll keep the blog updated on the progress.

- Title: Temporal Topic Model
Caption: Extension to the Topic Modeling Toolbox
Description: A Temporal Latent Dirichlet Allocation extension to the Topic Modeling Toolbox. Code is written in the MATLAB Programming Language.
File: TLDA.zip

- Title: Towards a Temporal Latent Dirichlet Allocation
Caption: Computational Cognitive Sciences - UofL
Description: Though most topics remain stationary over an infinitely long period of time, there are examples where topics and words within topics change. Latent Dirichlet Allocation (LDA) is used to capture information about topic models given a known number of Topics. This algorithm does not capture information about topics, which may temporally change. The proposed model in this paper attempts to modify the existing LDA model to allow it to capture temporal changes in topics while allowing those models that do not change to remain through infinite time. This is achieved by adding a variable to the existing model, K, and using this variable to calculate the probability of a change in the topic given the hyperparameters, two topics, and the words that make up the topics.
File: TLDA.pdf

- Title: Towards a Temporal Latent Dirichlet Allocation Presentation
Caption: Computational Cognitive Science - UofL
Description: Presentation given for the Computational Cognitive Science class on the paper "Towards a Temporal Latent Dirichlet Allocation".
File: TLDA.pptx
March 27th, 2010 by Kristopher Reese.
During my research for my thesis on Algorithmically Generated Tonal Music, I was fortunate to run across the research from McGill University, called “The Euclidean Algorithm Generates Traditional Musical Rhythms”, that uses a modified version of the Euclidean Algorithm called the Bjorklund algorithm. The Euclidean Algorithm is one of the oldest algorithms in existence and was proposed by Euclid in his “Elements”, Books VII and X. This algorithm is used to find the Greatest Common Divisor of two numbers.
The Euclidean Algorithm is relatively simple to program. Below is an implementation of the Euclidean Algorithm in Java.
// Returns the greatest common divisor of m and k.
public int Euclid(int m, int k)
{
    if (k == 0)
        return m;
    else
        return this.Euclid(k, m % k);
}
The Bjorklund algorithm uses a similar concept to the Euclidean algorithm, but is used to distribute the zeros in a binary set as evenly as possible. The simplest visual way of thinking about the Bjorklund algorithm is to treat the binary set as a row of columns. The example below shows a Euclidean set of (4, 6), which we can describe as four 1s and six 0s. We therefore create the initial binary set “1111000000”. From here we take the smaller of the two numbers, n, and move that many entries from the end of the set underneath the first n columns. We update the numbers using the Euclidean Algorithm and repeat this step until 0 or 1 columns are left over, then we concatenate the columns, reading each from top to bottom, into a new set. A visual example of the (4,6) version of the Bjorklund algorithm is found below.
1111000000

111100
0000

1111
0000
00

11
00
00
11
00

1001010010
This algorithm was originally developed for operating the components of the spallation neutron source (SNS) accelerators in nuclear physics. However, if we say that every 1 is an accented note and every 0 is an unaccented note or a rest, we can use it to generate many common rhythms, including various world music rhythms. For this reason, the Bjorklund algorithm is a unique and powerful algorithm for any music generator. Below is an example Bjorklund algorithm implemented in Java:
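import java.util.ArrayList;
import java.util.List;

public class Rhythm
{
    private final List<Boolean> rhythm = new ArrayList<Boolean>();

    // accented = number of 1s (accented notes), total = length of the rhythm.
    public Rhythm(int accented, int total)
    {
        if (accented < 0 || accented > total)
            throw new IllegalArgumentException("need 0 <= accented <= total");
        // Start with one single-entry column per beat: the 1s first, then the 0s.
        List<List<Boolean>> columns = new ArrayList<List<Boolean>>();
        List<List<Boolean>> remainder = new ArrayList<List<Boolean>>();
        for (int i = 0; i < total; i++)
        {
            List<Boolean> column = new ArrayList<Boolean>();
            column.add(i < accented);
            if (i < accented) columns.add(column);
            else remainder.add(column);
        }
        this.bjorklund(columns, remainder);
    }

    // Repeatedly moves remainder columns underneath the leading columns until
    // 0 or 1 remainder columns are left, then concatenates all of the columns.
    private void bjorklund(List<List<Boolean>> columns, List<List<Boolean>> remainder)
    {
        while (remainder.size() > 1 && !columns.isEmpty())
        {
            int pairs = Math.min(columns.size(), remainder.size());
            List<List<Boolean>> merged = new ArrayList<List<Boolean>>();
            for (int i = 0; i < pairs; i++)
            {
                List<Boolean> column = new ArrayList<Boolean>(columns.get(i));
                column.addAll(remainder.get(i));
                merged.add(column);
            }
            // Whatever was not paired up becomes the new remainder.
            List<List<Boolean>> longer = (columns.size() > pairs) ? columns : remainder;
            remainder = new ArrayList<List<Boolean>>(longer.subList(pairs, longer.size()));
            columns = merged;
        }
        for (List<Boolean> column : columns) this.rhythm.addAll(column);
        for (List<Boolean> column : remainder) this.rhythm.addAll(column);
    }

    public List<Boolean> getRhythm()
    {
        return this.rhythm;
    }
}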
February 22nd, 2010 by Kristopher Reese.
Statistical Distributions of data are an important aspect of both analyzing resulting data and generating random numbers based on a specified distribution. This post will discuss a handful of Statistical Distributions that are common in Discrete-Event Simulations: the Uniform Distribution, the Triangular Distribution, the Binomial/Bernoulli Distribution, the Poisson Distribution, and the Exponential Distribution. The two most important aspects of these distributions for the purposes of this discussion are the Probability Mass Function (PMF), or the Probability Density Function (PDF) for a continuous distribution, and the Cumulative Distribution Function (CDF). Each distribution’s PMF or PDF and its CDF will be discussed further here.
Before getting started, below is a table of equations for each of the distributions for finding the mean, median, mode, and variance of a specific distribution:
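| Distribution | Mean | Median | Mode | Variance |
|---|---|---|---|---|
| Uniform | $\frac{a+b}{2}$ | $\frac{a+b}{2}$ | N/A | $\frac{(b-a+1)^2-1}{12}$ |
| Triangular | $\frac{a+b+c}{3}$ | $a+\frac{\sqrt{(b-a)(c-a)}}{\sqrt{2}}$ for $c \ge \frac{a+b}{2}$; $b-\frac{\sqrt{(b-a)(b-c)}}{\sqrt{2}}$ for $c \le \frac{a+b}{2}$ | c | $\frac{a^2+b^2+c^2-ab-ac-bc}{18}$ |
| Binomial | np | $\lfloor np \rfloor$ or $\lceil np \rceil$ | $\lfloor (n+1)p \rfloor$ or $\lfloor (n+1)p \rfloor - 1$ | $np(1-p)$ |
| Exponential | $\lambda^{-1}$ | $\frac{\ln 2}{\lambda}$ | 0 | $\lambda^{-2}$ |
| Poisson | λ | usually about $\lfloor \lambda + \frac{1}{3} - \frac{0.02}{\lambda} \rfloor$ | $\lfloor \lambda \rfloor$, and $\lambda - 1$ if $\lambda$ is an integer | λ |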
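The Uniform Distribution exists in both the discrete and the continuous spaces. For this discussion, however, we will strictly discuss the discrete version. In a discrete uniform distribution, we can generate the Probability Mass Function with the equation:
$$f(k) = \begin{cases} \frac{1}{n} & \text{for } a \le k \le b \\ 0 & \text{otherwise} \end{cases}$$
where:
$$a \in (\ldots,-2,-1,0,1,2,\ldots), \quad b \in (\ldots,-2,-1,0,1,2,\ldots), \quad n = b-a+1$$
Using the same parameters, we can calculate the Cumulative Distribution Function using the equation:
$$F(k) = \begin{cases} 0 & \text{for } k < a \\ \frac{\lfloor k \rfloor - a + 1}{n} & \text{for } a \le k \le b \\ 1 & \text{for } k > b \end{cases}$$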
Using the PMF and CDF we can plot a set of points onto graphs which would look similar to Figures 1a and 1b respectively (with the same parameters):

(a)

(b)
A plot of the Discrete Uniform (a) Probability Mass Function and (b) Cumulative Distribution Function. a = 1, b = 5. Graph generated with MATLAB
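The Triangular Distribution can be used in situations in which a normal distribution might be needed, but where we want to restrict the distribution to a set of bounds. A triangular distribution has three parameters: a lower limit a, an upper limit b, and a mode c. These parameters are defined as:
$$a \in (-\infty,\infty), \quad b > a, \quad a \le c \le b$$
With these parameters we can find the PDF and CDF to be:
$$f(x) = \begin{cases} \frac{2(x-a)}{(b-a)(c-a)} & \text{for } a \le x \le c \\ \frac{2(b-x)}{(b-a)(b-c)} & \text{for } c \le x \le b \end{cases}$$
$$F(x) = \begin{cases} \frac{(x-a)^2}{(b-a)(c-a)} & \text{for } a \le x \le c \\ 1-\frac{(b-x)^2}{(b-a)(b-c)} & \text{for } c \le x \le b \end{cases}$$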
Plotting these functions will result in graphs that appear similar to Figure 2. The Figure 2 images were taken from the Wikipedia article and are distributed under the Creative Commons Attribution ShareAlike 3.0 License.

(a)

(b)
A plot of a Continuous Triangular Distribution (a) Probability Density Function and (b) Cumulative Distribution Function.
The Binomial Distribution is the distribution of the number of successes in a sequence of n independent trials, each of which has only two possible outcomes (success/failure, true/false, etc.). This distribution is associated with Bernoulli Trials, and when the parameter n = 1 we can call it a Bernoulli Distribution. The Binomial Distribution has two parameters, n and p, and is evaluated at a number of successes k:
$$n \in \mathbb{N} \text{ - number of trials}, \quad p \in [0,1] \text{ - probability of success in each trial}, \quad k \in \{0, \ldots, n\}$$
We can find the PMF and the CDF to be:
$$f(k) = \binom{n}{k} p^k (1-p)^{n-k}$$
$$F(k) = I_{1-p}(n-\lfloor k \rfloor, 1+\lfloor k \rfloor)$$
where $I_{1-p}$ is the regularized incomplete beta function and $\binom{n}{k}$ is the binomial coefficient, which we can define as:
$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$
The resulting PMF and CDF graphs might look like (with the same parameters) Figure 3a & Figure 3b respectively:

(a)

(b)
A plot of the Discrete Binomial distribution (a) Probability Mass Function and (b) Cumulative Distribution Function. n = 5, p = 0.5
The Exponential Distribution is a continuous probability distribution that is frequently used in simulations. Though it is a continuous distribution, discretizing it does not take a lot of effort. The exponential distribution is supported on [0,∞) and has one parameter:
$$\lambda > 0 \text{ - rate or inverse scale}$$
Using this parameter, we can find the PDF and CDF for any exponential distribution:
$$f(x) = \lambda e^{-\lambda x}$$
$$F(x) = 1 - e^{-\lambda x}$$
Plotting these functions will result in graphs that look like (with the same parameters):

(a)

(b)
A plot of a Continuous Exponential Distribution (a) Probability Density Function and (b) Cumulative Distribution Function. λ = 0.4
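The Poisson Distribution will be the last distribution discussed in this entry. This distribution can be used to express the probability of a number of events occurring within a fixed period of time, assuming that these events occur with a known average rate and independently of one another. The Poisson Distribution has one parameter, λ, and is evaluated at a number of events k:
$$\lambda \in [0,\infty), \quad k \in \{0, 1, 2, \ldots\}$$
The PMF and CDF are, respectively:
$$f(k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
$$F(k) = \frac{\Gamma(\lfloor k+1 \rfloor, \lambda)}{\lfloor k \rfloor!} \text{ for } k \ge 0$$
where $\Gamma(x, y)$ is the upper incomplete gamma function.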
It is interesting to note that for sufficiently large values of λ, a normal distribution with a mean and variance of λ (and a standard deviation of $\sqrt{\lambda}$) will decently approximate the Poisson distribution. If we plot the PMF and CDF for the Poisson Distribution, we get graphs that would look like (with the same parameters):

(a)

(b)
A plot of the Discrete Poisson Distribution (a) Probability Mass Function and (b) Cumulative Distribution Function. λ = 4
These are only a handful of possible distributions that could result from simulation outcomes, but these are probably the most frequently encountered distributions. More distributions will be touched on in future posts. If you have one you’d like me to discuss or go into more detail about, leave a comment!
February 21st, 2010 by Kristopher Reese.
This post will discuss various mathematical formulae that are used in Event Based Simulation for very simple analysis of the data from the simulation. For this post, let’s assume the following example simulation results:
| Customer | Inter-arrival Time | Arrival Time | Service Time | Begin Service | Wait Time | End Service | Time in System | Server Idle |
|---|---|---|---|---|---|---|---|---|
| 1 | - | 0 | 3.3 | 0 | 0 | 3.3 | 3.3 | 0 |
| 2 | 5.1 | 5.1 | 4.5 | 5.1 | 0 | 9.6 | 4.5 | 1.8 |
| 3 | 3.9 | 9.0 | 3.2 | 9.6 | 0.6 | 12.8 | 3.8 | 0 |
| 4 | 4.5 | 13.5 | 4.8 | 13.5 | 0 | 18.3 | 4.8 | 0.7 |
| 5 | 4.4 | 17.9 | 4.9 | 18.3 | 0.4 | 23.2 | 5.3 | 0 |
| 6 | 5.6 | 23.5 | 4.8 | 23.5 | 0 | 28.3 | 4.8 | 0.3 |
| 7 | 4.1 | 27.6 | 3.1 | 28.3 | 0.7 | 31.4 | 3.8 | 0 |
| 8 | 4.2 | 31.8 | 3.2 | 31.8 | 0 | 35 | 3.2 | 0.4 |
Since we have this data, we can do a very simple analysis of the resulting data. For the purposes of this post, I will present the mean service time, the mean inter-arrival time, the mean wait time for all customers and for just those customers who waited, server utilization, and the mean number of customers in the system.
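Mean Service Time. The following equation can be used for determining the mean service time of the customers.
$$\frac{\sum \text{Service Time}}{\text{Number of Customers}}$$
For our given example, we get:
$$\frac{31.8}{8} = 3.975$$
Mean Inter-Arrival Time. The following equation can be used for determining the mean inter-arrival time of the customers.
$$\frac{\sum \text{Inter-arrival Time}}{\text{Number of Arrivals} - 1}$$
For our example, we would get:
$$\frac{31.8}{8-1} \approx 4.54$$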
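Mean Wait Time. We have two instances of the mean wait time to consider. The first is the mean wait time for all of the customers. In this case, we can find the mean wait time with:
$$\frac{\sum \text{Wait Time}}{\text{Total Number of Customers}}$$
For our example the mean wait time of all customers is:
$$\frac{1.7}{8} = 0.2125$$
In contrast, the mean wait time for only the customers who waited is:
$$\frac{\sum \text{Wait Time}}{\text{Number of Customers Who Wait}} = \frac{1.7}{3} \approx 0.57$$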
Server Utilization. Server utilization is the fraction of time in which the server was doing some form of work, i.e. was not idle. For this, we find the fraction of the total run time during which the server was idle and subtract this number from 1. This gives us a value no greater than 1, which is the utilization of the server. In other words:
$$1 - \frac{\sum \text{Server Idle Time}}{\text{Total Run Time}}$$
For our example, we get:
$$1 - \frac{3.2}{35} = 1 - 0.09 \approx 0.91$$
Mean Number of Customers in System. Unlike the other equations, which are relatively trivial, the mean number of customers in the system is slightly trickier. However, we can still compute it using a relatively simple equation. First we define λ to be the mean arrival rate and μ to be the mean service rate; these are the reciprocals of the mean inter-arrival time and the mean service time. With this, we can find the Traffic Intensity:
$$\rho = \frac{\lambda}{\mu}$$
With ρ we can find the mean number of customers in the system with the following equation, which holds for a single-server queue with exponential inter-arrival and service times (M/M/1) and ρ < 1:
$$E[n] = \frac{\rho}{1-\rho}$$
Therefore, for our example, we have λ = 7/31.8 and μ = 8/31.8, so we can define ρ and then solve for the Mean Number of Customers in the System:
$$\rho = \frac{\lambda}{\mu} = \frac{7/31.8}{8/31.8} = \frac{7}{8} = 0.875$$
$$E[n] = \frac{0.875}{1-0.875} = \frac{0.875}{0.125} = 7$$
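As a sketch of this last calculation, the code below derives the traffic intensity from the mean inter-arrival and service times and then applies the formula for the mean number of customers. It takes λ and μ as rates, meaning the reciprocals of the two mean times, and it assumes the single-server M/M/1 result, which is only valid while ρ is below 1. The class and method names are hypothetical.

```java
// Hypothetical sketch: traffic intensity and mean number in system for an M/M/1 queue.
final class SingleServerQueue {
    static double trafficIntensity(double meanInterArrivalTime, double meanServiceTime) {
        double lambda = 1.0 / meanInterArrivalTime; // arrival rate
        double mu = 1.0 / meanServiceTime;          // service rate
        return lambda / mu;
    }

    static double meanCustomersInSystem(double rho) {
        if (rho < 0.0 || rho >= 1.0) {
            // At rho >= 1 customers arrive at least as fast as they are served.
            throw new IllegalArgumentException("formula requires 0 <= rho < 1");
        }
        return rho / (1.0 - rho);
    }
}
```

The guard matters in practice: a traffic intensity of 1 or more means the queue grows without bound, and the formula would return a negative or undefined value.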
These equations are essential to know for simple statistical analysis of the data received by a simulation. There are much more complex equations within queuing theory that require the knowledge of these equations. So these are worthwhile equations to have memorized if you plan to do more complex statistical analysis of data.
February 21st, 2010 by Kristopher Reese.
Simulations are an important tool in a computer scientist’s arsenal. They allow a scientist to statistically analyze a designed experiment at only a fraction of the cost of implementing a final system.
In simulations of systems, be it computer performance analysis or protein folding, there are two major classes of simulations: Discrete-Event Simulations and Continuous Simulations. A Continuous Simulation is a simulation in which the state variables change continuously over the designated time that the simulation is run. In contrast, a Discrete-Event Simulation is a simulation in which the state variables of the system change only at discrete points in time.
The majority of the posts about Simulations on this site will discuss Discrete-Event Simulations, though many of the equations and statistical concepts could be used with analysis of Continuous Simulations with only minor changes. The rest of this post will outline the advantages and disadvantages of simulation, as well as a list of common mistakes in simulation.
Advantages/Disadvantages
Before listing some of the advantages and disadvantages, we must first know when simulation is appropriate and when it is not. Though simulations are a powerful tool, in many situations a common-sense solution can be reached, or the problem can be solved analytically using formulae related to the model. In some cases we may lack the data for the model, in which case the model may be difficult to simulate. We may also run into a few rare cases where the benefit of the simulation does not justify its cost. However, simulations can be applied to a multitude of situations, including: studying complex systems or subsystems, cases where the knowledge gained by the simulation holds significant value, studying the effects of changing inputs, determining possible system requirements (for example, for software), training humans or intelligent machines, visualizing systems, and of course many other topics. The following lists are shortened versions from [1].
Advantages. A simulation can lead to the discovery of new models or systems. Some of the advantages of using simulation are:
- It lets us explore new options without disrupting the current operation of a system.
- It has significantly lower cost in comparison to implementing a final system before testing.
- It lets us simulate a larger amount of time in a shorter period, known as time-warping.
- It allows us to study the interactions or importance of variables in the system.
- It allows us to study various “what if…” conditions.
- It helps scientists or engineers to develop a model that helps them understand the system better.
Disadvantages. Despite the significant advantages, there are a handful of disadvantages that should be considered before implementing a simulation for your needs.
- Implementing a simulation takes time and often this results in a company spending more money to hire the necessary designers for the simulation.
- Simulation is an art. The results of a simulation are highly variable, often based on the skill of the designer.
- Analysis of simulation results can often be difficult unless programmed correctly. This again relates to Simulation being an art.
Common Mistakes in Simulation
This section discusses some of the aspects that have to be overcome and what makes Simulation more of an art. The list below is a simplified version of the list of common mistakes in [2]
- Inappropriate Level of Detail: Without the proper level of detail, the simulation may fail or produce inaccurate results in comparison to the system being modeled. However, in order to do a proper analysis of the data for the model, we have to make some simplifications and assumptions. The difficulty lies in capturing the proper level of detail while simplifying the model well enough to analyze the data received from the simulation.
- Improper Language: A simulation can be designed in almost any programming language; however, choosing the proper programming language has a significant impact on the development time of the model. Many languages can cut development times significantly, and some can even make analysis of the data significantly easier for the developer.
- Unverified Models: Simulation models are often implemented with large programs. The developer has to take special care to ensure that there are no bugs or programming errors which might invalidate the results of the simulation.
- Invalid Models: Incorrect assumptions about the system could result in invalid data even with a flawless program.
- Improperly Handled Initial Conditions: Including the initial portion of the simulation could result in data that is not representative of the system. To avoid this, one should give the simulation the required burn-in period so that the data converges on values that are representative of the system.
- Too Short Simulations: Oftentimes, the analyst will try to save time by running the simulation for a shorter period of time. Related to the previous item, this can result in data that is not representative of the system.
- Poor Random-Number Generators: Random numbers are an important aspect of simulations. It is often better to use a well-known generator; however, in some cases the generator may not work for your simulation. Generating random numbers will be discussed in a future post.
- Improper Selection of Seeds: Seeding the random number generator is important to maintain independence among the streams in the simulation. In many cases, using the same seed in the streams may result in correlation among the system processes, which might result in non-representative data.
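The point about seeds in the list above is easy to demonstrate. The sketch below draws a stream of exponential inter-event times from java.util.Random with a given seed; the class and method names are hypothetical, and the exponential draw is just one example of what a stream might produce.

```java
import java.util.Random;

// Hypothetical sketch: each simulation stream gets its own seeded generator.
final class SeededStreams {
    // Draws n exponential inter-event times from a stream with its own seed.
    static double[] exponentialStream(long seed, double rate, int n) {
        Random rng = new Random(seed);
        double[] times = new double[n];
        for (int i = 0; i < n; i++) {
            // Inverse-transform sampling; nextDouble() is in [0, 1), so the log is finite.
            times[i] = -Math.log(1.0 - rng.nextDouble()) / rate;
        }
        return times;
    }
}
```

Two streams created with the same seed produce exactly the same sequence, so an arrival stream and a service stream seeded identically would be perfectly correlated. Giving each stream a distinct seed avoids that, and fixing the seeds keeps a run reproducible.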
References
[1] J. Banks, J. Carson, B. Nelson, and D. Nicol, Discrete-Event System Simulation. New Jersey: Prentice Hall, 2005.
[2] R. Jain, The Art of Computer Systems Performance Analysis. New York: Wiley, 1991.
January 18th, 2010 by Kristopher Reese.
After spending a majority of the last few months of 2009 working on updating my site to make it easier to update, I finally caved in and decided to use WordPress instead of programming my own site. It took me a while to cave in simply because of the enjoyment I get from programming my site. But by using WordPress, I saved time with coding so that I can focus on schoolwork and content on the site. I look forward to finally getting more time to write on the blog rather than having to program new information.
Using WordPress has its advantages! Here’s a list of plugins that I’ve used on this version of the site:
- Akismet (Spam) – created by Matt Mullenweg
- Clean-contact (Contact Form) – created by Monkey Tree Labs
- Delete-Revision (Revision Management) – created by gohsy
- EG-Attachements (Document Library) – created by Emmanuel GEORJON
- Google XML Sitemaps (Site Map Generator) – created by Arne Brachhold
- ICS Calendar (Events Calendar) – created by Daniel Olfelt
- Multi-level Navigation Plugin (Main Navigation) – created by PixoPoint Web Development / Ryan Hellyer
- Recent Posts – created by Nick Momrik
- Sticky Menu (Secondary Navigation) – created by ericdes
- WP-Cache (Page Caching) – created by Ricardo Galli Granada
- WP-DBManger (Database Management) – created by Lester ‘GaMerZ’ Chan
- WP-Syntax (Code Syntax Highlighting) – created by Ryan McGeary
- WP Photo Album (Photo Albums) – created by Rubin J. Kaplan
Despite this, there are a few plugins that I had to create myself. This includes plugins for editing a Curriculum Vitae and creating document repositories. The theme itself was created by me specifically for kReese.net. As these plugins and themes become more stable, I will link to these through my website.