Towards a Temporal Topic Model

May 8th, 2010 by Kristopher Reese

During this last semester at UofL, I took a Computational Cognitive Science class. It focused on topics in Machine Learning and how ideas from psychology can be applied to Machine Learning. Near the end of the semester we had to complete a project on a topic of interest to us. I chose an extension to Latent Dirichlet Allocation (LDA) that attempts to capture temporal shifts in topics.

Though most topics remain stationary over an infinitely long period of time, there are examples where topics and the words within them change. For example, a medical topic in the 18th century might include keywords such as bleeding and leeching, while the same topic today might include cancer, medicines, or other such terms. We could capture these as two separate topics using LDA, but to capture them as a single topic that changes over time we need to modify LDA. This paper proposes an addition to the LDA model that captures a complete shift in the keywords of a topic.

You can also find the source code for my Temporal LDA model, written in MATLAB. To use it, you need the Topic Modeling Toolbox, which can be found on the University of California, Irvine Cognitive Science research website. It is a free download for scientific use. Once you have the Topic Modeling Toolbox up and running, simply extract the contents of the zip file into the same folder as the toolbox.

When this is set up, you can run the following commands in MATLAB.

testData  % generates the test corpus: WS, DS, and WO
[Sa, Sb] = TLDA(WS, DS, WO, 2, 1, 0.01, 50)  % 2 topics, alpha = 1, beta = 0.01, 50 iterations

testData randomly generates 2 topics and 30 documents, with a split at about document number 10 where the keywords of each topic change completely. testData returns 3 vectors: WS, DS, and WO. WS holds the words in the documents, one entry per word occurrence. It lines up with DS: DS(i) is the document that the word WS(i) belongs to. WO lists the words of the vocabulary in order, so the value WS(i) is an index into WO, and the word itself is found at WO(WS(i)).
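To make the WS/DS/WO layout concrete, here is a minimal sketch on a tiny hand-made corpus. The corpus and its words are made up for illustration and are not the output of testData. The sketch also assumes WO is stored as a cell array of strings, which is why it is indexed with curly braces.

```matlab
% Toy corpus (hypothetical, not generated by testData):
%   document 1: "bleeding leeching bleeding"
%   document 2: "cancer medicine"
WO = {'bleeding', 'leeching', 'cancer', 'medicine'};  % vocabulary (assumed cell array)
WS = [1 2 1 3 4];   % WS(i): vocabulary index of the i-th word occurrence
DS = [1 1 1 2 2];   % DS(i): document that the i-th word occurrence belongs to

% Look up the i-th word occurrence: which word it is and where it appears.
i = 4;
word = WO{WS(i)};
doc = DS(i);
fprintf('Word %d is "%s" in document %d\n', i, word, doc);

% Rebuild the word-by-document count matrix from the two index vectors.
counts = accumarray([WS(:) DS(:)], 1, [numel(WO) max(DS)]);
disp(counts);
```

Each row of counts is a vocabulary word and each column is a document, so the first column shows that "bleeding" occurs twice and "leeching" once in document 1.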

The next line is the actual call to the Temporal Latent Dirichlet Allocation model. It takes 7 parameters. The first three are the values returned by the testData call: WS, DS, and WO, in that order. The fourth is the total number of topics. Parameters five and six are the alpha and beta hyperparameters, respectively. The final parameter is the total number of iterations to run the TLDA Gibbs sampler.
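For a feel of what the two hyperparameters do, the sketch below shows how alpha and beta act as smoothing terms in the usual estimates for standard LDA with a Gibbs sampler. This illustrates plain LDA only and is not taken from the TLDA source. The count matrices WP and DP and their values are hypothetical.

```matlab
% Hypothetical counts after sampling (illustrative numbers only).
WP = [5 0; 3 0; 0 4; 0 2];   % WP(w, t): times word w was assigned to topic t
DP = [6 1; 2 5];             % DP(d, t): words in document d assigned to topic t
ALPHA = 1;                   % same values as in the TLDA call above
BETA  = 0.01;

W = size(WP, 1);             % vocabulary size
T = size(WP, 2);             % number of topics

% Topic-word distributions: each column of phi sums to 1.
phi = bsxfun(@rdivide, WP + BETA, sum(WP, 1) + W * BETA);

% Document-topic distributions: each row of theta sums to 1.
theta = bsxfun(@rdivide, DP + ALPHA, sum(DP, 2) + T * ALPHA);

disp(phi);
disp(theta);
```

A small beta such as 0.01 leaves words that were never assigned to a topic with a probability close to zero, so topics stay concentrated on a few keywords. A larger alpha such as 1 pulls each document's topic mixture more strongly toward an even split.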

I hope to get a chance to implement Latent Dirichlet Allocation as generic PHP classes and to build a version for automatic classification and clustering of WordPress topics. I’ll keep the blog updated on the progress.

Title: Temporal Topic Model
Caption: Extension to the Topic Modeling Toolbox
Description: A Temporal Latent Dirichlet Allocation extension to the Topic Modeling Toolbox. Code is written in the MATLAB Programming Language.
File: TLDA.zip
Title: Towards a Temporal Latent Dirichlet Allocation
Caption: Computational Cognitive Science - UofL
Description: Though most topics remain stationary over an infinitely long period of time, there are examples where topics and words within topics change. Latent Dirichlet Allocation (LDA) is used to capture information about topic models given a known number of topics. This algorithm does not capture information about topics that may change over time. The proposed model in this paper attempts to modify the existing LDA model to allow it to capture temporal changes in topics while allowing those topics that do not change to remain through infinite time. This is achieved by adding a variable, K, to the existing model and using this variable to calculate the probability of a change in the topic given the hyperparameters, two topics, and the words that make up the topics.
File: TLDA.pdf
Title: Towards a Temporal Latent Dirichlet Allocation Presentation
Caption: Computational Cognitive Science - UofL
Description: Presentation given for the Computational Cognitive Science class on the paper "Towards a Temporal Latent Dirichlet Allocation".
File: TLDA.pptx
