Tutorial10_Topic Models

In this tutorial we’ll learn about K-means clustering and two types of topic models: the regular vanilla Latent Dirichlet Allocation (LDA) and structural topic models (STM).

K-Means

Introduction

K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. Its objective is to group similar data points together and discover underlying patterns. To achieve this, K-means looks for a fixed number (k) of clusters in a dataset.
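As a minimal sketch of what "looking for a fixed number of clusters" means, the snippet below runs base R's kmeans() on synthetic two-dimensional data (not the tweet dataset). The object names toy and toy.fit are made up for this illustration.

```r
# Toy illustration of K-means on synthetic 2-D data (not the tweet dataset)
set.seed(42)  # K-means starts from randomly chosen centroids

# Two well-separated groups of 20 points each
toy <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),  # near (0, 0)
             matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2))  # near (3, 3)

# Ask for k = 2 clusters; nstart = 10 keeps the best of 10 random starts
toy.fit <- kmeans(toy, centers = 2, nstart = 10)

table(toy.fit$cluster)  # number of points assigned to each cluster
toy.fit$centers         # estimated centroids, one row per cluster
```

Because the two groups are far apart, each cluster should contain exactly the 20 points generated around one of the two centers.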

In this tutorial, we are going to cluster a dataset of health news tweets. Each short text comes from one of the 16 news sources in the dataset, so we treat the source as the true label and look for k = 16 clusters.

truth.K <- 16

Front-end Matters

First, let’s load the tm package.

library(tm)
Loading required package: NLP

We download the data from the UCI Machine Learning Repository.

# creating the empty dataset with the formatted columns
dataframe <- data.frame(ID = character(),
                        datetime = character(),
                        content = character(),
                        label = factor())
source.url <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/00438/Health-News-Tweets.zip'

target.directory <- '/tmp/clustering-r'
temporary.file <- tempfile()
download.file(source.url, temporary.file)
unzip(temporary.file, exdir = target.directory)

# Reading the files
target.directory <- paste(target.directory, 'Health-Tweets', sep="/")
files <- list.files(path = target.directory, pattern = '\\.txt$') # escape the dot so it only matches a literal ".txt"

# filling the dataframe by reading the text content
for (f in files){
  news.filename <- paste(target.directory, f, sep = "/")
  news.label <- substr(f, 1, nchar(f) - 4) # removing the 4 last characters (.txt); R strings are 1-indexed
  news.data <- read.csv(news.filename,
                        encoding = "UTF-8",
                        header = FALSE,
                        quote = "",
                        sep = "|",
                        col.names = c("ID", "datetime", "content"))
  
  # Workaround: drop the trailing part of tweets whose content contains the split character "|"
  # (no satisfying way was found to merge the extra columns back into the last one)
  news.data <- news.data[news.data$content != "", ]
  news.data['label'] = news.label # we add the label of the tweet
  
  # only considering a little portion of data
  # because handling sparse matrix for generic usage is a pain
  news.data <- head(news.data, floor(nrow(news.data) * 0.05))
  dataframe <- rbind(dataframe, news.data)
  
}
# deleting the temporary directory
unlink(target.directory, recursive = TRUE)

Preprocessing

Removing URLs from the tweets

dataframe$content <- iconv(dataframe$content, from = "latin1", to = "UTF-8", sub = "")

sentences <- sub("http://([[:alnum:]|[:punct:]])+", '', dataframe$content)
head(sentences)
[1] "Breast cancer risk test devised "     
[2] "GP workload harming care - BMA poll " 
[3] "Short people's 'heart risk greater' " 
[4] "New approach against HIV 'promising' "
[5] "Coalition 'undermined NHS' - doctors "
[6] "Review of case against NHS manager "  

For common preprocessing steps, we are going to use the tm package.

corpus <- tm::Corpus(tm::VectorSource(sentences))
# cleaning up
# handling utf-8 encoding problem from the dataset
corpus.cleaned <- tm::tm_map(corpus, function(x) iconv(x, to = 'UTF-8-MAC', sub = 'byte'))
Warning in tm_map.SimpleCorpus(corpus, function(x) iconv(x, to = "UTF-8-MAC", :
transformation drops documents
corpus.cleaned <- tm::tm_map(corpus.cleaned, tm::removeWords, tm::stopwords('english'))
Warning in tm_map.SimpleCorpus(corpus.cleaned, tm::removeWords,
tm::stopwords("english")): transformation drops documents
corpus.cleaned <- tm::tm_map(corpus.cleaned, tm::stripWhitespace)
Warning in tm_map.SimpleCorpus(corpus.cleaned, tm::stripWhitespace):
transformation drops documents

Text Representation

Now we have a sequence of cleaned sentences that we can use to build our TF-IDF matrix. With this numerical representation, we can run any numerical procedure we want, such as clustering.
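To make the TF-IDF weighting concrete, here is a small worked example in base R on a made-up count matrix. It assumes the normalized scheme reported in the tm output (term frequency divided by document length, times log2 of the number of documents over the document frequency); the matrix counts and its terms are invented for illustration.

```r
# Made-up document-term counts: 3 documents, 3 terms
counts <- matrix(c(2, 1, 0,
                   0, 1, 1,
                   0, 1, 3),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:3), c("cancer", "risk", "nhs")))

# Term frequency, normalised by the length of each document
tf <- counts / rowSums(counts)

# Inverse document frequency: log2(number of documents / documents containing the term)
idf <- log2(nrow(counts) / colSums(counts > 0))

# TF-IDF: multiply every column of tf by the idf of its term
tfidf <- sweep(tf, 2, idf, `*`)
round(tfidf, 3)
```

The term "risk" occurs in every document, so its inverse document frequency is log2(3/3) = 0 and its whole column is zero: a word that appears everywhere does not help to tell documents apart.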

# Building the feature matrices
tfm <- tm::DocumentTermMatrix(corpus.cleaned)
dim(tfm)
[1] 3159 9416
tfm
<<DocumentTermMatrix (documents: 3159, terms: 9416)>>
Non-/sparse entries: 26434/29718710
Sparsity           : 100%
Maximal term length: 62
Weighting          : term frequency (tf)
tfm.tfidf <- tm::weightTfIdf(tfm)
dim(tfm.tfidf)
[1] 3159 9416
tfm.tfidf
<<DocumentTermMatrix (documents: 3159, terms: 9416)>>
Non-/sparse entries: 26434/29718710
Sparsity           : 100%
Maximal term length: 62
Weighting          : term frequency - inverse document frequency (normalized) (tf-idf)
# we remove a lot of features. 
tfm.tfidf <- tm::removeSparseTerms(tfm.tfidf, 0.999) # (data,allowed sparsity)
tfidf.matrix <- as.matrix(tfm.tfidf)
dim(tfidf.matrix)
[1] 3159 1327
# cosine distance matrix (useful for specific clustering algorithms)
dist.matrix = proxy::dist(tfidf.matrix, method = "cosine")
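The cosine distance used above is one minus the cosine of the angle between two document vectors. A minimal sketch in base R, where cosine.distance is a hypothetical helper written for this illustration and not a function from proxy:

```r
# Hypothetical helper: cosine distance between two numeric vectors
cosine.distance <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

u <- c(1, 0, 2)
v <- c(2, 0, 4)  # same direction as u, twice as long
w <- c(0, 3, 0)  # shares no term with u

cosine.distance(u, v)  # (numerically) zero: direction matters, length does not
cosine.distance(u, w)  # one: the vectors are orthogonal
```

This is why the cosine distance suits short texts of different lengths: two tweets using the same words in the same proportions are at distance zero regardless of how long they are.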

Running the clustering algorithms

K-means

Define clusters so that the total within-cluster variation is minimized.

Note

The Hartigan-Wong algorithm (Hartigan and Wong 1979) defines the within-cluster variation of a cluster as the sum of squared Euclidean distances between its items and the corresponding centroid:

\(W(C_{k}) = \sum_{x_{i} \in C_{k}}(x_{i} - \mu_{k})^{2}\)

  • \(x_{i}\): a data point belonging to the cluster \(C_{k}\)
  • \(\mu_{k}\): the mean value of the points assigned to the cluster \(C_{k}\)

The total within-cluster variation sums this quantity over all \(K\) clusters:

total within-cluster variation = \(\sum_{k=1}^{K} W(C_{k}) = \sum_{k=1}^{K} \sum_{x_{i} \in C_{k}} (x_{i} - \mu_{k})^{2}\)

The total within-cluster sum of squares measures the goodness of the clustering, and we want it to be as small as possible.
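The formula above can be checked numerically. The sketch below, on synthetic data, recomputes the within-cluster sums of squares by hand and compares them with the withinss and tot.withinss components returned by kmeans(); the names x, fit and wss.by.hand are illustrative.

```r
set.seed(1)
x <- matrix(rnorm(60), ncol = 2)  # 30 synthetic points in 2 dimensions
fit <- kmeans(x, centers = 3, nstart = 5)

# W(C_k): squared Euclidean distances of the members of cluster k to their centroid
wss.by.hand <- sapply(1:3, function(k) {
  members <- x[fit$cluster == k, , drop = FALSE]
  centroid <- colMeans(members)        # mu_k
  sum(sweep(members, 2, centroid)^2)   # subtract mu_k from every row, square, sum
})

all.equal(wss.by.hand, fit$withinss, check.attributes = FALSE)  # per-cluster W(C_k)
all.equal(sum(wss.by.hand), fit$tot.withinss)                   # total over clusters
```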

clustering.kmeans <- kmeans(tfidf.matrix, truth.K)
names(clustering.kmeans)
[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"      
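Note that kmeans() starts from randomly chosen centroids, so the call above can return a different partition on every run. A hedged sketch of two common precautions, shown on synthetic data: fixing the random seed for reproducibility, and using the nstart argument to keep the best of several random starts.

```r
x <- matrix(rnorm(200, mean = 0, sd = 1), ncol = 2)  # 100 synthetic points

# Same seed, same call: the result is reproducible
set.seed(123); fit.1 <- kmeans(x, centers = 4)
set.seed(123); fit.2 <- kmeans(x, centers = 4)
identical(fit.1$cluster, fit.2$cluster)

# Several random starts: kmeans() keeps the run with the smallest tot.withinss
set.seed(123); fit.many <- kmeans(x, centers = 4, nstart = 25)
c(single = fit.1$tot.withinss, best.of.25 = fit.many$tot.withinss)
```

With more starts the total within-cluster sum of squares should be no worse than with a single start, at the cost of a longer running time.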

Hierarchical clustering

Define a clustering criterion and the pairwise distance matrix. Let’s use Ward’s method as the clustering criterion.

clustering.hierarchical <- hclust(dist.matrix, method = "ward.D2")
names(clustering.hierarchical)
[1] "merge"       "height"      "order"       "labels"      "method"     
[6] "call"        "dist.method"
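Unlike K-means, hclust() does not take a number of clusters: it builds a full merge tree, which is then cut at the desired number of groups with cutree(). A small sketch on synthetic data (the names toy, toy.tree and toy.groups are illustrative):

```r
set.seed(7)
# Two well-separated groups of 10 points each
toy <- rbind(matrix(rnorm(20, mean = 0, sd = 0.2), ncol = 2),
             matrix(rnorm(20, mean = 5, sd = 0.2), ncol = 2))

# Build the tree from Euclidean distances with Ward's criterion
toy.tree <- hclust(dist(toy), method = "ward.D2")

# Cut the tree into 2 groups; the result is one cluster label per row of toy
toy.groups <- cutree(toy.tree, k = 2)
table(toy.groups)
```

The same tree can be cut at any other k without refitting, which makes it cheap to compare several numbers of clusters.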

Plotting

To plot the clustering results, we need to reduce our high-dimensional feature space (the TF-IDF representation) to two dimensions, which we do with multidimensional scaling (MDS). This technique depends on the choice of distance metric; in our case, it is the cosine distance computed on the TF-IDF matrix.

points <- cmdscale(dist.matrix, k = 2) # classical multidimensional scaling (MDS)
palette <- colorspace::diverge_hcl(truth.K) # creating a color palette
previous.par <- par(mfrow = c(1,2))# partitioning the plot space

master.cluster <- clustering.kmeans$cluster
plot(points,
     main = 'K-Means clustering',
     col = as.factor(master.cluster),
     mai = c(0, 0, 0, 0),
     mar = c(0, 0, 0, 0),
     xaxt = 'n', yaxt = 'n',
     xlab = '', ylab = '')

slave.hierarchical <- cutree(clustering.hierarchical, k = truth.K)
plot(points,
     main = 'Hierarchical clustering',
     col = as.factor(slave.hierarchical),
     mai = c(0, 0, 0, 0),
     mar = c(0, 0, 0, 0),
     xaxt = 'n', yaxt = 'n',
     xlab = '', ylab = '')

par(previous.par) # recovering the original plot space parameters
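As a sanity check on what cmdscale() does, the sketch below applies it to Euclidean distances between points that already live in two dimensions. In that special case the two-dimensional embedding should reproduce the original distances up to rounding error; for the high-dimensional TF-IDF data, the 2-D picture is only an approximation. The names pts, d and embedded are illustrative.

```r
set.seed(3)
pts <- matrix(rnorm(20), ncol = 2)  # 10 synthetic points in 2 dimensions
d <- dist(pts)                      # pairwise Euclidean distances

embedded <- cmdscale(d, k = 2)      # classical MDS back into 2 dimensions

# Largest discrepancy between original and embedded pairwise distances
max(abs(as.vector(dist(embedded)) - as.vector(d)))
```

The coordinates themselves may be rotated or reflected relative to pts; only the pairwise distances are preserved.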

Determining K

In the previous example, we knew that each sentence belongs to one of 16 sources. How do we choose the number of clusters (K) when the true number is unknown?

Here we use the “elbow” method. For each candidate number of clusters, we calculate the share of variance in the data that is explained by the clustering (the between-cluster sum of squares divided by the total sum of squares). Typically, this share increases with the number of clusters. However, the increase slows down at a certain point, and that is where we choose the number of clusters.

k <- 16
varper <- NULL
for(i in 1:k){
  clustering.kmeans2 <- kmeans(tfidf.matrix, i)
  varper <- c(varper, as.numeric(clustering.kmeans2$betweenss)/as.numeric(clustering.kmeans2$totss))
}

varper
 [1] 4.757368e-12 5.562061e-03 7.625786e-03 2.237929e-02 2.446568e-02
 [6] 3.140732e-02 3.334774e-02 3.149158e-02 3.973052e-02 3.971028e-02
[11] 3.869794e-02 4.446518e-02 4.170454e-02 4.770083e-02 4.848910e-02
[16] 5.173870e-02
plot(1:k, varper, xlab = "# of clusters", ylab = "explained variance")

From the plot, after 3 clusters, the increase in the explained variance becomes slower - there is an elbow here. Therefore, we might use 3 clusters here.
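The elbow is easier to recognize on data with a known structure. The sketch below, which uses synthetic data with three well-separated groups rather than the tweets, repeats the same computation with a fixed seed and several random starts per k, so that the curve is reproducible and less noisy. The names toy, max.k and explained are illustrative.

```r
set.seed(2024)
# Three groups of 30 points each, centred at (0, 0), (4, 4) and (8, 8)
toy <- rbind(matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
             matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 2),
             matrix(rnorm(60, mean = 8, sd = 0.5), ncol = 2))

max.k <- 8
explained <- numeric(max.k)  # preallocate instead of growing the vector in the loop
for (i in seq_len(max.k)) {
  fit <- kmeans(toy, centers = i, nstart = 20)
  explained[i] <- fit$betweenss / fit$totss
}

round(explained, 3)
plot(seq_len(max.k), explained, type = "b",
     xlab = "# of clusters", ylab = "explained variance")
```

Here the explained variance rises steeply up to three clusters and then flattens, which is the elbow shape the method looks for.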

Topic Models

Introduction

The general idea with topic models is to identify the topics that characterize a set of documents. The background on this is interesting; a lot of the initial interest came from digital humanities and library science, where there was a need to systematically organize the thematic content of huge collections of texts. Importantly, LDA and STM, the two we’ll discuss this week, are both mixed-membership models, meaning documents are characterized as arising from a distribution over topics, rather than coming from a single topic.

Latent Dirichlet Allocation

For LDA, we will be using the text2vec package, an R package that provides an efficient framework for text analysis and NLP. It’s a fast implementation of word embedding models (which is where it gets its name from), but it also has really nice and fast functionality for LDA.

Latent Dirichlet Allocation (LDA) is one of the most popular algorithms for topic modeling, that is, for discovering the topics present in a collection of texts. LDA rests on two basic principles:

  1. Each document is made up of topics.
  2. Each word in a document can be attributed to a topic.
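The two principles above can be read as a recipe for generating a document. The base R sketch below simulates one short document this way. It is an illustration of the generative idea only, not of how text2vec fits the model; the vocabulary, the prior values and the helper rdirichlet.one are all made up for this example.

```r
set.seed(10)
vocabulary <- c("plot", "actor", "scene", "boring", "great", "awful")
n.topics <- 2

# Hypothetical helper: one draw from a Dirichlet distribution,
# obtained by normalising independent gamma draws
rdirichlet.one <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha)
  g / sum(g)
}

# Each topic is a probability distribution over the vocabulary (one row per topic)
topic.word <- t(replicate(n.topics, rdirichlet.one(rep(0.5, length(vocabulary)))))
colnames(topic.word) <- vocabulary

# Principle 1: the document is a mixture of topics
doc.topics <- rdirichlet.one(rep(1, n.topics))

# Principle 2: each word is produced by picking a topic, then a word from that topic
n.words <- 12
word.topic <- sample(n.topics, n.words, replace = TRUE, prob = doc.topics)
words <- sapply(word.topic, function(z) sample(vocabulary, 1, prob = topic.word[z, ]))

data.frame(word = words, topic = word.topic)
```

Fitting LDA goes in the opposite direction: given only the words, it estimates the topic-word distributions and the topic mixture of each document.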

Let’s begin!

Front-end Matter

First, let’s load the text2vec package:

library(text2vec)

We will be using the built-in movie reviews dataset that comes with the package. It is labeled and can be loaded as “movie_review”. Let’s load it in:

# Load in built-in dataset
data("movie_review")

# Print the first ten rows of the dataset:
head(movie_review, 10)
        id sentiment
1   5814_8         1
2   2381_9         1
3   7759_3         0
4   3630_4         0
5   9495_8         1
6   8196_8         1
7   7166_2         0
8  10633_1         0
9    319_1         0
10 8713_10         1
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                        review
1                                                                                                                                               With all this stuff going down at the moment with MJ i've started listening to his music, watching the odd documentary here and there, watched The Wiz and watched Moonwalker again. Maybe i just want to get a certain insight into this guy who i thought was really cool in the eighties just to maybe make up my mind whether he is guilty or innocent. Moonwalker is part biography, part feature film which i remember going to see at the cinema when it was originally released. Some of it has subtle messages about MJ's feeling towards the press and also the obvious message of drugs are bad m'kay.<br /><br />Visually impressive but of course this is all about Michael Jackson so unless you remotely like MJ in anyway then you are going to hate this and find it boring. Some may call MJ an egotist for consenting to the making of this movie BUT MJ and most of his fans would say that he made it for the fans which if true is really nice of him.<br /><br />The actual feature film bit when it finally starts is only on for 20 minutes or so excluding the Smooth Criminal sequence and Joe Pesci is convincing as a psychopathic all powerful drug lord. Why he wants MJ dead so bad is beyond me. Because MJ overheard his plans? Nah, Joe Pesci's character ranted that he wanted people to know it is he who is supplying drugs etc so i dunno, maybe he just hates MJ's music.<br /><br />Lots of cool things in this like MJ turning into a car and a robot and the whole Speed Demon sequence. Also, the director must have had the patience of a saint when it came to filming the kiddy Bad sequence as usually directors hate working with one kid let alone a whole bunch of them performing a complex dance scene.<br /><br />Bottom line, this movie is for people who like MJ on one level or another (which i think is most people). If not, then stay away. 
It does try and give off a wholesome message and ironically MJ's bestest buddy in this movie is a girl! Michael Jackson is truly one of the most talented people ever to grace this planet but is he guilty? Well, with all the attention i've gave this subject....hmmm well i don't know because people can be different behind closed doors, i know this for a fact. He is either an extremely nice but stupid guy or one of the most sickest liars. I hope he is not the latter.
2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   \\"The Classic War of the Worlds\\" by Timothy Hines is a very entertaining film that obviously goes to great effort and lengths to faithfully recreate H. G. Wells' classic book. Mr. Hines succeeds in doing so. I, and those who watched his film with me, appreciated the fact that it was not the standard, predictable Hollywood fare that comes out every year, e.g. the Spielberg version with Tom Cruise that had only the slightest resemblance to the book. 
Obviously, everyone looks for different things in a movie. Those who envision themselves as amateur \\"critics\\" look only to criticize everything they can. Others rate a movie on more important bases,like being entertained, which is why most people never agree with the \\"critics\\". We enjoyed the effort Mr. Hines put into being faithful to H.G. Wells' classic novel, and we found it to be very entertaining. This made it easy to overlook what the \\"critics\\" perceive to be its shortcomings.
3  The film starts with a manager (Nicholas Bell) giving welcome investors (Robert Carradine) to Primal Park . A secret project mutating a primal animal using fossilized DNA, like Jurassik Park, and some scientists resurrect one of nature's most fearsome predators, the Sabretooth tiger or Smilodon . Scientific ambition turns deadly, however, and when the high voltage fence is opened the creature escape and begins savagely stalking its prey - the human visitors , tourists and scientific.Meanwhile some youngsters enter in the restricted area of the security center and are attacked by a pack of large pre-historical animals which are deadlier and bigger . In addition , a security agent (Stacy Haiduk) and her mate (Brian Wimmer) fight hardly against the carnivorous Smilodons. The Sabretooths, themselves , of course, are the real star stars and they are astounding terrifyingly though not convincing. The giant animals savagely are stalking its prey and the group run afoul and fight against one nature's most fearsome predators. Furthermore a third Sabretooth more dangerous and slow stalks its victims.<br /><br />The movie delivers the goods with lots of blood and gore as beheading, hair-raising chills,full of scares when the Sabretooths appear with mediocre special effects.The story provides exciting and stirring entertainment but it results to be quite boring .The giant animals are majority made by computer generator and seem totally lousy .Middling performances though the players reacting appropriately to becoming food.Actors give vigorously physical performances dodging the beasts ,running,bound and leaps or dangling over walls . And it packs a ridiculous final deadly scene. No for small kids by realistic,gory and violent attack scenes . 
Other films about Sabretooths or Smilodon are the following : Sabretooth(2002)by James R Hickox with Vanessa Angel, David Keith and John Rhys Davies and the much better 10.000 BC(2006) by Roland Emmerich with with Steven Strait, Cliff Curtis and Camilla Belle. This motion picture filled with bloody moments is badly directed by George Miller and with no originality because takes too many elements from previous films. Miller is an Australian director usually working for television (Tidal wave, Journey to the center of the earth, and many others) and occasionally for cinema ( The man from Snowy river, Zeus and Roxanne,Robinson Crusoe ). Rating : Below average, bottom of barrel.
4                                                                                                                                                                                                  It must be assumed that those who praised this film (\\"the greatest filmed opera ever,\\" didn't I read somewhere?) either don't care for opera, don't care for Wagner, or don't care about anything except their desire to appear Cultured. Either as a representation of Wagner's swan-song, or as a movie, this strikes me as an unmitigated disaster, with a leaden reading of the score matched to a tricksy, lugubrious realisation of the text.<br /><br />It's questionable that people with ideas as to what an opera (or, for that matter, a play, especially one by Shakespeare) is \\"about\\" should be allowed anywhere near a theatre or film studio; Syberberg, very fashionably, but without the smallest justification from Wagner's text, decided that Parsifal is \\"about\\" bisexual integration, so that the title character, in the latter stages, transmutes into a kind of beatnik babe, though one who continues to sing high tenor -- few if any of the actors in the film are the singers, and we get a double dose of Armin Jordan, the conductor, who is seen as the face (but not heard as the voice) of Amfortas, and also appears monstrously in double exposure as a kind of Batonzilla or Conductor Who Ate Monsalvat during the playing of the Good Friday music -- in which, by the way, the transcendant loveliness of nature is represented by a scattering of shopworn and flaccid crocuses stuck in ill-laid turf, an expedient which baffles me. 
In the theatre we sometimes have to piece out such imperfections with our thoughts, but I can't think why Syberberg couldn't splice in, for Parsifal and Gurnemanz, mountain pasture as lush as was provided for Julie Andrews in Sound of Music...<br /><br />The sound is hard to endure, the high voices and the trumpets in particular possessing an aural glare that adds another sort of fatigue to our impatience with the uninspired conducting and paralytic unfolding of the ritual. Someone in another review mentioned the 1951 Bayreuth recording, and Knappertsbusch, though his tempi are often very slow, had what Jordan altogether lacks, a sense of pulse, a feeling for the ebb and flow of the music -- and, after half a century, the orchestral sound in that set, in modern pressings, is still superior to this film.
5                                                                                                                                                                                                                  Superbly trashy and wondrously unpretentious 80's exploitation, hooray! The pre-credits opening sequences somewhat give the false impression that we're dealing with a serious and harrowing drama, but you need not fear because barely ten minutes later we're up until our necks in nonsensical chainsaw battles, rough fist-fights, lurid dialogs and gratuitous nudity! Bo and Ingrid are two orphaned siblings with an unusually close and even slightly perverted relationship. Can you imagine playfully ripping off the towel that covers your sister's naked body and then stare at her unshaven genitals for several whole minutes? Well, Bo does that to his sister and, judging by her dubbed laughter, she doesn't mind at all. Sick, dude! Anyway, as kids they fled from Russia with their parents, but nasty soldiers brutally slaughtered mommy and daddy. A friendly smuggler took custody over them, however, and even raised and trained Bo and Ingrid into expert smugglers. When the actual plot lifts off, 20 years later, they're facing their ultimate quest as the mythical and incredibly valuable White Fire diamond is coincidentally found in a mine. Very few things in life ever made as little sense as the plot and narrative structure of \\"White Fire\\", but it sure is a lot of fun to watch. Most of the time you have no clue who's beating up who or for what cause (and I bet the actors understood even less) but whatever! The violence is magnificently grotesque and every single plot twist is pleasingly retarded. The script goes totally bonkers beyond repair when suddenly  and I won't reveal for what reason  Bo needs a replacement for Ingrid and Fred Williamson enters the scene with a big cigar in his mouth and his sleazy black fingers all over the local prostitutes. 
Bo's principal opponent is an Italian chick with big breasts but a hideous accent, the preposterous but catchy theme song plays at least a dozen times throughout the film, there's the obligatory \\"we're-falling-in-love\\" montage and loads of other attractions! My God, what a brilliant experience. The original French title translates itself as \\"Life to Survive\\", which is uniquely appropriate because it makes just as much sense as the rest of the movie: None!
6  I dont know why people think this is such a bad movie. Its got a pretty good plot, some good action, and the change of location for Harry does not hurt either. Sure some of its offensive and gratuitous but this is not the only movie like that. Eastwood is in good form as Dirty Harry, and I liked Pat Hingle in this movie as the small town cop. If you liked DIRTY HARRY, then you should see this one, its a lot better than THE DEAD POOL. 4/5
7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                This movie could have been very good, but comes up way short. Cheesy special effects and so-so acting. 
I could have looked past that if the story wasn't so lousy. If there was more of a background story, it would have been better. The plot centers around an evil Druid witch who is linked to this woman who gets migraines. The movie drags on and on and never clearly explains anything, it just keeps plodding on. Christopher Walken has a part, but it is completely senseless, as is most of the movie. This movie had potential, but it looks like some really bad made for TV movie. I would avoid this movie.
8 I watched this video at a friend's house. I'm glad I did not waste money buying this one. The video cover has a scene from the 1975 movie Capricorn One. The movie starts out with several clips of rocket blow-ups, most not related to manned flight. 
Sibrel's smoking gun is a short video clip of the astronauts preparing a video broadcast. He edits in his own voice-over instead of letting us listen to what the crew had to say. The video curiously ends with a showing of the Zapruder film. His claims about radiation, shielding, star photography, and others lead me to believe is he extremely ignorant or has some sort of ax to grind against NASA, the astronauts, or American in general. His science is bad, and so is this video.
9 A friend of mine bought this film for 1, and even then it was grossly overpriced. Despite featuring big names such as Adam Sandler, Billy Bob Thornton and the incredibly talented Burt Young, this film was about as funny as taking a chisel and hammering it straight through your earhole. 
It uses tired, bottom of the barrel comedic techniques - consistently breaking the fourth wall as Sandler talks to the audience, and seemingly pointless montages of 'hot girls'.<br /><br />Adam Sandler plays a waiter on a cruise ship who wants to make it as a successful comedian in order to become successful with women. When the ship's resident comedian - the shamelessly named 'Dickie' due to his unfathomable success with the opposite gender - is presumed lost at sea, Sandler's character Shecker gets his big break. Dickie is not dead, he's rather locked in the bathroom, presumably sea sick.<br /><br />Perhaps from his mouth he just vomited the worst film of all time.
10 <br /><br />This movie is full of references. Like \\"Mad Max II\\", \\"The wild one\\" and many others. The ladybugs face its a clear reference (or tribute) to Peter Lorre. This movie is a masterpiece. Well talk much more about in the future.
# checking dimensions of dataset
dim(movie_review)
[1] 5000    3

The dataset consists of 5000 movie reviews, each of which is marked as positive (1) or negative (0) in the ‘sentiment’ column.

Now, we need to clean the data up a bit. To make our lives easier and limit the processing power required, let’s use the first 3000 reviews. They are located in the ‘review’ column.

Vectorization

Texts can take up a lot of memory themselves, but vectorized texts typically do not. To represent documents in vector space, we first have to create mappings from terms to term IDs. We call them terms instead of words because they can be arbitrary n-grams, not just single words. We represent a set of documents as a sparse matrix, where each row corresponds to a document and each column corresponds to a term. This can be done in two ways: using the vocabulary itself or by feature hashing.
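To make the vocabulary-based route concrete, here is a minimal base R sketch on a made-up three-document corpus. The corpus, the whitespace tokenizer, and every object name are assumptions for illustration only; this is not how text2vec builds its matrices internally. It maps each term to an integer ID and stores only the non-zero (document, term, count) cells, which is the idea behind a sparse document-term matrix.

```r
# Toy corpus (hypothetical; not the movie review data)
docs <- c(d1 = "the film was good",
          d2 = "the plot was bad",
          d3 = "good film good cast")

# Naive tokenization: lowercase, then split on single spaces
toks <- strsplit(tolower(docs), " ", fixed = TRUE)

# Vocabulary: every distinct term gets an integer term ID
vocab <- sort(unique(unlist(toks)))
term_id <- setNames(seq_along(vocab), vocab)

# Sparse "triplet" form: one row per non-zero (document, term) cell
triplets <- do.call(rbind, lapply(seq_along(toks), function(i) {
  counts <- table(toks[[i]])
  data.frame(doc   = i,
             term  = unname(term_id[names(counts)]),
             count = as.integer(counts))
}))

# Expand to a dense matrix only so we can look at the toy result
dense <- matrix(0L, nrow = length(docs), ncol = length(vocab),
                dimnames = list(names(docs), vocab))
dense[cbind(triplets$doc, triplets$term)] <- triplets$count
dense
```

Only 11 of the 21 cells are non-zero here; on a real corpus with tens of thousands of terms the share of non-zero cells is far smaller, which is why the sparse form saves so much memory.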

Let’s perform tokenization and lowercase each token:

# lowercases the text of the first 3000 reviews
tokens <- tolower(movie_review$review[1:3000])

# performs tokenization
tokens <- word_tokenizer(tokens)

# prints first two tokenized rows
head(tokens, 2)
[[1]]
  [1] "with"         "all"          "this"         "stuff"        "going"       
  [6] "down"         "at"           "the"          "moment"       "with"        
 [11] "mj"           "i've"         "started"      "listening"    "to"          
 [16] "his"          "music"        "watching"     "the"          "odd"         
 [21] "documentary"  "here"         "and"          "there"        "watched"     
 [26] "the"          "wiz"          "and"          "watched"      "moonwalker"  
 [31] "again"        "maybe"        "i"            "just"         "want"        
 [36] "to"           "get"          "a"            "certain"      "insight"     
 [41] "into"         "this"         "guy"          "who"          "i"           
 [46] "thought"      "was"          "really"       "cool"         "in"          
 [51] "the"          "eighties"     "just"         "to"           "maybe"       
 [56] "make"         "up"           "my"           "mind"         "whether"     
 [61] "he"           "is"           "guilty"       "or"           "innocent"    
 [66] "moonwalker"   "is"           "part"         "biography"    "part"        
 [71] "feature"      "film"         "which"        "i"            "remember"    
 [76] "going"        "to"           "see"          "at"           "the"         
 [81] "cinema"       "when"         "it"           "was"          "originally"  
 [86] "released"     "some"         "of"           "it"           "has"         
 [91] "subtle"       "messages"     "about"        "mj's"         "feeling"     
 [96] "towards"      "the"          "press"        "and"          "also"        
[101] "the"          "obvious"      "message"      "of"           "drugs"       
[106] "are"          "bad"          "m'kay"        "br"           "br"          
[111] "visually"     "impressive"   "but"          "of"           "course"      
[116] "this"         "is"           "all"          "about"        "michael"     
[121] "jackson"      "so"           "unless"       "you"          "remotely"    
[126] "like"         "mj"           "in"           "anyway"       "then"        
[131] "you"          "are"          "going"        "to"           "hate"        
[136] "this"         "and"          "find"         "it"           "boring"      
[141] "some"         "may"          "call"         "mj"           "an"          
[146] "egotist"      "for"          "consenting"   "to"           "the"         
[151] "making"       "of"           "this"         "movie"        "but"         
[156] "mj"           "and"          "most"         "of"           "his"         
[161] "fans"         "would"        "say"          "that"         "he"          
[166] "made"         "it"           "for"          "the"          "fans"        
[171] "which"        "if"           "true"         "is"           "really"      
[176] "nice"         "of"           "him"          "br"           "br"          
[181] "the"          "actual"       "feature"      "film"         "bit"         
[186] "when"         "it"           "finally"      "starts"       "is"          
[191] "only"         "on"           "for"          "20"           "minutes"     
[196] "or"           "so"           "excluding"    "the"          "smooth"      
[201] "criminal"     "sequence"     "and"          "joe"          "pesci"       
[206] "is"           "convincing"   "as"           "a"            "psychopathic"
[211] "all"          "powerful"     "drug"         "lord"         "why"         
[216] "he"           "wants"        "mj"           "dead"         "so"          
[221] "bad"          "is"           "beyond"       "me"           "because"     
[226] "mj"           "overheard"    "his"          "plans"        "nah"         
[231] "joe"          "pesci's"      "character"    "ranted"       "that"        
[236] "he"           "wanted"       "people"       "to"           "know"        
[241] "it"           "is"           "he"           "who"          "is"          
[246] "supplying"    "drugs"        "etc"          "so"           "i"           
[251] "dunno"        "maybe"        "he"           "just"         "hates"       
[256] "mj's"         "music"        "br"           "br"           "lots"        
[261] "of"           "cool"         "things"       "in"           "this"        
[266] "like"         "mj"           "turning"      "into"         "a"           
[271] "car"          "and"          "a"            "robot"        "and"         
[276] "the"          "whole"        "speed"        "demon"        "sequence"    
[281] "also"         "the"          "director"     "must"         "have"        
[286] "had"          "the"          "patience"     "of"           "a"           
[291] "saint"        "when"         "it"           "came"         "to"          
[296] "filming"      "the"          "kiddy"        "bad"          "sequence"    
[301] "as"           "usually"      "directors"    "hate"         "working"     
[306] "with"         "one"          "kid"          "let"          "alone"       
[311] "a"            "whole"        "bunch"        "of"           "them"        
[316] "performing"   "a"            "complex"      "dance"        "scene"       
[321] "br"           "br"           "bottom"       "line"         "this"        
[326] "movie"        "is"           "for"          "people"       "who"         
[331] "like"         "mj"           "on"           "one"          "level"       
[336] "or"           "another"      "which"        "i"            "think"       
[341] "is"           "most"         "people"       "if"           "not"         
[346] "then"         "stay"         "away"         "it"           "does"        
[351] "try"          "and"          "give"         "off"          "a"           
[356] "wholesome"    "message"      "and"          "ironically"   "mj's"        
[361] "bestest"      "buddy"        "in"           "this"         "movie"       
[366] "is"           "a"            "girl"         "michael"      "jackson"     
[371] "is"           "truly"        "one"          "of"           "the"         
[376] "most"         "talented"     "people"       "ever"         "to"          
[381] "grace"        "this"         "planet"       "but"          "is"          
[386] "he"           "guilty"       "well"         "with"         "all"         
[391] "the"          "attention"    "i've"         "gave"         "this"        
[396] "subject"      "hmmm"         "well"         "i"            "don't"       
[401] "know"         "because"      "people"       "can"          "be"          
[406] "different"    "behind"       "closed"       "doors"        "i"           
[411] "know"         "this"         "for"          "a"            "fact"        
[416] "he"           "is"           "either"       "an"           "extremely"   
[421] "nice"         "but"          "stupid"       "guy"          "or"          
[426] "one"          "of"           "the"          "most"         "sickest"     
[431] "liars"        "i"            "hope"         "he"           "is"          
[436] "not"          "the"          "latter"      

[[2]]
  [1] "the"          "classic"      "war"          "of"           "the"         
  [6] "worlds"       "by"           "timothy"      "hines"        "is"          
 [11] "a"            "very"         "entertaining" "film"         "that"        
 [16] "obviously"    "goes"         "to"           "great"        "effort"      
 [21] "and"          "lengths"      "to"           "faithfully"   "recreate"    
 [26] "h"            "g"            "wells"        "classic"      "book"        
 [31] "mr"           "hines"        "succeeds"     "in"           "doing"       
 [36] "so"           "i"            "and"          "those"        "who"         
 [41] "watched"      "his"          "film"         "with"         "me"          
 [46] "appreciated"  "the"          "fact"         "that"         "it"          
 [51] "was"          "not"          "the"          "standard"     "predictable" 
 [56] "hollywood"    "fare"         "that"         "comes"        "out"         
 [61] "every"        "year"         "e.g"          "the"          "spielberg"   
 [66] "version"      "with"         "tom"          "cruise"       "that"        
 [71] "had"          "only"         "the"          "slightest"    "resemblance" 
 [76] "to"           "the"          "book"         "obviously"    "everyone"    
 [81] "looks"        "for"          "different"    "things"       "in"          
 [86] "a"            "movie"        "those"        "who"          "envision"    
 [91] "themselves"   "as"           "amateur"      "critics"      "look"        
 [96] "only"         "to"           "criticize"    "everything"   "they"        
[101] "can"          "others"       "rate"         "a"            "movie"       
[106] "on"           "more"         "important"    "bases"        "like"        
[111] "being"        "entertained"  "which"        "is"           "why"         
[116] "most"         "people"       "never"        "agree"        "with"        
[121] "the"          "critics"      "we"           "enjoyed"      "the"         
[126] "effort"       "mr"           "hines"        "put"          "into"        
[131] "being"        "faithful"     "to"           "h.g"          "wells"       
[136] "classic"      "novel"        "and"          "we"           "found"       
[141] "it"           "to"           "be"           "very"         "entertaining"
[146] "this"         "made"         "it"           "easy"         "to"          
[151] "overlook"     "what"         "the"          "critics"      "perceive"    
[156] "to"           "be"           "its"          "shortcomings"

Note that text2vec provides a few tokenizer functions (see ?tokenizers). These are just simple wrappers for the base::gsub() function and are not very fast or flexible. If you need something smarter or faster you can use the tokenizers package.

We can create an iterator over the tokenized documents using itoken(). An iterator is an object that can be iterated upon, meaning that you can traverse through all of its values. In our example, we’ll be able to traverse the tokens of each row using our newly generated iterator, it. The key point is that iterating keeps the approach less memory intensive, which will turn out to be helpful.

# creates an iterator over the tokenized reviews
it <- itoken(tokens, ids = movie_review$id[1:3000], progressbar = FALSE)

# prints iterator
it
<itoken>
  Inherits from: <CallbackIterator>
  Public:
    callback: function (x) 
    clone: function (deep = FALSE) 
    initialize: function (x, callback = identity) 
    is_complete: active binding
    length: active binding
    move_cursor: function () 
    nextElem: function () 
    x: GenericIterator, iterator, R6
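To see why an iterator helps with memory, here is a small base R sketch of the general pattern: a closure that hands out the documents a chunk at a time, so that only one chunk needs to be processed at once. The function make_chunk_iterator and its has_next/next_chunk methods are hypothetical names invented for this example; text2vec’s itoken object is implemented differently.

```r
# Hypothetical helper: returns a simple chunk iterator over a list
make_chunk_iterator <- function(x, chunk_size) {
  pos <- 0L  # index of the last element already handed out
  list(
    has_next = function() pos < length(x),
    next_chunk = function() {
      idx <- seq(pos + 1L, min(pos + chunk_size, length(x)))
      pos <<- max(idx)  # advance the cursor
      x[idx]
    }
  )
}

# Five toy "documents", traversed two at a time
it_toy <- make_chunk_iterator(list("a b", "c d", "e f", "g h", "i"), 2L)

n_chunks <- 0L
n_docs_seen <- 0L
while (it_toy$has_next()) {
  chunk <- it_toy$next_chunk()
  n_chunks <- n_chunks + 1L
  n_docs_seen <- n_docs_seen + length(chunk)
}
c(chunks = n_chunks, docs = n_docs_seen)
```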

Vocabulary-based Vectorization

As stated above, we represent our corpus as a document-feature matrix. The process for text2vec is quite different from the one in quanteda, though the intuition is generally aligned. Effectively, the text2vec design is intended to be faster and more memory-efficient; the downside is that it’s a little more opaque. The first step is to create our vocabulary for the DFM. That is simple since we have already created an iterator; all we need to do is place our iterator as an argument inside create_vocabulary().

# build the vocabulary
v <- create_vocabulary(it)

# print vocabulary
v
Number of docs: 3000 
0 stopwords:  ... 
ngram_min = 1; ngram_max = 1 
Vocabulary: 
         term term_count doc_count
       <char>      <int>     <int>
    1:    0.3          1         1
    2:   0.48          1         1
    3:    0.5          1         1
    4:   0.89          1         1
    5:  00015          1         1
   ---                            
33487:     to      16370      2826
33488:     of      17409      2829
33489:    and      19761      2892
33490:      a      19776      2910
33491:    the      40246      2975
# checking dimensions
dim(v)
[1] 33491     3

We can remove stop words or otherwise prune our vocabulary with prune_vocabulary(). We will keep the terms that occur at least 10 times and drop any term that appears in more than 20% of the documents.

# prunes vocabulary
v <- prune_vocabulary(v, term_count_min = 10, doc_proportion_max = 0.2)

# check dimensions
dim(v)
[1] 5325    3

If we check the dimensions after pruning our vocabulary, we can see that we have fewer terms. We have removed both the very rare and the very common words so that our vocabulary contains more high-quality, meaningful words.
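The two pruning rules can be sketched in base R on a made-up vocabulary table. The counts below are invented, the column names simply mirror the printed vocabulary above, and treating both thresholds as inclusive is an assumption of this sketch rather than a statement about text2vec’s internals.

```r
# Hypothetical vocabulary table for a corpus of 100 documents
vocab <- data.frame(
  term       = c("the", "film", "zombie", "dornwinkle", "good"),
  term_count = c(400L, 120L, 15L, 1L, 90L),
  doc_count  = c(99L, 60L, 8L, 1L, 18L),
  stringsAsFactors = FALSE
)
n_docs <- 100L

term_count_min     <- 10   # drop terms seen fewer than 10 times
doc_proportion_max <- 0.2  # drop terms found in more than 20% of documents

keep <- vocab$term_count >= term_count_min &
  (vocab$doc_count / n_docs) <= doc_proportion_max

pruned <- vocab[keep, ]
pruned$term
```

In this toy table, "the" and "film" are dropped for being too widespread, "dornwinkle" is dropped for being too rare, and "zombie" and "good" survive.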

Before we can create our DFM, we’ll need to vectorize our vocabulary with vocab_vectorizer().

# creates a closure that helps transform list of tokens into vector space
vectorizer <- vocab_vectorizer(v)

We now have everything we need to create a DFM. We can pass in our iterator of tokens, our vectorized vocabulary, and a type of matrix (either dgCMatrix or dgTMatrix) in create_dtm().

# creates document term matrix
dtm <- create_dtm(it, vectorizer, type = "dgTMatrix")

Now that we have our DTM, we can create our topic model using LDA$new().

# create new LDA model
lda_model <- LDA$new(n_topics = 10, doc_topic_prior = 0.1,
                     topic_word_prior = 0.01)

# print other methods for LDA
lda_model
<WarpLDA>
  Inherits from: <LDA>
  Public:
    clone: function (deep = FALSE) 
    components: active binding
    fit_transform: function (x, n_iter = 1000, convergence_tol = 0.001, n_check_convergence = 10, 
    get_top_words: function (n = 10, topic_number = 1L:private$n_topics, lambda = 1) 
    initialize: function (n_topics = 10L, doc_topic_prior = 50/n_topics, topic_word_prior = 1/n_topics, 
    plot: function (lambda.step = 0.1, reorder.topics = FALSE, doc_len = private$doc_len, 
    topic_word_distribution: active binding
    transform: function (x, n_iter = 1000, convergence_tol = 0.001, n_check_convergence = 10, 
  Private:
    calc_pseudo_loglikelihood: function (ptr = private$ptr) 
    check_convert_input: function (x) 
    components_: NULL
    doc_len: NULL
    doc_topic_distribution: function () 
    doc_topic_distribution_with_prior: function () 
    doc_topic_matrix: NULL
    doc_topic_prior: 0.1
    fit_transform_internal: function (model_ptr, n_iter, convergence_tol, n_check_convergence, 
    get_c_all: function () 
    get_c_all_local: function () 
    get_doc_topic_matrix: function (prt, nr) 
    get_topic_word_count: function () 
    init_model_dtm: function (x, ptr = private$ptr) 
    internal_matrix_formats: list
    is_initialized: FALSE
    n_iter_inference: 10
    n_topics: 10
    ptr: NULL
    reset_c_local: function () 
    run_iter_doc: function (update_topics = TRUE, ptr = private$ptr) 
    run_iter_word: function (update_topics = TRUE, ptr = private$ptr) 
    seeds: 877721554.682558 1522846961.08174
    set_c_all: function (x) 
    set_internal_matrix_formats: function (sparse = NULL, dense = NULL) 
    topic_word_distribution_with_prior: function () 
    topic_word_prior: 0.01
    transform_internal: function (x, n_iter = 1000, convergence_tol = 0.001, n_check_convergence = 10, 
    vocabulary: NULL

After printing lda_model, we can see there are other methods we can use with the model.

Note: the only accessible methods are the ones under ‘Public’. Documentation for all methods and arguments is available here on page 22.

Fitting

We can fit our model with $fit_transform:

# fitting model
doc_topic_distr <- 
  lda_model$fit_transform(x = dtm, n_iter = 1000,
                          convergence_tol = 0.001, n_check_convergence = 25,
                          progressbar = FALSE)
INFO  [01:55:46.229] early stopping at 225 iteration
INFO  [01:55:50.903] early stopping at 50 iteration

The doc_topic_distr object is a matrix where each row is a document, each column is a topic, and the cell entry is the proportion of the document estimated to be of the topic. That is, each row is the topic distribution for a document.

For example, here’s the topic distribution for the very first document:

barplot(doc_topic_distr[1, ], xlab = "topic",
        ylab = "proportion", ylim = c(0,1),
        names.arg = 1:ncol(doc_topic_distr))
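Because each row of a document-topic matrix is a distribution over topics, we can also summarise every document by its single most prominent topic. The sketch below uses a small made-up matrix called theta (a hypothetical stand-in, not the fitted doc_topic_distr) so that it runs on its own.

```r
# Hypothetical document-topic matrix: 3 documents, 4 topics
theta <- matrix(c(0.70, 0.10, 0.10, 0.10,
                  0.05, 0.05, 0.80, 0.10,
                  0.25, 0.25, 0.20, 0.30),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("doc", 1:3), paste0("topic", 1:4)))

# Each row is a probability distribution, so it sums to 1
stopifnot(isTRUE(all.equal(unname(rowSums(theta)), rep(1, 3))))

# Index of the most prominent topic for each document
dominant <- apply(theta, 1, which.max)
dominant
```

The same apply() call should work on any numeric matrix with documents in rows and topics in columns.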

Describing Topics: Top Words

We can also use $get_top_words as a method to get the top words for each topic.

# get top n words for topics 1, 5, and 10
lda_model$get_top_words(n = 10, topic_number = c(1L, 5L, 10L),
                        lambda = 1)
      [,1]       [,2]       [,3]     
 [1,] "did"      "horror"   "life"   
 [2,] "funny"    "man"      "love"   
 [3,] "i'm"      "scene"    "between"
 [4,] "know"     "there's"  "these"  
 [5,] "actors"   "little"   "those"  
 [6,] "watching" "scenes"   "seems"  
 [7,] "say"      "pretty"   "father" 
 [8,] "ever"     "head"     "work"   
 [9,] "didn't"   "house"    "always" 
[10,] "films"    "director" "where"  

The top words can also be sorted by “relevance”, which additionally takes into account the frequency of each word in the corpus (0 < lambda < 1).

The creator recommends setting lambda to be between 0.2 and 0.4. Here’s the difference compared to a lambda of 1:

lda_model$get_top_words(n = 10, topic_number = c(1L, 5L, 10L),
                        lambda = 0.2)
      [,1]       [,2]       [,3]           
 [1,] "zombie"   "horror"   "relationship" 
 [2,] "funny"    "gore"     "relationships"
 [3,] "zombies"  "starts"   "arthur"       
 [4,] "reviews"  "killer"   "childhood"    
 [5,] "reminded" "kills"    "moral"        
 [6,] "rubbish"  "car"      "de"           
 [7,] "utter"    "police"   "office"       
 [8,] "laughing" "head"     "loving"       
 [9,] "laugh"    "thriller" "class"        
[10,] "i'd"      "slasher"  "anna"         
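To see why a lower lambda surfaces more distinctive words, here is a base R sketch of the relevance score as defined for LDAvis by Sievert and Shirley: relevance = lambda * log p(w|t) + (1 - lambda) * log(p(w|t) / p(w)). The topic-word matrix phi below is made up, and computing p(w) as a simple average over topics assumes that both topics are equally prevalent; neither is taken from the fitted model.

```r
# Hypothetical topic-word distribution: 2 topics x 4 terms, rows sum to 1
phi <- matrix(c(0.50, 0.30, 0.15, 0.05,
                0.50, 0.05, 0.05, 0.40),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("topic1", "topic2"),
                              c("film", "zombie", "gore", "love")))

# Marginal term probability, assuming equally prevalent topics
p_w <- colMeans(phi)

# Relevance: weighted mix of log probability and log lift
relevance <- function(phi, p_w, lambda) {
  lift <- sweep(phi, 2, p_w, "/")
  lambda * log(phi) + (1 - lambda) * log(lift)
}

# Highest-scoring term per topic for a given lambda
top_term <- function(lambda) {
  r <- relevance(phi, p_w, lambda)
  colnames(r)[apply(r, 1, which.max)]
}

top_term(1)    # ranks purely by p(w | t)
top_term(0.2)  # rewards terms that are specific to a topic
```

With lambda = 1 both toy topics are headed by the generic term "film"; with lambda = 0.2 the topic-specific terms "zombie" and "love" come out on top, mirroring the shift in the output above.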

Apply Learned Model to New Data

One thing we may occasionally be interested in is understanding how well our model fits the data. To do so, we can borrow an insight from supervised learning and apply the estimated model to new data. From that, we’ll obtain a document-topic distribution for the held-out documents that we can use to evaluate the model:

# creating iterator
it2 <- itoken(movie_review$review[3001:5000], tolower,
              word_tokenizer, ids = movie_review$id[3001:5000])
# creating new DFM
new_dtm <- create_dtm(it2, vectorizer, type = "dgTMatrix")

We will have to use $transform instead of $fit_transform since we are not refitting the model; we are applying the fitted model to predict the topics of the last 2000 reviews.

new_doc_topic_distr <- lda_model$transform(new_dtm)
INFO  [01:55:56.388] early stopping at 30 iteration

One widely used approach to tuning model hyperparameters is to validate per-word perplexity on a hold-out set. This is quite easy with text2vec.

Remember that we’ve fit the model on only the first 3000 reviews and predicted the last 2000. Therefore, we will calculate the held-out perplexity on these 2000 docs as follows:

# calculates held-out perplexity of the new documents under the fitted model
perplexity(new_dtm, topic_word_distribution = lda_model$topic_word_distribution,
           doc_topic_distribution = new_doc_topic_distr)
[1] 2312.273

The lower the perplexity, the better. We can imagine varying our hyperparameters, re-estimating the model, and comparing perplexity across the resulting fits to evaluate model performance. Still, perplexity as a measure has its own concerns: it doesn’t directly provide insight on whether or not the topics make sense, and it tends to prefer bigger models over smaller ones.
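For intuition, per-word perplexity is the exponential of the negative average log-likelihood per token, where the predicted probability of a term in a document is the document’s topic mixture times the topic-word distribution. The base R sketch below implements that definition on made-up matrices; toy_perplexity is a hypothetical helper written for this example and is only assumed to mirror what text2vec’s perplexity() computes.

```r
# Hypothetical helper: per-word perplexity from counts and distributions
#   x     : document x term count matrix
#   theta : document x topic distribution (rows sum to 1)
#   phi   : topic x term distribution (rows sum to 1)
toy_perplexity <- function(x, theta, phi) {
  p <- theta %*% phi  # predicted term probabilities per document
  nonzero <- x > 0    # only observed terms contribute
  log_lik <- sum(x[nonzero] * log(p[nonzero]))
  exp(-log_lik / sum(x))
}

# Made-up held-out data: 2 documents, 2 topics, 3 terms
x <- matrix(c(2, 1, 0,
              0, 1, 3), nrow = 2, byrow = TRUE)
theta <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE)
phi <- matrix(c(0.6, 0.3, 0.1,
                0.1, 0.2, 0.7), nrow = 2, byrow = TRUE)

perp_model <- toy_perplexity(x, theta, phi)

# Baseline: a model that spreads probability evenly over the 3 terms
phi_uniform <- matrix(1 / 3, nrow = 2, ncol = 3)
perp_uniform <- toy_perplexity(x, theta, phi_uniform)

c(model = perp_model, uniform = perp_uniform)
```

A model that guesses uniformly over V terms has a perplexity of exactly V, so the uniform baseline here is 3; any model that has learned something useful about the held-out text should come in below that.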

Visualization

Normally it would take one line to run the visualization for the LDA model, using the method $plot().

Let’s install and load the library that the visuals depend on:

#install.packages('LDAvis')
library(LDAvis)
# creating plot
lda_model$plot()
Loading required namespace: servr

Structural Topic Model

Imagine you are interested in the topics that are explored in political speeches, and specifically whether Republicans and Democrats focus on different topics. One approach would be to estimate an LDA model like the one above and then average the topic proportions by speaker.
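That post-hoc averaging can be sketched in a few lines of base R. The document-topic matrix theta and the party labels below are made up for illustration; in practice theta would come from a fitted LDA model and party from the speech metadata.

```r
# Hypothetical document-topic proportions for 4 speeches, 2 topics
theta <- matrix(c(0.8, 0.2,
                  0.6, 0.4,
                  0.3, 0.7,
                  0.1, 0.9),
                nrow = 4, byrow = TRUE,
                dimnames = list(NULL, c("topic1", "topic2")))

# Hypothetical party of the speaker of each speech
party <- c("D", "D", "R", "R")

# Sum the proportions within each party, then divide by the group size
avg_by_party <- rowsum(theta, group = party) / as.vector(table(party))
avg_by_party
```

Both rowsum() and table() order the groups alphabetically, so the row sums and the group sizes line up.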

Of course, that seems inefficient. We might want to instead leverage the information on the speech itself as part of the estimation of the topics. That is, we are estimating topical prevalence, and we know that there’s a different speaker, so we should be incorporating that information in estimating the topics. That’s the fundamental idea with Structural Topic Models (STM).

Front-end Matters

STM has really fantastic documentation and a host of related packages for added functionality. You can find the STM website at structuraltopicmodel.com. Let’s load the package. Note that this will almost certainly take a few minutes given all of the dependencies.

#install.packages("stm")
library(stm)
stm v1.3.7 successfully loaded. See ?stm for help. 
 Papers, resources, and other materials at structuraltopicmodel.com
library(quanteda)
Package version: 4.1.0
Unicode version: 14.0
ICU version: 71.1
Parallel computing: disabled
See https://quanteda.io for tutorials and examples.

Attaching package: 'quanteda'
The following object is masked from 'package:tm':

    stopwords
The following objects are masked from 'package:NLP':

    meta, meta<-

Creating the DFM

We’ll continue to use the movie reviews dataset. Now, we’ll leverage the sentiment variable included in the dataset as a covariate in our estimates of topical prevalence; that is, we expect some topics to be more prevalent in positive reviews as opposed to negative reviews, and vice versa. The variable is coded as 0 or 1, with 0 indicating a negative review and 1 indicating a positive review.

table(movie_review$sentiment)

   0    1 
2483 2517 

STM works differently than text2vec, so we’ll need to have our data in a different format now.

myTokens <- tokens(movie_review$review,
                   remove_punct = TRUE) %>%
             tokens_remove(stopwords("en"))

myDfm <- dfm(myTokens, tolower = TRUE)

dim(myDfm)
[1]  5000 46795

Correlated Topic Model

Now that we have our corpus, we can prep for a structural topic model that incorporates covariates. Recall, however, that the STM without covariates reduces to a very fast implementation of Correlated Topic Models (i.e., a version of the vanilla LDA model but where the topic proportions can be positively correlated with one another).

cor_topic_model <- stm(myDfm, K = 5,
                       verbose = FALSE, init.type = "Spectral")
cor_topic_model
A topic model with 5 topics, 5000 documents and a 46795 word dictionary.
summary(cor_topic_model)
A topic model with 5 topics, 5000 documents and a 46795 word dictionary.
Topic 1 Top Words:
     Highest Prob: film, one, horror, just, even, like, bad 
     FREX: slasher, zombie, zombies, scarecrows, halloween, kornbluth, fay 
     Lift: babban, boogey, boreanaz, btk, cheapo, copycat, dornwinkle 
     Score: zombie, slasher, horror, zombies, kornbluth, scarecrow, scarecrows 
Topic 2 Top Words:
     Highest Prob: >, <, br, film, one, movie, like 
     FREX: >, <, br, zizek, miya, |, aztec 
     Lift: 041, 1-2, 1-to-10-star, 1.0, 10-minute, 10-second, 102 
     Score: >, <, br, miya, zizek, slugs, oshii 
Topic 3 Top Words:
     Highest Prob: film, one, story, life, films, also, love 
     FREX: bettie, mathieu, sidney, vargas, macarthur, chavez, israel 
     Lift: 1918, 1945, anna's, antwone, beau, becker, belen 
     Score: bettie, film, mathieu, macarthur, chavez, vargas, guinness 
Topic 4 Top Words:
     Highest Prob: one, show, best, good, also, film, man 
     FREX: wwe, rochester, triple, blandings, kolchak, matthau, spock 
     Lift: 1692, 1931, absurdist, adrien, adversaries, affirmative, alaric 
     Score: wwe, taker, bubba, benoit, booker, rochester, kolchak 
Topic 5 Top Words:
     Highest Prob: movie, like, just, one, good, film, really 
     FREX: movie, watched, movies, liked, funny, kids, loved 
     Lift: #spoilers, affiliated, african-influenced, ai, airlift, alvin's, ammo 
     Score: movie, bad, stupid, movies, crap, funny, think 

Once we’ve estimated the model, we’ll want to take a look at the topics. Importantly, we don’t get nice, neat topic names. What we do have are the words that are most frequent, probable, or that otherwise characterize a topic. STM provides handy functionality to extract those words with the labelTopics() function.

labelTopics(cor_topic_model)
Topic 1 Top Words:
     Highest Prob: film, one, horror, just, even, like, bad 
     FREX: slasher, zombie, zombies, scarecrows, halloween, kornbluth, fay 
     Lift: babban, boogey, boreanaz, btk, cheapo, copycat, dornwinkle 
     Score: zombie, slasher, horror, zombies, kornbluth, scarecrow, scarecrows 
Topic 2 Top Words:
     Highest Prob: >, <, br, film, one, movie, like 
     FREX: >, <, br, zizek, miya, |, aztec 
     Lift: 041, 1-2, 1-to-10-star, 1.0, 10-minute, 10-second, 102 
     Score: >, <, br, miya, zizek, slugs, oshii 
Topic 3 Top Words:
     Highest Prob: film, one, story, life, films, also, love 
     FREX: bettie, mathieu, sidney, vargas, macarthur, chavez, israel 
     Lift: 1918, 1945, anna's, antwone, beau, becker, belen 
     Score: bettie, film, mathieu, macarthur, chavez, vargas, guinness 
Topic 4 Top Words:
     Highest Prob: one, show, best, good, also, film, man 
     FREX: wwe, rochester, triple, blandings, kolchak, matthau, spock 
     Lift: 1692, 1931, absurdist, adrien, adversaries, affirmative, alaric 
     Score: wwe, taker, bubba, benoit, booker, rochester, kolchak 
Topic 5 Top Words:
     Highest Prob: movie, like, just, one, good, film, really 
     FREX: movie, watched, movies, liked, funny, kids, loved 
     Lift: #spoilers, affiliated, african-influenced, ai, airlift, alvin's, ammo 
     Score: movie, bad, stupid, movies, crap, funny, think 

We can also look at the top documents associated with each topic using findThoughts(). Here, we’ll look at the top document (n=1) for each of the 5 topics (topics = c(1:5)).

findThoughts(cor_topic_model,
             texts = movie_review$review,
             topics = c(1:5),
             n = 1)

 Topic 1: 
     Need a lesson in pure, abject failure?? Look no further than \"Wizards of the Lost Kingdom\", an abysmal, dirt-poor, disgrace of a flick. As we all know, decent moovies tend to sprout horrible, horrible offspring: \"Halloween\" begat many, many bad 80's slasher flicks; \"Mad Max\" begat many, many bad 80's \"futuristic wasteland fantasy\" flicks; and \"Conan the Barbarian\" begat a whole slew of terrible, horrible, incredibly bad 80's sword-and-sorcery flicks. \"Wizards of the Lost Kingdom\" scrapes the bottom of that 80's barrel, in a way that's truly insulting to barrels. A young runt named Simon recaptured his \"good kingdom\" from an evil sorcerer with the help of a mangy rug, a garden gnome, a topless bimbo mermaid, and a tired-looking, pudgy Bo Svenson. Svenson(\"North Dallas Forty\", \"Inglorious Bastards\", \"Delta Force\"), a long-time b-moovie muscleman, looks barely able to swing his aluminum foil sword. However, he manages to defeat the forces of evil, which consist of the evil sorcerer, \"Shurka\", and his army of badly costumed monsters, giants, and midgets. At one point, a paper mache bat on a string attacks, but is eaten by a 1/2 hidden sock puppet, pitifully presented as some sort of dragon. The beginning of the film consists of what can only politely be described as bits of scenes scooped up from the cutting-room floor of udder bad moovies, stitched together in the vain hope of setting the scene for the film, and over-earnestly narrated by some guy who never appears again. Words cannot properly convey the jaw-dropping cheapness of this film; the producers probably spent moore moolah feeding Svenson's ever expanding gullet than on the cheesy fx of this flick. And we're talkin' Brie here, folks... :=8P Director Hector Olivera(\"Barbarian Queen\") presents this mish-mash in a hopelessly confused, confuddled, and cliched manner, destroying any possible hint of clear, linear storytelling. 
The acting is dreadful, the production levels below shoe-string, and the plot is one tired cliche after another paraded before our weary eyes. That they actually made a sequel(!!!) makes the MooCow's brain whirl. James Horner's(\"Braveheart\", \"Titanic\",\"The Rock\") cheesy moosic from \"Battle Beyond the Stars\" was lifted, screaming and kicking, and mercilessly grafted onto this turkey - bet this one doesn't pop up on his resume. Folks, you gotta see this to believe it. The MooCow says as a cheapo rent when there is NOTHING else to watch, well, it's moore fun than watching dust bunnies mate. Barely. :=8P 
 Topic 2: 
     David Arquette is a young and naive home security alarm<br /><br />salesman taken under the wing of Stanley Tucci. Arquette is a<br /><br />golden boy, scoring a big sale on his first call- to widow Kate<br /><br />Capshaw and her dopey son Ryan Reynolds. Things are going<br /><br />well for Arquette, he is appearing in commercials for the security<br /><br />firm and he is falling in love with Capshaw.<br /><br />Then Tucci and his right hand woman Mary McCormack let him in<br /><br />on a little secret- they sometimes break into the houses of their<br /><br />clients in order to scare them and to get their neighbors to buy<br /><br />security systems from the firm. Arquette decides not to get<br /><br />involved, taking Capshaw to meet his family, and going through life<br /><br />with a goofy smile on his face. Then, someone breaks into<br /><br />Capshaw's home and murders her and her son. Arquette suspects Tucci, and sets a series of traps, resulting in a gun to his<br /><br />boss' head as Tucci pleads his innocence.<br /><br />Based on a stage play, \"The Alarmist\" is not opened up well. The<br /><br />scenes where Arquette takes the Capshaw to meet his parents<br /><br />are badly played and completely unfunny. They are also out of line<br /><br />with the character Capshaw is playing, as she gets drunk and tells<br /><br />sexually explicit stories to Arquette's mom Michael Learned. Other<br /><br />than these scenes, Capshaw is not given much to do, but she<br /><br />does a lot with the little she is given.<br /><br />Stanley Tucci, looking just like Terry O'Quinn, is a riot as the<br /><br />security firm owner. He is a creep who really does not understand<br /><br />Arquette's moral revulsion. However, when he turns into a<br /><br />sniveling whiner after Arquette kidnaps him, he is hilarious. Mary<br /><br />McCormack seems to have been groomed for a bigger role, but<br /><br />she mostly stands around and agrees with Tucci. 
Ryan Reynolds<br /><br />is too old to play a dumb teenager, but he is funny, especially<br /><br />telling his own explicit sexual story to Arquette.<br /><br />The screenplay lurches from romantic comedy to dark comedy too<br /><br />soon. Capshaw meeting the parents is completely unmotivated,<br /><br />except to give her a reason to get out of town so someone can<br /><br />break into her house. Capshaw and Reynolds are in the film just<br /><br />to give Arquette a reason to take revenge on Tucci.<br /><br />Arquette, who has proven he is a good actor, is awful here. He<br /><br />relies on the constipated mugging that got him through those<br /><br />AT&T ads, and he is not a strong enough presence to build this<br /><br />weak film around. Actually, Reynolds might have been a better<br /><br />choice in the role.<br /><br />Dunsky's direction is good, nothing that will win an Oscar soon.<br /><br />Christophe Beck's light jazzy score recalls the type of film noir this<br /><br />film tries to be, and it is really catchy on top of that.<br /><br />Despite the pluses, Arquette's failure as a lead and the script's<br /><br />schizophrenic quality sinks the film. I do not recommend it.<br /><br />This is rated (R) for physical violence, gun violence, some gore,<br /><br />strong profanity, brief female nudity, sexual content, strong sexual<br /><br />references, and adult situations. 
 Topic 3: 
     Since most review's of this film are of screening's seen decade's ago I'd like to add a more recent one, the film open's with stock footage of B-17's bombing Germany, the film cut's to Oskar Werner's Hauptmann (captain) Wust character and his aide running for cover while making their way to Hitler's Fuehrer Bunker, once inside, they are debriefed by bunker staff personnel, the film then cut's to one of many conference scene's with Albin Skoda giving a decent impression of Adolf Hitler rallying his officer's to \"Ultimate Victory\" while Werner's character is shown as slowly coming to realize the bunker denizen's are caught up in a fantasy world-some non-bunker event's are depicted, most notable being the flooding of the subway system to prevent a Russian advance through them and a minor subplot involving a young member of the Flak unit's and his family's difficulty in surviving-this film suffer's from a number of detail inaccuracies that a German film made only 10 year's after WW2 should not have included; the actor portraying Goebbels (Willy Krause) wear's the same uniform as Hitler, including arm eagle- Goebbels wore a brown Nazi Party uniform with swastika armband-the \"SS\" soldier's wear German army camouflage, the well documented scene of Hitler awarding the iron cross to boy's of the Hitler Youth is shown as having taken place INSIDE the bunker (it was done outside in the courtyard) and lastly, Hitler's suicide weapon is clearly shown as a Belgian browning model 1922-most account's agree it was a Walther PPK-some bit's of acting also seem wholly inaccurate with the drunken dance scene near the end of the film being notable, this bit is shown as a cabaret skit, with a intoxicated wounded soldier (his arm in a splint) maniacally goose-stepping to music while a nurse does a combination striptease/belly dance, all by candlelight... 
this is actually embarrassing to watch-the most incredible bit is when Werner's Captain Wust gain's an audience alone with Skoda's Hitler, Hitler is shown as slumped on a wall bench, drugged and delirious, when Werner's character begin's to question him, Hitler start's screaming which bring's in a SS guard who mortally wound's Werner's character in the back with a gunshot-this fabricated scene is not based on any true historic account-Werner's character is then hauled off to die in a anteroom while Hitler prepare's his own ending, Hitler's farewell to his staff is shown but the suicide is off-screen, the final second's of the movie show Hitler's funeral pyre smoke slowly forming into a ghostly image of the face of the dead Oskar Werner/Hauptmann Wust-this film is more allegorical than historical and anyone interested in this period would do better to check out more recent film's such as the 1973 remake \"Hitler: the last 10 day's\" or the German film \"Downfall\" (Der Untergang) if they wish a more true accounting of this dramatic story, these last two film's are based on first person eyewitness account's, with \"Hitler: the last 10 day's\" being compiled from Gerhard Boldt's autobiography as a staff officer in the Fuehrer Bunker and \"Downfall\" being done from Hitler's secretary's recollection's, the screen play for \"Der Letzte Akte\" is taken from American Nuremberg war crime's trial judge Michael Musmanno's book \"Ten day's to die\", which is more a compilation of event's (many obviously fanciful) than eyewitness history-it is surprising that Hugh Trevor Roper's account,\"The last day's of Hitler\" was never made into a film. 
 Topic 4: 
     This happy-go-luck 1939 military swashbuckler, based rather loosely on Rudyard Kipling's memorable poem as well as his novel \"Soldiers Three,\" qualifies as first-rate entertainment about the British Imperial Army in India in the 1880s. Cary Grant delivers more knock-about blows with his knuckled-up fists than he did in all of his movies put together. Set in faraway India, this six-fisted yarn dwells on the exploits of three rugged British sergeants and their native water bearer Gunga Din (Sam Jaffe) who contend with a bloodthirsty cult of murderous Indians called the Thuggee. Sergeant Archibald Cutter (Cary Grant of \"The Last Outpost\"), Sergeant MacChesney (Oscar-winner Victor McLaglen of \"The Informer\"), and Sergeant Ballantine (Douglas Fairbanks, Jr. of \"The Dawn Patrol\"), are a competitive trio of hard-drinking, hard-brawling, and fun-loving Alpha males whose years of frolic are about to become history because Ballantine plans to marry Emmy Stebbins (Joan Fontaine) and enter the tea business. Naturally, Cutter and MacChesney drum up assorted schemes to derail Ballentine's plans. When their superiors order them back into action with Sgt. Bertie Higginbotham (Robert Coote of \"The Sheik Steps Out\"), Cutter and MacChesney drug Higginbotham so that he cannot accompany them and Ballantine has to replace him. Half of the fun here is watching the principals trying to outwit each other without hating themselves. Director George Stevens celebrates the spirit of adventure in grand style and scope as our heroes tangle with an army of Thuggees. Lenser Joseph H. August received an Oscar nomination for his outstanding black & white cinematography. 
 Topic 5: 
     Full House is a great show. I am still today growing up on it. I started watching it when i was 8 and now i am 12 and still watching it. i fell in love with all of the characters, especially Stephanie. she is my favorite. she had such a sense of humor. in case there are people on this sight that hardly watch the show, you should because you will get hooked on it. i became hooked on it after the first show i saw, which just happened to be the first episode, in 2002. it really is a good show. i really think that this show should go down to many generations in families. and it's great too because it is an appropriate show for all ages. and for all parents, it teaches kids lessons on how to go on with their life. nothing terrible happens, like violence or swearing. it is just a really great sit-com. i give it 5 out of 5 stars. what do you think? OH and the best time to watch it is when you are home sick from school or even the old office. It will make you feel a lot better. Trust me i am hardly home sick but i still know that it will make you feel better. and to everybody that thinks the show is stupid, well that's too bad for you because you won't get as far in life even if you are happy with your life. you really should watch it and you will get hooked on it. i am just telling you what happened to me and everybody else that started watching this awesome show. well i need must go to have some lunch. remember you must start watching full house and soon!

Structural Topic Model

Let’s go ahead and estimate our structural topic model now. We’ll incorporate the sentiment variable as a predictor on prevalence.

# choose our number of topics
k <- 5

# specify model
myModel <- stm(myDfm,
               K = k,
               prevalence = ~ sentiment,
               data = movie_review,
               max.em.its = 1000,
               seed = 1234,
               init.type = "Spectral")
Beginning Spectral Initialization 
     Calculating the gram matrix...
     Using only 10000 most frequent terms during initialization...
     Finding anchor words...
    .....
     Recovering initialization...
    ....................................................................................................
Initialization complete.
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 1 (approx. per word bound = -8.600) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 2 (approx. per word bound = -8.036, relative change = 6.567e-02) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 3 (approx. per word bound = -8.001, relative change = 4.337e-03) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 4 (approx. per word bound = -7.992, relative change = 1.060e-03) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 5 (approx. per word bound = -7.988, relative change = 5.005e-04) 
Topic 1: film, br, <, >, movie 
 Topic 2: >, <, br, film, movie 
 Topic 3: film, br, <, >, one 
 Topic 4: one, br, <, >, show 
 Topic 5: movie, like, one, just, br 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 6 (approx. per word bound = -7.986, relative change = 2.995e-04) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 7 (approx. per word bound = -7.984, relative change = 2.025e-04) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 8 (approx. per word bound = -7.983, relative change = 1.519e-04) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 9 (approx. per word bound = -7.982, relative change = 1.200e-04) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 10 (approx. per word bound = -7.981, relative change = 1.004e-04) 
Topic 1: film, one, just, like, movie 
 Topic 2: >, <, br, film, one 
 Topic 3: film, br, <, >, one 
 Topic 4: one, show, like, good, film 
 Topic 5: movie, like, just, one, good 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 11 (approx. per word bound = -7.981, relative change = 8.937e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 12 (approx. per word bound = -7.980, relative change = 7.748e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 13 (approx. per word bound = -7.979, relative change = 6.546e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 14 (approx. per word bound = -7.979, relative change = 5.645e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 15 (approx. per word bound = -7.979, relative change = 4.526e-05) 
Topic 1: film, one, just, even, like 
 Topic 2: >, <, br, film, one 
 Topic 3: film, one, story, br, < 
 Topic 4: one, show, good, like, film 
 Topic 5: movie, like, just, one, good 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 16 (approx. per word bound = -7.978, relative change = 3.836e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 17 (approx. per word bound = -7.978, relative change = 3.395e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 18 (approx. per word bound = -7.978, relative change = 3.022e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 19 (approx. per word bound = -7.978, relative change = 2.789e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 20 (approx. per word bound = -7.977, relative change = 2.811e-05) 
Topic 1: film, one, just, even, like 
 Topic 2: >, <, br, film, one 
 Topic 3: film, one, story, life, films 
 Topic 4: one, show, good, best, film 
 Topic 5: movie, like, just, one, good 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 21 (approx. per word bound = -7.977, relative change = 2.682e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 22 (approx. per word bound = -7.977, relative change = 2.724e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 23 (approx. per word bound = -7.977, relative change = 2.685e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 24 (approx. per word bound = -7.977, relative change = 2.833e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 25 (approx. per word bound = -7.976, relative change = 1.984e-05) 
Topic 1: film, one, just, even, like 
 Topic 2: >, <, br, film, one 
 Topic 3: film, one, story, life, films 
 Topic 4: one, show, good, best, film 
 Topic 5: movie, like, just, one, good 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 26 (approx. per word bound = -7.976, relative change = 1.832e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 27 (approx. per word bound = -7.976, relative change = 1.726e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 28 (approx. per word bound = -7.976, relative change = 1.615e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 29 (approx. per word bound = -7.976, relative change = 1.577e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 30 (approx. per word bound = -7.976, relative change = 1.422e-05) 
Topic 1: film, one, just, even, bad 
 Topic 2: >, <, br, film, one 
 Topic 3: film, one, story, life, films 
 Topic 4: one, show, good, best, also 
 Topic 5: movie, like, just, one, good 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 31 (approx. per word bound = -7.976, relative change = 1.347e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 32 (approx. per word bound = -7.975, relative change = 1.307e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 33 (approx. per word bound = -7.975, relative change = 1.156e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 34 (approx. per word bound = -7.975, relative change = 1.081e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 35 (approx. per word bound = -7.975, relative change = 1.016e-05) 
Topic 1: film, one, just, even, bad 
 Topic 2: >, <, br, film, one 
 Topic 3: film, one, story, life, films 
 Topic 4: one, show, good, best, film 
 Topic 5: movie, like, just, one, good 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Model Converged 
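
The log above prints the approximate per-word bound after every EM iteration, along with its relative change, and estimation stops once that change becomes small enough. Here is a minimal sketch of that check using the first three bounds printed above. The helper rel_change() is hypothetical (it is not part of the stm package), and the tolerance of 1e-5 is an assumption about the default stopping rule rather than something shown in this tutorial.

```r
# per-word bounds copied from iterations 1-3 of the log above (rounded as printed)
bounds <- c(-8.600, -8.036, -8.001)

# hypothetical helper: relative change between consecutive bounds
rel_change <- function(b) diff(b) / abs(head(b, -1))

changes <- rel_change(bounds)
changes

# assumed tolerance; the run stops once the relative change drops below it
tol <- 1e-5
converged <- changes < tol
converged
```

Because the printed bounds are rounded to three decimals, these values match the logged relative changes only approximately.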

Note that the significant difference from before is the added prevalence formula. As we discussed in lecture, you can also include variables as content predictors.

labelTopics(myModel)
Topic 1 Top Words:
     Highest Prob: film, one, just, even, bad, like, horror 
     FREX: slasher, scarecrows, zombie, zombies, kornbluth, scarecrow, seagal 
     Lift: 2600, addy, amick, antichrist, aranoa, ba, babban 
     Score: zombie, slasher, kornbluth, zombies, scarecrows, bad, horror 
Topic 2 Top Words:
     Highest Prob: >, <, br, film, one, movie, like 
     FREX: >, <, br, zizek, miya, |, aztec 
     Lift: 1-2, 1-to-10-star, 1.0, 10-minute, 102, 102nd, 11-related 
     Score: >, <, br, miya, zizek, slugs, oshii 
Topic 3 Top Words:
     Highest Prob: film, one, story, life, films, also, love 
     FREX: bettie, mathieu, sidney, macarthur, chavez, israel, lumet 
     Lift: 1918, 1920, 53, adapt, addressing, aishwarya, albniz 
     Score: film, bettie, mathieu, macarthur, aids, antwone, flamenco 
Topic 4 Top Words:
     Highest Prob: one, show, best, good, film, also, man 
     FREX: wwe, rochester, triple, kolchak, spock, taker, christy 
     Lift: 1692, 1931, absurdist, adrien, adversaries, alaric, alekos 
     Score: wwe, taker, bubba, benoit, booker, kolchak, rochester 
Topic 5 Top Words:
     Highest Prob: movie, like, just, one, good, film, really 
     FREX: movie, movies, watched, liked, kids, funny, loved 
     Lift: _____, ______, _real_, @ers, @k, 00015, 1,65m 
     Score: movie, movies, bad, stupid, like, think, really 

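The topics again look reasonable and are generally similar to the topics we estimated earlier, although Topic 2 is dominated by leftover HTML markup (<, >, br), which suggests the tags are worth stripping during preprocessing. We can go a step further by plotting the top topics (as groups of words associated with each topic) and their expected proportions across the corpus.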

plot(myModel, type = "summary")
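
As a rough illustration of the quantity behind the summary plot, the sketch below computes the expected proportion of each topic as the column means of a document-topic matrix. The matrix toyTheta is made-up data standing in for myModel$theta (the document-by-topic proportions stored on a fitted stm object); the numbers are assumptions for illustration only.

```r
# toy document-topic proportions: 4 documents x 3 topics, each row sums to 1
# (hypothetical values; with a fitted model you would use myModel$theta)
toyTheta <- matrix(c(0.7, 0.2, 0.1,
                     0.1, 0.6, 0.3,
                     0.5, 0.4, 0.1,
                     0.2, 0.5, 0.3),
                   nrow = 4, byrow = TRUE)

# expected proportion of each topic across the corpus
expectedProportions <- colMeans(toyTheta)

# order topics from most to least prevalent
sort(expectedProportions, decreasing = TRUE)
```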

One thing we might want to do is extract the topic labels and assign them to the matrix of document-topic proportions; this is often useful if we’re using those proportions in any sort of downstream analysis, including a simple visualization. The following extracts the top words (here by FREX, though you can switch to any of the other three sets: highest probability, lift, or score). It then iterates through the extracted sets and collapses each one into a single string with the tokens separated by underscores, which makes a convenient variable name for those downstream analyses.

# get the words
myTopicNames <- labelTopics(myModel, n=4)$frex

# set up an empty vector
myTopicLabels <- rep(NA, k)

# set up a loop to go through the topics and collapse the words to a single name
for (i in 1:k){
  myTopicLabels[i] <- paste(myTopicNames[i,], collapse = "_")
}

# print the names
myTopicLabels
[1] "slasher_scarecrows_zombie_zombies" ">_<_br_zizek"                     
[3] "bettie_mathieu_sidney_macarthur"   "wwe_rochester_triple_kolchak"     
[5] "movie_movies_watched_liked"       
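
The same labels can be built without an explicit loop and then attached to the document proportions as column names. This is a sketch under assumptions: toyNames copies two rows of the FREX words printed above (topics 1 and 4), and toyTheta is made-up data standing in for myModel$theta.

```r
# two rows of FREX words, copied from the output above (topics 1 and 4)
toyNames <- matrix(c("slasher", "scarecrows", "zombie", "zombies",
                     "wwe", "rochester", "triple", "kolchak"),
                   nrow = 2, byrow = TRUE)

# collapse each row into a single underscore-separated label
toyLabels <- apply(toyNames, 1, paste, collapse = "_")

# hypothetical document-topic proportions (3 documents x 2 topics)
toyTheta <- matrix(c(0.9, 0.1,
                     0.4, 0.6,
                     0.2, 0.8),
                   nrow = 3, byrow = TRUE)

# attach the labels so downstream code can refer to topics by name
colnames(toyTheta) <- toyLabels
topicDf <- as.data.frame(toyTheta)
names(topicDf)
```

With the real model, the analogous step would be to assign myTopicLabels as the column names of the proportions matrix before merging it with the review metadata.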

Estimate Effect

Recall that we included sentiment as a predictor variable on topical prevalence. We can extract the effect of the predictor here using the estimateEffect() function, which takes as arguments a formula, the stm model object, and the metadata containing the predictor variable.

Once we’ve run the function, we can plot the estimated effects of sentiment on topic prevalence for each of the estimated topics. With a dichotomous predictor variable, we’ll plot these solely as the difference (method = "difference") in topic prevalence across the two values of the predictor. Here, our estimate indicates how much more (or less) the topic is discussed when the sentiment of the review is positive.

# estimate effects
modelEffects <- estimateEffect(formula = 1:k ~ sentiment,
                               stmobj = myModel,
                               metadata = movie_review)

# plot effects
myRows <- 2
par(mfrow = c(myRows, 3), bty = "n", lwd = 2)
for (i in 1:k){
  plot.estimateEffect(modelEffects,
                      covariate = "sentiment",
                      xlim = c(-.25, .25),
                      model = myModel,
                      topics = modelEffects$topics[i],
                      method = "difference",
                      cov.value1 = 1,
                      cov.value2 = 0, 
                      main = myTopicLabels[i],
                      printlegend = FALSE,
                      linecol = "grey26",
                      labeltype = "custom",
                      verbose.labels = FALSE,
                      custom.labels = c(""))
  par(new = FALSE)
}
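
To make the "difference" quantity concrete, here is a simplified sketch with made-up data. For a binary predictor, regressing a topic's proportion on the predictor gives a slope equal to the difference in mean prevalence between the two groups. This is only an analogy: estimateEffect() also accounts for the uncertainty in the estimated topic proportions, which this sketch ignores, and every value below is hypothetical.

```r
# hypothetical proportions of one topic in six reviews
topicShare <- c(0.10, 0.20, 0.15, 0.40, 0.50, 0.45)
# hypothetical binary sentiment for the same reviews (1 = positive)
sentiment <- c(0, 0, 0, 1, 1, 1)

# difference in mean prevalence: positive minus negative reviews
meanDiff <- mean(topicShare[sentiment == 1]) - mean(topicShare[sentiment == 0])

# the slope from a linear regression recovers the same difference
fit <- lm(topicShare ~ sentiment)
slope <- unname(coef(fit)["sentiment"])

c(meanDiff = meanDiff, slope = slope)
```

A positive value means the topic takes up a larger share of positive reviews than of negative ones, which is how to read the points in the plots above.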

Choosing K

I’m sure you were thinking, “How did she select 5 topics?” Well, the answer is that it was just an arbitrary number that I picked out of thin air. The choice of the number of topics, typically denoted K, is one of the areas where the design of topic models lets us as researchers down a bit. While some approaches have been proposed, none have really gained traction. STM includes an approach, which we won’t explore, based on work by David Mimno that automatically selects the number of topics; in practice, it normally results in far more topics than a human would be likely to choose.

With all that said, STM includes some functionality for exploring different specifications and getting at least some idea of how different choices perform. searchK() lets you estimate a series of models with different values of K; you can then plot several evaluation metrics across those choices.

differentKs <- searchK(myDfm,
                       K = c(5, 25, 50),
                       prevalence = ~ sentiment,
                       N = 250,
                       data = movie_review,
                       max.em.its = 1000,
                       init.type = "Spectral")
Beginning Spectral Initialization 
     Calculating the gram matrix...
     Using only 10000 most frequent terms during initialization...
     Finding anchor words...
    .....
     Recovering initialization...
    ....................................................................................................
Initialization complete.
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 1 (approx. per word bound = -8.598) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 2 (approx. per word bound = -8.030, relative change = 6.606e-02) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 3 (approx. per word bound = -7.995, relative change = 4.335e-03) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 4 (approx. per word bound = -7.988, relative change = 9.748e-04) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 5 (approx. per word bound = -7.984, relative change = 4.582e-04) 
Topic 1: >, br, <, film, one 
 Topic 2: movie, like, just, one, film 
 Topic 3: >, <, br, film, movie 
 Topic 4: film, <, br, >, one 
 Topic 5: >, br, <, one, show 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 6 (approx. per word bound = -7.982, relative change = 2.807e-04) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 7 (approx. per word bound = -7.980, relative change = 1.973e-04) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 8 (approx. per word bound = -7.979, relative change = 1.478e-04) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 9 (approx. per word bound = -7.978, relative change = 1.109e-04) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 10 (approx. per word bound = -7.977, relative change = 8.100e-05) 
Topic 1: film, one, like, characters, just 
 Topic 2: movie, like, film, just, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, <, br, > 
 Topic 5: >, br, <, one, show 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 11 (approx. per word bound = -7.977, relative change = 6.465e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 12 (approx. per word bound = -7.977, relative change = 5.587e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 13 (approx. per word bound = -7.976, relative change = 4.741e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 14 (approx. per word bound = -7.976, relative change = 4.042e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 15 (approx. per word bound = -7.976, relative change = 3.656e-05) 
Topic 1: film, one, like, characters, just 
 Topic 2: movie, film, like, just, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, <, br 
 Topic 5: >, br, <, one, show 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 16 (approx. per word bound = -7.975, relative change = 3.245e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 17 (approx. per word bound = -7.975, relative change = 2.718e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 18 (approx. per word bound = -7.975, relative change = 2.308e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 19 (approx. per word bound = -7.975, relative change = 2.039e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 20 (approx. per word bound = -7.975, relative change = 1.805e-05) 
Topic 1: film, one, like, characters, just 
 Topic 2: movie, film, like, just, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, life, also 
 Topic 5: >, br, <, one, show 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 21 (approx. per word bound = -7.974, relative change = 1.768e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 22 (approx. per word bound = -7.974, relative change = 1.726e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 23 (approx. per word bound = -7.974, relative change = 1.614e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 24 (approx. per word bound = -7.974, relative change = 1.481e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 25 (approx. per word bound = -7.974, relative change = 1.389e-05) 
Topic 1: film, one, like, characters, even 
 Topic 2: movie, film, like, just, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, life, also 
 Topic 5: one, >, br, <, show 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 26 (approx. per word bound = -7.974, relative change = 1.278e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Completing Iteration 27 (approx. per word bound = -7.974, relative change = 1.014e-05) 
....................................................................................................
Completed E-Step (1 seconds). 
Completed M-Step. 
Model Converged 
Beginning Spectral Initialization 
     Calculating the gram matrix...
     Using only 10000 most frequent terms during initialization...
     Finding anchor words...
    .........................
     Recovering initialization...
    ....................................................................................................
Initialization complete.
....................................................................................................
Completed E-Step (4 seconds). 
Completed M-Step. 
Completing Iteration 1 (approx. per word bound = -8.574) 
....................................................................................................
Completed E-Step (4 seconds). 
Completed M-Step. 
Completing Iteration 2 (approx. per word bound = -7.788, relative change = 9.173e-02) 
....................................................................................................
Completed E-Step (3 seconds). 
Completed M-Step. 
Completing Iteration 3 (approx. per word bound = -7.706, relative change = 1.047e-02) 
....................................................................................................
Completed E-Step (3 seconds). 
Completed M-Step. 
Completing Iteration 4 (approx. per word bound = -7.688, relative change = 2.341e-03) 
....................................................................................................
Completed E-Step (3 seconds). 
Completed M-Step. 
Completing Iteration 5 (approx. per word bound = -7.681, relative change = 9.745e-04) 
Topic 1: br, <, >, characters, film 
 Topic 2: movie, film, actors, good, br 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, br, excellent 
 Topic 5: show, one, episode, best, series 
 Topic 6: movie, great, one, film, story 
 Topic 7: movie, film, good, one, interesting 
 Topic 8: film, <, >, br, one 
 Topic 9: film, <, br, >, one 
 Topic 10: film, just, >, <, br 
 Topic 11: br, <, >, good, film 
 Topic 12: >, br, <, one, film 
 Topic 13: movie, >, br, <, see 
 Topic 14: >, br, <, film, one 
 Topic 15: >, <, br, film, movie 
 Topic 16: br, <, >, film, one 
 Topic 17: <, >, br, movie, one 
 Topic 18: movie, bad, good, film, one 
 Topic 19: movie, like, <, br, > 
 Topic 20: >, br, <, movie, film 
 Topic 21: >, <, br, one, film 
 Topic 22: one, br, <, >, movie 
 Topic 23: <, br, >, new, movie 
 Topic 24: film, br, <, >, comedy 
 Topic 25: br, <, >, mark, movie 
....................................................................................................
Completed E-Step (3 seconds). 
Completed M-Step. 
Completing Iteration 6 (approx. per word bound = -7.676, relative change = 5.703e-04) 
....................................................................................................
Completed E-Step (3 seconds). 
Completed M-Step. 
Completing Iteration 7 (approx. per word bound = -7.673, relative change = 4.066e-04) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 8 (approx. per word bound = -7.671, relative change = 3.013e-04) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 9 (approx. per word bound = -7.669, relative change = 2.385e-04) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 10 (approx. per word bound = -7.668, relative change = 1.856e-04) 
Topic 1: characters, film, like, one, br 
 Topic 2: movie, film, actors, good, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, story, one, films, excellent 
 Topic 5: show, one, episode, series, best 
 Topic 6: movie, great, film, one, story 
 Topic 7: movie, film, good, one, interesting 
 Topic 8: film, >, <, br, one 
 Topic 9: film, <, br, >, one 
 Topic 10: film, just, like, one, even 
 Topic 11: br, <, >, good, film 
 Topic 12: >, br, <, one, film 
 Topic 13: movie, see, just, like, bad 
 Topic 14: >, br, <, film, films 
 Topic 15: film, movie, one, like, got 
 Topic 16: br, <, >, film, one 
 Topic 17: movie, <, >, br, one 
 Topic 18: movie, bad, film, good, one 
 Topic 19: movie, like, just, show, one 
 Topic 20: >, br, <, film, movie 
 Topic 21: >, <, br, one, film 
 Topic 22: one, movie, film, br, < 
 Topic 23: <, br, >, new, movie 
 Topic 24: film, comedy, good, one, see 
 Topic 25: film, br, >, <, mark 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 11 (approx. per word bound = -7.667, relative change = 1.488e-04) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 12 (approx. per word bound = -7.666, relative change = 1.332e-04) 
....................................................................................................
Completed E-Step (3 seconds). 
Completed M-Step. 
Completing Iteration 13 (approx. per word bound = -7.665, relative change = 1.265e-04) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 14 (approx. per word bound = -7.664, relative change = 1.224e-04) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 15 (approx. per word bound = -7.663, relative change = 8.885e-05) 
Topic 1: characters, film, one, like, script 
 Topic 2: movie, film, actors, good, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, story, one, films, love 
 Topic 5: show, one, episode, series, best 
 Topic 6: movie, great, film, one, story 
 Topic 7: movie, film, good, one, interesting 
 Topic 8: film, one, also, story, like 
 Topic 9: film, <, br, >, one 
 Topic 10: film, just, like, one, even 
 Topic 11: good, film, one, time, br 
 Topic 12: >, br, <, film, one 
 Topic 13: movie, see, just, like, bad 
 Topic 14: >, <, br, film, films 
 Topic 15: film, movie, one, like, got 
 Topic 16: film, one, much, like, even 
 Topic 17: movie, one, <, br, > 
 Topic 18: movie, bad, film, good, one 
 Topic 19: movie, like, just, show, one 
 Topic 20: >, br, <, film, movie 
 Topic 21: one, >, <, br, film 
 Topic 22: one, movie, film, movies, seen 
 Topic 23: <, br, >, new, movie 
 Topic 24: film, comedy, good, one, see 
 Topic 25: film, mark, >, br, < 
....................................................................................................
Completed E-Step (3 seconds). 
Completed M-Step. 
Completing Iteration 16 (approx. per word bound = -7.662, relative change = 6.174e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 17 (approx. per word bound = -7.662, relative change = 5.780e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 18 (approx. per word bound = -7.662, relative change = 4.898e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 19 (approx. per word bound = -7.661, relative change = 3.730e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 20 (approx. per word bound = -7.661, relative change = 3.811e-05) 
Topic 1: film, characters, one, like, script 
 Topic 2: movie, film, actors, good, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, films, love 
 Topic 5: show, one, episode, series, best 
 Topic 6: movie, great, film, one, story 
 Topic 7: movie, film, good, one, interesting 
 Topic 8: film, one, also, story, like 
 Topic 9: film, <, br, >, one 
 Topic 10: film, just, one, like, even 
 Topic 11: good, film, one, time, story 
 Topic 12: >, br, film, <, one 
 Topic 13: movie, just, see, like, bad 
 Topic 14: film, <, br, >, films 
 Topic 15: film, movie, one, like, got 
 Topic 16: film, one, much, like, even 
 Topic 17: movie, one, film, just, like 
 Topic 18: movie, bad, film, good, one 
 Topic 19: movie, like, show, just, one 
 Topic 20: film, >, br, <, movie 
 Topic 21: one, film, match, >, < 
 Topic 22: one, movie, film, movies, seen 
 Topic 23: <, br, >, new, movie 
 Topic 24: film, comedy, good, one, like 
 Topic 25: film, mark, boys, one, movie 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 21 (approx. per word bound = -7.661, relative change = 4.129e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 22 (approx. per word bound = -7.660, relative change = 3.903e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 23 (approx. per word bound = -7.660, relative change = 4.001e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 24 (approx. per word bound = -7.660, relative change = 4.297e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 25 (approx. per word bound = -7.660, relative change = 4.002e-05) 
Topic 1: film, characters, one, like, script 
 Topic 2: movie, film, actors, good, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, films, love 
 Topic 5: show, one, episode, series, best 
 Topic 6: movie, great, film, one, story 
 Topic 7: movie, film, good, one, interesting 
 Topic 8: film, one, also, story, like 
 Topic 9: film, one, <, br, > 
 Topic 10: film, just, one, like, even 
 Topic 11: good, film, one, time, story 
 Topic 12: film, one, show, life, > 
 Topic 13: movie, just, like, see, one 
 Topic 14: film, films, one, movie, like 
 Topic 15: film, movie, one, like, got 
 Topic 16: film, one, much, like, even 
 Topic 17: movie, one, film, just, like 
 Topic 18: movie, bad, film, good, one 
 Topic 19: movie, like, show, just, one 
 Topic 20: film, movie, one, much, > 
 Topic 21: one, film, match, man, also 
 Topic 22: one, movie, film, movies, seen 
 Topic 23: <, br, >, new, one 
 Topic 24: film, comedy, good, one, like 
 Topic 25: film, mark, boys, one, movie 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 26 (approx. per word bound = -7.659, relative change = 3.646e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 27 (approx. per word bound = -7.659, relative change = 3.469e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 28 (approx. per word bound = -7.659, relative change = 3.399e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 29 (approx. per word bound = -7.658, relative change = 3.070e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 30 (approx. per word bound = -7.658, relative change = 2.761e-05) 
Topic 1: film, characters, one, like, script 
 Topic 2: movie, film, actors, good, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, films, love 
 Topic 5: show, one, episode, series, best 
 Topic 6: movie, great, film, one, story 
 Topic 7: movie, film, good, one, interesting 
 Topic 8: film, one, also, story, like 
 Topic 9: film, one, family, <, br 
 Topic 10: film, just, one, like, even 
 Topic 11: film, good, one, time, story 
 Topic 12: film, one, show, life, young 
 Topic 13: movie, just, like, see, one 
 Topic 14: film, films, one, movie, like 
 Topic 15: film, movie, one, like, got 
 Topic 16: film, one, much, like, even 
 Topic 17: movie, one, film, just, like 
 Topic 18: bad, movie, film, good, one 
 Topic 19: movie, like, show, just, one 
 Topic 20: film, one, movie, much, like 
 Topic 21: one, film, match, man, also 
 Topic 22: one, movie, film, movies, seen 
 Topic 23: new, one, movie, <, br 
 Topic 24: film, comedy, good, one, like 
 Topic 25: film, mark, boys, one, watch 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 31 (approx. per word bound = -7.658, relative change = 2.762e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 32 (approx. per word bound = -7.658, relative change = 2.623e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 33 (approx. per word bound = -7.658, relative change = 2.472e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 34 (approx. per word bound = -7.657, relative change = 2.557e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 35 (approx. per word bound = -7.657, relative change = 2.484e-05) 
Topic 1: film, characters, one, like, script 
 Topic 2: movie, film, actors, good, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, films, love 
 Topic 5: show, episode, one, series, best 
 Topic 6: movie, great, film, one, story 
 Topic 7: movie, film, good, one, interesting 
 Topic 8: film, one, also, story, like 
 Topic 9: film, one, family, <, br 
 Topic 10: film, just, one, like, even 
 Topic 11: film, good, one, time, story 
 Topic 12: film, one, show, life, young 
 Topic 13: movie, just, like, see, one 
 Topic 14: film, films, one, movie, like 
 Topic 15: film, movie, one, like, got 
 Topic 16: film, one, much, like, even 
 Topic 17: movie, one, film, just, like 
 Topic 18: bad, movie, film, good, one 
 Topic 19: movie, like, show, just, one 
 Topic 20: film, one, much, movie, like 
 Topic 21: one, film, match, man, also 
 Topic 22: one, movie, film, movies, seen 
 Topic 23: new, one, movie, film, like 
 Topic 24: film, comedy, good, one, like 
 Topic 25: film, mark, one, boys, watch 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 36 (approx. per word bound = -7.657, relative change = 1.990e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 37 (approx. per word bound = -7.657, relative change = 1.914e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 38 (approx. per word bound = -7.657, relative change = 2.108e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 39 (approx. per word bound = -7.657, relative change = 1.694e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 40 (approx. per word bound = -7.657, relative change = 1.761e-05) 
Topic 1: film, characters, one, like, script 
 Topic 2: movie, film, actors, good, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, films, love 
 Topic 5: show, episode, one, series, best 
 Topic 6: movie, great, film, one, story 
 Topic 7: movie, film, good, one, interesting 
 Topic 8: film, one, also, story, like 
 Topic 9: film, one, family, first, like 
 Topic 10: film, just, one, like, even 
 Topic 11: film, good, one, time, story 
 Topic 12: film, one, show, life, young 
 Topic 13: movie, just, like, see, one 
 Topic 14: film, films, one, movie, like 
 Topic 15: film, movie, one, like, got 
 Topic 16: film, one, much, like, even 
 Topic 17: movie, one, film, just, like 
 Topic 18: bad, movie, film, good, one 
 Topic 19: movie, like, show, just, one 
 Topic 20: film, one, much, movie, like 
 Topic 21: one, film, match, man, also 
 Topic 22: one, film, movie, movies, seen 
 Topic 23: new, one, movie, film, joe 
 Topic 24: film, comedy, good, one, like 
 Topic 25: film, mark, one, boys, college 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 41 (approx. per word bound = -7.656, relative change = 1.812e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 42 (approx. per word bound = -7.656, relative change = 2.174e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 43 (approx. per word bound = -7.656, relative change = 2.054e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 44 (approx. per word bound = -7.656, relative change = 2.242e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 45 (approx. per word bound = -7.656, relative change = 3.411e-05) 
Topic 1: film, characters, one, like, script 
 Topic 2: movie, film, actors, good, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, films, love 
 Topic 5: show, episode, series, one, best 
 Topic 6: movie, great, film, one, story 
 Topic 7: movie, film, good, one, interesting 
 Topic 8: film, one, also, story, like 
 Topic 9: film, one, family, first, like 
 Topic 10: film, just, one, like, even 
 Topic 11: film, good, one, time, story 
 Topic 12: film, one, show, life, young 
 Topic 13: movie, just, like, see, one 
 Topic 14: film, films, one, movie, like 
 Topic 15: film, movie, one, like, got 
 Topic 16: film, one, much, like, even 
 Topic 17: movie, one, film, just, like 
 Topic 18: bad, movie, film, one, good 
 Topic 19: movie, like, show, just, one 
 Topic 20: film, one, much, movie, like 
 Topic 21: one, film, match, man, also 
 Topic 22: one, film, movie, movies, seen 
 Topic 23: new, one, movie, film, joe 
 Topic 24: film, comedy, good, one, like 
 Topic 25: film, mark, one, boys, like 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 46 (approx. per word bound = -7.655, relative change = 1.998e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 47 (approx. per word bound = -7.655, relative change = 1.621e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 48 (approx. per word bound = -7.655, relative change = 1.770e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 49 (approx. per word bound = -7.655, relative change = 1.784e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 50 (approx. per word bound = -7.655, relative change = 1.697e-05) 
Topic 1: film, characters, one, like, character 
 Topic 2: movie, film, actors, good, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, films, love 
 Topic 5: show, series, episode, one, best 
 Topic 6: movie, great, film, one, story 
 Topic 7: movie, film, good, one, interesting 
 Topic 8: film, one, also, story, like 
 Topic 9: film, one, family, first, like 
 Topic 10: film, just, one, like, even 
 Topic 11: film, good, one, time, story 
 Topic 12: film, one, show, life, young 
 Topic 13: movie, just, like, see, one 
 Topic 14: film, films, one, movie, like 
 Topic 15: film, movie, one, like, got 
 Topic 16: film, one, much, like, even 
 Topic 17: movie, one, film, just, like 
 Topic 18: bad, movie, film, one, good 
 Topic 19: movie, like, show, just, one 
 Topic 20: film, one, much, movie, like 
 Topic 21: one, film, match, man, also 
 Topic 22: one, film, movie, movies, seen 
 Topic 23: new, one, movie, film, joe 
 Topic 24: film, comedy, good, one, like 
 Topic 25: film, mark, one, boys, like 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 51 (approx. per word bound = -7.655, relative change = 1.602e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 52 (approx. per word bound = -7.655, relative change = 1.507e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 53 (approx. per word bound = -7.655, relative change = 1.419e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 54 (approx. per word bound = -7.655, relative change = 1.434e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 55 (approx. per word bound = -7.654, relative change = 1.465e-05) 
Topic 1: film, characters, one, like, character 
 Topic 2: movie, film, actors, good, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, films, love 
 Topic 5: show, series, episode, one, best 
 Topic 6: movie, great, film, one, story 
 Topic 7: movie, film, good, one, interesting 
 Topic 8: film, one, also, story, like 
 Topic 9: film, one, family, first, like 
 Topic 10: film, just, one, like, even 
 Topic 11: film, good, one, time, story 
 Topic 12: film, one, show, life, young 
 Topic 13: movie, just, like, see, one 
 Topic 14: film, films, one, movie, like 
 Topic 15: film, one, movie, like, got 
 Topic 16: film, one, much, like, even 
 Topic 17: movie, one, film, just, like 
 Topic 18: bad, movie, film, one, good 
 Topic 19: movie, like, show, just, one 
 Topic 20: film, one, much, movie, like 
 Topic 21: one, film, match, man, also 
 Topic 22: one, film, movie, movies, seen 
 Topic 23: new, one, movie, film, joe 
 Topic 24: film, comedy, good, one, like 
 Topic 25: film, mark, one, boys, like 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 56 (approx. per word bound = -7.654, relative change = 1.529e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 57 (approx. per word bound = -7.654, relative change = 1.414e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 58 (approx. per word bound = -7.654, relative change = 1.333e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 59 (approx. per word bound = -7.654, relative change = 1.357e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 60 (approx. per word bound = -7.654, relative change = 1.270e-05) 
Topic 1: film, characters, one, like, character 
 Topic 2: movie, film, actors, good, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, films, love 
 Topic 5: show, series, episode, one, best 
 Topic 6: movie, great, film, one, story 
 Topic 7: movie, film, good, one, interesting 
 Topic 8: film, one, also, story, like 
 Topic 9: film, one, family, first, like 
 Topic 10: film, just, one, like, even 
 Topic 11: film, good, one, time, story 
 Topic 12: film, one, show, life, young 
 Topic 13: movie, just, like, see, one 
 Topic 14: film, films, one, movie, like 
 Topic 15: film, one, movie, like, got 
 Topic 16: film, one, much, like, even 
 Topic 17: movie, one, film, just, like 
 Topic 18: bad, movie, film, one, good 
 Topic 19: movie, like, show, just, one 
 Topic 20: film, one, much, movie, like 
 Topic 21: one, film, match, man, also 
 Topic 22: one, film, movie, movies, seen 
 Topic 23: new, one, movie, film, joe 
 Topic 24: film, comedy, good, one, like 
 Topic 25: film, one, mark, like, boys 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 61 (approx. per word bound = -7.654, relative change = 1.258e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 62 (approx. per word bound = -7.654, relative change = 1.380e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 63 (approx. per word bound = -7.654, relative change = 1.242e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 64 (approx. per word bound = -7.653, relative change = 1.124e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 65 (approx. per word bound = -7.653, relative change = 1.030e-05) 
Topic 1: film, characters, one, like, character 
 Topic 2: movie, film, actors, good, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, story, films, love 
 Topic 5: show, series, episode, one, best 
 Topic 6: movie, great, film, one, story 
 Topic 7: movie, film, good, one, interesting 
 Topic 8: film, one, also, story, like 
 Topic 9: film, one, family, first, like 
 Topic 10: film, just, one, like, even 
 Topic 11: film, good, one, time, story 
 Topic 12: film, one, show, life, young 
 Topic 13: movie, just, like, see, one 
 Topic 14: film, films, one, movie, like 
 Topic 15: film, one, movie, like, got 
 Topic 16: film, one, much, like, even 
 Topic 17: movie, one, film, just, like 
 Topic 18: bad, movie, film, one, good 
 Topic 19: movie, like, show, just, one 
 Topic 20: film, one, much, movie, like 
 Topic 21: one, film, match, man, also 
 Topic 22: one, film, movie, movies, seen 
 Topic 23: new, one, movie, film, joe 
 Topic 24: film, comedy, good, one, like 
 Topic 25: film, one, mark, like, boys 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Completing Iteration 66 (approx. per word bound = -7.653, relative change = 1.075e-05) 
....................................................................................................
Completed E-Step (2 seconds). 
Completed M-Step. 
Model Converged 
Beginning Spectral Initialization 
     Calculating the gram matrix...
     Using only 10000 most frequent terms during initialization...
     Finding anchor words...
    ..................................................
     Recovering initialization...
    ....................................................................................................
Initialization complete.
....................................................................................................
Completed E-Step (9 seconds). 
Completed M-Step. 
Completing Iteration 1 (approx. per word bound = -8.551) 
....................................................................................................
Completed E-Step (7 seconds). 
Completed M-Step. 
Completing Iteration 2 (approx. per word bound = -7.652, relative change = 1.051e-01) 
....................................................................................................
Completed E-Step (7 seconds). 
Completed M-Step. 
Completing Iteration 3 (approx. per word bound = -7.522, relative change = 1.702e-02) 
....................................................................................................
Completed E-Step (6 seconds). 
Completed M-Step. 
Completing Iteration 4 (approx. per word bound = -7.486, relative change = 4.819e-03) 
....................................................................................................
Completed E-Step (6 seconds). 
Completed M-Step. 
Completing Iteration 5 (approx. per word bound = -7.469, relative change = 2.238e-03) 
Topic 1: characters, film, br, <, > 
 Topic 2: movie, director, actors, disappointed, good 
 Topic 3: <, >, br, film, one 
 Topic 4: film, excellent, woman, one, makes 
 Topic 5: character, best, one, film, well 
 Topic 6: movie, great, one, made, peter 
 Topic 7: movie, film, good, interesting, really 
 Topic 8: film, <, >, br, one 
 Topic 9: <, br, >, film, family 
 Topic 10: just, film, like, even, one 
 Topic 11: good, time, one, film, > 
 Topic 12: br, >, <, one, film 
 Topic 13: movie, >, br, <, see 
 Topic 14: >, br, <, film, films 
 Topic 15: film, like, one, movie, never 
 Topic 16: br, >, <, 2, film 
 Topic 17: movie, <, br, >, one 
 Topic 18: movie, bad, good, get, one 
 Topic 19: movie, like, br, >, < 
 Topic 20: >, br, <, movie, much 
 Topic 21: br, >, <, match, one 
 Topic 22: movie, one, movies, br, > 
 Topic 23: br, <, >, new, one 
 Topic 24: film, good, cast, comedy, like 
 Topic 25: movie, >, br, <, love 
 Topic 26: >, <, br, movie, now 
 Topic 27: film, <, >, br, time 
 Topic 28: >, <, br, one, movie 
 Topic 29: show, br, <, >, movie 
 Topic 30: br, >, <, film, films 
 Topic 31: one, like, movie, <, br 
 Topic 32: film, horror, br, >, < 
 Topic 33: <, >, br, people, like 
 Topic 34: movie, really, like, one, just 
 Topic 35: >, <, br, like, one 
 Topic 36: br, >, <, film, one 
 Topic 37: book, <, movie, br, > 
 Topic 38: movie, director, actors, >, br 
 Topic 39: >, br, <, show, one 
 Topic 40: film, br, <, >, show 
 Topic 41: br, <, >, film, story 
 Topic 42: br, >, <, film, one 
 Topic 43: >, <, br, movie, one 
 Topic 44: film, one, br, >, < 
 Topic 45: br, <, >, film, one 
 Topic 46: >, br, <, movie, life 
 Topic 47: film, <, >, br, one 
 Topic 48: film, >, br, <, one 
 Topic 49: episode, series, show, episodes, season 
 Topic 50: church, movie, joseph, smith, wife 
....................................................................................................
Completed E-Step (6 seconds). 
Completed M-Step. 
Completing Iteration 6 (approx. per word bound = -7.460, relative change = 1.234e-03) 
....................................................................................................
Completed E-Step (6 seconds). 
Completed M-Step. 
Completing Iteration 7 (approx. per word bound = -7.454, relative change = 7.622e-04) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 8 (approx. per word bound = -7.450, relative change = 5.127e-04) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 9 (approx. per word bound = -7.447, relative change = 3.489e-04) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 10 (approx. per word bound = -7.446, relative change = 2.530e-04) 
Topic 1: characters, film, script, character, even 
 Topic 2: movie, director, actors, disappointed, film 
 Topic 3: >, <, br, film, one 
 Topic 4: film, excellent, one, makes, woman 
 Topic 5: character, best, film, one, well 
 Topic 6: movie, great, one, film, made 
 Topic 7: movie, film, good, interesting, really 
 Topic 8: film, one, also, plot, think 
 Topic 9: film, br, <, >, family 
 Topic 10: just, film, like, one, even 
 Topic 11: good, time, film, one, actually 
 Topic 12: br, >, <, film, one 
 Topic 13: movie, >, br, <, just 
 Topic 14: >, br, <, film, films 
 Topic 15: film, like, one, even, never 
 Topic 16: film, 2, one, much, br 
 Topic 17: movie, one, film, just, like 
 Topic 18: movie, bad, get, good, one 
 Topic 19: movie, like, tv, just, good 
 Topic 20: >, br, much, <, movie 
 Topic 21: match, br, >, <, one 
 Topic 22: movie, one, movies, seen, film 
 Topic 23: br, <, >, new, one 
 Topic 24: film, good, comedy, cast, like 
 Topic 25: movie, br, >, <, love 
 Topic 26: >, <, br, movie, now 
 Topic 27: film, time, just, one, first 
 Topic 28: one, movie, film, funny, > 
 Topic 29: show, good, see, just, also 
 Topic 30: film, br, >, <, films 
 Topic 31: one, like, movie, film, see 
 Topic 32: film, horror, one, films, good 
 Topic 33: <, >, br, people, like 
 Topic 34: movie, really, like, just, one 
 Topic 35: >, <, br, like, one 
 Topic 36: br, >, <, film, one 
 Topic 37: book, movie, novel, read, film 
 Topic 38: movie, director, actors, film, first 
 Topic 39: show, >, br, <, one 
 Topic 40: film, like, show, one, even 
 Topic 41: film, br, <, >, story 
 Topic 42: film, br, >, <, one 
 Topic 43: movie, one, film, just, > 
 Topic 44: film, one, story, play, heart 
 Topic 45: br, <, >, film, one 
 Topic 46: >, br, <, movie, life 
 Topic 47: film, one, many, <, > 
 Topic 48: film, one, >, br, < 
 Topic 49: episode, series, show, episodes, season 
 Topic 50: church, smith, movie, joseph, lds 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 11 (approx. per word bound = -7.444, relative change = 1.942e-04) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 12 (approx. per word bound = -7.443, relative change = 1.550e-04) 
....................................................................................................
Completed E-Step (6 seconds). 
Completed M-Step. 
Completing Iteration 13 (approx. per word bound = -7.442, relative change = 1.307e-04) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 14 (approx. per word bound = -7.441, relative change = 1.050e-04) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 15 (approx. per word bound = -7.441, relative change = 9.156e-05) 
Topic 1: characters, film, script, character, even 
 Topic 2: movie, film, director, disappointed, actors 
 Topic 3: >, <, br, film, one 
 Topic 4: film, excellent, one, makes, story 
 Topic 5: character, best, film, one, well 
 Topic 6: movie, great, one, film, made 
 Topic 7: movie, film, good, interesting, one 
 Topic 8: film, one, also, plot, think 
 Topic 9: film, br, <, >, family 
 Topic 10: just, film, like, one, even 
 Topic 11: good, time, film, one, actually 
 Topic 12: film, one, life, old, br 
 Topic 13: movie, see, just, bad, like 
 Topic 14: >, br, <, film, films 
 Topic 15: film, one, like, never, even 
 Topic 16: film, 2, one, much, like 
 Topic 17: movie, one, film, just, like 
 Topic 18: bad, movie, get, good, one 
 Topic 19: movie, like, tv, just, lot 
 Topic 20: much, movie, film, one, like 
 Topic 21: match, one, br, <, > 
 Topic 22: one, movie, movies, seen, film 
 Topic 23: new, one, joe, film, br 
 Topic 24: film, good, comedy, cast, see 
 Topic 25: movie, love, film, s, one 
 Topic 26: >, <, br, movie, now 
 Topic 27: film, time, just, one, first 
 Topic 28: one, movie, film, comedy, funny 
 Topic 29: show, good, see, also, just 
 Topic 30: film, films, one, best, br 
 Topic 31: one, like, movie, film, see 
 Topic 32: film, horror, one, films, just 
 Topic 33: <, >, br, people, documentary 
 Topic 34: movie, really, like, just, good 
 Topic 35: >, <, br, like, one 
 Topic 36: film, br, <, >, one 
 Topic 37: book, movie, novel, read, film 
 Topic 38: movie, director, film, actors, first 
 Topic 39: show, one, like, >, br 
 Topic 40: film, like, show, one, even 
 Topic 41: film, story, br, <, > 
 Topic 42: film, one, get, just, br 
 Topic 43: one, movie, film, just, david 
 Topic 44: film, one, story, heart, play 
 Topic 45: br, <, >, film, one 
 Topic 46: >, br, <, movie, life 
 Topic 47: film, one, many, also, war 
 Topic 48: film, one, like, hero, story 
 Topic 49: episode, series, show, episodes, season 
 Topic 50: church, movie, smith, joseph, lds 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 16 (approx. per word bound = -7.440, relative change = 8.058e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 17 (approx. per word bound = -7.439, relative change = 7.160e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 18 (approx. per word bound = -7.439, relative change = 7.070e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 19 (approx. per word bound = -7.438, relative change = 6.957e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 20 (approx. per word bound = -7.438, relative change = 6.370e-05) 
Topic 1: characters, film, script, character, even 
 Topic 2: movie, film, director, actors, disappointed 
 Topic 3: >, <, br, film, one 
 Topic 4: film, excellent, one, makes, story 
 Topic 5: character, best, film, one, well 
 Topic 6: movie, great, one, film, made 
 Topic 7: movie, film, good, interesting, one 
 Topic 8: film, one, also, plot, think 
 Topic 9: film, br, <, >, family 
 Topic 10: just, film, like, one, even 
 Topic 11: good, time, film, one, actually 
 Topic 12: film, one, life, old, man 
 Topic 13: movie, see, bad, just, like 
 Topic 14: film, br, >, <, films 
 Topic 15: film, one, like, never, even 
 Topic 16: film, 2, one, much, like 
 Topic 17: movie, one, film, just, like 
 Topic 18: bad, movie, get, one, good 
 Topic 19: movie, like, tv, just, lot 
 Topic 20: much, movie, film, one, like 
 Topic 21: match, one, br, <, rock 
 Topic 22: one, movies, movie, seen, film 
 Topic 23: new, one, joe, film, time 
 Topic 24: film, good, comedy, cast, see 
 Topic 25: movie, love, film, s, one 
 Topic 26: >, <, br, movie, now 
 Topic 27: film, time, just, one, first 
 Topic 28: one, film, movie, comedy, funny 
 Topic 29: show, good, see, also, just 
 Topic 30: film, films, one, best, great 
 Topic 31: one, like, movie, film, people 
 Topic 32: film, horror, one, films, just 
 Topic 33: <, >, br, people, documentary 
 Topic 34: movie, really, like, just, good 
 Topic 35: >, like, <, br, one 
 Topic 36: film, one, good, bad, br 
 Topic 37: book, movie, novel, read, film 
 Topic 38: movie, director, film, actors, first 
 Topic 39: show, one, like, really, just 
 Topic 40: film, like, show, one, even 
 Topic 41: film, story, one, movie, role 
 Topic 42: film, one, get, just, real 
 Topic 43: one, movie, film, just, david 
 Topic 44: film, one, story, heart, play 
 Topic 45: br, <, >, film, one 
 Topic 46: br, >, <, movie, life 
 Topic 47: film, one, many, also, war 
 Topic 48: film, one, like, hero, films 
 Topic 49: episode, series, show, episodes, season 
 Topic 50: church, movie, smith, joseph, film 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 21 (approx. per word bound = -7.437, relative change = 5.772e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 22 (approx. per word bound = -7.437, relative change = 4.588e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 23 (approx. per word bound = -7.437, relative change = 4.576e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 24 (approx. per word bound = -7.436, relative change = 4.364e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 25 (approx. per word bound = -7.436, relative change = 3.793e-05) 
Topic 1: characters, film, script, character, even 
 Topic 2: movie, film, director, actors, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, excellent, makes, woman 
 Topic 5: character, best, film, one, well 
 Topic 6: movie, great, one, film, made 
 Topic 7: film, movie, good, interesting, one 
 Topic 8: film, one, also, plot, think 
 Topic 9: film, family, br, <, > 
 Topic 10: film, just, like, one, even 
 Topic 11: good, time, film, one, actually 
 Topic 12: film, one, life, old, man 
 Topic 13: movie, bad, see, just, people 
 Topic 14: film, films, like, lost, br 
 Topic 15: film, one, like, never, even 
 Topic 16: film, 2, one, much, like 
 Topic 17: movie, one, film, just, school 
 Topic 18: bad, movie, get, one, good 
 Topic 19: movie, like, tv, just, lot 
 Topic 20: much, movie, film, one, like 
 Topic 21: match, one, rock, wwe, ring 
 Topic 22: one, movies, movie, film, seen 
 Topic 23: new, one, joe, film, time 
 Topic 24: film, good, comedy, cast, see 
 Topic 25: movie, love, film, s, one 
 Topic 26: >, <, br, movie, now 
 Topic 27: film, time, just, one, first 
 Topic 28: one, film, comedy, funny, movie 
 Topic 29: show, good, see, also, just 
 Topic 30: film, films, one, best, great 
 Topic 31: one, like, movie, film, people 
 Topic 32: film, horror, films, one, just 
 Topic 33: people, documentary, film, <, like 
 Topic 34: movie, really, like, just, good 
 Topic 35: like, film, one, >, br 
 Topic 36: film, one, good, bad, pretty 
 Topic 37: book, movie, novel, read, film 
 Topic 38: movie, director, film, actors, first 
 Topic 39: show, one, like, really, just 
 Topic 40: film, like, show, one, even 
 Topic 41: film, story, one, role, movie 
 Topic 42: film, one, get, just, real 
 Topic 43: one, movie, film, just, david 
 Topic 44: film, one, story, heart, play 
 Topic 45: br, <, >, film, one 
 Topic 46: br, >, <, movie, life 
 Topic 47: film, one, many, also, war 
 Topic 48: film, one, like, hero, films 
 Topic 49: episode, series, show, episodes, season 
 Topic 50: church, movie, smith, joseph, film 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 26 (approx. per word bound = -7.436, relative change = 3.347e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 27 (approx. per word bound = -7.436, relative change = 2.909e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 28 (approx. per word bound = -7.436, relative change = 2.210e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 29 (approx. per word bound = -7.435, relative change = 2.580e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 30 (approx. per word bound = -7.435, relative change = 2.434e-05) 
Topic 1: characters, film, script, character, even 
 Topic 2: movie, film, director, actors, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, excellent, makes, woman 
 Topic 5: character, best, film, one, well 
 Topic 6: movie, great, film, one, made 
 Topic 7: film, movie, good, interesting, one 
 Topic 8: film, one, also, plot, think 
 Topic 9: film, family, one, br, < 
 Topic 10: film, just, like, one, even 
 Topic 11: good, time, film, one, actually 
 Topic 12: film, one, life, old, man 
 Topic 13: movie, bad, see, just, people 
 Topic 14: film, films, lost, like, one 
 Topic 15: film, one, never, even, like 
 Topic 16: film, 2, one, much, even 
 Topic 17: movie, one, film, just, school 
 Topic 18: bad, movie, get, one, good 
 Topic 19: movie, like, tv, just, lot 
 Topic 20: much, film, movie, one, like 
 Topic 21: match, one, rock, wwe, ring 
 Topic 22: one, movies, movie, film, seen 
 Topic 23: new, one, joe, film, time 
 Topic 24: film, comedy, good, cast, see 
 Topic 25: movie, love, film, s, one 
 Topic 26: >, <, br, movie, now 
 Topic 27: film, time, just, one, first 
 Topic 28: one, film, comedy, funny, movie 
 Topic 29: show, good, see, also, one 
 Topic 30: film, films, one, best, great 
 Topic 31: one, like, movie, film, people 
 Topic 32: film, horror, films, one, just 
 Topic 33: people, documentary, film, like, one 
 Topic 34: movie, really, like, just, good 
 Topic 35: like, film, one, just, game 
 Topic 36: film, one, good, bad, pretty 
 Topic 37: book, movie, novel, read, film 
 Topic 38: movie, film, director, actors, first 
 Topic 39: show, one, like, really, just 
 Topic 40: film, like, show, one, even 
 Topic 41: film, story, one, role, movie 
 Topic 42: film, one, get, just, real 
 Topic 43: one, film, movie, just, david 
 Topic 44: film, one, story, heart, play 
 Topic 45: br, <, >, film, one 
 Topic 46: br, >, <, movie, life 
 Topic 47: film, one, many, also, war 
 Topic 48: film, one, like, hero, films 
 Topic 49: episode, series, show, episodes, season 
 Topic 50: church, movie, smith, joseph, film 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 31 (approx. per word bound = -7.435, relative change = 2.752e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 32 (approx. per word bound = -7.435, relative change = 2.951e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 33 (approx. per word bound = -7.435, relative change = 2.276e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 34 (approx. per word bound = -7.434, relative change = 2.120e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 35 (approx. per word bound = -7.434, relative change = 2.347e-05) 
Topic 1: characters, film, script, character, even 
 Topic 2: movie, film, director, actors, one 
 Topic 3: >, <, br, film, one 
 Topic 4: film, one, excellent, makes, woman 
 Topic 5: character, best, film, one, well 
 Topic 6: movie, great, film, one, also 
 Topic 7: film, movie, good, interesting, one 
 Topic 8: film, one, also, plot, think 
 Topic 9: film, family, one, br, < 
 Topic 10: film, just, like, one, even 
 Topic 11: good, time, film, one, actually 
 Topic 12: film, one, life, old, man 
 Topic 13: movie, bad, just, see, even 
 Topic 14: film, films, lost, like, one 
 Topic 15: film, one, never, even, like 
 Topic 16: film, 2, one, much, even 
 Topic 17: movie, one, film, just, school 
 Topic 18: bad, movie, get, one, even 
 Topic 19: movie, like, tv, just, lot 
 Topic 20: much, film, movie, one, like 
 Topic 21: match, one, rock, wwe, ring 
 Topic 22: one, movies, movie, film, seen 
 Topic 23: new, one, joe, film, time 
 Topic 24: film, comedy, good, cast, story 
 Topic 25: movie, love, film, s, one 
 Topic 26: >, <, br, movie, now 
 Topic 27: film, time, just, one, first 
 Topic 28: one, film, comedy, funny, humor 
 Topic 29: show, good, see, also, one 
 Topic 30: film, films, one, best, great 
 Topic 31: one, like, movie, film, people 
 Topic 32: film, horror, films, one, just 
 Topic 33: people, documentary, film, like, one 
 Topic 34: movie, like, really, just, good 
 Topic 35: like, film, one, just, game 
 Topic 36: film, one, good, bad, pretty 
 Topic 37: book, movie, novel, film, read 
 Topic 38: movie, film, director, actors, first 
 Topic 39: show, one, like, really, just 
 Topic 40: film, like, show, one, even 
 Topic 41: film, story, one, role, beautiful 
 Topic 42: film, one, get, just, real 
 Topic 43: one, film, movie, just, david 
 Topic 44: film, one, heart, story, play 
 Topic 45: film, br, <, >, one 
 Topic 46: movie, life, br, >, < 
 Topic 47: film, one, many, also, war 
 Topic 48: film, one, like, hero, films 
 Topic 49: episode, series, show, episodes, season 
 Topic 50: church, movie, smith, joseph, film 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 36 (approx. per word bound = -7.434, relative change = 2.298e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 37 (approx. per word bound = -7.434, relative change = 2.441e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 38 (approx. per word bound = -7.434, relative change = 2.119e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 39 (approx. per word bound = -7.434, relative change = 1.905e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 40 (approx. per word bound = -7.433, relative change = 1.407e-05) 
Topic 1: characters, film, script, character, even 
 Topic 2: movie, film, director, actors, one 
 Topic 3: >, <, br, film, just 
 Topic 4: film, one, excellent, makes, woman 
 Topic 5: character, best, film, one, well 
 Topic 6: movie, great, film, one, also 
 Topic 7: film, movie, good, interesting, one 
 Topic 8: film, one, also, plot, think 
 Topic 9: film, family, one, love, story 
 Topic 10: film, just, like, one, even 
 Topic 11: time, good, film, one, actually 
 Topic 12: film, one, life, old, man 
 Topic 13: movie, bad, just, see, even 
 Topic 14: film, films, lost, like, one 
 Topic 15: film, one, never, even, like 
 Topic 16: film, 2, one, much, even 
 Topic 17: movie, one, film, school, just 
 Topic 18: bad, movie, get, one, even 
 Topic 19: movie, like, tv, just, lot 
 Topic 20: much, film, one, movie, like 
 Topic 21: match, one, rock, wwe, ring 
 Topic 22: one, movies, movie, film, seen 
 Topic 23: new, one, joe, film, time 
 Topic 24: film, comedy, good, cast, also 
 Topic 25: movie, love, film, s, one 
 Topic 26: >, <, br, movie, now 
 Topic 27: film, time, just, one, first 
 Topic 28: one, film, comedy, funny, humor 
 Topic 29: show, good, also, see, one 
 Topic 30: film, films, one, best, great 
 Topic 31: one, like, movie, film, people 
 Topic 32: film, horror, films, one, just 
 Topic 33: people, documentary, film, like, one 
 Topic 34: movie, like, really, just, good 
 Topic 35: like, film, one, just, game 
 Topic 36: film, one, good, bad, pretty 
 Topic 37: book, novel, movie, film, read 
 Topic 38: movie, film, director, actors, first 
 Topic 39: show, one, like, really, just 
 Topic 40: film, like, show, one, even 
 Topic 41: film, story, one, role, beautiful 
 Topic 42: film, one, get, just, real 
 Topic 43: one, film, movie, just, david 
 Topic 44: film, one, heart, story, play 
 Topic 45: film, br, <, >, one 
 Topic 46: life, movie, br, >, < 
 Topic 47: film, one, many, also, war 
 Topic 48: film, one, like, hero, films 
 Topic 49: episode, series, show, episodes, season 
 Topic 50: church, movie, smith, joseph, god 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 41 (approx. per word bound = -7.433, relative change = 1.954e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 42 (approx. per word bound = -7.433, relative change = 1.752e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 43 (approx. per word bound = -7.433, relative change = 2.002e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 44 (approx. per word bound = -7.433, relative change = 2.428e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 45 (approx. per word bound = -7.433, relative change = 1.768e-05) 
Topic 1: characters, film, script, character, even 
 Topic 2: movie, film, director, actors, one 
 Topic 3: >, <, br, film, just 
 Topic 4: film, one, excellent, woman, makes 
 Topic 5: character, best, film, one, well 
 Topic 6: movie, great, film, one, also 
 Topic 7: film, movie, good, interesting, one 
 Topic 8: film, one, also, plot, think 
 Topic 9: film, family, one, love, story 
 Topic 10: film, just, like, one, even 
 Topic 11: time, good, film, one, actually 
 Topic 12: film, one, life, old, man 
 Topic 13: movie, bad, just, see, even 
 Topic 14: film, films, lost, like, one 
 Topic 15: film, one, never, even, like 
 Topic 16: film, 2, one, much, even 
 Topic 17: movie, one, film, school, just 
 Topic 18: bad, movie, get, one, even 
 Topic 19: movie, like, tv, just, lot 
 Topic 20: much, film, one, movie, like 
 Topic 21: match, one, rock, wwe, ring 
 Topic 22: one, movies, movie, film, seen 
 Topic 23: new, one, joe, film, time 
 Topic 24: film, comedy, good, cast, also 
 Topic 25: movie, love, film, s, one 
 Topic 26: movie, >, <, br, now 
 Topic 27: film, time, just, one, first 
 Topic 28: one, film, comedy, funny, humor 
 Topic 29: show, good, also, see, one 
 Topic 30: film, films, one, best, great 
 Topic 31: one, like, movie, film, people 
 Topic 32: film, horror, films, one, just 
 Topic 33: people, documentary, film, like, one 
 Topic 34: movie, like, really, just, good 
 Topic 35: like, film, one, just, game 
 Topic 36: film, one, bad, good, pretty 
 Topic 37: book, novel, film, read, movie 
 Topic 38: movie, film, director, actors, first 
 Topic 39: show, one, like, really, just 
 Topic 40: film, like, show, one, even 
 Topic 41: film, story, one, role, beautiful 
 Topic 42: film, one, get, real, just 
 Topic 43: one, film, movie, just, david 
 Topic 44: film, one, heart, story, play 
 Topic 45: film, br, <, >, one 
 Topic 46: life, movie, film, japanese, br 
 Topic 47: film, one, many, also, man 
 Topic 48: film, one, like, hero, films 
 Topic 49: episode, series, show, episodes, season 
 Topic 50: church, movie, smith, joseph, god 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 46 (approx. per word bound = -7.433, relative change = 2.570e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 47 (approx. per word bound = -7.432, relative change = 1.889e-05) 
....................................................................................................
Completed E-Step (6 seconds). 
Completed M-Step. 
Completing Iteration 48 (approx. per word bound = -7.432, relative change = 1.374e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Completing Iteration 49 (approx. per word bound = -7.432, relative change = 1.427e-05) 
....................................................................................................
Completed E-Step (5 seconds). 
Completed M-Step. 
Model Converged 
plot(differentKs)

The plot is a mixed bag for us. Higher values of the held-out likelihood and semantic coherence both indicate better models, while lower residuals indicate a better model. It’s also important to note that it’s artificially easy to get more semantic coherence by having fewer topics (semantic coherence is based on how often a topic’s top words appear together in the same documents). If it were me, I’d probably settle at the midpoint here (25 topics). But there’s no magic solution. Instead, the decision is largely left up to you. That flexibility is nice, but it also means that **you need to be able to defend your choice of K**, because external audiences are going to want to know why you chose the number you did.
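If you want numbers to back up that choice rather than eyeballing the plot, one option is to pull the diagnostics out of the search object and compare them side by side. The sketch below assumes `differentKs` is the object returned by `searchK()` above and that its `results` element holds columns such as `K`, `heldout`, `semcoh`, and `residual` (in some versions of stm these are stored as list columns, hence the `unlist()`). The helper name `tidy_k_diagnostics` is hypothetical and not part of stm.

```r
# Hypothetical helper (not part of stm): flatten the diagnostics stored in a
# searchK() result into a plain numeric data frame, one row per candidate K.
tidy_k_diagnostics <- function(results) {
  # Keep only the diagnostic columns that are actually present.
  wanted <- c("K", "heldout", "semcoh", "residual", "exclus")
  cols <- intersect(wanted, names(results))

  # Some stm versions store each column as a list; unlist() handles both cases.
  out <- as.data.frame(lapply(results[cols], function(x) as.numeric(unlist(x))))

  # Sort by the number of topics so the rows read from small K to large K.
  out[order(out$K), , drop = FALSE]
}

# Only runs if the search object from the tutorial is in the workspace.
if (exists("differentKs")) {
  diagnostics <- tidy_k_diagnostics(differentKs$results)
  print(diagnostics)
}
```

Reading the table row by row makes the trade-off explicit: look for a K where held-out likelihood is high and residuals are low without semantic coherence collapsing. It won’t pick K for you, but it gives you something concrete to cite when you defend the choice.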