truth.K <- 16
Tutorial10_Topic Models
In this tutorial we’ll learn about K-Means and topic models of two different types, the regular vanilla LDA version, and structural topic models.
K-Means
Introduction
K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. Its objective is to group similar data points together and discover underlying patterns. To achieve this, K-means looks for a fixed number (k) of clusters in a dataset.
In this tutorial, we are going to cluster a dataset consisting of health news tweets. Each of these short texts comes from one of the 16 news sources in the dataset. Clustering is unsupervised, so the sources are not used to fit the model, but they give us a natural reference value for the number of clusters: k = 16.
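To see what "looking for a fixed number of clusters" means in practice, here is a minimal sketch on synthetic 2-D points rather than on the tweets. The data, the names toy and toy.fit, and the choice of nstart = 10 are our own assumptions for illustration.

```r
# Toy data: two well-separated blobs of 50 points each (synthetic, for illustration only)
set.seed(42)
toy <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2))

# Ask k-means for k = 2 clusters; nstart = 10 tries several random starting points
toy.fit <- kmeans(toy, centers = 2, nstart = 10)

# Cross-tabulate the cluster assignments against the blob each point came from
table(cluster = toy.fit$cluster, blob = rep(c("A", "B"), each = 50))
```

Because the blobs are far apart, each one should land in its own cluster. Real text data is rarely this clean, which is why the rest of the tutorial spends time on preprocessing and on choosing K.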
Front-end Matters
First, let’s load the tm package.
library(tm)
Loading required package: NLP
We download the data from the UCI Machine Learning Repository.
# creating the empty dataset with the formatted columns
dataframe <- data.frame(ID = character(),
                        datetime = character(),
                        content = character(),
                        label = factor())
source.url <- 'https://archive.ics.uci.edu/ml/machine-learning-databases/00438/Health-News-Tweets.zip'
target.directory <- '/tmp/clustering-r'
temporary.file <- tempfile()
download.file(source.url, temporary.file)
unzip(temporary.file, exdir = target.directory)
# Reading the files
target.directory <- paste(target.directory, 'Health-Tweets', sep="/")
files <- list.files(path = target.directory, pattern = '.txt$')
# filling the dataframe by reading the text content
for (f in files){
  news.filename = paste(target.directory, f, sep = "/")
  news.label <- substr(f, 0, nchar(f) - 4) # removing the 4 last characters (.txt)
  news.data <- read.csv(news.filename,
                        encoding = "UTF-8",
                        header = FALSE,
                        quote = "",
                        sep = "|",
                        col.names = c("ID", "datetime", "content"))
  # Trick to ignore last part of tweets which content contains the split character "|"
  # no satisfying solution has been found to split and merging extra-columns with the last one
  news.data <- news.data[news.data$content != "", ]
  news.data['label'] = news.label # we add the label of the tweet
  # only considering a little portion of data
  # because handling sparse matrix for generic usage is a pain
  news.data <- head(news.data, floor(nrow(news.data) * 0.05))
  dataframe <- rbind(dataframe, news.data)
}
# deleting the temporary directory
unlink(target.directory, recursive = TRUE)
Preprocessing
Removing urls in the tweets
dataframe$content <- iconv(dataframe$content, from = "latin1", to = "UTF-8", sub = "")
sentences <- sub("http://([[:alnum:]|[:punct:]])+", '', dataframe$content)
head(sentences)
[1] "Breast cancer risk test devised "
[2] "GP workload harming care - BMA poll "
[3] "Short people's 'heart risk greater' "
[4] "New approach against HIV 'promising' "
[5] "Coalition 'undermined NHS' - doctors "
[6] "Review of case against NHS manager "
For common preprocessing problems, we are going to use the tm package.
corpus <- tm::Corpus(tm::VectorSource(sentences))
# cleaning up
# handling utf-8 encoding problem from the dataset
corpus.cleaned <- tm::tm_map(corpus, function(x) iconv(x, to = 'UTF-8-MAC', sub = 'byte'))
Warning in tm_map.SimpleCorpus(corpus, function(x) iconv(x, to = "UTF-8-MAC", :
transformation drops documents
corpus.cleaned <- tm::tm_map(corpus.cleaned, tm::removeWords, tm::stopwords('english'))
Warning in tm_map.SimpleCorpus(corpus.cleaned, tm::removeWords,
tm::stopwords("english")): transformation drops documents
corpus.cleaned <- tm::tm_map(corpus.cleaned, tm::stripWhitespace)
Warning in tm_map.SimpleCorpus(corpus.cleaned, tm::stripWhitespace):
transformation drops documents
Text Representation
Now we have a sequence of cleaned sentences that we can use to build our TF-IDF matrix. With this matrix, we can run any numerical procedure we want, such as clustering.
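Before building the matrix with tm, it may help to see TF-IDF computed by hand. The sketch below uses a made-up count matrix and one common variant of the weighting (term frequency normalized by document length, multiplied by the base-2 log of the inverse document frequency). The exact weighting applied by a given package may differ in its details, so treat this as an illustration of the idea only.

```r
# Toy count matrix: 3 documents (rows) x 4 terms (columns); the values are made up
toy.counts <- matrix(c(2, 1, 0, 0,
                       0, 1, 1, 0,
                       0, 1, 0, 3),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(paste0("doc", 1:3),
                                     c("cancer", "health", "nhs", "ebola")))

toy.tf  <- toy.counts / rowSums(toy.counts)                  # term frequency, normalized by document length
toy.idf <- log2(nrow(toy.counts) / colSums(toy.counts > 0))  # inverse document frequency
toy.tfidf <- sweep(toy.tf, 2, toy.idf, `*`)                  # multiply each column by its idf

round(toy.tfidf, 3)
```

The term "health" appears in every document, so its inverse document frequency is log2(3/3) = 0 and its whole column is zero: a word that occurs everywhere does not help tell documents apart.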
# Building the feature matrices
tfm <- tm::DocumentTermMatrix(corpus.cleaned)
dim(tfm)
[1] 3159 9416
tfm
<<DocumentTermMatrix (documents: 3159, terms: 9416)>>
Non-/sparse entries: 26434/29718710
Sparsity : 100%
Maximal term length: 62
Weighting : term frequency (tf)
tfm.tfidf <- tm::weightTfIdf(tfm)
dim(tfm.tfidf)
[1] 3159 9416
tfm.tfidf
<<DocumentTermMatrix (documents: 3159, terms: 9416)>>
Non-/sparse entries: 26434/29718710
Sparsity : 100%
Maximal term length: 62
Weighting : term frequency - inverse document frequency (normalized) (tf-idf)
# we remove a lot of features.
tfm.tfidf <- tm::removeSparseTerms(tfm.tfidf, 0.999) # (data,allowed sparsity)
tfidf.matrix <- as.matrix(tfm.tfidf)
dim(tfidf.matrix)
[1] 3159 1327
# cosine distance matrix (useful for specific clustering algorithms)
dist.matrix = proxy::dist(tfidf.matrix, method = "cosine")
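As a rough illustration of what a cosine distance measures, here is a base R sketch on made-up vectors. We assume the usual convention that the cosine distance is one minus the cosine similarity. The helper cosine.distance is our own and is not part of the proxy package.

```r
# Toy TF-IDF-like vectors (made-up values)
vec.a <- c(1, 2, 0, 0)
vec.b <- c(2, 4, 0, 0)   # same direction as vec.a, twice as long
vec.c <- c(0, 0, 3, 1)   # shares no non-zero term with vec.a

# Hypothetical helper: 1 - cosine similarity
cosine.distance <- function(x, y) {
  1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

cosine.distance(vec.a, vec.b)   # parallel vectors: 0, up to floating-point rounding
cosine.distance(vec.a, vec.c)   # orthogonal vectors: 1
```

The distance ignores vector length and only looks at direction, which is convenient for text: a long tweet and a short tweet using the same words in the same proportions end up close together.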
Running the clustering algorithms
K-means
Define clusters so that the total within-cluster variation is minimized.
The Hartigan-Wong algorithm (Hartigan and Wong 1979) defines the within-cluster variation of a cluster as the sum of squared Euclidean distances between its items and the corresponding centroid:
\(W(C_{k}) = \sum_{x_{i} \in C_{k}}(x_{i} - \mu_{k})^{2}\)
- \(x_{i}\): a data point belonging to the cluster \(C_{k}\)
- \(\mu_{k}\): the mean value of the points assigned to the cluster \(C_{k}\)
Summing over all \(K\) clusters gives the total within-cluster variation:
total withinness = \(\sum_{k=1}^{K} W(C_{k}) = \sum_{k=1}^{K} \sum_{x_{i} \in C_{k}} (x_{i} - \mu_{k})^{2}\)
The total within-cluster sum of squares measures the goodness of the clustering, and we want it to be as small as possible.
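The formula can be checked numerically. The sketch below fits k-means on synthetic points (our own toy data, not the tweets) and recomputes the total within-cluster sum of squares by hand, then compares it with the tot.withinss component that kmeans() returns.

```r
set.seed(1)
toy.x <- matrix(rnorm(60), ncol = 2)              # 30 synthetic points in 2-D
toy.km <- kmeans(toy.x, centers = 3, nstart = 5)

# For every point, look up the centroid of the cluster it was assigned to
toy.centroids <- toy.km$centers[toy.km$cluster, ]

# Sum of squared Euclidean distances between points and their own centroid
manual.total <- sum((toy.x - toy.centroids)^2)

all.equal(manual.total, toy.km$tot.withinss)
```

If the two numbers agree, the quantity kmeans() reports is the one defined above.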
clustering.kmeans <- kmeans(tfidf.matrix, truth.K)
names(clustering.kmeans)
[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"
Hierarchical clustering
Define a clustering criterion and the pointwise distance matrix. Let’s use Ward’s method as the clustering criterion.
clustering.hierarchical <- hclust(dist.matrix, method = "ward.D2")
names(clustering.hierarchical)
[1] "merge" "height" "order" "labels" "method"
[6] "call" "dist.method"
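Unlike k-means, hierarchical clustering does not need the number of clusters up front: it builds a full tree of merges, and we choose where to cut it afterwards. A minimal sketch on synthetic data (our own toy points, with Euclidean distances rather than the cosine distances used for the tweets):

```r
set.seed(7)
toy.h <- rbind(matrix(rnorm(40, mean = 0, sd = 0.2), ncol = 2),
               matrix(rnorm(40, mean = 3, sd = 0.2), ncol = 2))

toy.tree <- hclust(dist(toy.h), method = "ward.D2")  # build the tree with Ward's criterion
toy.groups <- cutree(toy.tree, k = 2)                 # cut the tree into 2 flat clusters

table(toy.groups)
```

The same tree can be cut again with a different k without refitting, which makes it cheap to compare several cluster counts.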
Plotting
To plot the clustering results, we need to reduce our high-dimensional feature space (the TF-IDF representation) to 2 dimensions, which we do with multi-dimensional scaling. This technique depends on the distance metric; in our case, that is the cosine distance computed on the TF-IDF matrix.
points <- cmdscale(dist.matrix, k = 2) # classical multi-dimensional scaling (MDS)
palette <- colorspace::diverge_hcl(truth.K) # creating a color palette
previous.par <- par(mfrow = c(1,2)) # partitioning the plot space
master.cluster <- clustering.kmeans$cluster
plot(points,
     main = 'K-Means clustering',
     col = as.factor(master.cluster),
     mai = c(0, 0, 0, 0),
     mar = c(0, 0, 0, 0),
     xaxt = 'n', yaxt = 'n',
     xlab = '', ylab = '')
slave.hierarchical <- cutree(clustering.hierarchical, k = truth.K)
plot(points,
     main = 'Hierarchical clustering',
     col = as.factor(slave.hierarchical),
     mai = c(0, 0, 0, 0),
     mar = c(0, 0, 0, 0),
     xaxt = 'n', yaxt = 'n',
     xlab = '', ylab = '')
par(previous.par) # recovering the original plot space parameters
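Classical multi-dimensional scaling looks for low-dimensional coordinates whose pairwise distances approximate the original ones. The sketch below uses synthetic points that already lie in a plane inside 3-D space (an assumption chosen so that nothing is lost), which lets us confirm that a 2-D embedding reproduces the distances.

```r
set.seed(3)
toy.plane <- cbind(rnorm(10), rnorm(10), 0)   # 10 synthetic 3-D points lying in a plane
toy.d <- dist(toy.plane)                      # original pairwise Euclidean distances

toy.coords <- cmdscale(toy.d, k = 2)          # 2-D coordinates from the distances alone

# Largest difference between original and embedded distances
max(abs(dist(toy.coords) - toy.d))
```

On real TF-IDF data the points do not lie in a plane, so the 2-D picture is only an approximation and clusters that look overlapping in the plot may still be separated in the full space.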
Determining K
In the previous example, we knew that the sentences belong to one of 16 sources. But how do we decide on the best number of clusters (K) when we do not know it in advance?
Here we use the “elbow” method. For each candidate number of clusters, we calculate how much of the variance in the data is explained by the clustering (the ratio of the between-cluster sum of squares to the total sum of squares). Typically, this increases with the number of clusters. However, the increase slows down at a certain point, and that is where we choose the number of clusters.
k <- 16
varper <- NULL
for(i in 1:k){
  clustering.kmeans2 <- kmeans(tfidf.matrix, i)
  varper <- c(varper, as.numeric(clustering.kmeans2$betweenss)/as.numeric(clustering.kmeans2$totss))
}
varper
[1] 4.757368e-12 5.562061e-03 7.625786e-03 2.237929e-02 2.446568e-02
[6] 3.140732e-02 3.334774e-02 3.149158e-02 3.973052e-02 3.971028e-02
[11] 3.869794e-02 4.446518e-02 4.170454e-02 4.770083e-02 4.848910e-02
[16] 5.173870e-02
plot(1:k, varper, xlab = "# of clusters", ylab = "explained variance")
From the plot, after 3 clusters, the increase in the explained variance becomes slower - there is an elbow here. Therefore, we might use 3 clusters here.
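The elbow is easier to recognize on data where the answer is known. In this sketch we generate three synthetic blobs (our own toy data, with nstart = 10 as an assumed setting to reduce the effect of random starts) and compute the same explained-variance ratio for 1 to 6 clusters.

```r
set.seed(10)
# Three synthetic blobs of 30 points each, so the "right" answer is K = 3
toy.blobs <- rbind(matrix(rnorm(60, mean = 0,  sd = 0.5), ncol = 2),
                   matrix(rnorm(60, mean = 5,  sd = 0.5), ncol = 2),
                   matrix(rnorm(60, mean = 10, sd = 0.5), ncol = 2))

# Explained variance (betweenss / totss) for each candidate number of clusters
toy.explained <- sapply(1:6, function(i) {
  fit <- kmeans(toy.blobs, centers = i, nstart = 10)
  fit$betweenss / fit$totss
})

round(toy.explained, 3)
round(diff(toy.explained), 3)   # gain from adding one more cluster
```

The gain should be large up to three clusters and small afterwards. Because k-means starts from random centers, results on noisy data such as the tweets can vary from run to run, so it is worth repeating the computation before settling on a K.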
Topic Models
Introduction
The general idea with topic models is to identify the topics that characterize a set of documents. The background on this is interesting; a lot of the initial interest came from digital humanities and library science where you had the need to systematically organize the massive thematic content of the huge collections of texts. Importantly, LDA and STM, the two we’ll discuss this week, are both mixed-membership models, meaning documents are characterized as arising from a distribution over topics, rather than coming from a single topic.
Latent Dirichlet Allocation
For LDA, we will be using the text2vec package. It is an R package that provides an efficient framework for text analysis and NLP. It’s a fast implementation of word embedding models (which is where it gets its name from), but it also has really nice and fast functionality for LDA.
Topic modeling algorithms identify the topics within a collection of texts, and Latent Dirichlet Allocation (LDA) is one of the most popular of them. LDA rests on two basic principles:
- Each document is made up of topics.
- Each word in a document can be attributed to a topic.
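These two principles can be read as a recipe for generating a document. The sketch below simulates that recipe in base R with a made-up six-word vocabulary and two topics. The helper rdirichlet1, the vocabulary, and the prior values are our own assumptions for illustration; this is the generative story only, not how a model is fitted.

```r
set.seed(123)

# Hypothetical helper: one draw from a Dirichlet distribution via normalized gamma draws
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha)
  g / sum(g)
}

toy.vocab <- c("actor", "plot", "scene", "music", "song", "band")  # made-up vocabulary
n.topics <- 2

# Each topic is a probability distribution over the vocabulary (one row per topic)
topic.word <- t(replicate(n.topics, rdirichlet1(rep(0.5, length(toy.vocab)))))
colnames(topic.word) <- toy.vocab

# Principle 1: the document is a mixture of topics
doc.topic <- rdirichlet1(rep(1, n.topics))

# Principle 2: every word is drawn from one topic, picked according to that mixture
z <- sample(n.topics, size = 8, replace = TRUE, prob = doc.topic)
toy.words <- sapply(z, function(k) sample(toy.vocab, size = 1, prob = topic.word[k, ]))

data.frame(topic = z, word = toy.words)
```

Fitting LDA runs this story in reverse: given only the words, it estimates the topic mixtures and the topic-word distributions that most plausibly produced them.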
Let’s begin!
Front-end Matter
First, let’s load the text2vec package:
library(text2vec)
We will be using the built-in movie reviews dataset that comes with the package. It is labeled and can be loaded under the name “movie_review”. Let’s load it in:
# Load in built-in dataset
data("movie_review")
# Prints first ten rows of the dataset:
head(movie_review, 10)
id sentiment
1 5814_8 1
2 2381_9 1
3 7759_3 0
4 3630_4 0
5 9495_8 1
6 8196_8 1
7 7166_2 0
8 10633_1 0
9 319_1 0
10 8713_10 1
review
1 With all this stuff going down at the moment with MJ i've started listening to his music, watching the odd documentary here and there, watched The Wiz and watched Moonwalker again. Maybe i just want to get a certain insight into this guy who i thought was really cool in the eighties just to maybe make up my mind whether he is guilty or innocent. Moonwalker is part biography, part feature film which i remember going to see at the cinema when it was originally released. Some of it has subtle messages about MJ's feeling towards the press and also the obvious message of drugs are bad m'kay.<br /><br />Visually impressive but of course this is all about Michael Jackson so unless you remotely like MJ in anyway then you are going to hate this and find it boring. Some may call MJ an egotist for consenting to the making of this movie BUT MJ and most of his fans would say that he made it for the fans which if true is really nice of him.<br /><br />The actual feature film bit when it finally starts is only on for 20 minutes or so excluding the Smooth Criminal sequence and Joe Pesci is convincing as a psychopathic all powerful drug lord. Why he wants MJ dead so bad is beyond me. Because MJ overheard his plans? Nah, Joe Pesci's character ranted that he wanted people to know it is he who is supplying drugs etc so i dunno, maybe he just hates MJ's music.<br /><br />Lots of cool things in this like MJ turning into a car and a robot and the whole Speed Demon sequence. Also, the director must have had the patience of a saint when it came to filming the kiddy Bad sequence as usually directors hate working with one kid let alone a whole bunch of them performing a complex dance scene.<br /><br />Bottom line, this movie is for people who like MJ on one level or another (which i think is most people). If not, then stay away. It does try and give off a wholesome message and ironically MJ's bestest buddy in this movie is a girl! 
Michael Jackson is truly one of the most talented people ever to grace this planet but is he guilty? Well, with all the attention i've gave this subject....hmmm well i don't know because people can be different behind closed doors, i know this for a fact. He is either an extremely nice but stupid guy or one of the most sickest liars. I hope he is not the latter.
2 \\"The Classic War of the Worlds\\" by Timothy Hines is a very entertaining film that obviously goes to great effort and lengths to faithfully recreate H. G. Wells' classic book. Mr. Hines succeeds in doing so. I, and those who watched his film with me, appreciated the fact that it was not the standard, predictable Hollywood fare that comes out every year, e.g. the Spielberg version with Tom Cruise that had only the slightest resemblance to the book. Obviously, everyone looks for different things in a movie. Those who envision themselves as amateur \\"critics\\" look only to criticize everything they can. Others rate a movie on more important bases,like being entertained, which is why most people never agree with the \\"critics\\". We enjoyed the effort Mr. Hines put into being faithful to H.G. Wells' classic novel, and we found it to be very entertaining. This made it easy to overlook what the \\"critics\\" perceive to be its shortcomings.
3 The film starts with a manager (Nicholas Bell) giving welcome investors (Robert Carradine) to Primal Park . A secret project mutating a primal animal using fossilized DNA, like Jurassik Park, and some scientists resurrect one of nature's most fearsome predators, the Sabretooth tiger or Smilodon . Scientific ambition turns deadly, however, and when the high voltage fence is opened the creature escape and begins savagely stalking its prey - the human visitors , tourists and scientific.Meanwhile some youngsters enter in the restricted area of the security center and are attacked by a pack of large pre-historical animals which are deadlier and bigger . In addition , a security agent (Stacy Haiduk) and her mate (Brian Wimmer) fight hardly against the carnivorous Smilodons. The Sabretooths, themselves , of course, are the real star stars and they are astounding terrifyingly though not convincing. The giant animals savagely are stalking its prey and the group run afoul and fight against one nature's most fearsome predators. Furthermore a third Sabretooth more dangerous and slow stalks its victims.<br /><br />The movie delivers the goods with lots of blood and gore as beheading, hair-raising chills,full of scares when the Sabretooths appear with mediocre special effects.The story provides exciting and stirring entertainment but it results to be quite boring .The giant animals are majority made by computer generator and seem totally lousy .Middling performances though the players reacting appropriately to becoming food.Actors give vigorously physical performances dodging the beasts ,running,bound and leaps or dangling over walls . And it packs a ridiculous final deadly scene. No for small kids by realistic,gory and violent attack scenes . 
Other films about Sabretooths or Smilodon are the following : Sabretooth(2002)by James R Hickox with Vanessa Angel, David Keith and John Rhys Davies and the much better 10.000 BC(2006) by Roland Emmerich with with Steven Strait, Cliff Curtis and Camilla Belle. This motion picture filled with bloody moments is badly directed by George Miller and with no originality because takes too many elements from previous films. Miller is an Australian director usually working for television (Tidal wave, Journey to the center of the earth, and many others) and occasionally for cinema ( The man from Snowy river, Zeus and Roxanne,Robinson Crusoe ). Rating : Below average, bottom of barrel.
4 It must be assumed that those who praised this film (\\"the greatest filmed opera ever,\\" didn't I read somewhere?) either don't care for opera, don't care for Wagner, or don't care about anything except their desire to appear Cultured. Either as a representation of Wagner's swan-song, or as a movie, this strikes me as an unmitigated disaster, with a leaden reading of the score matched to a tricksy, lugubrious realisation of the text.<br /><br />It's questionable that people with ideas as to what an opera (or, for that matter, a play, especially one by Shakespeare) is \\"about\\" should be allowed anywhere near a theatre or film studio; Syberberg, very fashionably, but without the smallest justification from Wagner's text, decided that Parsifal is \\"about\\" bisexual integration, so that the title character, in the latter stages, transmutes into a kind of beatnik babe, though one who continues to sing high tenor -- few if any of the actors in the film are the singers, and we get a double dose of Armin Jordan, the conductor, who is seen as the face (but not heard as the voice) of Amfortas, and also appears monstrously in double exposure as a kind of Batonzilla or Conductor Who Ate Monsalvat during the playing of the Good Friday music -- in which, by the way, the transcendant loveliness of nature is represented by a scattering of shopworn and flaccid crocuses stuck in ill-laid turf, an expedient which baffles me. In the theatre we sometimes have to piece out such imperfections with our thoughts, but I can't think why Syberberg couldn't splice in, for Parsifal and Gurnemanz, mountain pasture as lush as was provided for Julie Andrews in Sound of Music...<br /><br />The sound is hard to endure, the high voices and the trumpets in particular possessing an aural glare that adds another sort of fatigue to our impatience with the uninspired conducting and paralytic unfolding of the ritual. 
Someone in another review mentioned the 1951 Bayreuth recording, and Knappertsbusch, though his tempi are often very slow, had what Jordan altogether lacks, a sense of pulse, a feeling for the ebb and flow of the music -- and, after half a century, the orchestral sound in that set, in modern pressings, is still superior to this film.
5 Superbly trashy and wondrously unpretentious 80's exploitation, hooray! The pre-credits opening sequences somewhat give the false impression that we're dealing with a serious and harrowing drama, but you need not fear because barely ten minutes later we're up until our necks in nonsensical chainsaw battles, rough fist-fights, lurid dialogs and gratuitous nudity! Bo and Ingrid are two orphaned siblings with an unusually close and even slightly perverted relationship. Can you imagine playfully ripping off the towel that covers your sister's naked body and then stare at her unshaven genitals for several whole minutes? Well, Bo does that to his sister and, judging by her dubbed laughter, she doesn't mind at all. Sick, dude! Anyway, as kids they fled from Russia with their parents, but nasty soldiers brutally slaughtered mommy and daddy. A friendly smuggler took custody over them, however, and even raised and trained Bo and Ingrid into expert smugglers. When the actual plot lifts off, 20 years later, they're facing their ultimate quest as the mythical and incredibly valuable White Fire diamond is coincidentally found in a mine. Very few things in life ever made as little sense as the plot and narrative structure of \\"White Fire\\", but it sure is a lot of fun to watch. Most of the time you have no clue who's beating up who or for what cause (and I bet the actors understood even less) but whatever! The violence is magnificently grotesque and every single plot twist is pleasingly retarded. The script goes totally bonkers beyond repair when suddenly and I won't reveal for what reason Bo needs a replacement for Ingrid and Fred Williamson enters the scene with a big cigar in his mouth and his sleazy black fingers all over the local prostitutes. 
Bo's principal opponent is an Italian chick with big breasts but a hideous accent, the preposterous but catchy theme song plays at least a dozen times throughout the film, there's the obligatory \\"we're-falling-in-love\\" montage and loads of other attractions! My God, what a brilliant experience. The original French title translates itself as \\"Life to Survive\\", which is uniquely appropriate because it makes just as much sense as the rest of the movie: None!
6 I dont know why people think this is such a bad movie. Its got a pretty good plot, some good action, and the change of location for Harry does not hurt either. Sure some of its offensive and gratuitous but this is not the only movie like that. Eastwood is in good form as Dirty Harry, and I liked Pat Hingle in this movie as the small town cop. If you liked DIRTY HARRY, then you should see this one, its a lot better than THE DEAD POOL. 4/5
7 This movie could have been very good, but comes up way short. Cheesy special effects and so-so acting. I could have looked past that if the story wasn't so lousy. If there was more of a background story, it would have been better. The plot centers around an evil Druid witch who is linked to this woman who gets migraines. The movie drags on and on and never clearly explains anything, it just keeps plodding on. Christopher Walken has a part, but it is completely senseless, as is most of the movie. This movie had potential, but it looks like some really bad made for TV movie. I would avoid this movie.
8 I watched this video at a friend's house. I'm glad I did not waste money buying this one. The video cover has a scene from the 1975 movie Capricorn One. The movie starts out with several clips of rocket blow-ups, most not related to manned flight. Sibrel's smoking gun is a short video clip of the astronauts preparing a video broadcast. He edits in his own voice-over instead of letting us listen to what the crew had to say. The video curiously ends with a showing of the Zapruder film. His claims about radiation, shielding, star photography, and others lead me to believe is he extremely ignorant or has some sort of ax to grind against NASA, the astronauts, or American in general. His science is bad, and so is this video.
9 A friend of mine bought this film for 1, and even then it was grossly overpriced. Despite featuring big names such as Adam Sandler, Billy Bob Thornton and the incredibly talented Burt Young, this film was about as funny as taking a chisel and hammering it straight through your earhole. It uses tired, bottom of the barrel comedic techniques - consistently breaking the fourth wall as Sandler talks to the audience, and seemingly pointless montages of 'hot girls'.<br /><br />Adam Sandler plays a waiter on a cruise ship who wants to make it as a successful comedian in order to become successful with women. When the ship's resident comedian - the shamelessly named 'Dickie' due to his unfathomable success with the opposite gender - is presumed lost at sea, Sandler's character Shecker gets his big break. Dickie is not dead, he's rather locked in the bathroom, presumably sea sick.<br /><br />Perhaps from his mouth he just vomited the worst film of all time.
10 <br /><br />This movie is full of references. Like \\"Mad Max II\\", \\"The wild one\\" and many others. The ladybugs face its a clear reference (or tribute) to Peter Lorre. This movie is a masterpiece. Well talk much more about in the future.
# checking dimensions of dataset
dim(movie_review)
[1] 5000 3
The dataset consists of 5000 movie reviews, each of which is marked as positive (1) or negative (0) in the ‘sentiment’ column.
Now, we need to clean the data up a bit. To make our lives easier and limit the amount of processing power, let’s use the first 3000 reviews. They are located in the ‘review’ column.
Vectorization
Texts can take up a lot of memory themselves, but vectorized texts typically do not. To represent documents in vector space, we first have to create mappings from terms to term IDs. We call them terms instead of words because they can be arbitrary n-grams, not just single words. We represent a set of documents as a sparse matrix, where each row corresponds to a document and each column corresponds to a term. This can be done in two ways: using the vocabulary itself or by feature hashing.
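To make the vocabulary-based mapping concrete, here is a small base R sketch that builds a document-term matrix by hand from made-up tokenized documents. The names toy.docs, toy.vocab and toy.dtm are our own, and the result is a dense matrix for readability. text2vec stores the same information in a sparse format.

```r
# Made-up tokenized documents
toy.docs <- list(c("good", "movie", "good", "plot"),
                 c("bad", "movie"),
                 c("good", "acting"))

# The vocabulary maps each distinct term to a column ID (its position in this vector)
toy.vocab <- sort(unique(unlist(toy.docs)))

# For each document, count how often every vocabulary term occurs
toy.dtm <- t(vapply(toy.docs,
                    function(d) tabulate(match(d, toy.vocab), nbins = length(toy.vocab)),
                    integer(length(toy.vocab))))
dimnames(toy.dtm) <- list(paste0("doc", seq_along(toy.docs)), toy.vocab)

toy.dtm
```

Each row is a document and each column a term. Most entries are zero even in this tiny example, which is why a sparse representation pays off on real corpora.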
Let’s perform tokenization and lowercase each token:
# lowercases the text of the first 3000 reviews
tokens <- tolower(movie_review$review[1:3000])
# performs tokenization
tokens <- word_tokenizer(tokens)
# prints first two tokenized rows
head(tokens, 2)
[[1]]
[1] "with" "all" "this" "stuff" "going"
[6] "down" "at" "the" "moment" "with"
[11] "mj" "i've" "started" "listening" "to"
[16] "his" "music" "watching" "the" "odd"
[21] "documentary" "here" "and" "there" "watched"
[26] "the" "wiz" "and" "watched" "moonwalker"
[31] "again" "maybe" "i" "just" "want"
[36] "to" "get" "a" "certain" "insight"
[41] "into" "this" "guy" "who" "i"
[46] "thought" "was" "really" "cool" "in"
[51] "the" "eighties" "just" "to" "maybe"
[56] "make" "up" "my" "mind" "whether"
[61] "he" "is" "guilty" "or" "innocent"
[66] "moonwalker" "is" "part" "biography" "part"
[71] "feature" "film" "which" "i" "remember"
[76] "going" "to" "see" "at" "the"
[81] "cinema" "when" "it" "was" "originally"
[86] "released" "some" "of" "it" "has"
[91] "subtle" "messages" "about" "mj's" "feeling"
[96] "towards" "the" "press" "and" "also"
[101] "the" "obvious" "message" "of" "drugs"
[106] "are" "bad" "m'kay" "br" "br"
[111] "visually" "impressive" "but" "of" "course"
[116] "this" "is" "all" "about" "michael"
[121] "jackson" "so" "unless" "you" "remotely"
[126] "like" "mj" "in" "anyway" "then"
[131] "you" "are" "going" "to" "hate"
[136] "this" "and" "find" "it" "boring"
[141] "some" "may" "call" "mj" "an"
[146] "egotist" "for" "consenting" "to" "the"
[151] "making" "of" "this" "movie" "but"
[156] "mj" "and" "most" "of" "his"
[161] "fans" "would" "say" "that" "he"
[166] "made" "it" "for" "the" "fans"
[171] "which" "if" "true" "is" "really"
[176] "nice" "of" "him" "br" "br"
[181] "the" "actual" "feature" "film" "bit"
[186] "when" "it" "finally" "starts" "is"
[191] "only" "on" "for" "20" "minutes"
[196] "or" "so" "excluding" "the" "smooth"
[201] "criminal" "sequence" "and" "joe" "pesci"
[206] "is" "convincing" "as" "a" "psychopathic"
[211] "all" "powerful" "drug" "lord" "why"
[216] "he" "wants" "mj" "dead" "so"
[221] "bad" "is" "beyond" "me" "because"
[226] "mj" "overheard" "his" "plans" "nah"
[231] "joe" "pesci's" "character" "ranted" "that"
[236] "he" "wanted" "people" "to" "know"
[241] "it" "is" "he" "who" "is"
[246] "supplying" "drugs" "etc" "so" "i"
[251] "dunno" "maybe" "he" "just" "hates"
[256] "mj's" "music" "br" "br" "lots"
[261] "of" "cool" "things" "in" "this"
[266] "like" "mj" "turning" "into" "a"
[271] "car" "and" "a" "robot" "and"
[276] "the" "whole" "speed" "demon" "sequence"
[281] "also" "the" "director" "must" "have"
[286] "had" "the" "patience" "of" "a"
[291] "saint" "when" "it" "came" "to"
[296] "filming" "the" "kiddy" "bad" "sequence"
[301] "as" "usually" "directors" "hate" "working"
[306] "with" "one" "kid" "let" "alone"
[311] "a" "whole" "bunch" "of" "them"
[316] "performing" "a" "complex" "dance" "scene"
[321] "br" "br" "bottom" "line" "this"
[326] "movie" "is" "for" "people" "who"
[331] "like" "mj" "on" "one" "level"
[336] "or" "another" "which" "i" "think"
[341] "is" "most" "people" "if" "not"
[346] "then" "stay" "away" "it" "does"
[351] "try" "and" "give" "off" "a"
[356] "wholesome" "message" "and" "ironically" "mj's"
[361] "bestest" "buddy" "in" "this" "movie"
[366] "is" "a" "girl" "michael" "jackson"
[371] "is" "truly" "one" "of" "the"
[376] "most" "talented" "people" "ever" "to"
[381] "grace" "this" "planet" "but" "is"
[386] "he" "guilty" "well" "with" "all"
[391] "the" "attention" "i've" "gave" "this"
[396] "subject" "hmmm" "well" "i" "don't"
[401] "know" "because" "people" "can" "be"
[406] "different" "behind" "closed" "doors" "i"
[411] "know" "this" "for" "a" "fact"
[416] "he" "is" "either" "an" "extremely"
[421] "nice" "but" "stupid" "guy" "or"
[426] "one" "of" "the" "most" "sickest"
[431] "liars" "i" "hope" "he" "is"
[436] "not" "the" "latter"
[[2]]
[1] "the" "classic" "war" "of" "the"
[6] "worlds" "by" "timothy" "hines" "is"
[11] "a" "very" "entertaining" "film" "that"
[16] "obviously" "goes" "to" "great" "effort"
[21] "and" "lengths" "to" "faithfully" "recreate"
[26] "h" "g" "wells" "classic" "book"
[31] "mr" "hines" "succeeds" "in" "doing"
[36] "so" "i" "and" "those" "who"
[41] "watched" "his" "film" "with" "me"
[46] "appreciated" "the" "fact" "that" "it"
[51] "was" "not" "the" "standard" "predictable"
[56] "hollywood" "fare" "that" "comes" "out"
[61] "every" "year" "e.g" "the" "spielberg"
[66] "version" "with" "tom" "cruise" "that"
[71] "had" "only" "the" "slightest" "resemblance"
[76] "to" "the" "book" "obviously" "everyone"
[81] "looks" "for" "different" "things" "in"
[86] "a" "movie" "those" "who" "envision"
[91] "themselves" "as" "amateur" "critics" "look"
[96] "only" "to" "criticize" "everything" "they"
[101] "can" "others" "rate" "a" "movie"
[106] "on" "more" "important" "bases" "like"
[111] "being" "entertained" "which" "is" "why"
[116] "most" "people" "never" "agree" "with"
[121] "the" "critics" "we" "enjoyed" "the"
[126] "effort" "mr" "hines" "put" "into"
[131] "being" "faithful" "to" "h.g" "wells"
[136] "classic" "novel" "and" "we" "found"
[141] "it" "to" "be" "very" "entertaining"
[146] "this" "made" "it" "easy" "to"
[151] "overlook" "what" "the" "critics" "perceive"
[156] "to" "be" "its" "shortcomings"
Note that text2vec provides a few tokenizer functions (see ?tokenizers). These are just simple wrappers for the base::gsub() function and are not very fast or flexible. If you need something smarter or faster, you can use the tokenizers package.
We can create an iterator over the tokens using itoken(). An iterator is an object that can be iterated upon, meaning that you can traverse through all of its values. In our example, we’ll be able to traverse through the tokens of each row using our newly generated iterator, it. The main thing to note here is that this makes the approach less memory-intensive, which will turn out to be helpful.
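The idea of traversing data piece by piece can be sketched with a plain R closure. The helper make.chunk.iterator below is hypothetical and is not how text2vec implements itoken(); it only shows the pattern of handing out one chunk at a time.

```r
# Hypothetical helper for illustration only; not part of text2vec
make.chunk.iterator <- function(x, chunk.size) {
  position <- 0
  list(
    has.next = function() position < length(x),
    next.chunk = function() {
      idx <- seq(position + 1, min(position + chunk.size, length(x)))
      position <<- max(idx)   # remember where we stopped
      x[idx]
    }
  )
}

toy.docs <- list(c("a", "b"), c("c"), c("d", "e"), c("f"), c("g"))
toy.it <- make.chunk.iterator(toy.docs, chunk.size = 2)

n.chunks <- 0
n.tokens <- 0
while (toy.it$has.next()) {
  chunk <- toy.it$next.chunk()                    # only this chunk is processed at a time
  n.chunks <- n.chunks + 1
  n.tokens <- n.tokens + length(unlist(chunk))
}
c(chunks = n.chunks, tokens = n.tokens)
```

In this sketch the full list already sits in memory, so nothing is saved. The benefit appears when each chunk is read or tokenized lazily, so that only a small part of the corpus needs to be held at once.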
# iterates over each token
it <- itoken(tokens, ids = movie_review$id[1:3000], progressbar = FALSE)
# prints iterator
it
<itoken>
Inherits from: <CallbackIterator>
Public:
callback: function (x)
clone: function (deep = FALSE)
initialize: function (x, callback = identity)
is_complete: active binding
length: active binding
move_cursor: function ()
nextElem: function ()
x: GenericIterator, iterator, R6
Vocabulary-based Vectorization
As stated above, we represent our corpus as a document-feature matrix (DFM). The process for text2vec is quite different from that of quanteda, though the intuition is generally aligned. Effectively, the text2vec design is intended to be faster and more memory-efficient; the downside is that it’s a little more opaque. The first step is to create our vocabulary for the DFM. That is simple since we have already created an iterator; all we need to do is pass our iterator as an argument to create_vocabulary().
# build the vocabulary
v <- create_vocabulary(it)
# print vocabulary
v
Number of docs: 3000
0 stopwords: ...
ngram_min = 1; ngram_max = 1
Vocabulary:
term term_count doc_count
<char> <int> <int>
1: 0.3 1 1
2: 0.48 1 1
3: 0.5 1 1
4: 0.89 1 1
5: 00015 1 1
---
33487: to 16370 2826
33488: of 17409 2829
33489: and 19761 2892
33490: a 19776 2910
33491: the 40246 2975
# checking dimensions
dim(v)
[1] 33491 3
We can remove stop words or otherwise prune our vocabulary with prune_vocabulary(). Here we keep the terms that occur at least 10 times and drop the terms that appear in more than 20% of the documents.
# prunes vocabulary
v <- prune_vocabulary(v, term_count_min = 10, doc_proportion_max = 0.2)
# check dimensions
dim(v)
[1] 5325 3
If we check the dimensions after pruning our vocabulary, we can see that we have fewer terms. We have removed both the very rare and the very common words, so our vocabulary is left with more informative terms.
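The two pruning criteria, a minimum term count and a maximum document proportion, can be mimicked in base R on a toy corpus. The documents and the thresholds (2 occurrences, 75% of documents) are made up for this example and play the roles of term_count_min and doc_proportion_max.

```r
# Made-up tokenized documents
toy.docs <- list(c("the", "movie", "was", "great"),
                 c("the", "plot", "was", "thin"),
                 c("the", "movie", "had", "great", "music"),
                 c("the", "music", "was", "great"))

term.count <- table(unlist(toy.docs))                  # total occurrences of each term
doc.count  <- table(unlist(lapply(toy.docs, unique)))  # number of documents containing each term
doc.prop   <- doc.count[names(term.count)] / length(toy.docs)

# Keep terms seen at least twice that appear in at most 75% of the documents
keep <- names(term.count)[as.vector(term.count >= 2 & doc.prop <= 0.75)]
keep
```

Here "the" is dropped because it occurs in every document, while "plot", "thin" and "had" are dropped because they occur only once. What remains are the terms that are frequent enough to matter but not so common that they say nothing.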
Before we can create our DFM, we’ll need to vectorize our vocabulary with vocab_vectorizer().
# creates a closure that helps transform list of tokens into vector space
vectorizer <- vocab_vectorizer(v)
We now have everything we need to create a DFM. We can pass our iterator of tokens, our vectorizer, and a matrix type (either dgCMatrix or dgTMatrix) to create_dtm().
# creates document term matrix
dtm <- create_dtm(it, vectorizer, type = "dgTMatrix")
Now that we have our DTM, we can create our topic model using LDA$new().
# create new LDA model
lda_model <- LDA$new(n_topics = 10, doc_topic_prior = 0.1,
                     topic_word_prior = 0.01)
# print other methods for LDA
lda_model
<WarpLDA>
Inherits from: <LDA>
Public:
clone: function (deep = FALSE)
components: active binding
fit_transform: function (x, n_iter = 1000, convergence_tol = 0.001, n_check_convergence = 10,
get_top_words: function (n = 10, topic_number = 1L:private$n_topics, lambda = 1)
initialize: function (n_topics = 10L, doc_topic_prior = 50/n_topics, topic_word_prior = 1/n_topics,
plot: function (lambda.step = 0.1, reorder.topics = FALSE, doc_len = private$doc_len,
topic_word_distribution: active binding
transform: function (x, n_iter = 1000, convergence_tol = 0.001, n_check_convergence = 10,
Private:
calc_pseudo_loglikelihood: function (ptr = private$ptr)
check_convert_input: function (x)
components_: NULL
doc_len: NULL
doc_topic_distribution: function ()
doc_topic_distribution_with_prior: function ()
doc_topic_matrix: NULL
doc_topic_prior: 0.1
fit_transform_internal: function (model_ptr, n_iter, convergence_tol, n_check_convergence,
get_c_all: function ()
get_c_all_local: function ()
get_doc_topic_matrix: function (prt, nr)
get_topic_word_count: function ()
init_model_dtm: function (x, ptr = private$ptr)
internal_matrix_formats: list
is_initialized: FALSE
n_iter_inference: 10
n_topics: 10
ptr: NULL
reset_c_local: function ()
run_iter_doc: function (update_topics = TRUE, ptr = private$ptr)
run_iter_word: function (update_topics = TRUE, ptr = private$ptr)
seeds: 877721554.682558 1522846961.08174
set_c_all: function (x)
set_internal_matrix_formats: function (sparse = NULL, dense = NULL)
topic_word_distribution_with_prior: function ()
topic_word_prior: 0.01
transform_internal: function (x, n_iter = 1000, convergence_tol = 0.001, n_check_convergence = 10,
vocabulary: NULL
After printing lda_model, we can see there are other methods we can use with the model.
Note: the only accessible methods are the ones under ‘Public’. Documentation for all methods and arguments is available here on page 22.
Fitting
We can fit our model with $fit_transform:
# fitting model
doc_topic_distr <-
  lda_model$fit_transform(x = dtm, n_iter = 1000,
                          convergence_tol = 0.001, n_check_convergence = 25,
                          progressbar = FALSE)
INFO [01:55:46.229] early stopping at 225 iteration
INFO [01:55:50.903] early stopping at 50 iteration
The doc_topic_distr object is a matrix where each row is a document, each column is a topic, and each cell entry is the proportion of the document estimated to belong to that topic. That is, each row is the topic distribution for a document.
For example, here’s the topic distribution for the very first document:
barplot(doc_topic_distr[1, ], xlab = "topic",
ylab = "proportion", ylim = c(0,1),
names.arg = 1:ncol(doc_topic_distr))
Describing Topics: Top Words
We can also use the $get_top_words method to get the top words for each topic.
# get top n words for topics 1, 5, and 10
lda_model$get_top_words(n = 10, topic_number = c(1L, 5L, 10L),
                        lambda = 1)
[,1] [,2] [,3]
[1,] "did" "horror" "life"
[2,] "funny" "man" "love"
[3,] "i'm" "scene" "between"
[4,] "know" "there's" "these"
[5,] "actors" "little" "those"
[6,] "watching" "scenes" "seems"
[7,] "say" "pretty" "father"
[8,] "ever" "head" "work"
[9,] "didn't" "house" "always"
[10,] "films" "director" "where"
Top words can also be sorted by “relevance”, which also takes into account the frequency of each word in the corpus (0 < lambda < 1).
The creator recommends setting lambda to be between 0.2 and 0.4. Here’s the difference compared to a lambda of 1:
lda_model$get_top_words(n = 10, topic_number = c(1L, 5L, 10L),
                        lambda = 0.2)
[,1] [,2] [,3]
[1,] "zombie" "horror" "relationship"
[2,] "funny" "gore" "relationships"
[3,] "zombies" "starts" "arthur"
[4,] "reviews" "killer" "childhood"
[5,] "reminded" "kills" "moral"
[6,] "rubbish" "car" "de"
[7,] "utter" "police" "office"
[8,] "laughing" "head" "loving"
[9,] "laugh" "thriller" "class"
[10,] "i'd" "slasher" "anna"
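To see why a lower lambda pushes generic words down the list, here is a small base R sketch of the relevance score as defined in the LDAvis literature (Sievert and Shirley): lambda * log p(w | t) + (1 - lambda) * log(p(w | t) / p(w)). We assume this is the quantity the lambda argument controls. The matrix `phi` and the vector `p_w` are hypothetical toy values, not estimates from our model.

```r
# Hypothetical topic-word probabilities (rows = topics, columns = words)
phi <- rbind(
  topic1 = c(film = 0.50, zombie = 0.30, love = 0.20),
  topic2 = c(film = 0.60, zombie = 0.02, love = 0.38)
)
# Hypothetical marginal word probabilities in the corpus
p_w <- c(film = 0.55, zombie = 0.15, love = 0.30)

# Relevance: lambda * log p(w | t) + (1 - lambda) * log(p(w | t) / p(w))
relevance <- function(phi, p_w, lambda) {
  lift <- sweep(phi, 2, p_w, "/")  # divide each column by that word's p(w)
  lambda * log(phi) + (1 - lambda) * log(lift)
}

# Rank the words within topic 1 under two settings of lambda
top_by_prob <- names(sort(relevance(phi, p_w, 1)["topic1", ], decreasing = TRUE))
top_by_rel  <- names(sort(relevance(phi, p_w, 0.2)["topic1", ], decreasing = TRUE))
top_by_prob
top_by_rel
```

With lambda = 1 the ranking is purely by probability within the topic, so the corpus-wide frequent word comes first. With lambda = 0.2 the word that is unusually concentrated in the topic moves to the top, which mirrors the shift from "did" and "know" to "zombie" and "gore" in the output above.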
Apply Learned Model to New Data
One thing we occasionally may be interested in doing is understanding how well our model fits the data. To do so, we can borrow from our supervised learning insights and apply the estimated model to new data. From that, we’ll obtain a document-topic distribution for the new documents:
# creating iterator
it2 <- itoken(movie_review$review[3001:5000], tolower,
              word_tokenizer, ids = movie_review$id[3001:5000])
# creating new DFM
new_dtm <- create_dtm(it2, vectorizer, type = "dgTMatrix")
We will have to use $transform instead of $fit_transform since we don’t want to refit the model (we are attempting to predict the last 2000 reviews).
new_doc_topiic_distr = lda_model$transform(new_dtm)
INFO [01:55:56.388] early stopping at 30 iteration
One widely used approach for model hyper-parameter tuning is validation of per-word perplexity on a hold-out set. This is quite easy with text2vec.
Remember that we’ve fit the model on only the first 3000 reviews and predicted the last 2000. Therefore, we will calculate the held-out perplexity on these 2000 docs as follows:
# calculates held-out perplexity of the new documents under the fitted model
perplexity(new_dtm, topic_word_distribution = lda_model$topic_word_distribution,
           doc_topic_distribution = new_doc_topiic_distr)
[1] 2312.273
The lower the perplexity, the better. We can imagine adjusting our hyperparameters, re-estimating the model, and comparing perplexity across specifications to evaluate model performance. Still, perplexity as a measure has its own concerns: it doesn’t directly provide insight on whether or not the topics make sense, and it tends to prefer bigger models over smaller ones.
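For intuition about what the number measures, here is a base R sketch of per-word perplexity on a tiny made-up example: the exponential of the negative average log-likelihood per token, where the probability of a term in a document is the document's topic proportions multiplied by the topic-word probabilities. All three matrices are hypothetical, and this is a hand-rolled illustration of the standard definition rather than the text2vec implementation.

```r
# Hypothetical document-term counts (2 documents, 3 terms)
toy_dtm <- rbind(c(2, 1, 0),
                 c(0, 1, 3))
# Hypothetical document-topic proportions (each row sums to 1)
theta <- rbind(c(0.9, 0.1),
               c(0.2, 0.8))
# Hypothetical topic-word probabilities (each row sums to 1)
phi <- rbind(c(0.6, 0.3, 0.1),
             c(0.1, 0.2, 0.7))

# Probability of each term in each document under the model
p_dw <- theta %*% phi

# Per-word perplexity: exp of the negative average log-likelihood per token
# (zero counts contribute nothing, so only observed cells are summed)
observed <- toy_dtm > 0
log_lik <- sum(toy_dtm[observed] * log(p_dw[observed]))
toy_perplexity <- exp(-log_lik / sum(toy_dtm))
toy_perplexity
```

A model that spread its probability uniformly over the three terms would score exactly 3 here, so any value below 3 means the toy model predicts the observed tokens better than guessing uniformly.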
Visualization
Normally it would take one line to run the visualization for the LDA model, using the method $plot().
Let’s install and load the library that the visuals depend on:
#install.packages('LDAvis')
library(LDAvis)
# creating plot
lda_model$plot()
Loading required namespace: servr
Structural Topic Model
Imagine you are interested in the topics that are explored in political speeches, and specifically whether Republicans and Democrats focus on different topics. One approach would be to estimate an LDA model like above and then average the topic proportions by speaker.
Of course, that seems inefficient. We might want to instead leverage the information on the speech itself as part of the estimation of the topics. That is, we are estimating topical prevalence, and we know that there’s a different speaker, so we should be incorporating that information in estimating the topics. That’s the fundamental idea with Structural Topic Models (STM).
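For contrast, the two-step approach described above (estimate the topics first, then average by group) can be sketched in base R. The matrix `theta` and the vector `party` are hypothetical stand-ins for a fitted document-topic matrix and a speaker covariate; nothing here comes from a real model.

```r
# Hypothetical document-topic proportions for four speeches, two topics
theta <- rbind(c(0.8, 0.2),
               c(0.6, 0.4),
               c(0.3, 0.7),
               c(0.1, 0.9))
colnames(theta) <- c("topic1", "topic2")
# Hypothetical party label for each speech
party <- c("D", "D", "R", "R")

# Average each topic's proportion within party
by_party <- aggregate(as.data.frame(theta), by = list(party = party), FUN = mean)
by_party
```

The averages are computed after the fact, so the party label plays no role in how the topics themselves are estimated. STM instead brings the covariate into the estimation step.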
Front-end Matters
STM has really fantastic documentation and a host of related packages for added functionality. You can find the STM website here. Let’s load the package. Note that this will almost certainly take a few minutes given all of the dependencies.
#install.packages("stm")
library(stm)
stm v1.3.7 successfully loaded. See ?stm for help.
Papers, resources, and other materials at structuraltopicmodel.com
library(quanteda)
Package version: 4.1.0
Unicode version: 14.0
ICU version: 71.1
Parallel computing: disabled
See https://quanteda.io for tutorials and examples.
Attaching package: 'quanteda'
The following object is masked from 'package:tm':
stopwords
The following objects are masked from 'package:NLP':
meta, meta<-
Creating the DFM
We’ll continue to use the movie reviews dataset. Now, we’ll leverage the sentiment variable included in the dataset as a covariate in our estimates of topical prevalence; that is, we expect some topics to be more prevalent in positive reviews than in negative reviews, and vice versa. The variable is coded as 0 or 1, with 0 indicating a negative review and 1 indicating a positive review.
table(movie_review$sentiment)
0 1
2483 2517
STM works differently than text2vec, so we’ll need to have our data in a different format now.
myTokens <- tokens(movie_review$review,
                   remove_punct = TRUE) %>%
  tokens_remove(stopwords("en"))
myDfm <- dfm(myTokens, tolower = TRUE)
dim(myDfm)
[1] 5000 46795
Structural Topic model
Let’s go ahead and estimate our structural topic model now. We’ll incorporate the sentiment variable as a predictor on prevalence.
# choose our number of topics
k <- 5

# specify model
myModel <- stm(myDfm,
               K = k,
               prevalence = ~ sentiment,
               data = movie_review,
               max.em.its = 1000,
               seed = 1234,
               init.type = "Spectral")
Beginning Spectral Initialization
Calculating the gram matrix...
Using only 10000 most frequent terms during initialization...
Finding anchor words...
.....
Recovering initialization...
....................................................................................................
Initialization complete.
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 1 (approx. per word bound = -8.600)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 2 (approx. per word bound = -8.036, relative change = 6.567e-02)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 3 (approx. per word bound = -8.001, relative change = 4.337e-03)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 4 (approx. per word bound = -7.992, relative change = 1.060e-03)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 5 (approx. per word bound = -7.988, relative change = 5.005e-04)
Topic 1: film, br, <, >, movie
Topic 2: >, <, br, film, movie
Topic 3: film, br, <, >, one
Topic 4: one, br, <, >, show
Topic 5: movie, like, one, just, br
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 6 (approx. per word bound = -7.986, relative change = 2.995e-04)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 7 (approx. per word bound = -7.984, relative change = 2.025e-04)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 8 (approx. per word bound = -7.983, relative change = 1.519e-04)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 9 (approx. per word bound = -7.982, relative change = 1.200e-04)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 10 (approx. per word bound = -7.981, relative change = 1.004e-04)
Topic 1: film, one, just, like, movie
Topic 2: >, <, br, film, one
Topic 3: film, br, <, >, one
Topic 4: one, show, like, good, film
Topic 5: movie, like, just, one, good
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 11 (approx. per word bound = -7.981, relative change = 8.937e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 12 (approx. per word bound = -7.980, relative change = 7.748e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 13 (approx. per word bound = -7.979, relative change = 6.546e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 14 (approx. per word bound = -7.979, relative change = 5.645e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 15 (approx. per word bound = -7.979, relative change = 4.526e-05)
Topic 1: film, one, just, even, like
Topic 2: >, <, br, film, one
Topic 3: film, one, story, br, <
Topic 4: one, show, good, like, film
Topic 5: movie, like, just, one, good
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 16 (approx. per word bound = -7.978, relative change = 3.836e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 17 (approx. per word bound = -7.978, relative change = 3.395e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 18 (approx. per word bound = -7.978, relative change = 3.022e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 19 (approx. per word bound = -7.978, relative change = 2.789e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 20 (approx. per word bound = -7.977, relative change = 2.811e-05)
Topic 1: film, one, just, even, like
Topic 2: >, <, br, film, one
Topic 3: film, one, story, life, films
Topic 4: one, show, good, best, film
Topic 5: movie, like, just, one, good
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 21 (approx. per word bound = -7.977, relative change = 2.682e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 22 (approx. per word bound = -7.977, relative change = 2.724e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 23 (approx. per word bound = -7.977, relative change = 2.685e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 24 (approx. per word bound = -7.977, relative change = 2.833e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 25 (approx. per word bound = -7.976, relative change = 1.984e-05)
Topic 1: film, one, just, even, like
Topic 2: >, <, br, film, one
Topic 3: film, one, story, life, films
Topic 4: one, show, good, best, film
Topic 5: movie, like, just, one, good
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 26 (approx. per word bound = -7.976, relative change = 1.832e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 27 (approx. per word bound = -7.976, relative change = 1.726e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 28 (approx. per word bound = -7.976, relative change = 1.615e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 29 (approx. per word bound = -7.976, relative change = 1.577e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 30 (approx. per word bound = -7.976, relative change = 1.422e-05)
Topic 1: film, one, just, even, bad
Topic 2: >, <, br, film, one
Topic 3: film, one, story, life, films
Topic 4: one, show, good, best, also
Topic 5: movie, like, just, one, good
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 31 (approx. per word bound = -7.976, relative change = 1.347e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 32 (approx. per word bound = -7.975, relative change = 1.307e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 33 (approx. per word bound = -7.975, relative change = 1.156e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 34 (approx. per word bound = -7.975, relative change = 1.081e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 35 (approx. per word bound = -7.975, relative change = 1.016e-05)
Topic 1: film, one, just, even, bad
Topic 2: >, <, br, film, one
Topic 3: film, one, story, life, films
Topic 4: one, show, good, best, film
Topic 5: movie, like, just, one, good
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Model Converged
Note that what’s significantly different from before is the added prevalence formula. As we discuss in lecture, you can also include variables as content predictors.
labelTopics(myModel)
Topic 1 Top Words:
Highest Prob: film, one, just, even, bad, like, horror
FREX: slasher, scarecrows, zombie, zombies, kornbluth, scarecrow, seagal
Lift: 2600, addy, amick, antichrist, aranoa, ba, babban
Score: zombie, slasher, kornbluth, zombies, scarecrows, bad, horror
Topic 2 Top Words:
Highest Prob: >, <, br, film, one, movie, like
FREX: >, <, br, zizek, miya, |, aztec
Lift: 1-2, 1-to-10-star, 1.0, 10-minute, 102, 102nd, 11-related
Score: >, <, br, miya, zizek, slugs, oshii
Topic 3 Top Words:
Highest Prob: film, one, story, life, films, also, love
FREX: bettie, mathieu, sidney, macarthur, chavez, israel, lumet
Lift: 1918, 1920, 53, adapt, addressing, aishwarya, albniz
Score: film, bettie, mathieu, macarthur, aids, antwone, flamenco
Topic 4 Top Words:
Highest Prob: one, show, best, good, film, also, man
FREX: wwe, rochester, triple, kolchak, spock, taker, christy
Lift: 1692, 1931, absurdist, adrien, adversaries, alaric, alekos
Score: wwe, taker, bubba, benoit, booker, kolchak, rochester
Topic 5 Top Words:
Highest Prob: movie, like, just, one, good, film, really
FREX: movie, movies, watched, liked, kids, funny, loved
Lift: _____, ______, _real_, @ers, @k, 00015, 1,65m
Score: movie, movies, bad, stupid, like, think, really
The topics again look reasonable, and are generally similar to the topics we estimated earlier. We can go a step further by plotting out the top topics (as groups of words associated with that topic) and their estimated frequency across the corpus.
plot(myModel, type = "summary")
One thing we might want to do is to extract the topics and to assign them to the vector of document proportions; this is often useful if we’re using those topic proportions in any sort of downstream analysis, including just a visualization. The following extracts the top words (here, by frex, though you can update that to any of the other three top word sets). Then it iterates through the extracted sets and collapses the strings so the tokens are separated by an underscore; this is useful as a variable name for those downstream analyses.
# get the words
myTopicNames <- labelTopics(myModel, n=4)$frex

# set up an empty vector
myTopicLabels <- rep(NA, k)

# set up a loop to go through the topics and collapse the words to a single name
for (i in 1:k){
  myTopicLabels[i] <- paste(myTopicNames[i,], collapse = "_")
}

# print the names
myTopicLabels
[1] "slasher_scarecrows_zombie_zombies" ">_<_br_zizek"
[3] "bettie_mathieu_sidney_macarthur" "wwe_rochester_triple_kolchak"
[5] "movie_movies_watched_liked"
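The same collapse can be written without an explicit loop. Here is a minimal base R sketch using `apply()` over the rows of a small matrix; `toy_names` is a hypothetical stand-in for the matrix of top words we extracted above, not actual model output.

```r
# Hypothetical matrix of top words (rows = topics), standing in for
# the matrix of FREX words extracted from the model
toy_names <- rbind(c("slasher", "zombie", "gore"),
                   c("wwe", "spock", "kolchak"))

# Collapse each row into a single underscore-separated label
toy_labels <- apply(toy_names, 1, paste, collapse = "_")
toy_labels
```

Either version gives one label per topic; the loop is easier to read step by step, while `apply()` is more compact.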
Estimate Effect
Recall that we included sentiment as a predictor variable on topical prevalence. We can extract the effect of the predictor here using the estimateEffect() function, which takes as arguments a formula, the stm model object, and the metadata containing the predictor variable.
Once we’ve run the function, we can plot the estimated effects of sentiment on topic prevalence for each of the estimated topics. With a dichotomous predictor variable, we’ll plot these out solely as the difference (method = "difference") in topic prevalence across the values of the predictor. Here, our estimate indicates how much more (or less) the topic is discussed when the sentiment of the review is positive.
# estimate effects
modelEffects <- estimateEffect(formula = 1:k ~ sentiment,
                               stmobj = myModel,
                               metadata = movie_review)

# plot effects
myRows <- 2
par(mfrow = c(myRows, 3), bty = "n", lwd = 2)
for (i in 1:k){
  plot.estimateEffect(modelEffects,
                      covariate = "sentiment",
                      xlim = c(-.25, .25),
                      model = myModel,
                      topics = modelEffects$topics[i],
                      method = "difference",
                      cov.value1 = 1,
                      cov.value2 = 0,
                      main = myTopicLabels[i],
                      printlegend = FALSE,
                      linecol = "grey26",
                      labeltype = "custom",
                      verbose.labels = FALSE,
                      custom.labels = c(""))
  par(new = FALSE)
}
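To build intuition for what the "difference" in these plots represents, here is a simplified base R sketch: with a 0/1 predictor, regressing a topic's proportion on the predictor gives a slope equal to the difference in mean proportions between the two groups. The vectors below are hypothetical. Note also that this sketch treats the topic proportions as known, whereas estimateEffect() additionally accounts for the uncertainty in the estimated proportions, so this is an approximation of the idea rather than a reimplementation.

```r
# Hypothetical proportions of one topic across six reviews
topic_prop <- c(0.10, 0.20, 0.15, 0.40, 0.50, 0.45)
# Hypothetical sentiment labels (0 = negative, 1 = positive)
sentiment <- c(0, 0, 0, 1, 1, 1)

# Regress the topic proportion on the binary covariate
fit <- lm(topic_prop ~ sentiment)

# With a 0/1 predictor, the slope equals the difference in group means
slope <- unname(coef(fit)["sentiment"])
mean_diff <- mean(topic_prop[sentiment == 1]) - mean(topic_prop[sentiment == 0])
c(slope = slope, mean_diff = mean_diff)
```

A positive value means the topic takes up a larger share of positive reviews; a negative value means it is more prevalent in negative reviews.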
Choosing K
I’m sure you were thinking “How did she select 5 topics?” Well, the answer is that it was just a number that I picked out of thin air. The choice of the number of topics, typically denoted K, is one of the areas where the design of topic models lets us as researchers down a bit. While some approaches have been proposed, none have really gained traction. STM includes an approach that we won’t explore, based on work by David Mimno, that automatically selects the number of topics; in practice, it normally results in far more topics than a human would be likely to choose.
With all that said, there is some functionality included with STM to explore different specifications and to try to at least get some idea of how different approaches perform. searchK() lets you estimate a series of different models, then you can plot a series of different evaluation metrics across those choices.
differentKs <- searchK(myDfm,
                       K = c(5, 25, 50),
                       prevalence = ~ sentiment,
                       N = 250,
                       data = movie_review,
                       max.em.its = 1000,
                       init.type = "Spectral")
Beginning Spectral Initialization
Calculating the gram matrix...
Using only 10000 most frequent terms during initialization...
Finding anchor words...
.....
Recovering initialization...
....................................................................................................
Initialization complete.
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 1 (approx. per word bound = -8.598)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 2 (approx. per word bound = -8.030, relative change = 6.606e-02)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 3 (approx. per word bound = -7.995, relative change = 4.335e-03)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 4 (approx. per word bound = -7.988, relative change = 9.748e-04)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 5 (approx. per word bound = -7.984, relative change = 4.582e-04)
Topic 1: >, br, <, film, one
Topic 2: movie, like, just, one, film
Topic 3: >, <, br, film, movie
Topic 4: film, <, br, >, one
Topic 5: >, br, <, one, show
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 6 (approx. per word bound = -7.982, relative change = 2.807e-04)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 7 (approx. per word bound = -7.980, relative change = 1.973e-04)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 8 (approx. per word bound = -7.979, relative change = 1.478e-04)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 9 (approx. per word bound = -7.978, relative change = 1.109e-04)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 10 (approx. per word bound = -7.977, relative change = 8.100e-05)
Topic 1: film, one, like, characters, just
Topic 2: movie, like, film, just, one
Topic 3: >, <, br, film, one
Topic 4: film, one, <, br, >
Topic 5: >, br, <, one, show
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 11 (approx. per word bound = -7.977, relative change = 6.465e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 12 (approx. per word bound = -7.977, relative change = 5.587e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 13 (approx. per word bound = -7.976, relative change = 4.741e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 14 (approx. per word bound = -7.976, relative change = 4.042e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 15 (approx. per word bound = -7.976, relative change = 3.656e-05)
Topic 1: film, one, like, characters, just
Topic 2: movie, film, like, just, one
Topic 3: >, <, br, film, one
Topic 4: film, one, story, <, br
Topic 5: >, br, <, one, show
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 16 (approx. per word bound = -7.975, relative change = 3.245e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 17 (approx. per word bound = -7.975, relative change = 2.718e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 18 (approx. per word bound = -7.975, relative change = 2.308e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 19 (approx. per word bound = -7.975, relative change = 2.039e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 20 (approx. per word bound = -7.975, relative change = 1.805e-05)
Topic 1: film, one, like, characters, just
Topic 2: movie, film, like, just, one
Topic 3: >, <, br, film, one
Topic 4: film, one, story, life, also
Topic 5: >, br, <, one, show
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 21 (approx. per word bound = -7.974, relative change = 1.768e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 22 (approx. per word bound = -7.974, relative change = 1.726e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 23 (approx. per word bound = -7.974, relative change = 1.614e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 24 (approx. per word bound = -7.974, relative change = 1.481e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 25 (approx. per word bound = -7.974, relative change = 1.389e-05)
Topic 1: film, one, like, characters, even
Topic 2: movie, film, like, just, one
Topic 3: >, <, br, film, one
Topic 4: film, one, story, life, also
Topic 5: one, >, br, <, show
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 26 (approx. per word bound = -7.974, relative change = 1.278e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Completing Iteration 27 (approx. per word bound = -7.974, relative change = 1.014e-05)
....................................................................................................
Completed E-Step (1 seconds).
Completed M-Step.
Model Converged
Beginning Spectral Initialization
Calculating the gram matrix...
Using only 10000 most frequent terms during initialization...
Finding anchor words...
.........................
Recovering initialization...
....................................................................................................
Initialization complete.
....................................................................................................
Completed E-Step (4 seconds).
Completed M-Step.
Completing Iteration 1 (approx. per word bound = -8.574)
....................................................................................................
Completed E-Step (4 seconds).
Completed M-Step.
Completing Iteration 2 (approx. per word bound = -7.788, relative change = 9.173e-02)
....................................................................................................
Completed E-Step (3 seconds).
Completed M-Step.
Completing Iteration 3 (approx. per word bound = -7.706, relative change = 1.047e-02)
....................................................................................................
Completed E-Step (3 seconds).
Completed M-Step.
Completing Iteration 4 (approx. per word bound = -7.688, relative change = 2.341e-03)
....................................................................................................
Completed E-Step (3 seconds).
Completed M-Step.
Completing Iteration 5 (approx. per word bound = -7.681, relative change = 9.745e-04)
Topic 1: br, <, >, characters, film
Topic 2: movie, film, actors, good, br
Topic 3: >, <, br, film, one
Topic 4: film, one, story, br, excellent
Topic 5: show, one, episode, best, series
Topic 6: movie, great, one, film, story
Topic 7: movie, film, good, one, interesting
Topic 8: film, <, >, br, one
Topic 9: film, <, br, >, one
Topic 10: film, just, >, <, br
Topic 11: br, <, >, good, film
Topic 12: >, br, <, one, film
Topic 13: movie, >, br, <, see
Topic 14: >, br, <, film, one
Topic 15: >, <, br, film, movie
Topic 16: br, <, >, film, one
Topic 17: <, >, br, movie, one
Topic 18: movie, bad, good, film, one
Topic 19: movie, like, <, br, >
Topic 20: >, br, <, movie, film
Topic 21: >, <, br, one, film
Topic 22: one, br, <, >, movie
Topic 23: <, br, >, new, movie
Topic 24: film, br, <, >, comedy
Topic 25: br, <, >, mark, movie
....................................................................................................
Completed E-Step (3 seconds).
Completed M-Step.
Completing Iteration 6 (approx. per word bound = -7.676, relative change = 5.703e-04)
....................................................................................................
Completed E-Step (3 seconds).
Completed M-Step.
Completing Iteration 7 (approx. per word bound = -7.673, relative change = 4.066e-04)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 8 (approx. per word bound = -7.671, relative change = 3.013e-04)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 9 (approx. per word bound = -7.669, relative change = 2.385e-04)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 10 (approx. per word bound = -7.668, relative change = 1.856e-04)
Topic 1: characters, film, like, one, br
Topic 2: movie, film, actors, good, one
Topic 3: >, <, br, film, one
Topic 4: film, story, one, films, excellent
Topic 5: show, one, episode, series, best
Topic 6: movie, great, film, one, story
Topic 7: movie, film, good, one, interesting
Topic 8: film, >, <, br, one
Topic 9: film, <, br, >, one
Topic 10: film, just, like, one, even
Topic 11: br, <, >, good, film
Topic 12: >, br, <, one, film
Topic 13: movie, see, just, like, bad
Topic 14: >, br, <, film, films
Topic 15: film, movie, one, like, got
Topic 16: br, <, >, film, one
Topic 17: movie, <, >, br, one
Topic 18: movie, bad, film, good, one
Topic 19: movie, like, just, show, one
Topic 20: >, br, <, film, movie
Topic 21: >, <, br, one, film
Topic 22: one, movie, film, br, <
Topic 23: <, br, >, new, movie
Topic 24: film, comedy, good, one, see
Topic 25: film, br, >, <, mark
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 11 (approx. per word bound = -7.667, relative change = 1.488e-04)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 12 (approx. per word bound = -7.666, relative change = 1.332e-04)
....................................................................................................
Completed E-Step (3 seconds).
Completed M-Step.
Completing Iteration 13 (approx. per word bound = -7.665, relative change = 1.265e-04)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 14 (approx. per word bound = -7.664, relative change = 1.224e-04)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 15 (approx. per word bound = -7.663, relative change = 8.885e-05)
Topic 1: characters, film, one, like, script
Topic 2: movie, film, actors, good, one
Topic 3: >, <, br, film, one
Topic 4: film, story, one, films, love
Topic 5: show, one, episode, series, best
Topic 6: movie, great, film, one, story
Topic 7: movie, film, good, one, interesting
Topic 8: film, one, also, story, like
Topic 9: film, <, br, >, one
Topic 10: film, just, like, one, even
Topic 11: good, film, one, time, br
Topic 12: >, br, <, film, one
Topic 13: movie, see, just, like, bad
Topic 14: >, <, br, film, films
Topic 15: film, movie, one, like, got
Topic 16: film, one, much, like, even
Topic 17: movie, one, <, br, >
Topic 18: movie, bad, film, good, one
Topic 19: movie, like, just, show, one
Topic 20: >, br, <, film, movie
Topic 21: one, >, <, br, film
Topic 22: one, movie, film, movies, seen
Topic 23: <, br, >, new, movie
Topic 24: film, comedy, good, one, see
Topic 25: film, mark, >, br, <
....................................................................................................
Completed E-Step (3 seconds).
Completed M-Step.
Completing Iteration 16 (approx. per word bound = -7.662, relative change = 6.174e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 17 (approx. per word bound = -7.662, relative change = 5.780e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 18 (approx. per word bound = -7.662, relative change = 4.898e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 19 (approx. per word bound = -7.661, relative change = 3.730e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 20 (approx. per word bound = -7.661, relative change = 3.811e-05)
Topic 1: film, characters, one, like, script
Topic 2: movie, film, actors, good, one
Topic 3: >, <, br, film, one
Topic 4: film, one, story, films, love
Topic 5: show, one, episode, series, best
Topic 6: movie, great, film, one, story
Topic 7: movie, film, good, one, interesting
Topic 8: film, one, also, story, like
Topic 9: film, <, br, >, one
Topic 10: film, just, one, like, even
Topic 11: good, film, one, time, story
Topic 12: >, br, film, <, one
Topic 13: movie, just, see, like, bad
Topic 14: film, <, br, >, films
Topic 15: film, movie, one, like, got
Topic 16: film, one, much, like, even
Topic 17: movie, one, film, just, like
Topic 18: movie, bad, film, good, one
Topic 19: movie, like, show, just, one
Topic 20: film, >, br, <, movie
Topic 21: one, film, match, >, <
Topic 22: one, movie, film, movies, seen
Topic 23: <, br, >, new, movie
Topic 24: film, comedy, good, one, like
Topic 25: film, mark, boys, one, movie
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 21 (approx. per word bound = -7.661, relative change = 4.129e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 22 (approx. per word bound = -7.660, relative change = 3.903e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 23 (approx. per word bound = -7.660, relative change = 4.001e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 24 (approx. per word bound = -7.660, relative change = 4.297e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 25 (approx. per word bound = -7.660, relative change = 4.002e-05)
Topic 1: film, characters, one, like, script
Topic 2: movie, film, actors, good, one
Topic 3: >, <, br, film, one
Topic 4: film, one, story, films, love
Topic 5: show, one, episode, series, best
Topic 6: movie, great, film, one, story
Topic 7: movie, film, good, one, interesting
Topic 8: film, one, also, story, like
Topic 9: film, one, <, br, >
Topic 10: film, just, one, like, even
Topic 11: good, film, one, time, story
Topic 12: film, one, show, life, >
Topic 13: movie, just, like, see, one
Topic 14: film, films, one, movie, like
Topic 15: film, movie, one, like, got
Topic 16: film, one, much, like, even
Topic 17: movie, one, film, just, like
Topic 18: movie, bad, film, good, one
Topic 19: movie, like, show, just, one
Topic 20: film, movie, one, much, >
Topic 21: one, film, match, man, also
Topic 22: one, movie, film, movies, seen
Topic 23: <, br, >, new, one
Topic 24: film, comedy, good, one, like
Topic 25: film, mark, boys, one, movie
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 26 (approx. per word bound = -7.659, relative change = 3.646e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 27 (approx. per word bound = -7.659, relative change = 3.469e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 28 (approx. per word bound = -7.659, relative change = 3.399e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 29 (approx. per word bound = -7.658, relative change = 3.070e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 30 (approx. per word bound = -7.658, relative change = 2.761e-05)
Topic 1: film, characters, one, like, script
Topic 2: movie, film, actors, good, one
Topic 3: >, <, br, film, one
Topic 4: film, one, story, films, love
Topic 5: show, one, episode, series, best
Topic 6: movie, great, film, one, story
Topic 7: movie, film, good, one, interesting
Topic 8: film, one, also, story, like
Topic 9: film, one, family, <, br
Topic 10: film, just, one, like, even
Topic 11: film, good, one, time, story
Topic 12: film, one, show, life, young
Topic 13: movie, just, like, see, one
Topic 14: film, films, one, movie, like
Topic 15: film, movie, one, like, got
Topic 16: film, one, much, like, even
Topic 17: movie, one, film, just, like
Topic 18: bad, movie, film, good, one
Topic 19: movie, like, show, just, one
Topic 20: film, one, movie, much, like
Topic 21: one, film, match, man, also
Topic 22: one, movie, film, movies, seen
Topic 23: new, one, movie, <, br
Topic 24: film, comedy, good, one, like
Topic 25: film, mark, boys, one, watch
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 31 (approx. per word bound = -7.658, relative change = 2.762e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 32 (approx. per word bound = -7.658, relative change = 2.623e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 33 (approx. per word bound = -7.658, relative change = 2.472e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 34 (approx. per word bound = -7.657, relative change = 2.557e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 35 (approx. per word bound = -7.657, relative change = 2.484e-05)
Topic 1: film, characters, one, like, script
Topic 2: movie, film, actors, good, one
Topic 3: >, <, br, film, one
Topic 4: film, one, story, films, love
Topic 5: show, episode, one, series, best
Topic 6: movie, great, film, one, story
Topic 7: movie, film, good, one, interesting
Topic 8: film, one, also, story, like
Topic 9: film, one, family, <, br
Topic 10: film, just, one, like, even
Topic 11: film, good, one, time, story
Topic 12: film, one, show, life, young
Topic 13: movie, just, like, see, one
Topic 14: film, films, one, movie, like
Topic 15: film, movie, one, like, got
Topic 16: film, one, much, like, even
Topic 17: movie, one, film, just, like
Topic 18: bad, movie, film, good, one
Topic 19: movie, like, show, just, one
Topic 20: film, one, much, movie, like
Topic 21: one, film, match, man, also
Topic 22: one, movie, film, movies, seen
Topic 23: new, one, movie, film, like
Topic 24: film, comedy, good, one, like
Topic 25: film, mark, one, boys, watch
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 36 (approx. per word bound = -7.657, relative change = 1.990e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 37 (approx. per word bound = -7.657, relative change = 1.914e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 38 (approx. per word bound = -7.657, relative change = 2.108e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 39 (approx. per word bound = -7.657, relative change = 1.694e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 40 (approx. per word bound = -7.657, relative change = 1.761e-05)
Topic 1: film, characters, one, like, script
Topic 2: movie, film, actors, good, one
Topic 3: >, <, br, film, one
Topic 4: film, one, story, films, love
Topic 5: show, episode, one, series, best
Topic 6: movie, great, film, one, story
Topic 7: movie, film, good, one, interesting
Topic 8: film, one, also, story, like
Topic 9: film, one, family, first, like
Topic 10: film, just, one, like, even
Topic 11: film, good, one, time, story
Topic 12: film, one, show, life, young
Topic 13: movie, just, like, see, one
Topic 14: film, films, one, movie, like
Topic 15: film, movie, one, like, got
Topic 16: film, one, much, like, even
Topic 17: movie, one, film, just, like
Topic 18: bad, movie, film, good, one
Topic 19: movie, like, show, just, one
Topic 20: film, one, much, movie, like
Topic 21: one, film, match, man, also
Topic 22: one, film, movie, movies, seen
Topic 23: new, one, movie, film, joe
Topic 24: film, comedy, good, one, like
Topic 25: film, mark, one, boys, college
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 41 (approx. per word bound = -7.656, relative change = 1.812e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 42 (approx. per word bound = -7.656, relative change = 2.174e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 43 (approx. per word bound = -7.656, relative change = 2.054e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 44 (approx. per word bound = -7.656, relative change = 2.242e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 45 (approx. per word bound = -7.656, relative change = 3.411e-05)
Topic 1: film, characters, one, like, script
Topic 2: movie, film, actors, good, one
Topic 3: >, <, br, film, one
Topic 4: film, one, story, films, love
Topic 5: show, episode, series, one, best
Topic 6: movie, great, film, one, story
Topic 7: movie, film, good, one, interesting
Topic 8: film, one, also, story, like
Topic 9: film, one, family, first, like
Topic 10: film, just, one, like, even
Topic 11: film, good, one, time, story
Topic 12: film, one, show, life, young
Topic 13: movie, just, like, see, one
Topic 14: film, films, one, movie, like
Topic 15: film, movie, one, like, got
Topic 16: film, one, much, like, even
Topic 17: movie, one, film, just, like
Topic 18: bad, movie, film, one, good
Topic 19: movie, like, show, just, one
Topic 20: film, one, much, movie, like
Topic 21: one, film, match, man, also
Topic 22: one, film, movie, movies, seen
Topic 23: new, one, movie, film, joe
Topic 24: film, comedy, good, one, like
Topic 25: film, mark, one, boys, like
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 46 (approx. per word bound = -7.655, relative change = 1.998e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 47 (approx. per word bound = -7.655, relative change = 1.621e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 48 (approx. per word bound = -7.655, relative change = 1.770e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 49 (approx. per word bound = -7.655, relative change = 1.784e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 50 (approx. per word bound = -7.655, relative change = 1.697e-05)
Topic 1: film, characters, one, like, character
Topic 2: movie, film, actors, good, one
Topic 3: >, <, br, film, one
Topic 4: film, one, story, films, love
Topic 5: show, series, episode, one, best
Topic 6: movie, great, film, one, story
Topic 7: movie, film, good, one, interesting
Topic 8: film, one, also, story, like
Topic 9: film, one, family, first, like
Topic 10: film, just, one, like, even
Topic 11: film, good, one, time, story
Topic 12: film, one, show, life, young
Topic 13: movie, just, like, see, one
Topic 14: film, films, one, movie, like
Topic 15: film, movie, one, like, got
Topic 16: film, one, much, like, even
Topic 17: movie, one, film, just, like
Topic 18: bad, movie, film, one, good
Topic 19: movie, like, show, just, one
Topic 20: film, one, much, movie, like
Topic 21: one, film, match, man, also
Topic 22: one, film, movie, movies, seen
Topic 23: new, one, movie, film, joe
Topic 24: film, comedy, good, one, like
Topic 25: film, mark, one, boys, like
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 51 (approx. per word bound = -7.655, relative change = 1.602e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 52 (approx. per word bound = -7.655, relative change = 1.507e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 53 (approx. per word bound = -7.655, relative change = 1.419e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 54 (approx. per word bound = -7.655, relative change = 1.434e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 55 (approx. per word bound = -7.654, relative change = 1.465e-05)
Topic 1: film, characters, one, like, character
Topic 2: movie, film, actors, good, one
Topic 3: >, <, br, film, one
Topic 4: film, one, story, films, love
Topic 5: show, series, episode, one, best
Topic 6: movie, great, film, one, story
Topic 7: movie, film, good, one, interesting
Topic 8: film, one, also, story, like
Topic 9: film, one, family, first, like
Topic 10: film, just, one, like, even
Topic 11: film, good, one, time, story
Topic 12: film, one, show, life, young
Topic 13: movie, just, like, see, one
Topic 14: film, films, one, movie, like
Topic 15: film, one, movie, like, got
Topic 16: film, one, much, like, even
Topic 17: movie, one, film, just, like
Topic 18: bad, movie, film, one, good
Topic 19: movie, like, show, just, one
Topic 20: film, one, much, movie, like
Topic 21: one, film, match, man, also
Topic 22: one, film, movie, movies, seen
Topic 23: new, one, movie, film, joe
Topic 24: film, comedy, good, one, like
Topic 25: film, mark, one, boys, like
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 56 (approx. per word bound = -7.654, relative change = 1.529e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 57 (approx. per word bound = -7.654, relative change = 1.414e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 58 (approx. per word bound = -7.654, relative change = 1.333e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 59 (approx. per word bound = -7.654, relative change = 1.357e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 60 (approx. per word bound = -7.654, relative change = 1.270e-05)
Topic 1: film, characters, one, like, character
Topic 2: movie, film, actors, good, one
Topic 3: >, <, br, film, one
Topic 4: film, one, story, films, love
Topic 5: show, series, episode, one, best
Topic 6: movie, great, film, one, story
Topic 7: movie, film, good, one, interesting
Topic 8: film, one, also, story, like
Topic 9: film, one, family, first, like
Topic 10: film, just, one, like, even
Topic 11: film, good, one, time, story
Topic 12: film, one, show, life, young
Topic 13: movie, just, like, see, one
Topic 14: film, films, one, movie, like
Topic 15: film, one, movie, like, got
Topic 16: film, one, much, like, even
Topic 17: movie, one, film, just, like
Topic 18: bad, movie, film, one, good
Topic 19: movie, like, show, just, one
Topic 20: film, one, much, movie, like
Topic 21: one, film, match, man, also
Topic 22: one, film, movie, movies, seen
Topic 23: new, one, movie, film, joe
Topic 24: film, comedy, good, one, like
Topic 25: film, one, mark, like, boys
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 61 (approx. per word bound = -7.654, relative change = 1.258e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 62 (approx. per word bound = -7.654, relative change = 1.380e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 63 (approx. per word bound = -7.654, relative change = 1.242e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 64 (approx. per word bound = -7.653, relative change = 1.124e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 65 (approx. per word bound = -7.653, relative change = 1.030e-05)
Topic 1: film, characters, one, like, character
Topic 2: movie, film, actors, good, one
Topic 3: >, <, br, film, one
Topic 4: film, one, story, films, love
Topic 5: show, series, episode, one, best
Topic 6: movie, great, film, one, story
Topic 7: movie, film, good, one, interesting
Topic 8: film, one, also, story, like
Topic 9: film, one, family, first, like
Topic 10: film, just, one, like, even
Topic 11: film, good, one, time, story
Topic 12: film, one, show, life, young
Topic 13: movie, just, like, see, one
Topic 14: film, films, one, movie, like
Topic 15: film, one, movie, like, got
Topic 16: film, one, much, like, even
Topic 17: movie, one, film, just, like
Topic 18: bad, movie, film, one, good
Topic 19: movie, like, show, just, one
Topic 20: film, one, much, movie, like
Topic 21: one, film, match, man, also
Topic 22: one, film, movie, movies, seen
Topic 23: new, one, movie, film, joe
Topic 24: film, comedy, good, one, like
Topic 25: film, one, mark, like, boys
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Completing Iteration 66 (approx. per word bound = -7.653, relative change = 1.075e-05)
....................................................................................................
Completed E-Step (2 seconds).
Completed M-Step.
Model Converged
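Each run above stops shortly after the reported relative change in the approximate per-word bound approaches 1e-05. A minimal sketch of such a stopping rule in base R is given below. It assumes a tolerance of 1e-05, which is read off the log rather than taken from the tutorial's code, and it uses illustrative bound values rather than the exact ones from the fit.

```r
# Illustrative sequence of per-word bound values (higher is better).
bounds <- c(-8.574, -7.788, -7.706, -7.7059, -7.70589)
tol <- 1e-5  # assumed convergence tolerance

# Relative change between consecutive iterations.
rel_change <- diff(bounds) / abs(head(bounds, -1))

# Index of the first iteration whose relative change falls below the tolerance.
converged_at <- which(rel_change < tol)[1] + 1
print(rel_change)
print(converged_at)
```

Because the log prints the bound to three decimals only, recomputing the relative change from the printed values will not reproduce the printed relative changes exactly.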
Beginning Spectral Initialization
Calculating the gram matrix...
Using only 10000 most frequent terms during initialization...
Finding anchor words...
..................................................
Recovering initialization...
....................................................................................................
Initialization complete.
....................................................................................................
Completed E-Step (9 seconds).
Completed M-Step.
Completing Iteration 1 (approx. per word bound = -8.551)
....................................................................................................
Completed E-Step (7 seconds).
Completed M-Step.
Completing Iteration 2 (approx. per word bound = -7.652, relative change = 1.051e-01)
....................................................................................................
Completed E-Step (7 seconds).
Completed M-Step.
Completing Iteration 3 (approx. per word bound = -7.522, relative change = 1.702e-02)
....................................................................................................
Completed E-Step (6 seconds).
Completed M-Step.
Completing Iteration 4 (approx. per word bound = -7.486, relative change = 4.819e-03)
....................................................................................................
Completed E-Step (6 seconds).
Completed M-Step.
Completing Iteration 5 (approx. per word bound = -7.469, relative change = 2.238e-03)
Topic 1: characters, film, br, <, >
Topic 2: movie, director, actors, disappointed, good
Topic 3: <, >, br, film, one
Topic 4: film, excellent, woman, one, makes
Topic 5: character, best, one, film, well
Topic 6: movie, great, one, made, peter
Topic 7: movie, film, good, interesting, really
Topic 8: film, <, >, br, one
Topic 9: <, br, >, film, family
Topic 10: just, film, like, even, one
Topic 11: good, time, one, film, >
Topic 12: br, >, <, one, film
Topic 13: movie, >, br, <, see
Topic 14: >, br, <, film, films
Topic 15: film, like, one, movie, never
Topic 16: br, >, <, 2, film
Topic 17: movie, <, br, >, one
Topic 18: movie, bad, good, get, one
Topic 19: movie, like, br, >, <
Topic 20: >, br, <, movie, much
Topic 21: br, >, <, match, one
Topic 22: movie, one, movies, br, >
Topic 23: br, <, >, new, one
Topic 24: film, good, cast, comedy, like
Topic 25: movie, >, br, <, love
Topic 26: >, <, br, movie, now
Topic 27: film, <, >, br, time
Topic 28: >, <, br, one, movie
Topic 29: show, br, <, >, movie
Topic 30: br, >, <, film, films
Topic 31: one, like, movie, <, br
Topic 32: film, horror, br, >, <
Topic 33: <, >, br, people, like
Topic 34: movie, really, like, one, just
Topic 35: >, <, br, like, one
Topic 36: br, >, <, film, one
Topic 37: book, <, movie, br, >
Topic 38: movie, director, actors, >, br
Topic 39: >, br, <, show, one
Topic 40: film, br, <, >, show
Topic 41: br, <, >, film, story
Topic 42: br, >, <, film, one
Topic 43: >, <, br, movie, one
Topic 44: film, one, br, >, <
Topic 45: br, <, >, film, one
Topic 46: >, br, <, movie, life
Topic 47: film, <, >, br, one
Topic 48: film, >, br, <, one
Topic 49: episode, series, show, episodes, season
Topic 50: church, movie, joseph, smith, wife
....................................................................................................
Completed E-Step (6 seconds).
Completed M-Step.
Completing Iteration 6 (approx. per word bound = -7.460, relative change = 1.234e-03)
....................................................................................................
Completed E-Step (6 seconds).
Completed M-Step.
Completing Iteration 7 (approx. per word bound = -7.454, relative change = 7.622e-04)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 8 (approx. per word bound = -7.450, relative change = 5.127e-04)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 9 (approx. per word bound = -7.447, relative change = 3.489e-04)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 10 (approx. per word bound = -7.446, relative change = 2.530e-04)
Topic 1: characters, film, script, character, even
Topic 2: movie, director, actors, disappointed, film
Topic 3: >, <, br, film, one
Topic 4: film, excellent, one, makes, woman
Topic 5: character, best, film, one, well
Topic 6: movie, great, one, film, made
Topic 7: movie, film, good, interesting, really
Topic 8: film, one, also, plot, think
Topic 9: film, br, <, >, family
Topic 10: just, film, like, one, even
Topic 11: good, time, film, one, actually
Topic 12: br, >, <, film, one
Topic 13: movie, >, br, <, just
Topic 14: >, br, <, film, films
Topic 15: film, like, one, even, never
Topic 16: film, 2, one, much, br
Topic 17: movie, one, film, just, like
Topic 18: movie, bad, get, good, one
Topic 19: movie, like, tv, just, good
Topic 20: >, br, much, <, movie
Topic 21: match, br, >, <, one
Topic 22: movie, one, movies, seen, film
Topic 23: br, <, >, new, one
Topic 24: film, good, comedy, cast, like
Topic 25: movie, br, >, <, love
Topic 26: >, <, br, movie, now
Topic 27: film, time, just, one, first
Topic 28: one, movie, film, funny, >
Topic 29: show, good, see, just, also
Topic 30: film, br, >, <, films
Topic 31: one, like, movie, film, see
Topic 32: film, horror, one, films, good
Topic 33: <, >, br, people, like
Topic 34: movie, really, like, just, one
Topic 35: >, <, br, like, one
Topic 36: br, >, <, film, one
Topic 37: book, movie, novel, read, film
Topic 38: movie, director, actors, film, first
Topic 39: show, >, br, <, one
Topic 40: film, like, show, one, even
Topic 41: film, br, <, >, story
Topic 42: film, br, >, <, one
Topic 43: movie, one, film, just, >
Topic 44: film, one, story, play, heart
Topic 45: br, <, >, film, one
Topic 46: >, br, <, movie, life
Topic 47: film, one, many, <, >
Topic 48: film, one, >, br, <
Topic 49: episode, series, show, episodes, season
Topic 50: church, smith, movie, joseph, lds
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 11 (approx. per word bound = -7.444, relative change = 1.942e-04)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 12 (approx. per word bound = -7.443, relative change = 1.550e-04)
....................................................................................................
Completed E-Step (6 seconds).
Completed M-Step.
Completing Iteration 13 (approx. per word bound = -7.442, relative change = 1.307e-04)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 14 (approx. per word bound = -7.441, relative change = 1.050e-04)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 15 (approx. per word bound = -7.441, relative change = 9.156e-05)
Topic 1: characters, film, script, character, even
Topic 2: movie, film, director, disappointed, actors
Topic 3: >, <, br, film, one
Topic 4: film, excellent, one, makes, story
Topic 5: character, best, film, one, well
Topic 6: movie, great, one, film, made
Topic 7: movie, film, good, interesting, one
Topic 8: film, one, also, plot, think
Topic 9: film, br, <, >, family
Topic 10: just, film, like, one, even
Topic 11: good, time, film, one, actually
Topic 12: film, one, life, old, br
Topic 13: movie, see, just, bad, like
Topic 14: >, br, <, film, films
Topic 15: film, one, like, never, even
Topic 16: film, 2, one, much, like
Topic 17: movie, one, film, just, like
Topic 18: bad, movie, get, good, one
Topic 19: movie, like, tv, just, lot
Topic 20: much, movie, film, one, like
Topic 21: match, one, br, <, >
Topic 22: one, movie, movies, seen, film
Topic 23: new, one, joe, film, br
Topic 24: film, good, comedy, cast, see
Topic 25: movie, love, film, s, one
Topic 26: >, <, br, movie, now
Topic 27: film, time, just, one, first
Topic 28: one, movie, film, comedy, funny
Topic 29: show, good, see, also, just
Topic 30: film, films, one, best, br
Topic 31: one, like, movie, film, see
Topic 32: film, horror, one, films, just
Topic 33: <, >, br, people, documentary
Topic 34: movie, really, like, just, good
Topic 35: >, <, br, like, one
Topic 36: film, br, <, >, one
Topic 37: book, movie, novel, read, film
Topic 38: movie, director, film, actors, first
Topic 39: show, one, like, >, br
Topic 40: film, like, show, one, even
Topic 41: film, story, br, <, >
Topic 42: film, one, get, just, br
Topic 43: one, movie, film, just, david
Topic 44: film, one, story, heart, play
Topic 45: br, <, >, film, one
Topic 46: >, br, <, movie, life
Topic 47: film, one, many, also, war
Topic 48: film, one, like, hero, story
Topic 49: episode, series, show, episodes, season
Topic 50: church, movie, smith, joseph, lds
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 16 (approx. per word bound = -7.440, relative change = 8.058e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 17 (approx. per word bound = -7.439, relative change = 7.160e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 18 (approx. per word bound = -7.439, relative change = 7.070e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 19 (approx. per word bound = -7.438, relative change = 6.957e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 20 (approx. per word bound = -7.438, relative change = 6.370e-05)
Topic 1: characters, film, script, character, even
Topic 2: movie, film, director, actors, disappointed
Topic 3: >, <, br, film, one
Topic 4: film, excellent, one, makes, story
Topic 5: character, best, film, one, well
Topic 6: movie, great, one, film, made
Topic 7: movie, film, good, interesting, one
Topic 8: film, one, also, plot, think
Topic 9: film, br, <, >, family
Topic 10: just, film, like, one, even
Topic 11: good, time, film, one, actually
Topic 12: film, one, life, old, man
Topic 13: movie, see, bad, just, like
Topic 14: film, br, >, <, films
Topic 15: film, one, like, never, even
Topic 16: film, 2, one, much, like
Topic 17: movie, one, film, just, like
Topic 18: bad, movie, get, one, good
Topic 19: movie, like, tv, just, lot
Topic 20: much, movie, film, one, like
Topic 21: match, one, br, <, rock
Topic 22: one, movies, movie, seen, film
Topic 23: new, one, joe, film, time
Topic 24: film, good, comedy, cast, see
Topic 25: movie, love, film, s, one
Topic 26: >, <, br, movie, now
Topic 27: film, time, just, one, first
Topic 28: one, film, movie, comedy, funny
Topic 29: show, good, see, also, just
Topic 30: film, films, one, best, great
Topic 31: one, like, movie, film, people
Topic 32: film, horror, one, films, just
Topic 33: <, >, br, people, documentary
Topic 34: movie, really, like, just, good
Topic 35: >, like, <, br, one
Topic 36: film, one, good, bad, br
Topic 37: book, movie, novel, read, film
Topic 38: movie, director, film, actors, first
Topic 39: show, one, like, really, just
Topic 40: film, like, show, one, even
Topic 41: film, story, one, movie, role
Topic 42: film, one, get, just, real
Topic 43: one, movie, film, just, david
Topic 44: film, one, story, heart, play
Topic 45: br, <, >, film, one
Topic 46: br, >, <, movie, life
Topic 47: film, one, many, also, war
Topic 48: film, one, like, hero, films
Topic 49: episode, series, show, episodes, season
Topic 50: church, movie, smith, joseph, film
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 21 (approx. per word bound = -7.437, relative change = 5.772e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 22 (approx. per word bound = -7.437, relative change = 4.588e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 23 (approx. per word bound = -7.437, relative change = 4.576e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 24 (approx. per word bound = -7.436, relative change = 4.364e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 25 (approx. per word bound = -7.436, relative change = 3.793e-05)
Topic 1: characters, film, script, character, even
Topic 2: movie, film, director, actors, one
Topic 3: >, <, br, film, one
Topic 4: film, one, excellent, makes, woman
Topic 5: character, best, film, one, well
Topic 6: movie, great, one, film, made
Topic 7: film, movie, good, interesting, one
Topic 8: film, one, also, plot, think
Topic 9: film, family, br, <, >
Topic 10: film, just, like, one, even
Topic 11: good, time, film, one, actually
Topic 12: film, one, life, old, man
Topic 13: movie, bad, see, just, people
Topic 14: film, films, like, lost, br
Topic 15: film, one, like, never, even
Topic 16: film, 2, one, much, like
Topic 17: movie, one, film, just, school
Topic 18: bad, movie, get, one, good
Topic 19: movie, like, tv, just, lot
Topic 20: much, movie, film, one, like
Topic 21: match, one, rock, wwe, ring
Topic 22: one, movies, movie, film, seen
Topic 23: new, one, joe, film, time
Topic 24: film, good, comedy, cast, see
Topic 25: movie, love, film, s, one
Topic 26: >, <, br, movie, now
Topic 27: film, time, just, one, first
Topic 28: one, film, comedy, funny, movie
Topic 29: show, good, see, also, just
Topic 30: film, films, one, best, great
Topic 31: one, like, movie, film, people
Topic 32: film, horror, films, one, just
Topic 33: people, documentary, film, <, like
Topic 34: movie, really, like, just, good
Topic 35: like, film, one, >, br
Topic 36: film, one, good, bad, pretty
Topic 37: book, movie, novel, read, film
Topic 38: movie, director, film, actors, first
Topic 39: show, one, like, really, just
Topic 40: film, like, show, one, even
Topic 41: film, story, one, role, movie
Topic 42: film, one, get, just, real
Topic 43: one, movie, film, just, david
Topic 44: film, one, story, heart, play
Topic 45: br, <, >, film, one
Topic 46: br, >, <, movie, life
Topic 47: film, one, many, also, war
Topic 48: film, one, like, hero, films
Topic 49: episode, series, show, episodes, season
Topic 50: church, movie, smith, joseph, film
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 26 (approx. per word bound = -7.436, relative change = 3.347e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 27 (approx. per word bound = -7.436, relative change = 2.909e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 28 (approx. per word bound = -7.436, relative change = 2.210e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 29 (approx. per word bound = -7.435, relative change = 2.580e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 30 (approx. per word bound = -7.435, relative change = 2.434e-05)
Topic 1: characters, film, script, character, even
Topic 2: movie, film, director, actors, one
Topic 3: >, <, br, film, one
Topic 4: film, one, excellent, makes, woman
Topic 5: character, best, film, one, well
Topic 6: movie, great, film, one, made
Topic 7: film, movie, good, interesting, one
Topic 8: film, one, also, plot, think
Topic 9: film, family, one, br, <
Topic 10: film, just, like, one, even
Topic 11: good, time, film, one, actually
Topic 12: film, one, life, old, man
Topic 13: movie, bad, see, just, people
Topic 14: film, films, lost, like, one
Topic 15: film, one, never, even, like
Topic 16: film, 2, one, much, even
Topic 17: movie, one, film, just, school
Topic 18: bad, movie, get, one, good
Topic 19: movie, like, tv, just, lot
Topic 20: much, film, movie, one, like
Topic 21: match, one, rock, wwe, ring
Topic 22: one, movies, movie, film, seen
Topic 23: new, one, joe, film, time
Topic 24: film, comedy, good, cast, see
Topic 25: movie, love, film, s, one
Topic 26: >, <, br, movie, now
Topic 27: film, time, just, one, first
Topic 28: one, film, comedy, funny, movie
Topic 29: show, good, see, also, one
Topic 30: film, films, one, best, great
Topic 31: one, like, movie, film, people
Topic 32: film, horror, films, one, just
Topic 33: people, documentary, film, like, one
Topic 34: movie, really, like, just, good
Topic 35: like, film, one, just, game
Topic 36: film, one, good, bad, pretty
Topic 37: book, movie, novel, read, film
Topic 38: movie, film, director, actors, first
Topic 39: show, one, like, really, just
Topic 40: film, like, show, one, even
Topic 41: film, story, one, role, movie
Topic 42: film, one, get, just, real
Topic 43: one, film, movie, just, david
Topic 44: film, one, story, heart, play
Topic 45: br, <, >, film, one
Topic 46: br, >, <, movie, life
Topic 47: film, one, many, also, war
Topic 48: film, one, like, hero, films
Topic 49: episode, series, show, episodes, season
Topic 50: church, movie, smith, joseph, film
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 31 (approx. per word bound = -7.435, relative change = 2.752e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 32 (approx. per word bound = -7.435, relative change = 2.951e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 33 (approx. per word bound = -7.435, relative change = 2.276e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 34 (approx. per word bound = -7.434, relative change = 2.120e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 35 (approx. per word bound = -7.434, relative change = 2.347e-05)
Topic 1: characters, film, script, character, even
Topic 2: movie, film, director, actors, one
Topic 3: >, <, br, film, one
Topic 4: film, one, excellent, makes, woman
Topic 5: character, best, film, one, well
Topic 6: movie, great, film, one, also
Topic 7: film, movie, good, interesting, one
Topic 8: film, one, also, plot, think
Topic 9: film, family, one, br, <
Topic 10: film, just, like, one, even
Topic 11: good, time, film, one, actually
Topic 12: film, one, life, old, man
Topic 13: movie, bad, just, see, even
Topic 14: film, films, lost, like, one
Topic 15: film, one, never, even, like
Topic 16: film, 2, one, much, even
Topic 17: movie, one, film, just, school
Topic 18: bad, movie, get, one, even
Topic 19: movie, like, tv, just, lot
Topic 20: much, film, movie, one, like
Topic 21: match, one, rock, wwe, ring
Topic 22: one, movies, movie, film, seen
Topic 23: new, one, joe, film, time
Topic 24: film, comedy, good, cast, story
Topic 25: movie, love, film, s, one
Topic 26: >, <, br, movie, now
Topic 27: film, time, just, one, first
Topic 28: one, film, comedy, funny, humor
Topic 29: show, good, see, also, one
Topic 30: film, films, one, best, great
Topic 31: one, like, movie, film, people
Topic 32: film, horror, films, one, just
Topic 33: people, documentary, film, like, one
Topic 34: movie, like, really, just, good
Topic 35: like, film, one, just, game
Topic 36: film, one, good, bad, pretty
Topic 37: book, movie, novel, film, read
Topic 38: movie, film, director, actors, first
Topic 39: show, one, like, really, just
Topic 40: film, like, show, one, even
Topic 41: film, story, one, role, beautiful
Topic 42: film, one, get, just, real
Topic 43: one, film, movie, just, david
Topic 44: film, one, heart, story, play
Topic 45: film, br, <, >, one
Topic 46: movie, life, br, >, <
Topic 47: film, one, many, also, war
Topic 48: film, one, like, hero, films
Topic 49: episode, series, show, episodes, season
Topic 50: church, movie, smith, joseph, film
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 36 (approx. per word bound = -7.434, relative change = 2.298e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 37 (approx. per word bound = -7.434, relative change = 2.441e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 38 (approx. per word bound = -7.434, relative change = 2.119e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 39 (approx. per word bound = -7.434, relative change = 1.905e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 40 (approx. per word bound = -7.433, relative change = 1.407e-05)
Topic 1: characters, film, script, character, even
Topic 2: movie, film, director, actors, one
Topic 3: >, <, br, film, just
Topic 4: film, one, excellent, makes, woman
Topic 5: character, best, film, one, well
Topic 6: movie, great, film, one, also
Topic 7: film, movie, good, interesting, one
Topic 8: film, one, also, plot, think
Topic 9: film, family, one, love, story
Topic 10: film, just, like, one, even
Topic 11: time, good, film, one, actually
Topic 12: film, one, life, old, man
Topic 13: movie, bad, just, see, even
Topic 14: film, films, lost, like, one
Topic 15: film, one, never, even, like
Topic 16: film, 2, one, much, even
Topic 17: movie, one, film, school, just
Topic 18: bad, movie, get, one, even
Topic 19: movie, like, tv, just, lot
Topic 20: much, film, one, movie, like
Topic 21: match, one, rock, wwe, ring
Topic 22: one, movies, movie, film, seen
Topic 23: new, one, joe, film, time
Topic 24: film, comedy, good, cast, also
Topic 25: movie, love, film, s, one
Topic 26: >, <, br, movie, now
Topic 27: film, time, just, one, first
Topic 28: one, film, comedy, funny, humor
Topic 29: show, good, also, see, one
Topic 30: film, films, one, best, great
Topic 31: one, like, movie, film, people
Topic 32: film, horror, films, one, just
Topic 33: people, documentary, film, like, one
Topic 34: movie, like, really, just, good
Topic 35: like, film, one, just, game
Topic 36: film, one, good, bad, pretty
Topic 37: book, novel, movie, film, read
Topic 38: movie, film, director, actors, first
Topic 39: show, one, like, really, just
Topic 40: film, like, show, one, even
Topic 41: film, story, one, role, beautiful
Topic 42: film, one, get, just, real
Topic 43: one, film, movie, just, david
Topic 44: film, one, heart, story, play
Topic 45: film, br, <, >, one
Topic 46: life, movie, br, >, <
Topic 47: film, one, many, also, war
Topic 48: film, one, like, hero, films
Topic 49: episode, series, show, episodes, season
Topic 50: church, movie, smith, joseph, god
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 41 (approx. per word bound = -7.433, relative change = 1.954e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 42 (approx. per word bound = -7.433, relative change = 1.752e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 43 (approx. per word bound = -7.433, relative change = 2.002e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 44 (approx. per word bound = -7.433, relative change = 2.428e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 45 (approx. per word bound = -7.433, relative change = 1.768e-05)
Topic 1: characters, film, script, character, even
Topic 2: movie, film, director, actors, one
Topic 3: >, <, br, film, just
Topic 4: film, one, excellent, woman, makes
Topic 5: character, best, film, one, well
Topic 6: movie, great, film, one, also
Topic 7: film, movie, good, interesting, one
Topic 8: film, one, also, plot, think
Topic 9: film, family, one, love, story
Topic 10: film, just, like, one, even
Topic 11: time, good, film, one, actually
Topic 12: film, one, life, old, man
Topic 13: movie, bad, just, see, even
Topic 14: film, films, lost, like, one
Topic 15: film, one, never, even, like
Topic 16: film, 2, one, much, even
Topic 17: movie, one, film, school, just
Topic 18: bad, movie, get, one, even
Topic 19: movie, like, tv, just, lot
Topic 20: much, film, one, movie, like
Topic 21: match, one, rock, wwe, ring
Topic 22: one, movies, movie, film, seen
Topic 23: new, one, joe, film, time
Topic 24: film, comedy, good, cast, also
Topic 25: movie, love, film, s, one
Topic 26: movie, >, <, br, now
Topic 27: film, time, just, one, first
Topic 28: one, film, comedy, funny, humor
Topic 29: show, good, also, see, one
Topic 30: film, films, one, best, great
Topic 31: one, like, movie, film, people
Topic 32: film, horror, films, one, just
Topic 33: people, documentary, film, like, one
Topic 34: movie, like, really, just, good
Topic 35: like, film, one, just, game
Topic 36: film, one, bad, good, pretty
Topic 37: book, novel, film, read, movie
Topic 38: movie, film, director, actors, first
Topic 39: show, one, like, really, just
Topic 40: film, like, show, one, even
Topic 41: film, story, one, role, beautiful
Topic 42: film, one, get, real, just
Topic 43: one, film, movie, just, david
Topic 44: film, one, heart, story, play
Topic 45: film, br, <, >, one
Topic 46: life, movie, film, japanese, br
Topic 47: film, one, many, also, man
Topic 48: film, one, like, hero, films
Topic 49: episode, series, show, episodes, season
Topic 50: church, movie, smith, joseph, god
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 46 (approx. per word bound = -7.433, relative change = 2.570e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 47 (approx. per word bound = -7.432, relative change = 1.889e-05)
....................................................................................................
Completed E-Step (6 seconds).
Completed M-Step.
Completing Iteration 48 (approx. per word bound = -7.432, relative change = 1.374e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Completing Iteration 49 (approx. per word bound = -7.432, relative change = 1.427e-05)
....................................................................................................
Completed E-Step (5 seconds).
Completed M-Step.
Model Converged
plot(differentKs)
The plot is a mixed bag for us. Higher values of the held-out likelihood and semantic coherence both indicate better models, while lower residuals indicate a better model. It’s also important to note that it’s artificially easy to get more semantic coherence by having fewer topics (semantic coherence measures how often a topic’s top words co-occur in the same documents). If it were me, I’d probably settle at the midpoint here (25 topics). But there’s no magic solution. Instead, the decision is largely left up to you. That flexibility is nice, but it also means that **you need to be able to defend your choice of K**, because external audiences are going to want to know why you chose the number you did.
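One way to make that defense more concrete is to put the diagnostics on a common scale and compare the candidate values of K side by side. The sketch below is only an illustration: the `diagnostics` data frame and its numbers are made up (they are not the results of the run above), the helper `rescale01` is hypothetical, and the equal weighting of the three measures is an arbitrary assumption. With a real `searchK` object, the corresponding columns should be available in `differentKs$results`, though depending on the `stm` version they may need to be unlisted first.

```r
# Hypothetical diagnostics for three candidate values of K.
# These numbers are invented for illustration; they are NOT the tutorial's results.
diagnostics <- data.frame(
  K        = c(5, 25, 50),
  heldout  = c(-7.60, -7.50, -7.45),  # held-out likelihood: higher is better
  semcoh   = c(-60, -75, -90),        # semantic coherence: higher is better
  residual = c(3.0, 2.4, 2.2)         # residuals: lower is better
)

# Hypothetical helper: rescale a numeric vector to the 0-1 range.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Put every diagnostic on a "higher is better" 0-1 scale.
scores <- data.frame(
  K        = diagnostics$K,
  heldout  = rescale01(diagnostics$heldout),
  semcoh   = rescale01(diagnostics$semcoh),
  residual = 1 - rescale01(diagnostics$residual)  # flip so that higher is better
)

# Unweighted average of the three rescaled measures.
# Equal weights are an assumption, not a recommendation.
scores$overall <- rowMeans(scores[, c("heldout", "semcoh", "residual")])

# Show candidates from best to worst overall score.
print(scores[order(-scores$overall), ])
```

A table like this does not replace judgment, since the weighting itself is a choice you would have to justify, but it gives you something explicit to point to when explaining why you picked a particular K.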