Reading Tea Leaves: How Humans Interpret Topic Models
Jonathan Chang, Jordan Boyd-Graber, Sean Gerrish, Chong Wang, David M. Blei
Princeton University
NIPS 2009
Dec 9th, 2009
Chang, Boyd-Graber, Wang, Gerrish, Blei
Reading Tea Leaves
Topic Models in a Nutshell
From an input corpus → words to topics
Corpus
Forget the Bootleg, Just Download the Movie Legally
Multiplex Heralded As Linchpin To Growth
The Shape of Cinema, Transformed At the Click of a Mouse
A Peaceful Crew Puts Muppets Where Its Mouth Is
Stock Trades: A Better Deal For Investors Isn't Simple
Three big Internet portals begin to distinguish among themselves as shopping malls
Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens
Topic Models in a Nutshell
From an input corpus → words to topics
TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: sell, sale, store, product, business, advertising, market, consumer
TOPIC 3: play, film, movie, theater, production, star, director, stage
Evaluation
Corpus: the New York Times headlines above
Model A: -4.8
Model B: -15.16
Model C: -23.42
Evaluation: Held-out Log Likelihood
Held-out Data:
Sony Ericsson's Infinite Hope for a Turnaround
For Search, Murdoch Looks to a Deal With Microsoft
Price War Brews Between Amazon and Wal-Mart
Model A: -4.8
Model B: -15.16
Model C: -23.42
Measures predictive power, not latent structure
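Held-out log likelihood, the standard evaluation criticized here, can be sketched as the average per-token log probability a fitted model assigns to unseen text. A minimal illustration (the names `theta` and `phi` are mine, not the talk's; in practice the topic proportions of a held-out document must themselves be inferred, which is what makes the exact quantity hard to compute):

```python
import numpy as np

def heldout_log_likelihood(docs, theta, phi):
    """Average per-token log likelihood of held-out documents.

    docs  : list of documents, each a list of word ids
    theta : (D, K) per-document topic proportions
    phi   : (K, V) per-topic word distributions
    """
    total, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        word_probs = theta[d] @ phi  # mixture over topics -> (V,) word distribution
        for w in doc:
            total += np.log(word_probs[w])
            n_tokens += 1
    return total / n_tokens
```

Higher (less negative) values mean the model predicts unseen words better, which is exactly the axis the talk argues is disconnected from interpretability.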
Qualitative Evaluation of the Latent Space
[Hofmann, 1999]
Qualitative Evaluation of the Latent Space
[Blei et al., 2003]
The William Randolph Hearst Foundation will give $1.25 million to Lincoln Center, Metropolitan Opera Co., New York Philharmonic and Juilliard School. "Our board felt that we had a real opportunity to make a mark on the future of the performing arts with these grants an act every bit as important as our traditional areas of support in health, medical research, education and the social services," Hearst Foundation President Randolph A. Hearst said Monday in announcing the grants. Lincoln Center's share will be $200,000 for its new building, which will house young artists and provide new public facilities. The Metropolitan Opera Co. and …
Qualitative Evaluation of the Latent Space
[Mimno et al., 2009]

DA: centralbank europæiske ecb s lån centralbanks
DE: zentralbank ezb bank europäischen investitionsbank darlehen
EN: bank central ecb banks european monetary
ES: banco central europeo bce bancos centrales
FI: keskuspankin ekp n euroopan keskuspankki eip
FR: banque centrale bce européenne banques monétaire
IT: banca centrale bce europea banche prestiti
NL: bank centrale ecb europese banken leningen
PT: banco central europeu bce bancos empréstimos
SV: centralbanken europeiska ecb centralbankens s lån

DA: børn familie udnyttelse børns børnene seksuel
DE: kinder kindern familie ausbeutung familien eltern
EN: children family child sexual families exploitation
ES: niños familia hijos sexual infantil menores
FI: lasten lapsia lapset perheen lapsen lapsiin
FR: enfants famille enfant parents exploitation familles
IT: bambini famiglia figli minori sessuale sfruttamento
NL: kinderen kind gezin seksuele ouders familie
PT: crianças família filhos sexual criança infantil
SV: barn barnen familjen sexuellt familj utnyttjande
Qualitative Evaluation of the Latent Space
[Maskeri et al., 2008]

(a) Topic labeled as SSL
Keyword    Probability
ssl        0.373722
expr       0.042501
init       0.033207
engine     0.026447
var        0.022222
ctx        0.023067
ptemp      0.017153
mctx       0.013773
lookup     0.012083
modssl     0.011238
ca         0.009548

(b) Topic labeled as Logging
Keyword    Probability
log        0.141733
request    0.036017
mod        0.0311
config     0.029871
name       0.023725
headers    0.021266
autoindex  0.020037
format     0.017578
cmd        0.01512
header     0.013891
add        0.012661

Table 2: Sample topics extracted from Apache source code
Qualitative Evaluation of the Latent Space
[Hall et al., 2008]
Table 2: Top 10 words for 43 of the topics. Starred topics are hand-seeded.

Info. Extraction: system text information muc extraction template names patterns pattern domain
Information Retrieval: document documents query retrieval question information answer term text web
Lexical Semantics: semantic relations domain noun corpus relation nouns lexical ontology patterns
MUC Terrorism: slot incident tgt target id hum phys type fills perp
Metaphor: metaphor literal metonymy metaphors metaphorical essay metonymic essays qualia analogy
Morphology: word morphological lexicon form dictionary analysis morphology lexical stem arabic
Named Entities*: entity named entities ne names ner recognition ace nes mentions mention
Paraphrase/RTE: paraphrases paraphrase entailment paraphrasing textual para rte pascal entailed dagan
Parsing: parsing grammar parser parse rule sentence input left grammars np
Plan-Based Dialogue: plan discourse speaker action model goal act utterance user information
Probabilistic Models: model word probability set data number algorithm language corpus method
Prosody: prosodic speech pitch boundary prosody phrase boundaries accent repairs intonation
Semantic Roles*: semantic verb frame argument verbs role roles predicate arguments
Yale School Semantics: knowledge system semantic language concept representation information network concepts base
Sentiment: subjective opinion sentiment negative polarity positive wiebe reviews sentence opinions
Speech Recognition: speech recognition word system language data speaker error test spoken
Spell Correction: errors error correction spelling ocr correct corrections checker basque corrected detection
Statistical MT: english word alignment language source target sentence machine bilingual mt
Statistical Parsing: dependency parsing treebank parser tree parse head model al np
Summarization: sentence text evaluation document topic summary summarization human summaries score
Syntactic Structure: verb noun syntactic sentence phrase np subject structure case clause
TAG Grammars*: tree node trees nodes derivation tag root figure adjoining grammar
Unification: feature structure grammar lexical constraints unification constraint type structures rule
WSD*: word senses wordnet disambiguation lexical semantic context similarity dictionary
Word Segmentation: chinese word character segmentation corpus dictionary korean language table system
WordNet*: synset wordnet synsets hypernym ili wordnets hypernyms eurowordnet hyponym ewn wn
Topics are shown to users during web search.
Users can refine queries through topics.
Key Points
1. "Reading Tea Leaves" alternative: measuring interpretability
2. Direct, quantitative human evaluation of latent space
3. Testing interpretability on different models and corpora
4. Disconnect with likelihood
Key Points
Figure: New York Times results contrasting "What we care about" (model precision) with "What we're measuring" (held-out likelihood).
Evaluating Topic Interpretability
Interpretability is a human judgement
We will ask people directly
Experiment Goals
Quick
Fun
Consistent
We turn to Amazon Mechanical Turk
Two tasks: Word Intrusion and Topic Intrusion
Task One: Word Intrusion
TOPIC 1: computer, technology, system, service, site, phone, internet, machine
TOPIC 2: sell, sale, store, product, business, advertising, market, consumer
TOPIC 3: play, film, movie, theater, production, star, director, stage
Task One: Word Intrusion
1. Take the highest probability words from a topic
   Original Topic: dog, cat, horse, pig, cow
2. Take a high-probability word from another topic and add it
   Topic with Intruder: dog, cat, apple, horse, pig, cow
3. We ask Turkers to find the word that doesn't belong
Hypothesis: If the topics are interpretable, users will consistently choose the true intruder.
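The three steps can be sketched as follows (an illustrative reconstruction, not the authors' code; the paper additionally requires the intruder to have low probability in the tested topic, which the `not in top` filter here only approximates):

```python
import numpy as np

def word_intrusion_instance(phi, vocab, k, rng, n_top=5):
    """Build one word-intrusion question for topic k.

    phi   : (K, V) array of topic-word probabilities
    vocab : list of V word strings
    """
    top = np.argsort(phi[k])[::-1][:n_top]      # highest-probability words in topic k
    other = rng.choice([j for j in range(phi.shape[0]) if j != k])
    # Intruder: a high-probability word from another topic, not already shown
    intruder = next(w for w in np.argsort(phi[other])[::-1] if w not in top)
    words = [vocab[w] for w in top] + [vocab[intruder]]
    rng.shuffle(words)                          # hide the intruder's position
    return words, vocab[intruder]
```

With a toy animal topic and a fruit topic, the top words dog, cat, horse would be joined by the intruder apple, as in the slide's example.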
Task One: Word Intrusion
Task One: Word Intrusion
Order of words was shuffled
Which intruder was selected varied
Model precision: percentage of users who clicked on intruder
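Model precision, the score just defined, is simply the fraction of subjects who click the planted intruder for a given topic; for example:

```python
def model_precision(selections, true_intruder):
    """Fraction of subjects whose clicked word is the planted intruder."""
    return sum(s == true_intruder for s in selections) / len(selections)

# Four subjects judge the topic "dog, cat, apple, horse, pig, cow";
# three of them find the intruder:
model_precision(["apple", "apple", "horse", "apple"], "apple")  # → 0.75
```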
Task Two: Topic Intrusion
Figure: a document ("Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens") and related New York Times headlines linked to topics:
TOPIC 1 "TECHNOLOGY"
TOPIC 2 "BUSINESS"
TOPIC 3 "ENTERTAINMENT"
Headlines: Internet portals begin to distinguish among themselves as shopping malls; Stock Trades: A Better Deal For Investors Isn't Simple; Forget the Bootleg, Just Download the Movie Legally; The Shape of Cinema, Transformed At the Click of a Mouse; Multiplex Heralded As Linchpin To Growth; A Peaceful Crew Puts Muppets Where Its Mouth Is
Task Two: Topic Intrusion
1. Display document title and first 500 characters to Turkers
2. Show the three topics with highest probability and one topic chosen randomly
3. Have the user click on the set of words that is out of place
Hypothesis: If the association of topics to a document is interpretable, users will consistently choose the true intruding topic.
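Step 2 can be sketched as follows (illustrative only; the slide says the fourth topic is "chosen randomly", so this draws it uniformly from the topics outside the top three):

```python
import numpy as np

def topic_intrusion_choices(theta_d, rng):
    """Pick a document's 3 most probable topics plus one random intruder.

    theta_d : (K,) per-document topic proportions
    Returns (shuffled list of topic indices, intruder index).
    """
    order = np.argsort(theta_d)[::-1]       # topics sorted by probability
    shown = [int(t) for t in order[:3]]
    intruder = int(rng.choice(order[3:]))   # random topic outside the top 3
    shown.append(intruder)
    rng.shuffle(shown)                      # hide the intruder's position
    return shown, intruder
```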
Task Two: Topic Intrusion
Figure: per-document topic probability (0 to 1.0), topics sorted by probability, with the intruder topic marked.
Topic Log Odds: log of the intruder topic's probability over the clicked topic's probability.
Click on the intruder: log(0.05 / 0.05) = 0.0
Click on a mid-probability topic: log(0.05 / 0.15) = -1.1
Click on the highest-probability topic: log(0.05 / 0.5) = -2.3
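The numbers in the three clicks above come from one formula: the log ratio of the intruder topic's probability to the clicked topic's probability, which is 0 when the subject finds the intruder and more negative the more probable the wrongly clicked topic. A sketch with the slide's probabilities:

```python
import numpy as np

def topic_log_odds(theta_d, intruder, clicked):
    """log( p(intruder topic) / p(clicked topic) ) for one subject's click."""
    return float(np.log(theta_d[intruder]) - np.log(theta_d[clicked]))

theta = [0.50, 0.30, 0.15, 0.05]  # sorted topic probabilities; intruder is index 3
topic_log_odds(theta, 3, 3)  # clicked the intruder:       log(0.05/0.05) =  0.0
topic_log_odds(theta, 3, 2)  # clicked a middling topic:   log(0.05/0.15) ≈ -1.1
topic_log_odds(theta, 3, 0)  # clicked the dominant topic: log(0.05/0.50) ≈ -2.3
```

Averaging this quantity over subjects and documents gives the per-model topic log odds reported later in the talk.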
Three Topic Models
Different assumptions lead to different topic models:
Free parameter, fit with smoothed EM (pLSI variant) [Hofmann, 1999]
Dirichlet: latent Dirichlet allocation (LDA) [Blei et al., 2003]
Normal with covariance: correlated topic model (CTM) [Blei and Lafferty, 2005]
Corpora
New York Times: 8477 articles, 8269 types, 1M tokens
Wikipedia: sample of 10000 articles, 15273 types, 3M tokens
Corpora properties:
Well structured (should begin with summary paragraph)
Real-world
Many different themes
Experiments
1. Fit pLSI, LDA, and CTM to both corpora
2. Each model had 50, 100, or 150 topics
3. 50 topics from each condition presented to 8 workers
4. 100 documents from each condition presented to 8 workers
Word Intrusion: Which Topics are Interpretable?
Figure: histogram of model precision across the 50 LDA topics fit to the New York Times (x-axis: model precision, 0.000 to 1.000; y-axis: number of topics, 0 to 15), annotated with example topics:
committee legislation proposal republican taxis
fireplace garage house kitchen list
americans japanese jewish states terrorist
artist exhibition gallery museum painting
Model Precision: percentage of correct intruders found
Word Intrusion: Models with Interpretable Topics
Figure: model precision (0.0 to 1.0) for CTM, LDA, and pLSI at 50, 100, and 150 topics, on New York Times (top) and Wikipedia (bottom).
Which documents have clear topic associations?
Figure: histogram of topic log odds across documents (Wikipedia, 50 LDA topics; x-axis: topic log odds, -3.5 to 0.0; y-axis: number of documents, 0 to 25), annotated with example documents: Microsoft Word, Lindy Hop, John Quincy Adams, Book.
Which Models Produce Interpretable Topics
Figure: topic log odds for CTM, LDA, and pLSI at 50, 100, and 150 topics, on New York Times and Wikipedia.
Held-out Likelihood

Corpus           Topics   pLSI      LDA       CTM
New York Times   50       -7.3384   -7.3214   -7.3335
New York Times   100      -7.2834   -7.2761   -7.2647
New York Times   150      -7.2382   -7.2477   -7.2467
Wikipedia        50       -7.5378   -7.5257   -7.5332
Wikipedia        100      -7.4748   -7.4629   -7.4385
Wikipedia        150      -7.4355   -7.4266   -7.3872
Interpretability and Likelihood
Figure: model precision (New York Times and Wikipedia) plotted against predictive log likelihood for each model and number of topics.
Within a model, higher likelihood ≠ higher interpretability.
Interpretability and Likelihood
Figure: model precision (New York Times) and topic log odds (Wikipedia) plotted against predictive log likelihood for CTM, LDA, and pLSI at 50, 100, and 150 topics.
Across models, higher likelihood ≠ higher interpretability.
Conclusion
Disconnect between evaluation and use
Means of evaluating an unsupervised method
For topic models, direct measurement of interpretability
Surprising relationship between interpretability and likelihood
Measure what you care about
Future Work
Influence of inference techniques and hyperparameters
Investigate shape of likelihood / interpretability curve
Model human intuition
Workshop
Applications for Topic Models:
Text and Beyond
7:30am - 6:30pm Friday
Westin: Callaghan
Blei, D., Ng, A., and Jordan, M. (2003).
Latent Dirichlet allocation.
JMLR, 3:993–1022.
Blei, D. M. and Lafferty, J. D. (2005).
Correlated topic models.
In NIPS.
Hall, D., Jurafsky, D., and Manning, C. D. (2008).
Studying the history of ideas using topic models.
In EMNLP.
Hofmann, T. (1999).
Probabilistic latent semantic analysis.
In UAI.
Maskeri, G., Sarkar, S., and Heafield, K. (2008).
Mining business topics in source code using latent Dirichlet allocation.
In ISEC '08: Proceedings of the 1st India Software Engineering Conference, pages 113–120, New York, NY, USA. ACM.
Mimno, D., Wallach, H., Yao, L., Naradowsky, J., and McCallum, A. (2009).
Polylingual topic models.
In Snowbird Learning Workshop. Clearwater, FL.