Detecting Genre Shift

Detecting Genre Shift Mark Dredze, Tim Oates, Christine Piatko Paper to appear at EMNLP-10

Natural Language Processing and Machine Learning • Extracting findings from scientific papers • Genetic epidemiology (development domain) • PubMed search produces thousands of papers • Manually reviewed to extract findings • Findings determine relevant papers/studies • Automate this process with ML/NLP methods • Create searchable database of findings • Allow machine inference over findings • Suggest new scientific hypotheses

Genre Shift in Statistical NLP … told that John Paul Stevens is retiring this summer … … President Barack Obama is urging members to … Named Entity Recognition

Supervised Machine Learning for Named Entity Recognition Today the Atlantic Ocean is in an uproar and North Carolina remains in a state of anxiety.

Supervised Machine Learning for Named Entity Recognition

Genre Shift in Statistical NLP … told that John Paul Stevens is retiring this summer … … PRESIDENT BARACK OBAMA IS URGING MEMBERS TO… Named Entity Recognition ???

This is a Pervasive Problem • Extracting regulatory pathways from online bioinformatics journals using a parser trained on the WSJ • Finding faces in images of disaster victims using a model trained on “mug shot” images • Identifying RNA sequences that regulate gene expression in a lab in Baltimore using a model trained on data gathered in a lab in Germany • When things change in a way that’s harmful, we’d like to know!

Data Streams Change Over Time Sentiment classification from movie reviews • Natural drift • Users unaware of system limitations

Detecting Genre Shift • Genre shift hurts system performance (accuracy) • Two problems: (1) detect changes in a stream of numbers (A-distance); (2) convert the document stream to a stream of informative numbers (margin)

Detecting Genre Shift • Genre shift hurts system performance (accuracy) • Measure accuracy directly • Requires labeled examples! • Look for changes in feature distributions • Words become more/less common • New words appear

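Measuring Changes in Streams: The A-Distance • A nonparametric, distribution-independent measure of changes in univariate, real-valued data streams (Kifer, Ben-David, and Gehrke, 2004) • [Figure: two windows over the stream, P and P′]

Measuring Changes in Streams: The A-Distance • [Figure: a change is flagged when the A-distance between windows P and P′ exceeds ε]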
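The comparison of two windows can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the family of sets is a fixed tiling of the real line given by `edges` (a hypothetical parameter), uses empirical frequencies for P and P′, and takes the A-distance to be twice the largest per-tile difference, following the usual definition. Points outside the tiling are clamped into the first or last tile, which is a choice made here for simplicity.

```python
from bisect import bisect_right


def tile_probabilities(window, edges):
    """Fraction of the window's points falling in each tile [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for value in window:
        # Clamp out-of-range points into the first or last tile (an assumption).
        index = min(max(bisect_right(edges, value) - 1, 0), len(counts) - 1)
        counts[index] += 1
    return [count / len(window) for count in counts]


def a_distance(reference, recent, edges):
    """Twice the largest per-tile gap between the two empirical distributions."""
    p = tile_probabilities(reference, edges)
    p_prime = tile_probabilities(recent, edges)
    return 2.0 * max(abs(a - b) for a, b in zip(p, p_prime))


def change_detected(reference, recent, edges, epsilon):
    """Flag a change when the A-distance between the windows exceeds epsilon."""
    return a_distance(reference, recent, edges) > epsilon
```

In use, `reference` would hold values from the start of the stream and `recent` the most recent values, with the detector re-evaluated as `recent` slides forward.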

Changes in Document Streams X … President Barack Obama is urging members to …

Changes in Document Streams • Feature counts X: Obama = 4, embassy = 1 • … President Barack Obama is urging members to …

Changes in Document Streams • Weights W: Obama = 1.6, embassy = 0.1 • Feature counts X: Obama = 4, embassy = 1 • WX = 1.6 * 4 + 0.1 * 1 + … = 3.7 • … President Barack Obama is urging members to …

Changes in Document Streams • Weights W: Obama = 1.6, embassy = 0.1 • Feature counts X: Obama = 4, embassy = 1 • WX = 1.6 * 4 + 0.1 * 1 + … = 3.7 • … President Barack Obama is urging members to … • WX = margin • Sign of WX is the class label (+/-) • Magnitude of WX is the “certainty” in the label
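As a rough sketch of how a margin is produced from an unlabeled document, the snippet below computes the dot product for the two features the slide spells out. The dictionary representation and the names `margin`, `weights`, and `features` are illustrative assumptions. The slide's remaining terms are elided, so only the two visible terms are included; they sum to 6.5, and the elided terms account for the difference from the slide's total of 3.7.

```python
def margin(weights, features):
    """Dot product W.X over the features present in the document."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Only the two terms shown on the slide; the elided terms are left out.
weights = {"Obama": 1.6, "embassy": 0.1}
features = {"Obama": 4, "embassy": 1}

partial_margin = margin(weights, features)   # 1.6 * 4 + 0.1 * 1
label = 1 if partial_margin >= 0 else -1     # the sign gives the class label
certainty = abs(partial_margin)              # the magnitude gives the "certainty"
```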

Why Margins? • We have an easy way of producing them from unlabeled examples! • We want to track feature changes • Margins are linear combinations of feature values • Removing important features yields smaller margins • Only track features that matter: features with zero (or small) weight don’t affect the margin (much) • Spoiler alert! Tracking margins works really well for unsupervised detection of genre shifts.

Accuracy vs. Margins: DVD to Electronics • [Figure: accuracy and margins over the stream; legend: average in block, average over last 100 instances]
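The two smoothing schemes named in the figure legend can be sketched as follows. The function names and the handling of a short final block are assumptions made for illustration; the slides do not say how the plotted curves were computed beyond the legend labels.

```python
from collections import deque


def block_averages(values, block_size):
    """Mean of each consecutive, non-overlapping block (the last may be short)."""
    averages = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        averages.append(sum(block) / len(block))
    return averages


def running_averages(values, window=100):
    """Mean over the most recent `window` values, emitted after every instance."""
    recent = deque(maxlen=window)
    total = 0.0
    averages = []
    for value in values:
        if len(recent) == window:
            total -= recent[0]   # this value is evicted by the append below
        recent.append(value)
        total += value
        averages.append(total / len(recent))
    return averages
```

Either function can be applied to a stream of margins or to a stream of 0/1 correctness indicators, which is what makes the two curves comparable.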

Confidence Weighted Margins • Margins can be viewed as a measure of confidence • We detect when confidence in classifications drops • Confidence Weighted (CW) learning refines this idea • Gaussian distribution over weight vectors • Mean of weight vector: μ ∈ R^N • Diagonal covariance matrix: σ ∈ R^(N×N) • Low variance → high confidence • Normalized margin: μ·x / (xᵀσx)^0.5 • Called VARIANCE in slides that follow • Example from the figure: μ = 1.6 with σ = 0.02; μ = 0.1 with σ = 1.74
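A minimal sketch of the normalized margin for a diagonal covariance is shown below. It assumes the σ values on the slide are the diagonal variance entries (the slide does not say whether they are variances or standard deviations) and reuses the feature counts 4 and 1 from the earlier margin example. The zero-denominator guard is an addition made here.

```python
import math


def normalized_margin(mean, variance, x):
    """mu . x / sqrt(x^T sigma x), with sigma diagonal and given as a list."""
    dot = sum(m * xi for m, xi in zip(mean, x))
    spread = sum(v * xi * xi for v, xi in zip(variance, x))
    if spread == 0.0:
        return 0.0  # guard added for this sketch; not from the slides
    return dot / math.sqrt(spread)


mean = [1.6, 0.1]        # weight means shown on the slide
variance = [0.02, 1.74]  # assumed to be the diagonal entries of sigma
x = [4.0, 1.0]           # feature counts from the earlier margin example

score = normalized_margin(mean, variance, x)
```

Dividing by the spread means that a margin built mostly from high-variance (low-confidence) weights is scaled down relative to one built from low-variance weights.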

Experiments • Datasets • Sentiment classification between domains (Blitzer et al., 2007) • DVDs, electronics, books, kitchen appliances • Spam classification between users (Jiang and Zhai, 2007) • Named entity classification between genres (ACE 2005) • News articles, broadcast news, telephone, blogs, etc. • Algorithms • Baselines: SVM, MIRA, CW • Our method: VARIANCE

Experiments • Simulated domain shifts between each pair of genres • 38 pairs, 10 trials each with different random instance orderings • 500 source examples • 1500 target examples • False change • 11 datasets with no shift, 10 trials with different random instance orderings • If no shift is found, the detection point is recorded as the end of the target examples when computing averages
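The experimental setup can be sketched as below. This is only an illustration under stated assumptions: sampling without replacement, the `seed` parameter, and the function names are choices made here, not details given in the slides. The second function encodes the rule that a missed shift is scored as if it were detected at the end of the stream.

```python
import random


def simulated_shift_stream(source, target, n_source=500, n_target=1500, seed=0):
    """Random source instances followed by random target instances."""
    rng = random.Random(seed)  # a different seed gives a different ordering
    stream = rng.sample(source, n_source) + rng.sample(target, n_target)
    return stream, n_source    # n_source is the index of the true shift point


def detection_delay(detected_at, shift_point, stream_length):
    """Instances between the shift and its detection; a miss counts as stream end."""
    if detected_at is None:
        detected_at = stream_length
    return detected_at - shift_point
```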

Comparing Algorithms • [Figure: axes show instances from point of shift; one region is labeled “Good for our approach!”, the other “Good for baseline”]

SVM vs. VARIANCE

Summary of Results Thus Far • VARIANCE detected shifts faster than … • SVM 34 times out of 38 • MIRA 26 times out of 38 • CW 27 times out of 38

Gradual Shifts

What if you have labels? • STEPD: a Statistical Test of Equal Proportions to Detect concept drift (Nishida and Yamauchi, 2007) • Monitors accuracy of classifier from stream of labeled examples • Parameters: window size, W, and threshold, α
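For readers unfamiliar with tests of equal proportions, here is a hedged sketch of a labeled-data detector in that spirit. It compares accuracy over the most recent W labeled examples with accuracy over everything before them, using a continuity-corrected two-proportion z-test. The one-sided test, the requirement of at least 2W outcomes, and the function names are assumptions made here; the published STEPD procedure may differ in its details.

```python
import math


def equal_proportions_p_value(older_correct, older_total, recent_correct, recent_total):
    """Upper-tail p-value for the gap between older and recent accuracy."""
    pooled = (older_correct + recent_correct) / (older_total + recent_total)
    scale = 1.0 / older_total + 1.0 / recent_total
    spread = math.sqrt(pooled * (1.0 - pooled) * scale)
    if spread == 0.0:
        return 1.0  # both windows all-correct or all-wrong: no evidence of change
    gap = abs(older_correct / older_total - recent_correct / recent_total)
    statistic = (gap - 0.5 * scale) / spread  # continuity-corrected z statistic
    return 0.5 * math.erfc(statistic / math.sqrt(2.0))  # standard normal upper tail


def drift_detected(outcomes, window, alpha):
    """outcomes: list of 1 (correct) or 0 (wrong), oldest first."""
    if len(outcomes) < 2 * window:
        return False  # assumption: wait until there is enough history
    older, recent = outcomes[:-window], outcomes[-window:]
    if sum(recent) / len(recent) >= sum(older) / len(older):
        return False  # only a drop in accuracy counts as drift
    p_value = equal_proportions_p_value(sum(older), len(older), sum(recent), len(recent))
    return p_value < alpha
```

Unlike the margin-based detector, this one needs a correctness label for every instance in the stream.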

Comparison to STEPD

What about false positives?

The A-Distance: Choosing Parameters • [Figure: window P of n points, tile A, and threshold ε]

The A-Distance: Choosing Parameters • A-distance paper gives bounds on FPs and FNs • Bounds depend on n and ε • Bounds do not depend on tiling! • So loose as to be meaningless • No guidance on how to choose tiling • What if tiles lie outside support of data?

Better Bounds • P_A = true probability of a point falling in tile A • h = number of points that actually fell in A • p_A = h/n = ML estimate of P_A • Define P′_A, h′, and p′_A for the second window • Suppose P_A = P′_A; then any change detected is a false positive • What is the probability that |p_A − p′_A| > ε/2?
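The question on this slide can be answered numerically when the tile probability is known. The sketch below is not the derivation from the slides: it simply assumes the two windows are independent, each with n points, and that the count in tile A is binomial with the same probability in both windows. It then sums the joint probability of all count pairs whose estimates differ by more than ε/2.

```python
from math import comb


def binomial_pmf(n, p):
    """Probabilities of h = 0..n points landing in the tile."""
    return [comb(n, h) * p**h * (1.0 - p)**(n - h) for h in range(n + 1)]


def false_positive_probability(n, tile_probability, epsilon):
    """P(|h/n - h'/n| > epsilon / 2) when both windows share the same tile probability."""
    pmf = binomial_pmf(n, tile_probability)
    total = 0.0
    for h, prob_h in enumerate(pmf):
        for h_prime, prob_h_prime in enumerate(pmf):
            if abs(h - h_prime) / n > epsilon / 2.0:
                total += prob_h * prob_h_prime
    return total
```

Calling `false_positive_probability(200, 0.1, 0.1)` would give the value for one tile with n = 200; the tile probability and ε used there are example values, not figures from the slides.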

Posterior Over P_A • B(a, b) is the Beta function over a + b Bernoulli trials • a trials have one outcome (point lands in tile A) • b trials have the other (point lands in some other tile)
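One common way to write such a posterior is sketched below. It assumes a uniform prior over the tile probability, so that observing `hits` points inside the tile and `misses` points outside it gives a Beta(hits + 1, misses + 1) posterior. The slide's own formula is not preserved in this transcript, so its parameterization may differ.

```python
import math


def log_beta(a, b):
    """log B(a, b), computed with log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def posterior_density(p, hits, misses):
    """Density of Beta(hits + 1, misses + 1) at p (uniform prior assumed)."""
    if not 0.0 < p < 1.0:
        return 0.0  # endpoints are treated as zero density in this sketch
    log_density = (hits * math.log(p) + misses * math.log(1.0 - p)
                   - log_beta(hits + 1, misses + 1))
    return math.exp(log_density)


def posterior_mean(hits, misses):
    """Mean of the Beta(hits + 1, misses + 1) posterior."""
    return (hits + 1) / (hits + misses + 2)
```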

False Positives: Two Cases

Don’t worry, I’m not going to explain this (much)

Probability of a FP (n = 200)

Probability of FN

Minimizing Expected Loss

Moving Forward • [Figure labels: Genre Classifier; Twitter; Transcribed Broadcast News; Newswire]

Genre Shift “Fix” … told that John Paul Stevens is retiring this summer … … PRESIDENT BARACK OBAMA IS URGING MEMBERS TO… Named Entity Recognition

Genre Shift “Fix” … told that John Paul Stevens is retiring this summer … … PRESIDENT BARACK OBAMA IS URGING MEMBERS TO… … President Barack Obama is urging members to … Named Entity Recognition

Conclusion • Changes in margins convey useful information about changes in classification accuracy • No need for labeled examples! • The A-distance applied to margin streams finds genre shifts with few false positives/negatives • Confidence weighted margins normalized by variance detect shifts faster than SVM, MIRA, or (non-normalized) CW margins • Our approach even works with gradual shifts and compares favorably to shift detectors that use labeled examples

Thank you!
