The Greedy Prepend Algorithm for Decision List Induction

Deniz Yuret, Michael de la Maza

Overview
• Decision Lists
• Greedy Prepend Algorithm
• Opus search and UCI problems
• Version space search and secondary structure prediction
• Limited look-ahead search and Turkish morphology disambiguation

Introduction to Decision Lists
• Prototypical machine learning problem:
• Decide democrat or republican for 435 representatives based on 16 votes.
Class Name: 2 (democrat, republican)
1. handicapped-infants: 2 (y,n)
2. water-project-cost-sharing: 2 (y,n)
3. adoption-of-the-budget-resolution: 2 (y,n)
4. physician-fee-freeze: 2 (y,n)
5. el-salvador-aid: 2 (y,n)
6. religious-groups-in-schools: 2 (y,n)
…
16. export-administration-act-south-africa: 2 (y,n)

Introduction to Decision Lists
• Prototypical machine learning problem:
• Decide democrat or republican for 435 representatives based on 16 votes.
1. If adoption-of-the-budget-resolution = y and anti-satellite-test-ban = n and water-project-cost-sharing = y then democrat
2. If physician-fee-freeze = y then republican
3. If TRUE then democrat
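A decision list like the three-rule example can be read as a first-match procedure: the rules are tried from top to bottom and the first rule whose conditions all hold decides the class. The following is a minimal Python sketch of that reading. The names `VOTING_DLIST`, `matches`, and `classify`, and the dictionary encoding of a representative's votes, are illustrative assumptions rather than part of the original slides.

```python
from typing import Dict, List, Tuple

Instance = Dict[str, str]           # attribute name -> "y" or "n" (assumed encoding)
Rule = Tuple[Dict[str, str], str]   # (conditions, predicted class)

# The three rules from the slide. An empty condition dict stands for "If TRUE".
VOTING_DLIST: List[Rule] = [
    ({"adoption-of-the-budget-resolution": "y",
      "anti-satellite-test-ban": "n",
      "water-project-cost-sharing": "y"}, "democrat"),
    ({"physician-fee-freeze": "y"}, "republican"),
    ({}, "democrat"),
]


def matches(conditions: Dict[str, str], instance: Instance) -> bool:
    """A rule matches when every attribute = value condition holds."""
    return all(instance.get(attr) == val for attr, val in conditions.items())


def classify(dlist: List[Rule], instance: Instance) -> str:
    """Return the class of the first matching rule, scanning front to back."""
    for conditions, label in dlist:
        if matches(conditions, instance):
            return label
    raise ValueError("decision list has no default rule")


if __name__ == "__main__":
    rep = {"physician-fee-freeze": "y", "adoption-of-the-budget-resolution": "n"}
    print(classify(VOTING_DLIST, rep))  # republican
```

Because the first match wins, rule order matters: a representative who satisfies both rule 1 and rule 2 is labeled democrat.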

Alternative Representations • Decision trees:

Alternative Representations • CNF: • DNF:

Alternative Representations
• For 0 < k < n and n > 2, k-CNF(n) ∪ k-DNF(n) is a subset of k-DL(n)
• For 0 < k < n and n > 2, k-DT(n) is a subset of k-CNF(n) ∩ k-DNF(n)
• k-DT(n) is a subset of k-DL(n)
(Rivest, 1987)

Decision List Induction
• Start with an empty decision list or a default rule.
• Keep adding the best rule that covers the unclassified and misclassified cases.
Design Decisions:
• Where to add the new rules (front, back)
• Criteria for best rule
• Search algorithm for best rule

The Greedy Prepend Algorithm
GPA(data)
  dlist = NIL
  default-class = most-common-class(data)
  rule = [ if true then default-class ]
  while gain(rule, dlist, data) > 0
    do dlist = prepend(rule, dlist)
       rule = max-gain-rule(dlist, data)
  return dlist

The Greedy Prepend Algorithm
• Starts with a default rule that picks the most common class
• Prepends subsequent rules to the front of the decision list
• The best rule is the one with maximum gain (increase in number of correctly classified instances)
• Several search algorithms implemented
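The GPA loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: instances are attribute dictionaries paired with a class label, the data set is non-empty, and `max_gain_rule` uses a deliberately simple stand-in search that only considers single-condition rules (the slides use Opus, version space, or limited look-ahead search instead).

```python
from collections import Counter


def rule_matches(conds, x):
    return all(x.get(a) == v for a, v in conds.items())


def classify(dlist, x):
    """First matching rule decides; None if the list is still empty."""
    for conds, label in dlist:
        if rule_matches(conds, x):
            return label
    return None


def gain(rule, dlist, data):
    """Increase in correctly classified instances if rule is prepended."""
    conds, label = rule
    total = 0
    for x, y in data:
        if rule_matches(conds, x):
            total += int(label == y) - int(classify(dlist, x) == y)
    return total


def max_gain_rule(dlist, data):
    # Assumption: stand-in search over single attribute = value rules only.
    classes = sorted({y for _, y in data})
    pairs = sorted({(a, v) for x, _ in data for a, v in x.items()})
    candidates = [({a: v}, c) for a, v in pairs for c in classes]
    return max(candidates, key=lambda r: gain(r, dlist, data))


def gpa(data):
    dlist = []
    default_class = Counter(y for _, y in data).most_common(1)[0][0]
    rule = ({}, default_class)           # [ if true then default-class ]
    while gain(rule, dlist, data) > 0:
        dlist = [rule] + dlist           # prepend
        rule = max_gain_rule(dlist, data)
    return dlist


if __name__ == "__main__":
    data = [
        ({"a": "y", "b": "n"}, "pos"),
        ({"a": "y", "b": "y"}, "pos"),
        ({"a": "n", "b": "y"}, "neg"),
        ({"a": "n", "b": "n"}, "neg"),
        ({"a": "n", "b": "y"}, "neg"),
    ]
    print(gpa(data))  # [({'a': 'y'}, 'pos'), ({}, 'neg')]
```

Each prepended rule has strictly positive gain, so the number of correctly classified training instances rises with every iteration and the loop terminates after at most as many iterations as there are instances.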

Rule Search
• The default rule predicts all instances to belong to the most common category
[Figure: training set partition with respect to the base rule, marking false and correct assignments (+ / -)]

Rule Search
• At each step add the maximum gain rule
[Figure: partition with respect to the next rule, and partition with respect to the decision list]

Opus Search: Simple tree

Opus Search: Fixed order tree

Opus Search: Optimal pruning

GPA-Opus on UCI Problems

A Generic Prediction Algorithm: Sequence to Structure
(Animation frames: the structure labels are filled in one position at a time, left to right.)
MRRWFHPNITGVEAENLLLTRGVDGSFLARPSKSNPGD
??????????????????????????????????????
-?????????????????????????????????????
--????????????????????????????????????
---???????????????????????????????????
----H-----????????????????????????????
----H-----H???????????????????????????
----H-----HH??????????????????????????
----H-----HHHHHHHHHH------EEEEE------?
----H-----HHHHHHHHHH------EEEEE-------

GPA Rules • The first three rules of the sequence-to-structure decision list • 58.86% performance (of 66.36%)

GPA Rule 1 • Everything => Loop

GPA Rule 2

GPA Rule 3

GPA-Opus not feasible for secondary structure prediction
• 9 positions
• 20 possible amino-acids per position
• Size of rule space:
• With only pos=val type attributes: 21^9
• If we include disjunctions: 2^180
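The two rule-space sizes can be checked with a few lines of Python. The reading assumed here is that a pos=val rule either leaves a position unconstrained or fixes it to one of the 20 amino acids (21 choices per position), and that a disjunctive rule allows any subset of the 20 amino acids at each position. The variable names are illustrative.

```python
POSITIONS = 9      # window positions
AMINO_ACIDS = 20   # possible values per position

# pos=val rules: each position is unconstrained or fixed to one amino acid.
simple_rules = (AMINO_ACIDS + 1) ** POSITIONS

# Disjunctive rules: each position allows any subset of the 20 amino acids,
# i.e. one bit per (position, amino acid) pair: 9 * 20 = 180 bits.
disjunctive_rules = (2 ** AMINO_ACIDS) ** POSITIONS

print(simple_rules)                    # 794280046581
print(disjunctive_rules == 2 ** 180)   # True
```

Even the smaller space has close to 10^12 rules, which motivates replacing exhaustive search with a randomized one.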

GPA Version Space Search
Searching for a candidate rule:
• Pick a random instance
• If the instance is currently misclassified and candidate rule corrects it: generalize candidate rule to include instance
• If the instance is currently correct and candidate rule changes classification: specialize candidate rule to exclude instance
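One way to picture the generalize/specialize step is the Python sketch below. The slides do not specify the rule representation or how a rule is specialized, so several choices here are assumptions: a candidate is a list of allowed-residue sets (one per window position) plus a predicted class, generalizing adds the instance's residues to every set, and specializing removes the instance's residue at one randomly chosen position. The names `covers`, `update`, and `search_step` are hypothetical.

```python
import random
from typing import List, Set, Tuple

# (allowed residues per window position, predicted class) -- assumed representation
Candidate = Tuple[List[Set[str]], str]


def covers(candidate: Candidate, window: str) -> bool:
    allowed, _ = candidate
    return all(ch in s for ch, s in zip(window, allowed))


def update(candidate: Candidate, window: str, true_label: str,
           current_label: str, rng: random.Random) -> Candidate:
    """Apply one generalize/specialize step for a single instance."""
    allowed, label = candidate
    allowed = [set(s) for s in allowed]  # work on a copy
    if current_label != true_label and label == true_label:
        # Currently misclassified and the candidate would correct it:
        # generalize so that the candidate covers this window.
        for ch, s in zip(window, allowed):
            s.add(ch)
    elif (current_label == true_label and label != true_label
          and covers(candidate, window)):
        # Currently correct and the candidate would change the class:
        # specialize by dropping this window's residue at one position
        # (assumption: prefer positions that still allow other residues).
        wide = [i for i, s in enumerate(allowed) if len(s) > 1]
        pos = rng.choice(wide) if wide else rng.randrange(len(allowed))
        allowed[pos].discard(window[pos])
    return allowed, label


def search_step(candidate, data, current_labels, rng):
    """Pick a random instance and update the candidate against it."""
    i = rng.randrange(len(data))
    window, true_label = data[i]
    return update(candidate, window, true_label, current_labels[i], rng)
```

Here `current_labels[i]` is the class the current decision list assigns to instance `i`. Repeating `search_step` for a fixed budget of steps yields a candidate whose gain can then be compared against other candidates.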

GPA Secondary Structure Prediction Results
• PhD 72.3
• NNSSP 71.7
• GPA 69.2
• DSC 69.1
• Predator 69.0

Morphological Analyzer for Turkish
masalı
• masal+Noun+A3sg+Pnon+Acc (= the story)
• masal+Noun+A3sg+P3sg+Nom (= his story)
• masa+Noun+A3sg+Pnon+Nom^DB+Adj+With (= with tables)
• Oflazer, K. (1994). Two-level description of Turkish morphology. Literary and Linguistic Computing.
• Oflazer, K., Hakkani-Tür, D. Z., and Tür, G. (1999). Design for a Turkish treebank. EACL’99.
• Beesley, K. R. and Karttunen, L. (2003). Finite State Morphology. CSLI Publications.

Features, IGs and Tags
masa+Noun+A3sg+Pnon+Nom^DB+Adj+With
[Figure labels on the tag above: stem, features, inflectional group (IG), derivational boundary, tag]
• 126 unique features
• 9129 unique IGs
• ∞ unique tags
• 11084 distinct tags observed in 1M word training corpus

Morphological disambiguation
• Task: pick correct parse given context
• masal+Noun+A3sg+Pnon+Acc
• masal+Noun+A3sg+P3sg+Nom
• masa+Noun+A3sg+Pnon+Nom^DB+Adj+With
• Uzun masalı anlat (Tell the long story)
• Uzun masalı bitti (His long story ended)
• Uzun masalı oda (Room with long table)

Morphological disambiguation
• Key Idea: Build a separate classifier for each feature.

GPA on Morphological Disambiguation
1. If (W = çok) and (R1 = +DA) Then W has +Det
2. If (L1 = pek) Then W has +Det
3. If (W = +AzI) Then W does not have +Det
4. If (W = çok) Then W does not have +Det
5. If TRUE Then W has +Det
Examples:
• “pek çok alanda” (matched by rule 1)
• “pek çok insan” (matched by rule 2)
• “insan çok daha” (matched by rule 4)

GPA-Opus not feasible
Attributes for a five word window:
• The exact word string (e.g. W=Ali'nin)
• The lowercase version (e.g. W=ali'nin)
• All suffixes (e.g. W=+n, W=+In, W=+nIn, W=+'nIn, etc.)
• Character types (e.g. Ali'nin would be described with W=UPPER-FIRST, W=LOWER-MID, W=APOS-MID, W=LOWER-LAST)
Average 40 features per instance.

GPA limited look-ahead search • New rules are restricted to adding one new feature to existing rules in the decision list
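The restriction can be sketched as a candidate generator: every candidate extends the condition set of some rule already in the decision list by exactly one feature. The Python sketch below assumes rules are (frozenset of features, class) pairs with the default rule having an empty condition set, and that a candidate may predict any class. These choices and the name `lookahead_candidates` are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, List, Tuple

Rule = Tuple[FrozenSet[str], bool]  # (conditions, "W has +Det"?)


def lookahead_candidates(dlist: List[Rule], features: Iterable[str],
                         classes: Iterable[bool]) -> List[Rule]:
    """All rules obtained by adding one new feature to an existing rule."""
    features = list(features)
    classes = list(classes)
    seen = set()
    candidates = []
    for conds, _ in dlist:
        for f in features:
            if f in conds:
                continue  # must add a feature the rule does not already test
            extended = conds | {f}
            for c in classes:
                if (extended, c) not in seen:
                    seen.add((extended, c))
                    candidates.append((extended, c))
    return candidates


if __name__ == "__main__":
    dlist = [(frozenset({"W=çok"}), False), (frozenset(), True)]
    feats = ["W=çok", "R1=+DA", "L1=pek"]
    for conds, label in lookahead_candidates(dlist, feats, [True, False]):
        print(sorted(conds), label)
```

Under this scheme the number of candidates grows with the product of the list length and the number of features, rather than with the number of possible feature conjunctions.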

GPA Turkish morphological disambiguation results
• Test corpus: 1000 words, hand tagged
• Accuracy: 95.87% (confidence interval: 94.57-97.08)
• Better than the training data!?

Contributions and Future Work
• Established GPA as a competitive alternative to SVMs, C4.5, etc.
• Need theory on why the best-gain rule does well.
• Need to study robustness to irrelevant or redundant attributes.
• Need to speed up the application of the resulting decision lists (convert to FSM?)
