Course Review (Part 1) LING 572 Fei Xia 1/19/06
Outline • Recap • Homework 1 • Project Part 1
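Recap • FSA (finite-state automata) and HMM (hidden Markov models) • DT (decision trees), DL (decision lists), TBL (transformation-based learning)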
A learning algorithm • Modeling: • Representation • Decomposition • Parameters • Properties • Training: • Simple counting, hill-climbing, greedy algorithm, … • Pruning and filtering • Smoothing issues
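As an illustration of "training by simple counting", here is a minimal sketch of relative-frequency estimation for a bigram HMM tagger. Add-alpha smoothing is included only as one possible answer to the "smoothing issues" bullet. The function name `train_bigram_hmm`, the (word, tag) input format and the BOS/EOS padding are assumptions of this sketch.

```python
from collections import Counter

def train_bigram_hmm(tagged_sents, alpha=1.0):
    """Estimate P(t_j | t_i) and P(w | t) by counting, with add-alpha smoothing.

    tagged_sents: list of sentences, each a list of (word, tag) pairs (assumed format).
    alpha=0 gives plain relative-frequency (maximum-likelihood) estimates.
    """
    trans = Counter()      # (previous tag, tag) -> count
    context = Counter()    # previous tag -> number of transitions out of it
    emit = Counter()       # (tag, word) -> count
    tag_count = Counter()  # tag -> number of words it emitted
    tags, vocab = set(), set()
    for sent in tagged_sents:
        prev = "BOS"
        for word, tag in sent:
            trans[(prev, tag)] += 1
            context[prev] += 1
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
        trans[(prev, "EOS")] += 1  # every sentence ends with a transition to EOS
        context[prev] += 1
    n_next = len(tags) + 1  # possible next symbols: every tag plus EOS

    def p_trans(ti, tj):
        return (trans[(ti, tj)] + alpha) / (context[ti] + alpha * n_next)

    def p_emit(t, w):
        return (emit[(t, w)] + alpha) / (tag_count[t] + alpha * len(vocab))

    return p_trans, p_emit
```

With `alpha=0.0` the estimates are the raw ratios of counts; any `alpha > 0` reserves some mass for unseen transitions and emissions, which is exactly where the choice of smoothing method starts to matter.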
A learning algorithm (cont) • Decoding: • Simply verify condition: DT, DL, TBL • Viterbi: FSA and HMM • Pruning during the search • Relation with other algorithms: • Ex: DNF, CNF, DT, DL and TBL • Ex: WFA and HMM, PFA and HMM
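For DT, DL and TBL, decoding just means checking conditions. For a decision list, a minimal sketch is to test the rules in order and return the class of the first rule whose condition holds, falling back to a default class. The feature names and rules below are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, str]
Rule = Tuple[Callable[[Features], bool], str]  # (condition, class)

def decode_decision_list(rules: List[Rule], x: Features, default: str) -> str:
    """Return the class of the first rule whose condition holds for x."""
    for condition, label in rules:
        if condition(x):
            return label
    return default  # no rule fired

# Hypothetical rules over hypothetical features
rules = [
    (lambda x: x.get("suffix") == "ing", "VBG"),
    (lambda x: x.get("prev") == "the", "NN"),
]
```

Because rule order decides which rule fires, `decode_decision_list(rules, {"suffix": "ing", "prev": "the"}, "NN")` returns `"VBG"` even though the second rule also matches.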
NLP task • Choose an ML method: e.g., DT, TBL • Modeling: • Ex: TBL: What kinds of features? • Ex: HMM: What are the states? What are the output symbols? • Training: e.g., DT • Select a particular algorithm: ID3, C4.5 • Choose pruning/filtering/smoothing strategies, thresholds, quality measures, etc. • Decoding: • Pruning strategies
Hw1 • Problem 3 & 4: State-emission and arc-emission HMMs. • Problem 5: Viterbi algorithm • Problem 2: HMM • Problem 1: FSA
Problem 3: State-emission HMM vs. arc-emission HMM • (a), (b) [figures] • Given a path $X_1, X_2, \ldots, X_{n+1}$ in HMM1, the path in HMM2 is also $X_1, X_2, \ldots, X_{n+1}$.
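One construction consistent with the claim that the path is unchanged, sketched under an assumed convention: in the state-emission HMM1, state $X_t$ emits $O_t$ for $t = 1..n$ and $X_{n+1}$ emits nothing; the arc-emission HMM2 keeps the same states and transitions, and each arc $i \to j$ emits $o$ with the probability that $i$ emitted $o$ in HMM1. The dict-of-dicts layout and all function names are hypothetical.

```python
def path_prob_state_emission(pi, A, B, states, outputs):
    """P(path, outputs) in HMM1: state X_t emits O_t (t = 1..n), path has n+1 states."""
    assert len(states) == len(outputs) + 1
    p = pi[states[0]]
    for t, o in enumerate(outputs):
        p *= B[states[t]][o] * A[states[t]][states[t + 1]]
    return p

def to_arc_emission(B, A):
    """HMM2 emissions: arc (i -> j) emits o with probability B[i][o]."""
    return {i: {j: dict(B[i]) for j in A[i]} for i in A}

def path_prob_arc_emission(pi, A, B_arc, states, outputs):
    """P(path, outputs) in HMM2: arc X_t -> X_{t+1} emits O_t."""
    assert len(states) == len(outputs) + 1
    p = pi[states[0]]
    for t, o in enumerate(outputs):
        i, j = states[t], states[t + 1]
        p *= A[i][j] * B_arc[i][j][o]
    return p
```

Both functions multiply the same factors along the same state sequence, so every path gets the same probability in both models; only where the emission factor is attached changes.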
Problem 3 (cont) (c)
Problem 4 (cont) (b) Given a path $X_1, X_2, \ldots, X_{n+1}$ in HMM1, the path in HMM2 is $X_1\_X_1, X_1\_X_2, \ldots, X_n\_X_{n+1}$. (c)
Problem 5 (cont) Cost(i, j) is the maximum probability of a path from i to j that produces nothing (i.e., every arc on it emits ε). To calculate Cost(i, j), let …, where N is the number of states in the HMM.
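One way to compute Cost(i, j), swapped in here as an illustration rather than the slide's own recurrence, is a max-product variant of Floyd-Warshall that also iterates over the N states. Since all probabilities lie in [0, 1], going around a cycle never raises a path's probability, so this is safe. The input matrix `E` (E[i][j] = probability of moving from i to j by one arc that emits ε, 0 if there is none) and the choice Cost(i, i) = 1 for the empty path are assumptions.

```python
def epsilon_cost(E):
    """Max-probability epsilon-only path between every pair of states.

    E[i][j]: probability of a single epsilon arc i -> j (assumed input; 0 if none).
    The empty path is taken to give Cost(i, i) = 1 (an assumption of this sketch).
    """
    N = len(E)
    cost = [[1.0 if i == j else E[i][j] for j in range(N)] for i in range(N)]
    for k in range(N):              # allow state k as an intermediate state
        for i in range(N):
            for j in range(N):
                via_k = cost[i][k] * cost[k][j]
                if via_k > cost[i][j]:
                    cost[i][j] = via_k
    return cost
```

For example, with ε-arcs 0→1 (0.5), 1→2 (0.4) and 2→0 (0.9), Cost(0, 2) is 0.5 × 0.4 = 0.2. The triple loop costs O(N³) and only has to run once per HMM, before Viterbi starts.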
Problems 1 & 2: Important tricks • Constants can be moved outside the sum signs:
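Stated generally, this holds for any c that does not depend on the summation index i. In the HMM-style instance on the right, $\pi_i$ and $a_{ij}$ are assumed notation for the initial and transition probabilities of an N-state HMM, so each row of transition probabilities sums to 1:

```latex
\sum_{i} c\, f(i) = c \sum_{i} f(i)
\qquad\text{e.g.}\qquad
\sum_{j=1}^{N} \pi_i\, a_{ij} = \pi_i \sum_{j=1}^{N} a_{ij} = \pi_i
```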
Tricks (cont) • The order of sums can be changed:
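For finite sums, which is the case whenever the indices range over states or output symbols, this reads:

```latex
\sum_{i=1}^{m} \sum_{j=1}^{n} f(i, j) = \sum_{j=1}^{n} \sum_{i=1}^{m} f(i, j)
```

Swapping the order lets an inner sum, say over output symbols, be evaluated first, where it may collapse to 1 once constants have been moved out of it.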
Tricks (cont) • The order of sum and product can be swapped:
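This is the distributive law, $\prod_{t=1}^{n}\sum_{x} f_t(x) = \sum_{x_1}\cdots\sum_{x_n}\prod_{t=1}^{n} f_t(x_t)$: a product of sums expands into a sum, over every choice of one term per factor, of products. A quick numerical check in Python; the functions and values are arbitrary and chosen only for illustration.

```python
import itertools
import math

def sum_of_products(fs, domain):
    """Brute force: sum over all assignments (x_1, ..., x_n) of prod_t f_t(x_t)."""
    total = 0.0
    for xs in itertools.product(domain, repeat=len(fs)):
        total += math.prod(f(x) for f, x in zip(fs, xs))
    return total

def product_of_sums(fs, domain):
    """Factored form: prod_t sum_x f_t(x)."""
    return math.prod(sum(f(x) for x in domain) for f in fs)

# Arbitrary illustrative factors over a 3-value domain
fs = [lambda x: x + 1, lambda x: 0.5 * x, lambda x: 2.0]
domain = [0, 1, 2]
```

Both calls give 54.0 here (6 × 1.5 × 6). The brute-force side enumerates |domain|ⁿ assignments while the factored side needs only n·|domain| terms, which is why swapping sum and product matters in the homework proofs.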
Problem 2: HMM • Prove by induction: • When the length is 0: • When the length is n-1, we assume that
Problem 1 (cont) • [Figure: an FSA with states q0, q1, q2, …, qN and arcs labeled f]
Carmel: a WFA package • [Figure: a WFA and input/output symbols go into Carmel, which returns the best path]
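Bigram tagging • FST1: Initial states: {BOS}; final states: {EOS}. One state per tag; an arc from ti to tj is labeled tj with weight P(tj | ti). • FST2: a single state q with arcs labeled t/w, weight P(w | t).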
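Decoding with this cascade amounts to choosing the tag sequence that maximizes P(tags) × P(words | tags) under the bigram model. As a sketch of that computation (plain Viterbi, not Carmel itself), assuming hypothetical callables `p_trans(prev, tag)` and `p_emit(tag, word)` and the BOS/EOS symbols of FST1:

```python
import math
from typing import Callable, List, Sequence

def _log(p: float) -> float:
    # log of a probability, mapping 0 to -inf so impossible paths simply lose
    return math.log(p) if p > 0 else -math.inf

def viterbi_bigram(words: Sequence[str], tags: Sequence[str],
                   p_trans: Callable[[str, str], float],
                   p_emit: Callable[[str, str], float]) -> List[str]:
    """argmax over t_1..t_n of P(t_1|BOS) prod P(t_k|t_{k-1}) prod P(w_k|t_k) P(EOS|t_n)."""
    # delta[t]: best log score of any tag prefix ending in tag t at the current word
    delta = {t: _log(p_trans("BOS", t)) + _log(p_emit(t, words[0])) for t in tags}
    backptrs = []  # one dict per later word: tag -> best previous tag
    for w in words[1:]:
        new_delta, bp = {}, {}
        for t in tags:
            prev = max(tags, key=lambda s: delta[s] + _log(p_trans(s, t)))
            new_delta[t] = delta[prev] + _log(p_trans(prev, t)) + _log(p_emit(t, w))
            bp[t] = prev
        delta = new_delta
        backptrs.append(bp)
    # the final transition into EOS corresponds to reaching FST1's final state
    best = max(tags, key=lambda s: delta[s] + _log(p_trans(s, "EOS")))
    path = [best]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    return path[::-1]
```

Working in log space avoids underflow on long sentences; keeping one back-pointer table per word is what lets the best path be read off at the end.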
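Trigram tagging • FST1: Initial state: {BOS-BOS}; final state: {EOS-EOS}. States are tag pairs; an arc from t0t1 to t1t2 is labeled t2 with weight P(t2 | t1, t0). • FST2: a single state q with arcs labeled t/w, weight P(w | t).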
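The tag-pair states of the trigram FST1 can be built mechanically. A minimal sketch, assuming (hypothetically) that the trigram model is given as a dict keyed by (t0, t1, t2) whose value is P(t2 | t1, t0); each entry with nonzero probability becomes one arc:

```python
from typing import Dict, List, Tuple

State = Tuple[str, str]                # a pair of tags (t0, t1)
Arc = Tuple[State, str, State, float]  # (source, label, destination, weight)

def trigram_fst1_arcs(p_tri: Dict[Tuple[str, str, str], float]) -> List[Arc]:
    """One arc per trigram: (t0, t1) --t2 / P(t2 | t1, t0)--> (t1, t2)."""
    arcs = []
    for (t0, t1, t2), p in sorted(p_tri.items()):
        if p > 0:  # zero-probability trigrams get no arc
            arcs.append(((t0, t1), t2, (t1, t2), p))
    return arcs
```

Each arc shifts the history window by one tag, so the number of states grows with the square of the tagset size rather than linearly as in the bigram FST1.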
Minor details • BOS and EOS: • No need for special treatment for BOS • EOS: • Add two “EOS”s at the end of a sentence, or • Replace input symbol “EOS” with ε (a.k.a. *e*).
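Why two EOS symbols are needed in the trigram case can be seen by listing the trigrams visited on the way from BOS-BOS to EOS-EOS. In this sketch (function name hypothetical), the two BOS symbols only name the start state and are never read as input, while both EOS symbols are consumed as input. With a single EOS, the walk would stop in state (tn, EOS) instead of the final state EOS-EOS.

```python
def trigram_factors(tags):
    """Trigrams (t0, t1, t2) traversed in FST1 when two EOS symbols end the input."""
    padded = ["BOS", "BOS"] + list(tags) + ["EOS", "EOS"]
    # each window is one arc: from state (t0, t1), reading t2, to state (t1, t2)
    return [tuple(padded[k:k + 3]) for k in range(len(padded) - 2)]
```

For the tags `["DT", "NN"]` this yields four arcs, (BOS, BOS, DT), (BOS, DT, NN), (DT, NN, EOS) and (NN, EOS, EOS), and the last one lands in EOS-EOS.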