Self-Training & Co-Training Overview Meeting 17 — Mar 19, 2013 CSCE 6933 Rodney Nielsen
Self-Training
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h)
  • L ← L0 + <U*, h(U*)>
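A minimal sketch of this loop in Python, assuming a scikit-learn-style base learner with fit/predict_proba and a simple confidence threshold for select; the names `self_train` and `select` are illustrative, not from the lecture.

```python
import numpy as np

def select(probs, threshold=0.95):
    """Indices of unlabeled examples whose top class probability exceeds the threshold."""
    return np.where(probs.max(axis=1) > threshold)[0]

def self_train(base_learner, X0, y0, X_unlabeled, n_rounds=10, threshold=0.95):
    """Basic self-training: L is rebuilt from the seed data L0 plus the current U* each round."""
    X_l, y_l = X0, y0
    for _ in range(n_rounds):                            # stopping criterion: fixed number of rounds
        h = base_learner.fit(X_l, y_l)                   # h(x) <- f(L)
        probs = h.predict_proba(X_unlabeled)
        idx = select(probs, threshold)                   # U* <- select(U, h)
        if len(idx) == 0:
            break
        y_star = h.classes_[probs[idx].argmax(axis=1)]   # h(U*)
        X_l = np.vstack([X0, X_unlabeled[idx]])          # L <- L0 + <U*, h(U*)>
        y_l = np.concatenate([y0, y_star])
    return h
```

For example, with a scikit-learn classifier this could be called as `self_train(LogisticRegression(max_iter=1000), X0, y0, X_unlabeled)`.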
Base Learner
• The textbook assumes a hard label
• But the base learner must output some means of generating a classification confidence
Example Selection
• Probability
• Probability ratio or probability margin
• Entropy
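Assuming the base learner exposes class probabilities (e.g., via predict_proba), these confidence measures can be sketched as follows; the scoring functions are illustrative, not the course's exact definitions.

```python
import numpy as np

def probability_score(probs):
    """Confidence = probability of the most likely class."""
    return probs.max(axis=1)

def margin_score(probs):
    """Confidence = gap between the two most probable classes."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def entropy_score(probs):
    """Negative entropy: a more peaked distribution means higher confidence."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return -entropy
```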
Stopping Criteria
• T rounds,
• repeat until convergence,
• use held-out validation data, or
• k-fold cross-validation
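As one possible instantiation of the held-out-validation option, a patience-based stopping check might look like this; the patience value and the accuracy metric are assumptions for illustration.

```python
def should_stop(h, X_val, y_val, history, patience=3):
    """Stop when held-out accuracy has not improved for `patience` consecutive rounds.
    `history` is a list of past validation accuracies, updated in place."""
    history.append((h.predict(X_val) == y_val).mean())
    best_round = history.index(max(history))   # first round that reached the best accuracy
    return len(history) - 1 - best_round >= patience
```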
Seed
• Seed data vs. seed classifier
• Training on seed data does not necessarily result in a classifier that perfectly labels the seed data
• Training on data output by a seed classifier does not necessarily result in the same classifier
• Constraints
Indelibility

Indelible:
• L ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h)
  • L ← L + <U*, h(U*)>
  • U ← U − U*

Original (Y(U) can change):
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h)
  • L ← L0 + <U*, h(U*)>
Persistence

Indelible:
• L ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h)
  • L ← L + <U*, h(U*)>
  • U ← U − U*

Persistent (X(L) can't change):
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← U* + select(U, h)
  • L ← L0 + <U*, h(U*)>
  • U ← U − U*
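Under the same assumptions as the earlier self-training sketch, an indelible and persistent variant might look like this: selected examples keep their assigned labels, stay in L across rounds, and are removed from U so they cannot be re-selected.

```python
import numpy as np

def self_train_indelible(base_learner, X0, y0, X_unlabeled, n_rounds=10, threshold=0.95):
    """Indelible, persistent self-training: once an example is labeled it stays in L
    with that label, and it is removed from U."""
    X_l, y_l = X0.copy(), y0.copy()
    for _ in range(n_rounds):
        if len(X_unlabeled) == 0:
            break
        h = base_learner.fit(X_l, y_l)                       # h(x) <- f(L)
        probs = h.predict_proba(X_unlabeled)
        idx = np.where(probs.max(axis=1) > threshold)[0]     # select(U, h)
        if len(idx) == 0:
            break
        y_star = h.classes_[probs[idx].argmax(axis=1)]       # h(U*) -- fixed from now on
        X_l = np.vstack([X_l, X_unlabeled[idx]])             # L <- L + <U*, h(U*)>
        y_l = np.concatenate([y_l, y_star])
        X_unlabeled = np.delete(X_unlabeled, idx, axis=0)    # U <- U - U*
    return h
```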
Throttling

Throttled:
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h, k)
  • L ← L0 + <U*, h(U*)>
• Select the k examples from U with the greatest confidence

Original (threshold):
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h, θ)
  • L ← L0 + <U*, h(U*)>
• Select all examples from U with confidence > θ
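The two rules differ only in how U* is chosen. A sketch of both, assuming `probs` holds the base learner's class probabilities over U:

```python
import numpy as np

def select_threshold(probs, theta=0.95):
    """Original: all examples whose confidence exceeds the threshold theta."""
    return np.where(probs.max(axis=1) > theta)[0]

def select_throttled(probs, k=10):
    """Throttled: only the k most confident examples."""
    return np.argsort(probs.max(axis=1))[::-1][:k]
```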
Balanced

Balanced (& throttled):
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h, k)
  • L ← L0 + <U*, h(U*)>
• Select k+ positive & k− negative examples; often k+ = k−, or they are proportional to N+ & N−

Throttled:
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h, k)
  • L ← L0 + <U*, h(U*)>
• Select the k examples from U with the greatest confidence
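A sketch of balanced (and throttled) selection for a binary task; it assumes the labels are {0, 1}, with column 1 of `probs` holding the positive-class probability.

```python
import numpy as np

def select_balanced(probs, k_pos=5, k_neg=5):
    """Take the k_pos most confident positives and the k_neg most confident negatives.
    Often k_pos == k_neg, or the two are set proportional to the class priors."""
    pos_order = np.argsort(probs[:, 1])[::-1]   # most confident positives first
    neg_order = np.argsort(probs[:, 0])[::-1]   # most confident negatives first
    return np.concatenate([pos_order[:k_pos], neg_order[:k_neg]])
```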
Preselection

Preselect a subset of U:
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U′ ← select(U, φ)
  • U* ← select(U′, h, θ)
  • L ← L0 + <U*, h(U*)>
• Select examples from U′, a subset of U (typically random)

Original (test all of U):
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h, θ)
  • L ← L0 + <U*, h(U*)>
• Select examples from all of U
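A sketch of the preselection step: draw a random subset U′ and score only that subset, which keeps each round cheap when U is very large; the pool size is an illustrative parameter.

```python
import numpy as np

def preselect(n_unlabeled, pool_size=1000, rng=None):
    """Return indices of a random subset U' of the unlabeled data; the confidence-based
    select step is then applied to U' rather than to all of U."""
    if rng is None:
        rng = np.random.default_rng()
    return rng.choice(n_unlabeled, size=min(pool_size, n_unlabeled), replace=False)
```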
Co-Training
• X = X1 × X2; two different views of the data
• x = (x1, x2); i.e., each instance consists of two distinct sets of features and values
• Assume each view is sufficient for correct classification
Co-Training Algorithm 1
Table 1: Blum and Mitchell, 1998
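Since the table itself is not reproduced here, the following is a sketch in the spirit of Blum and Mitchell's algorithm: two classifiers trained on the two views each add their most confident positives and negatives from a small random pool. The parameter values (p, n, pool size, rounds) and the binary {0, 1} labels are assumptions for illustration.

```python
import numpy as np

def co_train(clf1, clf2, X1, X2, y, U1, U2, rounds=30, pool_size=75, p=1, n=3, rng=None):
    """Co-training sketch: clf1 sees view 1 (X1/U1), clf2 sees view 2 (X2/U2);
    rows of U1 and U2 describe the same unlabeled instances."""
    if rng is None:
        rng = np.random.default_rng()
    X1_l, X2_l, y_l = X1, X2, y
    for _ in range(rounds):
        if len(U1) == 0:
            break
        h1 = clf1.fit(X1_l, y_l)                 # train one classifier per view
        h2 = clf2.fit(X2_l, y_l)
        # Score only a small random pool U' of the unlabeled data
        pool = rng.choice(len(U1), size=min(pool_size, len(U1)), replace=False)
        idx, labels = [], []
        for h, U_view in ((h1, U1), (h2, U2)):
            probs = h.predict_proba(U_view[pool])
            top_neg = pool[np.argsort(probs[:, 0])[::-1][:n]]   # most confident negatives
            top_pos = pool[np.argsort(probs[:, 1])[::-1][:p]]   # most confident positives
            idx.extend(np.concatenate([top_neg, top_pos]))
            labels.extend([0] * len(top_neg) + [1] * len(top_pos))
        idx, labels = np.array(idx), np.array(labels)
        # Add each classifier's picks (with its own labels) to the labeled set in both views;
        # duplicate picks are kept here for simplicity
        X1_l = np.vstack([X1_l, U1[idx]])
        X2_l = np.vstack([X2_l, U2[idx]])
        y_l = np.concatenate([y_l, labels])
        # Remove the newly labeled instances from both unlabeled views
        U1 = np.delete(U1, np.unique(idx), axis=0)
        U2 = np.delete(U2, np.unique(idx), axis=0)
    return h1, h2
```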
Questions • ???