Self-Training & Co-Training Overview
Meeting 17 — Mar 19, 2013
CSCE 6933
Rodney Nielsen
Self-Training
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h)
  • L ← L0 + <U*, h(U*)>
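A minimal sketch of this loop in Python, assuming a scikit-learn logistic-regression base learner and threshold-based selection; the function name self_train and the parameters theta and max_rounds are illustrative, not part of the slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X0, y0, U, theta=0.9, max_rounds=10):
    """Grow the training set with the learner's own most confident labels."""
    X0, y0, U = np.asarray(X0), np.asarray(y0), np.asarray(U)
    L_X, L_y = X0, y0                                    # L <- L0 <- <X(0), Y(0)>
    h = None
    for _ in range(max_rounds):                          # until stopping-criteria
        h = LogisticRegression(max_iter=1000).fit(L_X, L_y)   # h(x) <- f(L)
        conf = h.predict_proba(U).max(axis=1)
        picked = conf > theta                            # U* <- select(U, h)
        if not picked.any():
            break
        # L <- L0 + <U*, h(U*)>: L is rebuilt from the seed data plus the
        # currently most confident pseudo-labeled examples, so the labels
        # assigned to unlabeled examples may change from round to round.
        L_X = np.vstack([X0, U[picked]])
        L_y = np.concatenate([y0, h.predict(U[picked])])
    return h
```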
Base Learner • The textbook assumes a hard label • But the learner must also provide some means of generating a classification confidence
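As a small illustration of that requirement, a scikit-learn-style learner (an assumed, illustrative choice) can return both the hard label and a confidence score:

```python
from sklearn.naive_bayes import GaussianNB

# Tiny illustrative fit; the data and the learner are placeholders.
clf = GaussianNB().fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
hard_label = clf.predict([[1.4]])        # hard label only, e.g. array([0])
confidence = clf.predict_proba([[1.4]])  # class probabilities, roughly [[0.7, 0.3]]
```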
Example Selection • Probability • Probability ratio or probability margin • Entropy
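Each criterion can be computed from the predicted class distribution of an unlabeled example; a small sketch with illustrative function names:

```python
import numpy as np

def probability(p):
    return np.max(p)                      # confidence of the top class

def prob_margin(p):
    top2 = np.sort(p)[-2:]
    return top2[1] - top2[0]              # probability margin; a ratio top2[1]/top2[0] also works

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))         # lower entropy = more peaked, more confident

p = np.array([0.7, 0.2, 0.1])             # one example's predicted distribution
print(probability(p), prob_margin(p), entropy(p))
```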
Stopping Criteria • T rounds, • Repeat until convergence, • Use held-out validation data, or • k-fold cross-validation
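One way these criteria might be combined, as a sketch: cap the number of rounds at T and also stop once held-out validation accuracy stops improving. The patience window and the function name are illustrative assumptions.

```python
def should_stop(round_idx, val_scores, T=20, patience=2):
    """val_scores: held-out validation accuracy after each round so far."""
    if round_idx >= T:                              # hard cap: T rounds
        return True
    if len(val_scores) > patience:                  # held-out validation plateau
        return max(val_scores[-patience:]) <= max(val_scores[:-patience])
    return False
```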
Seed • Seed Data vs. Seed Classifier • Training on seed data does not necessarily result in a classifier that perfectly labels the seed data • Training on data output by a seed classifier does not necessarily result in the same classifier • Constraints
Indelibility

Indelible:
• L ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h)
  • L ← L + <U*, h(U*)>
  • U ← U – U*

Original (Y(U) can change):
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h)
  • L ← L0 + <U*, h(U*)>
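A sketch of the indelible variant under the same assumptions as the earlier self-training sketch: once an example is pseudo-labeled it stays in L with that label and is removed from U.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train_indelible(X0, y0, U, theta=0.9, max_rounds=10):
    L_X, L_y = np.asarray(X0), np.asarray(y0)        # L <- <X(0), Y(0)>
    U = np.asarray(U)
    h = None
    for _ in range(max_rounds):                      # until stopping-criteria
        h = LogisticRegression(max_iter=1000).fit(L_X, L_y)   # h(x) <- f(L)
        if len(U) == 0:
            break
        conf = h.predict_proba(U).max(axis=1)
        picked = conf > theta                        # U* <- select(U, h)
        if not picked.any():
            break
        L_X = np.vstack([L_X, U[picked]])            # L <- L + <U*, h(U*)>
        L_y = np.concatenate([L_y, h.predict(U[picked])])
        U = U[~picked]                               # U <- U - U*
    return h
```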
Persistence

Indelible:
• L ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h)
  • L ← L + <U*, h(U*)>
  • U ← U – U*

Persistent (X(L) can't change):
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← U* + select(U, h)
  • L ← L0 + <U*, h(U*)>
  • U ← U – U*
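A sketch of the persistent variant under the same assumptions: the set U* of selected instances only grows, so the instances in L never drop out, but the current hypothesis re-labels them every round.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train_persistent(X0, y0, U, theta=0.9, max_rounds=10):
    X0, y0, U = np.asarray(X0), np.asarray(y0), np.asarray(U)
    U_star = np.empty((0, X0.shape[1]))              # selected instances so far
    L_X, L_y = X0, y0                                # L <- L0 <- <X(0), Y(0)>
    h = None
    for _ in range(max_rounds):                      # until stopping-criteria
        h = LogisticRegression(max_iter=1000).fit(L_X, L_y)   # h(x) <- f(L)
        if len(U):
            conf = h.predict_proba(U).max(axis=1)
            picked = conf > theta
            U_star = np.vstack([U_star, U[picked]])  # U* <- U* + select(U, h)
            U = U[~picked]                           # U <- U - U*
        if len(U_star) == 0:
            continue
        # L <- L0 + <U*, h(U*)>: membership of U* is fixed (X(L) can't change),
        # but the labels come from the current hypothesis each round.
        L_X = np.vstack([X0, U_star])
        L_y = np.concatenate([y0, h.predict(U_star)])
    return h
```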
Throttling

Throttled:
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h, k)
  • L ← L0 + <U*, h(U*)>
• Select the k examples from U with the greatest confidence

Original (threshold):
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h, θ)
  • L ← L0 + <U*, h(U*)>
• Select all examples from U with confidence > θ
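The two variants differ only in the select step; an illustrative sketch of each, assuming a fitted learner h that exposes predict_proba:

```python
import numpy as np

def select_throttled(U, h, k):
    conf = h.predict_proba(U).max(axis=1)
    return np.argsort(conf)[::-1][:k]      # indices of the k most confident examples

def select_threshold(U, h, theta):
    conf = h.predict_proba(U).max(axis=1)
    return np.where(conf > theta)[0]       # indices of all examples with confidence > theta
```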
Balanced

Balanced (& throttled):
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h, k)
  • L ← L0 + <U*, h(U*)>
• Select k+ positive & k− negative examples; often k+ = k−, or they are proportional to N+ & N−

Throttled:
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h, k)
  • L ← L0 + <U*, h(U*)>
• Select the k examples from U with the greatest confidence
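An illustrative sketch of balanced selection for a binary learner; it assumes 0/1 labels with classes_ ordered [0, 1], and k_pos/k_neg stand in for k+ and k−:

```python
import numpy as np

def select_balanced(U, h, k_pos, k_neg):
    proba = h.predict_proba(U)
    pos_idx = np.argsort(proba[:, 1])[::-1][:k_pos]   # k+ most confident positives
    neg_idx = np.argsort(proba[:, 0])[::-1][:k_neg]   # k- most confident negatives
    return np.concatenate([pos_idx, neg_idx])
```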
Preselection

Preselect a subset of U:
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U′ ← select(U, φ)
  • U* ← select(U′, h, θ)
  • L ← L0 + <U*, h(U*)>
• Select examples from U′, a subset of U (typically random)

Original (test all of U):
• L ← L0 ← <X(0), Y(0)>
• Until stopping-criteria:
  • h(x) ← f(L)
  • U* ← select(U, h, θ)
  • L ← L0 + <U*, h(U*)>
• Select examples from all of U
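A sketch of preselection: draw a random subset U′ first, then apply the usual confidence test only to U′. The parameters pool_size and seed stand in for φ and are illustrative.

```python
import numpy as np

def select_with_preselection(U, h, theta, pool_size, seed=0):
    rng = np.random.default_rng(seed)
    pool = rng.choice(len(U), size=min(pool_size, len(U)), replace=False)  # U' <- select(U, phi)
    conf = h.predict_proba(U[pool]).max(axis=1)
    return pool[conf > theta]              # U* <- select(U', h, theta), drawn from U' only
```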
Co-training • X = X1 × X2; two different views of the data • x = (x1, x2); i.e., each instance consists of two distinct sets of features and values • Assume each view alone is sufficient for correct classification
Co-Training Algorithm
Table 1: Blum and Mitchell, 1998
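Table 1 itself is not reproduced here; the sketch below follows the usual shape of the Blum and Mitchell (1998) loop: train one learner per view, let each add its p most confident positive and n most confident negative examples from a small pool drawn from U, then replenish the pool. The Naive Bayes learners, binary 0/1 labels, and all parameter values are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, U1, U2, rounds=10, p=1, n=3, u=75, seed=0):
    """X1/X2 and U1/U2 are the two views of the labeled and unlabeled data;
    y holds the binary 0/1 seed labels."""
    rng = np.random.default_rng(seed)
    X1, X2, y = np.asarray(X1), np.asarray(X2), np.asarray(y)
    U1, U2 = np.asarray(U1), np.asarray(U2)
    pool = list(rng.choice(len(U1), size=min(u, len(U1)), replace=False))
    pool_set = set(pool)
    remaining = [i for i in range(len(U1)) if i not in pool_set]
    rng.shuffle(remaining)
    h1 = h2 = None
    for _ in range(rounds):
        h1 = GaussianNB().fit(X1, y)                   # learner on view 1
        h2 = GaussianNB().fit(X2, y)                   # learner on view 2
        if not pool:
            break
        newly = {}                                     # pool index -> pseudo-label
        for h, U_view in ((h1, U1), (h2, U2)):
            proba = h.predict_proba(U_view[pool])
            for j in np.argsort(proba[:, 1])[::-1][:p]:
                newly[pool[j]] = 1                     # p most confident positives
            for j in np.argsort(proba[:, 0])[::-1][:n]:
                newly[pool[j]] = 0                     # n most confident negatives
        for idx, label in newly.items():               # each new example feeds both views
            X1 = np.vstack([X1, U1[idx:idx + 1]])
            X2 = np.vstack([X2, U2[idx:idx + 1]])
            y = np.append(y, label)
            pool.remove(idx)
        while remaining and len(pool) < u:             # replenish the pool from U
            pool.append(remaining.pop())
    return h1, h2
```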
Questions • ???