Announcements
• Exam grading
• Projects
• Next: Generative Models (Chapter 6, Bayesian Learning)
CS446-Spring 06
Shattering
• We say that a set S of examples is shattered by a set of functions H if,
for every partition of the examples in S into positive and negative examples,
there is a function in H that gives exactly these labels to the examples.
• (Intuition: a rich set of functions shatters large sets of points)
• Left-bounded intervals on the real axis: [0, a), for some real number a > 0
• Sets of two points cannot be shattered
(that is: given any two points, you can label them in such a way that
no concept in this class is consistent with their labeling)
[figure: points on the positive axis; every [0, a) labels the points to the left of a positive and the rest negative, so the two-point labeling (−, +) is unrealizable]
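The claim about left-bounded intervals can be checked by brute force over all labelings; a minimal Python sketch (the helper names are mine, not from the slides):

```python
from itertools import product

def realizable(points, labels):
    """A labeling is realizable by some [0, a) iff every positive point
    lies to the left of every negative point (then a threshold a fits
    between them).  Points are assumed to be distinct positive reals."""
    pos = [p for p, lab in zip(points, labels) if lab]
    neg = [p for p, lab in zip(points, labels) if not lab]
    return not pos or not neg or max(pos) < min(neg)

def shattered(points):
    """True iff every one of the 2^n labelings is realizable."""
    return all(realizable(points, labs)
               for labs in product([False, True], repeat=len(points)))

print(shattered([3.0]))       # True: a single point can be shattered
print(shattered([2.0, 5.0]))  # False: the labeling (-, +) is unrealizable
```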
Shattering
• Intervals on the real axis: [a, b], for some real numbers b > a
(this is the set of functions, i.e. the concept class, considered here)
• All sets of one or two points can be shattered,
but sets of three points cannot be shattered
[figure: with three points, the labeling (+, −, +) cannot be produced by any single interval [a, b]]
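The same brute-force check works for intervals [a, b]: a labeling is realizable exactly when the positive points are contiguous in sorted order. A sketch (function names are mine):

```python
from itertools import product

def realizable(points, labels):
    """A labeling is realizable by some interval [a, b] iff the positive
    points form one contiguous block once the points are sorted."""
    seq = [lab for _, lab in sorted(zip(points, labels))]
    pos = [i for i, lab in enumerate(seq) if lab]
    return not pos or pos[-1] - pos[0] + 1 == len(pos)

def shattered(points):
    return all(realizable(points, labs)
               for labs in product([False, True], repeat=len(points)))

print(shattered([1.0, 2.0]))       # True: any two points can be shattered
print(shattered([1.0, 2.0, 3.0]))  # False: (+, -, +) is unrealizable
```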
Shattering
• Half-spaces in the plane:
sets of one, two, or three points can be shattered,
but there is no set of four points that can be shattered
[figure: any labeling of three points in general position can be split by a line; for four points, the labeling that puts each diagonal pair in a different class cannot]
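The half-space case can also be verified exactly for small point sets. The sketch below (my own helper, not from the slides) uses the fact that if two finite 2-D point sets are strictly separable, the max-margin normal is either the difference of two input points or perpendicular to such a difference, so only finitely many directions need testing:

```python
from itertools import product

def separable(points, labels):
    """Exact strict linear separability check in 2-D for small sets:
    test every candidate normal direction derived from point pairs."""
    pos = [p for p, lab in zip(points, labels) if lab]
    neg = [p for p, lab in zip(points, labels) if not lab]
    if not pos or not neg:
        return True  # one class empty: any half-plane works
    for (ax, ay), (bx, by) in product(points, repeat=2):
        dx, dy = ax - bx, ay - by
        # the pair difference, its perpendicular, and their negations
        for wx, wy in ((dx, dy), (-dy, dx), (-dx, -dy), (dy, -dx)):
            if (wx, wy) == (0, 0):
                continue
            if max(wx*x + wy*y for x, y in neg) < min(wx*x + wy*y for x, y in pos):
                return True
    return False

def shattered(points):
    return all(separable(points, labs)
               for labs in product([False, True], repeat=len(points)))

print(shattered([(0, 0), (1, 0), (0, 1)]))          # True: three points shattered
print(shattered([(0, 0), (1, 0), (0, 1), (1, 1)]))  # False: the XOR labeling fails
```

For the four corners of a square, the diagonal ("XOR") labeling is the one no line can realize, which is what caps the VC dimension at 3.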
VC Dimension
• An unbiased hypothesis space H shatters the entire instance space X, i.e.,
it is able to induce every possible partition on the set of all possible instances.
• The larger the subset of X that can be shattered, the more expressive the
hypothesis space is, i.e., the less biased.
VC Dimension
• The VC dimension of hypothesis space H over instance space X
is the size of the largest finite subset of X that is shattered by H.
• If there exists a subset of size d that can be shattered, then VC(H) ≥ d.
• If no subset of size d can be shattered, then VC(H) < d.
• VC(half intervals) = 1 (no subset of size 2 can be shattered)
• VC(intervals) = 2 (no subset of size 3 can be shattered)
• VC(half-spaces in the plane) = 3 (no subset of size 4 can be shattered)
Sample Complexity with VC Dimension
• Using VC(H) as a measure of expressiveness, we have the following
for infinite hypothesis spaces.
• Given a sample D of m examples:
if we can find some h ∈ H that is consistent with all m examples, with
m ≥ (1/ε)(4 log₂(2/δ) + 8 VC(H) log₂(13/ε)),
then with probability at least 1 − δ, h has error less than ε.
• (Again, when m is polynomial we have a PAC learning algorithm;
to be efficient, we need to produce the hypothesis h efficiently.)
• Note: to shatter m examples we need |H| ≥ 2^m, so log₂(|H|) ≥ VC(H).
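Plugging numbers in makes the bound concrete. A small calculator, assuming the standard Blumer et al. form m ≥ (1/ε)(4 log₂(2/δ) + 8 VC(H) log₂(13/ε)) (the function name is mine):

```python
from math import ceil, log2

def sample_bound(vc, eps, delta):
    """m >= (1/eps) * (4*log2(2/delta) + 8*vc*log2(13/eps)):
    enough consistent examples to get error < eps with prob. >= 1 - delta."""
    return ceil((4 * log2(2 / delta) + 8 * vc * log2(13 / eps)) / eps)

# Half-spaces in the plane (VC = 3), 10% error, 95% confidence:
print(sample_bound(3, 0.1, 0.05))  # 1899 examples suffice
```

Note how the bound depends only on VC(H), ε, and δ, never on |H|, which is what lets it apply to infinite hypothesis spaces.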
Homework
• H = axis-parallel rectangles in R²
• Four real numbers define a rectangle
• |H| is infinite
• Five sample rectangles from H are shown
• What is the VC dimension of H?
• Can we PAC learn?
• Can we efficiently PAC learn?
VC Dimension & Learning
• Infinite |H| does not mean unbounded expressivity
• A large enough sample can exhaust the representational capacity of H
• VC(H) is a worst-case capacity measure
• The actual distribution and labelings over X may be more favorable than the worst case
VC(H) Growth Function
[figure: log(number of labelings) vs. |S|, for |S| up to 20 and labeling counts up to 1,000,000; the curve for all 2^|S| labelings grows linearly on the log scale, while the curve of labelings possible by H falls below it as |S| grows]
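The gap between the two curves can be illustrated numerically. This sketch assumes Sauer's lemma (not stated on the slide), which bounds the number of labelings a class of VC dimension d can realize on m points:

```python
from math import comb

def max_labelings(m, d):
    """Sauer's lemma: at most sum_{i=0}^{d} C(m, i) labelings of m
    points are realizable by a class of VC dimension d."""
    return sum(comb(m, i) for i in range(d + 1))

# all 2^m labelings vs. the bound for a class with VC dimension 3
for m in (5, 10, 15, 20):
    print(m, 2**m, max_labelings(m, 3))
```

Past m = d the bound grows only polynomially in m, while 2^m keeps doubling; that divergence is what the plot shows.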
Suppose…
• Every h ∈ H has very low accuracy, say < 0.1% correct
• VC(H) is 100
• Training set S contains 80 labeled examples
• What is the probability that an arbitrary h gets the first training example right?
• What is the probability that an arbitrary h gets all 80 training examples right?
• What is the best some h ∈ H can possibly do on all 80 elements of S?
Some Interesting Concept Classes (what is the VC dimension?)
• Signum(sin(x)) on the real line R
• Convex polygons in the plane R×R
• d-input linear threshold unit in R^d