1. CS 446: Machine Learning Gerald DeJong
mrebl@.uiuc.edu
3-0491
3320 SC
Recent approval for a TA to be named later
2. INTRODUCTION CS446-Fall 06 2 Office hours: after most classes and Thur @ 3
Text: Mitchell's Machine Learning
Midterm: Oct. 4
Final: Dec. 12 (midterm, final, and homeworks each count for a third)
Homeworks / projects
Submit at the beginning of class
Late penalty: 20% / day up to 3 days
Programming, some in-class assignments
Class web site soon
Cheating: none allowed! We adopt the department's policy.
3. INTRODUCTION CS446-Fall 06 3 Please answer these and hand in now: Name
Department
Where (if?*) you took your Intro AI course
Who taught it (esp. if not here)
1) Why interested in Machine Learning?
2) Any topics you would like to see covered?
* may require significant additional effort
4. INTRODUCTION CS446-Fall 06 4 Approx. Course Overview / Topics Introduction: Basic problems and questions
A detailed example: Linear threshold units
Basic Paradigms:
PAC (Risk Minimization); Bayesian Theory; SRM (Structural Risk Minimization); Compression; Maximum Entropy;
Generative/Discriminative; Classification/Skill;
Learning Protocols
Online/Batch; Supervised/Unsupervised/Semi-supervised; Delayed supervision
Algorithms:
Decision Trees (C4.5)
[Rules and ILP (Ripper, Foil)]
Linear Threshold Units (Winnow, Perceptron; Boosting; SVMs; Kernels)
Probabilistic Representations (naïve Bayes, Bayesian trees; density estimation)
Delayed supervision: RL
Unsupervised/Semi-supervised: EM
Clustering, Dimensionality Reduction, or others of student interest
5. INTRODUCTION CS446-Fall 06 5 What to Learn Classifiers: Learn a hidden function
Concept Learning: chair? face? game?
Diagnosis: medical; risk assessment
Models: Learn a map (and use it to navigate)
Learn a distribution (and use it to answer queries)
Learn a language model; Learn an Automaton
Skills:
Learn to play games; Learn a Plan / Policy
Learn to Reason; Learn to Plan
Clusterings:
Shapes of objects; Functionality; Segmentation
Abstraction
Focus on classification (importance, theoretical richness, generality, …)
6. INTRODUCTION CS446-Fall 06 6 What to Learn? Direct Learning: (discriminative, model-free [a bad name])
Learn a function that maps an input instance to the sought-after property.
Model Learning: (indirect, generative)
Learning a model of the domain; then use it to answer various questions about the domain
In both cases, several protocols can be used
Supervised learner is given examples and answers
Unsupervised: examples, but no answers
Semi-supervised: some examples w/ answers, others w/o
Delayed supervision
7. INTRODUCTION CS446-Fall 06 7 Supervised Learning Given: Examples (x,f (x)) of some unknown function f
Find: A good approximation to f
x provides some representation of the input
The process of mapping a domain element into a representation is called Feature Extraction. (Hard; ill-understood; important)
x ∈ {0,1}^n or x ∈ ℝ^n
The target function (label)
f(x) ∈ {−1,+1}: Binary Classification
f(x) ∈ {1, 2, 3, …, k−1}: Multi-class classification
f(x) ∈ ℝ: Regression
Badges game
Don't give me the answer
Start thinking about how to write a program that will figure out whether my name has a + or a − next to it.
8. INTRODUCTION CS446-Fall 06 8 Example and Hypothesis Spaces
9. INTRODUCTION CS446-Fall 06 9 Supervised Learning: Examples Disease diagnosis
x: Properties of patient (symptoms, lab tests)
f : Disease (or maybe: recommended therapy)
Part-of-Speech tagging
x: An English sentence (e.g., The can will rust)
f : The part of speech of a word in the sentence
Face recognition
x: Bitmap picture of person's face
f : Name the person (or maybe: a property of)
Automatic Steering
x: Bitmap picture of road surface in front of car
f : Degrees to turn the steering wheel
10-11. INTRODUCTION CS446-Fall 06 (figure-only slides; no text captured)
12. INTRODUCTION CS446-Fall 06 12 Hypothesis Space Complete Ignorance:
How many possible functions?
2^16 = 65,536 possible functions over four binary input features (16 possible inputs).
After seven examples, how many possibilities remain for f?
2^9 possibilities remain for f: the examples fix its value on seven of the 16 inputs, leaving nine free.
How many examples until we figure out which is correct?
We need to see labels for all 16 examples!
Is Learning Possible? (The counting argument is reproduced in the sketch below.)
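The counting argument is easy to reproduce. A minimal Python sketch, assuming the four-feature setup on the slide (any seven distinct examples will do):

```python
# With 4 binary features there are 16 possible inputs, hence
# 2**16 = 65536 Boolean functions. Each labeled example fixes f on one
# input, so after 7 distinct examples 2**9 candidates remain.
from itertools import product

inputs = list(product([0, 1], repeat=4))          # all 16 possible inputs
observed = inputs[:7]                             # any 7 distinct examples
unseen = [x for x in inputs if x not in observed]
print(len(inputs), 2 ** len(inputs))              # 16 65536
print(len(unseen), 2 ** len(unseen))              # 9 512
```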
13. INTRODUCTION CS446-Fall 06 13 Another Hypothesis Space Simple Rules: There are only 16 simple
conjunctive rules of the form y = xi ∧ xj ∧ xk …
No simple rule explains the data. The same is true for simple clauses.
14. INTRODUCTION CS446-Fall 06 14 Third Hypothesis Space m-of-n rules: There are 32 possible rules
of the form y = 1 if and only if at least m
of the following n variables are 1
(counting: for each nonempty subset of the four variables, m ranges over its size: 4·1 + 6·2 + 4·3 + 1·4 = 32)
Found a consistent hypothesis.
15. INTRODUCTION CS446-Fall 06 15 Views of Learning Learning is the removal of our remaining uncertainty:
Suppose we knew that the unknown function was an m-of-n Boolean function, then we could use the training data to infer which function it is.
Learning requires guessing a good, small hypothesis class:
We can start with a very small class and enlarge it until it contains a hypothesis that fits the data.
We could be wrong !
Our prior knowledge might be wrong: y = x4 ∧ one-of(x1, x3) is also consistent
Our guess of the hypothesis class could be wrong
If this is the unknown function, then we will make errors when we are given new examples and asked to predict the value of the function.
16. INTRODUCTION CS446-Fall 06 16 General strategy for Machine Learning H should respect our prior understanding:
Excess expressivity makes learning difficult
Expressivity of H should match our ignorance
Understand flexibility of std. hypothesis spaces:
Decision trees, neural networks, rule grammars, stochastic models
Hypothesis spaces of flexible size; Nested collections of hypotheses.
ML succeeds when these interrelate
Develop algorithms for finding a hypothesis h that fits the data
h will likely perform well when the richness of H is less than the information in the training set
17. INTRODUCTION CS446-Fall 06 17 Terminology Training example: A pair of the form (x, f(x))
Target function (concept): The true function f
Hypothesis: A proposed function h, believed to be similar to f.
Concept: A Boolean function. Examples for which f(x) = 1 are positive examples; those for which f(x) = 0 are negative examples (instances). (Sometimes used interchangeably with Hypothesis.)
Classifier: A discrete-valued function. The possible values of f, {1, 2, …, K}, are the classes or class labels.
Hypothesis space: The space of all hypotheses that can, in principle, be output by the learning algorithm.
Version Space: The space of all hypotheses in the hypothesis space that have not yet been ruled out.
18. INTRODUCTION CS446-Fall 06 18 Key Issues in Machine Learning Modeling
How to formulate application problems as machine learning problems ?
Learning Protocols (where is the data coming from, how?)
Project examples: [complete products]
EMAIL
Given a seminar announcement, place the relevant information in my Outlook
Given a message, place it in the appropriate folder
Image processing:
Given a folder with pictures, automatically rotate all those that need it.
My office:
have my office greet me in the morning and unlock the door (but do it only for me!)
Context-Sensitive Spelling: Incorporate into Word
19. INTRODUCTION CS446-Fall 06 19 Key Issues in Machine Learning Modeling
How to formulate application problems as machine learning problems ?
Learning Protocols (where is the data coming from, how?)
Representation:
What are good hypothesis spaces ?
Any rigorous way to find these? Any general approach?
Algorithms:
What are good algorithms?
How do we define success?
Generalization vs. overfitting
The computational problem
20. INTRODUCTION CS446-Fall 06 20 Example: Generalization vs. Overfitting What is a Tree?
A botanist: "A tree is something with leaves."
Her brother: "A tree is a green thing I've seen before."
Neither will generalize well.
21. INTRODUCTION CS446-Fall 06 21 Self-organize into Groups of 4 or 5 Assignment 1
The Badges Game
Prediction or Modeling?
Representation
Background Knowledge
When did learning take place?
Learning Protocol?
What is the problem?
Algorithms
22. INTRODUCTION CS446-Fall 06 22 Linear Discriminators I don't know {whether, weather} to laugh or cry
How can we make this a learning problem?
We will look for a function
F: Sentences → {whether, weather}
We need to define the domain of this function better.
An option: For each word w in English define a Boolean feature xw:
[xw = 1] iff w is in the sentence
This maps a sentence to a point in {0,1}^50,000 (sketched in code below)
In this space: some points are whether points and some are weather points.
As we said, this is the game we are playing; in NLP it has always been clear that the raw information in a sentence is not sufficient, as is, to represent a good predictor. Better functions of the input were generated, and learning was done in these terms.
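To make the mapping concrete, here is a minimal sketch of the Boolean feature map; the six-word vocabulary is a stand-in assumption for the ~50,000-word English vocabulary on the slide:

```python
# One indicator feature x_w per vocabulary word w; x_w = 1 iff w occurs
# in the sentence.
vocabulary = ["know", "whether", "weather", "laugh", "cry", "rain"]

def to_features(sentence: str) -> list[int]:
    words = set(sentence.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

print(to_features("I don't know whether to laugh or cry"))
# -> [1, 1, 0, 1, 1, 0]: a point in {0,1}^6
```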
23. INTRODUCTION CS446-Fall 06 23 What's Good? Learning problem:
Find a function that
best separates the data
What function?
What's best?
How to find it?
A possibility: Define the learning problem to be:
Find a (linear) function that best separates the data
24. INTRODUCTION CS446-Fall 06 24 Exclusive-OR (XOR): (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2)
In general: a parity function.
xi ∈ {0,1}
f(x1, x2, …, xn) = 1 iff Σ xi is odd
This function is not linearly separable.
25. INTRODUCTION CS446-Fall 06 25 Sometimes Functions Can be Made Linear y = x1x2x4 ∨ x2x4x5 ∨ x1x3x7
Space: X = (x1, x2, …, xn)
Input transformation
New Space: Y = {y1, y2, …} = {xi, xixj, xixjxk} (see the sketch below)
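A hedged sketch of this transformation: enumerate monomial features up to degree 3, and the DNF above becomes a linear threshold over three of them. The input vector is an assumed example, not from the slides.

```python
# Map x to monomial features {x_i, x_i x_j, x_i x_j x_k}. In the new
# space, y = x1x2x4 OR x2x4x5 OR x1x3x7 is linear:
# y = 1 iff z_124 + z_245 + z_137 >= 1.
from itertools import combinations

def monomials(x, degree=3):
    feats = {}
    for k in range(1, degree + 1):
        for idx in combinations(range(len(x)), k):
            v = 1
            for i in idx:
                v *= x[i]
            feats[idx] = v
    return feats

x = [1, 0, 1, 0, 0, 0, 1, 0]             # x1 = x3 = x7 = 1 (0-indexed)
z = monomials(x)
y = int(z[(0, 1, 3)] + z[(1, 3, 4)] + z[(0, 2, 6)] >= 1)
print(y)                                  # 1: the third conjunction fires
```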
26. INTRODUCTION CS446-Fall 06 26 Data are not separable in one dimension
Not separable if you insist on using a specific class of functions
Feature Space
27. INTRODUCTION CS446-Fall 06 27 Blown Up Feature Space Data are separable in ⟨x, x²⟩ space
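For instance (an assumed toy dataset, not from the slides), points that no single threshold on x can separate become linearly separable after the map x ↦ (x, x²):

```python
# Negatives in the middle, positives on the outside: no threshold on x
# separates them, but in (x, x^2) space the rule x^2 >= 2.5 does.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [+1, +1, -1, -1, -1, +1, +1]

for x, y in zip(xs, ys):
    pred = +1 if x * x >= 2.5 else -1
    assert pred == y
print("separable in <x, x^2> space")
```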
28. INTRODUCTION CS446-Fall 06 28 A General Framework for Learning Goal: predict an unobserved output value y ∈ Y
based on an observed input vector x ∈ X
Estimate a functional relationship y ≈ f(x)
from a set {(x, y)_i}, i = 1, …, n
Most relevant, Classification: y ∈ {0,1} (or y ∈ {1, 2, …, k})
(But within the same framework we can also talk about Regression, y ∈ ℝ.)
What do we want f(x) to satisfy?
We want to minimize the Loss (Risk): L(f(·)) = E_{X,Y}[f(x) ≠ y]
where E_{X,Y} denotes the expectation with respect to the true distribution.
29. INTRODUCTION CS446-Fall 06 29 A General Framework for Learning (II) We want to minimize the Loss: L(f(·)) = E_{X,Y}[f(X) ≠ Y]
where E_{X,Y} denotes the expectation with respect to the true distribution.
We cannot do that. Why not?
Instead, we try to minimize the empirical classification error.
For a set of training examples {(Xi,Yi)}i=1,n
Try to minimize the observed loss
(Issue I: when is this good enough? Not now)
This minimization problem is typically NP-hard.
To alleviate this computational problem, we minimize a new function, a convex upper bound of the classification error function
I(f(x), y) = [f(x) ≠ y] = {1 when f(x) ≠ y; 0 otherwise}
(The empirical version of this objective is sketched below.)
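A minimal sketch of the empirical counterpart: since the expectation over the true distribution is unavailable, we average the indicator loss over the training set. The data and hypothesis below are hypothetical, for illustration only.

```python
# Empirical 0-1 risk: average of I(f(x), y) = [f(x) != y] over the sample.
def empirical_risk(f, examples):
    return sum(1 for x, y in examples if f(x) != y) / len(examples)

examples = [((0, 1), 1), ((1, 1), 1), ((0, 0), 0), ((1, 0), 1)]
f = lambda x: x[0] or x[1]               # predicts the OR of the features
print(empirical_risk(f, examples))       # 0.0: f fits this tiny sample
```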
30. INTRODUCTION CS446-Fall 06 30 Learning as an Optimization Problem A Loss Function L(f(x),y) measures the penalty incurred by a classifier f on example (x,y).
There are many different loss functions one could define:
Misclassification Error:
L(f(x),y) = 0 if f(x) = y; 1 otherwise
Squared Loss:
L(f(x), y) = (f(x) − y)²
Input-dependent loss:
L(f(x), y) = 0 if f(x) = y; c(x) otherwise.
(All three are transcribed in the sketch below.)
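The three losses transcribe directly; c below is the assumed input-dependent cost function:

```python
def misclassification_loss(fx, y):
    return 0 if fx == y else 1

def squared_loss(fx, y):
    return (fx - y) ** 2

def input_dependent_loss(fx, y, x, c):
    return 0 if fx == y else c(x)        # c(x): cost of an error on x
```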
31. INTRODUCTION CS446-Fall 06 31 How to Learn? Local search:
Start with a linear threshold function.
See how well you are doing.
Correct it
Repeat until you converge.
There are other ways that do not search directly in the hypothesis space
Directly compute the hypothesis?
32. INTRODUCTION CS446-Fall 06 32 Learning Linear Separators (LTU) f(x) = sgn{x·w − θ} = sgn{Σ_{i=1..n} wi xi − θ} (transcribed in the sketch below)
x = (x1, x2, …, xn) ∈ {0,1}^n is the feature-based encoding of the data point
w = (w1, w2, …, wn) ∈ ℝ^n is the target function.
θ determines the shift with respect to the origin
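The unit transcribes to a few lines; the weights and threshold are assumed values (they happen to implement "at least 2 of 3", previewing the next slide):

```python
# sgn(w . x - theta), with sgn(0) taken as +1.
def ltu(w, theta, x):
    s = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1 if s >= 0 else -1

w, theta = [1.0, 1.0, 1.0], 2.0          # an "at least 2 of 3" unit
print(ltu(w, theta, [1, 1, 0]))          # 1
print(ltu(w, theta, [1, 0, 0]))          # -1
```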
33. INTRODUCTION CS446-Fall 06 33 Expressivity f(x) = sgn{x·w − θ} = sgn{Σ_{i=1..n} wi xi − θ}
Many functions are Linear
Conjunctions:
y = x1 ∧ x3 ∧ x5
y = sgn{1·x1 + 1·x3 + 1·x5 − 3} (checked exhaustively in the sketch after this slide)
At least m of n:
y = at least 2 of {x1, x3, x5}
y = sgn{1·x1 + 1·x3 + 1·x5 − 2}
Many functions are not
Xor: y = (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2)
Non-trivial DNF: y = (x1 ∧ x2) ∨ (x3 ∧ x4)
But some can be made linear
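The conjunction claim can be checked exhaustively; a small sketch:

```python
# sgn(x1 + x3 + x5 - 3) agrees with x1 AND x3 AND x5 on all Boolean
# inputs (threshold 2 instead of 3 gives "at least 2 of" them).
from itertools import product

for x1, x3, x5 in product([0, 1], repeat=3):
    conj = int(x1 and x3 and x5)
    unit = int(x1 + x3 + x5 - 3 >= 0)
    assert conj == unit
print("conjunction == LTU on all 8 inputs")
```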
34. INTRODUCTION CS446-Fall 06 34 Canonical Representation f(x) = sgn{x·w − θ} = sgn{Σ_{i=1..n} wi xi − θ}
sgn{x·w − θ} ≡ sgn{x′·w′}
Where:
x′ = (x, −1) and w′ = (w, θ)
We moved from an n-dimensional representation to an (n+1)-dimensional representation, but now we can look for hyperplanes that go through the origin.
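A quick check of the identity, with assumed numbers:

```python
# Appending -1 to x and theta to w gives w' . x' = w . x - theta, so
# hyperplanes through the origin in n+1 dimensions suffice.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

w, theta, x = [0.5, -1.2, 2.0], 0.7, [1, 0, 1]
w_aug, x_aug = w + [theta], x + [-1]
assert abs((dot(w, x) - theta) - dot(w_aug, x_aug)) < 1e-12
print("augmented form agrees")
```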
35. INTRODUCTION CS446-Fall 06 35 LMS: An online, local search algorithm A local search learning algorithm requires:
Hypothesis Space:
Linear Threshold Units
Loss function:
Squared loss
LMS (Least Mean Square, L2)
Search procedure:
Gradient Descent
36. INTRODUCTION CS446-Fall 06 36 Good treatment in Bishop, Chp. 3
Classic Wiener filtering solution; the text omits the 0.5 factor. In any case we use the gradient and eta (text) or R (these notes) to modulate the step size.
37. INTRODUCTION CS446-Fall 06 37 Gradient Descent We use gradient descent to determine the weight vector that minimizes Err(w).
Fixing the set D of examples, Err is a function of the weights wj.
At each step, the weight vector is modified in the direction that produces the steepest descent along the error surface (sketched below).
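A minimal LMS sketch under these choices; the toy data, learning rate, and iteration count are assumptions, not slide content:

```python
# Gradient descent on Err(w) = 1/2 * sum over D of (w . x - y)^2,
# so dErr/dw_j = sum of (w . x - y) * x_j.
def lms_step(w, D, eta=0.05):
    grad = [0.0] * len(w)
    for x, y in D:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return [wj - eta * gj for wj, gj in zip(w, grad)]

D = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.0)]
w = [0.0, 0.0]
for _ in range(200):
    w = lms_step(w, D)
print(w)                                  # approaches [1, -1]
```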
38-48. INTRODUCTION CS446-Fall 06 (figure-only slides; no text captured)
49. INTRODUCTION CS446-Fall 06 49 Fisher Linear Discriminant This is a classical method for discriminant analysis.
It is based on dimensionality reduction: finding a better representation for the data.
Notice that just finding good representations for the data may not always be good for discrimination. [E.g., O, Q]
Intuition:
Consider projecting data from d dimensions to the line.
Likely results in a mixed set of points and poor separation.
However, by moving the line around we might be able to find an orientation for which the projected samples are well separated.
50. INTRODUCTION CS446-Fall 06 50 Fisher Linear Discriminant Sample S = {x1, x2, …, xn} ⊂ ℝ^d
P, N are the positive, negative examples, resp.
Let w ∈ ℝ^d and assume ‖w‖ = 1. Then:
The projection of a vector x onto a line in the direction w is wᵗx.
If the data is linearly separable, there exists a good direction w.
51. INTRODUCTION CS446-Fall 06 51 Finding a Good Direction Sample mean (positive, P; negative, N):
MP = 1/|P| Σ_P xi
The mean of the projected (positive, negative) points
mP = 1/|P| Σ_P wᵗxi = 1/|P| Σ_P yi = wᵗMP
is simply the projection of the sample mean.
Therefore, the distance between the projected means is:
|mP − mN| = |wᵗ(MP − MN)|
52. INTRODUCTION CS446-Fall 06 52 Finding a Good Direction (2) Scaling w isn't the solution. We want the difference to be large relative to some measure of standard deviation for each class.
s²P = Σ_P (y − mP)², s²N = Σ_N (y − mN)²
s²P + s²N is the within-class scatter: it estimates the variances of the sample.
The Fisher linear discriminant employs the linear function wᵗx for which
J(w) = |mP − mN|² / (s²P + s²N)
is maximized.
How to make this a classifier?
How to find the optimal w?
Some algebra…
53. INTRODUCTION CS446-Fall 06 53 J as an explicit function of w (1) Compute the scatter matrices:
SP = Σ_P (x − MP)(x − MP)ᵗ, SN = Σ_N (x − MN)(x − MN)ᵗ
and
SW = SP + SN
We can write:
s²P = Σ_P (y − mP)² = Σ_P (wᵗx − wᵗMP)² = Σ_P wᵗ(x − MP)(x − MP)ᵗw = wᵗSPw
Therefore:
s²P + s²N = wᵗSWw
SW is the within-class scatter matrix. It is proportional to the sample covariance matrix for the d-dimensional sample.
54. INTRODUCTION CS446-Fall 06 54 J as an explicit function of w (2) We can do a similar computation for the means:
SB = (MP − MN)(MP − MN)ᵗ
and we can write:
(mP − mN)² = (wᵗMP − wᵗMN)² = wᵗ(MP − MN)(MP − MN)ᵗw = wᵗSBw
Therefore:
SB is the between-class scatter matrix. It is the outer product of two vectors, and therefore its rank is at most 1.
SBw is always in the direction of (MP − MN).
55. INTRODUCTION CS446-Fall 06 55 J as an explicit function of w (3) Now we can compute J explicitly:
J(w) = |mP − mN|² / (s²P + s²N) = wᵗSBw / wᵗSWw
We are looking for the value of w that maximizes this expression.
This is a generalized eigenvalue problem; when SW is nonsingular, it is just an eigenvalue problem. The solution can be written without solving the problem, as:
w = SW⁻¹(MP − MN)
This is the Fisher Linear Discriminant (sketched in code below).
1: We converted a d-dimensional problem to a 1-dimensional problem and suggested a solution that makes some sense.
2: We have a solution that makes sense; how do we make it a classifier? And how good is it?
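A numpy sketch of the closed-form direction w = SW⁻¹(MP − MN) on assumed two-class Gaussian data, with the projected-midpoint threshold as one (assumed) way to turn it into a classifier:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal([2.0, 2.0], 1.0, size=(100, 2))   # positive class
N = rng.normal([0.0, 0.0], 1.0, size=(100, 2))   # negative class

MP, MN = P.mean(axis=0), N.mean(axis=0)
SW = (P - MP).T @ (P - MP) + (N - MN).T @ (N - MN)
w = np.linalg.solve(SW, MP - MN)                 # Fisher direction

# One way to make it a classifier: threshold at the projected midpoint.
theta = w @ (MP + MN) / 2
accuracy = ((P @ w > theta).mean() + (N @ w < theta).mean()) / 2
print(w, accuracy)
```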
56. INTRODUCTION CS446-Fall 06 56 Fisher Linear Discriminant - Summary It turns out that both problems can be solved if we make assumptions. E.g., if the data consists of two classes of points generated according to normal distributions with the same covariance, then:
The solution is optimal.
Classification can be done by choosing a threshold, which can be computed.
Is this satisfactory?
57. INTRODUCTION CS446-Fall 06 57 Introduction - Summary We introduced the technical part of the class by giving two examples for (very different) approaches to linear discrimination.
There are many other solutions.
Question 1: But this assumes that we are linear. Can we learn a function that is more flexible in terms of what it does with the feature space?
Question 2: Can we say something about the quality of what we learn (sample complexity, time complexity, quality)?