1. An Introduction to Support Vector Machine Classification
2. Outline What do we mean by classification, and why is it useful?
Machine learning: basic concepts
Support Vector Machines (SVM)
Linear SVM: basic terminology and some formulas
Non-linear SVM: the kernel trick
An example: Predicting protein subcellular location with SVM
Performance measurements
3. Classification We classify things every day, all the time.
E.g. crossing the street:
Is there a car coming?
At what speed?
How far is it to the other side?
Classification: Safe to walk or not!!!
5. Classification tasks in Bioinformatics
6. Problems in classifying biological data The data are often high-dimensional.
Hard to set up simple rules.
Large amounts of data.
Need automated ways to deal with the data.
Use computers: data processing, statistical analysis, and trying to learn patterns from the data (machine learning).
8. Black box view of Machine Learning
9. Tennis example 2
10. Linear Support Vector Machines
11. Linear SVM 2
12. Definitions
13. Maximizing the margin
14. The Lagrangian trick
15. Problems with linear SVM
16. Non-linear SVM 1
17. Non-linear SVM 2
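The formulas from the non-linear SVM slides did not survive in this transcript, so the following is only a minimal sketch of the kernel trick named in the outline. It assumes a homogeneous degree-2 polynomial kernel on 2-D inputs; the function names `poly2_kernel` and `poly2_features` are illustrative and not taken from the slides. The point is that the kernel value equals a dot product in a higher-dimensional feature space without that space ever being built explicitly.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

def poly2_features(x):
    """Explicit 3-D feature map for a 2-D input (assumed for illustration).

    The dot product of two mapped points equals poly2_kernel of the
    original points.
    """
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

x, z = [1.0, 2.0], [3.0, -1.0]
kernel_value = poly2_kernel(x, z)                             # computed in 2-D
explicit_value = dot(poly2_features(x), poly2_features(z))    # computed in 3-D
```

Both values agree up to floating-point rounding, which is why an SVM can work in the mapped space while only ever evaluating the kernel on the original inputs.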
18. Solving the optimization problem In many cases any general-purpose optimization package that solves linearly constrained convex quadratic problems will do.
Newton's method
Conjugate gradient descent
Other methods involve nonlinear programming techniques.
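Why a general-purpose package suffices is easiest to see from the standard dual form of the hard-margin linear SVM, given here for reference. The notation is an assumption, since the formulas on the earlier slides are not preserved in this transcript: training points $\mathbf{x}_i$ with labels $y_i \in \{-1, +1\}$ and Lagrange multipliers $\alpha_i$.

```latex
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
    \alpha_i \alpha_j \, y_i y_j \,(\mathbf{x}_i \cdot \mathbf{x}_j) \\
\text{subject to}\quad & \alpha_i \ge 0 \quad (i = 1,\dots,n),
  \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
\end{aligned}
```

The objective is quadratic in the $\alpha_i$ and every constraint is linear, so any solver for this class of problem applies.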
19. Overtraining/overfitting
20. Overtraining/overfitting 2 Example with a gardener.
21. A practical example, protein localization Proteins are synthesized in the cytosol.
Transported into different subcellular locations where they carry out their functions.
Aim: To predict in what location a certain protein will end up!!!
22. Subcellular Locations
23. Method Hypothesis: The amino acid composition of proteins from different compartments should differ.
Extract proteins with known subcellular location from SWISS-PROT.
Calculate the amino acid composition of the proteins.
Try to differentiate between cytosolic, extracellular, mitochondrial and nuclear proteins using an SVM.
24. Input encoding
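The amino acid composition described in the method can be sketched as a fixed-length vector of 20 fractions, one per standard amino acid. This is an illustrative encoding, not necessarily the exact one used in the study: the function name `amino_acid_composition`, the alphabetical residue order, and the choice to ignore non-standard letters such as X are all assumptions.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code, in a fixed order so that
# every protein maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Letters outside the 20 standard codes are ignored (an assumption made
    for this sketch), so the fractions always sum to 1.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Toy sequence, not a real protein.
vector = amino_acid_composition("MKKLLA")
```

Each protein, whatever its length, becomes a 20-dimensional input vector, which is what makes it usable as SVM input.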
25. Cross-validation
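The slide content is not preserved here, so this is only a minimal sketch of k-fold cross-validation in general: the data are split into k parts, and each part is held out once for testing while the model is trained on the rest. The helper name `k_fold_splits` and the choice of 10 samples and 5 folds are illustrative assumptions.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

splits = list(k_fold_splits(10, 5))
```

Every sample is used for testing exactly once. The sketch does not shuffle the data; in practice the samples would usually be shuffled first, and often split so that each fold keeps roughly the same class proportions.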
26. Performance measurements
27. Results We definitely get some predictive power out of our models.
There seems to be a difference in the composition of proteins from different subcellular locations.
Another question: what about nuclear proteins? Is there a difference between DNA-binding proteins and others?
28. Conclusions We have (hopefully) learned some basic concepts and terminology of SVM.
We know about the risk of overtraining and how to put a measure on the risk of bad generalization.
SVMs can be useful, for example, in predicting the subcellular location of proteins.
29. You can't input just anything into a learning machine!
30. References