Strengthening I-ReGEC classifier • G. Attratto, D. Feminiano, and M.R. Guarracino • High Performance Computing and Networking Institute, Italian National Research Council
Supervised learning • Supervised learning refers to the capability of a system to learn from a set of input/output pairs: the training set.
Classification • Classification consists of determining a model that groups elements according to given features • The groups are the classes
Evaluation of classification methods • Accuracy: an indicator of the predictive ability of the model • Speed: some methods take less time than others • Robustness: the learned rules and the accuracy do not change considerably across different data sets • Scalability: the ability to classify very large data sets
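Accuracy is commonly measured as the fraction of examples whose predicted class matches the true class. The slides give no formula, so the following is a minimal Python sketch of that common definition; the function name `accuracy` is our own.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of examples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute the accuracy of an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Usage: 3 of the 4 predictions are correct.
print(accuracy(["A", "B", "B", "A"], ["A", "B", "A", "A"]))  # 0.75
```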
Goals • Make the choice of training examples more efficient • Delete redundant examples and examples with an insufficient informative contribution • Strengthen the training set by deleting obsolete knowledge • Build an efficient, scalable and generalizable model
Classification techniques • Decision trees (optimal tree): based on a tree structure • Bayesian networks (slow in training): compute posterior probabilities with Bayes’ theorem • Neural networks (slow in training): simulate the behavior of biological systems • Support Vector Machines (SVM): compute separating hyperplanes
SVM: the state of the art • Find a set of examples (support vectors) representative of the classes • [Figure: linear and nonlinear cases, showing the support vectors, the separation margin and the optimal hyperplane]
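For reference, the standard hard-margin linear SVM can be written as below. This is the usual textbook formulation, not taken from the slides: the support vectors are the training points $x_i$ (with labels $y_i \in \{-1, +1\}$) for which the constraint holds with equality, and the separation margin has width $2/\lVert w \rVert$. In the nonlinear case the inner products are replaced by a kernel function.

```latex
\min_{w \in \mathbb{R}^n,\; b \in \mathbb{R}} \; \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad y_i \,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, m

f(x) = \operatorname{sign}\!\left(w^\top x + b\right)
```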
ReGEC • Two hyperplanes, each representative of one class (GEPSVM family) • Based on a generalized eigenvalue problem
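The slides only name the generalized eigenvalue problem, so here is a hedged sketch. With the points of the two classes as rows of matrices A and B and e a vector of ones, a GEPSVM-style method looks for a hyperplane x·w − γ = 0 that minimizes ||Aw − eγ||² / ||Bw − eγ||². With z = (w, γ), G = [A −e]ᵀ[A −e] and H = [B −e]ᵀ[B −e], this is the generalized eigenvalue problem G z = λ H z; the eigenvectors of the smallest and largest eigenvalues give the hyperplanes of class A and class B. The regularization used below (adding δ times the diagonal of the other matrix) is our reading of ReGEC and should be checked against the original paper; `regec_fit`, `regec_predict` and the default δ are hypothetical names and values.

```python
import numpy as np


def regec_fit(A, B, delta=1e-3):
    """Sketch of a ReGEC-style fit returning two hyperplanes z = (w, gamma).

    A, B: arrays of shape (m_A, n) and (m_B, n) holding the points of each class.
    delta: regularization parameter (hypothetical default, to be tuned).
    """
    GA = np.hstack([A, -np.ones((A.shape[0], 1))])  # [A  -e]
    HB = np.hstack([B, -np.ones((B.shape[0], 1))])  # [B  -e]
    G, H = GA.T @ GA, HB.T @ HB
    # Assumed regularization: add delta times the diagonal of the other matrix.
    G_reg = G + delta * np.diag(np.diag(H))
    H_reg = H + delta * np.diag(np.diag(G))
    # Reduce G_reg z = lam H_reg z to a symmetric standard problem via Cholesky.
    L_inv = np.linalg.inv(np.linalg.cholesky(H_reg))  # H_reg = L L^T
    _, V = np.linalg.eigh(L_inv @ G_reg @ L_inv.T)     # eigenvalues in ascending order
    Z = L_inv.T @ V                                     # eigenvectors of the original problem
    return Z[:, 0], Z[:, -1]  # smallest: plane of class A; largest: plane of class B


def regec_predict(X, z_a, z_b):
    """Label each row of X 0 (class A) or 1 (class B) by the nearer hyperplane."""
    d_a = np.abs(X @ z_a[:-1] - z_a[-1]) / np.linalg.norm(z_a[:-1])
    d_b = np.abs(X @ z_b[:-1] - z_b[-1]) / np.linalg.norm(z_b[:-1])
    return np.where(d_a <= d_b, 0, 1)


# Usage on toy data: class A lies near the line y = 0, class B near the line x = 5.
A = np.array([[0.0, 0.1], [1.0, -0.1], [2.0, 0.05], [3.0, -0.05]])
B = np.array([[5.0, 1.0], [5.1, 2.0], [4.9, 3.0], [5.0, 4.0]])
z_a, z_b = regec_fit(A, B)
print(regec_predict(np.vstack([A, B]), z_a, z_b))
```

The Cholesky reduction is used so that only NumPy is needed; a library routine for the symmetric-definite generalized problem would do the same job.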
I-ReGEC • Select k points for each class with a clustering technique (k-means), so that |S| = 2k • Classify the test set using the points in S • Add the misclassified points to S incrementally • Proceed until there are no more misclassified points
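The incremental loop can be sketched in Python as follows. This is a minimal sketch under several assumptions not stated in the slides: labels are 0/1; the initial S holds, for each class, the training point nearest to each k-means centroid; misclassification is checked on the training points themselves; and one misclassified point (the first one found that is not already in S) is added per iteration, with `max_iter` as a safety guard. The block carries its own compact ReGEC-style fit (regularized generalized eigenvalue problem, regularization assumed), and all function names are hypothetical.

```python
import numpy as np


def regec_fit(A, B, delta=1e-3):
    # ReGEC-style fit: returns planes z = (w, gamma) for class 0 and class 1.
    GA = np.hstack([A, -np.ones((len(A), 1))])
    HB = np.hstack([B, -np.ones((len(B), 1))])
    G, H = GA.T @ GA, HB.T @ HB
    G_reg = G + delta * np.diag(np.diag(H))  # assumed regularization
    H_reg = H + delta * np.diag(np.diag(G))
    L_inv = np.linalg.inv(np.linalg.cholesky(H_reg))
    _, V = np.linalg.eigh(L_inv @ G_reg @ L_inv.T)
    Z = L_inv.T @ V
    return Z[:, 0], Z[:, -1]


def regec_predict(X, z0, z1):
    # Label each row of X with the class whose hyperplane is closer.
    d0 = np.abs(X @ z0[:-1] - z0[-1]) / np.linalg.norm(z0[:-1])
    d1 = np.abs(X @ z1[:-1] - z1[-1]) / np.linalg.norm(z1[:-1])
    return np.where(d0 <= d1, 0, 1)


def kmeans_representatives(X, k, iters=20, seed=0):
    # Plain Lloyd's k-means; returns indices of the points nearest each centroid.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return sorted(set(np.argmin(dist, axis=0).tolist()))


def i_regec(X, y, k=2, max_iter=100):
    """Incremental selection of the training subset S (indices into X)."""
    S = []
    for c in (0, 1):
        idx = np.flatnonzero(y == c)
        S += idx[kmeans_representatives(X[idx], k)].tolist()  # up to k points per class
    for _ in range(max_iter):
        z0, z1 = regec_fit(X[[i for i in S if y[i] == 0]], X[[i for i in S if y[i] == 1]])
        wrong = np.flatnonzero(regec_predict(X, z0, z1) != y).tolist()
        candidates = [i for i in wrong if i not in S]
        if not candidates:
            break  # every misclassified point is already in S
        S.append(candidates[0])  # assumption: add one misclassified point per iteration
    return S
```

Adding one point at a time keeps S small, which is the point of the incremental scheme; the published I-ReGEC may use a different rule for choosing which misclassified point to add.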
Strengthening • Apply I-ReGEC to obtain the training set S • At each iteration, delete a point from the training set • At each iteration, apply I-ReGEC with the new input set S • Strengthen the set (save the new S) if the accuracy improves
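The Strengthening steps can be sketched as a single pass over S that tries deleting each point in turn. This is a sketch under assumptions: the slides do not say in which order points are tried, whether the pass is repeated, or on which data accuracy is measured. Here `evaluate` is a hypothetical callback standing in for "run I-ReGEC on the candidate set and measure its accuracy", and a deletion is kept only on a strict improvement.

```python
from typing import Callable, List, Sequence


def strengthen(S: Sequence[int], evaluate: Callable[[List[int]], float]) -> List[int]:
    """Try deleting each point of S once; keep a deletion only if accuracy improves.

    S: indices of the training set produced by I-ReGEC.
    evaluate: hypothetical callback that runs I-ReGEC on a candidate set and
              returns its accuracy.
    """
    best = list(S)
    best_acc = evaluate(best)
    i = 0
    while i < len(best) and len(best) > 1:
        candidate = best[:i] + best[i + 1:]  # delete the i-th point
        acc = evaluate(candidate)
        if acc > best_acc:
            # Save the new S; position i now holds the next point, so do not advance.
            best, best_acc = candidate, acc
        else:
            i += 1
    return best


def share_of_even(candidate: List[int]) -> float:
    # Toy stand-in for the I-ReGEC accuracy: rewards keeping even indices only.
    return sum(1 for x in candidate if x % 2 == 0) / len(candidate)


print(strengthen([0, 1, 2, 3, 4], share_of_even))  # [0, 2, 4]
```

Each loop iteration either deletes a point or advances `i`, so the pass ends after at most 2|S| evaluations; repeating the pass until nothing changes would be a natural, costlier variant.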
Microarray and matrix • [Figure: gene expression data as a matrix, with labels for examples, features and classes]
Results and diagrams • [Figure: Golub data set, 2D view, I-ReGEC vs. Strengthening] • [Figure: Golub data set, 3D view, I-ReGEC vs. Strengthening]
Conclusions • The choice of examples has become more efficient • Redundant or obsolete examples have been deleted • The training set has been “strengthened”
Future work • To reduce execution time, the Strengthening technique should be integrated into I-ReGEC.