Universal Learning Machines (ULM)

  1. Universal Learning Machines (ULM). Włodzisław Duch and Tomasz Maszczyk, Department of Informatics, Nicolaus Copernicus University, Toruń, Poland. ICONIP 2009, Bangkok

  2. Plan • Meta-learning • Learning from others • ULM algorithm • Types of features • Illustrative results • Conclusions Learn from others, not only from your own mistakes! Then you will always have free lunch!

  3. Meta-learning Meta-learning means different things to different people. Some call “meta” the learning of many models (e.g. in Weka), ranking them, arcing, boosting, bagging, or creating an ensemble in many ways, followed by optimization of parameters to integrate the models. Here meta-learning means learning how to learn. Goal: replace the experts who search for the best models by running many experiments – there is no free lunch, but why cook it yourself? The search space of models is too large to explore exhaustively, so design the system architecture to support knowledge-based search. One direction towards universal learning: use any method you like, but take the best from your competition! The best what? The best fragments of models, combinations of features.

  4. CI and “no free lunch” The “no free lunch” theorem: no single system can reach the best results for all possible distributions of data. • Decision trees & rule-based systems: best for data with a logical structure, require sharp decision borders, and fail on problems where linear discrimination provides an accurate solution. • SVM in kernelized form works well when a complex topology is required, but it may miss simple solutions that rule-based systems find, fails when sharp decision borders are needed, and fails on complex Boolean problems. The key to general intelligence: • specific information filters that make learning possible; • chunking mechanisms that combine partial results into higher-level mental representations. More attention should be paid to the generation of useful features.

  5. ULM idea ULM is composed of two main modules: • feature constructors, • simple classifiers. In machine learning, features are used to calculate: • linear combinations of feature values, • distances (dissimilarities), scaled (which includes feature selection). Is this sufficient? • No; non-linear functions of features carry information that cannot easily be recovered by CI methods. • Kernel approaches: linear solutions in the kernel space, implicitly adding new features based on similarity, K(X,SV). • => Create a potentially useful, redundant set of features. • How? Learn what other models do well!
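A minimal sketch of this idea, assuming scikit-learn and using k-means centres as stand-in reference vectors (an illustrative choice, not the authors' implementation): explicit kernel features z_i = K(X, X_i) are generated once and handed to a simple linear classifier.

```python
# Minimal sketch of the ULM idea, not the authors' code: build an explicit,
# possibly redundant pool of kernel features z_i = K(X, X_i) and hand them to
# a simple linear classifier. Reference vectors come from k-means here purely
# for illustration (an assumption, not part of the slides).
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
Xs = StandardScaler().fit_transform(X)

# Pick a small set of reference vectors (stand-ins for support vectors / prototypes).
refs = KMeans(n_clusters=20, n_init=10, random_state=0).fit(Xs).cluster_centers_

# Explicit kernel features: similarity of every sample to every reference vector.
Z = rbf_kernel(Xs, refs, gamma=0.01)

# A simple classifier working in the new feature space replaces the implicit kernel trick.
clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, Z, y, cv=5).mean())
```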

  6. Binary features Binary features: • B1: unrestricted projections; MAP classifiers, p(C|b); 2N_C regions, complexity O(1). • B2: Binary, restricted by other binary features; complexes b1 ∧ b2 ∧ … ∧ bk; complexity O(2^k). • B3: Binary, restricted by distance; b ∧ r1 ∈ [r1−, r1+] ∧ … ∧ rk ∈ [rk−, rk+]; evaluated separately for each b value. • Ex: b = 1, r1 ∈ [0, 1]: take vectors only from this slice. N1: Nominal – like binary.
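A small sketch of B1-type features (one simple realisation, assumed for illustration, not the authors' code): threshold single raw inputs and score each candidate as a tiny MAP classifier through p(C|b) on each side of the split.

```python
# Illustrative sketch of B1-type binary features: threshold a single input and
# evaluate the resulting binary feature b via the MAP rule p(C|b).
import numpy as np
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

def map_accuracy(b, y):
    """Accuracy of the MAP rule: predict the majority class within each value of b."""
    correct = 0
    for v in (0, 1):
        mask = (b == v)
        if mask.any():
            correct += np.bincount(y[mask]).max()
    return correct / len(y)

# Scan a few candidate thresholds for every input; keep the best binary projection.
best = max(
    ((map_accuracy((X[:, j] > t).astype(int), y), j, t)
     for j in range(X.shape[1])
     for t in np.percentile(X[:, j], [10, 25, 50, 75, 90])),
    key=lambda r: r[0],
)
print("best B1 feature: x[%d] > %.3f, MAP accuracy %.3f" % (best[1], best[2], best[0]))
```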

  7. Real features • R1: Line, original features xi, sigmoidal transformation s(xi) for contrast enhancement; search for 1D patterns (k-separable intervals). • R2: Line, like R1 but restricted by other features, for example zi = xi(X) only for |xj| < tj. • R3: Line, zi = xi(X), like R2 but restricted by distance. • R4: Line – linear combinations z = W·X optimized by projection pursuit (PCA, ICA, QPC ...). • P1: Prototypes: weighted distance functions, or specialized kernels zi = K(X, Xi). • M1: Motifs, based on correlations between elements rather than input values.
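A sketch of two of these real-valued feature types under illustrative assumptions (not the authors' code): R1 as a sigmoidal contrast enhancement of a single standardized input, and R4 as a linear projection onto the direction joining the two class means, a simple projection-pursuit style choice.

```python
# R1 and R4 feature sketches on standardized inputs (illustrative assumptions).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
Xs = StandardScaler().fit_transform(X)

def r1_feature(x, slope=2.0):
    """R1: squash a standardized input through a sigmoid to enhance contrast around 0."""
    return 1.0 / (1.0 + np.exp(-slope * x))

z_r1 = r1_feature(Xs[:, 0])

# R4: projection z = w·x onto the line connecting the class means.
w = Xs[y == 1].mean(axis=0) - Xs[y == 0].mean(axis=0)
w /= np.linalg.norm(w)
z_r4 = Xs @ w

print(z_r1[:3], z_r4[:3])
```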

  8. Datasets

  9. B1 Features Example of B1 features taken from segments of decision trees. Other features that frequently proved useful on this data: P1 prototypes and contrast-enhanced R1 features. These features, used in various learning systems, greatly simplify their models and increase their accuracy.
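One assumed way to realise this, shown as a sketch (not the authors' code): harvest the split tests of a shallow decision tree as B1-style binary features and add them to the inputs of a simple linear model.

```python
# Sketch: every internal split of a shallow tree becomes a binary feature
# x[feature] <= threshold, reused by another learner.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
t = tree.tree_

# Internal nodes (children_left != -1) each contribute one threshold test.
splits = [(t.feature[i], t.threshold[i])
          for i in range(t.node_count) if t.children_left[i] != -1]
B = np.column_stack([(X[:, f] <= thr).astype(float) for f, thr in splits])

# Original features plus the tree-derived binary features, fed to a linear model.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print(cross_val_score(clf, np.hstack([X, B]), y, cv=5).mean())
```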

  10. Conclusions • Systematic exploration of feature transformations allows for the discovery of simple models that more sophisticated learning systems may miss; results always improve and models become simpler! • Some benchmark problems turned out to be rather trivial, and were solved with a single binary feature, one constrained nominal feature, or one new feature constructed as a projection onto the line connecting the means of two classes. • Kernel-based features offer an attractive alternative to current kernel-based SVM approaches, providing multiresolution and adaptive regularization possibilities, combined with LDA or SVNT. • Analysis of images, multimedia streams or biosequences may require even more sophisticated ways of constructing features, starting from the available input features. • Learn from others, not only from your own errors!
