Data Mining 2 (ex Análisis Inteligente de Datos y Data Mining ) Lluís A. Belanche

Data Mining 2 (ex Análisis Inteligente de Datos y Data Mining) Lluís A. Belanche

www.lsi.upc.edu/... /~belanche/docencia/aiddm/aiddm.html /~avellido/teaching/data_mining.htm

Contents of the course (hopefully) • 1. Introduction & methodologies • 2. Exploratory DM through visualization • 3. Pattern recognition: introduction • 4. Pattern recognition: the Gaussian case • 5. Feature extraction • 6. Feature selection & weighing • 7. Error estimation • 8. Linear methods are nice! • 9. Probability in Data Mining • 10. Latency, generativity, manifolds and all that • 11. Application of GTM: from medicine to ecology • 12. DM Case studies Sorry guys! … no fuzzy systems …

Error estimation

Feature extraction, selection and weighing have many uses

Linear classifiers are nice! (I)

Linear classifiers are nice! (II) F Transformation F (x) = [ F1(x), F2(x), … Fm(x) ] with x = [ x1, x2, …, xn ] Useful for “ascending” (m>n) or “descending” (m>n) with 0 < m,n < oo (integers) … an example?

Linear classifiers are nice! (III) F Nets F (x) = [ F1(x), F2(x), … Fm(x) ] with x = [ x1, x2, …, xn ] F(x) x 

Utility • This is a very powerful setting • Let us suppose: r>s  increase in dimension increase in expressive power, ease the task for almost any learning machine r<s  decrease in dimension visualization, compactation, noise reduction, removal of useless information Contradictory  !?

On intelligence … • What is Intelligence? • What is the function of Intelligence?  to ensure survival in nature • What are the ingredients of intelligence? • Perceive in a changing world • Reason under partial truth • Plan & prioritize under uncertainty • Coordinate different simultaneous tasks • Learn under noisy experiences

Parking a Car (difficult or easy?) “Generally, a car can be parked rather easily because the final position of the car is not specified exactly. It it were specified to within, say, a fraction of a millimeter and a few seconds of arc, it would take hours of maneuvering and precise measurements of distance and angular position to solve the problem.”  High precision carries a high cost.

The primordial soup Fuzzy Logic • Belief • Networks Neural Networks Soft Computing Chaos & Fractals Evolutionary Algorithms Rough Sets

What could MACHINE LEARNING possibly be? • In the beginning, there was a set of examples … • To exploit imprecision, uncertainty, robustness, data dependencies, learning and/or optimization ability, to achieve a working solution to a problem which is hard to solve. • To find an exact (approximate) solution to an imprecisely (precisely) formulated problem.

So what is the aim? • The challenge is to put these capabilities into use by devising methods of computation which lead to an acceptable solution at the lowest possible cost. • This should be the guiding principle

Rough Sets RS Different methods = different roles Fuzzy Logic : the algorithms for dealing with imprecision and uncertainty Neural Networks : the machinery for learning and function approximation with noise Evolutionary Algorithms : the algorithms for adaptive search and optimization uncertainty arising from the granularity in the domain of discourse

Examples of soft computing • TSP: 105 cities, • accuracy within 0.75%, 7 months • accuracy within 1%, 2 days • Compare • “absoulute best for sure” with “very good with very high probability”

Are you one of the top guns? • Consider … • Search space of size s • Draw N random samples • What is the probability p that at least one of them is in the top t ? • Answer: p = 1 – (1-t/s)N • Example: s= 1012, N=100.000, t=1.000  1 in 10.000 !

On Algorithms • what is worth? Specialized algorithms: best performance for special problems Generic algorithms: good performance over a wide range of problems Generic Algorithms Efficiency Specialized Algo. P Problems

Words are important ! • What is a theory ? • What is an algorithm ? • What is an implementation ? • What is a model ? • What does “non-linear” mean ? • What does “non-parametric” mean ?

The problem of induction • Classical problem in Philosophy • Example: 1,2,3,4,5,? • A more through example: JT

What are the conditions for successful learning? • Training data (sufficiently) representative • Principle of similarity • Target function within capacity of the learner • Non-dull learning algorithm • Enough computational resources • A correct (or close to) learning bias

And the Oscar goes to … The real problem is not whether machines think, but whether men do. B.F. Skinner, Contingencies of Reinforcement

Data Mining 2 (ex Análisis Inteligente de Datos y Data Mining ) Lluís A. Belanche

Data Mining 2 (ex Análisis Inteligente de Datos y Data Mining ) Lluís A. Belanche

Presentation Transcript

DCS 802 Data Mining Apriori Algorithm

DATA MINING Introductory and Advanced Topics Part II

Knime: a data mining platform

Data Mining: Concepts and Techniques — Slides for Textbook — — Chapter 6 —

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation

Data Mining

Data Mining Classification: Basic Concepts,

Data Mining Chapter 1

DATA MINING LECTURE 3

Special Topics in Educational Data Mining

Data Mining: Concepts and Techniques — Chapter 5 — Mining Frequent Patterns

Data Mining Algorithms for Recommendation Systems

Weka – A Data Mining Toolkit

Data Mining: Concepts and Techniques

CENG 464 Introduction to Data Mining

CS 490 Sample Project Mining the Mushroom Data Set

Spatial Data Mining: Accomplishments and Research Needs

Data Mining: Concepts and Techniques

DATA WAREHOUSING AND DATA MINING

Association Rule Mining

15-826: Multimedia Databases and Data Mining