Machine Learning
Basic definitions:
• concept: often described implicitly ("good politician") using examples, i.e. training data
• hypothesis: an attempt to describe the concept in an explicit way
• concept / hypothesis are expressed in a corresponding language
• a hypothesis is verified using testing data
• background knowledge provides information about the context (properties of the environment)
• the learning algorithm searches the space of hypotheses to find a consistent and complete hypothesis; the space is restricted by introducing bias
Goal of inductive ML
• Suggest a hypothesis characterizing a concept in a given domain (= the set of objects in this domain) that is implicitly described through a limited set of classified examples E+ and E-.
• The hypothesis:
• has to cover E+ while avoiding E-
• has to be applicable to objects that belong to neither E+ nor E-.
Basic notions
• Ω ... the domain of the concept K, i.e. K ⊆ Ω
• E, a set of training examples, is complemented by a classification, i.e. a function cl: E → {yes, no}
• E+ denotes all elements of E classified as yes, E- all elements classified as no
• E+ and E- are a disjoint cover (a partition) of the set E
Example 1, "computer game": is there a way to quickly distinguish a friendly robot from the others?
(Figure: examples of friendly vs. unfriendly robots.)
Concept Language and Background Knowledge
• Examples of concept languages:
• a set of real or idealised examples, expressed in the object language, that represent each of the learned concepts (Nearest Neighbour)
• attribute-value pairs (propositional logic)
• relational concepts (first-order logic)
• One can extend the concept language with user-defined concepts, or background knowledge (BK).
• BK plays an important role in Inductive Logic Programming (ILP).
• The use of certain BK predicates may be a necessary condition for learning the right hypothesis.
• Redundant or irrelevant BK slows down learning.
Example 1: hypothesis and its testing
H1 in the form of a decision tree:
if neck(r) = bow then "friendly"
if neck(r) = nothing then
  if head_shape(r) = triangle then "friendly" else "unfriendly"
if neck(r) = tie then
  if body_shape(r) = square then "unfriendly"
  else if head_shape(r) = circle then "friendly" else "unfriendly"
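To make H1 concrete, here is a minimal executable sketch; the dict-based robot representation and the function name classify_h1 are assumptions made for illustration, not part of the original slide.

```python
# A minimal sketch of hypothesis H1 as executable code.
# The dict-based robot encoding is an assumption for illustration.
def classify_h1(r):
    if r["neck"] == "bow":
        return "friendly"
    if r["neck"] == "nothing":
        return "friendly" if r["head_shape"] == "triangle" else "unfriendly"
    # remaining case: neck(r) = tie
    if r["body_shape"] == "square":
        return "unfriendly"
    return "friendly" if r["head_shape"] == "circle" else "unfriendly"

# Usage:
# classify_h1({"neck": "tie", "body_shape": "round", "head_shape": "circle"})
# -> "friendly"
```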
Hypothesis - an attempt at a formal description
• Both the examples and the hypothesis have to be specified in a language. A hypothesis has the form of a formula φ(X) with a single free variable X.
• Let us define the extension Ext(φ) of a hypothesis φ(X) wrt. the domain Ω as the set of all elements of Ω which meet the condition φ, i.e. Ext(φ) = {o ∈ Ω : φ(o) holds}.
• Properties of a hypothesis:
• a hypothesis is complete iff E+ ⊆ Ext(φ)
• a hypothesis is consistent if it covers no negative examples, i.e. Ext(φ) ∩ E- = ∅
• a hypothesis is correct if it is complete and consistent
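These three properties translate directly into code. A minimal sketch, assuming the hypothesis φ is given as a Boolean predicate over objects (the helper names are hypothetical):

```python
def extension(phi, domain):
    # Ext(phi) = {o in domain : phi(o) holds}
    return [o for o in domain if phi(o)]

def is_complete(phi, E_pos):
    # complete iff E+ is a subset of Ext(phi)
    return all(phi(o) for o in E_pos)

def is_consistent(phi, E_neg):
    # consistent iff Ext(phi) and E- are disjoint
    return not any(phi(o) for o in E_neg)

def is_correct(phi, E_pos, E_neg):
    return is_complete(phi, E_pos) and is_consistent(phi, E_neg)
```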
How many correct hypotheses can be designed for a fixed training set E?
• Fact: the number of possible concepts is far greater than the number of possible hypotheses (formulas).
• Consequence: most concepts cannot be characterized by a corresponding hypothesis - we have to accept hypotheses which are only "approximately correct".
• Uniqueness of an "approximately correct" hypothesis cannot be ensured.
Choice of a hypothesis and Ockham's razor
• William of Ockham recommends a way to compare hypotheses: "Entia non sunt multiplicanda praeter necessitatem" ("entities must not be multiplied beyond necessity").
• Einstein: "... the language should not be simpler than necessary."
Machine Learning Biases • The concept/hypothesis language specifies the language bias, which limits the set of all concepts/hypotheses that can be expressed/considered/learned. • The preference bias allows us to decide between two hypotheses (even if they both classify the training data equally). • The search bias defines the order in which hypotheses will be considered. • Important if one does not search the whole hypothesis space.
Preference Bias, Search Bias & Version Space
Hypotheses are partially ordered by generality.
Version space: the subset of hypotheses that have zero training error; it is bounded by the most general and the most specific consistent concepts.
(Figure: the version space between the most general and the most specific concept, with + and - training examples.)
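The generality ordering can be made concrete. A minimal sketch, assuming Mitchell-style attribute-vector hypotheses in which "?" matches any value (this representation is an assumption; the slide does not fix one):

```python
def more_general_or_equal(h1, h2):
    """True iff hypothesis h1 covers every example that h2 covers.
    Hypotheses are tuples of attribute values, '?' meaning 'any value'."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

# ("?", "balloon") is more general than ("yes", "balloon"):
assert more_general_or_equal(("?", "balloon"), ("yes", "balloon"))
```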
Types of learning
• skill refinement (swimming, biking, ...)
• knowledge acquisition
• Rote Learning (chess, checkers): the aim is to find an appropriate heuristic function evaluating the current state of the game, e.g. the MIN-MAX approach
• Case-Based Reasoning: past experience is stored in a database. To solve a new problem, the system searches the DB to find "the closest (the most similar) case" - its solution is modified for the current problem
• Advice Taking: learning to "interpret" or "operationalize" an abstract advice - search for "applicability conditions"
• Induction: difference analysis, candidate elimination or the version-space approach, decision tree induction, etc.
Decision tree induction
Given: training examples uniformly described by a single set of the same attributes and classified into a small set of classes (most often into 2 classes: positive vs. negative examples)
Find: a decision tree allowing to classify new instances
Simple example: robots described by 5 discrete attributes and classified into 2 classes (friendly, unfriendly); see the encoding sketch below:
• Is_smiling ∈ {no, yes},
• Holding ∈ {sword, balloon, flag},
• Has_tie ∈ {no, yes},
• Head_shape ∈ {round, square, octagon},
• Body_shape ∈ {round, square, octagon}.
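In code, such a training set can be encoded as attribute-value pairs plus a class label. A minimal sketch; the two example rows are invented purely for illustration:

```python
# Hypothetical training set: (attribute-value dict, class) pairs.
robots = [
    ({"Is_smiling": "yes", "Holding": "balloon", "Has_tie": "no",
      "Head_shape": "round", "Body_shape": "square"}, "friendly"),
    ({"Is_smiling": "no", "Holding": "sword", "Has_tie": "yes",
      "Head_shape": "square", "Body_shape": "octagon"}, "unfriendly"),
]
attributes = ["Is_smiling", "Holding", "Has_tie", "Head_shape", "Body_shape"]
```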
TDIDT: Top-Down Induction of Decision Trees
given: S ... the set of classified examples
goal: design a decision tree DT ensuring the same classification as S
1. The root is denoted by S.
2. Find the "best" attribute at to be used for splitting the current set S.
3. Split the set S into subsets S1, S2, ..., Sn wrt. the value of at (all examples in the subset Si have the same value at = vi). Each subset denotes a node of the DT.
4. For each Si: if all examples in Si belong to the same class, then create a leaf with that class label; else apply steps 2-4 recursively with S = Si.
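A minimal Python sketch of the TDIDT recursion, reusing the robots/attributes encoding above; best_attribute is the entropy-based selection sketched after the next slide, and the tuple-of-dict tree representation is an assumption for illustration:

```python
from collections import Counter

def tdidt(S, attributes):
    """S is a list of (example_dict, class) pairs."""
    labels = [cl for _, cl in S]
    if len(set(labels)) == 1 or not attributes:
        # all examples in one class (or no attribute left): create a leaf
        return Counter(labels).most_common(1)[0][0]
    at = best_attribute(S, attributes)           # step 2
    remaining = [a for a in attributes if a != at]
    children = {}
    for v in {ex[at] for ex, _ in S}:            # step 3: split on at
        S_i = [(ex, cl) for ex, cl in S if ex[at] == v]
        children[v] = tdidt(S_i, remaining)      # step 4: recurse
    return (at, children)

# Usage: tree = tdidt(robots, attributes)
```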
TDIDT: How to choose the "best" attribute?
Minimize the (Shannon) entropy:
H(Si) = - pi+ log2 pi+ - pi- log2 pi-
where pi+ is the probability that a random example in Si is positive, estimated by its frequency.
Let the attribute at split S into the subsets S1, S2, ..., Sn. The entropy of this system is defined as
H(S, at) = Σi=1..n P(Si) · H(Si)
where P(Si) is the probability of the event Si, approximated by the relative size |Si| / |S|.
Choose the attribute at with the minimal H(S, at).
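The entropy criterion in code, completing the tdidt sketch above; a minimal sketch whose function names are assumptions:

```python
from math import log2

def entropy(S):
    # H(S) = - sum over classes c of p_c * log2(p_c); 0*log(0) taken as 0
    n = len(S)
    counts = {}
    for _, cl in S:
        counts[cl] = counts.get(cl, 0) + 1
    return -sum(c / n * log2(c / n) for c in counts.values())

def best_attribute(S, attributes):
    # choose at minimizing H(S, at) = sum_i (|S_i| / |S|) * H(S_i)
    def h_split(at):
        total = 0.0
        for v in {ex[at] for ex, _ in S}:
            S_i = [(ex, cl) for ex, cl in S if ex[at] == v]
            total += len(S_i) / len(S) * entropy(S_i)
        return total
    return min(attributes, key=h_split)
```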
Learning to fly the F16 simulator [Samuel, 95]
Design an automatic controller for the F16 for the following complex task:
1. Start up and rise up to the height of 2000 feet
2. Fly 32000 feet north
3. Turn right 330°
4. When 42000 feet from the starting point (direction N-S), turn left and head towards the starting point; the rotation is finished when the course is between 140° and 180°
5. Adjust the flight direction so that it is parallel to the landing course; tolerance 5° for flight direction and 10° for wing twist wrt. the horizon
6. Decrease the height and move towards the start of the landing path
7. Land
Training data: 3 skilled pilots performed the assigned mission, each 30 times.
Each flight is described by 1000 vectors (a total of 90000 training examples) characterizing:
· the position and state of the plane
· the pilot's control action
Learning to fly the F16 simulator [Samuel, 95]
Position and state:
• on_ground boolean: is the plane on the ground?
• g_limit boolean: acceleration limit exceeded?
• wing_stall (is the plane stable?), twist (int: 0°-360°, wings wrt. the horizon)
• elevation (angle "body wrt. horizon"), azimuth, roll_speed (wings deflection), elevation_speed, azimuth_speed, airspeed, climbspeed, E/W distance, N/S distance, fuel (weight of current supply)
Control:
• rollers and elevator: position of horizontal / vertical deflection
• thrust integer: 0-100%, force
• flaps integer: 0°, 10° or 20°, wing twist
Each of the 7 phases calls for a specific type of control. The training data are divided into 7 disjoint sets which are used to design specific decision trees (independently for each task phase and each control action). Control is ensured by 7 × 4 decision trees.
Tasks addressed by ML applications
• Classification / prediction
• diagnosis (troubleshooting motor pumps, medicine, ..., SKICAT - astronomical cataloguing)
• execution / control (GASOIL - separation of hydrocarbons)
• configuration / design (Siemens: equipment configuration, Boeing)
• language understanding
• vision and speech
• planning and scheduling
• Why? A substantial speed-up of development and maintenance:
• 180 man-years to develop the expert system XCON with 8000 rules, 30 man-years needed for maintenance
• 1 man-year to develop BP's GASOIL (ML-based) with 2800 rules, 0.1 man-years needed for maintenance