Pattern Evaluation and Process Control
Wei-Min Shen
Information Sciences Institute, University of Southern California
UCLA Data Mining Short Course
Outline
• Intuition of Interestingness
• Principles for Measuring Interestingness
• Existing Measurement Systems
• Minimal Description Length Principle
• Methods for Process Control
Why Is a Pattern “Interesting”?
• I did not know X before
• It contradicts my thinking (surprise)
• It is supported by the majority of the data
• It is an exception to the usual cases
• Occam’s Razor: simple is better
• More?
The Types of Classification Rules
• Let h be a hypothesis and e the evidence; then, with respect to any given tuple, we have:
  • Characteristic rule: h → e
  • Discriminant rule: e → h
• e and h can be interpreted as the sets of tuples satisfying e and h, respectively
A Few Definitions
• Given a discriminant rule R: e → h
  • |e| is the cover of the rule
  • |h ∧ e|/|e| is the confidence, reliability, or certainty factor of the rule
  • R is “X% complete” if |h ∧ e|/|h| = X% (e covers X% of the tuples in h)
  • R is “Y% discriminant” if |¬h ∧ e|/|¬h| = (100 − Y)% (e covers only (100 − Y)% of the tuples in ¬h)
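The four quantities defined above can be computed directly from tuple counts. This is a minimal sketch, assuming the counts |h ∧ e|, |h|, |e|, and the total number of tuples N are already known; the function name `rule_stats` and the example counts are hypothetical.

```python
def rule_stats(n_he, n_h, n_e, n):
    """Cover, confidence, completeness, and discrimination of a rule e -> h.

    n_he: tuples satisfying both h and e; n_h: tuples satisfying h;
    n_e: tuples satisfying e; n: all tuples.
    """
    n_not_h = n - n_h          # |¬h|
    n_not_h_e = n_e - n_he     # |¬h ∧ e|: tuples covered by e but not in h
    return {
        "cover": n_e,                                # |e|
        "confidence": n_he / n_e,                    # |h ∧ e| / |e|
        "completeness": n_he / n_h,                  # X = |h ∧ e| / |h|
        "discrimination": 1 - n_not_h_e / n_not_h,   # Y, where |¬h ∧ e| / |¬h| = 1 - Y
    }

# Hypothetical counts: 100 tuples, 40 in h, 50 covered by e, 30 in both.
print(rule_stats(n_he=30, n_h=40, n_e=50, n=100))
```

With these counts the rule is 60% reliable, 75% complete, and about 67% discriminant.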
Principles for Measuring “I”
• 1. I = 0 if h and e are statistically independent
  • e and h have no relation at all
• 2. I monotonically increases with |h ∧ e| when |h|, |¬h|, and |e| remain the same
  • I relates to reliability
Principles for Measuring “I”
• 3. I monotonically decreases with |h| (or |e|) when |h ∧ e|, |e| (or |h|), and |¬h| remain the same
  • I relates to completeness
• 4. I monotonically increases with |e| when the reliability |h ∧ e|/|e|, |h|, and |¬h| remain the same
  • I relates to cover when reliability is the same
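To make principles 1 to 3 concrete, the sketch below checks them on a few counts using Piatetsky-Shapiro's rule-interest measure RI = |h ∧ e| − |h||e|/N, where N is the total number of tuples. The function name and the counts are hypothetical, and passing three spot checks is an illustration, not a proof that RI satisfies the principles in general.

```python
def rule_interest(n_he, n_h, n_e, n):
    """Piatetsky-Shapiro's rule-interest measure RI = |h ∧ e| - |h||e|/N."""
    return n_he - n_h * n_e / n

# Hypothetical counts with |¬h| = 60 held fixed throughout.
base = rule_interest(n_he=20, n_h=40, n_e=50, n=100)
assert base == 0                                  # 1: independent, since 20 = 40*50/100
assert rule_interest(25, 40, 50, 100) > base      # 2: grows with |h ∧ e|
assert rule_interest(20, 50, 50, 110) < base      # 3: shrinks as |h| grows (|¬h| = 60)
```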
Treat Discriminant and Characteristic Rules Differently
• Principles 1, 2, 3, and 4 apply to both discriminant and characteristic rules
• 5. Treat discriminant and characteristic rules differently
  Rule  E       H    Discrim.  Complete
  A     Fever   Flu  80%       30%
  B     Sneeze  Flu  30%       80%
• As discriminant rules, I(A) > I(B)
• As characteristic rules, I(B) > I(A)
Existing Measurement Systems
• RI (Piatetsky-Shapiro 91)
• J (Smyth and Goodman 92)
• CE (Hong and Mao 91)
• IC++ (Kamber and Shinghal 96)
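As an example of one of these systems, here is a minimal sketch of the J-measure for a rule e → h, written from its usual definition J = p(e)·[p(h|e) log₂(p(h|e)/p(h)) + (1 − p(h|e)) log₂((1 − p(h|e))/(1 − p(h)))]. The function name and the probabilities in the example are hypothetical, and 0 < p(h) < 1 is assumed.

```python
import math

def j_measure(p_e, p_h, p_h_given_e):
    """Smyth and Goodman's J-measure (in bits) for the rule e -> h."""
    def term(a, b):
        # a * log2(a / b), with the convention 0 * log 0 = 0
        return 0.0 if a == 0 else a * math.log2(a / b)
    # p(e) times the relative entropy between the posterior and the prior of h
    return p_e * (term(p_h_given_e, p_h) + term(1 - p_h_given_e, 1 - p_h))

print(j_measure(p_e=0.5, p_h=0.4, p_h_given_e=0.4))  # independent: 0.0
print(j_measure(p_e=0.5, p_h=0.5, p_h_given_e=1.0))  # e always implies h
```

J is 0 when h and e are independent, which matches Principle 1; it is never negative, so on its own it does not say whether e makes h more or less likely.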
IC++ Measurement for Characteristic Rules
• Given h and e, let rule d: e → h and rule c: h → e
• Nec(d) = P(¬e|h)/P(¬e|¬h)
• Suf(d) = P(e|h)/P(e|¬h)
• For h → e: C++ = if 0 ≤ Nec(d) < 1 then (1 − Nec(d))·P(h), else 0
• For h → ¬e: C+− = if 0 ≤ Suf(d) < 1 then (1 − Suf(d))·P(h), else 0
• For ¬h → e: C−+ = if 1 < Nec(d) < ∞ then (1 − 1/Nec(d))·P(¬h), else 0
• For ¬h → ¬e: C−− = if 1 < Suf(d) < ∞ then (1 − 1/Suf(d))·P(¬h), else 0
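The four measures can be sketched in Python from a 2×2 table of tuple counts. This is a minimal sketch, not Kamber and Shinghal's implementation: the argument names and the example counts are hypothetical, both h and ¬h are assumed to be non-empty, and the ¬h rules are scored only when Nec(d) or Suf(d) lies strictly between 1 and ∞, the range in which 1 − 1/x is positive.

```python
import math

def ic_measures(n_h_e, n_h_not_e, n_not_h_e, n_not_h_not_e):
    """C++, C+-, C-+, C-- for the rules h->e, h->¬e, ¬h->e, ¬h->¬e.

    Arguments are the four cell counts of the contingency table of h and e.
    Assumes at least one tuple in h and one in ¬h.
    """
    n_h = n_h_e + n_h_not_e
    n_not_h = n_not_h_e + n_not_h_not_e
    p_h, p_not_h = n_h / (n_h + n_not_h), n_not_h / (n_h + n_not_h)

    def ratio(a, b):
        # a/0 is treated as infinity; 0/0 is left undefined (nan), which makes
        # every range test below false, so the measure falls back to 0.
        if b == 0:
            return math.inf if a > 0 else math.nan
        return a / b

    # Necessity and sufficiency of the discriminant rule d: e -> h
    nec = ratio(n_h_not_e / n_h, n_not_h_not_e / n_not_h)  # P(¬e|h) / P(¬e|¬h)
    suf = ratio(n_h_e / n_h, n_not_h_e / n_not_h)          # P(e|h) / P(e|¬h)

    return {
        "C++": (1 - nec) * p_h if 0 <= nec < 1 else 0.0,
        "C+-": (1 - suf) * p_h if 0 <= suf < 1 else 0.0,
        "C-+": (1 - 1 / nec) * p_not_h if 1 < nec < math.inf else 0.0,
        "C--": (1 - 1 / suf) * p_not_h if 1 < suf < math.inf else 0.0,
    }

# Hypothetical counts: 30 tuples h∧e, 10 h∧¬e, 20 ¬h∧e, 40 ¬h∧¬e.
print(ic_measures(30, 10, 20, 40))
```

Here e is more frequent among h tuples than among ¬h tuples, so only h → e and ¬h → ¬e receive non-zero scores.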
Minimal Description Length Principle
• The goodness of a theory or hypothesis (H) relative to a set of data (D) is measured by the sum of:
  • the length of H
  • the length of the explanation of D using H
• Both lengths assume an optimal coding scheme
The Derivation of MDL
• By Bayes’ rule, P(H|D) ∝ P(H)P(D|H), so the best hypothesis H with respect to D gives:
  • the max of P(H)P(D|H)
  • or, equivalently, the max of log P(H) + log P(D|H)
  • or the min of −log P(H) − log P(D|H)
• Since the optimal code for an element with probability p is −log₂ p bits long, this yields MDL:
  • the min of |coding1(H)| + |coding2(D|H)|
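A minimal sketch of this selection rule, assuming ideal (Shannon) code lengths so that |coding1(H)| = −log₂ P(H) and |coding2(D|H)| = −log₂ P(D|H). The coin-flip data, the two candidate hypotheses, their priors, and the function names are all made up for illustration.

```python
import math

def description_length(prior, likelihood):
    """Two-part code length in bits, assuming ideal codes: -log2 P(H) - log2 P(D|H)."""
    return -math.log2(prior) - math.log2(likelihood)

def mdl_select(hypotheses):
    """Name of the hypothesis with the shortest total description length.

    hypotheses maps a name to a pair (P(H), P(D|H)).
    """
    return min(hypotheses, key=lambda name: description_length(*hypotheses[name]))

# Made-up example: D = 8 coin flips with 7 heads; the priors are assumptions.
heads, tails = 7, 1

def bernoulli_likelihood(p):
    return p ** heads * (1 - p) ** tails

candidates = {
    "fair": (0.9, bernoulli_likelihood(0.5)),
    "biased": (0.1, bernoulli_likelihood(7 / 8)),
}
for name, (prior, likelihood) in candidates.items():
    print(f"{name}: {description_length(prior, likelihood):.2f} bits")
print("MDL choice:", mdl_select(candidates))
```

With these priors the biased hypothesis costs more bits to state (about 3.32 vs 0.15) but explains the data much more cheaply, so its total (about 7.67 bits) beats the fair coin's (about 8.15 bits).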
An Illustration of MDL
[Figure: a set of points fit by a one-line theory (explanation length = 294.9) and by a two-line theory (explanation length = 298.7)]
Fit Points with Lines
• Theory = lines (number of lines; angle, length, and center of each line)
• Explanation, for each point:
  • the line it belongs to
  • its position on the line
  • its distance to the line
• Note that the current coding uses (x, y) coordinates
• The result would differ if we chose an (r, θ) coding
Process Control
• The goal: to predict the future from the past
• The given: the past data sequence
• The methods:
  • Adaptive control theory
  • Chaos theory
  • State machines
Chaos Theory
• The data sequence may appear chaotic
• The underlying model may be very simple
• Extremely sensitive to initial conditions
• Long-term prediction is difficult
• Short-term prediction is possible
An Example Chaotic Sequence
[Figure: s(k), between 0.0 and 1.0, plotted against time step k from 0 to 100]
The simple logistic map model: $s_{k+1} = a\,s_k(1 - s_k)$, where $a = 4$
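A sequence like the one plotted can be generated in a few lines. This sketch iterates the logistic map with a = 4 from two initial values that differ by 10⁻¹⁰ (both values are arbitrary choices; the function name is hypothetical) to show the sensitivity to initial conditions: the model is one line long, yet the two trajectories soon become unrelated.

```python
def logistic_map(s0, a=4.0, steps=100):
    """Iterate s(k+1) = a * s(k) * (1 - s(k)); return [s(0), ..., s(steps)]."""
    seq = [s0]
    for _ in range(steps):
        s = seq[-1]
        seq.append(a * s * (1 - s))
    return seq

# Two starting points that differ by 1e-10.
x = logistic_map(0.3)
y = logistic_map(0.3 + 1e-10)
gap = [abs(u - v) for u, v in zip(x, y)]
print("first step where the trajectories differ by > 0.1:",
      next((k for k, g in enumerate(gap) if g > 0.1), None))
```

Because the gap roughly doubles at each step, a one-step prediction from a slightly wrong state is still good, while a prediction a few dozen steps ahead is not.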
Steps of Using Chaos Theory
• Reconstruction of the state space:
  • $\mathbf{x}_k = [x_k, x_{k-\tau}, \ldots, x_{k-(m-1)\tau}]^T$
  • where $\tau$ is a time delay and $m$ is the embedding dimension
• Takens’ theorem: one can always find an embedding dimension $m \ge 2[d] + 1$, where $[d]$ is the integer part of the attractor’s dimension, that preserves the invariant measures
• Central task: choose $m$ and $\tau$
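The state-space reconstruction step can be sketched as follows. It only builds the delay vectors for a given m and τ; choosing m and τ, the central task, is not attempted here. The function name `delay_embed` is hypothetical.

```python
def delay_embed(x, m, tau):
    """Delay vectors [x_k, x_{k-tau}, ..., x_{k-(m-1)tau}] for every k with enough history."""
    start = (m - 1) * tau  # first index whose full history is available
    return [[x[k - j * tau] for j in range(m)] for k in range(start, len(x))]

print(delay_embed([0, 1, 2, 3, 4, 5], m=3, tau=2))  # [[4, 2, 0], [5, 3, 1]]
```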
State Machine Approach
• Identify the number of states by clustering all points in the sequence
• Construct a transition function by learning from the sequence
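A minimal sketch of these two steps, with one substitution: equal-width binning stands in for a real clustering algorithm, and the learned transition function simply maps each state to its most frequent successor. The function names and the data are hypothetical.

```python
from collections import Counter, defaultdict

def to_states(seq, n_states):
    """Assign each value to one of n_states equal-width bins (stand-in for clustering)."""
    lo, hi = min(seq), max(seq)
    width = (hi - lo) / n_states or 1.0  # avoid dividing by zero on a constant sequence
    return [min(int((v - lo) / width), n_states - 1) for v in seq]

def learn_transitions(states):
    """Transition function: for each state, its most frequent successor."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

seq = [0.1, 0.9, 0.15, 0.85, 0.1, 0.9, 0.12]  # hypothetical data
states = to_states(seq, n_states=2)
delta = learn_transitions(states)
print(states, delta)
```

Predicting the next value then amounts to mapping the latest point to its state and following `delta`.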
Construction & Synchronization
• Environment = (A, P, Q, r), where |P| < |Q|
• Model = (A, P, S, t)
  • Visibly equivalent
  • Perfect
  • Synchronized
• The construction problem
  • when and how to construct new model states
• The synchronization problem
  • how to determine which model state is current
Learning with a Reset Button
• Two environmental states p and q (which may appear the same to the learner) are different if and only if there exists a sequence e of actions that leads from p and from q to states that are visibly different
• The interaction with the environment:
  • Membership query
  • Equivalence query: “yes” or a counterexample
Observation Table
• Model states: {row(s) : s ∈ S}
• Initial state: row(λ)
• Final states: {row(s) : s ∈ S and T(s) = 1}
• Transitions: δ(row(s), a) = row(sa)
• Closed table: ∀s ∈ S, a ∈ A, ∃s′ ∈ S such that row(sa) = row(s′)
• Consistent table: row(s) = row(s′) ⇒ row(sa) = row(s′a) for all a ∈ A
[Table layout: the columns are the experiments E; the rows S are the states (action sequences from the initial state) and the rows S×A are the transitions; the entries of T are observations]
L* Algorithm
• Initialize T for λ and each action in A
• Loop:
    Use membership queries to make T complete, closed, and consistent
    If EQ(T) = w /* a counterexample */ then add w and all its prefixes into S
  Until EQ(T) = yes
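The algorithm can be sketched in Python for an environment whose observation depends only on its current state. This is an illustrative sketch, not the book's implementation: action sequences are strings of one-character actions, `member(w)` plays the role of the membership query, and because there is no real teacher, `equiv(model)` approximates the equivalence query by testing every action string up to length 6. The toy environment (a hidden counter modulo 3 that is visible only when it reads 0) and all names are hypothetical.

```python
from itertools import product

def lstar(actions, member, equiv):
    """Sketch of L*: returns (start, delta, out) for the learned model."""
    S, E, T = [""], [""], {}  # S: states (access strings), E: experiments, T: table

    def fill():  # membership queries for every missing cell of (S ∪ S·A) × E
        for s in S + [s + a for s in S for a in actions]:
            for e in E:
                if s + e not in T:
                    T[s + e] = member(s + e)

    def row(s):
        return tuple(T[s + e] for e in E)

    def unclosed():  # some row(sa) that equals no row(s') with s' in S
        rows = {row(s) for s in S}
        return next((s + a for s in S for a in actions if row(s + a) not in rows), None)

    def inconsistency():  # row(s1) = row(s2) but row(s1 a) != row(s2 a)
        for s1, s2 in product(S, S):
            if s1 < s2 and row(s1) == row(s2):
                for a, e in product(actions, E):
                    if T[s1 + a + e] != T[s2 + a + e]:
                        return a + e  # new experiment that separates the two rows
        return None

    while True:
        fill()
        while True:  # make the table closed and consistent
            t = unclosed()
            if t is not None:
                S.append(t)
            else:
                e = inconsistency()
                if e is None:
                    break
                E.append(e)
            fill()
        model = (row(""),
                 {(row(s), a): row(s + a) for s in S for a in actions},
                 {row(s): T[s] for s in S})
        w = equiv(model)
        if w is None:
            return model
        for i in range(1, len(w) + 1):  # add w and all its prefixes into S
            if w[:i] not in S:
                S.append(w[:i])

def run(model, w):
    """Observation the model predicts after executing w from its start state."""
    state, delta, out = model
    for a in w:
        state = delta[(state, a)]
    return out[state]

# Hypothetical environment: "a" advances a hidden counter mod 3, "b" does nothing,
# and the only observation is whether the counter reads 0 (3 states, 2 observations).
def member(w):
    return w.count("a") % 3 == 0

def equiv(model):
    # Stand-in for a real equivalence query: test every action string up to length 6.
    for n in range(7):
        for w in map("".join, product("ab", repeat=n)):
            if run(model, w) != member(w):
                return w
    return None

model = lstar("ab", member, equiv)
print(len(model[2]), "model states")
```

On this environment the first hypothesis has only two states; the counterexample aaa adds its prefixes to S, an inconsistency then adds the experiment a to E, and the final model has three states.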
The Little Prince Example
• A counterexample ftf for M3 (Fig 5.3): the model ends at rose, but the real observation is volcano
• An inconsistency in T4 (Tab 5.5), where row(f) = row(ft) but row(ff) ≠ row(ftf)
Homing Sequence
• L* is limited by its need for a reset button
• Homing sequence h: if two executions of h produce the same observation sequence, then they end in the same state
• Let q<h> be the observation sequence and qh the ending state when h is executed from q; then h is defined by
  • for all p, q: [p<h> = q<h>] ⇒ [ph = qh]
• E.g., {fwd} is a homing sequence for Little Prince
Properties of Homing Sequences
• Every FDA has a homing sequence
• One can be constructed from an FDA by repeatedly appending a sequence of fewer than n actions that distinguishes a pair of states
• The length of this construction is at most n²
• There are FDAs whose shortest homing sequence is on the order of n² long
• h can be used as a reset
• h cannot guarantee reaching a fixed state
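The definition and the construction can be sketched for a machine that is fully known. The sketch assumes a reduced (minimal) machine given as a transition dict and an observation dict, takes q<x> to be the observations seen after each action of x, and finds each separating sequence by breadth-first search. The three-state counter used as the example and all function names are hypothetical; this is not the Little Prince environment.

```python
from collections import deque
from itertools import combinations

def execute(delta, out, q, w):
    """Run w from state q; return (observation after each action, end state)."""
    obs = []
    for a in w:
        q = delta[(q, a)]
        obs.append(out[q])
    return tuple(obs), q

def is_homing(delta, out, states, h):
    """True iff equal observation sequences p<h> = q<h> imply equal end states."""
    end_for = {}
    for q in states:
        obs, end = execute(delta, out, q, h)
        if end_for.setdefault(obs, end) != end:
            return False
    return True

def separating(delta, out, actions, p, q):
    """Shortest x with p<x> != q<x>, by breadth-first search over state pairs."""
    frontier, visited = deque([(p, q, "")]), {(p, q)}
    while frontier:
        p1, q1, x = frontier.popleft()
        for a in actions:
            p2, q2 = delta[(p1, a)], delta[(q1, a)]
            if out[p2] != out[q2]:
                return x + a
            if (p2, q2) not in visited:
                visited.add((p2, q2))
                frontier.append((p2, q2, x + a))
    raise ValueError("states are equivalent; the machine is not reduced")

def homing_sequence(delta, out, states, actions):
    """Append separating sequences to h until h is a homing sequence."""
    h = ""
    while not is_homing(delta, out, states, h):
        for p, q in combinations(states, 2):
            (obs_p, end_p), (obs_q, end_q) = (execute(delta, out, p, h),
                                              execute(delta, out, q, h))
            if obs_p == obs_q and end_p != end_q:
                h += separating(delta, out, actions, end_p, end_q)
                break
    return h

# Hypothetical reduced machine: "a" advances a counter mod 3, "b" leaves it alone,
# and the only observation is whether the counter reads 0.
states, actions = [0, 1, 2], "ab"
delta = {(q, a): (q + 1) % 3 if a == "a" else q for q in states for a in actions}
out = {q: q == 0 for q in states}
h = homing_sequence(delta, out, states, actions)
print(h)  # ba
```

Each appended sequence splits at least one group of states that produced the same observations, so there are fewer than n rounds, each adding fewer than n actions. After ba, the observation sequence identifies the end state, but the end state still depends on where the machine started, so h acts as a reset without reaching a fixed state.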
L* with a Homing Sequence h
• Every time a reset is needed, repeat h until you see the desired observation sequence
• Or, for each possible observation sequence of h, make a copy of L* (see Fig 5.6)
Learning the Homing Sequence
• If h is not a homing sequence, we may discover that the same observation sequence produced by executing h leads to two different states, p and q, because there is a sequence of actions x such that p<x> ≠ q<x>
• Then hx is a better approximation of a homing sequence
L* + Learning h
• Assume a homing sequence h, initially h = λ (the empty sequence)
• When h is shown to be incorrect, extend h, discard all copies of L*, and start again
• When h is incorrect, there exists x such that (qh)<x> ≠ (ph)<x>, even though q<h> = p<h>
Learning h and the Model
• Rivest and Schapire’s algorithm (Fig 5.7)
• Little Prince example (notice the inconsistency produced by ff in Fig 5.10)