Machine Learning Supports Processes
Prof. Dr. Katharina Morik
TU Dortmund, Computer Science LS VIII
http://www-ai.cs.uni-dortmund.de
Overview
• Learning programs from examples -- ILP
  • Real programs
  • Restricted logic programs
• Learning in XML
  • Schema learning
  • Frequent sequences in web applications
  • Collaborative clustering of folksonomies
• Learning about programs using meta-data
  • Co-update of files
  • Bug prediction: where, how often
The Old Dream
• Programming a computer by examples
• Alan Turing
• Gordon Plotkin
• Wray Buntine
• Jörg-Uwe Kietz, Stefan Wrobel
• Stephen Muggleton
• Jean-François Puget, Céline Rouveirol

    member(A, [A|B]).
    member(A, [B|C]) :- member(A, C).
... but
• The cut (!): control structures are too hard to learn.
• Negation: not(H ⊨ E) ≠ H ⊨ not(E). Negation by failure is not sufficient for planning (Puget 1989).
• Humans easily express control structures.
• Hence: rule learning instead of learning programs.
Rule Learning from Examples
Given:
• a set of examples E = {E+, E−} in LE,
• a set of background facts B in LB.
Find a set of rules H in LH such that
1. M+(B ∪ E) ⊆ M(H), i.e., H is true in all minimal models of B and E (correct);
2. for all h ∈ H there is an e ∈ E such that not(B, E \ {e} ⊨ e), but B, E \ {e}, h ⊨ e (necessary);
3. for each h ∈ LH fulfilling 1) and 2), H ⊨ h holds (complete);
4. H is minimal (not redundant).
Difficulties
• Rule learning is more difficult than concept learning.
• Induction inherits the difficulties of deduction: D is more general than C if D ⊨ C.
• Whether an example clause is covered by a hypothesis clause is hard to decide.
Example
B: mother(ann,bart). mother(ann,britta). father(arno,bart). father(arno,britta). mother(britta,celine). father(bernd,celine). parent(bernd,celine). parent(britta,celine).
C = grandma(ann,celine) :- mother(ann,britta), mother(britta,celine), father(arno,britta), father(bernd,celine).
D = grandma(X,Z) :- mother(X,Y), mother(Y,Z) covers C with the single substitution {Y/britta}. D is generative and deterministic w.r.t. B, hence D ⊨ C can be computed efficiently, bounded by depth (i) and arity (j).
D' = grandma(X,Z) :- mother(X,Y), parent(Y,Z) is generative but indeterministic w.r.t. B: mother(X,Y) allows {Y/bart} or {Y/britta}, and parent(Y,Z) allows {Y/bernd} or {Y/britta}. Deciding D' ⊨ C is therefore NP-complete in the general case. If the indeterministic part is restricted to k literals (here: 2-local), learnability is polynomial. (A coverage-test sketch follows below.)
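To make the coverage test concrete, here is a minimal Python sketch (an illustration of the idea, not the algorithm from the cited work): it binds the head of D to the example's head and then searches for substitutions that prove each body literal against the facts in B, backtracking over candidate facts.

    # Minimal sketch of clause coverage: does D cover C given B?
    # Variables are uppercase strings; facts are tuples (pred, arg1, arg2).
    B = {("mother", "ann", "bart"), ("mother", "ann", "britta"),
         ("father", "arno", "bart"), ("father", "arno", "britta"),
         ("mother", "britta", "celine"), ("father", "bernd", "celine"),
         ("parent", "bernd", "celine"), ("parent", "britta", "celine")}

    def is_var(term):
        return term[0].isupper()

    def prove(body, subst):
        """Yield substitutions extending subst that satisfy all body literals."""
        if not body:
            yield subst
            return
        pred, *args = body[0]
        for fact in B:
            if fact[0] != pred:
                continue
            s = dict(subst)  # fresh copy per candidate fact: backtracking point
            if all(s.setdefault(a, v) == v if is_var(a) else a == v
                   for a, v in zip(args, fact[1:])):
                yield from prove(body[1:], s)

    # D = grandma(X,Z) :- mother(X,Y), mother(Y,Z); example grandma(ann,celine)
    head_subst = {"X": "ann", "Z": "celine"}
    print(next(prove([("mother", "X", "Y"), ("mother", "Y", "Z")], head_subst), None))
    # -> {'X': 'ann', 'Z': 'celine', 'Y': 'britta'}: deterministic, one choice
    print(list(prove([("mother", "X", "Y"), ("parent", "Y", "Z")], head_subst)))
    # D': two candidate bindings for Y must be tried; only Y/britta succeeds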
The Borderline
• Learnability: ij-deterministic clauses (Muggleton et al. 1992) and k-l-local indeterministic clauses are polynomially learnable (Kietz 1996).
• A clause D0 :- DDET, DNONDET is k-l-local if its indeterministic part splits into locals LOC1, ..., LOCl such that
  • each LOCi shares no variables with DDET,
  • there is no LOCj sharing variables with LOCi (the locals are variable-disjoint), and
  • k ≥ |LOCi| (each local has at most k literals).
The Old Dream Revisited
Logic-based approach to robotics:
• no maps necessary
• no correct measures necessary, but only relations
• acting as long as a perception feature is valid
• learnable representation at all levels of abstraction
• easy communication with humans

Morik, Katharina; Klingspor, Volker; Kaiser, Michael (eds.): Making Robots Smarter -- Combining Sensing and Action through Robot Learning. Kluwer Academic Publishers, 1999.
Klingspor, Volker; Morik, Katharina; Rieger, Anke: Learning Concepts from Sensor Data of a Mobile Robot. Machine Learning, Vol. 23, No. 2/3, pp. 305-332, 1996.
Navigation of a Mobile Robot
[Architecture figure: from cheap sensors and low-level behaviors, via measurements and example generation (signal to symbol), induction of operational concepts (chain clauses), and rule compilation, up to real-time planning, plan execution, and high-level commands.]
Experiments
• 25 tours given, 18 tours in rooms not used for training
• Goal: crossing a doorway
• 23 tours successful (door found, doorway passed)
• 21 tours: all perception features correctly recognized
• 2 tours: obstacle not recognized
• along_door: 1 false positive, 1 false negative
1st Lessons -- ILP
• Spatiotemporal relations can be expressed very well.
• First-order logic is difficult for engineers, though easy for linguists and philosophers.
• Calculations and numerical relationships can hardly be expressed.
Lessons: ILP
• Prerequisites needed:
  • inference engine
  • preprocessing tools
• Constraining logic further:
  • sort taxonomy
  • predicate taxonomy
  • declarative bias
• MOBAL (Morik et al. 1993): ~30 person years
XML is it!
• Business processes are described in XML.
• Web applications are based on XML.
• Software development tools use XML (Castor, JAXB).
• Data integration relies on XML.
• Even machine learning experiments are expressed in XML (RapidMiner).
• DTDs (a subset of XML Schema) are context-free grammars with regular expressions on the right-hand side. These regular expressions are deterministic.
Learning regular expressions
• Given XML, induce the underlying schema by forming regular expressions covering all instances.
• If every tag occurs just once, first a Single Occurrence Automaton is inferred, which is then transformed into a regular expression (Bex et al. 2006).
Example
Input: bacacdacde, cbacdbacde, abccaadcde
Forming 2-grams and drawing an edge between the two symbols of each (Garcia, Vidal 1999), then simplifying the automaton and applying rewrite rules yields: ((b?(a+c)+)+d)+e
[Figure: the resulting automaton over the symbols a, b, c, d, e]
E.g., for a bibliography schema: authors, citation, (volume|month), year, pages?, (title|description)?, xrefs?
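As a concrete illustration of the 2-gram step, here is a minimal Python sketch (my simplification, not Bex et al.'s full algorithm, which also rewrites the automaton into a regular expression): it builds a single occurrence automaton from the example strings by drawing an edge for every pair of adjacent symbols.

    # Build a single occurrence automaton (SOA) from example strings:
    # one node per symbol, an edge a -> b for every observed 2-gram "ab".
    from collections import defaultdict

    def build_soa(examples):
        initials, finals = set(), set()
        edges = defaultdict(set)  # symbol -> possible successor symbols
        for word in examples:
            initials.add(word[0])
            finals.add(word[-1])
            for a, b in zip(word, word[1:]):  # all adjacent pairs (2-grams)
                edges[a].add(b)
        return initials, finals, edges

    def accepts(soa, word):
        """A word is accepted iff it only follows observed transitions."""
        initials, finals, edges = soa
        return (word[0] in initials and word[-1] in finals
                and all(b in edges[a] for a, b in zip(word, word[1:])))

    soa = build_soa(["bacacdacde", "cbacdbacde", "abccaadcde"])
    print(accepts(soa, "bacde"))  # True: b->a->c->d->e are all observed edges
    print(accepts(soa, "eba"))    # False: no example starts with 'e'

The regular expression ((b?(a+c)+)+d)+e is then obtained by simplifying this automaton via rewrite rules.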
Frequent Sequences
• GSP (Srikant, Agrawal 1996) finds frequent sequences of event types.
• Event types are described by items, e.g. (GET), (URL, p1, p2), (POST, URL, q1, q2, q3).
• The application defines the attributes of items; GSP defines event types by means of items.
Web Applications Security (Bockermann 2007)
• Positive model (allowed accesses) to be acquired from observed logs (audit)
• XML language for
  • the resource tree
  • parameters (regular expressions)
• Finding frequent sequences of resource requests using GSP (Srikant, Agrawal 1996), e.g. [(GET, form.html)(POST, register.pl, salut, name)]; a simplified sketch follows below.
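A minimal GSP-style sketch in Python (simplified: plain subsequence counting, without the time-window and taxonomy features of Srikant & Agrawal's full algorithm; the log data is invented for illustration):

    # Level-wise mining of frequent event-type sequences, GSP-style.
    def is_subseq(pattern, seq):
        """True if pattern occurs in seq as a (non-contiguous) subsequence."""
        it = iter(seq)
        return all(event in it for event in pattern)  # 'in' consumes the iterator

    def gsp(sequences, min_support):
        events = sorted({e for s in sequences for e in s})
        frequent, level = [], [(e,) for e in events]
        while level:
            survivors = [c for c in level
                         if sum(is_subseq(c, s) for s in sequences) >= min_support]
            frequent += survivors
            # extend each frequent k-sequence by one event (real GSP prunes harder)
            level = [c + (e,) for c in survivors for e in events]
        return frequent

    logs = [["(GET, form.html)", "(POST, register.pl)", "(GET, done.html)"],
            ["(GET, form.html)", "(GET, help.html)", "(POST, register.pl)"],
            ["(GET, form.html)", "(POST, register.pl)", "(GET, stats.html)"]]
    print(gsp(logs, min_support=3))
    # ('(GET, form.html)', '(POST, register.pl)') is frequent in all three logs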
Clustering based on Taggings
Users, Tags, Resources
[Tag cloud: games, photoshop, shopping, imported, photography, programming, mac, java, linux, javascript, free, news, reference, howto, music, php]
Multi-objective Term Clustering (Kaspari, Wurst 2007)
• Given (users U, resources R, terms T) and a relation Y ⊆ U × R × T,
• find a hierarchical clustering of term sets, each containing a set of resources.
• Frequent term-based clustering (Ester et al. 2002)
• Selecting Pareto-optimal clusterings via NSGA (see the sketch below)
• Orthogonal criteria: completeness vs. child count, coverage vs. overlap
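As an illustration of the Pareto selection step, a minimal Python sketch over two of the named criteria, coverage (to maximize) vs. overlap (to minimize); the candidate scores are invented, and the actual work uses NSGA rather than this exhaustive filter:

    # Keep the clusterings that no other clustering dominates on
    # (higher coverage, lower overlap); ties on both axes count as dominated.
    def pareto_front(candidates):
        return [c for c in candidates
                if not any(d is not c
                           and d["coverage"] >= c["coverage"]
                           and d["overlap"] <= c["overlap"]
                           for d in candidates)]

    clusterings = [
        {"name": "A", "coverage": 0.9, "overlap": 0.4},
        {"name": "B", "coverage": 0.7, "overlap": 0.1},
        {"name": "C", "coverage": 0.6, "overlap": 0.3},  # dominated by B
    ]
    print([c["name"] for c in pareto_front(clusterings)])  # ['A', 'B']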
Lessons: XML
• Schema induction can be used for schema cleaning and enhancement.
• Frequent sequences of web usage data deliver an XML positive access model.
• Multi-objective frequent termset clustering delivers Pareto-optimal navigation structures.
Co-update of files (Shirabad, Lethbridge, Matwin 2004)
• For N = 4700 source files from a telephone switch system (PBX), all N-choose-2 pairs are formed.
• A pair is classified as relevant if it is co-updated.
• Each pair has attributes: same extension, common prefix length, number of shared types/routines, ..., plus word vectors for documentation and bug reports.
• Decision tree learning -- best results with bug-report words. (A sketch of the setup follows below.)
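A hypothetical Python sketch of the pair-classification setup (file names, feature values, and the tiny training set are invented; the word-vector attributes from documentation and bug reports are omitted here):

    # Classify file pairs as co-updated ("relevant") or not with a decision tree.
    import os
    from sklearn.tree import DecisionTreeClassifier

    def pair_features(f1, f2, shared_routines):
        return [int(os.path.splitext(f1)[1] == os.path.splitext(f2)[1]),  # same extension
                len(os.path.commonprefix([f1, f2])),                      # common prefix length
                shared_routines]                                          # shared types/routines

    X = [pair_features("pbx/call.c", "pbx/call.h", 12),
         pair_features("pbx/call.c", "ui/menu.c", 0)]
    y = [1, 0]  # 1 = the pair was co-updated in the version history
    model = DecisionTreeClassifier().fit(X, y)
    print(model.predict([pair_features("pbx/ring.c", "pbx/ring.h", 7)]))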
Predicting Bug Reports (Bernstein et al. 2007)
• Predicting the class where a failure is likely, by a learned decision tree.
• Predicting the number of future defects, by a regression tree.
• The numbers of revisions and of reported problems are the best features. (A sketch of both tasks follows below.)
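A hypothetical sketch of the two prediction tasks (all feature values and labels are invented; Bernstein et al. mined such features from version and issue histories):

    # Classification: is a failure likely? Regression: how many future defects?
    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    # per file: [number of revisions, number of reported problems]
    X = [[42, 9], [3, 0], [17, 4], [1, 0]]
    failure_likely = [1, 0, 1, 0]   # class label per file
    future_defects = [6, 0, 2, 0]   # defect count per file

    clf = DecisionTreeClassifier().fit(X, failure_likely)
    reg = DecisionTreeRegressor().fit(X, future_defects)
    print(clf.predict([[25, 5]]), reg.predict([[25, 5]]))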
Process Optimization?
• Directly XML-based: finding independent sub-processes
• Based on meta-descriptions of processes:
  • finding appropriate feature sets
  • optimizing -- according to which objective function?