Predicting Unix Commands With Decision Tables and Decision Trees
Kathleen Durant
Third International Conference on Data Mining Methods and Databases
September 25, 2002, Bologna, Italy
How Predictable Are a User’s Computer Interactions? • Command sequences • The time of day • The type of computer you’re using • Clusters of command sequences • Command typos
Characteristics of the Problem • Time-sequenced problem with dependent variables • Not a standard classification problem • Predicting a nominal value rather than a Boolean value • Concept shift
Dataset • Davison and Hirsh – Rutgers University • Collected history sessions of 77 different users over 2 – 6 months • Three categories of users: professor, graduate, undergraduate • Average number of commands per session: 2184 • Average number of distinct commands per session: 77
Rutgers Study • 5 different algorithms implemented • C4.5, a decision-tree learner • An omniscient predictor • The most recent command just issued • The most frequently used command of the training set • The longest matching prefix to the current command • Most successful: C4.5 • Predictive accuracy of 38%
Typical History Session
96100720:13:31 green-486 vs100 BLANK
96100720:13:31 green-486 vs100 vi
96100720:13:31 green-486 vs100 ls
96100720:13:47 green-486 vs100 lpr
96100720:13:57 green-486 vs100 vi
96100720:14:10 green-486 vs100 make
96100720:14:33 green-486 vs100 vis
96100720:14:46 green-486 vs100 vi
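
The raw log can be reduced to just the command stream before any learning is done. Below is a minimal Python sketch (a hypothetical helper, not part of the study's tooling) that assumes each line has the four whitespace-separated fields shown above: timestamp, host, terminal, command.

# Extract the command sequence from history-session lines.
def parse_history(lines):
    commands = []
    for line in lines:
        stamp, host, terminal, command = line.split()
        commands.append(command)
    return commands

session = [
    "96100720:13:31 green-486 vs100 BLANK",
    "96100720:13:31 green-486 vs100 vi",
    "96100720:13:31 green-486 vs100 ls",
]
print(parse_history(session))  # ['BLANK', 'vi', 'ls']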
WEKA System • Provides • Learning algorithms • A simple format for importing data – the ARFF format • A graphical user interface
History Session in ARFF Format
@relation user10
@attribute ct-2 {BLANK,vi,ls,lpr,make,vis}
@attribute ct-1 {BLANK,vi,ls,lpr,make,vis}
@attribute ct0 {vi,ls,lpr,make,vis}
@data
BLANK,vi,ls
vi,ls,lpr
ls,lpr,make
lpr,make,vis
make,vis,vi
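
To show how such a file arises from a command stream, here is an illustrative Python sketch (not the tool used in the study) that applies a sliding window: the two previous commands (ct-2, ct-1) become the attributes and the current command (ct0) the class. For simplicity all three attributes share one value set, whereas the slide restricts ct0 to commands that actually occur as the class.

def to_arff(commands, relation="user10"):
    values = ",".join(dict.fromkeys(commands))   # distinct commands, first-seen order
    lines = [f"@relation {relation}",
             f"@attribute ct-2 {{{values}}}",
             f"@attribute ct-1 {{{values}}}",
             f"@attribute ct0 {{{values}}}",
             "@data"]
    for i in range(2, len(commands)):
        lines.append(",".join(commands[i - 2:i + 1]))   # ct-2, ct-1, ct0
    return "\n".join(lines)

print(to_arff(["BLANK", "vi", "ls", "lpr", "make", "vis", "vi"]))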
Learning Techniques • Decision tree using 2 previous commands as attributes • Minimize size of the tree • Maximize information gain • Boosted decision trees – AdaBoost • Decision table • Match determined by k-nearest neighbors • Verification by 10-fold cross validation • Verification by splitting data into training/test sets • Match determined by majority
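
The study ran C4.5 (and the other learners) inside WEKA; the sketch below uses scikit-learn's CART decision tree purely as a stand-in to show the shape of the learning problem, on the toy session from the ARFF slide.

from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

commands = ["BLANK", "vi", "ls", "lpr", "make", "vis", "vi"]
X = [[commands[i - 2], commands[i - 1]] for i in range(2, len(commands))]  # ct-2, ct-1
y = [commands[i] for i in range(2, len(commands))]                          # ct0

encoder = OrdinalEncoder()
tree = DecisionTreeClassifier().fit(encoder.fit_transform(X), y)

# Predict the command that follows the pair (make, vis); in the toy data it is vi.
print(tree.predict(encoder.transform([["make", "vis"]])))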
Learning a Decision Tree
[Figure: decision tree over command values at time = -2 and time = -1 (e.g. make, dir, ls, vi, pwd, gcc, emacs) leading to the command at time = 0]
Boosting a Decision Tree
[Figure: boosting combines individual decision trees into a solution set]
Learning a Decision Table
[Figure: k-nearest neighbors (IBk) matching example]
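
A rough sketch of the nearest-neighbour matching idea behind the IBk decision table: when no stored row matches the two previous commands exactly, the k closest rows vote on the prediction. scikit-learn's KNeighborsClassifier over one-hot encoded commands is an assumed stand-in for WEKA's DecisionTable/IBk combination.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OneHotEncoder

history = ["BLANK", "vi", "ls", "lpr", "make", "vis", "vi", "make", "vis"]
X = [[history[i - 2], history[i - 1]] for i in range(2, len(history))]
y = [history[i] for i in range(2, len(history))]

encoder = OneHotEncoder(handle_unknown="ignore")
knn = KNeighborsClassifier(n_neighbors=3).fit(encoder.fit_transform(X), y)

# Even a pair never seen verbatim still yields a prediction via its nearest rows.
print(knn.predict(encoder.transform([["vis", "make"]])))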
Prediction Metrics • Macro-average – average predictive accuracy per person • What was the average predictive accuracy for the users in the study? • Micro-average – average predictive accuracy for the commands in the study • What percentage of the commands in the study did we predict correctly?
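
The difference between the two averages is easiest to see on made-up numbers; the per-user counts below are purely illustrative.

users = {
    "user1": (380, 1000),   # (correctly predicted commands, total commands)
    "user2": (60, 100),
    "user3": (900, 3000),
}

macro = sum(correct / total for correct, total in users.values()) / len(users)
micro = sum(c for c, _ in users.values()) / sum(t for _, t in users.values())

print(f"macro-average: {macro:.1%}")   # each user weighted equally
print(f"micro-average: {micro:.1%}")   # each command weighted equally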
Results: Decision Trees • Decision trees – expected results • Compute-intensive algorithm • Predictability results are similar to those of simpler algorithms • No interesting findings • Duplicated the Rutgers study’s results
Results: AdaBoost • AdaBoost – very disappointing • Unfortunately, few or no boosting iterations were performed • Only 12 decision trees were boosted • Boosted trees’ predictability increased by only 2.4% on average • Correctly predicted 115 more commands than decision trees (out of 118,409 wrongly predicted commands) • Very compute-intensive, with no substantial increase in predictive accuracy
Results: Decision Tables • Decision table – satisfactory results • Good predictability results • Relatively speedy • Validation is done incrementally • Potential candidate for an online system
Summary of Prediction Results • IBk decision table produced the highest micro-average • Boosted decision trees produced the highest macro-average • Difference was negligible • 1.37% for the micro-average • 2.21% for the macro-average
Findings • IBk decision tables can be used in an online system • Not a compute-intensive algorithm • Predictability is as good as or better than decision trees • Consistent results achieved on fairly small log sessions (> 100 commands) • No improvement in prediction for larger log sessions (> 1000 commands) • Due to concept shift
Summary of Benefits • Automatic typo correction • Savings in keystrokes average 30% • Given an average command length of 3.77 characters • A predicted command can be issued with 1 keystroke
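
One plausible way to arrive at the 30% figure, assuming (this model is not spelled out on the slide) that a correctly predicted command is accepted with a single keystroke, that a wrong prediction costs the full command, and that accuracy is about 40%, in line with the predictive accuracies reported above:

avg_len = 3.77      # average command length in characters (from the slide)
accuracy = 0.40     # assumed prediction accuracy

expected_keystrokes = accuracy * 1 + (1 - accuracy) * avg_len
savings = 1 - expected_keystrokes / avg_len
print(f"expected keystrokes per command: {expected_keystrokes:.2f}")  # ~2.66
print(f"keystroke savings: {savings:.0%}")                            # ~29%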
AdaBoost Description • The algorithm: Let Dt(i) denote the weight of example i in round t. • Initialization: assign each example (xi, yi) ∈ E the weight D1(i) := 1/n. • For t = 1 to T: call the weak learning algorithm with example set E and weights given by Dt; get a weak hypothesis ht : X → Y; update the weights of all examples. • Output the final hypothesis, generated from the hypotheses of rounds 1 to T.
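
A compact Python sketch of this weighting loop follows. It covers only the two-class case (labels in {-1, +1}); the study's multi-class setting needs a variant such as AdaBoost.M1, and decision stumps are an illustrative choice of weak learner rather than the trees boosted in the study.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=10):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = len(y)
    D = np.full(n, 1.0 / n)                       # D_1(i) = 1/n
    hypotheses, alphas = [], []
    for t in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        err = D[pred != y].sum()                  # weighted training error
        if err >= 0.5:                            # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        D *= np.exp(-alpha * y * pred)            # up-weight the mistakes
        D /= D.sum()
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, alphas

def predict(hypotheses, alphas, X):
    X = np.asarray(X, dtype=float)
    scores = sum(a * h.predict(X) for h, a in zip(hypotheses, alphas))
    return np.sign(scores)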
Complete Set of Results
[Chart: macro-average and micro-average predictive accuracy, roughly 35–43%, for decision tables (using IBk, using majority match, and using a percentage split), decision trees, and AdaBoost]
Learning a Decision Tree …
[Figure: decision tree branching on the command at time = t-2 (e.g. make, dir, grep) and the command at time = t-1, yielding predicted commands (e.g. ls, make, dir, pwd, emacs, grep) at time = t]