User Profiling for Intrusion Detection in Windows NT Tom Goldring R23
What are we doing? • Observe normal behavior of computer users • Build models from training data • Score new sessions against these models.
Why do it? • User is authenticated by behavior, therefore very hard to spoof • Detect malicious insider
Comparison with Program Profiling • We believe User Profiling is a harder problem • People do not come with “specs” • So user behavior is much less predictable • In fact, a certain level of anomalous activity is to be expected and must be taken into account • In a point and click environment, some users look very much alike.
Which data source? • Command line activity • can pretty well guess what the user is doing • but: misses windows, scripts • on many systems, it’s an endangered species if not already extinct • System calls • very fine granularity • but: the ratio of machine behavior to human behavior is very high • next to impossible to guess what the user is doing • Process table • best of both worlds, plus tree structure • but: we still need to filter out machine behavior and reduce the data so that we can reconstruct what the user did
The good news • Adding window titles to the process information gives superior data • now very easy to filter out system noise by matching process IDs with that of the active window • solves the “explorer” problem • anyone can read the data and tell what the user is doing • a wealth of new information, e.g. subject lines of emails, names of web pages, files and directories
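The filtering step on this slide (keep only process activity whose process ID matches the active window) might look roughly like the sketch below. The record layouts `ProcessEvent` and `WindowEvent` and the function name `filter_user_activity` are hypothetical, introduced only for illustration; the slides do not say how the data was actually logged.

```python
from dataclasses import dataclass

@dataclass
class ProcessEvent:
    timestamp: float  # seconds since session start (assumed unit)
    pid: int
    name: str

@dataclass
class WindowEvent:
    timestamp: float  # time at which this window became active
    pid: int          # process that owns the active window
    title: str

def filter_user_activity(process_events, window_events):
    """Keep process events whose pid matches the window active at that time."""
    windows = sorted(window_events, key=lambda w: w.timestamp)
    kept = []
    for ev in sorted(process_events, key=lambda e: e.timestamp):
        active = None
        # Find the most recent window activation at or before this event.
        for w in windows:
            if w.timestamp <= ev.timestamp:
                active = w
            else:
                break
        if active is not None and active.pid == ev.pid:
            kept.append(ev)
    return kept

windows = [
    WindowEvent(0.0, 100, "Inbox - Mail"),
    WindowEvent(10.0, 200, "report.doc - Editor"),
]
procs = [
    ProcessEvent(1.0, 100, "mail.exe"),
    ProcessEvent(2.0, 999, "svc.exe"),      # background noise, no active window
    ProcessEvent(11.0, 200, "editor.exe"),
    ProcessEvent(12.0, 100, "mail.exe"),    # mail is no longer the active window
]
kept = filter_user_activity(procs, windows)
```

In this toy session only the events at 1.0 s and 11.0 s survive, since those are the ones attributable to the window the user was looking at.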
But … • Our data now consists of successive window titles with process information in between • So we have a mixture of two different types of data, making feature selection somewhat less obvious. • Ideally, feature values should • be different for different users, but • be similar for different sessions belonging to the same user.
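One way to check the two criteria on this slide is to compare how much a feature varies between users with how much it varies across sessions of the same user. The sketch below uses a simple between/within variance ratio; this particular statistic and the name `separation_ratio` are assumptions made for illustration, not something the slides specify.

```python
from statistics import mean, pvariance

def separation_ratio(sessions_by_user):
    """sessions_by_user maps a user to a list of per-session feature values.

    A large ratio means the feature differs between users but is stable
    across sessions of the same user.
    """
    user_means = [mean(values) for values in sessions_by_user.values()]
    between = pvariance(user_means)
    within = mean([pvariance(values) for values in sessions_by_user.values()])
    return float("inf") if within == 0 else between / within

# Hypothetical per-session values of one feature for two users.
good_feature = {"alice": [1.0, 1.2, 0.8], "bob": [5.0, 5.2, 4.8]}
bad_feature = {"alice": [1.0, 5.0, 3.0], "bob": [1.2, 4.8, 3.0]}
```

Under this measure `good_feature` scores far higher than `bad_feature`, whose per-user means coincide.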
Some Candidate Features • time between windows • time between new windows • # windows open at once (sampled at some time interval) • # windows open at once, weighted by time open • # words in window title • (# WA words in window title) / (#words in window title)
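Two of the candidate features above, time between windows and number of words in the window title, can be sketched as follows. The input layout (a time-sorted list of `(timestamp_seconds, window_title)` pairs) and whitespace-based word splitting are assumptions for illustration.

```python
def inter_window_times(events):
    """Time gaps between successive window events.

    events: list of (timestamp_seconds, window_title), sorted by time.
    """
    times = [t for t, _ in events]
    return [later - earlier for earlier, later in zip(times, times[1:])]

def title_word_counts(events):
    """Number of whitespace-separated tokens in each window title."""
    return [len(title.split()) for _, title in events]

# Hypothetical window events from one session.
events = [
    (0.0, "Inbox - Mail"),
    (4.0, "Re: budget - Mail"),
    (10.0, "report.doc"),
]
gaps = inter_window_times(events)
counts = title_word_counts(events)
```

Each function turns a session into a stream of feature values, which is the form the modeling step expects.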
Modeling and Scoring • Using a feature set we like, convert each session into a feature stream • We now have a standard classification problem, e.g. we might use • naïve Bayes • For a feature matrix, some candidates are • random forests • support vector machines
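As a minimal sketch of the naïve Bayes option, the code below fits one model per user and scores a new session against each model. It assumes Gaussian-distributed features and averages the log-likelihood over the rows of a session; both choices, and all function names, are illustrative assumptions rather than the method used in the talk.

```python
import math
from statistics import mean, pvariance

def fit_user_model(training_rows, min_var=1e-6):
    """Fit a (mean, variance) pair per feature from one user's training rows.

    training_rows: list of feature vectors (lists of floats).
    min_var guards against zero variance (an assumed smoothing constant).
    """
    columns = list(zip(*training_rows))
    return [(mean(col), max(pvariance(col), min_var)) for col in columns]

def log_likelihood(model, row):
    """Naive Bayes: features are treated as independent given the user,
    so the per-feature Gaussian log-densities simply add up."""
    total = 0.0
    for (mu, var), x in zip(model, row):
        total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return total

def score_session(model, session_rows):
    """Average per-row log-likelihood of a session under one user's model."""
    return mean([log_likelihood(model, row) for row in session_rows])

# Hypothetical training data: two features per row.
alice_model = fit_user_model([[1.0, 3.0], [1.2, 2.8], [0.9, 3.1]])
bob_model = fit_user_model([[5.0, 10.0], [5.5, 9.0], [4.8, 11.0]])

new_session = [[1.1, 3.0], [1.0, 2.9]]
alice_score = score_session(alice_model, new_session)
bob_score = score_session(bob_model, new_session)
```

A session would be flagged when its score under the claimed user's model is unusually low; where to set that threshold is left open here, as it is in the slides.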
What’s next • Build in specific methods for major application programs • Monitor keystrokes and mouse movements • Characterize insider misuse.