Intelligent Detection of Malicious Script Code
CS194, 2007-08
Benson Luk, Eyal Reuveni, Kamron Farrokh
Advisor: Adnan Darwiche

Goals for the Quarter
Phase I
• Set up a machine as the testing environment
• Ensure that the “whitelist” is clean
Phase II
• Modify the crawler to output only the necessary data. This means:
  • Grab only the necessary information from the web-crawling results
  • Listen in on Internet Explorer’s JavaScript interpreter and output relevant behavior

Completed Tasks
Phase I
• Configured the machine with Norton Antivirus and the Heritrix web crawler
  • The web crawler will be used to gather additional URLs, and Norton Antivirus will be used to verify that a URL has not launched an attack
• Created a Python script to ensure that visited sites are clean
  • The script captures Norton’s web attack logs before and after loading a site in Internet Explorer, compares the logs for new entries, and signals whether the site’s data should be discarded
Phase II
• Configured Heritrix to run specific crawls that target a set of domains and output minimal information
  • The purpose is to gather as many URLs with scripts as possible for a large sample base
• Created a parser for Heritrix logs to filter out irrelevant websites
  • For example, we omit URLs that point to images, since they will not contain scripts

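The two filtering steps above (diffing Norton's log before and after a page load, and dropping image URLs from Heritrix output) can be sketched in Python, the language the team used for its log-comparison script. This is a minimal sketch under stated assumptions, not the project's actual code: it treats each Norton log as a list of entry strings (the slides do not describe Norton's log format), assumes the whitespace-separated layout of Heritrix's crawl.log with the URI in the fourth field and the MIME type in the seventh (check this against the Heritrix version in use), and uses a hypothetical list of image extensions. All function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical extension list; the slides only say that image URLs are omitted.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png", ".bmp", ".ico")


def new_log_entries(before, after):
    """Return entries in the post-load log that were not in the pre-load log.

    Uses a multiset difference, so a repeated entry still counts as new.
    """
    return list((Counter(after) - Counter(before)).elements())


def should_discard(before, after):
    """Discard a site's data if any new web attack entry appeared while it loaded."""
    return len(new_log_entries(before, after)) > 0


def is_image(uri, mime_type):
    """True if the MIME type or the URL path suggests an image."""
    path = urlparse(uri).path.lower()
    return mime_type.lower().startswith("image/") or path.endswith(IMAGE_EXTENSIONS)


def script_candidate_uris(crawl_log_lines):
    """Yield URIs from crawl.log lines, skipping images.

    Assumption: fields are whitespace-separated, with the URI in field 4
    and the MIME type in field 7 (0-based indices 3 and 6).
    """
    for line in crawl_log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed or truncated lines
        uri, mime_type = fields[3], fields[6]
        if not is_image(uri, mime_type):
            yield uri
```

A multiset difference is used instead of a plain set difference so that a second occurrence of an already-logged attack still registers as new, which errs on the side of discarding a site.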
Pending Tasks and Difficulties
Phase I
• Ensure that the whitelist is clean
  • This can be time-consuming given the massive size of the list, so we are starting with a small subset of it for now
  • Our scripts can also check for cleanliness as we load URLs
• Acquire a larger hard drive for the computer so that it can store the data from the crawls
  • We have been unable to run a large crawl on the machine because of low hard drive space
Phase II
• Figure out how to “listen in” on the JavaScript interpreter in Internet Explorer and output relevant information about the scripts currently running
  • This requires intimate knowledge of Internet Explorer and would likely take too much time to develop from the ground up

Direction for Next Quarter
• Obtain resources and/or software from Symantec for listening in on the JavaScript interpreter
• Install a larger hard drive (~750 GB)
• Design and create a database to store information about the scripts
• Research and design an intelligent learning algorithm to read in and analyze the data
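As a possible starting point for the planned script database, here is a minimal SQLite sketch using Python's built-in `sqlite3` module. The schema, table names, and columns (`pages`, `scripts`, `discarded`, `sha1`) and the helper functions are hypothetical; the slides only say that a database will store information about the scripts. Hashing each script's source is one way to spot the same script appearing on many pages.

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per crawled page, one row per script found on it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE NOT NULL,
    discarded INTEGER NOT NULL DEFAULT 0  -- 1 if the antivirus check flagged the page
);
CREATE TABLE IF NOT EXISTS scripts (
    id INTEGER PRIMARY KEY,
    page_id INTEGER NOT NULL REFERENCES pages(id),
    sha1 TEXT NOT NULL,        -- hash of the script source, for deduplication
    length INTEGER NOT NULL,   -- script length in characters
    source TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_script(conn, url, source, discarded=False):
    """Record one script found on `url`, creating the page row if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO pages (url, discarded) VALUES (?, ?)",
        (url, int(discarded)),
    )
    (page_id,) = conn.execute("SELECT id FROM pages WHERE url = ?", (url,)).fetchone()
    digest = hashlib.sha1(source.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT INTO scripts (page_id, sha1, length, source) VALUES (?, ?, ?, ?)",
        (page_id, digest, len(source), source),
    )
    conn.commit()
```

For example, calling `add_script(conn, "http://example.com/", "alert(1);")` twice with different sources for the same URL yields one `pages` row and two `scripts` rows; a learning algorithm could later read features such as `length` or repeated `sha1` values from these tables.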