
Benchmarking Anomaly-Based Detection Systems



Presentation Transcript


  1. Benchmarking Anomaly-Based Detection Systems Written by: Roy A. Maxion, Kymie M.C. Tan Presented by: Yi Hu

  2. Agenda • Introduction • Benchmarking Approach • Structure in categorical data • Constructing the benchmark datasets • Experiment one • Experiment two • Conclusion & Suggestion

  3. Introduction • Applications of anomaly detection • Problems: • Differences in data regularities • Environment variation

  4. Benchmarking Approach • A methodology that provides quantitative results from running an anomaly detector on various datasets containing different structures. • Addresses environment variation, i.e., the structuring of the data.

  5. Structure in Categorical Data • A spectrum from perfect regularity to perfect randomness (0 = perfect regularity; 1 = perfect randomness) • Entropy is used to measure the randomness
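The regularity scale on this slide can be made concrete with a small sketch. Note this is an illustrative assumption, not the paper's exact formula (Maxion and Tan use a relative conditional entropy); here the first-order conditional entropy of the sequence is normalized by the maximum entropy for the alphabet:

```python
import math
from collections import Counter

def regularity_index(seq, alphabet_size):
    """Normalized first-order conditional entropy of a symbol sequence:
    0 = perfectly regular (next symbol determined by the current one),
    1 = perfectly random (next symbol uniform over the alphabet)."""
    pairs = Counter(zip(seq, seq[1:]))   # bigram counts
    firsts = Counter(seq[:-1])           # context (first-symbol) counts
    n = len(seq) - 1
    h = 0.0
    for (a, _b), c in pairs.items():
        # p(pair) * log2 p(second | first), summed over observed bigrams
        h -= (c / n) * math.log2(c / firsts[a])
    return h / math.log2(alphabet_size)  # normalize into [0, 1]
```

A strictly periodic sequence such as ABABAB… scores 0; as the next symbol becomes less predictable from the current one, the index approaches 1.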

  6. Benchmarking datasets • Training data (background) • Testing data (background + anomalies) • Anomaly data

  7. Benchmarking datasets (cont’d) • Defining the sequence: • Alphabet symbols (English letters) • Alphabet size (2, 4, 6, 8, 10: five suites) • Regularity (0 to 1, at 0.1 intervals) • Sequence length (all datasets: 500,000 characters)

  8. Defining the anomalies • Anomaly types: • Foreign-symbol anomalies (e.g., Q injected into the alphabet A, B, C, D, E) • Foreign n-gram anomalies (e.g., CC: composed of in-alphabet symbols A, B, C, D, but never appearing as a bi-gram in the training data) • Rare n-gram anomalies (occurring with frequency usually < 0.05)
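The three anomaly categories above can be illustrated with a hypothetical classifier (the function name and the 0.05 cutoff as a parameter are assumptions for illustration, not the authors' code):

```python
from collections import Counter

def classify_ngrams(train, test, n, alphabet, rare_cutoff=0.05):
    """Label each n-gram of `test` relative to `train`:
    foreign-symbol, foreign-ngram, rare, or common."""
    counts = Counter(train[i:i + n] for i in range(len(train) - n + 1))
    total = sum(counts.values())
    labels = []
    for i in range(len(test) - n + 1):
        g = test[i:i + n]
        if any(ch not in alphabet for ch in g):
            labels.append("foreign-symbol")   # symbol outside the alphabet
        elif g not in counts:
            labels.append("foreign-ngram")    # in-alphabet symbols, unseen n-gram
        elif counts[g] / total < rare_cutoff:
            labels.append("rare")             # seen, but below the cutoff
        else:
            labels.append("common")
    return labels
```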

  9. Generating the training and test data • 500,000 random numbers in a table • 11 transition matrices used to produce the desired regularities • Regularity indices between 0 and 1, in 0.1 increments
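The generation step above can be sketched as a Markov-chain generator. Blending a deterministic cycle with a uniform matrix is an assumption made here for illustration; the paper's eleven matrices were constructed to hit specific regularity indices, and this knob only approximates that:

```python
import random

def make_transition_matrix(k, r):
    """Blend a deterministic cycle (r = 0, fully regular) with a
    uniform matrix (r = 1, fully random) over k symbols."""
    m = []
    for i in range(k):
        row = [r / k] * k
        row[(i + 1) % k] += 1.0 - r   # deterministic next-symbol mass
        m.append(row)
    return m

def generate(matrix, alphabet, length, seed=0):
    """Emit `length` symbols by walking the transition matrix."""
    rng = random.Random(seed)
    state = 0
    out = []
    for _ in range(length):
        out.append(alphabet[state])
        state = rng.choices(range(len(matrix)), weights=matrix[state])[0]
    return "".join(out)
```

With r = 0 the output is a strict cycle (e.g., ABCDABCD…); raising r toward 1 makes the output increasingly random.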

  10. Generating the anomalies • Generated independently of the test data. • Each anomaly type is generated in a different way.

  11. Injecting the anomalies into test data • The system determines the maximum number of anomalies (not more than 0.24% of the uninjected data). • Select the injection intervals.
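The injection step can be sketched as follows. The 0.24% cap comes from the slide; the evenly spaced intervals and the function itself are hypothetical illustrations, not the authors' implementation:

```python
def inject_anomalies(background, anomaly, max_fraction=0.0024):
    """Insert copies of `anomaly` into `background` at evenly spaced
    positions, keeping injected symbols under `max_fraction` of the
    uninjected data. Returns the injected sequence plus the anomaly
    offsets, which serve as ground truth for scoring."""
    budget = int(len(background) * max_fraction) // len(anomaly)
    if budget == 0:
        return background, []
    step = len(background) // (budget + 1)
    out, truth = [], []
    prev = 0
    for i in range(1, budget + 1):
        out.append(background[prev:i * step])
        truth.append(sum(map(len, out)))   # offset where this anomaly lands
        out.append(anomaly)
        prev = i * step
    out.append(background[prev:])
    return "".join(out), truth
```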

  12. Experiment one: • Datasets: • Test datasets injected with rare-4-gram anomalies (4-grams with less than 5% occurrence in the training data); • All variables were held constant except dataset regularity; • 275 benchmark datasets in total, 165 of which were anomaly-injected.

  13. Experiment one: • Steps: • Training the detector: 11 training datasets per alphabet size; 55 training sessions were conducted in total. • Testing the detector: for each of the 5 alphabet sizes, the detector was run on 33 test datasets, 11 for each anomaly type. • Scoring the detection outcomes: event outcomes; ground truth; threshold; scope and presentation of results.

  14. Experiment one: • ROC analysis: • Relative (receiver) operating characteristic curve; • Compares two aspects of detection systems: hit rate on the Y axis against false-alarm rate on the X axis.
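One point on such a ROC curve, at a given detection threshold, can be computed as below. This is a generic scoring sketch against ground-truth labels, not the authors' scoring code:

```python
def roc_point(scores, truth, threshold):
    """Given per-event anomaly scores and ground-truth labels
    (1 = anomalous, 0 = normal), return (hit_rate, false_alarm_rate)
    for events flagged at or above `threshold`."""
    hits = sum(1 for s, t in zip(scores, truth) if t and s >= threshold)
    fas = sum(1 for s, t in zip(scores, truth) if not t and s >= threshold)
    pos = sum(truth)
    neg = len(truth) - pos
    return hits / pos, fas / neg
```

Sweeping the threshold from high to low traces the full curve from (0, 0) toward (1, 1).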

  15. Experiment one: • Results: None of the curves overlap until they reach the 100% hit rate, demonstrating that regularity does influence detector performance. If regularity had no effect, all the ROC curves would superimpose on one another.

  16. Experiment one: • Results: The false-alarm rate rises as the regularity index grows (i.e., as the data become more and more random), which also shows that regularity affects detection performance.

  17. Experiment two • Natural dataset: Y axis: regularity index; X axis: users. Data were taken from undergraduate student computer use. The diagram demonstrates clearly that regularity is a characteristic of natural data.

  18. Conclusion • In the experiments conducted here, all variables were held constant except regularity, and a strong relationship was established between detector accuracy and regularity. • An anomaly detector cannot be evaluated on the basis of its performance on a dataset of a single regularity. • Different regularities occur not only between different users and environments, but also within user sessions.

  19. Suggestion • Overcoming this obstacle may require a mechanism that swaps anomaly detectors, or changes the parameters of the current anomaly detector, whenever the regularity of the data changes.
