
N-Gram-based Dynamic Web Page Defacement Validation


Presentation Transcript


  1. N-Gram-based Dynamic Web Page Defacement Validation Woonyon Kim Aug. 23, 2004 NSRI, Korea

  2. Contents • Introduction • Related Works • N-Gram Frequency Index • N-Gram-based Index Distance • Experiments • Conclusions

  3. Introduction • Defacement of Web Sites • CSI/FBI 2001 • 38% of web sites were hacked. • 21% of the hacked sites were not aware of their own defacement. • Zone-h • The number of defaced web pages is increasing rapidly year by year (.kr domain: about a 200% increase). • Current solutions • Hash-based detection systems for minimizing damage • Intrusion-tolerant systems for continuous service • Problems of current solutions • Current solutions use a hash code as the validation metric, and a hash code cannot handle dynamically changing content.

  4. Introduction • N-Gram-based Index Distance (NGID) • A validation metric for dynamically changing web pages • The sum of the absolute differences of the frequency probabilities of the N-Grams found in both indexes • NGID represents the similarity of two web pages. • NGID can be used to validate web pages with either dynamic or static components.

  5. Related Works • Hash-based validation system • Detects web page defacements by comparing two hash codes • A hash code is a useful metric for large, static web pages. • A hash code does not work properly on dynamically changing web pages. • Intrusion-tolerant system • A hash code is used to validate web pages. • It has the same limitation on dynamic web pages.
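
A minimal Python sketch of hash-based validation (the page contents below are made up for illustration). It shows the limitation noted above: the digest changes whenever any byte of the page changes, so a legitimate content update is indistinguishable from a defacement.

```python
import hashlib

def page_hash(content: bytes) -> str:
    """SHA-256 digest used as the validation metric."""
    return hashlib.sha256(content).hexdigest()

# Baseline digest taken when the page was known to be clean (illustrative content).
baseline = page_hash(b"<html>headline of 2004-08-22</html>")

# The same page after a routine content update.
current = page_hash(b"<html>headline of 2004-08-23</html>")

# Any difference at all is flagged, so dynamic pages raise false alarms.
print("valid" if current == baseline else "possible defacement")
```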

  6. N-Gram Frequency Index (1) • N-Gram • An N-character slice of a string • For example, for "TEXT" • 2-Grams: TE, EX, XT • N-Gram Frequency Index • An index file sorted from the most frequent N-Grams to the least frequent ones • N-Grams below a particular rank are cut off, so minor changes are ignored; this feature of the N-Gram Frequency Index is what supports dynamic content.
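
As a quick illustration of the slicing step, a short Python sketch reproducing the 2-Gram example above:

```python
def ngrams(text: str, n: int = 2) -> list[str]:
    """All overlapping N-character slices of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(ngrams("TEXT", 2))  # ['TE', 'EX', 'XT']
```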

  7. N-Gram Frequency Index (2) • How to generate (sketched after this list) • Count the frequency of every N-Gram in a web page. • Sort the N-Grams from the most frequent to the least frequent. • Cut off the N-Grams below a particular rank. • Sum up the frequencies of the remaining N-Grams. • Compute the frequency probability of each N-Gram. • Save each N-Gram, its frequency, and its probability into an index file.
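
The sketch referenced above: a hedged Python version of the generation steps. The function name, the 2-Gram default, and the cutoff rank of 300 are assumptions for illustration; the slides do not fix these values.

```python
from collections import Counter

def ngram_frequency_index(text: str, n: int = 2, cutoff_rank: int = 300) -> dict[str, float]:
    """Build an N-Gram Frequency Index: the top-ranked N-Grams of a page
    mapped to their frequency probabilities."""
    # 1. Count all N-Gram frequencies in the web page.
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    # 2-3. Sort from most to least frequent and cut off below a particular rank.
    top = counts.most_common(cutoff_rank)
    # 4. Sum up the frequencies of the remaining N-Grams.
    total = sum(freq for _, freq in top)
    # 5-6. Compute the frequency probability of each remaining N-Gram
    #      (in practice this dictionary would be saved to an index file).
    return {gram: freq / total for gram, freq in top}
```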

  8. N-Gram-based Index Distance (NGID) • The sum of the absolute differences of the frequency probabilities of the N-Grams that appear in both web pages' indexes • A metric for detecting whether a web page has been defaced
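
Written out as a formula (the symbols G1, G2 for the sets of indexed N-Grams and p1, p2 for their frequency probabilities are notation introduced here, not taken from the slides):

```latex
\mathrm{NGID} \;=\; \sum_{g \,\in\, G_1 \cap G_2} \bigl|\, p_1(g) - p_2(g) \,\bigr|
```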

  9. N-Gram-based Index Distance • Evaluation is done by comparing NGID to a validation threshold • Evaluation • Valid: NGID <= Validation Threshold • Invalid: NGID > Validation Threshold
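
A minimal Python sketch of this decision rule, reusing the index format from the earlier sketch; the function names are assumptions, and the default threshold of 0.1 is the value chosen in the experiments on the next slide.

```python
def ngid(index_a: dict[str, float], index_b: dict[str, float]) -> float:
    """Sum of the absolute differences of the frequency probabilities of the
    N-Grams present in both indexes."""
    shared = index_a.keys() & index_b.keys()
    return sum(abs(index_a[g] - index_b[g]) for g in shared)

def is_valid(index_ref: dict[str, float], index_new: dict[str, float],
             threshold: float = 0.1) -> bool:
    """Valid if NGID <= validation threshold, invalid otherwise."""
    return ngid(index_ref, index_new) <= threshold
```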

  10. Experiments • Assumptions • Select 100 web pages. • Choose 0.1 as the validation threshold for NGID. • Procedure for false positives (sketched after this list) • Connect to each selected web page from a remote site. • Download the page and save it to a file. • Validate it using NGID. • Validate it using the hash code. • The four steps above are repeated every 30 minutes over a day.
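
A hedged sketch of that monitoring loop, assuming the helper functions from the earlier sketches (ngram_frequency_index, is_valid); the URL handling and output format are illustrative, not taken from the slides.

```python
import hashlib
import time
import urllib.request

def monitor(url: str, index_ref: dict[str, float], hash_ref: str,
            threshold: float = 0.1, interval_s: int = 30 * 60) -> None:
    """Download the page every 30 minutes and validate it with both NGID and
    a hash code, as in the false-positive experiment."""
    for _ in range(48):  # 48 runs = one day at 30-minute intervals
        content = urllib.request.urlopen(url).read()               # download the page
        text = content.decode("utf-8", errors="ignore")
        index_new = ngram_frequency_index(text)                    # from the earlier sketch
        ngid_ok = is_valid(index_ref, index_new, threshold)        # NGID validation
        hash_ok = hashlib.sha256(content).hexdigest() == hash_ref  # hash validation
        print(f"NGID: {'valid' if ngid_ok else 'invalid'}, "
              f"hash: {'valid' if hash_ok else 'invalid'}")
        time.sleep(interval_s)
```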

  11. Experiments • False Positive

  12. Experiments • False Positive

  13. Experiments • NGID value as time flows (figure; the annotated points mark the times of content updates)

  14. Experiments • Procedure for false negatives • Collect 50 web pages (normal pages and their hacked versions) from Zone-h. • Validate each using NGID. • Validate each using the hash code. • Result for the hash code • All 50 web pages are detected as defaced. • The number of false negatives is 0.

  15. Experiments • False Negative

  16. Conclusions • N-Gram-based Index Distance • A metric for evaluating dynamic web page defacement • NGID can validate dynamically changing web pages. • Future Work • A learning model is needed to determine the validation threshold for each web page. • A feedback mechanism for the normal index is needed.
