
Detecting Visually Similar Web Pages: Application to Phishing Detection

By Teh-Chung Chen, Scott Dick, and James Miller, University of Alberta. Presented by: Rutvij Shah, 2534739.




Presentation Transcript


  1. Detecting Visually Similar Web Pages: Application to Phishing Detection • Teh-Chung Chen, Scott Dick, James Miller, University of Alberta • Presented by: Rutvij Shah, 2534739

  2. Main Concept • The principal themes are formulating the question of how to measure the difference between Web pages and implementing an approach that answers this question.

  3. Why Do We Need Phishing Detection? • Techniques for similarity detection are used in: • Web search engines • Automated categorization systems • Phishing/spam filtering mechanisms • The goal is to prevent users from becoming victims of malicious activities by filtering out suspicious Web pages with embedded similarity-identification technology aimed at detecting malicious pages.

  4. SIMILARITY SIGNATURE • Feature-Based Similarity Measures • What Can’t We Count On for Visual Similarity Identification? • Two pages considered “identical” by users would exhibit vastly different “fingerprints” when feature-based techniques are employed.

  5. THEORETICAL FOUNDATION • Gestalt Theory • The theoretical basis for the approach. • Gestalt visual psychology is based on a number of simple laws: figure/ground, proximity, closure, similarity, and continuation.

  6. Inattentional Blindness • IB can be summed up as the phenomenon of “looking without seeing.” • When IB happens, even though an individual’s eyes are wide open and various objects are imaged on their retinas, individuals seem to perceive nothing.

  7. Supersignals • Supersignals can be thought of as trying to provide an explanation of an individual’s behavior when they encounter a complex, but familiar, situation.

  8. OBJECTIFICATION OF THE SIMILARITY METRIC • Kolmogorov complexity can be viewed as the limiting case for compression technology. • The Normalized Information Distance (NID) is claimed to “discover all similarities between two arbitrary entities; and represents object similarity according to the dominating shared features between two objects.”
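
For reference, the NID is usually written in terms of the Kolmogorov complexity K as shown below. This is the standard definition from the literature rather than something taken from the slide; K(x | y) denotes the complexity of x given y.

```latex
\mathrm{NID}(x, y) = \frac{\max\{K(x \mid y),\; K(y \mid x)\}}{\max\{K(x),\; K(y)\}}
```

Because K is not computable, the NID cannot be evaluated directly; a real compressor has to stand in for it.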

  9. Normalized Compression Distance • NCD is described as a parameter-free distance metric. • Compression Algorithms and Supersignals • Gzip: its reliability, speed, and simplicity make it the most popular compressor. • Bzip2: a fast compressor that uses the block-sorting algorithm.
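
A minimal sketch of how NCD can be computed with the two compressors named on this slide, assuming the usual formula NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The function name `ncd` and its `compress` parameter are illustrative choices, not taken from the presentation.

```python
import bz2
import zlib


def ncd(x: bytes, y: bytes, compress=zlib.compress) -> float:
    """Normalized Compression Distance between two byte strings.

    `compress` is any function mapping bytes to compressed bytes,
    e.g. zlib.compress (the DEFLATE algorithm used by gzip) or bz2.compress.
    """
    cx = len(compress(x))        # C(x)
    cy = len(compress(y))        # C(y)
    cxy = len(compress(x + y))   # C(xy): compressed size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Calling `ncd(a, b)` uses DEFLATE, and `ncd(a, b, compress=bz2.compress)` switches to Bzip2. Values near 0 suggest the inputs share most of their content; values near 1 suggest they share little.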

  10. APPLICATION TO ANTI-PHISHING TECHNOLOGIES • What is Phishing? • Phishing is a type of online identity theft in which sensitive information is obtained by misleading people into accessing a malicious Web page. • Motivation • Existing Anti-Phishing Solutions

  11. Motivation

  12. Existing Anti-Phishing Solutions • They are closely related to anti-spam solutions. • Anti-phishing toolbars are the most popular. • The toolbar determines the currently viewed URL and sends it to a blacklist or whitelist database for filtering. • The result, either an assurance or an alert warning, is delivered back to the user.
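
The toolbar lookup described on this slide can be sketched roughly as follows. The host names, the in-memory sets, and the function name `check_url` are hypothetical; a real toolbar would query a remote database rather than local sets.

```python
from urllib.parse import urlsplit

# Hypothetical example lists standing in for the blacklist/whitelist database.
BLACKLIST = {"phish.example.net"}
WHITELIST = {"www.example.com"}


def check_url(url: str) -> str:
    """Classify the currently viewed URL by its host name."""
    host = (urlsplit(url).hostname or "").lower()
    if host in BLACKLIST:
        return "alert"        # known phishing host: warn the user
    if host in WHITELIST:
        return "assurance"    # known legitimate host
    return "unknown"          # not listed in either database
```

The "unknown" branch is where list-based filtering has nothing to say about a page.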

  13. Another trick is known as “DNS/URL redirection or domain forwarding”. • It fools the blacklist/whitelist (B/W) databases by rapidly changing the DNS/URL IP address mapping in a dynamic DNS domain server. • Mutual authentication: • The client can make sure they are browsing the legitimate Web site by setting up a secure connection with the server.

  14. A Key Characteristic of Phishing Web Sites • The phisher’s goal is to make the phishing Web site resemble the legitimate Web site. • PhishTank is used to provide “accurate and actionable” information to the anti-phishing community.

  15. Legitimate Amazon

  16. Phishing Amazon

  17. EMPIRICAL EVALUATION • The Twelve-Pairs Experiment • The objective of this experiment is to see whether twelve legitimate Web pages and twelve phishing pages, each targeting one of the legitimate pages, can be grouped together in pairs. • Design and Methodology • Each page is compared with all the other sample pages. • Lower NCD values indicate greater similarity.
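
A minimal sketch of the all-against-all comparison, assuming each page is available as a byte string (for example a saved screenshot file) and that zlib serves as the compressor. The helper names `ncd` and `closest_partner` are hypothetical.

```python
import zlib
from itertools import combinations


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, with zlib as the compressor."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def closest_partner(pages: dict) -> dict:
    """Map each page name to the name of the other page with the lowest NCD."""
    names = list(pages)
    dist = {}
    for a, b in combinations(names, 2):
        d = ncd(pages[a], pages[b])
        dist[a, b] = dist[b, a] = d   # stored symmetrically for lookup
    return {
        a: min((n for n in names if n != a), key=lambda n: dist[a, n])
        for a in names
    }
```

If the pairing works as the experiment intends, a legitimate page's closest partner should be the phishing page that targets it, and vice versa.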

  18. Interpretation of Results • The “-L” in this table refers to the legitimate Web site of that brand, while “-P” denotes a phishing Web page targeting that brand. • Here, RBC-L is most similar to RBC-P in this group of Web pages.

  19. Design and Methodology • Quartet tree visualization for the Twelve-Pairs Experiment.

  20. The Clustering Experiment • This experiment examines the performance of the NCD similarity technique when the groups of highly similar Web sites are not balanced in size.

  21. Design and Methodology: This is similar to the Twelve-Pairs Experiment’s Design and Methodology. • Interpretation of Results:

  22. The Large-Scale Experiment • Objective: To put the similarity-based anti-phishing technique to a realistic test. • Expected result: A statistically significant difference in the means of the two populations, specifically with the mean of the latter group being lower.
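
One common way to compare the means of two samples is Welch's t statistic, sketched below. The slide does not say which statistical test was used, so this choice is an assumption, and the function name `welch_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(first_group, latter_group) -> float:
    """Welch's t statistic for two samples with possibly unequal variances.

    A clearly positive value means the latter group's mean is lower than the
    first group's mean. Each sample needs at least two values.
    """
    var_first = variance(first_group) / len(first_group)
    var_latter = variance(latter_group) / len(latter_group)
    return (mean(first_group) - mean(latter_group)) / sqrt(var_first + var_latter)
```

Turning the statistic into a p-value additionally needs the Welch-Satterthwaite degrees of freedom and a t distribution, which this sketch leaves out.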

  23. Design • Goal: To examine how the NCD similarity technique would perform in a realistic, browser-level anti-phishing scenario. • When we visit a Web site, we automatically execute an image capture, followed by a comparison (using the NCD similarity technique) against all Web sites in the whitelist. If there is a strong similarity to one of the whitelisted sites (i.e., the NCD is unusually low), we signal an alert.
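
The browser-level check described on this slide might look roughly like the sketch below. It assumes the image capture is already available as bytes; the threshold value 0.5, the zlib compressor, and the names `ncd` and `check_page` are illustrative assumptions, since the slide only says the NCD must be "unusually low".

```python
import zlib
from typing import Optional


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, with zlib as the compressor."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def check_page(capture: bytes, whitelist: dict, threshold: float = 0.5) -> Optional[str]:
    """Return the whitelisted site the capture most resembles, or None.

    `whitelist` maps a site name to the bytes of its reference capture.
    `threshold` is a hypothetical cut-off below which an alert is raised.
    """
    best_name, best_dist = None, float("inf")
    for name, reference in whitelist.items():
        d = ncd(capture, reference)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist < threshold else None
```

A non-`None` return value corresponds to signalling an alert for the named whitelisted site.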

  24. Methodology • Interpretation of Results

  25. Effectiveness as an Anti-Phishing Classifier

  26. Robustness against Countermeasures • The effects of local noise on NCD values

  27. Nonstructural Distortions • (a) Phish before 40% of the pixels have been changed. • (b) Phish after 40% of the pixels have been changed.
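
A distortion of this kind can be simulated as in the sketch below, which assumes an image is represented as a flat sequence of 8-bit grayscale pixel values. The function name `add_noise`, the seed parameter, and the choice of uniformly random replacement values are assumptions, not details from the presentation.

```python
import random


def add_noise(pixels: bytes, fraction: float, seed: int = 0) -> bytes:
    """Overwrite a given fraction of pixel positions with random values.

    `pixels` is a flat sequence of 8-bit grayscale values (an assumption);
    `fraction` = 0.4 picks 40% of the positions, as in the slide's example.
    """
    rng = random.Random(seed)            # seeded so the result is repeatable
    noisy = bytearray(pixels)
    count = int(fraction * len(noisy))
    for i in rng.sample(range(len(noisy)), count):
        noisy[i] = rng.randrange(256)    # may occasionally equal the old value
    return bytes(noisy)
```

Comparing the NCD between an original capture and `add_noise(capture, 0.4)` with the NCD between the original and itself gives a feel for how strongly this kind of noise affects the metric.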

  28. Structural Distortions

  29. Conclusion • The concepts of Gestalt theory and supersignals provide us with a theoretical rationale for the conjecture that Web pages must be treated as indivisible entities (i.e., a whole) to be congruent to human perceptions. • We use the domain of anti-phishing technology to derive test scenarios for our experiments, as visual similarity between a phishing page and its target is an essential part of the phishing scam.

  30. Thank you…
