
Typo-Squatting: a Nuisance or a Threat to Your Traffic?




Presentation Transcript


  1. Typo-Squatting: a Nuisance or a Threat to Your Traffic? Mishari Almishari

  2. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion

  3. Introduction - Motivation • Traffic is important to web domains! • No point in launching without incoming traffic • Losing/gaining traffic means losing/gaining money • One way to price ads is the Pay Per Click model • Traffic diversion could be a serious threat to a domain

  4. Introduction - Motivation • Typos may attract traffic • Users are prone to making typos • Users may forget about visiting the target domain • Threat to the target domain! • Intentionally registering such typo domains is called typo-squatting

  5. Introduction - Goal • To study how much traffic typo-squatters can get from target domains • Are those domains attracting much traffic? • There are many typo-squatting domains registered (Banerjee et al., 08) • Search-engine typo corrections and browser auto-completions! • How much traffic are target domains losing? • Is it a negligible ratio or a serious threat? • Do users go back to target domains or get distracted?

  6. Introduction - Contribution • Automatic and accurate identification of typo-squatting domains (Measurement Methodology) • Bound on how much traffic target domains are losing to typo-squatting domains (Measurement Results)

  7. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion

  8. Background – Domain Parking • Domain parking is the practice of showing a temporary page for an unused domain before launching it

  9. Background - Domain Parking

  10. Background – Domain Parking

  11. Background – Domain Parking

  12. Background – Domain Parking • Domain Parking Service • Parks and hosts unused domains • Monetizes the traffic by showing ads • Many typo-squatting domains are parked domains (Wang et al., 06), (Keats, 07)

  13. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion

  14. Methodology • Data Collection • Identifying Typo-Squatting Domains

  15. Methodology - Data Collection • [Diagram labels: Our Machine, UCI Resolver, UCI Net, Internet, User, Query] • Logged record fields: DATE, TIME, HASHED-IP, DOMAIN, TYPE, CLASS

  16. Methodology – Identify Typo-squatting Domain • Identify Similar Domains • Single Error Typo • Single errors account for 90-95% of spelling/typo errors (Pollock et al., 83) • www.walmart.com and www.wamart.com • gTLD substitution • www.amazon.com and www.amazon.org
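The two similarity rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `GTLDS` list, and the choice to count an adjacent transposition as a single error are all hypothetical.

```python
# Hypothetical sketch of the "similar domain" test: single-error typo
# in the name, or same name under a different gTLD.
GTLDS = {"com", "org", "net", "info", "biz"}  # assumed list, not from the slides


def split_domain(domain):
    """Drop a leading 'www' label and split into (name, tld)."""
    labels = domain.lower().split(".")
    if labels[0] == "www":
        labels = labels[1:]
    return ".".join(labels[:-1]), labels[-1]


def is_single_error(a, b):
    """True if b differs from a by exactly one insertion, deletion,
    substitution, or adjacent transposition (the last is an assumption)."""
    if a == b:
        return False
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True  # one substituted character
        return (
            len(diffs) == 2
            and diffs[1] == diffs[0] + 1
            and a[diffs[0]] == b[diffs[1]]
            and a[diffs[1]] == b[diffs[0]]
        )  # two neighbouring characters swapped
    if abs(len(a) - len(b)) != 1:
        return False
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    # One inserted/deleted character: removing it from the longer string
    # must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def is_similar(target, candidate):
    """Single-error typo under the same TLD, or same name with a swapped gTLD."""
    t_name, t_tld = split_domain(target)
    c_name, c_tld = split_domain(candidate)
    if t_tld == c_tld:
        return is_single_error(t_name, c_name)
    return t_name == c_name and t_tld in GTLDS and c_tld in GTLDS
```

With these definitions, both example pairs from the slide (www.walmart.com / www.wamart.com and www.amazon.com / www.amazon.org) are reported as similar. Note that www.walkmart.com also counts as similar to www.walmart.com under this test, so similarity alone cannot establish typo-squatting.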

  17. Methodology – Identify Typo-squatting Domains • But a similar domain is not enough! • www.abc.com and www.abd.com • www.walmart.com and www.walkmart.com • www.usps.com and www.usps.org • Random Sample • More than 54% are not typo-squatting • Need to identify hijacking intention

  18. Methodology – Identify Typo-squatting Domain • Identify Hijacking Indicator • Parked Domain (Ads – listing) • ~ 88% • Forwarding to other domains • ~ 8% • Others: Inappropriate Content, … • Parked Domain as the indicator

  19. Methodology – Identify Typo-squatting Domain • Similar Domain AND Parked Domain ⇒ Typo-Squatting Domain

  20. Methodology – Identify Typo-squatting Domain • How to identify Parked Domain? • Parked Domain Classifier • 96% • Presence of Parking signatures • Well-known parking signatures (domain names/urls)

  21. Methodology - Summary • Identify Similar Domains → Identify Parked Domains → List of Typo-squatting Domains
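The summary above amounts to a two-stage filter: keep a candidate only if it is similar to a target domain and also parked. A minimal Python sketch follows. The function name, the injected predicates, and the toy data are all hypothetical; in the talk the parked test is a trained classifier, which is replaced here by a stand-in.

```python
def find_typo_squatters(candidates, target, is_similar, is_parked):
    """Keep candidates that are both similar to the target AND parked."""
    return [d for d in candidates if is_similar(target, d) and is_parked(d)]


# Toy stand-ins for the two stages (made-up labels, not measured data).
def toy_is_similar(target, candidate):
    return candidate in {"www.wamart.com", "www.walkmart.com"}


def toy_is_parked(domain):
    return domain == "www.wamart.com"


result = find_typo_squatters(
    ["www.wamart.com", "www.walkmart.com", "www.example.org"],
    "www.walmart.com",
    toy_is_similar,
    toy_is_parked,
)
print(result)
```

Passing the two tests in as functions keeps the pipeline independent of how similarity and parking are actually decided.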

  22. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion

  23. Parked Domain Classifier • Build Data Set → Extract Core Features → Combine Into Classifier

  24. Data Set • Data Set consists of 2,800 domains • 700 are parked domains • Collected from MS Strider Website • 2,100 are non-parked domains • Collected from the fourteen Yahoo Directory top categories

  25. Feature Selection • Heuristically identify common features in parked domains • Compute the distribution of those features for verification • Common Link Ratio Max

  26. Combining Features Into Classifier • Tried different classifier algorithms • Decision Tree • SVM • K-Nearest Neighbor • Random Forest (the best performance)

  27. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion

  28. DATA Sets • DNS Traces • Four Months • ~ 30 million domains (~ 2 billion hits) (~ 30,000 users) • Target Domain Set • Alexa’s Top 500 popular domains • ~53,000,000 hits

  29. Typo-Squatting Domains & Hits • 1,332 typo-squatting domains • 13,431 hits (~ 110 a day) • Is it large or small? • 500 target domains • 4-month period • ~ 30,000 users • A similar ratio may translate to a non-trivial number at larger scale • 30,000 users => 110 per day • 300,000 users => 1,100 per day • 3,000,000 users => 11,000 per day (× 365 = ~ 4,000,000 a year)
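The scaling argument on this slide is simple proportional arithmetic, sketched below in Python. The 122-day length for "four months" is an assumption chosen because it reproduces the slide's ~110 hits per day, and linear scaling with user count is the slide's own premise rather than a measured result.

```python
# Figures taken from the slide; DAYS is an assumption (four months ~ 122 days).
HITS = 13_431
DAYS = 122
USERS = 30_000

hits_per_day = HITS / DAYS  # roughly 110 hits a day for ~30,000 users


def projected_hits_per_day(users):
    """Scale the observed daily hits linearly with the user population."""
    return hits_per_day * users / USERS


for users in (30_000, 300_000, 3_000_000):
    daily = projected_hits_per_day(users)
    print(f"{users:>9,} users: ~{daily:,.0f} hits/day, ~{daily * 365:,.0f} hits/year")
```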

  30. Typo-squatting Ratio • 0.025% of total number of queries • (89%, ≤ 1%) (70%, ≤ 0.1%) (57%, ≤ 0.01%)
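As a sanity check, the 0.025% figure is consistent with dividing the 13,431 typo-squatting hits by the ~53,000,000 hits reported for the target domain set. That choice of denominator is an assumption inferred from the numbers on the earlier slides; the slides do not state it explicitly.

```python
# Both figures come from earlier slides; pairing them is an assumption.
TYPO_SQUATTING_HITS = 13_431
TARGET_DOMAIN_HITS = 53_000_000  # approximate, as given on the data-set slide

ratio_percent = 100 * TYPO_SQUATTING_HITS / TARGET_DOMAIN_HITS
print(f"typo-squatting ratio ~ {ratio_percent:.3f}%")
```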

  31. User Correction Ratio – Alexa-500 • 54% of typo-squatting queries are corrected • ~ 51% squatted target domains have most squat hits corrected

  32. Potential Hit Loss • Potential Hit Loss Ratio = 0.012% • (92%, ≤ 1%) (78%, ≤ 0.1%) (64%, ≤ 0.01%)

  33. Potential Money Loss • ~75% do not point to target domains • Referring Typo-Sqt Ratio = 0.008% • (96%, ≤ 1%) (91%, ≤ 0.1%) (81%, ≤ 0.01%)

  34. Typo-Squatting Distribution • 19% of all typo-squatting hits

  35. Typo Characterization • Most typos are single errors (95% vs. 5%) • Most gTLD substitutions are “com” to “org” (50%) • Additions: 37% involve non-adjacent keys • Substitutions: 77% involve non-adjacent keys • Substitutions: 13% are between “a” and “o” • Spelling error
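The adjacent versus non-adjacent key distinction can be illustrated with a small Python sketch. It assumes a simplified QWERTY grid that ignores row stagger and non-letter keys; the slides do not say which keyboard model was used, and the names `ROWS`, `POS`, and `are_adjacent` are hypothetical.

```python
# Simplified QWERTY layout: letters only, rows treated as an aligned grid.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def are_adjacent(a, b):
    """True if two distinct letters sit in neighbouring grid cells."""
    if a == b or a not in POS or b not in POS:
        return False
    (r1, c1), (r2, c2) = POS[a], POS[b]
    return abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1


# "a" and "o" are far apart on the keyboard, which fits the reading that
# such substitutions are spelling errors rather than slips of the finger.
print(are_adjacent("a", "o"))
print(are_adjacent("c", "d"))
```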

  36. Typo-squatting Domains – TP60 • 15,499 hits • 0.045% of total number of queries • (76%, ≤ 1%) (60%, ≤ 0.5%)

  37. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion

  38. Future Work • How much of the ads budget goes to squatters? • Enhance our identification technique • See if the results hold at other ISPs • Typo modeling for getting traffic back

  39. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion

  40. Related Work • MS Strider Project [Wang et al. Sruti06] • McAfee Study [Keats McAfee White Paper 07] • JAAL project [Banerjee et al. Infocom 08]

  41. Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion

  42. Conclusion • Accurately and automatically identify typo-squatting domains • How much traffic goes to typo-squatters • Bound on how much traffic the target domain is losing to typo-squatting • Inconsequential
