Typo-Squatting: a Nuisance or a Threat to Your Traffic? Mishari Almishari
Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion
Introduction - Motivation • Traffic is important to web domains! • No point in launching without incoming traffic • Losing/gaining traffic means losing/gaining money • One way to price ads is the Pay-Per-Click model • Traffic diversion could be a serious threat to a domain
Introduction - Motivation • Typos may attract traffic • Users are prone to making typos • Users may forget about visiting the target domain • Threat to the target domain! • Intentionally registering such typo domains is called typo-squatting
Introduction - Goal • To study how much traffic typo-squatters can get from target domains • Are those domains attracting much traffic? • There are many typo-squatting domains registered (Banerjee et al., 08) • Search engines' typo corrections and browser auto-completions! • How much traffic are target domains losing? • Is it a negligible ratio or a serious threat? • Do users go back to target domains or get distracted?
Introduction - Contribution • Automatic and accurate identification of typo-squatting domains (Measurement Methodology) • Bound on how much traffic target domains are losing to typo-squatting domains (Measurement Results)
Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion
Background – Domain Parking • Domain parking is the practice of showing a temporary page for an unused domain before launching it
Background – Domain Parking • Domain Parking Service • Parks and hosts unused domains • Monetizes the traffic by showing ads • Many typo-squatting domains are parked domains (Wang et al., 06), (Keats, 07)
Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion
Methodology • Data Collection • Identifying Typo-Squatting Domains
Methodology - Data Collection • [Diagram: Our Machine, UCI Resolver, UCI Net, Internet, User Query] • Logged fields: DATE, TIME, HASHED-IP, DOMAIN, TYPE, CLASS
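The logged fields can be pictured as one record per DNS query. The sketch below is only an illustration: the slide does not give the trace format, so the whitespace-separated layout, the date/time format, and the names `DnsQuery` and `parse_line` are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DnsQuery:
    timestamp: datetime  # DATE and TIME fields combined
    hashed_ip: str       # anonymized client identifier (HASHED-IP)
    domain: str          # queried name (DOMAIN)
    qtype: str           # query type (TYPE), e.g. "A"
    qclass: str          # query class (CLASS), e.g. "IN"


def parse_line(line: str) -> DnsQuery:
    """Parse one record, assuming six whitespace-separated fields in slide order."""
    date, time, hashed_ip, domain, qtype, qclass = line.split()
    # Assumed timestamp layout; the real traces may use a different one.
    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    # Normalize the name: lower-case and drop a trailing root dot.
    return DnsQuery(timestamp, hashed_ip, domain.lower().rstrip("."), qtype, qclass)
```

Keeping only a hashed client identifier still lets queries from the same user be grouped without storing the raw address.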
Methodology – Identify Typo-squatting Domains • Identify Similar Domains • Single-error typo • Single errors account for 90-95% of spelling/typo errors (Pollock et al., 83) • Example: www.walmart.com and www.wamart.com • gTLD substitution • Example: www.amazon.com and www.amazon.org
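The similarity test on this slide can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes a "single error" means one character added, deleted, or substituted, or two neighboring characters swapped, and it compares only the name in front of the gTLD. The function names and the short gTLD list are illustrative.

```python
GTLDS = {"com", "org", "net", "info", "biz"}  # assumed, illustrative list


def split_domain(domain: str) -> tuple[str, str]:
    """Strip a leading 'www.' and split 'name.tld' into (name, tld)."""
    domain = domain.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    name, _, tld = domain.rpartition(".")
    return name, tld


def is_single_error(a: str, b: str) -> bool:
    """True if b differs from a by one add, delete, substitution, or neighbor swap."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1:
            return True  # one substituted character
        # two neighboring characters swapped
        return (len(diff) == 2 and diff[1] == diff[0] + 1
                and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]])
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # deleting one character from the longer name must give the shorter one
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def is_similar(target: str, candidate: str) -> bool:
    """Single-error typo on the name, or same name with a different gTLD."""
    t_name, t_tld = split_domain(target)
    c_name, c_tld = split_domain(candidate)
    if t_tld == c_tld:
        return is_single_error(t_name, c_name)
    return t_name == c_name and t_tld in GTLDS and c_tld in GTLDS
```

With these definitions, `is_similar("www.walmart.com", "www.wamart.com")` and `is_similar("www.amazon.com", "www.amazon.org")` both hold, matching the two examples on the slide.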
Methodology – Identify Typo-squatting Domains • But a similar domain name is not enough! • www.abc.com and www.abd.com • www.walmart.com and www.walkmart.com • www.usps.com and www.usps.org • Random sample • More than 54% are not typo-squatting • Need to identify hijacking intention
Methodology – Identify Typo-squatting Domains • Identify Hijacking Indicator • Parked domain (ad listings): ~88% • Forwarding to other domains: ~8% • Others: inappropriate content, … • Parked domain as the indicator
Methodology – Identify Typo-squatting Domains • Similar Domain AND Parked Domain → Typo-Squatting Domain
Methodology – Identify Typo-squatting Domains • How to identify a parked domain? • Parked Domain Classifier • 96% • Presence of parking signatures • Well-known parking signatures (domain names/URLs)
Methodology - Summary • Identify Similar Domains → Identify Parked Domains → List of Typo-squatting Domains
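The signature check can be sketched as a lookup of the hosts a page links to. This covers only the signature part, not the classifier, and it is an assumption about how such a check might look: the signature host names below are placeholders, and the regular expression is a deliberately simple stand-in for real HTML parsing.

```python
import re
from urllib.parse import urlparse

# Hypothetical signature list; real parking-service host names would go here.
PARKING_SIGNATURES = {"parking-service.example", "parkedads.example"}

# Pull the value of every href/src attribute out of the page source.
URL_PATTERN = re.compile(r"""(?:href|src)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_hosts(html: str) -> set[str]:
    """Collect the host names of all absolute URLs referenced by the page."""
    hosts = set()
    for url in URL_PATTERN.findall(html):
        host = urlparse(url).hostname  # None for relative URLs
        if host:
            hosts.add(host.lower())
    return hosts


def has_parking_signature(html: str, signatures=PARKING_SIGNATURES) -> bool:
    """True if any referenced host equals a signature or is a subdomain of one."""
    return any(host == sig or host.endswith("." + sig)
               for host in extract_hosts(html) for sig in signatures)
```

Matching on host names rather than on raw substrings avoids flagging pages that merely mention a parking service in their text.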
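The two-step summary amounts to a filter: a domain is kept only if it is similar to some target and is also parked. A minimal sketch, assuming the two tests are supplied as functions (the names `find_typo_squatting`, `is_similar`, and `is_parked` are illustrative, not from the talk):

```python
from typing import Callable, Iterable


def find_typo_squatting(
    observed: Iterable[str],
    targets: Iterable[str],
    is_similar: Callable[[str, str], bool],
    is_parked: Callable[[str], bool],
) -> dict[str, str]:
    """Map each observed domain flagged as typo-squatting to the target it resembles."""
    targets = list(targets)
    flagged = {}
    for domain in observed:
        if domain in targets:
            continue  # a target is never its own typo
        match = next((t for t in targets if is_similar(t, domain)), None)
        # Both conditions must hold: similar to a target AND parked.
        if match is not None and is_parked(domain):
            flagged[domain] = match
    return flagged
```

Running the cheaper similarity test first means the parked-domain check, which needs the page content, is only applied to candidates that already resemble a target.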
Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion
Parked Domain Classifier • Build Data Set → Extract Core Features → Combine Into Classifier
Data Set • Data set consists of 2,800 domains • 700 are parked domains • Collected from the MS Strider website • 2,100 are non-parked domains • Collected from the fourteen Yahoo Directory top categories
Feature Selection • Heuristically identify common features of parked domains • Compute the distribution of those features for verification • Common Link Ratio Max
Combining Features Into Classifier • Tried different classifier algorithms • Decision Tree • SVM • K-Nearest Neighbor • Random Forest (the best performance)
Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion
Data Sets • DNS Traces • Four months • ~30 million domains (~2 billion hits) (~30,000 users) • Target Domain Set • Alexa's top 500 popular domains • ~53,000,000 hits
Typo-Squatting Domains & Hits • 1,332 typo-squatting domains • 13,431 hits (~110 a day) • Is it large or small? • 500 target domains • 4-month period • ~30,000 users • Given a similar ratio, this may translate to a non-trivial number • 30,000 users => 110 per day • 300,000 users => 1,100 per day • 3,000,000 users => 11,000 per day (× 365 = ~4,000,000 a year)
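The scaling argument on this slide is a linear extrapolation, which can be checked with a few lines of arithmetic. The sketch assumes 30-day months and that hits grow in proportion to the number of users; both are simplifications, and the slide rounds the daily rate down to about 110.

```python
HITS = 13_431    # typo-squatting hits observed in the traces
USERS = 30_000   # approximate user population behind the traces
DAYS = 4 * 30    # four months, assuming 30-day months


def hits_per_day(users: int) -> float:
    """Scale the observed daily hit rate linearly with the number of users."""
    return HITS / DAYS * users / USERS


for users in (30_000, 300_000, 3_000_000):
    print(f"{users:>9,} users -> ~{hits_per_day(users):,.0f} hits/day")
print(f"per year at 3,000,000 users: ~{hits_per_day(3_000_000) * 365:,.0f}")
```

The unrounded daily rate is slightly above 110, so the yearly figure comes out a little over the slide's rounded 4,000,000.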
Typo-squatting Ratio • 0.025% of total number of queries • (89%, ≤ 1%) (70%, ≤ 0.1%) (57%, ≤ 0.01%)
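The 0.025% figure is consistent with dividing the 13,431 typo-squatting hits by the roughly 53,000,000 target-domain hits reported earlier; treating that as the definition is an assumption. The pairs in parentheses appear to read as "share of target domains, ratio threshold", which is also an assumed reading. The per-target ratios in the usage line are made-up values for illustration only.

```python
TYPO_HITS = 13_431        # hits to typo-squatting domains
TARGET_HITS = 53_000_000  # approximate hits to the Alexa top-500 targets


def ratio_percent(typo_hits: int, target_hits: int) -> float:
    """Typo-squatting hits as a percentage of target-domain hits."""
    return 100.0 * typo_hits / target_hits


def share_at_or_below(ratios_percent: list[float], threshold_percent: float) -> float:
    """Percentage of targets whose ratio is at or below the threshold."""
    within = sum(1 for r in ratios_percent if r <= threshold_percent)
    return 100.0 * within / len(ratios_percent)


overall = ratio_percent(TYPO_HITS, TARGET_HITS)
# Illustrative per-target ratios, not measured data.
example_share = share_at_or_below([0.5, 0.05, 0.005, 2.0], 1.0)
```

Reporting the share of targets under several thresholds shows whether the overall ratio hides a few heavily affected domains.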
User Correction Ratio – Alexa-500 • 54% of typo-squatting queries are corrected • ~51% of squatted target domains have most of their squat hits corrected
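The slide does not define "corrected". One plausible reading, assumed here and not taken from the talk, is that a typo query counts as corrected when the same hashed IP queries the intended target within a short time window afterwards. The function name, the event layout, and the 60-second default are all hypothetical.

```python
from collections import defaultdict


def correction_ratio(events, typo_to_target, window_seconds=60):
    """Fraction of typo queries followed by a target query from the same user.

    events: iterable of (timestamp_seconds, hashed_ip, domain) tuples.
    typo_to_target: dict mapping typo-squatting domain -> target domain.
    window_seconds: assumed cutoff for counting a follow-up as a correction.
    """
    by_user = defaultdict(list)
    for ts, user, domain in events:
        by_user[user].append((ts, domain))

    typo_queries = corrected = 0
    for queries in by_user.values():
        queries.sort()  # chronological order per user
        for i, (ts, domain) in enumerate(queries):
            target = typo_to_target.get(domain)
            if target is None:
                continue  # not a typo-squatting query
            typo_queries += 1
            for later_ts, later_domain in queries[i + 1:]:
                if later_ts - ts > window_seconds:
                    break  # too late to count as a correction
                if later_domain == target:
                    corrected += 1
                    break
    return corrected / typo_queries if typo_queries else 0.0
```

The result depends on the chosen window: a longer window counts more follow-up visits as corrections, so any reported ratio should state the window used.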
Potential Hit Loss • Potential Hit Loss Ratio = 0.012% • (92%, ≤ 1%) (78%, ≤ 0.1%) (64%, ≤ 0.01%)
Potential Money Loss • ~75% do not point to target domains • Referring Typo-Squatting Ratio = 0.008% • (96%, ≤ 1%) (91%, ≤ 0.1%) (81%, ≤ 0.01%)
Typo-Squatting Distribution • 19% of all typo-squatting hits
Typo Characterization • Most typos are single errors (95% vs. 5%) • Most gTLD substitutions are "com" to "org" (50%) • Additions: 37% are of non-adjacent keys • Substitutions: 77% are of non-adjacent keys • Substitutions: 13% are "a" and "o" • Spelling error
Typo-squatting Domains – TP60 • 15,499 hits • 0.045% of total number of queries • (76%, ≤ 1%) (60%, ≤ 0.5%)
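Whether a substitution involves adjacent keys can be decided from key positions on the keyboard. The sketch below assumes a US QWERTY layout with an approximate stagger between rows and a distance cutoff chosen so that each letter's immediate neighbors count as adjacent; the layout model, the cutoff, and the function names are assumptions, not the authors' definition.

```python
import math

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]  # approximate horizontal stagger of each row

# Position of every letter key as (row, shifted column).
KEY_POS = {
    ch: (row, col + ROW_OFFSETS[row])
    for row, keys in enumerate(ROWS)
    for col, ch in enumerate(keys)
}


def are_adjacent(a: str, b: str) -> bool:
    """True if two letter keys are neighbors on the assumed QWERTY layout."""
    if a == b or a not in KEY_POS or b not in KEY_POS:
        return False
    return math.dist(KEY_POS[a], KEY_POS[b]) < 1.3


def substituted_pair(target: str, typo: str):
    """Return (expected, typed) if typo differs by exactly one substitution, else None."""
    if len(target) != len(typo):
        return None
    diffs = [(t, c) for t, c in zip(target, typo) if t != c]
    return diffs[0] if len(diffs) == 1 else None
```

Under this model "a" and "o" are far apart, which fits the slide's point that such substitutions look like spelling errors rather than slips of the finger.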
Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion
Future Work • How much of the ad budget goes to squatters? • Enhance our identification technique • See if the results hold at other ISPs • Typo modeling for getting traffic back
Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion
Related Work • MS Strider Project [Wang et al., SRUTI 06] • McAfee Study [Keats, McAfee White Paper 07] • JAAL Project [Banerjee et al., INFOCOM 08]
Outline • Introduction • Background • Methodology • Parked Domain Classifier • Measurements • Future Work • Related Work • Conclusion
Conclusion • Accurately and automatically identify typo-squatting domains • How much traffic goes to typo-squatters • Bound on how much traffic the target domain is losing to typo-squatting: inconsequential