Typo-Squatting: a Nuisance or a Threat to Your Traffic? Mishari Almishari
Outline • Introduction • Background • Methodology • Parked Domain Classifier • Data Sets • Results • Future Work • Related Work • Conclusion
Introduction - Motivation • Traffic is important to domains! • No point in launching without incoming traffic • Losing/gaining traffic => losing/gaining money • One way to price ads is pay-per-click (PPC), which shows how important traffic is • Traffic diversion could be a serious threat to a domain
Introduction - Motivation • Typos may divert the traffic • Users are prone to making typos • Users may forget about visiting the target domain • Threat to the target domain! • Intentionally registering such typo domains is called typo-squatting
Introduction - Goal • To study how much traffic typo-squatters can get from target domains • Are those domains attracting much traffic? • Search engine typo-corrections! • Browser auto-completions! • How much traffic are target domains losing? • Is it a negligible ratio or a serious threat? • Do users go back to target domains or get distracted?
Introduction - Challenges • How to identify typo-squatting domains? • Does Typo mean Typo-squatting? • Short Domains • www.abc.com and www.abd.com • Longer Domains • www.walmart.com and www.walkmart.com • If not, how can we? • Hijacking indicator
Introduction - Contribution • Automatic and accurate identification of typo-squatting domains • Show how much traffic target domains are losing to typo-squatting domains
Background – Domain Parking • Domain parking: showing a temporary page for an unused domain before launching it
Background – Domain Parking • Domain Parking Service • Parks and hosts unused domains • Monetizes the traffic by showing ads • Many typo-squatting domains are parked domains (Wang et al, 06), (Keats, 07)
Methodology • Data Collection • Identifying Typo-Squatting Domains
Methodology - Data Collection • DNS traces at UCI resolvers • Internal requests to domain names • A DNS query precedes the HTTP request • Caching limitation • Our study represents a lower bound
Methodology – Identify Typo-squatting Domain • Identify Similar Domains • Single Error Typo • Single error accounts for 90-95% of spelling errors • www.walmart.com and www.walkmart.com • gTLD substitution • www.amazon.com and www.amazon.org
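The similar-domain step can be illustrated with a minimal sketch. It assumes that a "single error" means one character addition, deletion, substitution, or transposition, and that gTLD substitution swaps among a short list of TLDs; the alphabet, the `GTLDS` list, and the function names are illustrative assumptions, not taken from the slides.

```python
import string

# Assumed character set for domain labels and an assumed short gTLD list.
ALPHABET = string.ascii_lowercase + string.digits + "-"
GTLDS = ("com", "net", "org")


def single_error_typos(label):
    """Return every string one edit away from `label`."""
    typos = set()
    for i in range(len(label) + 1):
        for c in ALPHABET:
            typos.add(label[:i] + c + label[i:])          # addition
    for i in range(len(label)):
        typos.add(label[:i] + label[i + 1:])              # deletion
        for c in ALPHABET:
            typos.add(label[:i] + c + label[i + 1:])      # substitution
    for i in range(len(label) - 1):
        # transposition of two neighbouring characters
        typos.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])
    typos.discard(label)
    # A label may not be empty or start/end with a hyphen.
    return {t for t in typos if t and not t.startswith("-") and not t.endswith("-")}


def similar_domains(domain):
    """Candidate similar domains for a name such as 'walmart.com'."""
    label, _, tld = domain.rpartition(".")
    candidates = {f"{t}.{tld}" for t in single_error_typos(label)}
    candidates |= {f"{label}.{g}" for g in GTLDS if g != tld}  # gTLD substitution
    return candidates
```

For example, `similar_domains("walmart.com")` contains `walkmart.com`, and `similar_domains("amazon.com")` contains `amazon.org`. The sketch works on the name without the `www.` prefix.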
Methodology – Identify Typo-squatting Domains • But a similar domain is not enough! • www.walmart.com and www.walkmart.com • Random sample • More than 54% are not typo-squatting
Methodology – Identify Typo-squatting Domain • Identify Hijacking Indicator • Inappropriate Content • Domain For Sale • Forwarding to other domains • Ads – listing (Parked Domain) • More than 80%
Methodology – Identify Typo-squatting Domain • Similar Domain AND Parked Domain => Typo-Squatting Domain
Methodology – Identify Typo-squatting Domain • How to identify Parked Domain? • Parked Domain Classifier • Presence of Parking signatures • Well-known parking signatures (domain names/urls)
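A signature check of this kind might look like the sketch below. The signature hostnames are hypothetical placeholders (real parking-service signatures are not listed in the slides), and the input is assumed to be the list of URLs found on the fetched page, such as links, frames, and redirect targets.

```python
from urllib.parse import urlparse

# Hypothetical placeholder signatures; a real list would hold the
# hostnames of well-known parking services.
PARKING_SIGNATURES = ("parking-service.example", "parked-ads.example")


def has_parking_signature(urls):
    """Return True if any URL points at a known parking-service host."""
    for url in urls:
        host = urlparse(url).hostname or ""
        for signature in PARKING_SIGNATURES:
            # Match the signature host itself or any of its subdomains.
            if host == signature or host.endswith("." + signature):
                return True
    return False
```

A page whose links include `http://ads.parking-service.example/x` would be flagged, while one that only links to `http://www.walmart.com/` would not.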
Methodology - Summary • Identify Similar Domains • Identify Parked Domains • List of Typo-squatting Domains
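The two steps combine as a logical AND, sketched below. The sketch uses a simplified similarity test (one addition, deletion, or substitution on the whole name, with no gTLD substitution) and takes the parked-domain check as a caller-supplied function; both choices, and all names, are assumptions for illustration.

```python
def within_one_edit(a, b):
    """True if a and b differ by exactly one addition, deletion, or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                     # make `a` the shorter (or equal) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1                          # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]   # one substituted character
    return a[i:] == b[i + 1:]           # one extra character in `b`


def find_typo_squatting(queried, targets, is_parked):
    """Map each queried domain that is similar to a target AND parked to that target."""
    result = {}
    for domain in queried:
        if domain in targets:
            continue
        for target in targets:
            if within_one_edit(domain, target) and is_parked(domain):
                result[domain] = target
                break
    return result
```

With `targets = {"walmart.com"}` and a parked check that flags only `walkmart.com`, the result is `{"walkmart.com": "walmart.com"}`; a similar domain that is not parked is left out.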
Parked Domain Classifier • Build Data Set • Extract Core Features • Combine Into Classifier
Data Set • Data set consists of 2,800 domains • 700 are parked domains • Collected from the MS Strider website • 2,100 are non-parked domains • Collected from the fourteen Yahoo Directory top categories
Feature Selection • Heuristically identify common features in parked domains • Compute the distribution of those features for verification • Common Link Ratio Max
Combining Features Into Classifier • Tried different classifier algorithms • Decision Tree • SVM • K-Nearest Neighbor • Random Forest (the best performance)
Data Sets • DNS Traces • Four months • Anonymous • CNAME and A • ~30 million domains (~2 billion hits) (~30,000 users) • Target Domain Set • Alexa’s Top 500 popular domains
Typo-Squatting Domains & Hits • 1,332 typo-squatting domains • 13,431 hits • Is it large or small? • 500 target domains • 4-month period • ~30,000 users • At a similar ratio, this may translate to a large number • 30,000 users => 13,000 hits • 300,000 => 130,000 • 3,000,000 => 1,300,000
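The scaling on this slide is a linear extrapolation from the observed population. A minimal sketch, assuming hits grow in proportion to the number of users over the same four-month period and the same 500 targets (the function name is illustrative):

```python
OBSERVED_USERS = 30_000   # approximate user population in the traces
OBSERVED_HITS = 13_431    # typo-squatting hits observed in four months


def projected_hits(users):
    """Linearly scale the observed hits to a different user population."""
    return round(OBSERVED_HITS * users / OBSERVED_USERS)


for users in (30_000, 300_000, 3_000_000):
    print(f"{users:>9,} users => {projected_hits(users):>9,} hits")
```

The slide rounds these projections to 13,000, 130,000, and 1,300,000.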
Typo-squatting Ratio • 0.025% of total number of queries • 89% ≤ 1% (70% ≤ 0.1%) (57% ≤ 0.01%)
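One plausible reading of these figures is a per-target ratio (typo-squatting hits divided by queries for the target) together with the share of targets whose ratio is at or below each threshold. The sketch below follows that assumed reading; the counts in the usage example are made up for illustration and are not from the study.

```python
def typo_ratio(typo_hits, target_hits):
    """Typo-squatting hits as a fraction of the queries for the target."""
    return typo_hits / target_hits if target_hits else 0.0


def share_at_or_below(ratios, threshold):
    """Fraction of ratios that are less than or equal to `threshold`."""
    ratios = list(ratios)
    return sum(r <= threshold for r in ratios) / len(ratios)


# Made-up (typo hits, target hits) pairs for three hypothetical targets.
counts = {"a.example": (5, 1_000), "b.example": (1, 100_000), "c.example": (30, 1_000)}
ratios = [typo_ratio(t, q) for t, q in counts.values()]
for threshold in (0.01, 0.001, 0.0001):
    print(f"share at or below {threshold:.2%}: {share_at_or_below(ratios, threshold):.0%}")
```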
User Correction Ratio – Alexa-500 • On average, 54% of typo-squatting queries are corrected
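The slides do not define when a query counts as corrected. As a sketch under one assumed definition, a typo-squatting query is treated as corrected when the same user queries the intended target within a short time window afterwards; the 60-second window, the event layout, and the function name are all assumptions.

```python
def correction_ratio(events, typo_to_target, window=60.0):
    """Fraction of typo queries followed by a query for the target.

    events: (timestamp_seconds, user_id, domain) tuples sorted by time.
    typo_to_target: maps each typo-squatting domain to its target domain.
    """
    events = list(events)
    typo_queries = corrected = 0
    for i, (ts, user, domain) in enumerate(events):
        target = typo_to_target.get(domain)
        if target is None:
            continue                      # not a typo-squatting query
        typo_queries += 1
        for later_ts, later_user, later_domain in events[i + 1:]:
            if later_ts - ts > window:
                break                     # outside the correction window
            if later_user == user and later_domain == target:
                corrected += 1
                break
    return corrected / typo_queries if typo_queries else 0.0
```

Because DNS answers are cached, a follow-up query for the target may never reach the resolver, so a trace-based estimate like this would tend to undercount corrections.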
Potential Hit Loss • 0.012% • 92% ≤ 1% (78% ≤ 0.1%) (64% ≤ 0.01%)
Potential Money Loss • 0.008% • 96% ≤ % (91% ≤ 0.1%) (81% ≤ 0.01%)
Non-existing Similar Domains • 463 potential typo-squatting domains • 8,285 potential hits • 0.015% of total number of queries • 96% ≤ 1% (83% ≤ 0.1%) (66% ≤ 0.01%)
Typo-squatting Domains – TP60 • 629 typo-squatting domains • 15,499 hits • 0.045% of total number of queries • 76% ≤ 1% (60% ≤ 0.5%)
Top Ten Typo-squatting Domains • 19% of all typo-squatting hits
Top Ten Target Domains • Responsible for 55% of all typo-squatting queries of Alexa-500 • 50 million hits for “www.facebook.com”
Typo Characterization • Most typos are single errors (95% vs 5%) • Most gTLD substitutions are “com” to “org” (50%) • Add – 63% are of adjacent keys • Sub – 23% are of adjacent keys • Sub – 13% of substitutions are “a” and “o” • Spelling error
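A characterization like this needs each typo labelled by error type and keyboard adjacency. The sketch below assumes a QWERTY layout, counts only same-row neighbours as adjacent, and treats an added character as adjacent when it neighbours the character before or after it in the typo; these rules and the function names are simplifying assumptions.

```python
QWERTY_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def adjacent_keys(a, b):
    """True if a and b sit next to each other on the same QWERTY row."""
    for row in QWERTY_ROWS:
        if a in row and b in row:
            return abs(row.index(a) - row.index(b)) == 1
    return False


def classify_single_error(target, typo):
    """Return (kind, adjacent) for a single-error typo, else None."""
    if len(typo) == len(target) + 1:                       # addition
        for i in range(len(typo)):
            if typo[:i] + typo[i + 1:] == target:
                neighbours = typo[max(i - 1, 0):i] + typo[i + 1:i + 2]
                return "add", any(adjacent_keys(typo[i], n) for n in neighbours)
        return None
    if len(typo) == len(target) - 1:                       # deletion
        for i in range(len(target)):
            if target[:i] + target[i + 1:] == typo:
                return "delete", None                      # adjacency not applicable
        return None
    if len(typo) == len(target):                           # substitution
        diffs = [i for i in range(len(target)) if target[i] != typo[i]]
        if len(diffs) == 1:
            i = diffs[0]
            return "sub", adjacent_keys(target[i], typo[i])
    return None
```

Under these rules, `walmart` to `walkmart` is an addition of an adjacent key (`k` next to `l`), while `amazon` to `amozon` is a non-adjacent substitution of the “a”/“o” kind, which the slide attributes to spelling rather than typing.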
Future Work • How much are target domains paying squatters? • Enhance our identification technique • Typo modeling for getting traffic back • Why do people go to parked domains? • How can you increase the traffic?
Related Work • MS Strider Project [Wang et al. Sruti06] • McAfee Study [Keats McAfee White Paper 07] • JAAL project [Banerjee et al. Infocom 08]
Conclusion • Accurately and automatically identify typo-squatting domains • How much traffic goes to typo-squatters • Bound on how much traffic the target domain is losing to typo-squatting • Inconsequential