All Your Contacts Are Belong to Us: Automated Identity Theft Attacks on Social Networks

All Your Contacts Are Belong to Us: Automated Identity Theft Attacks on Social Networks • WWW 2009 Madrid • Track: Security and Privacy / Session: Web Security • Leyla Bilge, Thorsten Strufe, Davide Balzarotti, Engin Kirda • Presentation: Nick Louloudakis

All Your Contacts Are Belong to Us: Automated Identity Theft Attacks on Social Networks • Social websites are growing in popularity: • Facebook reports 3% weekly growth • Millions of registered users • But is it possible to launch an automated crawling/identity theft attack? • The answer is YES • The work focuses on 2 types of attacks: • Automated identity theft from existing profiles • Cross-site profile cloning

Social Networks Phenomenon • Social network: a social structure consisting of nodes (individuals/organizations) • A new phenomenon on the Internet • Nodes may be connected via friendship, common values, visions, ideas, or business relationships • Social networks are growing in popularity • In 2009, Facebook had more than 150 million active users, reporting 3% growth every week, with over one billion photos (today, it has more than 1.2 billion registered users) • LinkedIn boasts 30 million registered users • XING (German/Austrian professional network) has 6 million users

Social Networks: Food for miscreants • Miscreants’ interest grows in step with the growth of a new technology • Back in the 90s, spam was not a problem • Today, about 90% of email in North America/Europe/Australasia is spam [Spamhaus Project] • The number of malicious emails has increased • The increasing popularity of social networks has attracted more and more miscreants • MySpace and Facebook have already suffered from old ideas such as worms (LoveLetter etc.) • In email, this kind of attack may raise suspicion and be filtered (Bayesian filters etc.) • However, this might not happen on social networks, which usually lack such protection mechanisms

Social Networks: Reasons for being attractive to attackers • Although no real large-scale social network attack has occurred to date, social networks are an attractive target for attackers: • They contain invaluable sensitive data and information • Registered users provide their real e-mail addresses • They also provide a lot of sensitive information: • Education • Friends • Professional background • Activities involved in • Current or previous relationship status • Associating e-mail addresses with real people can help personalize marketing activities efficiently • Associating real e-mail addresses with user activities might allow messages to bypass spam filters

Attack Prerequisites and approaches • A confirmed personal “relationship” with the person concerned is needed • Hamiel and Moyer experimented with impersonating the security expert Marcus Ranum, collecting information from the web (Wikipedia, personal profile) • The fake profile received many friend requests, even from one of the target’s family members • Attack approaches in the paper: • Profile impersonation: creating a profile identical to the victim’s in a social network where he or she is already registered, and sending connection requests to his/her contacts • That way, “stealing” contacts is possible • If the connections are confirmed, the attacker has access to those contacts’ information • Cross-site profile cloning: creating the victim’s profile in a social network where he or she is not yet registered, and rebuilding the victim’s social network there • Especially effective because the profile exists only once on the attacked social network • Both approaches can be applied on a large scale, via automated procedures, using a tool called iCloner.

iCloner Overview • Social network identity cloning system • Consists of 4 main components: • The Crawler • The Identity Matcher • The Profile Creator • The Message Sender • Also contains a CAPTCHA analysis component

iCloner: Crawler Component • Crawls the target social network to collect information from public profiles: • Social networks keep most personal information hidden from the public • But some make part of the information visible to the public • Facebook friend lists are public information, etc. • Keeps a record of the profiles that could not be retrieved • Works on Facebook, StudiVZ, MeinVZ and XING

iCloner Identity Matcher & Profile Creator Components • The Identity Matcher analyzes the information in the database and tries to identify profiles of the same person in different social networks • The Profile Creator component uses this information to create accounts on social networks where the victim is not registered

iCloner Message Sender • The Message Sender is responsible for: • Logging into the created accounts • Automatically sending friend requests to the victim’s contacts • Sometimes accessing a user’s profile to “confuse” the networking site that • On some networking sites, CAPTCHA solving might be required in order to perform those actions

What is a CAPTCHA? • CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart • It is a type of challenge-response system used to determine whether an application user is a human being • A CAPTCHA algorithm generates tests that are easy for humans to solve and, at the same time, very hard for a computer program • A good CAPTCHA should be resistant to Optical Character Recognition techniques

Breaking CAPTCHAs • A series of tools was used: • ImageMagick: image filtering • Tesseract: OCR text recognition • A number of Python & Perl scripts to partition CAPTCHAs for automated attacks on various social networks • Solving techniques varied between social networks: • XING used no CAPTCHA, MeinVZ/StudiVZ used a CAPTCHA, and Facebook used reCAPTCHA

MeinVZ & StudiVZ CAPTCHAs • Both SNs require the user to solve a CAPTCHA for new accounts and friend requests • Analysis showed that: • Each CAPTCHA contains exactly 5 letters • Each letter is written in a different font, with differing foreground and background colors • Letters are often tilted, scaled or blurred • A simple grid-based noise is added to the image

Breaking MeinVZ/StudiVZ CAPTCHAs (1/2) • A Perl script removes the grid noise, replacing it with white pixels • A second script identifies connected areas in the image and partitions them to isolate the letters • If the number of connected regions is not five (e.g. because of overlapping letters), the CAPTCHA is discarded and a new one is requested (< 5% of the cases) • All letters are then scaled to the same size and converted to black and white • Each letter is then matched against a set of known fonts • Each font character is tilted from -10 to +10 degrees and compared against the letter extracted from the CAPTCHA • If the number of matching pixels between the two patterns exceeds a dynamically calculated threshold, the match is positive
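
The pixel-matching step can be illustrated with a minimal Python sketch. It is not the authors’ Perl code. The bitmap representation (rows of '#' and '.'), the `templates` dictionary of pre-tilted font variants, and the fixed `min_fraction` standing in for the dynamically calculated threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

Bitmap = List[str]  # rows of '#' (ink) and '.' (background), all the same size


def matching_pixels(a: Bitmap, b: Bitmap) -> int:
    """Count positions where both bitmaps have the same pixel value."""
    return sum(pa == pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def best_match(letter: Bitmap, templates: Dict[str, List[Bitmap]],
               min_fraction: float = 0.9) -> Optional[str]:
    """Return the best-matching character, or None if nothing passes the threshold.

    `templates` maps a character to its pre-rendered variants (for example one
    per tilt angle). `min_fraction` is an assumed stand-in for the dynamically
    calculated threshold, which the slides do not specify.
    """
    total = len(letter) * len(letter[0])
    threshold = min_fraction * total
    best_char, best_score = None, -1
    for char, variants in templates.items():
        for variant in variants:
            score = matching_pixels(letter, variant)
            if score > best_score:
                best_char, best_score = char, score
    return best_char if best_score > threshold else None
```

A letter that matches no template well enough yields `None`, which corresponds to the case where the tool falls back to other recognition steps or discards the CAPTCHA.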

Breaking MeinVZ/StudiVZ CAPTCHAs (2/2) • If there is no match, six variations of the unknown letter are generated using ImageMagick’s filters and the Tesseract engine is run on each • If 3 equal results are found, it is considered a positive match • If there is a positive match for all letters, the results are concatenated and the answer is submitted • Because only 3 errors are allowed on submitted answers, a CAPTCHA containing letters that are easily confused during recognition is discarded • The technique could not recognize all letters in 71% of the CAPTCHAs, but in those cases the CAPTCHA was simply discarded and a new one requested • Of the submitted answers, 88.7% were correct, giving a 99.8% success rate within the limit of 3 failed attempts
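
Two pieces of this slide lend themselves to a short Python sketch: the agreement rule over the OCR runs and the success rate under the retry limit. The function names are hypothetical. Reading “3 equal results” as “at least 3 of the runs agree”, and treating the submissions as independent tries, are assumptions rather than details given in the talk.

```python
from collections import Counter
from typing import List, Optional


def vote(ocr_results: List[str], needed: int = 3) -> Optional[str]:
    """Accept a letter only if at least `needed` OCR runs agree on it."""
    if not ocr_results:
        return None
    letter, count = Counter(ocr_results).most_common(1)[0]
    return letter if count >= needed else None


def success_within(attempts: int, p_correct: float) -> float:
    """Probability that at least one of `attempts` independent tries is correct."""
    return 1.0 - (1.0 - p_correct) ** attempts
```

Under these assumptions, `success_within(3, 0.887)` evaluates to roughly 0.9986, which is in line with the 99.8% quoted on the slide.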

Facebook’s reCAPTCHAs • State-of-the-art approach developed at Carnegie Mellon University • Uses words that OCR programs failed to recognize correctly while digitizing books • Because of this, the words are harder for a computer to recognize • The user solving the CAPTCHA contributes to improving the accuracy of the digitized book text • 2 words are displayed at the same time, slightly distorted, with a curved line: one unknown (not recognized by OCR) and one that a number of users have already identified • If the user gets the known word right, the answer given for the unknown word is probably correct as well

Breaking Facebook’s reCAPTCHAs (1/2) • Word analysis is performed • The approach used for the previous SNs is ineffective, since reCAPTCHA uses real words of varying length • The tool extracts the middle line of each word and approximates it with a third-degree polynomial curve • Each pixel is then shifted up or down so that the approximating curve becomes a straight line • Next, a number of images containing the CAPTCHA word are generated using ImageMagick filters, and Tesseract is run on each one • The collected text is then analyzed by a lexical module: • The words are compared against an English dictionary • If that fails, an edit-distance spelling correction algorithm is applied to fix small errors
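
The dictionary check followed by edit-distance correction can be sketched in Python as follows. This is an illustration, not the lexical module from the paper: the function names, the tiny word list in the usage note, and the `max_distance` cutoff of 1 are assumptions.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute, or keep if equal
        prev = curr
    return prev[-1]


def correct(word: str, dictionary: Iterable[str], max_distance: int = 1) -> Optional[str]:
    """Return the word itself if known, else the closest dictionary word within
    `max_distance` edits, else None (the caller then tries another strategy)."""
    words = list(dictionary)
    if word in words:
        return word
    best = min(words, key=lambda w: edit_distance(word, w), default=None)
    if best is not None and edit_distance(word, best) <= max_distance:
        return best
    return None
```

For example, with the word list `["network", "social"]`, the OCR output `"netw0rk"` is one substitution away from `"network"` and would be corrected to it, while an output with no close neighbor returns `None`.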

Breaking Facebook’s reCAPTCHAs (2/2) • If this also fails, the word is submitted to Google, and if the number of results is above a threshold, the word is considered correct • If that fails too, Google’s word suggestion is used to obtain the word • If everything fails, the CAPTCHA is discarded and a new one is requested

reCAPTCHA behavior • reCAPTCHA is difficult to break on a large scale • 14% of the 2000 attempted CAPTCHAs were recognized • 26% of submissions correctly identified at least one of the two words • It probably becomes more resilient as more CAPTCHA-breaking attempts are made, as it probably presents 2 known words instead of one once an error limit is exceeded • Over 100 attempts for a specific account, the success rate was 4-7%, while the rate of successfully identifying one word was between 20% and 30% • Against a limited number of users, though, an attack is still feasible • The attack could be distributed via a botnet • For example: if each bot solved 7 CAPTCHAs per day, a botnet of 10,000 bots would let the attacker send 70,000 friend requests every day

Profile Cloning Attacks • Profile cloning means creating a new profile of a victim, using his or her real name and photo, inside the same social network • The attacker can then send friend requests to the victim’s contacts, impersonating the victim • Users are generally not cautious when accepting friend requests • The closeness of connections and the frequency of communication vary • So the probability that a contact becomes suspicious of an attacker’s friend request varies as well • Contacts might also notice the duplicate profiles and remove the fake one later • But by then the attacker may have had enough time to collect the information he needs • iCloner supports profile cloning on Facebook

Cross-site profile cloning • Identify users registered in one social network, but not in another • Steal their identities and create accounts for them in the network where they are not registered • Steal their contacts that have accounts in the new SN • An attack that is much harder to detect • A seemingly legitimate, non-duplicate account is created in the new social network • Relevant when forging accounts between SNs of the same nature • iCloner can automatically compare and forge accounts from XING to LinkedIn

Cross-site profile cloning • After the stolen identity is created, the system searches the target network for the victim’s contacts from the original network that also have accounts there: • A simple search usually returns many results, which need to be narrowed down • The system looks at more specific information, using a simple scoring system: • 2 points if the education fields match • 2 points if the companies worked for are the same • 1 point if the city and the country are identical • If the score is above 3, the profiles are considered to belong to the same user • To deal with information being entered differently across social networks, a Google search is used • If a Google search on both terms returns the same first 3 result hits, the 2 entries are considered equivalent • As soon as the contacts of a user are identified, the system can send friend requests in the new network • Most users will probably accept the friend request without becoming suspicious
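
The scoring rule above can be sketched in Python as follows. The `Profile` fields and function names are hypothetical, and comparing fields case-insensitively is an assumption. The strict comparison follows the slide’s wording (“above 3”) literally, so whether a score of exactly 3 should also count is worth checking against the paper.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    education: str
    company: str
    city: str
    country: str


def _same(x: str, y: str) -> bool:
    """Case-insensitive equality on non-empty fields (an assumed normalization)."""
    x, y = x.strip().casefold(), y.strip().casefold()
    return bool(x) and x == y


def match_score(a: Profile, b: Profile) -> int:
    """2 points for education, 2 for company, 1 for city and country together."""
    score = 0
    if _same(a.education, b.education):
        score += 2
    if _same(a.company, b.company):
        score += 2
    if _same(a.city, b.city) and _same(a.country, b.country):
        score += 1
    return score


def same_person(a: Profile, b: Profile, threshold: int = 3) -> bool:
    """Strict comparison, reading 'above 3' on the slide literally."""
    return match_score(a, b) > threshold
```

The maximum score is 5. Under the literal reading, a pair of profiles has to agree on both education and company to be treated as the same person.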

Evaluation • Real-world experiments took place, with real users: • Crawled two social networks to collect large volumes of contact lists & public user data • Profile cloning was attempted on 700 distinct users • Cross-site profile cloning attacks were attempted on 78 distinct users registered on two different social networks • Of course, the whole process was transparent to the “victims”.

Attack Evaluation on StudiVZ/MeinVZ • Created 16 user accounts • Added small delays to each page request to keep a low profile, and used the CAPTCHA tools • Expected: 100k pages per day, retrieving 15,000 accounts, with contact lists split into pages of 15 contacts and an average of 100 contacts per account • 6000 pages parsed per day, 215 CAPTCHAs encountered to break, information collected from 4000 profiles

Evaluation on XING • XING had no CAPTCHA mechanism, but a much more effective mechanism that blocks accounts issuing too many requests • 2000 profiles were retrieved before the account was blocked • This is not a problem, as the attacker can constantly create new accounts via cloning • In the end, 118k accounts were retrieved before the experiment was stopped.

Profile Cloning Evaluation (1/3) • First experiment: test the willingness of users to accept friend requests from forged profiles of people already on their contact list • iCloner created 5 forged profiles from existing real profiles (D1…D5) and 5 fictitious profiles (F1…F5), and sent contact requests to the contact list of each victim • A total of 705 distinct users were contacted • Over 60% acceptance rate for forged profiles (in one case, 90%) • The acceptance rate for requests from unknown users was below 30%, with one exception of 40%

Profile Cloning Evaluation (2/3) • Second experiment: test the trust users place in messages received from their own contacts • A simple message containing a link was sent from both forged and fictitious accounts to their contacts, and the delay until the click was measured • In both cases, about 50% of users clicked on it.

Profile Cloning Evaluation (3/3) • About 45% of those users clicked the link within the first 20 hours • This is enough time to cause damage, even in a large-scale attack.

Cross-Site Profile Cloning Evaluation • A profile was cloned from one social network to another (XING to LinkedIn in this experiment) • Given that 12% of XING users had a LinkedIn account, an attacker could obtain at most 720k contacts • 5 real XING user accounts were cloned in LinkedIn • iCloner identified that 78 of 443 XING accounts (17.6%) also had LinkedIn accounts • Of the 78 friend requests, 44 were accepted (56%)
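
The figures on this slide can be re-derived with a few lines of Python. The variable names are illustrative, and the reading that the 720k upper bound corresponds to 12% of the 6 million XING users quoted earlier in the talk is an assumption, although the numbers agree.

```python
# Upper bound on reachable contacts: share of XING users who also use LinkedIn.
xing_users = 6_000_000          # XING user count quoted earlier in the talk
linkedin_share_percent = 12     # percent of XING users with a LinkedIn account
reachable = xing_users * linkedin_share_percent // 100

# Matching and acceptance rates from the cross-site experiment.
matched, searched = 78, 443     # contacts found on LinkedIn / XING contacts examined
accepted = 44                   # friend requests accepted out of the 78 sent
match_rate = 100 * matched / searched
accept_rate = 100 * accepted / matched

print(f"reachable contacts: {reachable}")
print(f"match rate: {match_rate:.1f}%")
print(f"acceptance rate: {accept_rate:.0f}%")
```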

Discussion • The experiments did not account for victims becoming suspicious and contacting their friends • 4 users informed the “victims” that something might be wrong • However, they did so AFTER they had accepted the friend requests, giving the potential attacker time to access their information • Most of the contacts interacted with the fake accounts as if they were the real ones

Suggestions for Improvements in Social Network Site Security • The user is the weakest link in SNs • Even advanced users can be tricked • Possible improvements: • Provide the receiver with more information on the authenticity of a request (e.g. country information based on the IP) without posing a privacy threat, as users are willing to share this type of information • Apply more symbol overlapping in CAPTCHAs to make OCR harder • Apply overlap to reCAPTCHA solution words • Limit the number of CAPTCHAs displayed to a user, with a threshold of a few images per minute • Social networks should detect anomalies in user behavior, such as sending hundreds of friend requests in a row • This will make the simulation of real users economically unviable
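
One simple way to flag the “hundreds of friend requests in a row” anomaly is a sliding-window counter per account. The sketch below is an illustration of that idea, not a mechanism described in the talk. The class name and the default limits (20 requests per hour) are assumptions.

```python
from collections import deque
from typing import Deque


class FriendRequestMonitor:
    """Flags an account that sends too many friend requests in a time window."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 3600.0) -> None:
        self.max_requests = max_requests      # assumed limit, not from the talk
        self.window_seconds = window_seconds  # assumed window, not from the talk
        self._timestamps: Deque[float] = deque()

    def record(self, now: float) -> bool:
        """Record a request at time `now`; return True if the account looks anomalous."""
        self._timestamps.append(now)
        # Drop requests that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_requests
```

A site could respond to a flagged account with a CAPTCHA, a delay, or a temporary block. Which response is appropriate is a policy choice outside the scope of this sketch.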

Related Work • Sybil attack • The attacker creates multiple fake identities and pretends to be distinct users in the network, using them to gain influence in the reputation system • SybilGuard and SybilLimit are 2 Sybil attack defense systems based on the fast-mixing property of SNs • Sophos [2007] • The authors created a profile on Facebook and manually sent friend requests to 200 random users, with a 41% acceptance rate • Social Phishing [2007] • High degree of trust confirmed in social networks

Summary • Social networking sites are steadily gaining popularity, and criminals are attracted as well • Two identity theft attacks were presented and evaluated, which establish friendships with a victim’s contacts and thereby obtain their personal information • The simpler one involves profile cloning and sending friend requests within the same social network where the victim has an account • The more advanced one involves cross-site profile cloning: creating a seemingly legitimate new account on an SN where the victim is not registered, and then trying to add the victim’s contacts from the original SN who also have accounts there • Worked on XING, StudiVZ, MeinVZ, Facebook and LinkedIn • Although social networking is useful, raising privacy and security awareness is important

Thank You!
