Spam 2.0 Workshop on Digital Social Networks George Petre – glpetre@bitdefender.com Alexandru Cosoi – acosoi@bitdefender.com
Social Networks A social network is a social structure made of nodes (which are generally individuals or organizations) that are tied by one or more specific types of interdependency, such as values, visions, ideas, financial exchange, friends, kinship, dislike, conflict, trade, web links, sexual relations, disease transmission (epidemiology), or airline routes. The resulting structures are often very complex. (Wikipedia)
We will talk about…. • Social networks – an introduction • Current context and issues debated on this subject • Review of the primary types of social network spam • Explore possibilities….
Current Work • Is Britney Spears Spam? – Aaron Zinman, Judith Donath – Sociable Media Group, MIT Media Lab, CEAS 2007 • A Learning Approach to Spam Detection based on Social Networks – Ho-Yu Lam, Dit-Yan Yeung – Department of Computer Science and Engineering, Hong Kong University of Science and Technology, CEAS 2007 • Social Networks and Aggressive Behavior: Peer Support or Peer Rejection? – Robert B. Cairns, Beverley D. Cairns, Holly J. Neckerman, Scott D. Gest, Jean Louis Gariepy, Developmental Psychology, 1988 • Several other scientific and non-scientific sources (including newspaper articles and blog posts) in this field
Britney Spears • 2 independent dimensions: sociability and promotion. • SNS spam definition – it depends on the user preferences • Based on the two dimensions, they tried to identify some key profiles • Detection based more on profiles and less on comments
Identified Profiles (two axes: sociability, from low to high, and promotion, from low to high)
• High sociability and low promotion. Such a rating is indicative of normal, socially oriented humans. They connect and communicate with their social network on a personal level by posting pictures of themselves with their friends and results of random pop quizzes, and they publicly host a suite of personal comments posted by their friends.
• High sociability and high promotion. Besides the strong marketing orientation of its actions, this type of user also engages in individual interaction with the network's members. This is a rational approach sustained by a very strong motivation, most often economic (e.g. small or medium companies trying to increase their awareness, MLM members, etc.).
• Low sociability and high promotion. This is typical of a promotional entity using SNS as a marketing opportunity. They only broadcast uniform information to their network, while simultaneously trying to expand its membership as much as possible. Examples include Britney Spears (who does not communicate individually with her network's members), a Viagra ad and a pornographic webcam.
• Low sociability and low promotion. This user might be a new member of the site, or might be a low-effort spammer who does not care about posing as something real. Without information to judge by, the authors cannot classify such profiles.
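The four quadrants can be sketched as a simple lookup, assuming each profile already has a sociability score and a promotion score between 0 and 1. The function name, the score scale and the 0.5 threshold are illustrative assumptions; the slides do not give numeric values.

```python
def quadrant(sociability: float, promotion: float, threshold: float = 0.5) -> str:
    """Map two scores in [0, 1] to one of the four prototype profiles."""
    social = sociability >= threshold   # assumed cut-off, not from the paper
    promo = promotion >= threshold
    if social and not promo:
        return "normal social user"
    if social and promo:
        return "sociable marketer"
    if promo:
        return "promotional entity"
    return "unknown (new member or low-effort spammer)"


print(quadrant(0.9, 0.1))
print(quadrant(0.1, 0.9))
```

A hard threshold is only for illustration; the grey zone discussed in the talk suggests that profiles near the boundary should be left for the user to judge.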
They concluded that… • Users can (should) be assisted by an AI engine when they interact with other users • Only users can decide if “Britney Spears” is spam (for them) • Robots (automatically generated profiles) can be tracked computationally • Machine learning techniques • It is quite difficult to classify profiles as legit or dubious • Huuuge grey zone
Rolex Replica (cool for teens) • Very legitimate robot • A looooooooot of friends (3000) • SEO purpose • Friendly comments • Same comment over and over again • The advertised web site has a Google PageRank of 4 (!!!!) • Spam websites usually have a PageRank of 0 VOTE
Viagra ad • YouTube Viagra ad (the cheap stuff!!!!) • Hyperlink flashing in the movie • May be legit, but it may also sell fake Viagra VOTE
Porn Spam (I) • Many many many keywords • YouTube policy on porn • Using social networks to increase trust and ranking • Not easy to classify -> grey zone? VOTE
Porn Spam (II) • Again, many many keywords • Porn industry profiles (could be spam for some and a lot of fun for others) • If a friend of a friend is a top friend and also a porn star, is it spam for you? VOTE
Porn Spam (III) • Comments advertising porn • Some consider these comments as spam • Direct spam and sometimes SEO VOTE
Porn Spam (IV) • Is this SPAM? • This is NOT a movie • The destination website could contain vulnerabilities, could be phishing, advertising cheap meds, and so on. VOTE
Inch++ comments • Legit profile, with a spam comment from a legit friend. • Same comments over and over again – different “legit” profiles • Copy paste this URL please! VOTE
Obfuscations • hey my frieMnd saw your profitle and thinuks you loMokhodt! she is new to mqyspwace but wants to chcat with you on ms0n mesksenger her name on there is emily21bath@hotmail.com • <br>hey my frie<font point-size="0pt">M</font>nd saw your profi<font point-size="0pt">t</font>le and thin<font point-size="0pt">u</font>ks you lo<font point-size="0pt">M</font>ok ho<font point-size="0pt">d</font>t! she is new to m<font point-size="0pt">q</font>ysp<font point-size="0pt">w</font>ace but wants to ch<font point-size="0pt">c</font>at with you on ms<font point-size="0pt">0</font>n mes<font point-size="0pt">k</font>senger her name on there is emily21bath@hotmail.com • </td> <br> hey my friend saw your profi<font point-size="0pt">T</font>le and thin<font point-size="0pt">S</font>ks you look ho<font point-size="0pt">r</font>t! she is new to mysp<font point-size="0pt">p</font>ace but wants to chat with you on ms<font point-size="0pt">Z</font>n mes<font point-size="0pt">F</font>senger her name on there is emily21bath@hotmail.com </td> VOTE
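One way to see what the reader actually sees is to drop every zero-point font element together with its hidden character before comparing comments. This is a minimal sketch that assumes the obfuscation always uses the exact `point-size="0pt"` attribute shown above; the names `HIDDEN_FONT`, `ANY_TAG` and `visible_text` are made up for the example.

```python
import re

# A <font> element with point-size="0pt", including the hidden text inside it.
HIDDEN_FONT = re.compile(
    r'<font\s+point-size="0pt"\s*>.*?</font>', re.IGNORECASE | re.DOTALL
)
# Any remaining tag such as <br> or </td>.
ANY_TAG = re.compile(r"<[^>]+>")


def visible_text(html: str) -> str:
    """Return the text a reader would see, without the invisible filler."""
    without_hidden = HIDDEN_FONT.sub("", html)
    without_tags = ANY_TAG.sub(" ", without_hidden)
    return " ".join(without_tags.split())  # collapse whitespace


sample = (
    '<br>hey my frie<font point-size="0pt">M</font>nd saw your '
    'profi<font point-size="0pt">t</font>le'
)
print(visible_text(sample))  # hey my friend saw your profile
```

After this normalization the differently obfuscated copies of the same comment reduce to nearly identical text, which makes the "same comment over and over again" pattern much easier to detect.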
Image Spam • Might not be spam, BUT when 4 consecutive comments from different legit users advertise this software….. VOTE
Google Redirect • Can this NOT be spam? • <A HREF=http://www.google.com.au/url?q=http://trackme.19.fo%72%75%6D%65%72%2E%63%6F%6D%2F%69%6E%64%65%78%2E%70%68%70> <FONT SIZE=5><FONT COLOR=blue>Click here to get to the website that has the myspace profile tracker </a> <br /><p> VOTE
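The real destination of the link above is hidden twice: behind the Google redirect and behind percent-encoding. A minimal sketch with Python's standard `urllib.parse` recovers it, assuming the destination is carried in the `q` query parameter as in this sample; `redirect_target` is a name chosen for the example.

```python
from urllib.parse import parse_qs, urlsplit


def redirect_target(href: str) -> str:
    """Return the decoded destination carried in the 'q' query parameter."""
    query = urlsplit(href).query
    targets = parse_qs(query).get("q", [])  # parse_qs also percent-decodes
    return targets[0] if targets else href


href = (
    "http://www.google.com.au/url?q=http://trackme.19.fo"
    "%72%75%6D%65%72%2E%63%6F%6D%2F%69%6E%64%65%78%2E%70%68%70"
)
print(redirect_target(href))  # http://trackme.19.forumer.com/index.php
```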
Phishing • If you want to see my picture, you must log in first…. Right on this page VOTE
Types of spam / SN (I) • 3 types of Social Networks • Social Network type A – targets mainly teenagers • Social Network type B – targets mostly teenagers, but not entirely • Social Network type C – targets any user (no age or sex differentiation) *This classification was made by randomly checking a few hundred profiles on several social networks
Profile Gatherers • Low-Medium promotion • Sociability = just adding new friends • Short description and too many friends • Botnet? Latent Spammer?
Mitigating profiles • Legit Profile • Legit comments • A lot of friends • Posting on spammy profiles • Direct legit testimonials
How to create a “spammer profile”?(I) • Step I: Google search for “@a_big_free_email_provider” on myspace website … and extract the email addresses returned
How to create a “spammer profile”?(II) Step II: Use your favorite free e-mail provider and import an address book format file
How to create a “spammer profile”?(III) Step III: Use the “import contacts from your email account” for your free email account, enter the captcha and start spamming…
Acceptance • 5 out of 10 “add me” requests are approved on IM • 7 out of 10 “add me” requests are approved in SNS • Usually comments are on an “accept all” basis
Automatic Profile Categorization • A number of quantifiers can be obtained • Machine learning techniques (self organizing) • Assist the user when approving friend requests • We propose ART, SOFM, KNN and other clustering techniques
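Of the proposed techniques, KNN is the simplest to sketch. Unlike ART or SOFM it needs a few already labeled profiles, so this example assumes such labels exist. The three features (sociability, promotion, friends in thousands), the sample values and the labels are all hypothetical and only illustrate the mechanics.

```python
from collections import Counter
from math import dist


def knn_label(sample, labeled, k=3):
    """Majority label among the k labeled profiles closest to the sample."""
    nearest = sorted(labeled, key=lambda item: dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors: (sociability, promotion, friends / 1000)
labeled = [
    ((0.9, 0.1, 0.2), "legit"),
    ((0.8, 0.2, 0.3), "legit"),
    ((0.7, 0.1, 0.1), "legit"),
    ((0.1, 0.9, 3.0), "dubious"),
    ((0.2, 0.8, 2.5), "dubious"),
    ((0.1, 0.7, 4.0), "dubious"),
]

print(knn_label((0.15, 0.85, 2.8), labeled))
```

In practice the features would need scaling so that the friend count does not dominate the Euclidean distance.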
Input Features • Frequency of the invitations (in some SNS) • All features from “Is Britney Spears Spam” paper • Semantic differences or similarities between comments (concepts, hyper concepts – we propose LSA, Bayesian or CNG) • Semantic differences or similarities between profiles
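As an illustration of comparing comments with character n-grams, here is a minimal sketch in the spirit of the CNG (Common N-Grams) method: each text is reduced to a profile of its most frequent character n-grams, and two profiles are compared with a relative-difference distance. The n-gram length, the profile size and the function names are assumptions for the example, not settings from the talk.

```python
from collections import Counter


def ngram_profile(text, n=3, size=50):
    """Normalized frequencies of the most frequent character n-grams."""
    text = " ".join(text.lower().split())
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.most_common(size)} if total else {}


def cng_dissimilarity(p1, p2):
    """Sum of squared relative frequency differences over both profiles."""
    score = 0.0
    for gram in set(p1) | set(p2):
        f1, f2 = p1.get(gram, 0.0), p2.get(gram, 0.0)
        score += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return score


a = ngram_profile("check out this cheap rolex replica")
b = ngram_profile("check out this cheap rolex replica!!")
c = ngram_profile("happy birthday, see you on friday")

# Near-duplicate comments score much lower than unrelated ones.
print(cng_dissimilarity(a, b) < cng_dissimilarity(a, c))
```

A low score between comments posted from different profiles is a hint that they come from the same template.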
Experimental Data • Bayesian Filter from BitDefender Parental Control Module – trained for EMAIL spam (several semantic categories – the ones you wouldn’t like your kid to see) • As output, the system returns the probability for each category – we used all these values in the clustering algorithm • Not exactly fair, since we are emphasizing only the dirty details. • Many many clusters…. So many that it was really hard to analyze
Clusters • Sparse clusters • Condensed clusters • Automatically generated profiles • Groups with similar interests
Results • We found hundreds of similar machine-generated profiles (with different numbers of friends, and posting comments on each other’s profiles) • We found more than 500 profile gatherers (a few days ago, we could easily search for profiles with a range of 300 000 – 500 000 friends. This search option is not allowed anymore) • Mitigating profiles are the hardest to find, but we managed to analyze a few
Social Networks Ranking • Cluster analysis • Number of Profile gatherers • Number of users • Number of spammy comments / randomly chosen profiles • Weighted average with the presented indicators
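The weighted average could look like the sketch below. It assumes each indicator has first been normalized to a ratio between 0 and 1 (for example, profile gatherers divided by the number of users); the indicator names, values and weights are hypothetical and not taken from the experiments.

```python
def network_score(indicators, weights):
    """Weighted average of normalized spam indicators (higher = spammier)."""
    total_weight = sum(weights.values())
    weighted_sum = sum(indicators[name] * w for name, w in weights.items())
    return weighted_sum / total_weight


# Hypothetical weights and indicator values for one social network.
weights = {
    "gatherer_ratio": 2.0,
    "spammy_comment_ratio": 3.0,
    "generated_cluster_ratio": 1.0,
}
network = {
    "gatherer_ratio": 0.02,
    "spammy_comment_ratio": 0.10,
    "generated_cluster_ratio": 0.05,
}

print(round(network_score(network, weights), 3))  # 0.065
```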
Accept Invitation Assistance • This profile is interested in the following concepts • This profile is spammed • This profile has spammy posts • This user was found in the following clusters – might be a (profile gatherer, mitigating profile, marketing profile…..) • Client based
Conclusions • We also agree that this is a highly difficult task • In most cases, it is impossible to say for sure that a profile is spammy – it depends on the user’s preferences. • SNSs are a good starting point for email spam – thousands of email addresses
Conclusions (II) • …….