SPAM DETECTION IN P2P SYSTEMS Team Matrix Abhishek Ghag Darshan Kapadia Pratik Singh
AGENDA OVERVIEW OF THE TOPIC ACCOMPLISHMENTS SOFTWARE DESIGN TEST RESULTS DEMO FUTURE SCOPE WHAT WE LEARNED
OVERVIEW We are implementing a P2P file sharing system that answers a query with results that are not spam: the top results returned for a query are the actual files the user is looking for. We use the Feature Based Spam Detection Algorithm as the solution to this problem.
Classification of Spam Files whose replicas have semantically different descriptors. The spammer might name a file after a currently popular song, or might give multiple names to the same file. E.g., different song titles for the same key 26NZUBS655CC66COLKMWHUVJGUXRPVUF: “12 days after christmas.mp3” “i want you thalia.mp3” “come on be my girl.mp3” …
Classification of Spam Files with long descriptors. Here the spammer inserts a single long descriptor for the file. E.g., a single replica descriptor for key 1200473A4BB17724194C5B9C271F3DC4: “Aerosmith, Van Halen, Quiet Riot, Kiss, Poison, Acdc, Accept, Def Leappard, Boney M, Megadeth, Metallica, Offspring, Beastie Boys, Run Dmc, Buckcherry, Salty Dog Remix.mp3”
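Both descriptor-based spam types above show up as an unusually large vocabulary attached to a single file key. The following is a minimal sketch of that feature, not the project's code: the function name `num_unique_terms` and the tokenizer (split on anything that is not a letter or digit) are assumptions made for illustration.

```python
import re


def num_unique_terms(descriptors):
    """Count the distinct lowercase terms across all descriptors of one file key."""
    terms = set()
    for descriptor in descriptors:
        # Assumed tokenizer: split on every run of non-alphanumeric characters.
        tokens = re.split(r"[^a-z0-9]+", descriptor.lower())
        terms.update(token for token in tokens if token)
    return len(terms)


# Descriptors quoted on the slide for one key.
replicas = [
    "12 days after christmas.mp3",
    "i want you thalia.mp3",
    "come on be my girl.mp3",
]
print(num_unique_terms(replicas))
```

A file whose replicas all carry the same title yields a small count, while the examples above yield a large one, which is what makes the count usable as a ranking signal.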
Classification of Spam Files that are highly replicated on a single peer. Normal users do not create multiple replicas of the same file on a single machine; a spammer does so to manipulate the group size. E.g., 177 replicas of the file DY2QXX3MYW75SRCWSSUG6GY3FS7N7YC shared on a single peer.
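The per-peer replication of a file can be measured directly from the replica list. This is a hedged sketch under the assumption that each replica is reported as a `(peer_id, file_key)` pair; the function name and the peer and key values in the usage lines are hypothetical.

```python
from collections import Counter


def max_replication_per_host(replicas):
    """Map each file key to the largest number of its replicas held by one peer.

    replicas: iterable of (peer_id, file_key) pairs, one pair per replica.
    """
    per_host = Counter(replicas)  # (peer_id, file_key) -> replica count
    degree = {}
    for (_peer_id, file_key), count in per_host.items():
        degree[file_key] = max(degree.get(file_key, 0), count)
    return degree


# Hypothetical data: one peer shares 177 replicas of KEY_A.
replicas = [("peer1", "KEY_A")] * 177 + [("peer2", "KEY_A"), ("peer2", "KEY_B")]
print(max_replication_per_host(replicas))
```

A key with a degree far above 1 matches the pattern described on this slide.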
ACCOMPLISHMENTS • Analyzed the problems of P2P systems regarding file sharing. • Implemented our own P2P system successfully. • Implemented the Feature Based Spam Detection Algorithm. • Top results returned in response to a query are not spam.
Contd… • The design of our software contains two classes: • The Main (P2P Client) class. • The Server class. • Whenever a client logs into the system, it sends the server the list of files it wants to share. The Server class maintains this list with the help of a database. • The Feature Based Spam Detection Algorithm is implemented in the P2P Client class.
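The server side of this design can be sketched as follows. The slides do not name the database, the schema, or the method names, so everything here is an assumption for illustration: an in-memory SQLite table, a hypothetical `FileIndexServer` class, and a simple substring search.

```python
import sqlite3


class FileIndexServer:
    """Hypothetical stand-in for the Server class: stores what each peer shares."""

    def __init__(self):
        # Assumed storage: an in-memory SQLite database with one table.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE shared_files (peer_id TEXT, file_key TEXT, descriptor TEXT)"
        )

    def login(self, peer_id, files):
        """Register the (file_key, descriptor) pairs a peer wants to share."""
        rows = [(peer_id, key, descriptor) for key, descriptor in files]
        self.db.executemany("INSERT INTO shared_files VALUES (?, ?, ?)", rows)
        self.db.commit()

    def search(self, term):
        """Return (peer_id, file_key, descriptor) rows whose descriptor contains term."""
        cursor = self.db.execute(
            "SELECT peer_id, file_key, descriptor FROM shared_files "
            "WHERE descriptor LIKE ? ORDER BY peer_id, file_key",
            (f"%{term}%",),
        )
        return cursor.fetchall()


server = FileIndexServer()
server.login("peer1", [("K1", "summer song.mp3")])
server.login("peer2", [("K1", "Summer Song live.mp3"), ("K2", "other.mp3")])
print(server.search("song"))
```

In this sketch the server only returns raw matches; grouping the rows by file key and ranking them is left to the client, in line with the slide's statement that detection runs in the P2P Client class.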
Algorithm for Spam Detection 5a. Results are ranked by group size. 5b. The top-M results are identified as candidate results. 5c. The top-M results are re-ranked by NumUniqueTerms. Results lower in the order are more likely to be Type 1 spam than those higher up. 5d. The top-N results are re-ranked by their per-host file replication degree. Results lower in the order are more likely to be Type 4 spam than those higher up.
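Steps 5a to 5d can be sketched as a sequence of stable sorts. This is an illustration under stated assumptions rather than the team's implementation: the `Result` fields and `rank_results` are hypothetical names, both re-ranking steps sort ascending so that suspicious results sink, and N is assumed to be at most M.

```python
from dataclasses import dataclass


@dataclass
class Result:
    file_key: str
    group_size: int          # number of replicas returned for this key
    num_unique_terms: int    # distinct terms across the key's descriptors
    replication_degree: int  # most replicas of this key on any single peer


def rank_results(results, m, n):
    # 5a: rank by group size, largest group first.
    ranked = sorted(results, key=lambda r: r.group_size, reverse=True)
    # 5b: the top-M results become the candidates.
    candidates, rest = ranked[:m], ranked[m:]
    # 5c: re-rank candidates by NumUniqueTerms (assumed ascending, so results
    # with many unique terms end up low in the order).
    candidates.sort(key=lambda r: r.num_unique_terms)
    # 5d: re-rank the top-N by per-host replication degree (assumed ascending).
    top_n = sorted(candidates[:n], key=lambda r: r.replication_degree)
    return top_n + candidates[n:] + rest


results = [
    Result("a", 10, 4, 1),
    Result("b", 9, 30, 1),
    Result("c", 8, 5, 50),
    Result("d", 2, 3, 1),
]
print([r.file_key for r in rank_results(results, m=3, n=3)])
```

Because Python's sorts are stable, ties in a later step keep the order established by the earlier step, so each feature only refines the ranking produced before it.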
FUTURE SCOPE • Probe Queries to enhance Spam Detection.
What We Learned • Peer to Peer File Sharing system • Feature Based Spam Detection Algorithm
References Papers. Author: Dongmei Jia. Title: Cost Effective Spam Detection Techniques in P2P File Sharing Systems. Conference: Proceedings of the 2008 ACM Workshop on Large-Scale Distributed Systems for Information Retrieval. Date: October 2008. Publisher: ACM. URL: http://portal.acm.org.ezproxy.rit.edu/results.cfm?coll=portal&dl=ACM&CFID=14901064&CFTOKEN=96029385
References Author: Dongmei Jia, Wai Gen Yee, Ophir Frieder. Title: Spam Characterization and Detection in Peer to Peer File Sharing Systems. Conference: Proceedings of the 17th ACM Conference on Information and Knowledge Management. Date: October 2008. Publisher: ACM. URL: http://portal.acm.org.ezproxy.rit.edu/citation.cfm?id=1458082.1458128&coll=portal&dl=ACM&CFID=14901064&CFTOKEN=96029385
References Author: Jian Liang, Rakesh Kumar, Yongjian Xi, Keith W. Ross. Title: Pollution in P2P File Sharing Systems. Conference: Proceedings IEEE INFOCOM 2005, 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Date: March 2005. Publisher: IEEE. URL: http://ieeexplore.ieee.org.ezproxy.rit.edu/stamp/stamp.jsp?arnumber=1498344&isnumber=32100