This research presents a new method for automatically detecting gambling sites by analyzing POST behaviors. Results show high precision and recall rates, with the implementation of graph analysis further enhancing performance. The study addresses the challenges of identifying online gambling platforms effectively.
Detecting Gambling Sites From Post Behaviors Shensi Tong, Hanlong Zhang, Beijun Shen, Hao Zhong Shanghai Jiao Tong University Yongjian Wang, Bo Jin The Third Research Institute of Ministry of Public Security Presented by Shensi Tong 2016.5.16
Outline • Introduction • Approach • Evaluation • Optimization • Conclusions
Introduction • Detecting gambling sites is important • Internet gambling is even more addictive than traditional gambling, which is harmful • Most countries explicitly prohibit Internet gambling or place it under strict supervision • Detecting gambling sites is challenging • To the best of our knowledge, no previous work has been proposed to detect gambling sites automatically • There is no consensus on which feature is best for detecting gambling sites
Introduction • Major contributions • The first approach that mines behavior models for gambling sites and detects previously unknown gambling sites with the mined models • A tool and two evaluations on a 1 TB dataset. The results show that our tool detects gambling sites effectively. The POST behavior of a website is the best feature to determine whether it is a gambling site or not • An additional evaluation on applying graph analysis to improve our approach. The results are valuable for further optimizing our approach
Approach • Preprocessing HTTP POSTs • Typically, a POST request message consists of the following parts • Request line • “POST /a/.../script?K1=V1&...&Kn=Vn HTTP/1.1” • Cookie in request header • “JSESSIONID=064185D5B6; NETEASE SSN=shanghai” • Request body • “subject=Test&message=test&formhash=bbb14e19&usesig=1&posttime=138672” • Hash_post = MD5(Script & Keys(RequestLine) & Keys(RequestBody))
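A minimal sketch of the hash above, assuming that "Script" means the URL path of the request line and that the path and keys are joined with "&" in their original order. The slide does not state either detail, and the function name post_hash is hypothetical. Only parameter keys enter the hash, so two posts to the same script with different values map to the same hash.

```python
import hashlib
from urllib.parse import urlsplit, parse_qsl

def post_hash(request_line: str, body: str) -> str:
    """Hash the script path plus parameter keys; parameter values are ignored."""
    # A request line looks like: "POST /a/script?K1=V1&K2=V2 HTTP/1.1"
    _method, target, _version = request_line.split(" ", 2)
    parts = urlsplit(target)
    line_keys = [k for k, _ in parse_qsl(parts.query, keep_blank_values=True)]
    body_keys = [k for k, _ in parse_qsl(body, keep_blank_values=True)]
    # Assumption: join with "&" and keep the keys in request order.
    signature = "&".join([parts.path, *line_keys, *body_keys])
    return hashlib.md5(signature.encode("utf-8")).hexdigest()

same_a = post_hash("POST /forum/post.php?fid=1&tid=2 HTTP/1.1", "subject=Test&message=test")
same_b = post_hash("POST /forum/post.php?fid=9&tid=7 HTTP/1.1", "subject=Hi&message=other")
print(same_a == same_b)
```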
Approach • Clustering sites • Filtering • In this paper, we set α1 to 5 • Compute the Jaccard coefficient between two websites • We put two websites into the same cluster if and only if their similarity value is higher than a predefined threshold β1
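The clustering step can be sketched as follows, assuming each site is represented by its set of POST hashes. The slide does not say how pairwise decisions are merged into clusters, so this sketch assumes single-link grouping (connected components over pairs whose Jaccard coefficient exceeds β1). The names jaccard and cluster_sites are hypothetical, and the threshold value in the example is illustrative. The α1 filtering step is not modeled.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |a ∩ b| / |a ∪ b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_sites(site_hashes: dict, beta1: float) -> list:
    """Group sites whose pairwise Jaccard similarity exceeds beta1 (single-link)."""
    parent = {site: site for site in site_hashes}

    def find(site):
        # Union-find lookup with path halving.
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    for s, t in combinations(site_hashes, 2):
        if jaccard(site_hashes[s], site_hashes[t]) > beta1:
            parent[find(s)] = find(t)

    clusters = {}
    for site in site_hashes:
        clusters.setdefault(find(site), set()).add(site)
    return list(clusters.values())

sites = {
    "a.example": {"h1", "h2", "h3"},
    "b.example": {"h1", "h2", "h3", "h4"},
    "c.example": {"h9"},
}
print(cluster_sites(sites, beta1=0.5))
```

The all-pairs loop is quadratic in the number of sites, so at the scale of the dataset described in the talk some form of blocking or indexing would presumably be needed.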
Approach • Mining behavior models • Pick out gambling site clusters manually • Mine a behavior model for each cluster • POST TF-IDF • Sort in descending order and select the top α3 as the model
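A rough sketch of the TF-IDF ranking, under assumptions the slide does not spell out: term frequency is the share of a POST hash among all posts seen in the cluster, and inverse document frequency is log(N / df), where each site in the whole dataset counts as one document. The name mine_model and its parameters are hypothetical.

```python
import math
from collections import Counter

def mine_model(cluster_posts: list, all_site_hashes: list, alpha3: int) -> list:
    """Return the alpha3 POST hashes with the highest TF-IDF score in a cluster.

    cluster_posts: every POST hash observed in the cluster, with repeats.
    all_site_hashes: one set of POST hashes per site in the whole dataset.
    """
    counts = Counter(cluster_posts)
    total = len(cluster_posts)
    n_sites = len(all_site_hashes)
    scores = {}
    for post_hash, count in counts.items():
        # Document frequency: number of sites that issue this POST hash.
        df = sum(1 for hashes in all_site_hashes if post_hash in hashes)
        scores[post_hash] = (count / total) * math.log(n_sites / max(df, 1))
    # Sort by descending score; break ties by hash so the result is deterministic.
    ranked = sorted(scores, key=lambda h: (-scores[h], h))
    return ranked[:alpha3]

posts = ["bet", "bet", "bet", "login", "login"]
dataset = [{"bet", "login"}, {"bet", "login"}, {"login"}, {"login"}]
print(mine_model(posts, dataset, alpha3=1))
```

In this toy input "login" appears on every site, so its IDF is zero and the cluster-specific "bet" hash ranks first.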
Approach • Detecting previously unknown gambling sites • Calculate the similarity value between an unknown site and each mined model • If the value is higher than the threshold β2, we label the site as a gambling site • If some sites do not follow any mined model, we re-run our approach to train a new model
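The detection step might look like the sketch below. The slide does not define the similarity measure, so this assumes the fraction of a model's POST hashes that the site also issues. The names model_similarity and classify, and the example threshold, are hypothetical.

```python
def model_similarity(site_hashes: set, model: list) -> float:
    """Fraction of the model's POST hashes that the site also issues (assumed measure)."""
    if not model:
        return 0.0
    return len(site_hashes & set(model)) / len(model)

def classify(site_hashes: set, models: list, beta2: float):
    """Return the index of the best model scoring above beta2, or None if none does."""
    best_index, best_score = None, beta2
    for index, model in enumerate(models):
        score = model_similarity(site_hashes, model)
        if score > best_score:
            best_index, best_score = index, score
    return best_index

models = [["h1", "h2", "h3", "h4"], ["x1", "x2"]]
print(classify({"h1", "h2", "h3", "z"}, models, beta2=0.6))
print(classify({"z"}, models, beta2=0.6))
```

Sites for which classify returns None match no mined model; those are the sites that, per the slide, would trigger a re-run to train a new model.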
Evaluations • Datasets • 4,000,000,000 HTTP POSTs • 750,000 sites • 1 TB • Error measures
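The slide does not list the error measures. Since the conclusions report precision and recall, here is a sketch of those two measures over sets of site names. The function name and the toy inputs are hypothetical and are not results from the talk.

```python
def precision_recall(predicted: set, actual: set) -> tuple:
    """Precision and recall of predicted gambling sites against labeled ones."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Toy input: 4 sites flagged, 3 sites truly gambling, 2 in common.
p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
print(p, r)
```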
Conclusion • Features • URL • Consists of lexical and host information • HTML • Extracted from HTML tags that appear in the HTML code of Web pages • Semantic • Captures textual information that is visible on Web pages
Optimization • Graph analysis features • Degree • Number of a site's neighbors • Similarity • Similarity between two websites • HashCount • Number of unique POST hashes for a website • Utmcsr • Source website from which visitors enter this website • Utmctr • Keywords entered in the search engine • Utmv • Used to identify a site for traffic statistics
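Two of the features above, Degree and HashCount, can be sketched as follows. The slide does not describe how the graph is built, so this assumes nodes are sites and an undirected edge links two sites judged similar. The name graph_features and the output keys are hypothetical.

```python
def graph_features(site_hashes: dict, edges: list) -> dict:
    """Compute Degree and HashCount per site from an undirected similarity graph."""
    neighbors = {site: set() for site in site_hashes}
    for s, t in edges:
        # Neighbor sets make duplicate or reversed edges count only once.
        neighbors[s].add(t)
        neighbors[t].add(s)
    return {
        site: {"degree": len(neighbors[site]), "hash_count": len(hashes)}
        for site, hashes in site_hashes.items()
    }

sites = {"a": {"h1", "h2"}, "b": {"h1"}, "c": {"h3"}}
print(graph_features(sites, [("a", "b"), ("b", "a")]))
```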
Optimization • Observation 1 • Like attracts like
Optimization • Observation 2 • Concentration
Optimization • Observation 3 • Anomaly
Optimization • Optimization results • Matching values in cookies • If certain keywords appear in utmctr, the site is likely to be a gambling site • Filtering outliers from sites • Determine whether a website belongs to a cluster according to its HashCount • Filtering large POST sites • Filtering outliers from clusters
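The cookie-matching rule can be illustrated with the sketch below. It assumes the utmctr field arrives in a cookie value laid out like the Google Analytics `__utmz` campaign cookie: a numeric prefix followed by pipe-separated key=value pairs. The keyword list, the sample cookie values, and the function names are hypothetical, and a match only suggests that a site is likely a gambling site.

```python
from urllib.parse import unquote

def utm_fields(cookie_value: str) -> dict:
    """Parse 'utmcsr=...|utmctr=...' pairs from a __utmz-style cookie value."""
    start = cookie_value.find("utm")  # skip the numeric prefix, if any
    if start == -1:
        return {}
    fields = {}
    for pair in cookie_value[start:].split("|"):
        key, separator, value = pair.partition("=")
        if separator:
            fields[key] = unquote(value)
    return fields

def has_gambling_keyword(cookie_value: str, keywords: list) -> bool:
    """True if any lowercase keyword occurs in the utmctr search term."""
    term = utm_fields(cookie_value).get("utmctr", "").lower()
    return any(keyword in term for keyword in keywords)

cookie = "1.1386720000.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=online%20casino"
print(utm_fields(cookie)["utmctr"])
print(has_gambling_keyword(cookie, ["casino"]))
```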
Conclusion • We propose a novel approach that detects gambling sites based on POST behavior • We evaluate our approach on a large corpus, and our results show that it achieves both high precision and high recall • We apply graph analysis to improve performance and recall