On the Automatic Categorisation of Android Applications. Borja Sanz, Igor Santos, Carlos Laorden, Xabier Ugarte-Pedrero and Pablo Garcia Bringas. The 9th Annual IEEE Consumer Communications and Networking Conference - Security and Content Protection, 2012. Presenter: 張文銓
OUTLINE • INTRODUCTION • APPS FEATURE EXTRACTION • FEATURES EXTRACTION METHOD • MACHINE LEARNING CLASSIFIERS • CONCLUSIONS
INTRODUCTION • For Apple devices, the App Store is the single official way to obtain applications, whereas Android allows users to install applications downloaded from markets or directly from the Internet. • Shabtai et al. trained machine learning models using as features the count of elements, attributes or namespaces of the parsed apk. • They obtained 89% accuracy classifying applications into only 2 categories: tools or games.
APPS FEATURE EXTRACTION • Classifying Android applications into several categories using the features extracted both from the Android Market and the application itself.
APPS FEATURE EXTRACTION • We have collected 820 applications, which have been classified into 7 categories.
APPS FEATURE EXTRACTION • Phases: • 1. We describe the process of extracting features from the Android .apk files. • 2. We show that it can achieve high accuracy rates.
Extracting Features From Android .apk • We retrieve several features from the applications: • 1. Strings contained in the application. • 2. Information extracted from the Android Market using an open-source, non-official API called android-market-api: • (1) rating, (2) number of ratings and (3) size of the application. • 3. Permissions of the applications.
Extracting Features From Android .apk • Permissions are stored in an XML file inside each application, named “AndroidManifest.xml”. • This file declares the execution requirements of the application, such as the version of the operating system it requires or the libraries it uses.
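A minimal sketch of how the declared permissions could be read from a manifest. It assumes the manifest has already been decoded to plain-text XML (inside an .apk it is stored in a binary form), and the function name `extract_permissions` and the sample manifest are hypothetical, used only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "android:" attribute prefix in manifests.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def extract_permissions(manifest_xml):
    """Return the names declared in <uses-permission> elements."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    return [
        elem.attrib[name_attr]
        for elem in root.iter("uses-permission")
        if name_attr in elem.attrib
    ]


# Hypothetical, already-decoded manifest (illustration only).
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-sdk android:minSdkVersion="8" />
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.READ_CONTACTS" />
</manifest>
"""

permissions = extract_permissions(SAMPLE_MANIFEST)
print(permissions)
```

Each permission name could then be treated as one binary feature of the application.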
FEATURES EXTRACTION METHOD • The general steps we have followed for each application are: • 1. We extract the permissions and the resources from the application. • 2. We disassemble the sample. • 3. We extract the strings from the disassembled sample. • 4. We obtain data from the Android Market.
FEATURES EXTRACTION METHOD • To extract every string, we search for the operation code (opcode) “const-string”, which identifies the strings of the application. • We process the strings using Term Frequency (TF). TF is a weight widely used in information retrieval and text mining.
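The string extraction and TF weighting can be sketched as follows. This assumes a smali-like textual disassembly in which each string literal appears as `const-string <register>, "<text>"`, and it uses the common definition of TF (occurrences of a term divided by the total number of terms in the application). The tokenisation rule, the helper names and the sample listing are assumptions made for illustration, not details taken from the paper.

```python
import re
from collections import Counter

# Matches lines such as:  const-string v0, "some text"
# and the const-string/jumbo variant (assumed disassembly syntax).
CONST_STRING_RE = re.compile(
    r'const-string(?:/jumbo)?\s+\w+,\s*"((?:[^"\\]|\\.)*)"'
)


def extract_strings(disassembly):
    """Collect every string literal loaded by a const-string opcode."""
    return CONST_STRING_RE.findall(disassembly)


def term_frequency(strings):
    """TF of a term = its count / total number of terms in the app."""
    terms = [t.lower() for s in strings for t in re.findall(r"[A-Za-z]+", s)]
    total = len(terms)
    if total == 0:
        return {}
    return {term: n / total for term, n in Counter(terms).items()}


# Hypothetical disassembly fragment (illustration only).
SAMPLE = '''
    const-string v0, "Start game"
    const-string v1, "Game over"
    invoke-virtual {v0}, Ljava/lang/Object;->toString()Ljava/lang/String;
    const-string/jumbo v2, "High score"
'''

strings = extract_strings(SAMPLE)
tf = term_frequency(strings)
print(strings)
print(tf)
```

Lower-casing the terms is a design choice of this sketch so that "Game" and "game" count as the same term.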
MACHINE LEARNING CLASSIFIERS • Machine-learning algorithms can commonly be divided into three different types depending on the training data: • supervised learning: • Bayesian Networks • Decision Trees • K-Nearest Neighbour (KNN) • Support Vector Machines (SVM) • unsupervised learning: • association rule analysis • semi-supervised learning
MACHINE LEARNING CLASSIFIERS • Bayesian Networks: based on Bayes' Theorem. • Algorithm: Tree Augmented Naïve (TAN) [28] • [28] D. Geiger, M. Goldszmidt, G. Provan, P. Langley, and P. Smyth, “Bayesian network classifiers,” in Machine Learning, 1997, pp. 131–163. • Decision Trees • Random Forest [19] • [19] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001. • K-Nearest Neighbour • We performed experiments with k = 1, k = 2 and k = 5 to train KNN. • Support Vector Machines (SVM)
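To illustrate the role of k in K-Nearest Neighbour, here is a minimal sketch using Euclidean distance and a majority vote. The two-dimensional feature vectors, the category labels and the function name `knn_predict` are hypothetical; the paper's actual feature space and implementation are not shown on the slides.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. On a tied vote,
    the label seen first among the nearest neighbours wins.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature vector, category).
train = [
    ((0.9, 0.1), "games"),
    ((0.8, 0.2), "games"),
    ((0.7, 0.3), "games"),
    ((0.1, 0.9), "tools"),
    ((0.2, 0.7), "tools"),
]
query = (0.85, 0.15)

# Same values of k as in the experiments: 1, 2 and 5.
predictions = {k: knn_predict(train, query, k) for k in (1, 2, 5)}
print(predictions)
```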
MACHINE LEARNING CLASSIFIERS • To evaluate each classifier’s capability, we measured the Area Under the ROC (Receiver Operating Characteristic) Curve (AUC) [31]. • [31] Y. Singh, A. Kaur, and R. Malhotra, “Comparative analysis of regression and machine learning methods for predicting fault proneness models.” • Best: Bayes TAN, AUC 0.93. • Second: Random Forest, AUC 0.9.
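For a binary problem, the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way. The scores and labels are made up for illustration, and how the paper aggregated AUC over its 7 categories is not stated on the slides.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` uses 1 for the positive class and 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores and true labels (illustration only).
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]

value = auc(scores, labels)
print(round(value, 3))
```

An AUC of 1.0 means every positive is ranked above every negative, while 0.5 corresponds to random ranking.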
CONCLUSIONS • 1. There are other features of the applications that could be used to improve the detection. • 2. Although these features are not effective at preventing malware from being uploaded to the market, they can prevent the installation of malware on the smartphone. • 3. With enough samples, the approach could distinguish good apps from bad apps.