On the Automatic Categorisation of Android Applications

On the Automatic Categorisation of Android Applications. Borja Sanz, Igor Santos, Carlos Laorden, Xabier Ugarte-Pedrero and Pablo Garcia Bringas. The 9th Annual IEEE Consumer Communications and Networking Conference - Security and Content Protection, 2012. Presenter: 張文銓.


Presentation Transcript


  1. On the Automatic Categorisation of Android Applications • Borja Sanz, Igor Santos, Carlos Laorden, Xabier Ugarte-Pedrero and Pablo Garcia Bringas • The 9th Annual IEEE Consumer Communications and Networking Conference - Security and Content Protection, 2012 • Presenter: 張文銓

  2. OUTLINE • INTRODUCTION • APPS FEATURE EXTRACTION • FEATURES EXTRACTION METHOD • MACHINE LEARNING CLASSIFIERS • CONCLUSIONS

  3. INTRODUCTION • For Apple devices, the App Store is the single official way to obtain applications, whereas Android allows users to install applications downloaded from markets or directly from the Internet. • Shabtai et al. trained machine-learning models using as features the counts of elements, attributes or namespaces in the parsed apk. • They obtained 89% accuracy when classifying applications into only 2 categories: tools or games.

  4. APPS FEATURE EXTRACTION • Goal: classify Android applications into several categories using features extracted both from the Android Market and from the application itself.

  5. APPS FEATURE EXTRACTION • We collected 820 applications, which were classified into 7 categories.

  6. APPS FEATURE EXTRACTION • Phases: • 1. We describe the process of extracting features from Android .apk files. • 2. We show that the approach can achieve high accuracy rates.

  7. Extracting Features From Android .apk Files • Several features are retrieved from the applications: • 1. Strings contained in the application. • 2. Information extracted from the Android Market using an open-source, unofficial API called android-market-api: (1) rating, (2) number of ratings and (3) size of the application. • 3. Permissions of the application.

  8. Extracting Features From Android .apk Files • Permissions are stored in an XML file inside each application, named “AndroidManifest.xml”. • This file declares the execution requirements of the application, such as the operating system version it requires or the libraries it uses.
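
As an illustration of the permission feature described above, the following is a minimal sketch, not the authors' implementation. It assumes the manifest has already been decoded to plain-text XML (inside an .apk it is stored in a binary XML format, so a decoding step is needed first). The function name `extract_permissions` and the sample manifest are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the android: prefix in AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def extract_permissions(manifest_xml):
    """Return the names declared by <uses-permission> elements, in document order."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    return [
        elem.get(name_attr)
        for elem in root.iter("uses-permission")
        if elem.get(name_attr) is not None
    ]


# Hypothetical, already-decoded manifest used only for illustration.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.READ_CONTACTS" />
    <application android:label="Demo" />
</manifest>"""

print(extract_permissions(SAMPLE_MANIFEST))
```

Each permission name can then serve as a binary feature (present or absent) for the application.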

  9. FEATURES EXTRACTION METHOD • The general steps followed for each application are: • 1. We extract the permissions and the resources from the application. • 2. We disassemble the sample. • 3. We extract the strings from the disassembled sample. • 4. We obtain data from the Android Market.

  10. FEATURES EXTRACTION METHOD • To extract every string, we search for the operation code “const-string”, which identifies the strings of the application. • We process the strings using Term Frequency (TF), a weight widely used in information retrieval and text mining.
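
The two steps on this slide can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes the disassembly is smali-style text in which string literals appear as `const-string vN, "..."`, it assumes the common normalised definition of TF (occurrences of a term divided by the total number of terms in the application), and the word-level, lower-cased tokenisation is also an assumption. The names `extract_strings`, `term_frequency` and `SAMPLE_SMALI` are hypothetical.

```python
import re
from collections import Counter

# Matches instructions such as:  const-string v0, "Hello"
# (const-string/jumbo is accepted too); the literal may contain escaped characters.
CONST_STRING_RE = re.compile(
    r'const-string(?:/jumbo)?\s+[vp]\d+,\s*"((?:[^"\\]|\\.)*)"'
)


def extract_strings(smali_text):
    """Return every string literal loaded by a const-string instruction."""
    return CONST_STRING_RE.findall(smali_text)


def term_frequency(strings):
    """Map each term to (occurrences of the term) / (total number of terms)."""
    terms = [t.lower() for s in strings for t in re.findall(r"\w+", s)]
    counts = Counter(terms)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


# Hypothetical disassembled fragment used only for illustration.
SAMPLE_SMALI = '''
    const-string v0, "Game over"
    const-string v1, "New game"
    invoke-virtual {v0}, Ljava/lang/Object;->toString()Ljava/lang/String;
'''

strings = extract_strings(SAMPLE_SMALI)
print(strings)
print(term_frequency(strings))
```

In this sample the term "game" occurs twice among four terms, so its TF is 0.5.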

  11. MACHINE LEARNING CLASSIFIERS • Machine-learning algorithms can commonly be divided into three types depending on the training data: • supervised learning: Bayesian Networks, Decision Trees, K-Nearest Neighbour (KNN), Support Vector Machines (SVM) • unsupervised learning: association rule analysis • semi-supervised learning

  12. MACHINE LEARNING CLASSIFIERS • Bayesian Networks, which are based on Bayes' theorem. Algorithm: Tree Augmented Naïve Bayes (TAN) [28] • [28] D. Geiger, M. Goldszmidt, G. Provan, P. Langley, and P. Smyth, “Bayesian network classifiers,” in Machine Learning, 1997, pp. 131–163. • Decision Trees: Random Forest [19] • [19] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001. • K-Nearest Neighbour: experiments were performed with k = 1, k = 2 and k = 5. • Support Vector Machines (SVM)
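
To make the role of k in K-Nearest Neighbour concrete, here is a small self-contained sketch. It is not the experimental setup of the paper: the Euclidean distance, the majority vote, the two-dimensional feature vectors and the "games"/"tools" labels are all assumptions chosen for illustration, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Predict the label of `query` by majority vote among its k nearest neighbours.

    `train` is a list of (feature_vector, label) pairs. Neighbours are ranked by
    Euclidean distance; on a tied vote, the label seen first among the nearest
    neighbours wins.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training set: two made-up feature values per application.
TRAIN = [
    ((0.9, 0.1), "games"),
    ((0.8, 0.2), "games"),
    ((0.1, 0.9), "tools"),
    ((0.2, 0.7), "tools"),
    ((0.3, 0.8), "tools"),
]

for k in (1, 2, 5):
    print(k, knn_predict(TRAIN, (0.7, 0.3), k))
```

With this toy data the query is labelled "games" for k = 1 and k = 2 but "tools" for k = 5, because with k = 5 every training sample votes and "tools" holds the majority. This is one reason to evaluate several values of k.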

  13. MACHINE LEARNING CLASSIFIERS • To evaluate each classifier’s capability, we measured the Area Under the ROC (Receiver Operating Characteristic) Curve (AUC) [31]. • [31] Y. Singh, A. Kaur, and R. Malhotra, “Comparative analysis of regression and machine learning methods for predicting fault proneness models.” • Best: Bayes TAN, with an AUC of 0.93. • Second: Random Forest, with an AUC of 0.9.
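
For readers unfamiliar with the AUC, the sketch below computes it for a binary problem as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. This is a standard equivalent formulation, not necessarily how the authors computed it. With seven categories, the measure would presumably be computed per category (one category against the rest) and then averaged, which is an assumption here. The name `roc_auc` and the sample data are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; a tie counts as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores used only for illustration.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(roc_auc(labels, scores))
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so the AUC is 8/9. A value of 1.0 means perfect ranking and 0.5 corresponds to random guessing.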

  14. CONCLUSIONS • 1. Other features of the applications could be used to improve detection. • 2. Although these features cannot prevent malware from being uploaded to the market, they can help prevent malware from being installed on the smartphone. • 3. Given a large enough sample, the approach should be able to distinguish good apps from bad apps.
