550 likes | 718 Views
Text Analytics for Mobile App Security and Beyond. Tao Xie University of Illinois at Urbana-Champaign. taoxie@illinois.edu. Mobile App Markets. Google Play. Microsoft Windows Phone. Apple App Store. App Store beyond Mobile Apps!. What If Formal Specs Are Written?!.
E N D
Text Analytics for Mobile App Security and Beyond Tao Xie University of Illinois at Urbana-Champaign taoxie@illinois.edu
Mobile App Markets Google Play MicrosoftWindows Phone Apple App Store
What If Formal Specs Are Written?! permission list, etc. informal: app description, etc. APP DEVELOPERS App Security Requirements App Functional Requirements User Functional Requirements User Security Requirements APP USERS
Informal App Functional Requirements: App Description App Code App Permissions
What If Formal Specs Are Written?! permission list, etc. informal: app description, etc. APP DEVELOPERS App Security Requirements App Functional Requirements User Functional Requirements User Security Requirements APP USERS
What If Formal Specs Are Written?! permission list, etc. informal: app description, etc. APP DEVELOPERS App Security Requirements App Functional Requirements User Functional Requirements User Security Requirements APP USERS In reality, few of these requirements are (formally) specified!! Hope?!: Bring human into the loop: user perception + judgment
Our Yin-Yang View on Mobile App Security • Reason about user-perceived info, e.g., WHYPER ( ) • Push app security behavior across the boundary () • Check consistency across the boundary () • Reduce user judgment effort ( ) User-Perceived Information App Security Behavior App Description App UIs, App categories, App metadata, User forums, … [functional] App Code App Permissions [security]
Assuring Market Security/Privacy • Apple(Market’sResponsibility) • Apple performs manual inspection • Google(User’s Responsibility) • Users approve permissions for security/privacy • Bouncer (static/dynamic malware analysis) • Windows Phone (Hybrid) • Permissions / manual inspection
Need More Than Program Analysis • Previous approaches look at permissions code (runtime behaviors) • What does the users expect? • GPS Tracker: record and send location • Phone-Call Recorder: record audio during phone call App Description App Code App Permissions
Vision “Bridging the gap between user expectation app behaviors” • User expectations • user perception + user judgment • Focus on permission app descriptions • permissions (protecting user understandable resources) should be discussed App Description Sentence Permission Linkage
WHYPER Overview • Enhance user experience while installing apps • Enforce functionality disclosure on developers • Complement program analysis to ensure justifications DEVELOPERS USERS Pandita et al. WHYPER: Towards Automating Risk Assessment of Mobile Applications. USENIX Security 2013 http://web.engr.illinois.edu/~taoxie/publications/usenixsec13-whyper.pdf
Example Sentence in App Desc. • E.g., “Also you can share the yoga exercise toyour friends via Email and SMS.” • Implication of using the contact permission • Permission sentences Keyword-based search on application descriptions
Problems with Ctrl + F • Confounding effects: • Certain keywords such as “contact” have a confounding meaning • E.g., “... displays user contacts, ...” vs “... contact me at abc@xyz.com”. • Semantic inference: • Sentences often describe a sensitive operation without actually referring to keywords • E.g., “share yoga exercises with your friends via Email and SMS”
Natural Language Processing • Natural Language Processing (NLP) techniques help computers understand NL artifacts • In general, NLP is still difficult • NLP on domain specific sentences with specific styles is feasible • Text2Policy: extraction of security policies from use cases [FSE 12] • APIInfer: inferring contracts from API docs [ICSE 12] • WHYPER: domain knowledge from API docs [USENIX Security 13]
Overview of WHYPER NLP Parser WHYPER APP Description Preprocessor Intermediate Representation Generator FOL Representation APP Permission Semantic Graphs Semantic Engine API Docs Semantic Graph Generator Annotated Description Domain Knowledge
Preprocessor • Period Handling • Decimals, ellipsis, shorthand notations (Mr., Dr.) • Sentence Boundaries • Tabs, bullet points, delimiters (:) • Symbols (*,-) and enumeration sentence • Named Entity Handling • E.g., “Pandora internet radio” • Abbreviation Handling • E.g., “Instant Message (IM)”
Intermediate-Representation Generator Also you can share the yoga exercise to your friends via Email and SMS RB PRP MD VB DT NN NN PRP NNS NNP NNP Predicate Governing Entity share to share Also advmod you you nsubj can yoga exercise aux exercise owned dobj the you det yoga via nn friends friends prep_to and your DependentEntity poss Email Email prep_via SMS SMS conj_and
Semantic Engine Inferred from API Docs Governing Entity to share share you yoga exercise owned you WordNet Similarity via friends and Email Email SMS DependentEntity
Semantic-Graph Generator Systematic approach to infer graphs • Identify resource associated with the permissions from the API class name • ContactsContract.Contacts • Inspect the member variables and member methods to identify actions and subordinate resources • ContactsContract.CommonDataKinds.Email
Evaluation • Subjects • Permissions: • READ_CONTACTS • READ_CALENDAR • RECORD_AUDIO • 581 application descriptions • 9,953sentences • Evaluation setup • Manual annotation of the sentences • WHYPER for identifying permission sentences • Comparison to keyword-based searching
Evaluation Results • Precision and recall of WHYPER • Average precision (82.8%) and recall (81.5%) • Comparison to keyword-based searching • Improving precision (41.6%) and recall (-1.2%) • E.g., microphone-blow intoand call-record
Access Control Policies (ACP) in Requirements Document • Access control is often governed by security policies called Access Control Policies (ACP) • Includes rules to control which principals have access to which resources • A policy rule includes four elements • Subject – HCP • Action – edit • Resource - patient's account • Effect - deny ex. “The Health Care Personnel (HCP)does not have the ability to edit the patient's account.”
Overview of Text2Policy ACP Rule Linguistic Analysis A HCP should not change patient’s account. An [subject: HCP] should not [action: change] [resource: patient’s account]. Subject Action Resource Effect Model-Instance Construction patient’s account UPDATE - change HCP deny Transformation Xiao et al. Automated Extraction of Security Policies from Natural-Language Software Documents. FSE 2012. http://web.engr.illinois.edu/~taoxie/publications/fse12-nlp.pdf
Example Technical Challenges in ACP Extraction • Semantic Structure Variance • different ways to specify the same rule • Negative Meaning Implicitness • verb could have negative meaning ACP 1: An HCP cannot change patient’s account. ACP2: An HCP is disallowed to change patient’s account.
Road Ahead: Yin-Yang View • Reason about user-perceived info, e.g., WHYPER ( ) • Push app security behavior across the boundary () • Check consistency across the boundary () • Reduce user judgment effort ( ) User-Perceived Information App Security Behavior App Description App UIs, App categories, App metadata, User forums, … [functional] App Code App Permissions [security]
Text Analytics for Mobile App Security and Beyond taoxie@illinois.edu App Description App UIs, App categories, App metadata, User forums, … App Code App Permissions Acknowledgments: Supported in part by NSA Science of Security (SoS) Lablet, NSF SaTC, NSF SHF, NSF CAREER
Problems with Ctrl + F • Confounding effects: • Certain keywords such as “contact” have a confounding meaning. • For instance, “... displays user contacts, ...” vs“... contact me at abc@xyz.com”. • Semantic Inference: • Sentences often describe a sensitive operation such as reading contacts without actually referring to keyword “contact”. • For instance, “share yoga exercises with your friends via email, sms”.
Natural Language Processing (NLP) • NLP techniques help computers understand NL artifacts • NLP is still difficult • NLP on domain specific sentences with specific styles is feasible
RQ1 Results: Effectiveness of WHYPER • Low FPs and FNs • out of 9,061sentences, only 129are flagged as FPs • among 581applications, 109applications (18.8%) contain at least one FP • among 581applications, 86applications (14.8%) contain at least one FN
Result Analysis (False Positives) • Incorrect parsing • “MyLink Advanced provides full synchronization of all Microsoft Outlook emails (inbox, sent, outbox and drafts), contacts, calendar, tasks and notes with all Android phones via USB” • Synonym analysis • “You can now turn recordings into ringtones.”
Result Analysis (False Negatives) • Incorrect parsing • Incorrect identification of sentence boundaries and limitations of underlying NLP infrastructure • Limitations of Semantic Graphs • Manual Augmentation • microphone-blow into and call-record • significant improvement of Delta Recalls: -6.6% to 0.6% • Automatic mining from user comments and forums
Overview of Text2Policy ACP Rule Linguistic Analysis A HCP should not change patient’s account. An [subject: HCP] should not [action: change] [resource: patient’s account]. Subject Action Resource Effect Model-Instance Construction patient’s account UPDATE - change HCP deny Transformation
Linguistic Analysis • Incorporate syntactic and semantic analysis • syntactic structure -> noun group, verb group, etc. • semanticmeaning -> subject, action, resource, negative meaning, etc. • Provide New techniques for model extraction • Identify ACP and AS sentences • Infer semantic meaning
Common Techniques • Shallow parsing • Domain dictionary • Anaphora resolution NP VG PNP An HCP can view patient’s account. He is disallowed to change the patient’s account. UPDATE HCP Subject Main Verb Group Object
Technical Challenges (TC) in ACP Extraction • TC1: Semantic Structure Variance • different ways to specify the same rule • TC2: Negative Meaning Implicitness • verb could have negative meaning ACP 1: An HCP cannot change patient’s account. ACP2: An HCP is disallowed to change patient’s account.
Semantic-Pattern Matching • Address TC1 Semantic Structure Variance • Compose pattern based on grammatical function ex. An HCP is disallowed to change the patient’s account. passive voice to-infinitive phrase followed by
Negative-Expression Identification • Address TC2 Negative Meaning Implicitness • Negative expression • “not” in subject: • “not” in verb group: • Negative meaning words in main verb group ex. No HCP can edit patient’s account. ex. HCP can not edit patient’s account. HCP can never edit patient’s account. ex. An HCP is disallowed to change the patient’s account.
AS: Syntactic-Pattern Matching • Syntactic elements • Subject , Main verb, Object • Subject and Object Checking • subject is a not a user or object is not a resource • Filtering negative-meaning sentences • Negative sentences tend not to describe ASs ex. The prescription list should include medication, the name of the doctor. . .
Overview of Text2Policy ACP Rule Linguistic Analysis A HCP should not change patient’s account. An [subject: HCP] should not [action: change] [resource: patient’s account]. Subject Action Resource Effect Model-Instance Construction patient’s account UPDATE - change HCP deny Transformation
ACP Model-Instance Construction • Identify subject, action, and resource: • Subject: HCP • Action: change • Resource: patient’s account • Infer effect: • Negative Expression: none • Negative Verb: disallow • Inferred Effect: deny ex. ACP Rule An HCP is disallowed to change the patient’s account. Subject Action Resource Effect patient’s account UPDATE - change HCP deny
AS Model-Instance Construction • Use case patterns • industry use cases [DSN’09] • public use cases • Model-Instance Construction ex. The patientviewsaccess log. Action Step Actor Action Resource access log OUTPUT – view patient
Technical Challenges in Action-Step Extraction • TC4: Transitive Subject • TC5: Perspective Variance AS 1:He edits the account. AS 2: The system updates the account. AS 3: The system displays the updated account. HCP HCP views the updated account.
Subject Flow Tracking • Address TC4 Transitive Subject • Apply data flow to track non-system subject: AS 1: The HCP edits the account. AS 2: The system updates the account. Tracking Only system as subject replaced with HCP as subject
Perspective Conversion • Address TC5 Perspective Variance • Apply data flow to track non-system subject: AS 1: The HCP edits the account. AS 2: The system shows the updated account. Tracking Only system as subject and action is output Converting to “HCP views the updated account”
Evaluation – RQs • RQ1: How effectively does Text2Policy identify ACP sentences in NL documents? • RQ2: How effectively does Text2Policy extract ACP rules from ACP sentences? • RQ3: How effectively does Text2Policy extract action steps from action-step sentences?
Evaluation – Subject • iTrust open source project • http://agile.csc.ncsu.edu/iTrust/wiki/ • 448 use-case sentences (37 use cases) • preprocessed use cases • Collected ACP sentences • 100 ACP sentences • From 17 sources (published papers and websites) • A module of an IBMApp (financial domain) • 25 use cases