Crawling Chinese Android Markets
Xiang Pan, Biyan Zhou, George Liu
Overview
• Background & Purpose
• Goals
• Work Division
• Proposed Algorithm & Examples
• Problem & Solution
• Precaution Measures
• Results & Future Improvements
It’s FREEEEE BUT DANGEROUS
Background & Purpose
• Existing malicious activities
  • Example: NickiBot (spyware)
  • Runs in the background indefinitely and is difficult to detect
  • Can record phone calls, monitor phone logs and SMS, detect location, and send the information to a remote server
• Purpose of our project
  • Collect a sizable number of Android applications from less popular Chinese markets for analysis
Goals
• Create a robust crawler that can be tailored to different markets with minimal effort
• Analyze at least 5 markets to collect suspicious applications
• Examine the precaution measures of these markets
Proposed Algorithm
• Manually inspect each market for its overall data structure
  • Metadata HTML
  • Download URL (redirection via JScript)
• Select an appropriate unique application attribute (ID, name, etc.)
• Parse the metadata using regular expressions
• Store the metadata and the application in a user-specified location
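The parse-and-store steps above can be sketched as follows. This is only an illustration, not the crawler itself: the HTML layout, the field names, the regular expressions, and the helpers `parse_metadata` and `store_metadata` are all hypothetical, since each market needs its own patterns found by manual inspection.

```python
import json
import re
from pathlib import Path

# Hypothetical patterns for one imaginary market page layout.
# Real markets differ, so these must be rewritten per market.
PATTERNS = {
    "app_id": re.compile(r'data-appid="(\d+)"'),
    "name": re.compile(r'<h1 class="app-name">(.*?)</h1>', re.S),
    "version": re.compile(r'<span class="version">(.*?)</span>', re.S),
}


def parse_metadata(html):
    """Return a dict of the fields found in html; missing fields map to None."""
    meta = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(html)
        meta[field] = match.group(1).strip() if match else None
    return meta


def store_metadata(meta, out_dir):
    """Write the metadata as JSON under out_dir, named by the unique app id."""
    if meta.get("app_id") is None:
        raise ValueError("no unique application id found")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{meta['app_id']}.json"
    path.write_text(json.dumps(meta, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


# Made-up sample page used only to exercise the patterns above.
SAMPLE_PAGE = """
<div class="app" data-appid="12345">
  <h1 class="app-name"> Example App </h1>
  <span class="version">1.0.2</span>
</div>
"""
```

Calling `parse_metadata(SAMPLE_PAGE)` and passing the result to `store_metadata` with a directory of your choice writes one JSON file per application. Keeping the patterns in a single table is one way to tailor the crawler to a new market by swapping that table only.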
Problem & Solution
• Different HTML structures for application metadata within the same market
  • Capture only one layout (the most frequently used one)
• Slow download speed
  • Use multithreaded downloading: split a single application into multiple parts
• A wrong application ID terminates the download
  • Use a try/catch structure to handle the case where the specified file doesn't exist
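A minimal sketch of the last two solutions, under stated assumptions: `fetch_range(start, end)` and `lookup(app_id)` are hypothetical callables standing in for the real network code (for example, an HTTP request with a `Range` header, which only works if the server supports range requests). The slides do not say how the original crawler split files, so the even split below is one possible choice.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, parts):
    """Split [0, total_size) into up to `parts` inclusive (start, end) byte ranges."""
    if total_size <= 0 or parts <= 0:
        return []
    parts = min(parts, total_size)
    base, extra = divmod(total_size, parts)
    ranges, start = [], 0
    for i in range(parts):
        # The first `extra` ranges get one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def download_in_parts(fetch_range, total_size, parts=4):
    """Fetch each range on its own thread and join the chunks in order.

    fetch_range(start, end) is a hypothetical callable returning the bytes
    for an inclusive byte range.
    """
    ranges = split_ranges(total_size, parts)
    with ThreadPoolExecutor(max_workers=max(1, len(ranges))) as pool:
        # map() yields results in the order of `ranges`, not completion order.
        chunks = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(chunks)


def crawl_ids(app_ids, lookup):
    """Try every id; skip ids that raise LookupError instead of stopping.

    lookup(app_id) is a hypothetical callable returning (fetch_range, size),
    or raising LookupError when the id does not exist in the market.
    """
    downloaded, skipped = {}, []
    for app_id in app_ids:
        try:
            fetch_range, size = lookup(app_id)
        except LookupError:
            skipped.append(app_id)
            continue
        downloaded[app_id] = download_in_parts(fetch_range, size)
    return downloaded, skipped
```

Catching the error per ID lets a crawl over a range of guessed IDs continue past gaps, and the list of skipped IDs can be logged for later inspection.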
Results & Future Improvements
• Results
  • Created a robust and easy-to-use crawler
  • Collected ~30,000 suspicious applications (over 70 GB)
  • Examined 10 different markets for precaution measures
• Future improvements
  • Create a simple GUI to improve usability
  • Automatic authentication
  • Circumvent the market's cap on daily traffic for a given IP
  • Maintain a database for these applications