Use of AI algorithms in design of Web Application Security Testing Framework HITCON Taipei 2006
Or a “non-monkey” approach to hacking web applications By fyodor and meder fygrave@o0o.nu meder@o0o.nu
“No, we are not writing another web scanner!!”
Agenda • Why hacking web applications • What scanners do. Why they are useless (or not) • What else could be done, but isn’t (yet) • Introduction to YAWATT • User-session based approach • Distributed • Intelligent (or not?) • Modular • More than “application security scanner” coverage
Background of this work • STIF, STIF2 automation – an agent-based cooperative automated hacking environment http://o0o.nu/sec/STIF
So, why go after the web • They learnt to configure their firewalls • They learnt to disable services they don’t want • They finally know how to use nmap (and even nessus!!) … • But they still want web • And they can’t learn to code
So why web applications • Applications get complex • Multilayered frameworks make it even more fun • The number of web-application-based services grows • The number of web application programmers increases (home-brewed web applications), but …
Web applications remain a large hole into one’s network • Web application programmers’ skills aren’t usually the best • Firewalls are there – just to let you in • Application firewalls can stop a limited number of web application attacks, but are useless when it comes to detecting logical vulnerabilities • IDS systems aren’t smart enough to pick up on application attacks
Scanners.. use of.. • checking for enumeration ... YES • checking for execution ... YES • checking if we can drop table ... YES • checking if we can drop database ... YES • CANNOT CONNECT TO APPLICATION
Scanners - summary • Nessus et al. – don’t see web applications beyond the underlying software configuration • Libwhisker/nikto – signature-based, relatively primitive; efficient for default bugs • Wikto/e-Or – session-aware coding-flaw scanners • Kavado/Appscan/Webinspect/N-Stalker/Watchfire Appscan – intelligent, session-aware scanners; closed “blackbox” (some allow scripted plugins)
Why scanners ain’t enough • Single-host based • Commercial scanners are black boxes (not extendable, not correctable) • Little or no control over the “hacking” process • Not easily extendable on the fly with new “automation” modules • Often primitive, signature-based logic
What would we like to have • Maximum automation of the web hacking process • Minimum of code writing • Autonomous functionality • Knowledge transfer • Ability to add ‘hacks’ on the fly • Deal with uncertainty in an “intelligent” way • Learn from valid user session data
Other good things to have • Be able to test new classes of bugs (e.g. session hijacking) • Be able to attack the web application from multiple locations (bypass IP restrictions, improve the brute-forcing process) • Be able to automate testing of application logic bugs • Be able to make intelligent guesses
User sessions • User sessions – collections of user request/response pairs (URL, name/value pairs, session information and selective HTTP protocol data) • Classified user session data includes semantic classification of URLs, parameters, responses and HTTP protocol data (server type, backend system(s) if visible, “unusual” HTTP header content)
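To make this concrete, here is a minimal Python sketch of what one classified session entry could look like; the field and label names are illustrative assumptions, not the framework’s actual data model.

# Hypothetical shape of one recorded request/response pair after classification.
from dataclasses import dataclass, field

@dataclass
class SessionEntry:
    url: str
    method: str
    params: dict              # name/value pairs sent with the request
    cookies: dict             # session information
    status: int
    headers: dict             # selective HTTP protocol data (server type, ...)
    body: str
    labels: set = field(default_factory=set)   # classes assigned by the classifiers

entry = SessionEntry(
    url="/login.php", method="POST",
    params={"user": "alice", "pass": "x"},
    cookies={"PHPSESSID": "abc123"},
    status=200, headers={"Server": "Apache"},
    body="<html>...</html>",
    labels={"login", "executable", "session"},
)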
User sessions • User session data can be obtained from: • Proxy servers (burp, paros, ..) • Web server logs • Browser automation scripts (e.g. the WATIR framework) • Spiders (burp)
Less code, more automation • Application content is learnt from user sessions (data feeders) • Additional application information can be gathered by agent plugins (e.g. directory-splitting tests) • User session data is classified by: • Semantic and functional classification of URLs • HTTP protocol classifiers (server type, cookies ..) • Session classifiers • Input data classification – type, semantics • Output classification (application error detection, redirects, “bogus” responses etc.) • Test cases are grouped into suites and executed in groups: • Stateless tests • Stateful tests • Mixed
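A rough Python sketch of this classification pass; the classifier functions, labels and suite names below are invented for illustration, not the framework’s actual rules.

# Each classifier inspects a session entry (here a plain dict) and contributes labels;
# the results of all classifiers are merged into one label set.
def url_classifier(entry):
    return {"executable"} if entry["url"].endswith((".php", ".jsp", ".cgi")) else {"static"}

def output_classifier(entry):
    body = entry["body"].lower()
    if "sql syntax" in body or "stack trace" in body:
        return {"application-error"}
    if entry["status"] in (301, 302):
        return {"redirect"}
    return set()

def classify(entry, classifiers=(url_classifier, output_classifier)):
    labels = set()
    for clf in classifiers:
        labels |= clf(entry)
    return labels

# Test cases are then grouped into suites and run together.
suites = {"stateless": [], "stateful": [], "mixed": []}

print(classify({"url": "/login.php", "status": 200, "body": "You have an error in your SQL syntax"}))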
Go Intelligent Main components: • Web application component (URL) classification • Semantic classification of web application input data • LSI-based mapping and comparison of web content in the response analysers • Use of external search engines • Limited “binary analysis” of downloaded files (decoding pdf, doc, rtf; other formats later)
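As a minimal illustration of the LSI idea only (the response analysers themselves are not shown, and numpy is assumed to be available): build a term-document matrix from response bodies, reduce it with a truncated SVD, and compare responses by cosine similarity in the reduced space.

# Toy LSI comparison of response bodies.
import numpy as np
from collections import Counter

def term_doc_matrix(docs):
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w, c in Counter(d.lower().split()).items():
            m[index[w], j] = c
    return m

def lsi_similarity(docs, k=2):
    m = term_doc_matrix(docs)
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    k = min(k, len(s))
    reduced = (np.diag(s[:k]) @ vt[:k]).T          # one row per document
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    reduced = reduced / np.where(norms == 0, 1, norms)
    return reduced @ reduced.T                      # cosine similarities

pages = ["login failed invalid password",
         "invalid password login failed",
         "welcome to your account page"]
print(lsi_similarity(pages))                        # first two pages score close to 1.0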
Knowledge Transfer to machine • Possibility to create new classification rules on the fly (and let the system re-learn from them) • Possibility to ‘reclassify’ application responses • Possibility to add new ‘testing’ plugins on the fly
How is URL classification used • Vulnerability scenario testing – uses the classifier subscription mechanism • For example: a login page tester will subscribe to ‘login’, ‘executable’ and ‘session’
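A sketch of how such a subscription mechanism could be wired; the plugin and class names are taken from the example above, everything else is an assumption.

# A test plugin declares which URL classes it needs; the engine dispatches
# only URLs that carry all of the required classes.
class LoginPageTester:
    subscribes = {"login", "executable", "session"}

    def run(self, url, session):
        print("testing login logic on", url)

def dispatch(plugins, classified_urls):
    # classified_urls: url -> set of classes assigned by the classifiers
    for url, classes in classified_urls.items():
        for plugin in plugins:
            if plugin.subscribes <= classes:        # all required classes present
                plugin.run(url, session=None)

dispatch([LoginPageTester()],
         {"/login.php": {"login", "executable", "session"},
          "/logo.png": {"static"}})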
Additional research directions • Other ideas to work on: • Detection of “hidden” parameters (see the sketch below) • Identification of “hidden” URLs • Identification of “negative” and “positive” responses • Detection of application failures, redirects • Evaluation and priority-based execution for plugins
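For the first item, one naive approach is to append candidate parameter names and watch whether the response changes; the candidate list and the crude byte-for-byte diff below are assumptions for illustration, not the framework’s actual logic.

# Probe for "hidden" parameters by diffing responses against a baseline.
import urllib.request, urllib.parse

CANDIDATES = ["debug", "admin", "test", "id", "show"]

def probe_hidden_params(base_url):
    baseline = urllib.request.urlopen(base_url).read()
    hits = []
    for name in CANDIDATES:
        sep = "&" if "?" in base_url else "?"
        url = base_url + sep + urllib.parse.urlencode({name: "1"})
        body = urllib.request.urlopen(url).read()
        if body != baseline:                        # crude "response changed" heuristic
            hits.append(name)
    return hits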
A note on distributed architecture • Cooperative agent infrastructure • Design a cooperative agent system • Multi-platform • Portable
What the distributed approach gives us: • DDoS – EASY!!! • Distributed brute-forcing: bypassing IP-based restrictions and bandwidth limitations • IDS – more tricks • Bypass packet filtering restrictions with an agent behind the firewall!
Communication framework • Modified version of Spread • Robust • Reliable message delivery • Portable (windows/unix) • Available in C/C++ and Java flavours; bindings exist for Python, Ruby!
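The toy sketch below only mimics the agent-facing interface (join a group, multicast a task) with an in-process stand-in; it does not use the actual Spread bindings or their signatures.

# In-process stand-in for a reliable group-multicast channel.
import json, queue

class GroupChannel:
    def __init__(self):
        self.members = {}                 # agent name -> inbox queue

    def join(self, name):
        self.members[name] = queue.Queue()
        return self.members[name]

    def multicast(self, sender, payload):
        msg = json.dumps({"from": sender, "task": payload})
        for inbox in self.members.values():
            inbox.put(msg)

channel = GroupChannel()
inbox = channel.join("agent-1")
channel.join("agent-2")
channel.multicast("controller", {"plugin": "bruteforce", "target": "http://example.com/login"})
print(json.loads(inbox.get()))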
In progress • Agents communicate via messages • Task distribution algorithms – in progress
More on intelligence • Aside from application vulnerabilities, other things of interest are: • Email addresses and user IDs that can be seen within web content • Domain names (within web pages, comments, binary files, etc.) • Building ‘target-oriented’ dictionary files (used by brute-force cracking modules)
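A rough sketch of that harvesting step; the regexes and wordlist heuristics are illustrative assumptions.

# Harvest addresses/domains from fetched content and seed a target-oriented wordlist.
import re

EMAIL_RE  = re.compile(r"[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}")
DOMAIN_RE = re.compile(r"\b(?:[\w-]+\.)+(?:com|net|org|edu|gov|tw|nu)\b")

def harvest(pages):
    emails, domains, words = set(), set(), set()
    for body in pages:
        emails.update(EMAIL_RE.findall(body))
        domains.update(DOMAIN_RE.findall(body))
        words.update(w.lower() for w in re.findall(r"[A-Za-z]{4,}", body))
    # local-parts of addresses make good brute-force dictionary seeds
    words.update(e.split("@")[0] for e in emails)
    return sorted(emails), sorted(domains), sorted(words)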
Other good things • Add your plugin code on the fly (attack automation plugins via the subscription mechanism, classification plugins, etc.): • Can’t be simpler:
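Something along these lines: drop a .py file into a plugins/ directory and the engine picks it up on its next data-processing pass. The directory layout and the process() hook name are assumptions for illustration.

# Re-scan the plugin directory on every pass, so freshly dropped plugins run too.
import importlib.util, pathlib

def load_plugins(plugin_dir="plugins"):
    plugins = []
    for path in pathlib.Path(plugin_dir).glob("*.py"):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        if hasattr(mod, "process"):       # plugin exposes a process(data) hook
            plugins.append(mod)
    return plugins

def handle_new_data(data):
    for plugin in load_plugins():
        plugin.process(data)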
Look mah, no hands! • No reload is needed; plugins are executed the next time new data is processed
Beyond the coverage of an average application scanner • Integration and use of other tools to collect and analyse data (search engine queries, ..) • Integration with other tools (script in Python or Ruby, or hack a “plugin” in Java or C) • If you like your favourite application hax0r tool – you can still use it (and feed the data to us!)
Other remaining items: • Direct interaction with the analyst (not fully implemented yet)
Other remaining items: • Data lookup and data mining services for plugins (via DataMiner, which wraps a MySQL database)
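A hedged sketch of what a plugin-facing lookup could look like; the table and column names are invented, only the general “query the results database” idea comes from the slide.

# Thin wrapper around the results database; schema is hypothetical.
import MySQLdb

class DataMiner:
    def __init__(self):
        self.db = MySQLdb.connect(host="localhost", user="yawatt",
                                  passwd="secret", db="yawatt")

    def urls_with_class(self, cls):
        cur = self.db.cursor()
        cur.execute("SELECT url FROM urls WHERE class = %s", (cls,))
        return [row[0] for row in cur.fetchall()]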
Other ‘nice to have’ things in progress • Propagation module: manual or automated agent installation on a vulnerable server (controlled worm-spreading capability!)
Demo • Code is spaghetti (sorry about that) • Will demonstrate functional bits
Questions and Answers Sample questions, pick one: ;---------) • Why another web hacking tool? • Can you do X too..?
Thanks • Thanks for your patience • The code, slides and docs will be available in a while: http://o0o.nu/sec
Xcon plug • XCon2006, the Fifth Information Security Conference, will be held in Beijing, China, during August 22-24, 2006. • Speaking: a bit late, but you can try: casper@xfocus.org • Attending: should be possible and interesting • No politics! ;-) • Thanks!