Beer Garden. Michael N. Gagnon Founder and Director, HellaSec LLC mike@hellasec.com. A defense against high-density attacks.
Beer Garden Michael N. Gagnon Founder and Director, HellaSec LLC mike@hellasec.com A defense against high-density attacks This work was funded by DARPA’s Cyber Fast Track program. Distribution Statement “A” (Approved for Public Release, Distribution Unlimited). The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
Contents • What is a high-density attack? • Beer Garden defense • Theory • Demo • Implementation • Appendix 1: Near-term solutions • Appendix 2: Examples
What is a high-density attack? Background: conventional DoS • A server is powerful • It takes an army of PCs to take down one server • Each PC sends as much traffic as possible • This traffic overloads the server • The server becomes unresponsive
What is a high-density attack? High-density attacks • It takes one PC to take down a single server by sending “high-density” attack traffic • A single attacker sends attack traffic • This traffic overloads the server • The server becomes unresponsive
What is a high-density attack? Density • Mass = resources consumed • Volume = number of requests • Density = resources consumed per request • Ratio of mass to volume • Examples of low-density requests • Most legitimate traffic • Conventional DoS traffic • Examples of high-density requests • Algorithmic-complexity attacks • Legitimate requests for expensive operations
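As a rough illustration of the density definition above, the sketch below computes resources consumed per request for two traffic samples. The CPU-second and request counts are made-up numbers chosen only to show the ratio, and `TrafficSample` is a hypothetical name, not part of Beer Garden.

```python
from dataclasses import dataclass


@dataclass
class TrafficSample:
    label: str
    cpu_seconds: float  # "mass": total resources consumed
    requests: int       # "volume": number of requests

    @property
    def density(self) -> float:
        """Resources consumed per request (mass / volume)."""
        return self.cpu_seconds / self.requests


# Illustrative numbers only: both samples consume the same total CPU time,
# but the second one does so with far fewer requests.
samples = [
    TrafficSample("conventional DoS flood", cpu_seconds=50.0, requests=100_000),
    TrafficSample("algorithmic-complexity attack", cpu_seconds=50.0, requests=10),
]

for s in samples:
    print(f"{s.label}: {s.density:.4f} CPU-seconds per request")
```

Both samples have the same mass; only the volume differs, which is why a single PC can be enough for the second kind of attack.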
What is a high-density attack? How do they work? • Trigger exceptional resource usage. For example: • Cause poor algorithmic performance, i.e. “algorithmic complexity attack” • Trigger an infinite loop bug • See Appendix 2 for details and more examples • What types of resources? • CPU • Memory • Bandwidth • Disk • “Virtual resources” (e.g. connections)
What is a high-density attack? Are you at risk? • A dubious “best practice”: not planning for worst-case performance because you assume it is sufficiently rare • That assumption is unrealistic: you do not know the probability distribution of your algorithm’s inputs • Inputs could become accidentally skewed • An attacker could give you worst-case input • You are most at risk if you have algorithms with poor worst-case performance that you do not regularly experience • And it is easy to intentionally trigger that worst-case performance
Beer Garden: Theory • A defense against CPU-bound high-density attacks that target web applications and web services.
Beer Garden: Theory Ambitious Goals • Generic • Fully automated • Easy configuration • Security guarantees
Beer Garden: Theory General idea • Treat the server like a crowded beer garden • Doorman: “you have to pay to enter” • Limits the volume of attack requests that are admitted • Bouncer: “you need to leave now” • Limits the damage of admitted attack requests http://mikegagnon.com/provably_protecting_servers_from_high_density_resource_consumption_DoS_attacks.pdf
Beer Garden: Theory Operation during overload • One FastCGI worker process per core • Each worker can only handle 1 request at a time • (Also keep a few “spare” workers on deck) • Doorman keeps a queue of requests • Only forwards a request to a worker process if it is idle • If there are no idle workers, and a request has timed out, then ask the Bouncer to evict that request • During overloads, timeouts are very aggressive • Keep the queue short by insisting each visitor solve a computational “puzzle” • Signature service • Learns to identify the “density” of requests (real-time machine learning) • Doorman creates harder puzzles for suspicious requests • Bouncer • Kills (and restarts) workers when Doorman asks
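The queueing and eviction rules on this slide can be sketched as a small single-threaded simulation. This is an assumption-laden toy, not the nginx implementation: the `Doorman` and `Worker` classes, the explicit `now` clock, and the choice to evict the longest-running timed-out request are all illustrative, and the Bouncer's kill-and-restart is modeled simply by marking the worker idle again.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Worker:
    worker_id: int
    request: Optional[str] = None  # request being served, or None if idle
    started_at: float = 0.0


class Doorman:
    """Toy model: one request per worker, evict timed-out requests on demand."""

    def __init__(self, num_workers: int, timeout: float) -> None:
        self.workers = [Worker(i) for i in range(num_workers)]
        self.queue: Deque[str] = deque()
        self.timeout = timeout
        self.evicted: List[str] = []  # requests the Bouncer was asked to kill

    def admit(self, request: str) -> None:
        self.queue.append(request)

    def finish(self, worker_id: int) -> None:
        # Called when a worker completes its request normally.
        self.workers[worker_id].request = None

    def _idle_worker(self) -> Optional[Worker]:
        return next((w for w in self.workers if w.request is None), None)

    def _evict_timed_out(self, now: float) -> Optional[Worker]:
        expired = [
            w for w in self.workers
            if w.request is not None and now - w.started_at >= self.timeout
        ]
        if not expired:
            return None
        victim = min(expired, key=lambda w: w.started_at)  # longest running
        self.evicted.append(victim.request)
        victim.request = None  # stands in for "Bouncer kills and restarts"
        return victim

    def dispatch(self, now: float) -> None:
        # Forward queued requests only to idle workers; if none are idle,
        # evict a timed-out request to free a worker.
        while self.queue:
            worker = self._idle_worker() or self._evict_timed_out(now)
            if worker is None:
                return
            worker.request = self.queue.popleft()
            worker.started_at = now
```

Note that in this sketch a timed-out request is only evicted when another request is waiting, which matches the slide's condition that eviction happens when there are no idle workers.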
Beer Garden: Theory Logical flow of a bad request
Beer Garden: Theory Security Guarantees • During an attack: • At least 95% of legitimate requests will be serviced within 250 ms • At least 3,000 low-density requests can be serviced per second (assuming attacker can solve at most 30 puzzles a second) • Actual values depend on the application, available resources, and Beer Garden configuration • Use our “trainer” tool to determine security guarantees • Depends on assumptions http://mikegagnon.com/provably_protecting_servers_from_high_density_resource_consumption_DoS_attacks.pdf
Beer Garden: Implementation git clone git://github.com/mikegagnon/nginx-overload-handler.git
Beer Garden: Implementation Architecture
Beer Garden: Implementation Doorman module • Not yet implemented • Requirement: Hot path must be lightning fast to handle a high volume of requests (most exposed component) • nginx module • Classifies incoming HTTP requests using signatures • Gives JavaScript puzzles* in response to HTTP requests • The more suspicious a request is, the harder the puzzle is • Once the visitor solves the puzzle, put the request in the queue • If the queue gets too big, increase puzzle complexity • If the queue is non-empty: • If there is an idle worker, then forward a request to Load Balancer • If a worker has timed out, then forward a request to Load Balancer • Send copies of HTTP requests to the Request Cache • Signature service analyzes these requests to generate signatures • If there is a high volume of requests, then send samples • Send the first megabyte of the request along with the size of the request *Ari Juels and John Brainard, "Client Puzzles: A Cryptographic Countermeasure Against Connection Depletion Attacks," in Proceedings of NDSS '99 (Network and Distributed System Security Symposium), 1999.
Beer Garden: Implementation Load balancer module • Mostly implemented • nginx module • Only forwards requests to idle workers • Send alerts to kill workers, as needed • Let A = number of idle workers • Let B = number of “spare” workers • There should always be at least B idle workers. • A should be >= B • If A < B, then choose request that has been in the system the longest, and send alert for that worker to Alert Router. (That worker will be killed) • Nginx notifies Load Balancer every time a request completes. • Send “request complete” message for each successfully completed request to Alert Router. (So that Signature service can know which requests are low density)
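The spare-worker rule above (keep A >= B, otherwise alert on the longest-running request) can be sketched as a single decision function. The names `InFlight` and `choose_worker_to_kill` are hypothetical and the real module is written for nginx; this is only a model of the rule as stated on the slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class InFlight:
    worker_id: int
    request_id: str
    started_at: float  # time the request entered the system


def choose_worker_to_kill(
    num_workers: int,
    spare_workers: int,              # B on the slide
    in_flight: Dict[int, InFlight],  # busy workers, keyed by worker id
) -> Optional[int]:
    """Return the worker id to send an alert for, or None if A >= B."""
    idle = num_workers - len(in_flight)  # A on the slide
    if idle >= spare_workers or not in_flight:
        return None
    # A < B: pick the request that has been in the system the longest.
    oldest = min(in_flight.values(), key=lambda r: r.started_at)
    return oldest.worker_id
```

For example, with 4 workers, B = 1, and all 4 workers busy, the function returns the worker whose request started earliest; once one worker is idle again it returns `None`.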
Beer Garden: Implementation Alert Router • Mostly implemented • Python service • Reads messages from Load Balancer via named pipe • Sends messages via Thrift RPC • Receives two kinds of messages: • Alerts to kill workers • “Request complete” messages • Forwards alerts to: • Bouncer, so it can kill (and restart) the worker • Signature Service, so it knows which requests have high density • Forwards “request complete” messages to: • Signature Service, so it knows which requests have low density
Beer Garden: Implementation Bouncer Process Manager • Mostly implemented • One Bouncer per backend machine • Monitors worker processes • Restarts workers when they crash (or are killed) • Thrift service, implemented in Python • Receives alerts via Thrift RPC • When the Bouncer receives an alert, it kills the selected worker (because it timed out) • Automatically restarts it
Beer Garden: Implementation Signature Service • Request Cache • Will be implemented as an instance of memcached • Keeps a cache of text from HTTP requests • Signature Service • Will be implemented as a Thrift service in Python • The Alert Router tells the Signature Service which requests are high-density and which are low-density • The Signature Service periodically analyzes the recent examples of high- and low-density requests to learn their characteristics • Generates signatures for high-density requests and submits them to Doorman • Requirements: • Classifying requests using signatures must be lightning fast • Code to classify requests must either exist in C or be sufficiently simple (so I can implement it in C) • Generating signatures must not be too slow • Analyze relevant features, develop good signatures • Machine learning algorithms TBD
Appendix 1: Near-term solutions Backup algorithms • Complementary to Beer Garden • When overload occurs, flip a switch that replaces algorithms that have poor worst-case performance with algorithms that have good worst-case performance • What kind of algorithms? • Approximate algorithms • Algorithms that are less complete • Algorithms that have poor average-case performance • Algorithms that exhibit worst-case performance under different conditions
Appendix 1: Near-term solutions Randomized algorithms • Let’s say you must always use an algorithm that has bad worst-case performance • Is it easy to intentionally trigger worst-case performance? • Can you make it hard to intentionally trigger worst-case performance? • Examples: • Shuffle before quicksort • Randomize hash seed
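A small sketch of the "shuffle before quicksort" example: a deliberately naive quicksort that always picks the first element as pivot hits its quadratic worst case on already-sorted input, while shuffling first makes that worst case hard to trigger on purpose. The comparison counter and function names are illustrative assumptions; `random.SystemRandom` is used so an attacker cannot predict the shuffle from a known seed.

```python
import random


def quicksort(items, counter):
    """Naive quicksort (first element as pivot); counter[0] tallies comparisons."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller, larger = [], []
    for x in rest:
        counter[0] += 1  # one comparison against the pivot
        (smaller if x < pivot else larger).append(x)
    return quicksort(smaller, counter) + [pivot] + quicksort(larger, counter)


def shuffled_quicksort(items, counter, rng=None):
    """Shuffle a copy of the input first, so pivot quality no longer
    depends on the order the caller (or attacker) supplied."""
    rng = rng or random.SystemRandom()
    copy = list(items)
    rng.shuffle(copy)
    return quicksort(copy, counter)


data = list(range(300))  # sorted input: worst case for first-element pivot

plain, shuffled = [0], [0]
quicksort(data, plain)
shuffled_quicksort(data, shuffled)
print("comparisons without shuffle:", plain[0])  # n(n-1)/2 = 44850
print("comparisons with shuffle:   ", shuffled[0])
```

The shuffled count varies from run to run but is typically more than ten times smaller at this input size. The same reasoning motivates randomizing a hash seed: the worst case still exists, but the attacker can no longer construct it offline.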
Appendix 1: Near-term solutions Approximate Beer Garden • Beer Garden is ambitious • Generic defense • Fully automated • Easy configuration • Security guarantees • An application-specific approximation of Beer Garden will be much easier to implement and still be valuable in practice • Approximate Signature Service • Heuristically detect high-density requests • Which requests in your app have potential for high density? • Allow admin to manually specify signatures during emergencies • Approximate Doorman: try to allocate resources “securely” • Give logged in users preference • Each “identity” (IP address or username) gets certain number of requests per minute • Give non-suspicious requests preferential treatment. For example: • Quarantine suspicious requests: if you have 10 backend machines, send the suspicious requests to 1 designated backend. Send all other requests to the remaining 9. • Approximate Bouncer • During overloads increase aggressiveness of timeouts
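Two of the "Approximate Doorman" ideas above, a per-identity request budget and quarantining suspicious requests on one designated backend, can be sketched as follows. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, the limiter uses a simple fixed one-minute window, and how a request gets flagged as suspicious is left to the application.

```python
from collections import defaultdict
from typing import List


class PerIdentityLimiter:
    """Each identity (IP address or username) gets a fixed number of
    requests per one-minute window."""

    def __init__(self, max_per_minute: int) -> None:
        self.max_per_minute = max_per_minute
        # (identity, window index) -> requests seen; a real deployment
        # would also purge old windows, which this sketch omits.
        self.counts = defaultdict(int)

    def allow(self, identity: str, now: float) -> bool:
        key = (identity, int(now // 60))
        if self.counts[key] >= self.max_per_minute:
            return False
        self.counts[key] += 1
        return True


def pick_backend(suspicious: bool, backends: List[str], request_number: int) -> str:
    """Send suspicious requests to the first backend (the quarantine) and
    spread everything else round-robin over the remaining backends."""
    quarantine, normal = backends[0], backends[1:]
    if suspicious:
        return quarantine
    return normal[request_number % len(normal)]
```

With 10 backends, a flood of suspicious requests can overload at most the one quarantine machine, while non-suspicious traffic keeps the other 9.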
Appendix 1: Near-term solutions Service Oriented Arch. • Services provide performance isolation • Instead of embedding “dangerous” algorithms in application code, put each in a separate service. • E.g. a “quicksort” service • If that service gets overloaded, then that feature is no longer available • But everything else should work • Application should be developed to gracefully handle crashed services
Appendix 1: Near-term solutions Related Work • For other ideas, see the related work section in http://mikegagnon.com/provably_protecting_servers_from_high_density_resource_consumption_DoS_attacks.pdf
Appendix 2: Examples Linux-kernel vulnerability • Attack packets cause collisions in a hash table in the Linux kernel • Hash table operations are normally O(1) • During an attack they are O(n) • http://www.enyo.de/fw/security/notes/linux-dst-cache-dos.html [Figure: attack packets pass from the network device driver to the routing decision, which uses a routing cache implemented as a hash table, and are then forwarded or delivered]
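The O(1) versus O(n) behavior can be reproduced with a toy chained hash table. This is not the Linux routing cache or its hash function; the `key % num_buckets` hash, the 64 buckets, and the 640 keys are assumptions chosen so that the collision pattern is easy to see.

```python
class ChainedHashTable:
    """Toy hash table with separate chaining and a predictable hash."""

    def __init__(self, num_buckets: int = 64) -> None:
        self.buckets = [[] for _ in range(num_buckets)]
        self.comparisons = 0  # key comparisons performed during inserts

    def _index(self, key: int) -> int:
        return key % len(self.buckets)  # predictable, so collisions can be forced

    def insert(self, key: int) -> None:
        bucket = self.buckets[self._index(key)]
        for existing in bucket:  # walk the chain looking for the key
            self.comparisons += 1
            if existing == key:
                return
        bucket.append(key)


def comparisons_for(keys) -> int:
    table = ChainedHashTable()
    for key in keys:
        table.insert(key)
    return table.comparisons


benign = range(640)                    # spreads evenly: 10 keys per bucket
attack = [i * 64 for i in range(640)]  # every key lands in bucket 0

print("benign comparisons:", comparisons_for(benign))  # 64 * 45 = 2880
print("attack comparisons:", comparisons_for(attack))  # 639 * 640 / 2 = 204480
```

The same number of inserts costs about 70 times more work under attack here, and the gap grows with the number of colliding keys, which is what makes the traffic high-density.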
Appendix 2: Examples Wikipedia high-density accident (1/2) • On June 25, 2009, rumors of Michael Jackson’s death led to an increase in traffic to his Wikipedia page • Because Jackson’s page contained an unusually complex subsection, rendering the page caused Wikipedia’s servers to consume an excessive amount of CPU resources, leading to a site-wide DoS.
Appendix 2: Examples Wikipedia high-density accident (2/2) http://dom.as/2009/06/26/embarrassment/ http://blog.wikimedia.org/2009/06/25/current-events/ A negligible increase in network traffic (300 packets per second) caused CPU usage to go over capacity, resulting in a DoS
Appendix 2: Examples Floating point bug • A bug in both the Java and PHP language runtimes • If you tried to parse a particular string as a floating-point number, it would cause an infinite loop • Practical significance: unauthenticated users can cause any Java or PHP web application to hang by giving it a particular floating-point value in the header • PHP runtime: 545-line function zend_strtod • Source code for zend_strtod is almost correct • But the compiled code performs double-precision arithmetic on an extended-precision number • The number converges before it is sufficiently precise (an erroneous fixed point) • The bug fix simply declares the variable as volatile • This forces the use of double-precision numbers • Loop structure: for (;;) { incrementally adjust number until it is sufficiently precise }