Detection of Interactive Stepping Stones

Shobha Venkataraman (shobha@cs.cmu.edu)
Joint work with Avrim Blum & Dawn Song
Carnegie Mellon University
ICML Workshop 2006, June 29, 2006

[Figure: attacker A reaches victim V through a chain of stepping stones X1, …, Xk]
• Stepping-stone attack: the attacker uses a chain of compromised machines to reach the victim
• Difficult to find the attacker by looking only at the victim
• The victim sees only the last host in the chain

Why stepping-stones?
• Stepping stones are attractive to attackers:
  • Hosts on the Internet are easy to compromise
  • Stepping stones are difficult to detect:
    • We don't know when a host is compromised, only when there is an attack
    • We don't know who compromised it
    • Internet traffic is chaotic and voluminous, and not always logged
  • The true attacker is almost untraceable: a near-perfect way to achieve anonymity!
• Large-scale stepping stones: botnets…

Botnets: “For sale, stepping-stones”
• Botnets: sets of compromised hosts controlled by a single “command center”
• How this works:
  • Individual hosts are compromised
  • “Control privileges” are sold to other attackers, who use them to launch attacks
• Nearly impossible to discover the true attackers
• Extremely prevalent on the Internet:
  • Logs at a CMU department revealed a Gaobot infection
  • Across 6-7 months of traffic (everything we examined!)
  • Across 100+ hosts (1/10th of the network) at peak infection

Botnets (II)
[Figure: attackers #1 and #2 and a victim, with botnet hosts at CMU, Pittsburgh Verizon DSL, and Stanford]

General Stepping-Stone Detection
Extremely difficult:
• Indefinite delay between stepping-stone “legs”
• Traffic too voluminous or insufficiently logged for traceback
• Packets encrypted or padded between “legs”
• Stepping-stone “legs” additionally masked by adding superfluous traffic (“chaff”)

Restricting the Problem
[Figure: attacker A and victim V across the Internet; hosts X1, X2; streams S1, S2, M1, M2; time axis starting at T = 0]
• Restrictions:
  • Traffic monitoring done at routers/gateways
  • Interactive stepping-stone streams
  • Bounded delay between stepping stones

Restricting the Problem (II)
[Figure: attacker A and victim V across the Internet; hosts X1, X2; streams S1, S2; time axis starting at T = 0]
• Restrictions put together: observe 2 time-delayed streams at a monitor; are they a stepping-stone pair?
  • If the attacker uses no chaff
  • If the attacker uses chaff

Prior work
• Donoho, Flesia, Shankar, Paxson, Coit & Staniford, RAID 02
  • Assumptions needed:
    • Attack stream from a Poisson or Pareto distribution
    • Normal users perfectly uncorrelated
  • No guarantees on monitoring time or false positives
• Wang & Reeves, CCS 03
  • Assumptions needed:
    • Timing perturbation of packets iid [strong assumption]
    • No chaff
  • Scheme breaks without these assumptions
• Other related work: [SH95, YE00, ZP00, WRWY01, WRW02, W04]

Our work
• Want to allow correlations among normal users
  • Don't flag just any correlated pair: a time-correlated pair ≠ a stepping-stone pair
• Use milder assumptions:
  • Model non-attack streams as sequences of Poisson processes
  • No additional assumption on the attacker
  • Allow chaff
• Present algorithms and analysis for these models

Inspiration from learning theory
• Learning-theory question: how many examples do we need to see before we can identify hypotheses with guaranteed confidence?
• Our question: how many packets do we need to see before we can identify normal/attack streams with guaranteed confidence?
• Rest of talk: answer this question…

Outline
• Problem definition
• Without chaff
  • Simple Poisson model
  • Generalized Poisson model
• With chaff
  • Algorithms
  • Hardness-of-detection results
• Conclusions

Problem Definition (I)
• Set-up: a stepping-stone monitor tracks the number of packets in streams S1, S2 up to time $t$: $N_1(t)$, $N_2(t)$
• Assumptions:
  • Packets correspond 1-1 on stepping-stone streams (without chaff)
  • A maximum tolerable delay bound $\Delta$ exists
  • A maximum number of packets the attacker can send in time $\Delta$ exists: $p_\Delta$
• Our bounds will be in terms of $p_\Delta$.

Problem Definition (II)
• For stepping-stone streams S1 and S2:
  1. Every packet on S2 comes from S1: $N_1(t) \ge N_2(t)$
  2. Every packet on S1 appears on S2 within time $\Delta$: $N_1(t) \le N_2(t + \Delta)$
• Assumptions on normal streams next…
• Detect stepping-stone pairs with guarantees on:
  • Monitoring time $M$: total packets observed on both streams before detection
  • False-positive probability $\varepsilon$
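The two conditions above can be checked directly against packet timestamps. Below is a minimal sketch under our own assumptions. The helper names `count_upto` and `satisfies_stepping_stone_conditions` are hypothetical. The sketch assumes complete timestamps, no chaff, and that S2 has been observed for at least $\Delta$ after the last S1 packet. It only tests the definition; it is not the detection algorithm.

```python
from bisect import bisect_right
from typing import List, Sequence


def count_upto(times: List[float], t: float) -> int:
    """N(t): number of packets with timestamp <= t (times must be sorted)."""
    return bisect_right(times, t)


def satisfies_stepping_stone_conditions(
    s1: Sequence[float], s2: Sequence[float], delta: float
) -> bool:
    """Check N1(t) >= N2(t) and N1(t) <= N2(t + delta) for all t.

    Assumes S2 has been observed until at least delta after the last S1 packet.
    """
    a1, a2 = sorted(s1), sorted(s2)
    # Condition 1: N2 only jumps at S2 arrivals and N1 is non-decreasing,
    # so the worst case on each interval is at an S2 arrival time.
    for t in a2:
        if count_upto(a1, t) < count_upto(a2, t):
            return False
    # Condition 2: N1 only jumps at S1 arrivals and N2(t + delta) is
    # non-decreasing, so the worst case is at an S1 arrival time.
    for t in a1:
        if count_upto(a1, t) > count_upto(a2, t + delta):
            return False
    return True
```

For example, S1 packets at 0, 1, 2 relayed at 0.5, 1.5, 2.5 satisfy both conditions with $\Delta = 1$. Delaying the last relay to 3.5 violates condition 2.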

Simple Poisson Model
• Assumptions:
  • Normal stream: Poisson process with a fixed rate (generalized later)
  • $p_\Delta$ is known (relaxed later)
  • No chaff (generalized later)
• Outline:
  • Algorithm
  • Analysis sketch
  • Relaxing knowledge of $p_\Delta$

Algorithm
• Observe $y$ packets on the union of streams S1 and S2
• Compute the difference in number of packets, $d = N_1 - N_2$
• If $d \notin [-p_\Delta, p_\Delta]$, return NORMAL
• Repeat the above procedure over $x$ iterations
• Return ATTACK if $d$ lies in $[-p_\Delta, p_\Delta]$ throughout
• Thm: with $x = \log(1/\varepsilon)$ and $y = 2(p_\Delta + 1)^2$:
  • Monitoring time $M = xy = O(p_\Delta^2 \log(1/\varepsilon))$ packets
  • False-positive probability $< \varepsilon$
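The following is a sketch of the counting algorithm for a single pair. It assumes the monitor sees the stream label (1 for S1, 2 for S2) of every packet on the union, in arrival order. It checks $d$ after every packet, which is our reading of "throughout". The function name is hypothetical. We take the logarithm base 2 so that $x$ halvings of the failure probability reach $\varepsilon$.

```python
import math
from typing import Iterable


def detect_stepping_stone(labels: Iterable[int], p_delta: int, eps: float) -> str:
    """Sketch of the counting detector for one pair of streams (S1, S2).

    labels: stream label (1 or 2) of each packet on S1 ∪ S2, in arrival order.
    Returns "NORMAL", "ATTACK", or "UNDECIDED" if the trace ends too early.
    """
    x = math.ceil(math.log2(1 / eps))  # number of rounds (base-2 log: assumption)
    y = 2 * (p_delta + 1) ** 2          # packets per round
    budget = x * y                      # monitoring time M = x * y
    d = 0                               # d = N1 - N2 over the packets seen so far
    for seen, label in enumerate(labels, start=1):
        d += 1 if label == 1 else -1
        if abs(d) > p_delta:            # impossible for a chaff-free stepping-stone pair
            return "NORMAL"
        if seen == budget:              # stayed in range for all x rounds
            return "ATTACK"
    return "UNDECIDED"
```

For example, $p_\Delta = 10$ and $\varepsilon = 0.01$ give $x = 7$ rounds of $y = 242$ packets, so $M = 1694$ packets. Splitting the budget into rounds does not change what the loop checks. The rounds matter only for the analysis.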

Analysis (I)
• Overhead:
  • Only per-stream packet counters running all the time!
  • Compute sums and differences for pairs once in a while
  • Algorithm needs NO knowledge of the Poisson rates
• Any stepping-stone pair sending $M$ packets is reported:
  • For a stepping-stone pair, $d$ stays within $[-p_\Delta, p_\Delta]$
  • If $|d| > p_\Delta$, some packet violates the maximum delay bound
• Ensure that the false-positive probability is less than $\varepsilon$, i.e. $d$ leaves $[-p_\Delta, p_\Delta]$ with probability more than $1 - \varepsilon$
  • When $d$ leaves $[-p_\Delta, p_\Delta]$, the algorithm returns NORMAL

Analysis (II)
[Figure: packet arrivals on Stream 1 and Stream 2 over time]
• Streams S1 and S2 are Poisson processes with rates $\lambda_1, \lambda_2$ (normalized so that $\lambda_1 + \lambda_2 = 1$)
• On the union of the streams, each packet has:
  • a $\lambda_1$ chance of coming from S1
  • a $\lambda_2$ chance of coming from S2
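The claim that each packet comes from S1 with chance $\lambda_1$ can be checked by simulation. This is a minimal sketch using the hypothetical helper `merged_labels` and the standard-library `random.expovariate` for exponential inter-arrival times. The rates 0.6 and 1.4 are unnormalized, so the expected fraction of packets from S1 is 0.6 / (0.6 + 1.4) = 0.3.

```python
import random
from typing import List


def merged_labels(rate1: float, rate2: float, horizon: float,
                  rng: random.Random) -> List[int]:
    """Simulate two independent Poisson processes on [0, horizon) and return
    the stream label (1 or 2) of each packet on the union, in arrival order."""
    events = []
    for label, rate in ((1, rate1), (2, rate2)):
        t = rng.expovariate(rate)          # exponential gap to the first packet
        while t < horizon:
            events.append((t, label))
            t += rng.expovariate(rate)
    events.sort()                          # merge by arrival time
    return [label for _, label in events]


rng = random.Random(0)
labels = merged_labels(0.6, 1.4, 10_000.0, rng)
frac1 = labels.count(1) / len(labels)      # should be close to 0.3
```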

Analysis (III)
[Figure: the difference Z drawn as a walk over the values −2 … 2 as packets arrive on Stream 1 and Stream 2]
• Let $Z$ be the difference in the number of packets on S1 and S2
• Every time a packet appears on S1 ∪ S2:
  • $Z \leftarrow Z + 1$ with probability $\lambda_1$
  • $Z \leftarrow Z - 1$ with probability $\lambda_2$
• Thus, $Z$ is equivalent to a 1-d random walk
• Need $Z$ to exit $[-p_\Delta, p_\Delta]$ after some number of steps

Analysis (IV)
• Fact: a 1-d random walk exits a bounded region of length $t$ in expected $O(t^2)$ time!
• Therefore:
  • When $n = O(p_\Delta^2)$, by Markov's inequality $\Pr[Z \text{ stays in the bounded region for } n \text{ steps}] < 1/2$
  • Repeat for $m = \log(1/\varepsilon)$ iterations: $\Pr[Z \text{ stays in the bounded region}] < \varepsilon$
• When $Z$ exits the bounded region, a normal pair does not get falsely accused. Done!
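This explains one way $y = 2(p_\Delta + 1)^2$ can arise. Consider a symmetric walk ($\lambda_1 = \lambda_2 = 1/2$) started at $k$ inside $[-p_\Delta, p_\Delta]$. Its expected time to reach $\pm(p_\Delta + 1)$ is exactly $(p_\Delta + 1)^2 - k^2 \le (p_\Delta + 1)^2$. Markov's inequality then gives $\Pr[T > y] \le 1/2$ in every round, and $\log_2(1/\varepsilon)$ rounds push this to at most $\varepsilon$.

The Monte Carlo sketch below checks the mean exit time and the per-round stay probability for $p_\Delta = 3$. It uses a hypothetical helper `exit_time`, covers the symmetric case only, and fixes the seed.

```python
import random


def exit_time(p_delta: int, prob_up: float, rng: random.Random, max_steps: int) -> int:
    """Steps until the walk Z (started at 0) first leaves [-p_delta, p_delta].

    Returns max_steps + 1 if the walk is still inside after max_steps steps.
    """
    z = 0
    for step in range(1, max_steps + 1):
        z += 1 if rng.random() < prob_up else -1
        if abs(z) > p_delta:
            return step
    return max_steps + 1


rng = random.Random(1)
p_delta = 3
y = 2 * (p_delta + 1) ** 2                             # 32 steps per round
times = [exit_time(p_delta, 0.5, rng, 10_000) for _ in range(5_000)]
mean_exit = sum(times) / len(times)                    # theory: (p_delta + 1) ** 2 = 16
stay_prob = sum(t > y for t in times) / len(times)     # Markov bound: at most 1/2
```

In practice the simulated stay probability is far below the Markov bound of 1/2. The bound is loose but needs nothing beyond the expected exit time.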

What if p is unknown? • What if we do not know p? • Use “guess and double” strategy. • Set pj = 2j. • Run algorithm over sequence of pj: p1, p2, … • When a pair is “cleared” for pj, examine it with respect to pj+1..

What if p is unknown? • For stepping-stone pair, increases monitoring time by O(log log p). • Guarantee depends only on true value of p! In practice, set upper bound for p • Normal streams monitored until upper bound reached • As j increases, test differences exponentially less often • Fundamental problem: cannot distinguish between normal pair and attack pair with longer delay bound

Summary: Simple Poisson
• Normal streams: Poisson process with a single fixed rate
• Algorithm with guaranteed false positives and monitoring time
• Algorithm needs no knowledge of the Poisson rates
• Analysis extended:
  • When $p_\Delta$ is unknown
  • When the false-positive probability is distributed over all pairs of streams (in paper)

Generalized Poisson model
• Model a normal process as a SEQUENCE of Poisson processes, with varying rates for varying time periods, i.e. a stream is given by $(\lambda_1, t_1), (\lambda_2, t_2), \ldots$
• General model: can coarsely approximate almost any usage pattern, for example:
  • Coarsely simulate Pareto distributions, a good model of typing patterns
  • Correlated users: the same sequence of Poisson rates and time intervals

Analysis Sketch
• Formally, a stream S is given by $(\lambda_1, t_1), (\lambda_2, t_2), \ldots$
• Key observation: at time $T$, the packet distribution is equivalent to a Poisson process with a single fixed rate $\sum_j \lambda_j t_j / T$ (weighted mean)
• More details in paper.
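The key observation can be checked numerically. Consider a stream that runs at rate 2 for 5 time units and then at rate 0.5 for 10 units. Its packet count at $T = 15$ should be Poisson with mean $\sum_j \lambda_j t_j = 15$, i.e. rate 1, so both the sample mean and the sample variance should be close to 15. This is a minimal sketch with hypothetical helper names and a fixed seed. It relies on the memorylessness of exponential gaps to restart each segment.

```python
import random
from typing import List, Tuple

Schedule = List[Tuple[float, float]]  # [(rate lambda_j, duration t_j), ...]


def weighted_mean_rate(schedule: Schedule) -> float:
    """sum_j lambda_j * t_j / T, where T is the total duration."""
    total_time = sum(t for _, t in schedule)
    return sum(lam * t for lam, t in schedule) / total_time


def piecewise_poisson_count(schedule: Schedule, rng: random.Random) -> int:
    """Number of packets sent by a stream with piecewise-constant Poisson rate."""
    count = 0
    for lam, duration in schedule:
        if lam <= 0:
            continue  # silent period: no packets
        # Exponential gaps are memoryless, so each segment can restart at 0.
        t = rng.expovariate(lam)
        while t < duration:
            count += 1
            t += rng.expovariate(lam)
    return count


schedule = [(2.0, 5.0), (0.5, 10.0)]
rng = random.Random(2)
trials = [piecewise_poisson_count(schedule, rng) for _ in range(4_000)]
mean_count = sum(trials) / len(trials)
var_count = sum((c - mean_count) ** 2 for c in trials) / (len(trials) - 1)
```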

Summary: General Poisson
• Normal streams modelled as sequences of Poisson processes $(\lambda_1, t_1), (\lambda_2, t_2), \ldots$: a very general model
• Algorithm with guarantees on monitoring time and false-positive rate
• Once again, the algorithm needs no knowledge of the Poisson rates
• Results in this model extended similarly:
  • When $p_\Delta$ is unknown
  • When the false-positive probability is distributed over all pairs of streams

Chaff
[Figure: attacker, stepping stone, victim]
• Chaff: dummy packets inserted into traffic streams to avoid detection
• The algorithms (as presented) are broken by a single packet of chaff
• Next, modify the algorithms to handle limited chaff…

Chaff: Algorithms
• Fix the chaff rate, but allow the chaff to be arbitrarily distributed
• Simple Poisson model:
  • Let $y$ be the number of packets needed before the random walk exits the bounded region
  • Allow a chaff rate of $p_\Delta/(4y)$; monitor for the difference to leave $[-2p_\Delta, 2p_\Delta]$
  • Normal streams still leave this wider region; we just wait longer
  • Can tweak the algorithm to handle a slightly higher chaff rate, but that's all. Hardness results next…
• Extends similarly to the general Poisson model.
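This small example shows why one chaff packet breaks the original test and how the widened band absorbs it. The trace is hypothetical. A true stepping-stone pair has $p_\Delta = 2$ packets in flight, the attacker adds one chaff packet on S1 that is never relayed, and the streams then continue in lockstep. Only the band check is sketched; the chaff rate $p_\Delta/(4y)$ and the longer monitoring time are not modeled.

```python
from typing import Iterable


def stays_in_band(labels: Iterable[int], band: int) -> bool:
    """True if d = N1 - N2 never leaves [-band, band] over the given packets."""
    d = 0
    for label in labels:
        d += 1 if label == 1 else -1
        if abs(d) > band:
            return False
    return True


p_delta = 2
# Hypothetical stepping-stone trace: two S1 packets in flight (d = p_delta, legal),
# one chaff packet on S1 (d = 3), the two real packets relayed on S2, then lockstep.
# The chaff is never relayed, so d stays shifted up by 1 afterwards.
trace = [1, 1, 1, 2, 2] + [1, 2] * 10

strict = stays_in_band(trace, p_delta)        # original band [-p_delta, p_delta]
widened = stays_in_band(trace, 2 * p_delta)   # chaff-tolerant band [-2p_delta, 2p_delta]
```

Here `strict` is `False`: the pair would be wrongly cleared as normal. `widened` is `True`.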

Hardness of Detection
• No algorithm based on timing delays alone can detect stepping stones when chaff is used smartly
• Can give bounds on the chaff needed so that the attacker can:
  • pre-generate two independent processes
  • send packets that mimic the independent processes exactly
• Details and strategies in paper
• If the attacker can actively send such chaff, detection requires the use of other information

Summary
• Algorithms to detect stepping stones:
  • Guarantees on monitoring time and false positives
  • Simple and generalized Poisson models
  • With and without (arbitrarily distributed) chaff
  • When $p_\Delta$ is known/unknown
• Compared to previous work:
  • Milder assumptions; allow for substantial correlation among normal users
  • No additional assumptions on the attacker (besides the delay bound)
• With sufficient chaff, an attacker can mask stepping stones so that no algorithm using inter-packet delays can detect them.
