Windows Heap Exploitation (Win2K SP0 through WinXP SP2) • Original CanSecWest 04 presentation: Matt Conover & Oded Horovitz • XP SP2 additions added/presented by Matt Conover @ SyScan 2004
Agenda • “Practical” Windows heap internals • How to exploit Win2K – WinXP SP1 heap overflows • 3rd party (me) assessment of WinXP SP2 improvements • How to exploit WinXP SP2 heap overflows • Summary
Windows Heap Internals • Many heaps can coexist in one process (normally 2-3) • (Diagram: the PEB references the Default Heap and a 2nd Heap)
Windows Heap Internals • Important heap structures • Segments • Segment List • Virtual Allocation list • Free Lists • Lookaside List
Windows Heap Internals • Introduction to Free Lists • 128 doubly-linked lists of free chunks (from 8 bytes to 1024 bytes) • Chunk size is table row index * 8 bytes • Entry [0] is a variable-sized free list that contains buffers of 1KB <= size < 512KB, sorted in ascending order • (Diagram: example chunk sizes 1400, 2000, 2000, 2408 on the variable-sized list; 16, 16 and 48, 48 on fixed-size lists)
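The index arithmetic on this slide can be sketched in a few lines of Python. This is a minimal model rather than heap code: the name `free_list_index` and the constants are illustrative, and it assumes chunk sizes are always multiples of 8 bytes.

```python
GRANULARITY = 8        # bytes per table row, per the slide
NUM_FREE_LISTS = 128   # FreeLists[0] .. FreeLists[127]


def free_list_index(chunk_size: int) -> int:
    """Return the free list index for a free chunk of chunk_size bytes.

    Chunks of 1KB and above land on the variable-sized list, index 0.
    """
    if chunk_size <= 0 or chunk_size % GRANULARITY:
        raise ValueError("chunk sizes are positive multiples of 8 bytes")
    index = chunk_size // GRANULARITY
    return index if index < NUM_FREE_LISTS else 0


# The sizes shown in the slide's diagram.
indexes = {size: free_list_index(size) for size in (16, 48, 1400, 2000, 2408)}
```

In this model the 16-byte and 48-byte chunks map to rows 2 and 6, while every size from the diagram's top row maps to entry [0].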
Windows Heap Internals • Lookaside Table • Used for “fast” allocates and deallocates when available • Starts empty • 128 singly-linked lists of busy chunks (free but left marked as busy) • (Diagram: example lookaside entries of 16, 48 and 48 bytes)
Windows Heap Internals • Why have lookasides at all? Speed! • Singly-linked • Used to quickly allocate or deallocate • No coalescing (leads to fragmentation) • So the lookaside lists “fill up” quickly (4 entries)
Windows Heap Internals • Basic chunk structure – 8 bytes • Fields in order (byte offsets 0 to 8): Self Size, Previous chunk size, Segment Index, Flags, Unused bytes, Tag index (Debug) • Flags: 01 – Busy, 02 – Extra present, 04 – Fill pattern, 08 – Virtual Alloc, 10 – Last entry, 20 – FFU1, 40 – FFU2, 80 – No coalesce • (Arrow in diagram: overflow direction)
Windows Heap Internals • Free chunk structure – 16 bytes • Same 8-byte header (Self Size, Previous chunk size, Segment Index, Flags, Unused bytes, Tag index (Debug)) followed by the Next chunk and Previous chunk pointers
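To make the header layout concrete, here is a small Python sketch that decodes an 8-byte chunk header. The field widths (two 16-bit sizes followed by four single bytes) are an assumption chosen to fill the 8 bytes the slide states, as are the little-endian byte order and the idea that sizes are counted in 8-byte units; `parse_chunk_header` is an illustrative name.

```python
import struct

# Flag values exactly as listed on the slide.
FLAG_NAMES = {
    0x01: "Busy",
    0x02: "Extra present",
    0x04: "Fill pattern",
    0x08: "Virtual Alloc",
    0x10: "Last entry",
    0x20: "FFU1",
    0x40: "FFU2",
    0x80: "No coalesce",
}


def parse_chunk_header(raw: bytes) -> dict:
    """Decode an 8-byte chunk header (assumed layout: HHBBBB, little-endian)."""
    size, prev_size, seg_index, flags, unused, tag = struct.unpack("<HHBBBB", raw[:8])
    return {
        "size_bytes": size * 8,            # assumption: sizes are in 8-byte units
        "prev_size_bytes": prev_size * 8,
        "segment_index": seg_index,
        "flags": [name for bit, name in FLAG_NAMES.items() if flags & bit],
        "unused_bytes": unused,
        "tag_index": tag,
    }


# Example: a busy 16-byte chunk that is the last entry in its segment.
example = parse_chunk_header(struct.pack("<HHBBBB", 2, 1, 0, 0x11, 0, 0))
```

Decoding the flags into names makes it easy to see which bits an overflow would have to preserve or clear.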
Windows Heap Internals • Allocation algorithm (high level) • If size >= 512K, virtual memory is used (not on heap) • If < 1K, first check the Lookaside lists. If there are no free entries on the Lookaside, check the matching free list • If >= 1K or no matching entry was found, use the heap cache (not discussed in this presentation) • If >= 1K and no free entry in the heap cache, use FreeLists[0] (the variable-sized free list) • If still can’t find any free entry, extend heap as needed
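The decision order above can be written down as a small Python function. This is a literal reading of the slide's bullets, not the real allocator: the boolean parameters stand in for "an entry is available" and are hypothetical, as is the name `allocation_source`.

```python
KB = 1024


def allocation_source(size, lookaside_hit=False, free_list_hit=False,
                      cache_hit=False, freelist0_hit=False):
    """Return which source would satisfy a request, following the slide's order."""
    if size >= 512 * KB:
        return "virtual memory"
    if size < KB:
        if lookaside_hit:
            return "lookaside"
        if free_list_hit:
            return "free list"
    # Reached for sizes >= 1K, or when no matching small entry was found.
    if cache_hit:
        return "heap cache"
    if size >= KB and freelist0_hit:
        return "FreeLists[0]"
    return "extend heap"
```

Note that the lookaside is consulted before the free list for small sizes, which is the property the later exploitation slides rely on.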
Windows Heap Internals • Allocate algorithm – FreeLists[0] • This is usually what happens for chunk sizes > 1K • FreeLists[0] is sorted from smallest to biggest • Check FreeLists[0]->Blink (the biggest block) to see if it is big enough • Then return the smallest free entry from FreeLists[0] that fulfills the request, like this: • while (Entry->Size < NeededSize) Entry = Entry->Flink
Windows Heap Internals • Allocate algorithm – Virtual Allocate • Used when ChunkSize > VirtualAlloc threshold (508K) • Virtual allocate header is placed at the beginning of the buffer • Buffer is added to busy list of virtually allocated buffers (this is what Halvar’s VirtualAlloc overwrite is faking)
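A minimal Python sketch of that search, assuming the sorted list is represented as a plain list of chunk sizes (the last element plays the role of `FreeLists[0]->Blink`). The name `find_in_freelist0` is illustrative.

```python
def find_in_freelist0(sorted_sizes, needed_size):
    """Return the position of the smallest entry that fits, or None.

    sorted_sizes is in ascending order, like FreeLists[0].
    """
    # Check the biggest block first: if it is too small, nothing fits.
    if not sorted_sizes or sorted_sizes[-1] < needed_size:
        return None
    # Walk forward (Entry = Entry->Flink) until an entry is big enough.
    position = 0
    while sorted_sizes[position] < needed_size:
        position += 1
    return position


free_list_0 = [1400, 2000, 2000, 2408]   # sizes from the earlier diagram
```

Checking the tail first avoids walking the whole list when the request cannot be satisfied.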
Windows Heap Internals • Free Algorithm (high level) • If the chunk < 512K, it is returned to a lookaside or free list • If the chunk < 1K, put it on the lookaside (can only hold 4 entries) • If the chunk < 1K and the lookaside is full, put it on the free list • If the chunk > 1K put it on heap cache (if present) or FreeLists[0]
Windows Heap Internals • Free Algorithm – Free to Lookaside • Free buffer to Lookaside list only if: • The lookaside is available (e.g., present and unlocked) • Requested size is < 1K (to fit the table) • Lookaside is not “full” yet (no more than 3 entries already) • To add an entry to the Lookaside: • Put it at the head of the Lookaside • Point it to the former head of the Lookaside • Keep the buffer flags set to busy (to prevent coalescing)
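The lookaside rules above amount to a bounded LIFO stack. The following Python sketch models one lookaside list under that reading; the class name `Lookaside` and its methods are illustrative, and availability/locking is not modeled.

```python
class Lookaside:
    """Toy model of one singly-linked lookaside list (at most 4 entries)."""

    MAX_DEPTH = 4

    def __init__(self):
        self.entries = []   # last element is the list head

    def push(self, chunk) -> bool:
        """Free to the lookaside; False means the caller must use the free list."""
        if len(self.entries) >= self.MAX_DEPTH:
            return False
        self.entries.append(chunk)   # new entry becomes the head
        return True

    def pop(self):
        """Allocate from the head, or None when the list is empty."""
        return self.entries.pop() if self.entries else None


lookaside = Lookaside()
results = [lookaside.push(f"chunk{i}") for i in range(5)]
```

The fifth free is rejected, which is the "fill the lookaside first" condition that forces a chunk onto the free list where coalescing can happen.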
Windows Heap Internals • Free Algorithm – Coalesce • Step 1: Buffer free (B, between A and C) • Step 2: Buffer removed from free list (A + B coalesced) • Step 3: Buffer removed from free list (A + B + C coalesced) • Step 4: Buffer placed back on the free list
Windows Heap Internals • Free Algorithm – Coalesce • Where coalesce cannot happen: • Chunk to be freed is virtually allocated • Chunk to be freed will be put on Lookaside • Chunk to be coalesced with is busy • Highest bit in chunk flags is set • …
Windows Heap Internals • Free Algorithm – Coalesce (cont) • Where coalesce cannot happen: • Chunk to be freed is first: no backward coalesce • Chunk to be freed is last: no forward coalesce • The size of the coalesced chunk would be >= 508K
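A toy Python model of the coalescing step may help. It covers only three of the listed conditions (busy neighbours, first/last chunk, and the 508K limit) and represents the heap as a list of `[size, busy]` pairs in address order; that representation and the name `free_and_coalesce` are assumptions for illustration.

```python
MAX_COALESCED = 508 * 1024   # coalescing is refused at or above this size


def free_and_coalesce(chunks, i):
    """Free chunks[i] and merge it with free neighbours; returns a new list."""
    chunks = [list(c) for c in chunks]
    chunks[i][1] = False
    # Backward coalesce: impossible when the chunk is first or the neighbour is busy.
    if i > 0 and not chunks[i - 1][1] and chunks[i - 1][0] + chunks[i][0] < MAX_COALESCED:
        merged = chunks.pop(i)
        chunks[i - 1][0] += merged[0]
        i -= 1
    # Forward coalesce: impossible when the chunk is last or the neighbour is busy.
    if i + 1 < len(chunks) and not chunks[i + 1][1] and chunks[i][0] + chunks[i + 1][0] < MAX_COALESCED:
        merged = chunks.pop(i + 1)
        chunks[i][0] += merged[0]
    return chunks


# A (free), B (busy), C (free): freeing B yields one chunk A + B + C.
after = free_and_coalesce([[16, False], [16, True], [48, False]], 1)
```

Chunks parked on a lookaside stay marked busy, so in this model they would simply never satisfy the "neighbour is free" test.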
Windows Heap Internals • Summary – Questions? • Just remember: • Lookasides are allocated from and freed to before free lists • FreeLists[0] is mainly used for 1K <= ChunkSize < 512K • Coalescing only happens for entries going onto FreeList, not lookaside list • Entries on a certain lookaside will stay there until they are allocated from
Heap Exploitation: Basic Terms • 4-byte Overwrite • Able to overwrite any arbitrary 32-bit address (WhereTo) with an arbitrary 32-bit value (WithWhat) • 4-to-n-byte Overwrite • Using a 4-byte overwrite to indirectly cause an overwrite of an arbitrary n bytes
Arbitrary Memory Overwrite Explained • Coalesce-On-Free 4-byte Overwrite • Utilize coalescing algorithms of the heap • This is the method first discussed by Oded and me at CSW04 – it is our preferred method for reliable heap exploitation on all versions < XPSP2 • Just make sure to fill the Lookaside[ChunkSize] (put 4 entries on heap) before freeing a chunk of ChunkSize to ensure coalescing • Arbitrary overwrite happens when the overflowed buffer gets freed • (Diagram: from the overflow start, the fake chunk header has Index < 64 and Flags != 1, followed by Fake Flink (WithWhat) and Fake Blink (WhereTo))
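Why a doubly-linked unlink turns into a 4-byte overwrite can be shown with an abstract memory model. The Python sketch below uses a dictionary as "memory" and made-up addresses; it models only the two pointer writes of an unchecked unlink, not the surrounding heap code.

```python
def unlink(mem, entry):
    """Unchecked doubly-linked unlink: Flink at entry+0, Blink at entry+4."""
    flink = mem[entry]
    blink = mem[entry + 4]
    mem[blink] = flink        # Blink->Flink = Flink
    mem[flink + 4] = blink    # Flink->Blink = Blink


WITH_WHAT = 0xAAAA0000   # hypothetical value to write
WHERE_TO = 0xBBBB0000    # hypothetical address to overwrite
FAKE = 0x00001000        # hypothetical address of the fake free chunk's links

mem = {FAKE: WITH_WHAT, FAKE + 4: WHERE_TO}
unlink(mem, FAKE)
```

After the call, `mem[WHERE_TO]` holds `WITH_WHAT`. The second write (`mem[WITH_WHAT + 4] = WHERE_TO`) is a side effect, which is why both values must be writable addresses.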
Arbitrary Memory Overwrite • Lookaside List Head Overwrite: • 4-to-n-byte overwrite • What we want to do is overwrite a Lookaside list head and then allocate from it • We must be the first one to allocate that size • We will get a chunk back pointing to whatever location in memory we want • Use this to overwrite a function pointer or put the shellcode at a known writable location
Arbitrary Memory Overwrite • Lookaside List Head Overwrite: How To • Use the Coalesce-on-Free Overwrite, with these values: • FakeChunk.Blink = &Lookaside[ChunkSize] where ChunkSize is a pretty infrequently allocated size • FakeChunk.Flink = what we want a pointer to • To calculate the FakeChunk.Blink value: • LookasideTable = HeapBase + 0x688 • Index = (ChunkSize/8)+1 • FakeChunk.Blink = LookasideTable + Index * EntrySize (0x30) • Set FakeChunk.Flags = 0x20, FakeChunk.Index = 1-63, FakeChunk.PreviousSize = 1, FakeChunk.Size = 1
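The address calculation on this slide is plain arithmetic, sketched here in Python. The offsets 0x688 and 0x30 and the index formula come from the slide; the heap base used in the example is a made-up value, and `lookaside_entry_address` is an illustrative name.

```python
LOOKASIDE_OFFSET = 0x688   # LookasideTable = HeapBase + 0x688 (from the slide)
ENTRY_SIZE = 0x30          # size of one lookaside entry (from the slide)


def lookaside_entry_address(heap_base: int, chunk_size: int) -> int:
    """Address of Lookaside[ChunkSize] using the slide's formula."""
    index = chunk_size // 8 + 1
    return heap_base + LOOKASIDE_OFFSET + index * ENTRY_SIZE


# Hypothetical heap base, 0x100-byte chunks: index 33.
address = lookaside_entry_address(0x00150000, 0x100)
```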
Exploitation Made Simple • Overwrite PEB lock routine to point to PEB space • Put shellcode into PEB space • Then cause the PEB lock routine to execute • (Diagram: PEB header; PEB lock/unlock function pointers at 0x7ffdf020 and 0x7ffdf024; ~1k of payload at 0x7ffdf130)
Exploitation Made Simple • Win2K through WinXP SP1 in a single attempt: • First 4-byte overwrite: • Blink = 0x7ffdf020, • Flink = 0x7ffdf154 • 4-to-n-byte overwrite: • Blink = &Lookaside[(n/8)+1] • Flink = 0x7ffdf154 • Be the first to allocate n bytes (cause HeapAlloc(n)): • Put your shellcode into the returned buffer • All done! Either wait, or cause a crash immediately: • For example, do 4-byte overwrite with Blink = 0xABABABAB
Exploitation Made Simple • Forcing Shellcode To Run • Most applications (read: everyone but MSSQL) don’t specially handle access violations • An access violation results in ExitProcess() being called • Once the process attempts to exit, ExitProcess() is called • The first thing ExitProcess() does is call the PEB lock routine • Thus, causing crash = instant shellcode execution • Nice
Exploitation Made Simple • Demo
Heap Exploitation • Questions? • The technique we just covered is very reliable, providing success almost every time on all Win2K (all service packs) and WinXP (up to SP2) • On to XP SP2…
XP Service Pack 2 • Effects on Heap Exploitation • New low fragmentation heap for chunks >= 16K • PEB “shuffling” (aka randomization) • New security cookie in each heap chunk • Safe unlinking: (usually) stops 4-byte overwrites
XP Service Pack 2 • PEB Randomization • In theory, it could have a big impact on heap exploitation – though not in reality • Prior to XP SP2, it used to always be at the highest page available (0x7ffdf000) • The first (and ONLY the first) TEB is also randomized • They seem to never be below 0x7ffd4000
XP Service Pack 2 • PEB Randomization – Does it make any difference? • Not much, randomization is definitely a misnomer • If 2 threads are present: • We can write to 0x7ffdf000-0x7ffdffff, and • 2 other pages between 0x7ffd4000-0x7ffdefff • If 3 threads are present: • 0x7ffde000-0x7ffdffff • 2 other pages between 0x7ffd4000-0x7ffdefff • … • If 11 threads are present: • 100% success, no empty pages
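The page counts behind these bullets can be checked with a short Python sketch. It assumes the candidate range is 0x7ffd4000 through 0x7ffdffff in 4K pages and that each thread's TEB plus the PEB occupies one page; treating "success" as the fraction of candidate pages that are mapped is an interpretation, not something the slide states.

```python
PAGE = 0x1000
LOW, HIGH = 0x7FFD4000, 0x7FFE0000   # range from the slides (end is exclusive)

candidate_pages = list(range(LOW, HIGH, PAGE))


def mapped_fraction(threads: int) -> float:
    """Fraction of candidate pages occupied by the PEB plus one TEB per thread."""
    return min(1.0, (threads + 1) / len(candidate_pages))
```

With 12 candidate pages, 11 threads plus the PEB fill the whole range, which matches the "no empty pages" bullet.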
XP Service Pack 2 • PEB Randomization – Summary • Provides little protection for… • Any application that has m workers per n connections (IIS? Exchange?) • Any service in dllhost/services/svchost or any other “active” surrogate process
XP Service Pack 2 • Heap header cookie (*reminder: overflow direction) • Current header (byte offsets 0 to 8): Self Size, Previous chunk size, Segment Index, Flags, Unused bytes, Tag index (Debug) • XP SP2 header: Self Size, Previous chunk size, New Cookie, Flags, Unused bytes, Segment Index
XP Service Pack 2 • Heap header cookie calculation • If ((AddressOfChunkHeader / 8) XOR Chunk->Cookie XOR Heap->Cookie != 0) CORRUPT • Since the cookie has only 8-bits, it has 2^8 = 256 possible keys • We’ll randomly guess the security cookie, on average, 1 of every 256 attempts
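The cookie check can be modeled in Python as follows. The XOR formula is the slide's; masking the result to 8 bits is an assumption made here so that the address term fits the one-byte cookie, and the address and heap cookie in the example are made up.

```python
def is_corrupt(header_addr: int, chunk_cookie: int, heap_cookie: int) -> bool:
    """Slide's check, truncated to the 8-bit cookie width (assumption)."""
    return ((header_addr // 8) ^ chunk_cookie ^ heap_cookie) & 0xFF != 0


def expected_cookie(header_addr: int, heap_cookie: int) -> int:
    """The only chunk cookie value that passes the check for this header."""
    return ((header_addr // 8) ^ heap_cookie) & 0xFF


ADDR, HEAP_COOKIE = 0x00151E90, 0x5A   # hypothetical values
passing_guesses = [g for g in range(256) if not is_corrupt(ADDR, g, HEAP_COOKIE)]
```

Exactly one of the 256 possible guesses passes, which is where the "1 of every 256 attempts" figure comes from.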
XP Service Pack 2 • On the normal WinXP SP2 system, corrupting a chunk will do nothing • Since we only overwrite the Flink/Blink of the chunk, we corrupt no other chunks • Thus we can keep trying until we run out of memory
XP Service Pack 2 • Summary so far… • At this point, we see that, given enough time, we can trivially defeat all the other protection mechanisms. • On to “safe” unlinking…
XP Service Pack 2 • Safe Unlinking • (Diagram: free list A, B, C, where B is the header to free) • Safe unlinking means that RemoveListEntry(B) will make this check: • (B->Flink)->Blink == B && (B->Blink)->Flink == B • In other words: • C->Blink == B && A->Flink == B • Can it be evaded? Yes, in one particular case.
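The check can be illustrated with an abstract memory model in Python. The dictionary stands in for memory, the addresses for A, B and C are made up, and `safe_unlink` is an illustrative name; it refuses to write anything unless both neighbours point back at the entry.

```python
def safe_unlink(mem, entry):
    """Unlink entry only if (Flink)->Blink == entry and (Blink)->Flink == entry."""
    flink, blink = mem[entry], mem[entry + 4]
    if mem.get(flink + 4) != entry or mem.get(blink) != entry:
        return False              # corruption detected, nothing is written
    mem[blink] = flink            # A->Flink = C
    mem[flink + 4] = blink        # C->Blink = A
    return True


A, B, C = 0x100, 0x200, 0x300     # hypothetical addresses, circular list A-B-C
intact = {A: B, A + 4: C, B: C, B + 4: A, C: A, C + 4: B}
ok = safe_unlink(intact, B)

corrupted = {A: B, A + 4: C, B: 0xAAAA0000, B + 4: 0xBBBB0000, C: A, C + 4: B}
snapshot = dict(corrupted)
rejected = safe_unlink(corrupted, B)
```

The intact list unlinks normally, while the entry with arbitrary Flink/Blink values is rejected and memory is left untouched.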
XP Service Pack 2 • Unsafe-Unlinking FreeList Overwrite Technique
p = HeapAlloc(n);
FillLookaside(n);
HeapFree(p);
EmptyLookaside(n);
Overwrite p[0] (somewhere on the heap) with:
    p->Flags = Busy (to prevent accidental coalescing)
    p->Flink = (BYTE *)&ListHead[(n/8)+1] - 4
    p->Blink = (BYTE *)&ListHead[(n/8)+1] + 4
HeapAlloc(n); // defeats safe unlinking (ignore result)
p = HeapAlloc(n); // defeats safe unlinking
// p now points to &ListHead[(n/8)].Blink
XP Service Pack 2 • Defeating Safe Unlinking (before overwrite) • (Diagram: ListHead[n-1], ListHead[n] and ListHead[n+1], each with Flink at [0] and Blink at [4]; FreeChunk is linked to ListHead[n])
XP Service Pack 2 • Defeating Safe Unlinking: Step 1 (Overwrite) • (Diagram: the same lists after FreeChunk's Flink and Blink are overwritten) • Now call HeapAlloc(n) to unlink FreeChunk from ListHead • FreeChunk->Blink->Flink == *(*(FreeChunk+4)+0) • FreeChunk->Flink->Blink == *(*(FreeChunk+0)+4) • Both point to FreeChunk, unlink proceeds!
XP Service Pack 2 • Defeating Safe Unlinking: Step 2 (1st alloc) • (Diagram: ListHead[n-1], ListHead[n] and ListHead[n+1] after the unlink) • FreeChunk->Blink->Flink = FreeChunk->Flink • FreeChunk->Flink->Blink = FreeChunk->Blink • Returns pointer to previous FreeChunk
XP Service Pack 2 • Defeating Safe Unlinking: Step 3 (2nd alloc) • (Diagram: ListHead[n-1], ListHead[n] and ListHead[n+1] after the second allocation) • Returns pointer to &ListHead[n-1].Blink • Now the FreeLists point to whatever data the user puts in it
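Steps 1 and 2 can be traced with an abstract memory model in Python. This is only a sketch of the pointer arithmetic: the list head array address, the index `n` and the chunk address are hypothetical, list heads are modeled as 8-byte Flink/Blink pairs, and what later allocations return depends on allocator details that are not modeled here.

```python
def safe_unlink(mem, entry):
    """Unlink entry only if both neighbours point back at it."""
    flink, blink = mem[entry], mem[entry + 4]
    if mem.get(flink + 4) != entry or mem.get(blink) != entry:
        return False
    mem[blink] = flink
    mem[flink + 4] = blink
    return True


LIST_HEADS = 0x00150178        # hypothetical address of ListHead[0]
n = 4                          # hypothetical list index
head = LIST_HEADS + 8 * n      # &ListHead[n]
chunk = 0x00151000             # hypothetical FreeChunk address

# Before the overwrite: neighbouring list heads are empty (point to themselves)
# and FreeChunk is the only entry on ListHead[n].
mem = {}
for k in (n - 1, n + 1):
    addr = LIST_HEADS + 8 * k
    mem[addr], mem[addr + 4] = addr, addr
mem[head], mem[head + 4] = chunk, chunk
mem[chunk], mem[chunk + 4] = head, head

# Step 1: the overflow replaces FreeChunk's links with list-head-relative values.
mem[chunk] = head - 4          # Flink
mem[chunk + 4] = head + 4      # Blink

# Step 2: the first allocation unlinks FreeChunk; the check passes.
passed = safe_unlink(mem, chunk)
```

Both dereferences land on fields of `ListHead[n]` that still hold FreeChunk's address, so the check passes, and afterwards the list head's own Flink points into the list head array (`head + 4` in this model).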
XP Service Pack 2 • Questions?
XP Service Pack 2 • Unsafe-Unlinking FreeList Overwrite Technique • For vulnerabilities where you can control the allocation size, safe unlinking can be evaded. • But is this reliable? Hardly. • …
XP Service Pack 2 • Unsafe-Unlinking FreeList Overwrite Technique (cont) • We have to flood the heap with this repeating 8-byte sequence: • [FreeListHead-4][FreeListHead+4] • And hope the Chunk’s Flink/Blink pair is within the range we can overflow • But there is an even easier method…
XP Service Pack 2 • Chunk-on-Lookaside Overwrite Technique • In fact on XP SP2, there is an even easier method • Lookaside lists take precedence over free lists • This is quite convenient because… • Lookaside lists (singly linked) are easier to exploit than the free lists (doubly linked)
XP Service Pack 2 • Chunk-on-Lookaside Overwrites • HeapAlloc checks the lookaside before the free list • There is no check to see if the cookie was overwritten since it was freed • It is a singly-linked list, thus the safe unlinking check doesn’t apply • Result: a clean exploitation technique (albeit with brute-forcing required)
XP Service Pack 2 • Chunk-on-Lookaside Overwrites (Technique Summary)
// We need at least 2 entries on lookaside
a_n[0] = HeapAlloc(n)
a_n[1] = HeapAlloc(n)
HeapFree(a_n[1])
HeapFree(a_n[0])
Overwrite a_n[0] (somewhere on the heap) with:
    a_n[0].Flags = Busy (to prevent accidental coalescing)
    a_n[0].Flink = AddressWeWant
HeapAlloc(n) // discard, this returns a_n[0]
p = HeapAlloc(n)
// p now points to AddressWeWant
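The sequence above can be traced on a toy singly-linked list in Python. The dictionary stands in for memory, all addresses are made up, and the push/pop helpers are illustrative models of a lookaside list whose entries store their "next" pointer in their first 4 bytes; no cookie or link validation is modeled because the slides state none is performed on this path.

```python
def lookaside_push(mem, head_addr, chunk):
    """Free: the chunk becomes the head and points to the former head."""
    mem[chunk] = mem[head_addr]
    mem[head_addr] = chunk


def lookaside_pop(mem, head_addr):
    """Allocate: return the head and follow its next pointer unchecked."""
    chunk = mem[head_addr]
    if chunk == 0:
        return None
    mem[head_addr] = mem.get(chunk, 0)
    return chunk


HEAD = 0x00150CB8               # hypothetical lookaside list head
A0, A1 = 0x00151000, 0x00151100  # hypothetical chunk addresses
ADDRESS_WE_WANT = 0x7FFDF020     # value taken from the earlier PEB slides

mem = {HEAD: 0}
lookaside_push(mem, HEAD, A1)    # HeapFree(a_n[1])
lookaside_push(mem, HEAD, A0)    # HeapFree(a_n[0])
mem[A0] = ADDRESS_WE_WANT        # overflow rewrites a_n[0]'s next pointer
first = lookaside_pop(mem, HEAD)   # returns a_n[0], discarded
second = lookaside_pop(mem, HEAD)  # returns the attacker-chosen address
```

Two entries are needed because the first pop is what copies the overwritten pointer into the list head.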
XP Service Pack 2 • Chunk-on-Lookaside Overwrite - Success rate? • Requires overwriting a chunk already freed to the lookaside • If an attacker overflows a buffer repeatedly, how many attempts will be needed before succeeding?
XP Service Pack 2 • Chunk-on-Lookaside Overwrite – Empirical results • 64K heap with 1 segment • All chunk sizes between 8-1024 bytes • Max overflow size = 1016 bytes • Random number of allocs between 10-1000 • Free probability of 50% • Took an average of 84 allocations to be within overflow range • It will take at least 2 overwrites (one to overwrite a function pointer, one to place shellcode)
XP Service Pack 2 • Chunk-on-Lookaside Overwrite – Empirical results • Application specific function pointer and writable location for shellcode: • 84*2 = 168 attempts to execute shellcode • Using PEB lock routine + PEB space (application generic): • 84*2*12=2,016 attempts to execute shellcode • The 12 is for the 12 possible locations of the PEB due to PEB randomization
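The attempt counts are simple products of the figures quoted on these two slides, reproduced here in Python. The constant names are illustrative; the numbers themselves are the presenters' empirical results, not something this sketch measures.

```python
ALLOCS_PER_HIT = 84      # average allocations to land within overflow range
OVERWRITES_NEEDED = 2    # function pointer + shellcode placement
PEB_LOCATIONS = 12       # possible PEB pages under XP SP2 randomization

app_specific_attempts = ALLOCS_PER_HIT * OVERWRITES_NEEDED
generic_attempts = app_specific_attempts * PEB_LOCATIONS
```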