Dive into the complexities of media software security, focusing on Windows DirectShow framework and codec vulnerabilities. Learn about filter graphs, data flow models, and more in this insightful exploration.
Media Frenzy: Attacking the Windows Media Framework • CanSecWest 2008 (March 2008) • Mark Dowd, John McDonald • IBM ISS X-Force R&D
Media Software • Why worry about media software? • Multimedia content is at an all-time high • Internet Stealing - Movies / TV shows / MP3s • Streaming media (http://www.di.fm and the like) • Podcasts, VoIP • Video clips (YouTube, http://videos.google.com, etc…) • Embedded content in documents, web pages, emails, etc… • Everyone uses media software • Ubiquitous client-side vector
Media Software Security • Is media software secure? • Almost definitely maybe (probably) • Ok, no. • Contributing factors • Changes fast, with new technologies and rapid expansion • Content is nearly always untrusted • Few people worry about getting owned while watching video • Note: Microsoft code is officially Internet Not Horrible™ • Third party code FTW!
Media Software Security • Large, nuanced attack surface • Media file formats are often containers for arbitrary data streams • Complex data flow • Processing handed off to any number of possible codecs • Extensive set of codecs available on a default system • Example: MPEG1, MPEG2, MP3, MP4S, SAMI, many others… • Most users install additional ones • Example: DivX/XviD, AAC, ffdshow • Not enough security research into this topic • Existing research focuses on file formats and fuzzing • Prior work by David Thiel at Black Hat 2007 (https://www.blackhat.com/presentations/bh-usa-07/Thiel/Presentation/bh-usa-07-thiel.pdf) • Easy for fuzzers to miss large chunks of functionality • Discovering attack surface and codecs is non-trivial
Our Focus • What we will cover • Media software built for Windows • Focusing on DirectShow • Enumeration of registered codecs on a given system • How to audit a typical codec • What we won’t cover • Video For Windows, DMO, MF, Silverlight (sorry, not enough time!) • Playback software/codecs for Unix, VLC, QuickTime • Fuzz-testing • Databases • Diabetes • The Hanseatic League circa 1432
"Pwn" -- security slang for compromising, or owning, a computer system -- is pronounced like the "pon" in pony. Directshow
DirectShow Overview • Media processing framework for Windows • Playing Media Files • Conversion between Formats • Media Capture • Central Registry • Supports multiple A/V compression and file formats • Easily extended to add support for new types of media • AVI, WMF, ASF, MPEG2, etc… • Internally uses DirectSound/DirectDraw/Direct3D/etc.. • Interfaces with various hardware • Modular Architecture • built on COM
DirectShow Overview II • Basic building block – Filter • COM object that implements the IBaseFilter interface • You link filters together to perform various tasks • Create a filter graph • Filters have input pins and output pins • Connect output pins to input pins
Filters • Data enters the filter graph through a Source Filter • Provides input data from a file, URL, or device • Typically one output pin • Data leaves the filter graph through a Renderer Filter • Delivers data to the user or a device or file • Typically one input pin
Filters III • Media files are typically parsed by a Splitter Filter • A Splitter Filter, or demultiplexer, takes input data and splits it into multiple separate output streams • Typically one input pin and two or more output pins • A Mux Filter, or multiplexer, is the logical opposite • Takes separate constituent streams and joins them together into a single output • Used to create media files
Transform Filters • Transform Filters do the rest of the data processing • exactly one input and one output • Codec Filters • Used for compressing or decompressing data with codecs • Conversion Filters • Takes data in one format and outputs data in another format • Color schemes or image scaling
Data Flow • Data flows downstream from an output pin to an input pin • Two models for data flow between pins: push and pull • Push – upstream filter prepares a buffer full of data and then delivers it to the downstream filter • useful when there is a linear stream of data going from one filter to the next • Default model, more complicated • Pull – the downstream filter directly requests certain data from its upstream filter • Used when a downstream filter needs random access to the upstream’s data • Generally used for splitter filters that need to parse files
GraphEdit and the Filter Graph • graphedt.exe (in the Windows SDK) • Lets you experiment with filter graphs • Instantiate and connect filters installed on your system • See the filters chosen to render a given file
Filter Graph Manager • The Filter Graph Manager controls all of the filters, and is responsible for: • Choosing, Initializing, and Connecting the filters • (More on this later) • Maintaining a reference clock • All of the filters use the clock to stay in lockstep • Synchronizing the filter actions. • start, pause, and stop • Apps call the Filter Graph Manager • which sets up and calls the filters
How do codecs get selected, anyway? • We need to define attack surface before auditing • Enumerating codecs on a system • Determining which codecs are reachable through remote vectors (such as malicious AVI files) • Knowing which codec will be selected upon collision • DirectShow Filters are looked up in the registry by CLSID • Filters are organized by category • Quite a few categories available (http://msdn2.microsoft.com/en-us/library/ms783347(VS.85).aspx) • Only interesting category for us is “DirectShow Filters” (CLSID_LegacyAmFilterCategory) • Location in the registry is HKEY_CLASSES_ROOT\CLSID\<Category CLSID>\Instance • The “Instance” subkey contains a collection of CLSID subkeys corresponding to registered filters
How do codecs get selected, anyway? • Each input pin for a filter accepts data of a certain media type • Filters instantiated one by one • Pins queried using CBasePin::CheckMediaType() • Filters are sorted in order of priority (“merit value”) • Some filters don’t participate in this process at all (merit <= MERIT_DO_NOT_USE) • Once a filter is connected successfully to the filter graph, the process starts again • The connected filter will create 0 or more output pins, each with its own media type • That media type is retrieved with CBasePin::GetMediaType() • Media type negotiation is done with AM_MEDIA_TYPE structures • Data structure that fully describes a media type for a given stream • Uses GUIDs for distinguishing both the media type and additional information pertaining to that type • Typing information might be implicit (such as MPEG2 Video for MPEG files), or user specified (streams in an AVI file)
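The merit rule above can be sketched as a small model. This is only an illustration of "sort by merit, skip anything at or below MERIT_DO_NOT_USE"; the `RegisteredFilter` record and `candidate_order` helper are hypothetical names, and the real Filter Graph Manager applies further rules (filters already in the graph, media type matching on pins) that are not modelled here. The merit constants are the values from the DirectShow headers.

```python
from dataclasses import dataclass

# Merit constants as defined in the DirectShow headers.
MERIT_PREFERRED = 0x800000
MERIT_NORMAL = 0x600000
MERIT_UNLIKELY = 0x400000
MERIT_DO_NOT_USE = 0x200000


@dataclass(frozen=True)
class RegisteredFilter:
    """Hypothetical stand-in for one registry entry under the Instance key."""
    name: str
    merit: int
    input_subtypes: frozenset  # subtypes the input pin claims to accept


def candidate_order(filters, subtype):
    """Return the filters that would be tried for `subtype`, best merit first.

    Filters with merit <= MERIT_DO_NOT_USE never take part in selection.
    """
    eligible = [
        f for f in filters
        if f.merit > MERIT_DO_NOT_USE and subtype in f.input_subtypes
    ]
    return sorted(eligible, key=lambda f: f.merit, reverse=True)
```

For an auditor, the useful output is the first entry: when two installed codecs claim the same subtype, that is the one a malicious file will most likely reach.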
struct _AM_MEDIA_TYPE • majortype – general type of data (e.g. video – MEDIATYPE_Video, audio, opaque stream, text, etc.) • subtype – specific type of data (e.g. DIVX, MP4S, audioone) • bFixedSizeSamples, lSampleSize • for fixed sample sizes • bTemporalCompression • interframe compression • formattype, cbFormat, pbFormat • Type, length, and pointer for format block
Type Codes • Media types use FourCC Codes • Many container formats identify streams with FourCC codes instead of GUIDs • Typically located in a stream header • DWORD for video streams is 4 ASCII characters that represent the stream type • For audio streams, an integer is used • GUID is derived by adding the constant suffix “-0000-0010-8000-00AA00389B71” • E.g. “divx” = {78766964-0000-0010-8000-00AA00389B71} (the four ASCII bytes 64 69 76 78, read as a little-endian DWORD) • Common FourCC codes available at http://www.fourcc.org • Example FourCC codes (taken from http://www.fourcc.org)
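The FourCC-to-GUID derivation can be sketched in a few lines. The helper names `fourcc_to_guid` and `format_tag_to_guid` are made up for illustration. Note that the first GUID field is the FourCC read as a little-endian DWORD, so in the string form the character bytes appear in reverse order.

```python
import struct

# Constant tail shared by all FourCC-derived media subtype GUIDs.
FOURCC_GUID_SUFFIX = "0000-0010-8000-00AA00389B71"


def fourcc_to_guid(fourcc: str) -> str:
    """Map a 4-character video FourCC to its media subtype GUID string."""
    raw = fourcc.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a FourCC is exactly four ASCII characters")
    # The four characters are stored as one little-endian DWORD (GUID Data1).
    (data1,) = struct.unpack("<I", raw)
    return "{%08X-%s}" % (data1, FOURCC_GUID_SUFFIX)


def format_tag_to_guid(format_tag: int) -> str:
    """Map a 16-bit audio format tag (an integer, not text) to a GUID string."""
    if not 0 <= format_tag <= 0xFFFF:
        raise ValueError("format tag must fit in 16 bits")
    return "{%08X-%s}" % (format_tag, FOURCC_GUID_SUFFIX)


print(fourcc_to_guid("divx"))      # {78766964-0000-0010-8000-00AA00389B71}
print(format_tag_to_guid(0x0055))  # {00000055-0000-0010-8000-00AA00389B71}
```

Walking every FourCC of interest through such a helper and looking the result up under the filter category gives a quick list of which installed codecs an AVI file can reach.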
Matching the media types yourself… • Enumerating available codecs involves identifying those with media types accessible to you • FourCC-based media types can all be reached from AVI files • Examining the CheckInputType() function for an input pin can determine what types a codec will accept • The “FilterData” registry value present for many codecs also gives this valuable information away • A more precise method: programmatically querying the registry • Programmatic method for enumerating filters based on various properties • Achieved with the FilterMapper2 COM object (http://msdn2.microsoft.com/en-us/library/ms787861(VS.85).aspx) • Select filters by merit, input/output pin count, input/output pin types and more • Can also enumerate filters by category with the System Device Enumerator, the SystemDeviceEnum COM object (http://msdn2.microsoft.com/en-us/library/ms787871(VS.85).aspx)
“Hey, I’m just the doctor – I don’t make the needles sharp.” - Alan Johnson, Peep Show Auditing DirectShow
Auditing Overview • Attacking media software • Attack Surface • Data Flow • Auditing DirectShow components • Source Filters • Splitter Filters • Transform Filters • Complex attacks • Desynchronization Attacks • Dynamic Format Changes • Exploitation
Media File Attack Surface • How do you attack media software? • Provide a malicious file • Embed media content • Web pages, Flash, OLE, etc. • What’s in a media file? • Streams • Video, audio, text (subtitles), or other data • Media data • Raw or compressed, split among various types of frames: key and interpositional • Meta-information • Describes how to parse, decompress, navigate, and render the media data
Media File Attack Surface • Meta-information is your primary target • Header information for the file as a whole • Record and layout information for the file • Header information for each stream • Length, Width, Bit depth, Sample Size, Bitrate, Buffer Size, Allocation Size • Meta-information for each media sample • Index information • Chronological information for changes in format and synchronization • Actual compression meta-data • Various levels, pertinent to different filters
Example of Propagation -> AVI • Your primary task as an auditor will be tracing the flow of meta-information data throughout the system.
Propagation of data • Filter to Filter (push model) • Data handed over in media samples • Typically fixed-size buffers • Size decided during negotiation • Allocator • The connected pins choose and configure an allocator • Upstream gets an empty Media Sample from the allocator • Fills in data, sets the used length, ships it • Downstream gets the Media Sample • Extracts data and processes it
Media Samples • Core concept: Media Samples • Generic encapsulation object • Implements IMediaSample • Abstraction used because data can live somewhere “complicated” • Video memory, DMA, sound card buffers, etc. • A media sample has: • underlying data • a time stamp • a media type • (if there is a change)
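The push-model hand-off (allocator, empty sample, fill, set used length, deliver) can be modelled with a toy in plain Python. The classes below only mimic the shape of IMediaSample and an allocator; the method names are Python-style stand-ins chosen for illustration, not the COM signatures.

```python
class MediaSample:
    """Toy media sample: a fixed-size buffer plus a 'used length' field."""

    def __init__(self, size):
        self._buf = bytearray(size)
        self._actual = 0

    def get_size(self):
        return len(self._buf)

    def get_pointer(self):
        return memoryview(self._buf)

    def set_actual_data_length(self, n):
        # The used length must never exceed the negotiated buffer size.
        if not 0 <= n <= len(self._buf):
            raise ValueError("actual length exceeds buffer size")
        self._actual = n

    def get_actual_data_length(self):
        return self._actual


class Allocator:
    """Toy allocator: pre-allocates a pool and hands samples out on demand."""

    def __init__(self, count, size):
        self._free = [MediaSample(size) for _ in range(count)]

    def get_buffer(self):
        if not self._free:
            raise RuntimeError("no free media samples")
        return self._free.pop()

    def release(self, sample):
        sample.set_actual_data_length(0)
        self._free.append(sample)


def upstream_deliver(allocator, payload):
    """Upstream side: take an empty sample, fill it, record the used length."""
    sample = allocator.get_buffer()
    if len(payload) > sample.get_size():
        allocator.release(sample)
        raise ValueError("payload larger than negotiated buffer")
    sample.get_pointer()[:len(payload)] = payload
    sample.set_actual_data_length(len(payload))
    return sample


def downstream_receive(sample):
    """Downstream side: read only the bytes the upstream filter marked as used."""
    return bytes(sample.get_pointer()[:sample.get_actual_data_length()])
```

The two length checks are the interesting part: when auditing a real filter, look for the places where the equivalent checks are missing or rely on a size that came from file metadata.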
Source Filters • Responsible for providing data from media source • Typically a file or URL • Upstream to a splitter filter • Output pin implements IAsyncReader • Typically uses pull model for random access • General Dataflow (pull model) • The splitter decides what it needs to read next • It allocates or resizes a buffer locally, if necessary (no formal allocator) • Splitter calls SyncRead() on the upstream output pin. • Splitter processes the data placed into its local buffer by the upstream filter.
Source Filters • Load() • Loads media (called by filter graph manager) • Audit protocol parsing code • Low-level parsing issues • Output Pin • implements IAsyncReader • Async • Request() / WaitForNext() with a MediaSample • Sync • SyncRead()/SyncReadAligned() with local memory • Look for design problems • Requests across security domains
Splitter Filters • Parses media file and extracts streams • Pass them to downstream filters • File Parsing • Typically have constructions susceptible to numeric issues, such as length prefixed blocks, etc. • Look for underflows, wraps, etc • AVI/WAV recently had such an issue (http://www.microsoft.com/technet/security/Bulletin/MS07-064.mspx) • ISS X-Force disclosed such a bug also (http://www.microsoft.com/technet/security/Bulletin/MS07-068.mspx) • Discovered by Alex Wheeler and Ryan Smith (internet partners in crime) • Dynamic Format Changes • Attaching media type to media sample
File Parsing Example – AVI Splitter • File parsing bug in super-index processing • Undisclosed, but innocuous • AVI Files have indexes • Offset/length pairs, and flags • They can have super-indexes • Point to all the indexes in the file • Offset/length pairs • Validity of offset/length never checked • Internal validity of super index and sub-index entry checked • Length can be pathologically small • Causes existing memory contents to be parsed as index
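A minimal sketch of the missing offset/length validation follows. It does not reproduce the AVI splitter's code: the entry layout (64-bit offset, 32-bit size, 32-bit duration) and the 32-byte minimum index size are assumptions made for illustration, and `validate_super_index` is a hypothetical helper.

```python
import struct

# Assumed super-index entry layout: 64-bit offset, 32-bit size, 32-bit duration.
ENTRY = struct.Struct("<QII")

# Assumed smallest plausible size of an index chunk (header only).
MIN_INDEX_SIZE = 32


def validate_super_index(entries_blob, file_size, min_index_size=MIN_INDEX_SIZE):
    """Return (offset, size) pairs, rejecting entries a parser must not trust."""
    if len(entries_blob) % ENTRY.size:
        raise ValueError("truncated super-index entry")
    checked = []
    for pos in range(0, len(entries_blob), ENTRY.size):
        offset, size, _duration = ENTRY.unpack_from(entries_blob, pos)
        # A pathologically small length would leave most of the parser's
        # buffer holding stale memory that is then parsed as index data.
        if size < min_index_size:
            raise ValueError("index chunk too small")
        # Written so the comparison cannot wrap: size <= file_size - offset.
        if offset > file_size or size > file_size - offset:
            raise ValueError("index chunk lies outside the file")
        checked.append((offset, size))
    return checked
```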
Splitter Filter – Media Type Construction • Splitter filters construct a media type • Communicate format of data for downstream filters • Derived from meta-information in media file • Possibly read verbatim (AVI strf) • Some high-level validation typically performed • Constraints on our attacks on transform filters • AVI – performed on BMI format blocks, but not others • Private data after BMI is not validated • Consider effects of mixing and matching (codec-hell) • Different splitter that performs less or different validation • Different downstream that assumes different validation
Transform Filters • Transform Filters are your most common target • Single input stream and single output stream • Usually decompressing a compressed stream • Most codecs you download are of this type (DivX, AAC, AC3, M4S…) • The CTransformFilter class is used to simplify codec development • Source is in Windows SDK (samples\Multimedia\DirectShow\BaseClasses\transfrm.h) • Handles pin negotiation • Moves processing into various functions in CTransformFilter • Developer overrides/implements these functions
Transform Filters II – Areas of Interest • Input MediaType Processing/Validation • CheckInputType() • Gotcha: Negative Height • Output MediaType Processing/Validation • CheckTransform() • Allocator Configuration • DecideBufferSize() • Main Data Processing • Transform() • SetActualDataLength()
Transform Filters – Mediatype Negotiation • CheckInputType() • Called by CTransformInputPin::CheckMediaType() • Inspects media type and encapsulated format block • Check for integer overflows (e.g. width * height * color depth for video, nchannels * bitrate for audio) • Check for special cases (negative height in BMI) • Discover what sanity checks are needed for this filter to accept the proposal • CheckTransform() • Called by CTransformOutputPin::CheckMediaType() • Determines if filter can convert input MT to provided MT • Output type is usually derived from the input type • Similar issues to what you would look for in CheckInputType()
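The two CheckInputType() pitfalls named above, the wrapping size product and the negative height, can be shown with 32-bit arithmetic simulated in Python. Both functions are hypothetical illustrations rather than code from any particular codec. The stride formula is the usual DWORD-aligned DIB row size, and a negative biHeight denotes a top-down bitmap.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def naive_image_size(width, height, bit_count):
    """What an unchecked 32-bit multiplication yields (wraps modulo 2**32)."""
    return (width * height * (bit_count // 8)) & 0xFFFFFFFF


def checked_image_size(width, height, bit_count):
    """Validate BITMAPINFOHEADER-style fields before sizing a buffer."""
    if width <= 0 or height == 0:
        raise ValueError("degenerate dimensions")
    if height == INT32_MIN:
        # Negating INT32_MIN is not representable in a 32-bit signed integer.
        raise ValueError("height cannot be negated safely")
    if height < 0:
        height = -height  # negative height means a top-down bitmap
    if bit_count not in (1, 4, 8, 16, 24, 32):
        raise ValueError("unsupported bit depth")
    stride = ((width * bit_count + 31) // 32) * 4  # rows are DWORD-aligned
    size = stride * height  # Python integers do not wrap, so this is exact
    if size > INT32_MAX:
        raise ValueError("image size overflows a 32-bit length")
    return size


print(naive_image_size(0x8000, 0x8000, 32))  # 0
print(checked_image_size(640, -480, 24))     # 921600
```

The first call is the case to hunt for: a 32768 x 32768 frame at 32 bits per pixel yields a zero-byte allocation that the decoder then treats as a full frame.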
Transform Filters – Allocator Configuration • After Media Types are decided, output pin chooses allocator • An allocator is responsible for: • Provisioning empty media samples • Tracking media samples with reference counters • Freeing and/or recycling media samples • Allocators typically allocate a pool of media samples, and hand them out as they are needed. • Also, be aware of internal allocations • occur after configuration of media types • no use of formal allocator/sample mechanism • Example: Xvid, ffdshow library wrappers
Transform Filters – Allocator Configuration • DecideBufferSize() • Caller provides ALLOCATOR_PROPERTIES structure • Used by output pin to configure allocator • Note: Allocation isn’t completed until later… • Allocator Properties Structure • cBuffers – number of buffers created by the allocator • cbBuffer – size of each buffer in bytes, excluding prefix • cbAlign – alignment of buffer • cbPrefix – each buffer is preceded by a prefix of this many bytes
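The four ALLOCATOR_PROPERTIES fields combine into a single pool size, which is one more product worth checking. In the sketch below the structure mirrors the four fields listed above (each a 32-bit signed integer), while `pool_bytes` is a hypothetical helper built on a simplified assumption about how an allocator sizes its pool: prefix plus buffer, rounded up to the alignment, times the buffer count.

```python
import ctypes


class ALLOCATOR_PROPERTIES(ctypes.Structure):
    """Mirror of the four fields; LONG is modelled as a 32-bit signed int."""
    _fields_ = [
        ("cBuffers", ctypes.c_int32),
        ("cbBuffer", ctypes.c_int32),
        ("cbAlign", ctypes.c_int32),
        ("cbPrefix", ctypes.c_int32),
    ]


def pool_bytes(props):
    """Total bytes for the pool under the simplified model described above."""
    if props.cBuffers <= 0 or props.cbBuffer <= 0:
        raise ValueError("buffer count and size must be positive")
    if props.cbAlign <= 0 or props.cbPrefix < 0:
        raise ValueError("bad alignment or prefix")
    per_buffer = props.cbBuffer + props.cbPrefix
    per_buffer += (-per_buffer) % props.cbAlign  # round up to the alignment
    total = props.cBuffers * per_buffer          # exact in Python
    if total > 0x7FFFFFFF:
        raise OverflowError("pool size does not fit in a 32-bit LONG")
    return total
```

If cbBuffer is derived in DecideBufferSize() from width, height, or a sample size read from the file, each of these checks is a place where a codec can go wrong.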
Transform Filters – Main Data Processing • Auditing the data processor • A large portion of the time, it’s doing decompression • Decompression makes something small into something large • Does this seem like the sort of thing we’d be interested in? :> • What to look for depends very much on what the codec does • Compressed streams with invalid Huffman codes • Additional metadata in headers that isn’t correctly sanitized • Where to look • Receive() for filters using push model, Transform() for transform filters • Function decodes input into a (pre-allocated) buffer • IMediaSample::GetPointer() • Offset 0x0c in IMediaSample vtable • IMediaSample::GetActualDataLength() • Offset 0x2c in IMediaSample vtable
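"Small in, large out" is easiest to see with a toy codec. The run-length decoder below is invented for illustration and does not correspond to any real DirectShow codec. It shows the one check the data processor needs: output is bounded by the capacity of the pre-allocated buffer, not by anything the stream claims.

```python
def rle_decode(src: bytes, out_capacity: int) -> bytes:
    """Decode (count, value) byte pairs, never exceeding out_capacity bytes."""
    if len(src) % 2:
        raise ValueError("truncated run")
    out = bytearray()
    for i in range(0, len(src), 2):
        count, value = src[i], src[i + 1]
        # The bound comes from the pre-allocated output buffer, which is the
        # role the output sample's size plays in a real Transform().
        if len(out) + count > out_capacity:
            raise ValueError("decoded data would overrun the output buffer")
        out += bytes([value]) * count
    return bytes(out)


print(rle_decode(b"\x03A\x02B", 16))  # b'AAABB'
```

Four input bytes can ask for 510 output bytes here; real codecs have far larger expansion ratios, which is why a missing capacity check in the decode loop is so often exploitable.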
Format Block Desynchronization • Information is often duplicated in multiple places • If it is sourced from two or more separate user-malleable places, internet chaos™ can ensue • Format blocks often appear out-of-band • Format block describes a specific stream, but is not part of that stream • Recall AVI “strf” chunk • Also happens with ASF • For video, BITMAPINFOHEADER structure used (height, width, color depth, palette, etc) • For audio, WAVEFORMATEX (channels, bitrate, etc)
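A desynchronization bug comes down to sizing a buffer from one copy of a value and writing according to another. The sketch below uses hypothetical names throughout: `Dimensions` stands for the fields of interest in a format block and in an in-stream header, and `plan_decode` shows the cross-check whose absence creates the bug.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimensions:
    """Frame geometry as claimed by one source (format block or stream header)."""
    width: int
    height: int
    bytes_per_pixel: int

    def frame_bytes(self) -> int:
        return self.width * self.height * self.bytes_per_pixel


def plan_decode(format_block: Dimensions, stream_header: Dimensions) -> int:
    """Return the buffer size, refusing streams that disagree with negotiation.

    The buffer is sized from the out-of-band format block at negotiation time,
    while the decoder writes according to the in-stream header.
    """
    buffer_size = format_block.frame_bytes()
    needed = stream_header.frame_bytes()
    if needed > buffer_size:
        raise ValueError("stream header disagrees with negotiated format block")
    return buffer_size
```

When reading a codec, find every field that appears in both places and note which copy is used for allocation and which for the copy loop; a mismatch between the two is the bug class this slide describes.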