
Reversing Data Formats: What Data Can Reveal

by Anton Dorfman. ZeroNights 0x03, Moscow, 2013.


Presentation Transcript


  1. Reversing Data Formats: What Data Can Reveal by Anton Dorfman ZeroNights 0x03, Moscow, 2013

  2. About me • Fan of & Fun with Assembly language • Researcher • Scientist • Teach Reverse Engineering since 2001 • Candidate of technical science

  3. Observed topics • Samples • Basics • Thoughts • Fields • Related works • Applications

  4. Samples

  5. What is it?

  6. What is it? Hint 1

  7. What is it? Hint 2

  8. What is it?

  9. What is it?

  10. What is it? Hint 1

  11. What is it? Hint 2

  12. Basics

  13. Standard way • Hex editor • Researcher • Brain (equally important as the hex editor) • Basic knowledge of how data can be organized (in the brain) • Analysis of the executable file that manipulates the data format

  14. 3 ways of researching • Dynamic analysis – reverse engineer the application that manipulates the data format • Static analysis – try to extract the header, structures and fields, and find relationships between the data • Statistical analysis – given a number of samples, use statistics on how values change and what ranges they take
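The statistical approach can be sketched in a few lines. This is a minimal illustration, not the author's tooling: the helper names `column_stats` and `constant_offsets` and the three sample buffers are made up for the example. It compares the same byte offset across several samples and reports how many distinct values appear there.

```python
def column_stats(samples):
    """For each byte offset, return (offset, distinct_count, min, max) across samples."""
    length = min(len(s) for s in samples)  # compare only the common prefix
    stats = []
    for offset in range(length):
        values = {s[offset] for s in samples}
        stats.append((offset, len(values), min(values), max(values)))
    return stats


def constant_offsets(samples):
    """Offsets whose byte never changes: candidates for IDs, padding or reserved fields."""
    return [off for off, distinct, _, _ in column_stats(samples) if distinct == 1]


# Hypothetical samples of one format: fixed 2-byte ID, a changing field, a fixed tail.
samples = [
    bytes.fromhex("4d5a0001aa10"),
    bytes.fromhex("4d5a0002bb10"),
    bytes.fromhex("4d5a0003cc10"),
]
print(constant_offsets(samples))
```

Offsets that never change hint at identifiers or padding, while offsets with a narrow value range are worth checking as flags, counters or small sizes.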

  15. Breaking the rules • There is the rule RTFM (Read The F**king Manual) • Nobody likes it • I’m no exception • First I start my own research, and only then look for related work and generalize ideas from it

  16. Definitions • Field – a value used to describe the data format • Structure – a way of organizing various fields • Data – the information that the data format was developed to represent • Header – a common structure placed before the data, may contain substructures

  17. Thoughts

  18. Main Idea • Data formats are developed by humans (not pets or aliens) • Sometimes it looks as if no human worked on it… • Format developers use common data organization concepts and think along similar lines when creating new data formats • If we find regularities in how data formats are organized, we can automate the search for them

  19. Three types of data format • File formats – multimedia formats, database formats, internal formats for exchange between program components, etc. • Protocols – network protocols, hardware device interaction protocols, protocols of interaction between a driver and a user-space application, etc. • Structures in memory – OS structures, application structures, etc.

  20. Levels of abstraction • Bit Order • Byte Order • Field Size • Field Basic Type • Field Type • Structure • Field Semantics

  21. Tasks to solve • Extract the header, separate it from the data • Find field boundaries • Find structures and substructures • Find types of fields • Detect bit and byte ordering • Determine the semantics of fields, i.e. interpret the purpose of each field

  22. Bit order • There is a most significant bit (MSB) and a least significant bit (LSB) • If the MSB is the left-most bit (commonly used), the bit sequence 1 0 0 1 0 1 0 1 has the value 10010101b – 95h – 149 • If the LSB is the left-most bit, the same sequence has the value 10101001b – A9h – 169
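The two values on this slide are bit-reversals of each other, which is easy to check. The sketch below is only an illustration; `reverse_bits` is a helper name introduced here, not something from the talk.

```python
def reverse_bits(value, width=8):
    """Reverse the order of the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)  # move the current LSB to the other end
        value >>= 1
    return result


bits = "10010101"                # bit sequence as it appears, left to right
msb_first = int(bits, 2)         # left-most bit is the most significant
lsb_first = int(bits[::-1], 2)   # left-most bit is the least significant

print(hex(msb_first), msb_first)
print(hex(lsb_first), lsb_first)
```

When a field decodes to implausible values, trying the reversed bit order per byte is a cheap check.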

  23. Byte order • Little-endian – Intel byte order • Big-endian – network byte order • Middle-endian or mixed-endian – mixed byte order – occurs when a value is longer than the processor’s default machine word
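A short sketch of how the same four bytes decode under each byte order, using Python's standard `struct` module. The mixed-endian case shown is the PDP-11 layout (most significant 16-bit word first, each word little-endian), picked here as one well-known example; other mixed layouts exist.

```python
import struct

raw = bytes.fromhex("12345678")  # four bytes as they appear in the file

little = struct.unpack("<I", raw)[0]  # least significant byte first
big = struct.unpack(">I", raw)[0]     # most significant byte first

# PDP-11 style mixed order: two little-endian 16-bit words, high word first.
high_word, low_word = struct.unpack("<HH", raw)
middle = (high_word << 16) | low_word

print(hex(little), hex(big), hex(middle))
```

If one interpretation gives a small, plausible number (a size below the file length, for instance) and the others give huge values, that is a strong hint about the format's byte order.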

  24. Fields

  25. Types of fields • Service fields – describe the data format itself (size of structure, etc.) • Common fields – “fields from life” (time, date, etc.) • Specific fields – we can find a value range for that type (bit flags, etc.) • Application specific – can be interpreted only by the application, so we should use dynamic analysis

  26. Field Size and Levels of field interpretation • A field is commonly a byte sequence • Sometimes a field is a bit sequence • Fixed size in bytes (1, 2, 3, 4, …) • Fixed size in bits (1, 2, 3, 4, …) • Variable size in bytes • Variable size in bits

  27. Basic field types • Hex value – by default • Decimal value • Character value (up to 4 symbols) • String (ASCII or Unicode) • Float value • Bits value
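To make the list concrete, here is a hedged sketch that shows one 4-byte field under each basic interpretation. The function name `interpretations` and the choice of little-endian decoding are assumptions made for the example.

```python
import struct


def interpretations(raw4):
    """Show a 4-byte field as hex, decimal, characters, float and bits."""
    return {
        "hex": raw4.hex(),
        "decimal_le": struct.unpack("<I", raw4)[0],        # assumes little-endian
        "chars": raw4.decode("ascii", errors="replace"),   # up to 4 symbols
        "float_le": struct.unpack("<f", raw4)[0],          # IEEE 754 single precision
        "bits": " ".join(f"{b:08b}" for b in raw4),
    }


views = interpretations(b"RIFF")
for name, value in views.items():
    print(f"{name:>10}: {value}")
```

Usually only one of the views looks meaningful. Here the character view does, which suggests the field is an identifier rather than a number.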

  28. Types of field • Text • Offset • Size • Quantity • ID • Flag • Counter • DateTime • Protect

  29. Text • Usually fixed size, with the remaining space after the text filled with 0 • Variable-sized text either ends with 0 or has a separate field holding its size • Text strings usually carry their own meaning
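The three text layouts named on this slide can be read as follows. This is a sketch under assumptions: the function names are invented, the size prefix is assumed to be a single byte, and the buffer is synthetic.

```python
def read_fixed_text(buf, offset, size):
    """Fixed-size field: the text is followed by zero padding up to `size` bytes."""
    field = buf[offset:offset + size]
    return field.split(b"\x00", 1)[0].decode("ascii", errors="replace")


def read_cstring(buf, offset):
    """Variable-size text terminated by a zero byte; returns (text, next_offset)."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("ascii", errors="replace"), end + 1


def read_sized_text(buf, offset):
    """Variable-size text preceded by a 1-byte length (assumed prefix width)."""
    length = buf[offset]
    start = offset + 1
    return buf[start:start + length].decode("ascii", errors="replace"), start + length


buf = b".text\x00\x00\x00" + b"hello\x00" + b"\x03abc"
print(read_fixed_text(buf, 0, 8))
print(read_cstring(buf, 8))
print(read_sized_text(buf, 14))
```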

  30. Offset • Offset to another field • Offset to data • Pointer to another structure (may be child or parent) • Pointer to another instance of the same structure (next or previous in a linked list) • Can be relative or absolute
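One simple static heuristic for spotting offset fields is to look for values that point inside the file. The sketch below is an illustration only: `candidate_offsets` and `resolve_offset` are hypothetical helpers, and the scan assumes aligned little-endian fields.

```python
import struct


def resolve_offset(field_pos, value, relative):
    """Turn a stored offset into a buffer position (relative to the field, or absolute)."""
    return field_pos + value if relative else value


def candidate_offsets(buf, width=4):
    """Aligned little-endian values that are nonzero and point inside buf."""
    fmt = {2: "<H", 4: "<I"}[width]
    hits = []
    for pos in range(0, len(buf) - width + 1, width):
        (value,) = struct.unpack_from(fmt, buf, pos)
        if 0 < value < len(buf):
            hits.append((pos, value))
    return hits


# Synthetic buffer: 4-byte ID, an absolute offset to the payload, and a filler dword.
buf = struct.pack("<4sII", b"DEMO", 12, 0xFFFFFFFF) + b"payload!"
hits = candidate_offsets(buf)
print(hits)
print(buf[resolve_offset(4, 12, relative=False):])
```

Real files produce many false positives, since small sizes and counts also fall inside the file. Checking what each candidate points at (a known ID, the start of a string, a structure boundary) helps to filter them.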

  31. Export in PE-EXE

  32. Size • Size of data • Size of structure • Size of substructure • Size of field • Can be given not only in bytes, but also in bits, words (2 bytes), double words, paragraphs (16 bytes), etc.

  33. Quantity • Quantity of substructures • Quantity of elements • IMAGE_NT_HEADERS.IMAGE_FILE_HEADER.NumberOfSections • IMAGE_EXPORT_DIRECTORY.NumberOfFunctions
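As an illustration of a quantity field, the sketch below reads `NumberOfSections` from a PE image. It relies on the documented PE layout: the DOS header stores the offset of the `PE\0\0` signature at 0x3C, and the file header that follows the signature begins with the 16-bit `Machine` and `NumberOfSections` fields. The function name and the synthetic test image are assumptions made for the example.

```python
import struct


def number_of_sections(image):
    """Read IMAGE_FILE_HEADER.NumberOfSections from a PE image held in memory."""
    image = bytes(image)
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)  # offset of the PE signature
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    _machine, count = struct.unpack_from("<HH", image, e_lfanew + 4)
    return count


# Synthetic, incomplete image: just enough bytes to exercise the parser.
image = bytearray(0x40 + 24)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x40)
image[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<HH", image, 0x44, 0x014C, 3)  # Machine = i386, NumberOfSections = 3
print(number_of_sections(image))
```

The quantity tells the parser how many fixed-size section headers follow, which is the typical role of such fields: they bound an array of substructures.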

  34. ID • Identifier of the data format, often called a magic number • ID of structure • ID of substructure • Often the ID is a value consisting of character symbols • Often the ID looks like “magic” – BE BA FE CA
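Both kinds of ID from this slide can be checked mechanically. In the sketch below, `looks_like_char_id` and `identify` are hypothetical helpers, and the small signature table lists only a few well-known file signatures.

```python
import struct

# The "magic" bytes from the slide, read as a little-endian 32-bit value.
raw = bytes.fromhex("BEBAFECA")
magic = struct.unpack("<I", raw)[0]
print(hex(magic))

# A few well-known signatures at offset 0 (illustrative, far from complete).
KNOWN_IDS = {
    b"MZ": "DOS/PE executable",
    b"\x89PNG": "PNG image",
    b"PK\x03\x04": "ZIP archive",
    b"RIFF": "RIFF container",
}


def looks_like_char_id(data):
    """True if every byte is a printable ASCII character."""
    return all(0x20 <= b < 0x7F for b in data)


def identify(buf):
    """Return a description for a known leading signature, else None."""
    for signature, name in KNOWN_IDS.items():
        if buf.startswith(signature):
            return name
    return None


print(identify(b"RIFF\x24\x00\x00\x00WAVE"))
```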

  35. Flag • Bit flags • Enum values used as flags • Special case – Bool value
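A minimal sketch of decoding a bit-flag field with Python's `enum.IntFlag`. The flag names in `DemoFlags` are invented for illustration and do not belong to any real format.

```python
from enum import IntFlag


class DemoFlags(IntFlag):
    """Hypothetical flag set: each member occupies one bit."""
    READ = 0x01
    WRITE = 0x02
    EXECUTE = 0x04


def set_bits(value):
    """Positions of the bits that are set, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


field = 0x05
decoded = DemoFlags(field)
print(decoded, set_bits(field))
```

When the meaning of the bits is unknown, `set_bits` applied to many samples shows which bit positions are ever used, which helps to separate a flag field from a number.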

  36. Counter • Increment (possibly decrement) • Starts with 0 / begins with another value • Changes by 1 or another value
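A counter can be recognized by comparing the same field across consecutive packets or records. The helper `counter_step` below is a made-up name and only a sketch; it ignores wrap-around at the field's maximum value.

```python
def counter_step(values):
    """Return the constant nonzero step if values form a counter, else None."""
    if len(values) < 2:
        return None
    step = values[1] - values[0]
    if step == 0:
        return None  # a constant field is not a counter
    if all(b - a == step for a, b in zip(values, values[1:])):
        return step
    return None


print(counter_step([0, 1, 2, 3]))     # increments by 1 from 0
print(counter_step([100, 104, 108]))  # starts elsewhere, changes by 4
print(counter_step([10, 8, 6]))       # decrementing counter
print(counter_step([1, 2, 4]))        # not a counter
```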

  37. DateTime • Storing format • Resolution • Moment of beginning • Range
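The four properties on this slide differ between common timestamp encodings. As a hedged illustration, the sketch below decodes three widely used ones: Unix time (seconds since 1970), Windows FILETIME (100-nanosecond ticks since 1601) and the packed MS-DOS date and time words (years since 1980, 2-second resolution). The function names are chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def from_unix(seconds):
    """Unix time: seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def from_filetime(ticks):
    """Windows FILETIME: 100 ns ticks since 1601-01-01 UTC."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=ticks // 10)


def from_dos(date, time):
    """MS-DOS packed date/time: 7+4+5 bits for the date, 5+6+5 bits for the time."""
    return datetime(
        1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,
        time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2,
    )


print(from_unix(0))
print(from_filetime(116444736000000000))
print(from_dos((33 << 9) | (11 << 5) | 7, (12 << 11) | (30 << 5) | 10))
```

Trying several encodings on a suspected field and keeping the one that yields a plausible date is often enough to identify the storing format.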

  38. Protect • CRC value of data • CRC value of substructure • Various hash functions, etc.
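A protection field can sometimes be found by brute force: compute a checksum over a guessed region and look for a field that stores the same value. The sketch below tests a single hypothesis, namely that a little-endian CRC-32 field covers all bytes after it. The function name and the synthetic buffer are assumptions for the example, and real formats may use other polynomials, regions or byte orders.

```python
import struct
import zlib


def crc32_field_candidates(buf):
    """Offsets where a little-endian dword equals the CRC-32 of everything after it."""
    hits = []
    for pos in range(len(buf) - 4):
        (stored,) = struct.unpack_from("<I", buf, pos)
        if stored == zlib.crc32(buf[pos + 4:]):
            hits.append(pos)
    return hits


# Synthetic record: 4-byte ID, CRC-32 of the payload, then the payload itself.
payload = b"some payload bytes"
buf = b"HDR0" + struct.pack("<I", zlib.crc32(payload)) + payload
print(crc32_field_candidates(buf))
```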

  39. Related works

  40. Projects, articles and authors • Protocol Informatics Project – based on “Network Protocol Analysis using Bioinformatics Algorithms” by Marshall A. Beddoe • Discoverer – “Discoverer: Automatic Protocol Reverse Engineering from Network Traces” by Weidong Cui, Jayanthkumar Kannan, Helen J. Wang • Laika – “Digging For Data Structures” by Anthony Cozzie, Frank Stratton, Hui Xue, and Samuel T. King • RolePlayer – “Protocol-Independent Adaptive Replay of Application Dialog” by Weidong Cui, Vern Paxson, Nicholas C. Weaver, Randy H. Katz • Tupni – “Tupni: Automatic Reverse Engineering of Input Formats” by Weidong Cui, Marcus Peinado, Karl Chen, Helen J. Wang, Luis Irun-Briz

  41. Protocol Informatics Project • Global and local sequence alignment – Needleman-Wunsch and Smith-Waterman algorithms
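For readers unfamiliar with global alignment, here is a compact Needleman-Wunsch sketch applied to two byte strings. It is a textbook version with a simple linear gap penalty, not the Protocol Informatics implementation; the scoring values and the two example messages are arbitrary choices.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of sequences a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner; None marks a gap.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(None); i -= 1
        else:
            out_a.append(None); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], out_a[::-1], out_b[::-1]


total, al_a, al_b = needleman_wunsch(b"GET /a", b"GET /ab")
print(total)
print(al_a)
print(al_b)
```

Aligned positions that match across many messages suggest constant fields such as keywords or IDs, while gaps and mismatches mark variable fields.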

  42. Discoverer

  43. Laika • Using Bayesian unsupervised learning • For fixed size structures only

  44. Tupni • Based on taint tracking engine

  45. Applications • RE any program • RE undocumented/proprietary file formats • RE undocumented/proprietary network protocols • RE undocumented structures in memory • Fuzzing • Examination of protocol implementation • Replay network interaction • Zero-day vulnerability signature generation

  46. Thank you! • Questions? • Meet me at researcher@inbox.ru, dorfmananton@gmail.com
