This article discusses the importance of packet parsing and memory management in protocol implementation. Topics include avoiding buffer overflows, memory problems like fragmentation and leaks, memory debugging and profiling, handling low memory conditions, random memory corruptions, various lookup data structures, and algorithmic attacks on data structures.
More on protocol implementation • Packet parsing • Memory management • Data structures for lookup
Packet parsing • Not too much here. Only a single concern • Do not let the incoming data cause a buffer overflow • This means: be extremely careful what you assume about the input • Make sure that you always have enough buffer for the incoming packets • Always verify the lengths of TLVs or other protocol objects before copying them
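The length checks above can be sketched as follows. This is a minimal illustration, not taken from the slides: the TLV layout (1-byte type, 1-byte length, then the value) and the name `tlv_copy_value` are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV layout assumed for this sketch:
 * 1-byte type, 1-byte length, then `length` bytes of value. */
#define TLV_HDR_LEN 2

/* Copy the value of the first TLV in pkt[0..pkt_len) into out.
 * Returns the number of value bytes copied, or -1 if the TLV is
 * truncated or its value does not fit into the destination buffer. */
int tlv_copy_value(const uint8_t *pkt, size_t pkt_len,
                   uint8_t *type, uint8_t *out, size_t out_cap)
{
    if (pkt_len < TLV_HDR_LEN)
        return -1;                    /* the header itself is truncated */

    size_t vlen = pkt[1];
    if (vlen > pkt_len - TLV_HDR_LEN)
        return -1;                    /* claimed length runs past the packet */
    if (vlen > out_cap)
        return -1;                    /* would overflow the destination */

    *type = pkt[0];
    memcpy(out, pkt + TLV_HDR_LEN, vlen);
    return (int)vlen;
}
```

Both checks happen before the copy: one against what the packet really contains, one against what the destination can hold. The length field in the packet is never trusted on its own.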
Memory • Problems • Fragmentation • Out of memory errors • Leaks • How to catch corruptions
Memory management • A memory manager can be complex • Must be fast • Must minimize fragmentation • Find the best-fitting area of memory to allocate for an object • Allocation/free patterns • Frequency of alloc/free • Large bursts of alloc • When the process starts • Can optimize by pre-allocating
Slab allocators • Slab allocator • Allocate objects from caches (slabs), i.e. get a memory page and carve it up into smaller objects • Cache: some objects may require expensive initialization • Do it only once • Centralize: so that the overall system needs are known and memory can be managed effectively • But • There is a little bit of internal fragmentation • Some data at the end of the slab is left unused • A little bit tricky to handle slabs for very large objects • Not all objects are of known size • May have to allocate variable size memory • Use multiples of fixed sizes and accept some small loss of memory • May hold the whole slab for just a few objects
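The carving idea can be sketched with a single-slab cache for fixed-size objects. This is a simplified illustration under stated assumptions: one slab per cache, a 4096-byte slab obtained with `malloc` rather than a real page allocator, and hypothetical names (`slab_cache`, `slab_alloc`, `slab_free`). Real slab allocators manage many slabs per cache and handle object construction.

```c
#include <stddef.h>
#include <stdlib.h>

#define SLAB_BYTES 4096   /* assumed slab size, roughly one page */

struct slab_cache {
    void  *slab;       /* the single backing slab */
    void  *free_list;  /* list threaded through the free objects */
    size_t obj_size;   /* object size after rounding up */
    size_t capacity;   /* number of objects carved from the slab */
};

int slab_cache_init(struct slab_cache *c, size_t obj_size)
{
    size_t a = sizeof(void *);

    if (obj_size == 0 || obj_size > SLAB_BYTES)
        return -1;
    /* Round up so every free object can hold the free-list link. */
    obj_size = (obj_size + a - 1) / a * a;
    if (obj_size > SLAB_BYTES)
        return -1;

    c->slab = malloc(SLAB_BYTES);
    if (!c->slab)
        return -1;
    c->obj_size = obj_size;
    c->capacity = SLAB_BYTES / obj_size;  /* leftover tail stays unused */
    c->free_list = NULL;

    /* Carve the slab: push every object onto the free list. */
    for (size_t i = c->capacity; i-- > 0; ) {
        void *obj = (char *)c->slab + i * obj_size;
        *(void **)obj = c->free_list;
        c->free_list = obj;
    }
    return 0;
}

void *slab_alloc(struct slab_cache *c)
{
    void *obj = c->free_list;
    if (obj)
        c->free_list = *(void **)obj;   /* pop */
    return obj;                         /* NULL when the slab is full */
}

void slab_free(struct slab_cache *c, void *obj)
{
    *(void **)obj = c->free_list;       /* push */
    c->free_list = obj;
}

void slab_cache_destroy(struct slab_cache *c)
{
    free(c->slab);
    c->slab = NULL;
    c->free_list = NULL;
}
```

With 24-byte objects the slab holds 170 of them and the last 16 bytes are never used, which is the internal fragmentation mentioned above.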
Memory debugging • Fill memory with patterns such as 0xdeadbeef and 0xbaddcafe • Detect when trying to use freed or uninitialized memory • Put a magic word at the end of the allocated area • Catch buffer overruns at free time • Put each data object into its own page • Unmap when freed • Any access to it will cause a seg fault • Keep a log of uses • Trace who (thread, function) used this buffer
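A minimal sketch of the fill-pattern and magic-word ideas, written as a wrapper around `malloc`/`free`. The names (`dbg_malloc`, `dbg_guard_ok`, `dbg_free`) and the single-byte fill values are assumptions for illustration. A real debugging allocator would also keep freed blocks in quarantine so that the freed-memory pattern stays observable.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_WORD 0xdeadbeefu  /* magic word placed after the user area */
#define FRESH_BYTE 0xba         /* fill for newly allocated memory */
#define FREED_BYTE 0xde         /* fill written just before release */

/* Header kept in front of the user area; the union keeps user data aligned. */
union dbg_hdr { size_t size; max_align_t align; };

void *dbg_malloc(size_t size)
{
    const uint32_t guard = GUARD_WORD;

    if (size > SIZE_MAX - sizeof(union dbg_hdr) - sizeof guard)
        return NULL;                              /* size overflow */
    union dbg_hdr *h = malloc(sizeof *h + size + sizeof guard);
    if (!h)
        return NULL;
    h->size = size;

    unsigned char *user = (unsigned char *)(h + 1);
    memset(user, FRESH_BYTE, size);               /* mark as uninitialized */
    memcpy(user + size, &guard, sizeof guard);    /* trailing magic word */
    return user;
}

/* Returns 1 if the magic word after the user area is intact. */
int dbg_guard_ok(const void *user)
{
    const union dbg_hdr *h = (const union dbg_hdr *)user - 1;
    uint32_t guard;

    memcpy(&guard, (const unsigned char *)user + h->size, sizeof guard);
    return guard == GUARD_WORD;
}

/* Frees the block; returns -1 if an overrun was detected at free time. */
int dbg_free(void *user)
{
    union dbg_hdr *h = (union dbg_hdr *)user - 1;
    int ok = dbg_guard_ok(user);

    memset(user, FREED_BYTE, h->size);            /* poison before release */
    free(h);
    return ok ? 0 : -1;
}
```

Writing even one byte past the requested size changes the guard word, so `dbg_free` reports the overrun although the program has not crashed.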
Memory Leaks • An object based manager will tell us exactly what objects are leaking • Maybe easier to find the problem • Combine with logging to see who did not free • There are multiple tools that can help here • Purify, valgrind and more…
Low memory conditions • Code that handles low memory conditions is usually not well tested • In a real life situation the system may become completely unusable • Manipulate the memory manager to create such conditions and test the system
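One way to create such conditions is a test allocator that starts failing on demand. The sketch below is an assumption about how this could look, not something described in the slides: `test_fail_after`, `test_malloc`, and `make_pair` are hypothetical names, and `make_pair` stands in for the code under test.

```c
#include <stddef.h>
#include <stdlib.h>

/* Number of allocations that still succeed; -1 means never fail. */
static long fail_countdown = -1;

/* After n more successful allocations, every allocation fails
 * until the countdown is reset (pass -1 to disable failures). */
void test_fail_after(long n)
{
    fail_countdown = n;
}

void *test_malloc(size_t size)
{
    if (fail_countdown == 0)
        return NULL;             /* simulated out-of-memory condition */
    if (fail_countdown > 0)
        fail_countdown--;
    return malloc(size);
}

/* Example code under test: it needs two buffers and must not leak
 * the first one if the second allocation fails. */
int make_pair(void **a, void **b)
{
    *a = test_malloc(64);
    if (!*a)
        return -1;
    *b = test_malloc(64);
    if (!*b) {
        free(*a);                /* undo the partial work */
        *a = NULL;
        return -1;
    }
    return 0;
}
```

Running the same test with `test_fail_after(0)`, `test_fail_after(1)`, and so on exercises each error path in turn, which is the kind of coverage that rarely happens by accident.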
Random memory corruptions • Be a bit paranoid • Use a hash/checksum in major data structures and occasionally verify it • If something is wrong crash and restart
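The checksum idea can be sketched as follows. The structure `route_entry` and its fields are hypothetical, and FNV-1a is used here only because it is short; the slides do not name a particular hash.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical "major data structure". All fields are uint32_t, so
 * there are no padding bytes before the checksum. */
struct route_entry {
    uint32_t prefix;
    uint32_t next_hop;
    uint32_t metric;
    uint32_t checksum;   /* covers all the fields above */
};

/* 32-bit FNV-1a over the bytes that precede the checksum field. */
static uint32_t route_hash(const struct route_entry *r)
{
    const unsigned char *p = (const unsigned char *)r;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < offsetof(struct route_entry, checksum); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Call after every legitimate modification of the entry. */
void route_seal(struct route_entry *r)
{
    r->checksum = route_hash(r);
}

/* Returns 1 if the entry still matches its checksum. */
int route_verify(const struct route_entry *r)
{
    return r->checksum == route_hash(r);
}

/* Occasional check: crash so that the process can be restarted cleanly. */
void route_check_or_die(const struct route_entry *r)
{
    if (!route_verify(r))
        abort();
}
```

The cost is that every code path that legitimately changes the entry has to call `route_seal`, so this fits best on structures that change rarely and are read often.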
Lookups • Hashes • Balanced trees • Tries/radix • Attacks • Walks
Hashes • Big problem: collisions • Bad hash function • Not well understood pattern • Different patterns may need different hashes • Attacks • Universal hashing
Unbalanced Trees • Very good for random insertion and access patterns • No need for expensive balancing operations • But if things are not random can deteriorate very fast • Very vulnerable to attacks • Can make them look like a linked list
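The degeneration can be seen with a plain binary search tree that never rebalances. This is an illustrative sketch with hypothetical names (`bst_insert`, `bst_height`): the same seven keys give a tree of height 3 when inserted in a friendly order and height 7 when inserted in sorted order.

```c
#include <stdlib.h>

struct bst_node {
    int key;
    struct bst_node *left, *right;
};

/* Plain BST insert with no rebalancing; returns the (possibly new) root. */
struct bst_node *bst_insert(struct bst_node *root, int key)
{
    struct bst_node **link = &root;

    while (*link) {
        if (key == (*link)->key)
            return root;                       /* already present */
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    }
    struct bst_node *n = calloc(1, sizeof *n);
    if (n) {
        n->key = key;
        *link = n;
    }
    return root;
}

/* Height in nodes: 0 for an empty tree. */
int bst_height(const struct bst_node *n)
{
    if (!n)
        return 0;
    int l = bst_height(n->left);
    int r = bst_height(n->right);
    return 1 + (l > r ? l : r);
}

void bst_free(struct bst_node *n)
{
    if (!n)
        return;
    bst_free(n->left);
    bst_free(n->right);
    free(n);
}
```

Inserting 4, 2, 6, 1, 3, 5, 7 yields height 3. Inserting 1 through 7 in order yields height 7, which is a linked list in all but name, and an attacker who controls the key order can force that shape.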
Balanced trees • Keep them balanced • AVL • The heights of the two child sub-trees differ by at most 1 • Red-black • All paths from a node to its leaves must have the same number of black nodes • Lookup times independent of insert pattern • But insert times are not • In some inserts I may have to do more work: rotations • Resistant to attacks • The worst that can happen is somebody will force me to do rotations all the time • Lookup performance depends on pattern • Splay tree tries to adapt to the lookup pattern • Recently accessed nodes are faster to access again
Radix/Tries • Store strings but not only • Anything that can be mapped to a string and has a lexicographic order • Does not need to be binary • In most cases it is not • Lookup cost depends on the length of the key and not on the number of nodes in the tree • But log(N) is usually better than length of the key • Good for longest prefix matches • No need for complex balancing • Better cache behavior! • Fewer levels means fewer memory accesses and better caching locality • Patricia trees to save space
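A longest prefix match can be sketched with a binary trie over IPv4 addresses. The trie is binary and uncompressed only to keep the example short; the multi-way and Patricia variants mentioned above are not shown. All names (`trie_node`, `trie_insert`, `trie_lpm`) and the integer payload are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

struct trie_node {
    struct trie_node *child[2];
    int has_value;
    int value;            /* hypothetical payload, e.g. a next-hop id */
};

struct trie_node *trie_new(void)
{
    return calloc(1, sizeof(struct trie_node));
}

/* Insert prefix/len (len in 0..32); returns 0 on success, -1 on error. */
int trie_insert(struct trie_node *root, uint32_t prefix, int len, int value)
{
    if (len < 0 || len > 32)
        return -1;
    struct trie_node *n = root;
    for (int i = 0; i < len; i++) {
        int bit = (prefix >> (31 - i)) & 1;   /* walk from the top bit */
        if (!n->child[bit]) {
            n->child[bit] = trie_new();
            if (!n->child[bit])
                return -1;
        }
        n = n->child[bit];
    }
    n->has_value = 1;
    n->value = value;
    return 0;
}

/* Longest prefix match: remember the last value seen on the way down.
 * Returns 1 and sets *value if any prefix matches, else returns 0. */
int trie_lpm(const struct trie_node *root, uint32_t addr, int *value)
{
    const struct trie_node *n = root;
    int found = 0;

    for (int i = 0; n; i++) {
        if (n->has_value) {
            *value = n->value;
            found = 1;
        }
        if (i == 32)
            break;                            /* all address bits consumed */
        n = n->child[(addr >> (31 - i)) & 1];
    }
    return found;
}

void trie_free(struct trie_node *n)
{
    if (!n)
        return;
    trie_free(n->child[0]);
    trie_free(n->child[1]);
    free(n);
}
```

With 10.0.0.0/8 and 10.1.0.0/16 inserted, a lookup of 10.1.2.3 returns the /16 entry and a lookup of 10.2.0.1 falls back to the /8. The loop runs at most 32 steps regardless of how many prefixes are stored.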
It all has to do with the data • Insert pattern • Random enough or pseudo-sequential • Lifetime • Ratio of lookups/insert+deletes • Lookup pattern • Lookup time • Exact? • Prefix? • What is being used: • Linux • A hash table for each prefix-len for the route table • 64-child radix trees for other things • Quagga • Binary Radix tree for routes
Algorithmic Attacks • Make hashing deteriorate to linear search • Balancing data structures are ok • May or may not be feasible • Number of bits in the key • Number of bits available to the attacker • Can handle by • Making the hash function harder to predict • Use a keyed MD5 hash or similar • Universal hashing • A family of hash functions where the probability of collisions is small • Need to be carefully chosen to avoid high implementation cost
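As one illustration of picking a hash function from a family at run time, the sketch below uses the multiply-shift scheme for 64-bit integer keys. This is a substitute chosen for brevity, not the keyed MD5 the slide mentions, and it is not a cryptographic hash. The names are hypothetical, and the secret multiplier is assumed to come from a good random source at startup, which is not shown.

```c
#include <stdint.h>

struct keyed_hash {
    uint64_t a;      /* secret odd multiplier chosen at startup */
    unsigned bits;   /* the table has 2^bits buckets */
};

/* bits must be in 1..32; returns 0 on success, -1 otherwise. */
int keyed_hash_init(struct keyed_hash *h, uint64_t secret, unsigned bits)
{
    if (bits < 1 || bits > 32)
        return -1;
    h->a = secret | 1u;      /* the multiplier must be odd */
    h->bits = bits;
    return 0;
}

/* Multiply-shift: keep the top `bits` bits of a * key (mod 2^64). */
uint32_t keyed_hash_bucket(const struct keyed_hash *h, uint64_t key)
{
    return (uint32_t)((h->a * key) >> (64 - h->bits));
}
```

Because the multiplier differs on every run, an attacker cannot precompute a set of keys that all land in one bucket without first learning the secret. The per-key cost is one multiplication and one shift.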
Walks • Need to walk part or all of the tree • Parent pointers, threading of various types • They take memory • Simple traversal in-order, pre-order • Need to interrupt and come back • The complete walk may take too long and I will have to do other protocol work • What happens with changes in the meantime • Mostly deletions • Locking and delayed deletions
Advanced walks • Should be able to walk a subset only • I.e. look up an element and walk from there • Use the threading pointers • Return where the walk stopped so the walk can be continued later • Walk will handle the locking and the delayed deletion • Can have some self-timing code so that it yields after it has run for more than a limit • Or a limit on how many tree nodes to walk before stopping • Something like • walk_tree(tree, from, until, &next, time_limit, walk_handler); • walk_handler() processes each node and returns CONTINUE or STOP • If the walk is interrupted, next will become the from next time we call walk_tree()
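A resumable walk along these lines can be sketched on a plain binary search tree with integer keys. Several things here are assumptions rather than the slide's design: the walk is limited by node count instead of time, it resumes by key (a fresh lookup of the next key) instead of using threading pointers, and locking and delayed deletion are left out. The signature only loosely follows the `walk_tree()` call above.

```c
#include <limits.h>
#include <stddef.h>

enum walk_action { WALK_CONTINUE, WALK_STOP };

struct wnode {
    int key;
    struct wnode *left, *right;
};

typedef enum walk_action (*walk_handler_fn)(int key, void *ctx);

/* Smallest node with key >= `key`, or NULL if there is none. */
static const struct wnode *ceiling_node(const struct wnode *root, int key)
{
    const struct wnode *best = NULL;

    while (root) {
        if (root->key >= key) {
            best = root;
            root = root->left;
        } else {
            root = root->right;
        }
    }
    return best;
}

/* Visit keys in [from, until] in order, at most node_limit per call.
 * Returns 1 when the range is finished. Returns 0 when interrupted,
 * with *next set to the key to pass as `from` on the next call. */
int walk_tree(const struct wnode *root, int from, int until, int *next,
              int node_limit, walk_handler_fn handler, void *ctx)
{
    int visited = 0;
    const struct wnode *n = ceiling_node(root, from);

    while (n && n->key <= until) {
        if (visited == node_limit) {
            *next = n->key;              /* yield; resume at this key */
            return 0;
        }
        visited++;
        enum walk_action act = handler(n->key, ctx);
        if (n->key == INT_MAX)
            return 1;                    /* no larger key can exist */
        if (act == WALK_STOP) {
            *next = n->key + 1;          /* resume after this key */
            return 0;
        }
        n = ceiling_node(root, n->key + 1);
    }
    return 1;
}

/* Example handler: adds each visited key to the long that ctx points to. */
enum walk_action sum_handler(int key, void *ctx)
{
    *(long *)ctx += key;
    return WALK_CONTINUE;
}
```

Resuming by key costs an extra lookup per step, but the walk holds no node pointer between calls, so a node deleted while the walk is suspended cannot leave it with a dangling reference.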