
Hyperion: High Volume Stream Archival


Presentation Transcript


  1. Hyperion: High Volume Stream Archival • Divya Muthukumaran

  2. Area • Network Monitoring • Identify problems due to overloaded and/or crashed servers, network connections or other devices • Example: To determine the status of a webserver, monitoring software may periodically send an HTTP request to fetch a page

  3. Live Monitoring • Packets are examined in real time • Compute and continually update traffic statistics • Discard the captured packet headers once examined • Why would we need to store packet headers?

  4. Live Monitoring • Packets are examined in real time • Compute and continually update traffic statistics • Discard the captured packet headers once examined • Why would we need to store packet headers? • Example: Network forensics • To go back and examine the root cause of a problem • Example: see how an intruder gained entry, or how a worm infection happened

  5. Why is such a system needed? • Querying and examining live data • Data archival • Capture the data at wire speed, index it, and store it • Efficiently support retrieval and processing of archived data • Specifically designed to handle the needs of high-volume stream archival

  6. Why not traditional databases? • Some statistics • A single gigabit link can generate over 100,000 packets per second and tens of MB per second of archival data • A monitor may record from multiple links

  7. Design Principles • Support queries, not reads • Implies the need to maintain indexes • Writes • Sequential and immutable • Archive locally, summarize globally • Scalability vs. the need to avoid flooding • Scalability: favors local archiving and indexing to avoid network writes • Need to answer distributed queries: favors sharing information across nodes

  8. Hyperion: Three Key Components • Stream File System • High-volume archiving and querying • Multi-level index structure • High update rates + reasonable lookup performance • Distributed index layer • Distributes a summary of local indices to enable distributed querying

  9. Design choices for the Hyperion Storage System • Storage of multiple high-speed traffic streams without loss • Support for concurrent read activity without loss of write performance • Re-use of storage in a buffer-like fashion

  10. Stream File System • Stores streams as opposed to files • Characteristics • Recycled: when storage is full, new data replaces old data • In a general-purpose (GP) file system, new data is lost and old data is retained • Immutable • Record-oriented: data is written in fixed- or variable-length records
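
The "recycled" and "record-oriented" properties can be illustrated with a toy in-memory store. This is only a sketch of the behaviour described on the slide, not Hyperion's on-disk format; the class name `RecycledStream` and its byte-based capacity are assumptions made for illustration.

```python
from collections import deque


class RecycledStream:
    """Toy stand-in for a recycled, record-oriented stream (hypothetical)."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.records = deque()  # oldest record sits at the left end

    def append(self, record: bytes) -> None:
        if len(record) > self.capacity_bytes:
            raise ValueError("record is larger than the whole store")
        # When storage is full, the oldest records are replaced by new data.
        while self.used_bytes + len(record) > self.capacity_bytes:
            old = self.records.popleft()
            self.used_bytes -= len(old)
        # Records are only ever appended, never modified (immutable).
        self.records.append(record)
        self.used_bytes += len(record)


stream = RecycledStream(capacity_bytes=10)
for rec in (b"aaaa", b"bbbb", b"cccc"):
    stream.append(rec)
remaining = list(stream.records)  # the oldest record has been recycled
```

A general-purpose file system in the same situation would instead fail the third write and keep the two older records.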

  11. Can we use a GP FS? • Need to map streams <=> files

  12. LogFile Rotation

  13. Stream FS

  14. Stream FS Organization • Log-structured FS • What problem? • Cleaning/garbage collection • StreamFS solves the cleaning problem • Guarantee: storage guarantee for each stream • Small segment size • Check whether the next segment is surplus. If yes, overwrite it; otherwise skip it.

  15. Stream FS Organization • Log-structured FS • What problem? • Cleaning/garbage collection • StreamFS solves the cleaning problem • Guarantee: storage guarantee for each stream • Small segment size (1 or ½ MB) • Check whether the next segment is surplus. If yes, overwrite it; otherwise skip it. • Advantages? • Storage reservation • Best-effort use of remaining storage
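
The "overwrite if surplus, otherwise skip" rule can be sketched as a scan over a circular array of segments. This is a minimal sketch under stated assumptions: the function `allocate_segment`, the per-stream `guarantee` table counted in segments, and the rule that a writer may always reuse its own segment are illustrative choices, not details taken from the slides.

```python
def allocate_segment(owners, guarantee, write_ptr, writer):
    """Pick the next segment `writer` may overwrite.

    owners    -- list with the owning stream of each segment (None = free)
    guarantee -- dict: stream -> number of segments reserved for it
    write_ptr -- index where the scan starts
    Returns (segment index, new write pointer).
    """
    n = len(owners)
    counts = {}
    for owner in owners:
        if owner is not None:
            counts[owner] = counts.get(owner, 0) + 1

    for step in range(n):
        idx = (write_ptr + step) % n
        owner = owners[idx]
        # A segment is surplus when it is free or its owner holds more
        # segments than its guarantee. Assumption: a writer may also
        # recycle one of its own segments.
        surplus = (
            owner is None
            or owner == writer
            or counts[owner] > guarantee.get(owner, 0)
        )
        if surplus:
            owners[idx] = writer
            return idx, (idx + 1) % n
        # Otherwise the segment is protected by a guarantee: skip it.
    raise RuntimeError("no surplus segment available")


owners = ["A", "A", "B", "A"]
guarantee = {"A": 2, "B": 1, "C": 1}
idx, ptr = allocate_segment(owners, guarantee, write_ptr=2, writer="C")
```

Here stream B holds exactly its guaranteed segment, so it is skipped, while stream A holds one segment more than its guarantee and loses it to the new writer. No cleaning pass is needed because nothing is ever copied.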

  16. Reads • First get the index • Use the index to get the data • Persistent handles • Returned from each write operation • Passed to the read operation to retrieve data • What does the handle contain? • Disk location, approximate length • Allows data to be retrieved directly

  17. Handle issues • Validate the handle. How? • Self certifying record header • Id of the stream • Permissions of the stream • Record length • Hash (used for validating the handle)
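
One way to picture handle validation against a self-certifying record header is sketched below. The slides do not say how the hash is computed, so the header layout, the truncated SHA-256 digest, and the function names `write_record` and `read_record` are all assumptions made for illustration.

```python
import hashlib
import struct

# Hypothetical header layout: stream ID, permissions, record length, hash.
HEADER = struct.Struct("<IHI8s")


def _digest(stream_id: int, perms: int, length: int, offset: int) -> bytes:
    # Assumption: the hash covers the header fields plus the disk location.
    raw = struct.pack("<IHIQ", stream_id, perms, length, offset)
    return hashlib.sha256(raw).digest()[:8]


def write_record(disk: bytearray, stream_id: int, perms: int, payload: bytes):
    """Append a record and return its persistent handle."""
    offset = len(disk)
    h = _digest(stream_id, perms, len(payload), offset)
    disk.extend(HEADER.pack(stream_id, perms, len(payload), h) + payload)
    return (offset, len(payload), h)  # disk location, length, hash


def read_record(disk: bytearray, handle):
    """Return the payload, or None if the handle does not validate."""
    offset, _approx_len, h = handle
    if offset < 0 or offset + HEADER.size > len(disk):
        return None
    stream_id, perms, length, stored = HEADER.unpack_from(disk, offset)
    # The header certifies itself: its stored hash must match both the
    # handle and a hash recomputed from the header fields.
    if stored != h or stored != _digest(stream_id, perms, length, offset):
        return None
    start = offset + HEADER.size
    return bytes(disk[start:start + length])


disk = bytearray()
h1 = write_record(disk, stream_id=7, perms=0o4, payload=b"pkt-1")
h2 = write_record(disk, stream_id=7, perms=0o4, payload=b"pkt-22")
```

Because the handle carries the disk location, a read goes straight to the record without consulting any file-system metadata; a stale or forged handle fails the hash comparison instead of returning unrelated data.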

  18. Stream FS Organization • Record • Variable length • On-disk record + header • Block • Fixed length • Multiple records of the same stream • Block Map • Every nth block • (stream ID + in-stream sequence number for each of the preceding n-1 blocks) • Used for easy write allocation

  19. Stream FS Organization

  20. Indexing • Uses signature-based indices • One signature for each segment • Can check whether a record with key k may be present in the segment • Does not tell you where in the segment the record is

  21. Multi-level Indices

  22. Multi-level Indices • Uses a Bloom filter • Hash(key) -> b bits • Of the b bits, k bits are set to 1 • H(key1) | H(key2) | … | H(keyn) = Hs (signature, formed by bitwise OR) • How to check for the presence of a record? • Compute the hash of its key kr, H(kr) • If any bit set in H(kr) is not set in Hs, then the record is not present • Otherwise the record may be present: false positives are possible
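
The signature construction and membership test above can be sketched as follows. The width `B_BITS`, the number of bits per key `K_BITS`, and the use of SHA-256 to choose bit positions are illustrative assumptions; the slides do not specify them.

```python
import hashlib

B_BITS = 64  # signature width b (illustrative value)
K_BITS = 3   # bits set per key k (illustrative value)


def key_hash(key: bytes) -> int:
    """Map a key to a b-bit word with up to k bits set."""
    mask = 0
    for i in range(K_BITS):
        digest = hashlib.sha256(bytes([i]) + key).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % B_BITS)
    return mask


def segment_signature(keys) -> int:
    """Hs: bitwise OR of the hashes of every key in the segment."""
    sig = 0
    for key in keys:
        sig |= key_hash(key)
    return sig


def may_contain(sig: int, key: bytes) -> bool:
    """False means definitely absent; True may be a false positive."""
    h = key_hash(key)
    return (h & ~sig) == 0  # every bit of H(kr) must also be set in Hs


sig = segment_signature([b"10.0.0.1", b"10.0.0.2"])
```

A query first tests `may_contain` on each segment signature and reads only the segments that pass, so a false positive costs an unnecessary segment scan but never a missed record.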

  23. Distributed Index • How to handle distributed queries without flooding? • Maintain a distributed index • Integrated view of all nodes • A coarse-grained summary of the data at each node is needed • Can use Hyperion's top-level index • One index node per time interval • All nodes send their top-level indices to this node • Temporally distributed index
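
The "one index node per time interval" idea can be sketched as a mapping from timestamps to nodes. The slides do not say how the responsible node is chosen, so the round-robin assignment, the 60-second interval, and the function names below are assumptions made purely for illustration.

```python
def index_node_for(timestamp: float, nodes, interval_s: int = 60):
    """Node that collects every top-level index for this time interval.

    Assumption: intervals are assigned to nodes round-robin.
    """
    slot = int(timestamp) // interval_s
    return nodes[slot % len(nodes)]


def index_nodes_for_range(start: float, end: float, nodes, interval_s: int = 60):
    """Index nodes a query over [start, end] has to contact, in time order."""
    first = int(start) // interval_s
    last = int(end) // interval_s
    contacted = []
    for slot in range(first, last + 1):
        node = nodes[slot % len(nodes)]
        if node not in contacted:
            contacted.append(node)
    return contacted


nodes = ["n0", "n1", "n2"]  # hypothetical node names
owner = index_node_for(125, nodes)
to_contact = index_nodes_for_range(50, 130, nodes)
```

A query for a given time range contacts only the index nodes responsible for that range, which then point it at the monitors whose summaries match, instead of flooding every monitor.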
