Hyperion: High Volume Stream Archival

Hyperion: High Volume Stream Archival • Divya Muthukumaran

Area: Network Monitoring • Identify problems caused by overloaded or crashed servers, network connections, or other devices • Example: to determine the status of a web server, monitoring software may periodically send an HTTP request to fetch a page

Live Monitoring • Packets are examined in real time • Traffic statistics are computed and continually updated • Captured packet headers are discarded once examined • Why would we need to store packet headers? • Example: network forensics • Go back and examine the root cause of a problem • E.g., see how an intruder gained entry or how a worm infection happened

Why is such a system needed? • Two needs: querying and examining live data, and data archival • Capture data at wire speed, then index and store it • Efficiently support retrieval and processing of archived data • Specifically designed to handle the needs of high-volume stream archival

Why not traditional databases? • Some statistics • A single gigabit link can generate over 100,000 packets and tens of MB of archival data per second • A monitor may record from multiple links
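To make the statistics above concrete, here is a rough back-of-the-envelope sketch. The 100-byte record size, the 1 TB disk, and both function names are assumptions chosen for illustration; only the 100,000 packets figure comes from the slide, and it is read here as a per-second rate.

```python
def archival_rate_mb_per_s(packets_per_s, record_bytes, links=1):
    """Archival data rate in MB/s for a monitor recording from `links` links."""
    return packets_per_s * record_bytes * links / 1e6


def retention_hours(disk_bytes, rate_mb_per_s):
    """How long a disk of `disk_bytes` lasts before old data must be recycled."""
    return disk_bytes / (rate_mb_per_s * 1e6) / 3600


# Assumed figures: 100-byte archived record per packet, 1 TB of disk.
one_link = archival_rate_mb_per_s(100_000, 100)
four_links = archival_rate_mb_per_s(100_000, 100, links=4)
hours = retention_hours(1e12, one_link)
print(one_link, four_links, round(hours, 2))
```

Under these assumptions one link fills a terabyte in roughly a day, which is why the storage has to be written sequentially and reused continuously.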

Design Principles • Support queries, not reads • Implies the need to maintain indexes • Writes are sequential and immutable • Archive locally, summarize globally • Scalability vs. the need to avoid flooding • Scalability favors local archiving and indexing to avoid network writes • Answering distributed queries favors sharing information across nodes

Hyperion: Three Key Components • Stream file system (StreamFS): high-volume archiving and querying • Multi-level index structure: high update rates plus reasonable lookup performance • Distributed index layer: distributes a summary of local indices to enable distributed querying

Design choices for the Hyperion Storage System • Storage of multiple high-speed traffic streams without loss • Support for concurrent read activity without loss of write performance • Re-use of storage in a buffer-like fashion

Stream File System • Stores streams as opposed to files • Characteristics • Recycled: when storage is full, new data replaces old data • In a general-purpose (GP) file system, new data is lost and old data is retained • Immutable • Record-oriented: data is written in fixed- or variable-length records
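The recycled, immutable, record-oriented behaviour listed above can be illustrated with a small in-memory sketch. `RecycledStream` and its methods are hypothetical names invented for this example, and the capacity is counted in records rather than bytes to keep it short; it is not the StreamFS on-disk design.

```python
from collections import deque


class RecycledStream:
    """Append-only stream that overwrites its oldest records when full."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest entry on overflow.
        self.records = deque(maxlen=capacity)
        self.next_seq = 0

    def append(self, record):
        """Store an immutable copy of the record and return its sequence number."""
        seq = self.next_seq
        self.records.append(bytes(record))
        self.next_seq += 1
        return seq

    def read(self, seq):
        """Return the record, or raise KeyError if it was recycled or never written."""
        oldest = self.next_seq - len(self.records)
        if not oldest <= seq < self.next_seq:
            raise KeyError(seq)
        return self.records[seq - oldest]


stream = RecycledStream(capacity=3)
for i in range(5):
    stream.append(b"r%d" % i)
print(stream.read(4), len(stream.records))
```

A general-purpose file system would instead fail the fourth write with a "disk full" error and keep the three oldest records.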

Can we use a general-purpose (GP) file system? • Need to map streams <=> files

LogFile Rotation

Stream FS

Stream FS Organization • Log-structured FS • What is the problem? • Cleaning/garbage collection • StreamFS solves the cleaning problem • Guarantee: a storage guarantee for each stream • Small segment size (1 or ½ MB) • Check whether the next segment is surplus (its stream holds more than its guaranteed storage); if yes, overwrite it, otherwise skip it • Advantages? • Storage reservation • Best-effort use of remaining storage
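The overwrite-or-skip rule above can be sketched as a write frontier that walks the segments in a circle. This is a minimal sketch, not the StreamFS implementation: `SegmentAllocator` and its fields are hypothetical, guarantees are counted in segments, and letting a stream overwrite its own segments is an extra assumption so that a stream at its guarantee can still make progress.

```python
class SegmentAllocator:
    """Circular write frontier with per-stream storage guarantees (in segments)."""

    def __init__(self, num_segments, guarantees):
        self.owner = [None] * num_segments      # stream that owns each segment
        self.guarantee = dict(guarantees)       # stream -> guaranteed segments
        self.used = {s: 0 for s in guarantees}  # stream -> segments currently held
        self.frontier = 0

    def can_overwrite(self, seg, stream):
        owner = self.owner[seg]
        if owner is None or owner == stream:    # free, or the writer's own data
            return True
        # Surplus: the owner holds more than its guarantee.
        return self.used[owner] > self.guarantee[owner]

    def allocate(self, stream):
        """Return the next segment `stream` may write, skipping reserved ones."""
        n = len(self.owner)
        for _ in range(n):
            seg = self.frontier
            self.frontier = (self.frontier + 1) % n
            if self.can_overwrite(seg, stream):
                old = self.owner[seg]
                if old is not None:
                    self.used[old] -= 1
                self.owner[seg] = stream
                self.used[stream] += 1
                return seg
        raise RuntimeError("no segment available for this stream")


alloc = SegmentAllocator(4, {"A": 2, "B": 1})
for _ in range(4):
    alloc.allocate("A")                 # A uses all free space, best effort
writes = [alloc.allocate("B") for _ in range(3)]
print(writes, alloc.owner)
```

No cleaning pass is needed: stream B takes back only A's surplus segments, then recycles its own data, and A keeps the two segments it was guaranteed.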

Reads • First consult the index • Use the index result to fetch the data • Persistent handles • Returned from each write operation • Passed to the read operation to retrieve data • What does the handle contain? • Disk location and approximate length • Allows data to be retrieved directly

Handle issues • How is a handle validated? • Self-certifying record header, containing: • ID of the stream • Permissions of the stream • Record length • Hash (used to validate the handle)
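A minimal sketch of how a persistent handle and a self-certifying header might fit together. The header layout, the truncated SHA-256 hash, the choice to carry the hash inside the handle, and the names `Store` and `Handle` are all assumptions made for illustration; the slides only list which fields the header contains.

```python
import hashlib
import struct
from dataclasses import dataclass

# Assumed header layout: stream id, permissions, record length, 8-byte hash.
HEADER = struct.Struct("<IHI8s")


@dataclass(frozen=True)
class Handle:
    offset: int    # disk location
    length: int    # header plus payload length
    digest: bytes  # expected header hash


class Store:
    def __init__(self):
        self.disk = bytearray()  # stands in for the raw storage device

    def write(self, stream_id, perms, payload):
        """Append a record and return a handle that locates it directly."""
        fields = struct.pack("<IHI", stream_id, perms, len(payload))
        digest = hashlib.sha256(fields + payload).digest()[:8]
        offset = len(self.disk)
        self.disk += HEADER.pack(stream_id, perms, len(payload), digest) + payload
        return Handle(offset, HEADER.size + len(payload), digest)

    def read(self, handle):
        """Fetch a record by handle, rejecting handles the header does not certify."""
        raw = bytes(self.disk[handle.offset:handle.offset + HEADER.size])
        if len(raw) < HEADER.size:
            raise KeyError("handle points outside the store")
        stream_id, _perms, length, digest = HEADER.unpack(raw)
        if digest != handle.digest:
            raise KeyError("stale or forged handle")
        start = handle.offset + HEADER.size
        return stream_id, bytes(self.disk[start:start + length])


store = Store()
h1 = store.write(7, 0o600, b"pkt-1")
h2 = store.write(7, 0o600, b"pkt-2")
print(store.read(h2))
```

Because the check uses only what is stored at the target location, a read needs no lookup table: a handle whose record has been overwritten simply fails validation.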

Stream FS Organization • Record: variable length; stored on disk as the record plus a header • Block: fixed length; holds multiple records of the same stream • Block map: written every nth block; holds the stream ID and in-stream sequence number for each of the preceding n-1 blocks; used for easy write allocation

Stream FS Organization

Indexing • Uses signature-based indices • One signature per segment • The signature tells whether a record with key k may be present in the segment • It does not tell where in the segment the record is

Multi-level Indices

Multi-level Indices • Uses a Bloom filter • Hash(key) -> b bits • Of the b bits, k are set to 1 • H(key1) | H(key2) | … | H(keyn) = Hs (bitwise OR; the segment signature) • How to check for the presence of a record? • Compute the hash of its key kr, H(kr) • If any bit set in H(kr) is not set in Hs, the record is not present • False positives are possible; false negatives are not
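The signature construction and the membership check above can be sketched in a few lines. The values of b and k and the use of SHA-256 to pick bit positions are assumptions for illustration; the slides do not say which hash functions or parameters Hyperion uses.

```python
import hashlib

B = 256  # signature width in bits (assumed)
K = 4    # hash positions per key (assumed)


def key_hash(key):
    """H(key): a B-bit integer with at most K bits set."""
    bits = 0
    for i in range(K):
        digest = hashlib.sha256(bytes([i]) + key).digest()
        bits |= 1 << (int.from_bytes(digest[:4], "big") % B)
    return bits


def signature(keys):
    """Hs: bitwise OR of H(key) over every key in the segment."""
    sig = 0
    for key in keys:
        sig |= key_hash(key)
    return sig


def may_contain(sig, key):
    """False means definitely absent; True may be a false positive."""
    return key_hash(key) & ~sig == 0


segment_keys = [b"10.0.0.1", b"10.0.0.2", b"192.168.1.9"]
hs = signature(segment_keys)
print(all(may_contain(hs, k) for k in segment_keys))
```

A segment whose signature fails the check can be skipped without reading it; a segment that passes still has to be scanned, since the signature does not say where the record is.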

Distributed Index • How to handle distributed queries without flooding? • Maintain a distributed index • Gives an integrated view of all nodes • Needs a coarse-grained summary of the data at each node • Can use Hyperion's top-level index • One index node per time interval • All nodes send their top-level indices to this node • Temporally distributed index
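One way to picture the temporally distributed index is the sketch below. The 60-second interval, the round-robin choice of index node, and the use of plain sets in place of top-level signatures are all assumptions made to keep the example short; `DistributedIndex` is a hypothetical name.

```python
from collections import defaultdict

INTERVAL = 60  # seconds covered by one index node (assumed)


class DistributedIndex:
    """Per-interval index nodes that collect every node's top-level summary."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # index node -> interval number -> {source node: summary}
        self.held = defaultdict(lambda: defaultdict(dict))

    def index_node_for(self, timestamp):
        """Pick the index node for a timestamp (round-robin is an assumption)."""
        interval = int(timestamp) // INTERVAL
        return self.nodes[interval % len(self.nodes)]

    def publish(self, source, timestamp, summary):
        """A node sends its summary for this interval to the interval's index node."""
        interval = int(timestamp) // INTERVAL
        target = self.index_node_for(timestamp)
        self.held[target][interval][source] = frozenset(summary)

    def candidates(self, key, start, end):
        """Nodes that may hold `key` in [start, end], found without flooding."""
        found = set()
        for interval in range(int(start) // INTERVAL, int(end) // INTERVAL + 1):
            index_node = self.nodes[interval % len(self.nodes)]
            for source, summary in self.held[index_node][interval].items():
                if key in summary:
                    found.add(source)
        return found


index = DistributedIndex(["a", "b", "c"])
index.publish("a", 10, {"10.0.0.1"})
index.publish("b", 70, {"10.0.0.2"})
print(index.candidates("10.0.0.1", 0, 120))
```

A query contacts only the index nodes for the intervals it covers, then forwards the detailed query to the candidate nodes instead of to every node.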
