This article discusses the challenges of managing large quantities of data globally and proposes a distributed architecture for efficient data management and search. The architecture is designed to be scalable and robust, allowing for the seamless flow of data across different locations.
A Distributed Data Architecture
Mark Jessop, University of York
Grid Enabled Swans
[Map: London, Tokyo, Cape Town, Mexico City]
How Big is that Lake?
• Heathrow capped at 36 landings per hour.
• If half have 4 engines and half have 2, the average aircraft carries 3 engines.
• Each engine generates around 1 GB of data per flight.
• 36 x 3 x 1 = 108 GB of raw engine data per hour.
• Factor in the working day and the rest of the world…
• …Terabytes and up!
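The arithmetic on this slide can be sketched in a few lines of Python. The landing cap, the engine mix and the 1 GB per engine figure come from the slide; the function names and the 18-hour operating day are assumptions made purely for illustration.

```python
# Back-of-envelope estimate of raw engine data arriving at one airport.
# Figures quoted on the slide:
LANDINGS_PER_HOUR = 36        # Heathrow landing cap
GB_PER_ENGINE_FLIGHT = 1.0    # approximate data per engine per flight


def average_engines(share_four_engine: float = 0.5) -> float:
    """Mean engines per aircraft for a mix of 4-engine and 2-engine types."""
    return 4 * share_four_engine + 2 * (1 - share_four_engine)


def gb_per_hour(landings: int = LANDINGS_PER_HOUR,
                share_four_engine: float = 0.5,
                gb_per_engine: float = GB_PER_ENGINE_FLIGHT) -> float:
    """Raw engine data per hour: landings x engines x GB per engine."""
    return landings * average_engines(share_four_engine) * gb_per_engine


def gb_per_day(operating_hours: float) -> float:
    """Scale the hourly figure by an assumed number of operating hours."""
    return gb_per_hour() * operating_hours


if __name__ == "__main__":
    print(gb_per_hour())      # 108.0 GB per hour, as on the slide
    # ASSUMPTION: an 18-hour operating day, not stated in the source.
    print(gb_per_day(18))     # 1944.0 GB, i.e. roughly 1.9 TB per day
```

Under that assumed operating day a single airport already approaches 2 TB of raw data per day, which is where the "Terabytes and up" figure comes from once other airports are added.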
Managing the Flow of Water
[Map: London, Tokyo, Cape Town, Mexico City]
Plumbing Toolkit
• Data Repository
• Catalogue
• Pattern Match Engine
Pattern Match Engine
• Pattern Match Control
• Data Extractor/Encoder
• AURA Encoder
• AURA-G
• Back Check
Data Repository
• SDSC Storage Resource Broker (SRB).
• Manages distributed storage resources.
• Metadata Catalogue (MCAT).
• Many configurations.
• Heterogeneous.
• Efficient data delivery.
• C++ and Java APIs.
A Distributed Architecture
• One node per airport.
• Single global MCAT.
• Stream engine data.
• Global Parallel Search.
• Present Results.
• Scalable.
• Robust.
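The shape of this architecture can be illustrated with a minimal Python sketch: one node per airport, a single shared catalogue, and a search that fans out to every node in parallel before results are gathered. Everything here is hypothetical. `AirportNode`, `global_search`, the in-memory stores and the exact-subsequence match are stand-ins chosen for illustration; they are not the SRB API, the MCAT schema, or the AURA pattern matcher.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Hit = Tuple[str, str, int]  # (node name, record id, offset of the match)


@dataclass
class AirportNode:
    """Hypothetical stand-in for one airport's local data repository."""
    name: str
    store: Dict[str, List[int]] = field(default_factory=dict)

    def ingest(self, record_id: str, samples: List[int],
               catalogue: Dict[str, str]) -> None:
        # Data stays at the airport; only its location goes in the catalogue.
        self.store[record_id] = samples
        catalogue[record_id] = self.name

    def search(self, pattern: List[int]) -> List[Hit]:
        # Naive exact-subsequence match, a placeholder for the real engine.
        n = len(pattern)
        hits = []
        for record_id, samples in self.store.items():
            for i in range(len(samples) - n + 1):
                if samples[i:i + n] == pattern:
                    hits.append((self.name, record_id, i))
        return hits


def global_search(nodes: List[AirportNode], pattern: List[int]) -> List[Hit]:
    """Run the same search on every node in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        per_node = pool.map(lambda node: node.search(pattern), nodes)
        return sorted(hit for hits in per_node for hit in hits)


if __name__ == "__main__":
    catalogue: Dict[str, str] = {}   # single global catalogue
    london, tokyo = AirportNode("London"), AirportNode("Tokyo")
    london.ingest("LHR-001", [1, 2, 3, 4], catalogue)
    tokyo.ingest("NRT-001", [0, 3, 4, 5], catalogue)
    print(global_search([london, tokyo], [3, 4]))
```

Because each node searches only its own store, adding an airport adds both storage and search capacity, which is the sense in which the design scales. The sketch models only that fan-out and merge; it does not model the robustness claim or the streaming of engine data.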
Summary
• Large quantities of data arriving globally.
• Distributed architecture for data management and search.
• Scalable and Robust.