Requirements for EO Data Processing Farms
ESA Workshop “Models for Scientific Exploitation of EO Data”, ESA ESRIN, Oct 11-12, 2012
Stephan Kiemle
German Remote Sensing Data Center (DFD), German Aerospace Center (DLR)
> Req. for EO Data Processing Farms > Stephan Kiemle • Models for Scientific Exploitation of EO Data > 2012-10-11
Evolution of EO PGS Processing Facilities
Dedicated Facility (e.g. 1st generation ENVISAT PAF)
• single mission
• dedicated hardware
• tight coupling, static scheduling
• predictable performance
• expensive investment, housing, operating
• no flexibility
Shared Facility (e.g. ESA MMFI)
• multi-mission
• shared hardware
• static deployment, dynamic scheduling
• controlled performance
• usage rate reduces costs
• growth and renewal still difficult
Virtualized Facility
Virtualized Processing Facilities
• multi-purpose
• independent hardware
• dynamic deployment and scheduling
• dynamic performance
• initial + continuous renewal investment, pay per use
• scaling with low impact on applications
Sounds good! But …
[Diagram: Host, VM, Processor and Control components under three service models (Infrastructure as a Service, Platform as a Service, Software as a Service); accounting units shown: #request, #CPU + #Mbit/s, #VCPU + #MB/day]
Scientific Exploitation Use Case: Reprocess 1 to 100 Tbyte of Archived EO Data
• Input data usually in centralized archives
• Parallel processing often possible on per-product basis
Parameters for distributed processing complexity:
• n  Number of products
• s  Average product size
• r  Ratio output size/input size
• t  Time for processing one product
• b  Bandwidth of weakest part in network
• p  Number of processing nodes
Usage of processing nodes is limited by bandwidth:
Total time ≥ [expression not preserved in the slide text]
[Diagram: p processors with "in", "processing" and "out" phases, sharing bandwidth b]
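One plausible way to express the bandwidth limit with the parameters above, offered as a sketch under stated assumptions rather than as the slide's own formula: if the work divides evenly over the p nodes and every input and output byte crosses the weakest link once, the total time is at least the larger of n·t/p and n·s·(1+r)/b. The function name and the choice of units (bytes, seconds) are illustrative.

```python
def total_time_lower_bound(n, s, r, t, b, p):
    """Lower bound on wall-clock time for reprocessing n products.

    Assumptions (not from the slide): work splits evenly over p nodes,
    and all input and output bytes cross the weakest link exactly once.

    n: number of products          s: average product size in bytes
    r: output size / input size    t: seconds to process one product
    b: bandwidth in bytes/second   p: number of processing nodes
    """
    if min(n, s, t, b, p) <= 0 or r < 0:
        raise ValueError("n, s, t, b, p must be positive and r non-negative")
    compute_time = n * t / p              # perfectly parallel processing
    transfer_time = n * s * (1 + r) / b   # inputs plus outputs over the link
    return max(compute_time, transfer_time)
```

Under this model, adding nodes shortens the run only while the compute term is the larger one; past that point the transfer term sets the floor.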
Example – Large Scale Reprocessing
• joint analysis of processing and data management required
• execute processing algorithms where the data is
• cross-distribute data archiving
[Plot: lower bound on total time versus number of processing nodes]
Example – Small Scale Analysis
• processing complexity versus data volume determines distribution
[Plot: lower bound on total time versus number of processing nodes]
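The contrast between the two examples can be made concrete under a simple assumed model (not taken from the slides): compute time n·t/p and transfer time n·s·(1+r)/b. Setting the two equal gives the node count beyond which additional nodes only wait for data. The function name and all figures below are hypothetical and chosen for illustration.

```python
def saturation_nodes(s, r, t, b):
    """Node count at which the network, not the CPUs, becomes the limit.

    Obtained by equating n*t/p with n*s*(1+r)/b (an assumed model);
    the number of products n cancels out.
    s: bytes per product, r: output/input ratio,
    t: seconds per product, b: bytes per second.
    """
    return (t * b) / (s * (1 + r))

# Hypothetical figures: large products with cheap processing ...
large_scale = saturation_nodes(s=1e9, r=0.5, t=30, b=1e8)
# ... versus small products with expensive processing.
small_scale = saturation_nodes(s=1e7, r=0.1, t=600, b=1e8)

print(f"large-scale reprocessing saturates at about {large_scale:.0f} nodes")
print(f"small-scale analysis saturates at about {small_scale:.0f} nodes")
```

With these made-up numbers the data-heavy case saturates after a couple of nodes, which is consistent with the advice to run algorithms where the data is, while the compute-heavy case can use thousands of nodes before bandwidth matters.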
Requirements for EO Data Processing Farms
• Processing performance versus I/O rate
• Dynamically balance distributed processing taking into account
  • number of CPUs, RAM, disk cache allocated
  • other local resources (e.g. embedded DBs, log files)
  • actual transfer rates for inputs, auxiliary data, outputs
• Coordination
  • Define procedures and guidelines for use
  • Reconcile conflicts between projects
• Accounting
• Monitoring and control
• Privacy/security/availability
  • Clear separation of production environment and other “scientific” environments
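A minimal sketch of the balancing requirement above, assuming each node's sustainable rate is the slower of its CPU rate and its measured transfer rate, and that products are shared out in proportion to that rate. The `Node` class, `throughput` and `balance` are hypothetical names; RAM, disk cache and other local resources are left out to keep the example short.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpus: int              # CPUs currently allocated to the node
    transfer_rate: float   # measured bytes/s for inputs plus outputs

def throughput(node, t, bytes_per_product):
    """Products per second a node can sustain: the slower of CPU and I/O."""
    return min(node.cpus / t, node.transfer_rate / bytes_per_product)

def balance(nodes, n, t, bytes_per_product):
    """Split n products over nodes in proportion to sustainable throughput."""
    rates = [throughput(nd, t, bytes_per_product) for nd in nodes]
    total = sum(rates)
    if total <= 0:
        raise ValueError("no node has usable capacity")
    exact = [n * rate / total for rate in rates]
    shares = [int(x) for x in exact]
    # Largest-remainder rounding so the shares add up to exactly n.
    leftover = n - sum(shares)
    by_fraction = sorted(range(len(nodes)),
                         key=lambda i: exact[i] - shares[i], reverse=True)
    for i in by_fraction[:leftover]:
        shares[i] += 1
    return {nd.name: k for nd, k in zip(nodes, shares)}

# Two nodes with equal CPUs; node "b" sits behind a slower link.
plan = balance([Node("a", 8, 1e8), Node("b", 8, 2.5e7)],
               n=100, t=10, bytes_per_product=1e8)
print(plan)
```

Re-running `balance` with freshly measured transfer rates is one simple way to make the distribution dynamic rather than fixed at deployment time.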
Consequences for Processing and Data Management
• Individual analysis for best system approach (local, farm, private cloud, …)
  • data rates, processing level/complexity
  • project characteristics, processing strategies
• Algorithms encapsulated in deployable processors/processing systems
• Data processors shall dynamically use CPUs, RAM, disk cache as allocated
• Establish/extend standards for algorithm integration and processor deployment
• Bulk product transfer capabilities, pipelining/streaming for input data set provision and output data set repatriation
• Evolve archives to data lifecycle centers
  • layered data sets for tailored access performance
  • defined consolidation/migration capacities (LTDP context)
  • new primary data access interfaces: geodata, time series
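The requirement that processors use CPUs "as allocated" can be illustrated with a small sketch in which a processor sizes its worker pool from the CPUs the environment grants it at start-up instead of a hard-coded count. `allocated_cpus` and `run_processor` are hypothetical names; a thread pool keeps the example short, and a CPU-bound processor would more likely use a process pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def allocated_cpus():
    """Number of CPUs this process may use.

    os.sched_getaffinity is only available on some platforms (e.g. Linux);
    elsewhere fall back to the CPU count reported by the host or VM.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1

def run_processor(products, process_one):
    """Apply process_one to every product with one worker per allocated CPU."""
    workers = max(1, allocated_cpus())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the products.
        return list(pool.map(process_one, products))

# Stand-in for a real per-product algorithm.
results = run_processor(["p1", "p2", "p3"], lambda name: name.upper())
print(allocated_cpus(), results)
```

The same processor then runs unchanged on a two-core virtual machine or on a full blade, which is the kind of portability the deployment standards above aim for.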
“GeoFarm” for Scientific EO Data Exploitation at DLR Oberpfaffenhofen
• 2 Blade Centers (Dell), 672 Opteron cores in total, 3.3 TB RAM, interconnected with 10 Gb/s Ethernet
• 288 TB SAN storage, connected with 4 GB/s Fibre Channel
• Virtualized using Citrix XenServer 6 (Advanced Edition)
• Separate pools for production network and normal infrastructure
• Usage examples:
  • Project scope: ENVISAT/MERIS data reprocessing for CCI Fire using CATENA
  • Continuous operation: O3M-SAF NRT, offline and re-processing
• Ongoing definitions:
  • use scenario and application procedure
  • monitoring
  • accounting, cost calculation and sharing
Conclusions
• Evolution of processing facilities towards virtualization
• Different scientific EO exploitation use cases require different distributed computation models, depending on input data size, processing complexity/strategy and network bandwidth
• Requirements in context of EO data processing farms:
  • processors need to become deployable in standard environment and dynamically use allocated resources
  • bulk input data provision using elaborated data management principles and technologies
• DLR operates a virtualized EO processing infrastructure “GeoFarm”
Thank you! Questions?
Stephan Kiemle
Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR), German Aerospace Center
Earth Observation Center | German Remote Sensing Data Center
Oberpfaffenhofen, 82234 Wessling, Germany
Stephan.Kiemle@dlr.de
www.DLR.de/eoc