
Requirements for EO Data Processing Farms

ESA Workshop “Models for Scientific Exploitation of EO Data”, ESA ESRIN, Oct 11-12, 2012. Stephan Kiemle, German Remote Sensing Data Center (DFD), German Aerospace Center (DLR).


Presentation Transcript


  1. Requirements for EO Data Processing Farms
     ESA Workshop “Models for Scientific Exploitation of EO Data”, ESA ESRIN, Oct 11-12, 2012
     Stephan Kiemle, German Remote Sensing Data Center (DFD), German Aerospace Center (DLR)

  2. > Req. for EO Data Processing Farms > Stephan Kiemle • Models for Scientific Exploitation of EO Data > 2012-10-11

     Evolution of EO PGS Processing Facilities

     Dedicated Facility
     • single mission
     • dedicated hardware
     • tight coupling, static scheduling
     • predictable performance
     • expensive investment, housing, operating
     • no flexibility
     e.g. 1st generation ENVISAT PAF

     Shared Facility
     • multi-mission
     • shared hardware
     • static deployment, dynamic scheduling
     • controlled performance
     • usage rate reduces costs
     • growth and renewal still difficult
     e.g. ESA MMFI

     Virtualized Facility

  3. Virtualized Processing Facilities
     • multi-purpose
     • independent hardware
     • dynamic deployment and scheduling
     • dynamic performance
     • initial + continuous renewal investment, pay per use
     • scaling with low impact on applications
     Sounds good! But …
     [Diagram: service models Infrastructure as a Service, Platform as a Service and Software as a Service, built from host, VM, processor and control layers, with usage-based charging units #request, #CPU + #Mbit/s and #VCPU + #MB/day]

  4. Scientific Exploitation Use Case: Reprocess 1..100 TB of Archived EO Data
     • Input data usually in centralized archives
     • Parallel processing often possible on a per-product basis
     Parameters for distributed processing complexity:
     • n  number of products
     • s  average product size
     • r  ratio of output size to input size
     • t  time for processing one product
     • b  bandwidth of the weakest part of the network
     • p  number of processing nodes
     Usage of processing nodes is limited by bandwidth (the slide's lower bound on total time is not preserved in this transcript).
     [Diagram: inputs (in) and outputs (out) moving over a link of bandwidth b to and from p processors, each processing products in sequence]
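
As a minimal sketch of what such a bound could look like, assume that every input and output crosses the weakest link b and that processing on the p nodes overlaps perfectly with transfer. The total time is then at least max(⌈n/p⌉·t, n·s·(1+r)/b). The function names below are hypothetical, and the model is an assumption, not the author's formula:

```python
import math


def min_total_time(n, s, r, t, b, p):
    """Lower bound on wall-clock time for reprocessing n products.

    Assumed model (not taken from the slide): every input (n*s) and every
    output (n*s*r) crosses the weakest link of bandwidth b, and computation
    on p nodes overlaps perfectly with transfer.
      n: number of products          s: average product size [MB]
      r: output/input size ratio     t: processing time per product [s]
      b: weakest-link bandwidth [MB/s]
      p: number of processing nodes
    """
    compute_time = math.ceil(n / p) * t       # products run in waves of p
    transfer_time = n * s * (1 + r) / b       # all input and output bytes over b
    return max(compute_time, transfer_time)


def useful_nodes(s, r, t, b):
    """Node count beyond which the link, not the CPUs, limits throughput."""
    return b * t / (s * (1 + r))
```

Take n = 10,000 products of s = 500 MB, with r = 0.5, t = 60 s and b = 1250 MB/s (about 10 Gb/s). Here useful_nodes returns 100, and min_total_time gives 6,000 s for both p = 100 and p = 200. Beyond roughly 100 nodes, additional nodes only wait for data.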

  5. Example – Large Scale Reprocessing
     • joint analysis of processing and data management required
     • execute processing algorithms where the data is
     • cross-distribute data archiving
     [Chart: total time (lower bound) versus number of processing nodes]
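
This minimal sketch shows why running algorithms where the data is and cross-distributing the archive can help. It assumes that each archive site has its own link to co-located nodes and that sites work in parallel. Each site's time is estimated as the larger of its compute time ⌈n/p⌉·t and its transfer time n·s·(1+r)/b. The Site class, the functions and all numbers are hypothetical:

```python
import math
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical archive site with co-located processing nodes."""
    products: int      # number of products archived at this site
    bandwidth: float   # archive-to-node bandwidth at this site [MB/s]
    nodes: int         # processing nodes next to the archive


def site_time(site, s, r, t):
    # Larger of compute time and local transfer time for this site's share.
    compute = math.ceil(site.products / site.nodes) * t
    transfer = site.products * s * (1 + r) / site.bandwidth
    return max(compute, transfer)


def distributed_time(sites, s, r, t):
    # Sites process their own data in parallel; the slowest one dominates.
    return max(site_time(site, s, r, t) for site in sites)


# 10,000 products of 500 MB, r = 0.5, t = 60 s, 200 nodes in total.
central = [Site(products=10_000, bandwidth=1250.0, nodes=200)]
spread = [Site(products=2_500, bandwidth=1250.0, nodes=50) for _ in range(4)]
print(distributed_time(central, s=500, r=0.5, t=60))
print(distributed_time(spread, s=500, r=0.5, t=60))
```

Under these assumptions, the single central archive is transfer-bound at 6,000 s. Spreading the same products over four sites, each with its own link, halves this to 3,000 s with the same 200 nodes.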

  6. Example – Small Scale Analysis
     • processing complexity versus data volume determines distribution
     [Chart: total time (lower bound) versus number of processing nodes]
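
To illustrate how processing complexity versus data volume can decide where a small analysis runs, this sketch compares a local workstation with a farm. It assumes the farm has more nodes and a faster link to the archive but incurs a fixed startup overhead for deployment and queueing. The Option type, the overhead and all figures are hypothetical:

```python
import math
from typing import NamedTuple


class Option(NamedTuple):
    """Hypothetical execution option, e.g. a local workstation or a farm."""
    name: str
    nodes: int          # processing nodes available
    bandwidth: float    # bandwidth from the archive to these nodes [MB/s]
    startup: float      # assumed fixed overhead (deployment, queueing) [s]


def estimated_time(opt, n, s, r, t):
    # Startup overhead plus the larger of compute time and transfer time.
    compute = math.ceil(n / opt.nodes) * t
    transfer = n * s * (1 + r) / opt.bandwidth
    return opt.startup + max(compute, transfer)


def best_option(options, n, s, r, t):
    # Pick the option with the smallest estimated wall-clock time.
    return min(options, key=lambda o: estimated_time(o, n, s, r, t)).name


options = [
    Option("local", nodes=4, bandwidth=10.0, startup=0.0),
    Option("farm", nodes=200, bandwidth=1250.0, startup=900.0),
]
print(best_option(options, n=50, s=100, r=0.25, t=3600))  # farm
print(best_option(options, n=50, s=100, r=0.25, t=5))     # local
```

With the same 50 products of 100 MB, an hour of processing per product favours the farm. Five seconds per product favours the local machine, because the farm's assumed overhead then outweighs its speed.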

  7. Requirements for EO Data Processing Farms
     • Processing performance versus I/O rate
     • Dynamically balance distributed processing, taking into account
       • number of CPUs, RAM, disk cache allocated
       • other local resources (e.g. embedded DBs, log files)
       • actual transfer rates for inputs, auxiliary data, outputs
     • Coordination
       • Define procedures and guidelines for use
       • Reconcile conflicts between projects
     • Accounting
     • Monitoring and control
     • Privacy/security/availability
       • Clear separation of production environment and other “scientific” environments
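
One way to read "dynamically balance distributed processing" is sketched below. Assume each processor instance needs a fixed share of CPUs, RAM and disk cache, and moves a known volume of inputs, auxiliary data and outputs. A node then runs as many instances as its scarcest resource allows, including the measured transfer rate. The function and its parameters are hypothetical:

```python
import math


def concurrent_jobs(cpus, ram_gb, cache_gb, rate_mb_s,
                    job_cpus, job_ram_gb, job_cache_gb, job_io_mb, job_seconds):
    """Return (jobs, binding_resource) for one processing node.

    Hypothetical model: each processor instance needs a fixed share of CPUs,
    RAM and disk cache, and moves job_io_mb of inputs, auxiliary data and
    outputs during job_seconds; the measured transfer rate rate_mb_s caps
    how many instances can be kept supplied with data at the same time.
    """
    limits = {
        "cpu": cpus // job_cpus,
        "ram": math.floor(ram_gb / job_ram_gb),
        "cache": math.floor(cache_gb / job_cache_gb),
        "io": math.floor(rate_mb_s * job_seconds / job_io_mb),
    }
    binding = min(limits, key=limits.get)   # first resource with the lowest limit
    return limits[binding], binding


# Re-evaluate whenever the allocation or the measured transfer rate changes.
print(concurrent_jobs(16, 64, 500, 100, 2, 6, 40, 3000, 120))  # (4, 'io')
print(concurrent_jobs(16, 64, 500, 300, 2, 6, 40, 3000, 120))  # (8, 'cpu')
```

Returning the binding resource alongside the count makes it visible whether adding CPUs or improving the transfer rate would actually help.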

  8. Consequences for Processing and Data Management
     • Individual analysis for best system approach (local, farm, private cloud, …)
       • data rates, processing level/complexity
       • project characteristics, processing strategies
     • Algorithms encapsulated in deployable processors/processing systems
     • Data processors shall dynamically use CPUs, RAM, disk cache as allocated
     • Establish/extend standards for algorithm integration and processor deployment
     • Bulk product transfer capabilities, pipelining/streaming for input data set provision and output data set repatriation
     • Evolve archives to data lifecycle centers
       • layered data sets for tailored access performance
       • defined consolidation/migration capacities (LTDP context)
       • new primary data access interfaces: geodata, time series
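
The pipelining/streaming point can be sketched with Python threads and bounded queues. Inputs are fetched while earlier products are still being processed, and results are repatriated as soon as they are ready. The worker count defaults to the CPUs the (virtual) machine reports. Here fetch, process and store are hypothetical placeholders for archive retrieval, the deployed processor and output repatriation, and error handling is omitted:

```python
import os
import queue
import threading


def run_pipeline(product_ids, fetch, process, store, workers=None, prefetch=8):
    """Stream products through fetch -> process -> store stages concurrently.

    fetch, process and store are caller-supplied callables (hypothetical
    stand-ins). Bounded queues cap how many fetched inputs and not yet
    repatriated outputs are held at once. Error handling is omitted.
    """
    # Default to the CPUs visible to this (virtual) machine.
    workers = workers or os.cpu_count() or 1
    inputs = queue.Queue(maxsize=prefetch)
    outputs = queue.Queue(maxsize=prefetch)
    done = object()  # end-of-stream sentinel

    def fetcher():
        for pid in product_ids:
            inputs.put((pid, fetch(pid)))   # blocks while the queue is full
        for _ in range(workers):
            inputs.put(done)                # one sentinel per worker

    def worker():
        while (item := inputs.get()) is not done:
            pid, data = item
            outputs.put((pid, process(data)))
        outputs.put(done)

    def storer():
        finished = 0
        while finished < workers:
            item = outputs.get()
            if item is done:
                finished += 1
            else:
                store(*item)                # (product id, result)

    threads = [threading.Thread(target=fetcher), threading.Thread(target=storer)]
    threads += [threading.Thread(target=worker) for _ in range(workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()


results = {}
run_pipeline(range(10), fetch=lambda pid: pid, process=lambda x: x * x,
             store=results.__setitem__, workers=3, prefetch=2)
print(sorted(results.items())[:3])  # [(0, 0), (1, 1), (2, 4)]
```

The prefetch bound plays the role of a limited disk cache. Fetching stalls instead of staging more inputs than the node was allocated.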

  9. “GeoFarm” for Scientific EO Data Exploitation at DLR Oberpfaffenhofen
     • 2 Blade Centers (Dell), total 672 Opteron cores, 3.3 TB RAM, interconnected with 10 Gb/s Ethernet
     • 288 TB SAN storage, connected with 4 GB/s Fibre Channel
     • Virtualized using Citrix XenServer 6 (Advanced edition)
     • Separated pools for production network/normal infrastructure
     • Usage examples:
       • Project scope: ENVISAT/MERIS data reprocessing for CCI Fire using CATENA
       • Continuous operational: O3M-SAF NRT, offline and re-processing
     • Ongoing definitions:
       • use scenario and application procedure
       • monitoring
       • accounting, cost calculation and sharing
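
A back-of-envelope reading of these figures puts the hardware in relation to the bandwidth argument above. It assumes decimal units and a single, ideally utilized 10 Gb/s link, so real transfers will be slower. It gives the memory per core and the time needed just to move the full SAN capacity once over the network. This is illustrative arithmetic, not a measured result:

```python
# Figures from the GeoFarm slide; decimal units assumed (1 TB = 10**12 bytes).
cores = 672
ram_tb = 3.3
san_tb = 288
link_gbit_s = 10                      # one 10 Gb/s Ethernet link, ideal utilization

ram_per_core_gb = ram_tb * 1000 / cores
transfer_s = san_tb * 1e12 * 8 / (link_gbit_s * 1e9)   # bytes -> bits, / (bits/s)
transfer_h = transfer_s / 3600

print(f"{ram_per_core_gb:.1f} GB RAM per core")                           # 4.9 GB RAM per core
print(f"{transfer_h:.0f} h to move {san_tb} TB over {link_gbit_s} Gb/s")  # 64 h to move 288 TB over 10 Gb/s
```

Even under ideal conditions, moving the full storage once takes over two and a half days. This is why the bulk transfer and data-locality requirements carry so much weight.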

  10. Conclusions
      • Evolution of processing facilities towards virtualization
      • Different scientific EO exploitation use cases require different distributed computation models, depending on input data size, processing complexity/strategy and network bandwidth
      • Requirements in the context of EO data processing farms:
        • processors need to become deployable in a standard environment and dynamically use allocated resources
        • bulk input data provision using elaborated data management principles and technologies
      • DLR operates a virtualized EO processing infrastructure, “GeoFarm”

  11. Thank you! Questions?
      Stephan Kiemle
      Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR), German Aerospace Center
      Earth Observation Center | German Remote Sensing Data Center
      Oberpfaffenhofen, 82234 Wessling, Germany
      Stephan.Kiemle@dlr.de
      www.DLR.de/eoc
