
Models for scientific exploitation of EO data * ESRIN * 12.10.2012



  1. Calvalus: Full Mission EO CAL/VAL, Processing and Exploitation Services. Norman Fomferra, Martin Boettcher, Marco Zuehlke, Carsten Brockmann (Brockmann Consult GmbH). Models for scientific exploitation of EO data * ESRIN * 12.10.2012

  2. Calvalus: full mission EO cal/val, processing and exploitation services

  3. Outline
• Objectives and achievements
• Apache Hadoop in five slides
• Calvalus = Hadoop for EO
• Calvalus bulk processing
Reference: Jeffrey Dean and Sanjay Ghemawat, Google, 2004: "MapReduce: Simplified Data Processing on Large Clusters", Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, 2004

  4. There was a dream …
• easily exploit full mission EO archives
• have a powerful and affordable multi-mission processing infrastructure
• generate products from full mission datasets with new algorithms and algorithm versions
• aggregate results in the temporal and spatial dimensions
• test new ideas in a rapid prototyping approach
• have a tool to perform calibration and validation on full mission archives as the basis for reliable scientific conclusions
• robust production

  5. Calvalus for Land Cover CCI
• Pre-processing: the generation of 7-day surface reflectance composites from the full MERIS FRS and RR missions for CCI Land Cover is a data- and compute-intensive automated job that runs for 3 months on a 72-node Calvalus/Hadoop cluster.
• Quicklook generation for the full MERIS FRS and RR missions reads and processes 150 TB of input data in 10 hours, i.e. about 50 Gbit/s.
• Other full mission processes fall between these two run times.

  6. Projects using Calvalus
• ESA CoastColour: 6 years of MERIS FR, 27 regions
• ESA Land Cover CCI: pre-processing, full mission weekly L3 from MERIS and SPOT VGT
• ESA Ocean Colour CCI: algorithm improvement cycle, MODIS, SeaWiFS, MERIS
• GlobVeg: global FAPAR and LAI from MERIS
• Prevue: MERIS full mission subset extraction
• Fronts: detection of fronts in MERIS data
• Diversity II: bio-diversity of lakes and drylands

  7. Hadoop = HDFS + jobs/tasks + MapReduce
Archive-centric approach (network data archive):
• network storage
• data are transferred over the network
• risk of a network bottleneck
Hadoop approach (compute cluster, direct data-local processing):
• data-local processing
• tasks are transferred over the network
• good scalability

  8. Cluster hardware and network
• standard hardware
• Calvalus additions for I/O and development
[Diagram: a master node, a feeder node and compute nodes 1..n, each with local disks, plus a test server with VMs; the feeder connects the cluster to an external data source or destination]

  9. Hadoop Distributed File System
• distributed file system HDFS on the local disks of the compute nodes
• transparent, optimised data-local access
• data replication
• automated recovery
• continued service
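A minimal sketch (not Calvalus code) of how a client reads a product from HDFS with the standard Hadoop FileSystem API; the name node address and the product path are illustrative. The same stream API is used whether the requested blocks lie on the local node or on a remote data node, which is what "transparent data-local access" means in practice.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://master:9000");  // illustrative name node address

        FileSystem fs = FileSystem.get(conf);
        Path product = new Path("/calvalus/eodata/MER_RR__1P/example.N1");  // hypothetical path

        // Transparent data-local access: the client API is identical for local and remote blocks.
        try (FSDataInputStream in = fs.open(product)) {
            byte[] header = new byte[1024];
            in.readFully(0, header);  // read the first kilobyte of the product
            System.out.println("Read " + header.length + " bytes from " + product);
        }
    }
}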

  10. Hadoop Job Scheduling
• flexible granularity of inputs defined by split functions (for EO: one file = one split; see the sketch below)
• massively parallel processing, task pull
• takes failure into account: automated re-attempts, optional speculative execution
• job queues, priorities, fair sharing among projects
[Diagram: a job is divided into tasks, one per input split; an input set comprises 500 to 50,000 splits; tasks are processed data-locally]
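The "one file = one split" rule can be expressed with the standard Hadoop input format API. The following is a hedged sketch of that mechanism only, not the actual Calvalus input format, which uses its own EO-specific classes.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class WholeFileInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Never split an input product: each L1 file becomes exactly one input split,
        // and therefore exactly one map task, which the scheduler tries to run on a
        // node that holds the file's HDFS blocks.
        return false;
    }
}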

  11. Parallel aggregation with MapReduce
• data-local access to inputs
• a well-selected sorting and partitioning function
• generation of the output in parts that can simply be concatenated (see the partitioner sketch below)
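As an illustration of the "well-selected partitioning function", the sketch below assigns contiguous ranges of bin indices to the reducers, so that the sorted output parts of reducers 0..n-1 can be concatenated directly into one global result. The grid size is a placeholder; this is not the Calvalus implementation.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class BinRangePartitioner extends Partitioner<IntWritable, Object> {
    // Total number of bins of the (hypothetical) global binning grid.
    private static final int NUM_BINS = 4320 * 2160;

    @Override
    public int getPartition(IntWritable binIndex, Object value, int numPartitions) {
        // Map bin indices 0..NUM_BINS-1 onto partitions 0..numPartitions-1 in order,
        // so each reducer receives one contiguous, sorted key range.
        int partition = (int) ((long) binIndex.get() * numPartitions / NUM_BINS);
        return Math.min(partition, numPartitions - 1);
    }
}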

  12. Calvalus = Hadoop for Earth Observation

  13. L2 Bulk Processing Realisation
• MERIS RR L1, North Sea, 3 days
• CoastColour NN L2 processor
• 6 minutes (22 nodes)
• output: L2 files
[Diagram: each mapper task runs the L2 processor on one L1 file and writes one L2 file; there is no reduce phase]
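The pattern behind this realisation is a map-only Hadoop job: each mapper handles one L1 file and writes one L2 file, and the reduce phase is switched off. The sketch below shows that pattern with the plain Hadoop API; the mapper body, paths and counter names are illustrative, not the actual Calvalus code.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class L2BulkProcessingJob {

    /** Map-only task: one L1 input split in, one L2 output file written as a side effect. */
    public static class L2ProcessorMapper
            extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
        @Override
        public void run(Context context) throws IOException, InterruptedException {
            // One map task per input split, i.e. per L1 file.
            Path l1File = ((FileSplit) context.getInputSplit()).getPath();
            System.out.println("Processing " + l1File);
            // Placeholder: invoke the L2 processor on l1File and write the L2 product to HDFS.
            context.getCounter("calvalus", "L1 files processed").increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "L2 bulk processing");
        job.setJarByClass(L2BulkProcessingJob.class);
        job.setMapperClass(L2ProcessorMapper.class);
        job.setNumReduceTasks(0);                          // no reduce phase: the L2 files are the result
        job.setOutputFormatClass(NullOutputFormat.class);  // outputs are written directly by the mapper
        FileInputFormat.addInputPath(job, new Path("/calvalus/eodata/MER_RR__1P"));  // illustrative path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}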

  14. Match-up Analysis Realisation
• MERIS RR L1, global, 3 months
• CoastColour C2W processor
• NOMAD in-situ dataset
• 6 minutes (22 nodes)
• scatter plots and pixel extraction
[Diagram: mapper tasks run the L2 processor and the matcher on L1 files and emit output records for the in-situ input records; a reducer task (MA output generation) produces the MA report]
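The core of the matcher is a nearest-pixel search around each in-situ location. The snippet below is a hedged sketch of that idea only, with a haversine distance and a maximum match-up radius chosen for illustration; it is not the CoastColour/Calvalus matcher and ignores the time window and quality flags.

public final class MatchupSketch {

    /** A pixel centre with its geolocation and one L2 value (illustrative). */
    static final class Pixel {
        final double lat, lon, value;
        Pixel(double lat, double lon, double value) {
            this.lat = lat; this.lon = lon; this.value = value;
        }
    }

    static final double MAX_DISTANCE_KM = 1.0;     // illustrative match-up radius
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle (haversine) distance between two points in kilometres. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                   * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Returns the pixel closest to the in-situ point, or null if none is within the radius. */
    static Pixel matchUp(double insituLat, double insituLon, Iterable<Pixel> pixels) {
        Pixel best = null;
        double bestDistance = MAX_DISTANCE_KM;
        for (Pixel p : pixels) {
            double d = distanceKm(insituLat, insituLon, p.lat, p.lon);
            if (d <= bestDistance) {
                bestDistance = d;
                best = p;
            }
        }
        return best;
    }
}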

  15. L2/L3 Processing Realisation
• MERIS RR L1, global, 10 days
• CoastColour C2W processor
• 1.5 hours (22 nodes)
• 1 L3 product
[Diagram: mapper tasks perform L2 processing and spatial binning on L1 files and emit spatial bins; reducer tasks perform L3 temporal binning; an L3 formatting (staging) step produces the L3 file(s)]
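A hedged skeleton of the reduce side of this pattern: the shuffle groups all spatial bins by bin index, and the reducer aggregates them over the binning period. The per-bin value is simplified here to a single mean; the real Calvalus/BEAM binning carries weights and multiple variables.

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class TemporalBinningReducer
        extends Reducer<IntWritable, DoubleWritable, IntWritable, DoubleWritable> {

    @Override
    protected void reduce(IntWritable binIndex, Iterable<DoubleWritable> spatialMeans, Context context)
            throws IOException, InterruptedException {
        // Temporal binning: average the spatial-bin means of all orbits that
        // contributed to this bin during the aggregation period (e.g. 10 days).
        double sum = 0.0;
        int count = 0;
        for (DoubleWritable mean : spatialMeans) {
            sum += mean.get();
            count++;
        }
        if (count > 0) {
            context.write(binIndex, new DoubleWritable(sum / count));
        }
    }
}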

  16. Trend Analysis Realisation
• MERIS RR L1, South Pacific Gyre, 2002-2010, first 4 days of each month
• CoastColour C2W processor
• 30 minutes (22 nodes)
• time-series plots and data
[Diagram: as in L2/L3 processing, mapper tasks perform L2 processing and spatial binning and reducer tasks perform L3 temporal binning, but per time step; a TA formatting (staging) step produces the TA report]

  17. Processor integration
• Adapter for Unix executables (C++, Fortran, Python, ...); see the sketch below
• Adapter for BEAM GPF operators
• Concurrent processor versions in the system
• Automated deployment of processor bundles at runtime
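A hedged sketch of the executable-adapter idea (not the actual Calvalus adapter): a task launches the external L2 processor with ProcessBuilder, forwards its output to the task log and checks the exit code. The command-line convention and the output naming are illustrative assumptions.

import java.io.File;
import java.io.IOException;

public class ExecutableAdapterSketch {

    /** Runs an external processor (C++, Fortran, Python, ...) on a locally staged input file. */
    static File runProcessor(File executable, File inputFile, File outputDir)
            throws IOException, InterruptedException {
        File outputFile = new File(outputDir, inputFile.getName() + ".L2");  // illustrative naming
        ProcessBuilder pb = new ProcessBuilder(
                executable.getAbsolutePath(),
                inputFile.getAbsolutePath(),
                outputFile.getAbsolutePath());
        pb.inheritIO();  // forward the processor's stdout/stderr to the task log
        int exitCode = pb.start().waitFor();
        if (exitCode != 0) {
            throw new IOException("Processor failed with exit code " + exitCode);
        }
        return outputFile;
    }
}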

  18. Calvalus + BEAM for data streaming
Supported by the BEAM Graph Processing Framework:
• access to data via reader/writer objects instead of files
• operator chaining to build processors from modules (see the sketch below)
• tile cache and pull principle for in-memory processing
• Hadoop MapReduce for partitioning and streaming
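A hedged sketch of BEAM GPF operator chaining; the operator name "MyL2Processor" and its parameter are placeholders, not real BEAM operators. The source product is a reader object, the chained operator computes tiles only when they are pulled, and pixels are materialised only when the target product is written.

import java.io.File;
import java.util.HashMap;
import java.util.Map;
import org.esa.beam.framework.dataio.ProductIO;
import org.esa.beam.framework.datamodel.Product;
import org.esa.beam.framework.gpf.GPF;

public class GpfChainSketch {
    public static void main(String[] args) throws Exception {
        // Reader object instead of a plain file: pixels are read only when a tile is pulled.
        Product l1 = ProductIO.readProduct(new File(args[0]));

        Map<String, Object> params = new HashMap<String, Object>();
        params.put("validPixelExpression", "!l1_flags.INVALID");  // illustrative parameter

        // Chain an operator onto the source product; no intermediate file is written.
        Product l2 = GPF.createProduct("MyL2Processor", params, l1);

        // Tiles are computed (and cached) on demand while the target product is written.
        ProductIO.writeProduct(l2, args[1], "BEAM-DIMAP");
    }
}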

  19. Quality check in bulk processing workflows
• 700 inputs with issues identified in MERIS L1B
[Workflow diagram: FRS/RR L1B products are retrieved and geo-corrected with AMORGOS (using orbit and attitude files) to FRG/RRG L1B, then L2-processed to SDR and L3-processed to 7-day SR composites; each step is followed by automated QC, inventory updates, 1-day and SR quicklook generation and visual QC, which feed black lists and error reports back into the workflow]

  20. Bulk production control for full mission reprocessing
• Processing Monitor
• Request Queue
• Workflow engine
• Resource management
[Diagram: bulk production is started with parameters such as processor versions; the workflow engine sequences concurrent processing steps under resource constraints, working through the mission years about two months at a time; progress is observed and reported in status reports]
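An illustrative sketch of the sequencing idea only (no Calvalus code, all names and numbers hypothetical): the mission period is cut into chunks of about two months, the chunks are queued as production requests, and a fixed number of concurrent processing steps is allowed at any time to respect the resource constraints.

import java.time.LocalDate;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BulkProductionSketch {
    public static void main(String[] args) throws InterruptedException {
        LocalDate missionStart = LocalDate.of(2002, 5, 1);   // illustrative mission period
        LocalDate missionEnd = LocalDate.of(2012, 4, 1);

        // Resource constraint: only a few concurrent processing steps at a time.
        ExecutorService workflowEngine = Executors.newFixedThreadPool(4);

        // Sequencing: work through the mission about two months at a time.
        for (LocalDate chunk = missionStart; chunk.isBefore(missionEnd); chunk = chunk.plusMonths(2)) {
            final LocalDate start = chunk;
            final LocalDate end = chunk.plusMonths(2);
            workflowEngine.submit(new Runnable() {
                public void run() {
                    // Placeholder for submitting the processing workflow (L2, L3, QC, ...)
                    // for this time range and reporting its progress.
                    System.out.println("Processing " + start + " to " + end);
                }
            });
        }
        workflowEngine.shutdown();
        workflowEngine.awaitTermination(30, TimeUnit.DAYS);  // illustrative upper bound
    }
}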

  21. Jobs and tasks to be managed

  22. Calvalus portal for on-demand processing
• input set selection
• processor versions
• processing parameters
• in-situ data for match-up analysis
• variables for aggregation
• trend analysis

  23. Summary
• Calvalus is a multi-mission, full mission data processing system for bulk (re)processing, data analysis and algorithm validation
• Calvalus is based on the open source middleware Apache Hadoop and implements massively parallel, data-local processing
• Calvalus integrates processors built with the BEAM GPF processing framework as well as Unix executables written in any programming language
• Calvalus is successfully used by various projects and will be developed further
• Acknowledgement: The initial Calvalus idea was developed, and its realisation funded, by the European Space Agency under the SME-LET programme.

  24. Reflection points
• The hardware infrastructure appropriate for Hadoop differs from the current trend towards virtualisation and network storage (transparency vs. knowledge of data location).
• Adapted, optimised solutions may have a shorter life cycle than generic, standardised ones (processor interfaces that support data streaming vs. file interfaces).
• Historical missions (ENVISAT) are not the problem. Are we prepared for Sentinel data?
