
Data Ingestion in EMSO

Data Ingestion in EMSO. Presented by Marco Pappalardo, Spacearth Technology Srl, Italy (marco.pappalardo@spacearth.net, marco.pappalardo@softwareengineering.it). INDIGO Summit on Data Ingestion, Catania, 12th May 2017. RIA-653549.


Presentation Transcript


  1. Data Ingestion in EMSO. Presented by Marco Pappalardo, Spacearth Technology Srl, Italy (marco.pappalardo@spacearth.net, marco.pappalardo@softwareengineering.it). INDIGO Summit on Data Ingestion, Catania, 12th May 2017. RIA-653549

  2. What is EMSO? The European Multidisciplinary Seafloor and water-column Observatory (EMSO) is a large-scale, distributed marine Research Infrastructure (RI) of fixed-point observatories. It serves marine science researchers, marine technology engineers, policy makers, and the public. It monitors natural hazards, climate change, and marine ecosystems. It comprises 11 nodes and 4 test sites. Indigo Summit on Data Ingestion – Data Ingestion in EMSO

  3. EMSO Nodes

  4. Observatory what?

  5. EMSO Generic Instrumentation Module. The EGIM is a sea-floor observatory. Data acquired by the EGIMs will be dispatched, through an EGIM Sensor Observation Service Gateway, both to the EMSO Regional Data Nodes and to the EMSODEV Data Management Platform. The EMSODEV (EMSO) Data Management Platform will collect, analyze, … and publish data.

  6. Why EGIM?
     • Goal: to develop and deploy EGIMs
     • to measure a specific set of variables suitable for all sites and depths, including: temperature, conductivity (salinity), pressure (depth), turbidity, dissolved oxygen, ocean currents, and passive acoustics
     • 1st deployment in Dec 2016 at Vilanova i la Geltrú

  7. EMSODEV Data Management Platform
     • The DMP includes a set of common services, compliant with the phases of the computational viewpoint of the ENVRI Reference Model v2.0:
        • Data acquisition;
        • Data curation (including data storage and partitioning, data quality checking and cataloguing services, import/export utilities, query services);
        • Data publishing (query preparation, preparation for import/export of curated data);
        • Data processing services (real-time and/or batch processing computing capabilities);
        • Data use (platform authentication and authorization).

  8. DMP API. emsodev-api is a Spring Boot based RESTful web service. REST API docs are available within the deployed app through Swagger.

  9. Data gathering from OBSEA SOS
     [Diagram: SOS server API operations (GetCapabilities, DescribeSensor, GetObservation) connecting OBSEA data and the EMSODEV Data Management Platform]
     • Two raw data collectors exist:
        • A Pull Transfer Flow: data is retrieved via the API exposed by the SOS server available at the OBSEA observatory.
        • A Push Transfer Flow: data will be sent to a DMP service which "listens" for near-real-time updates to XML files describing sensor data and observations.
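
The pull flow relies on the three SOS operations named on the slide. As a rough sketch only, the requests could be built as key-value-pair GET URLs in the style of the OGC SOS 2.0 KVP binding. The endpoint URL and the offering, procedure and observed-property identifiers below are hypothetical placeholders (the slides do not give the OBSEA values), and the parameter names should be checked against the target server's capabilities document.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not state the OBSEA SOS URL.
SOS_ENDPOINT = "https://sos.example.org/service"
SOS_VERSION = "2.0.0"


def sos_url(request, **params):
    """Build a KVP GET URL for one SOS operation."""
    query = {"service": "SOS", "request": request}
    # GetCapabilities negotiates the version; other operations state it.
    if request == "GetCapabilities":
        query["AcceptVersions"] = SOS_VERSION
    else:
        query["version"] = SOS_VERSION
    query.update(params)
    return f"{SOS_ENDPOINT}?{urlencode(query)}"


# 1. Discover offerings, procedures and observed properties.
capabilities_url = sos_url("GetCapabilities")

# 2. Fetch the description of one sensor (identifier is a placeholder).
sensor_url = sos_url(
    "DescribeSensor",
    procedure="urn:example:sensor:ctd-1",
    procedureDescriptionFormat="http://www.opengis.net/sensorML/1.0.1",
)

# 3. Pull observations for one property over a time window.
observation_url = sos_url(
    "GetObservation",
    offering="urn:example:offering:ctd-1",
    observedProperty="urn:example:property:sea_water_temperature",
    temporalFilter="om:phenomenonTime,2017-05-12T00:00:00Z/2017-05-13T00:00:00Z",
)

for url in (capabilities_url, sensor_url, observation_url):
    print(url)
```

A batch collector would typically call GetCapabilities once, then loop over GetObservation with consecutive time windows.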

  10. Data Acquisition
     • Real-time data access
        • based on standards such as OGC Sensor Web Enablement (OGC SWE), which specifies interoperability interfaces and metadata encodings that enable real-time integration of heterogeneous sensor webs into the information infrastructure.
        • SWE specifications such as Sensor Observation Service (SOS), Sensor Model Language (SensorML), and Observations & Measurements (O&M) will be supported.
     • Metadata formats
        • extended Dublin Core format
        • ISO 19139
        • …
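
To make the metadata formats more concrete, here is a minimal sketch of a plain Dublin Core record built with Python's standard library. It covers only the basic `dc:` elements, not the "extended" profile or ISO 19139 mentioned on the slide, whose details the slides do not give. The wrapper element name and all field values are illustrative placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names to an XML string."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values, not taken from any EMSO dataset.
record_xml = dublin_core_record({
    "title": "Example sea water temperature series",
    "creator": "Example observatory",
    "date": "2017-05-12",
    "type": "Dataset",
    "format": "text/csv",
})
print(record_xml)
```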

  11. Data Ingestion

  12. Data Curation
     • Sensor data will come from the SOS@EGIM in
        • asynchronous/batch (PULL) mode
        • real-time (PUSH) mode
     • Both the "Push" and "Pull" flows send formatted data (HTTP POST/PUT) to data store controllers
        • Distributed File Systems, NoSQL DBs, Time Series DBs, Streaming Store Controllers
     • Both the PUSH and PULL transfer flows save metadata into the Metadata and Service Repository.
     • OneData was evaluated as a candidate solution to enlarge this set of data storage solutions.
     • Sensor data can be either
        • retrieved via APIs exposed by an SOS server (Pull Transfer Flow)
        • sent to the DMP before being consolidated on the SOS server (Push Transfer Flow)
     • Two main processes happen during each transfer flow:
        • data scraping: extracting parts of the marine observations coming/retrieved from the SOS server;
        • data munging/wrangling: converting data from a "raw" format into another one that allows the data to be consumed more conveniently later.
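
The two processes named on the slide, scraping and munging, can be illustrated with a small sketch. The XML layout below is a simplified, invented stand-in (not the real O&M or SensorML schema), the values are placeholders, and the JSON payload shape is an assumption about what a data store controller might accept over HTTP POST/PUT.

```python
import json
import xml.etree.ElementTree as ET

# Invented, simplified input; real SOS responses use the O&M schema.
RAW_XML = """
<observations sensor="ctd-example">
  <observation time="2017-05-12T10:00:00Z" property="temperature" unit="degC">13.42</observation>
  <observation time="2017-05-12T10:00:00Z" property="pressure" unit="dbar">20.1</observation>
  <observation time="2017-05-12T10:01:00Z" property="temperature" unit="degC">ERR</observation>
</observations>
"""


def scrape(xml_text):
    """Scraping: extract the raw observation fields from the XML."""
    root = ET.fromstring(xml_text)
    sensor = root.get("sensor")
    for obs in root.iter("observation"):
        yield {"sensor": sensor, **obs.attrib, "raw_value": obs.text}


def munge(raw_records):
    """Munging: turn raw strings into typed records, skipping bad values."""
    for rec in raw_records:
        try:
            value = float(rec["raw_value"])
        except (TypeError, ValueError):
            continue  # unparsable reading, dropped in this sketch
        yield {
            "sensor": rec["sensor"],
            "time": rec["time"],
            "property": rec["property"],
            "unit": rec["unit"],
            "value": value,
        }


records = list(munge(scrape(RAW_XML)))
# Body that a transfer flow could send to a data store controller.
payload = json.dumps(records)
print(payload)
```

Keeping the two steps separate lets the pull and push flows share the munging code while using different scrapers for their respective inputs.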

  13. Data Publishing
     • The DMP will be equipped with tools in addition to the API, for:
        • activating the process of importing a dataset from external data sources (EMSO regional nodes);
        • querying data curated within the EMSODEV DMP;
        • activating the process of defining (e.g. selecting a time range and a measured parameter) and generating a dataset to be exported outside the EMSODEV DMP.
     • Medium- to long-term preservation is ensured by regional EMSO nodes.
     • Long-term archiving will be ensured by national and international certified long-term data archives such as those of the ICSU World Data System (PANGAEA) and the National Oceanographic Data Centers (NODC).
     • A common approach for data preservation is to be derived.
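
Defining an export by time range and measured parameter amounts to a filter followed by serialization. The sketch below assumes curated records are available as simple dictionaries and that CSV is an acceptable export format. Both are assumptions, since the slides do not describe the DMP's internal model or export formats, and the sample values are placeholders.

```python
import csv
import io
from datetime import datetime, timezone


def parse_time(text):
    """Parse an ISO 8601 UTC timestamp such as 2017-05-12T10:00:00Z."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def select(records, parameter, start, end):
    """Keep records of one parameter with start <= time < end."""
    t0, t1 = parse_time(start), parse_time(end)
    return [
        r for r in records
        if r["property"] == parameter and t0 <= parse_time(r["time"]) < t1
    ]


def to_csv(rows):
    """Serialize the selected rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["time", "property", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder records, not real measurements.
curated = [
    {"time": "2017-05-12T10:00:00Z", "property": "temperature", "value": 13.42},
    {"time": "2017-05-12T10:00:00Z", "property": "pressure", "value": 20.1},
    {"time": "2017-05-12T11:30:00Z", "property": "temperature", "value": 13.4},
    {"time": "2017-05-13T09:00:00Z", "property": "temperature", "value": 13.55},
]

rows = select(curated, "temperature",
              "2017-05-12T00:00:00Z", "2017-05-13T00:00:00Z")
export_csv = to_csv(rows)
print(export_csv)
```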

  14. Data Use and Reuse
     • Complex interactions are mediated by virtual laboratories
        • providing a persistent context for interactions between groups of users and components within the DMP.
        • experimental laboratory: a utility/tool allowing scientists/users to deploy datasets for processing and acquiring results.
        • All laboratories must interact with a security service (AAI).
     • Data produced will be available for use beyond the original purpose
        • Adopted sensors are often multi-purpose and designed for multiple users and applications.
        • Selection of certified repositories for long-term preservation/curation is in progress.
        • Data are to be stored together with the minimum software, metadata and documentation.
     • EMSO promotes standardization and integration of Regional EMSO Nodes data
        • to improve the overall accessibility and reusability of local node data via the EMSO Data Portal.

  15. Demo

  16. Acknowledgement: Daniele Baratta (Swing:It, Software Engineering Italia Srl); Michał Orzechowski (CYFRONET); Daniele Cosenza (Spacearth Technology Srl); Riccardo Delpopolo Carciopolo (Spacearth Technology Srl)

  17. Thank you for watching

  18. INDIGO and EUDAT Solutions. Currently: OneData, IAM, B2DROP, B2SHARE, B2FIND. In the future: EUDAT services to use, DMPonline, Future Gateways, Automated Integrity Tests.
