Data Ingestion in EMSO
Presented by Marco Pappalardo, Spacearth Technology Srl, Italy
marco.pappalardo@spacearth.net, marco.pappalardo@softwareengineering.it
INDIGO SUMMIT on Data Ingestion, Catania, 12th May 2017
RIA-653549
What is EMSO?
The European Multidisciplinary Seafloor and water-column Observatory (EMSO) is a large-scale, distributed marine Research Infrastructure (RI) of fixed-point observatories.
It serves marine science researchers, marine technology engineers, policy makers, and the public.
It monitors natural hazards, climate change, and marine ecosystems.
11 nodes and 4 test sites
Indigo Summit on Data Ingestion – Data Ingestion in EMSO
EMSO Nodes
Observatory what?
EMSO Generic Instrumentation Module
The EGIM is a sea-floor observatory.
Data acquired by the EGIMs will be dispatched, through an EGIM Sensor Observation Service Gateway, both to the EMSO Regional Data Nodes and to the EMSODEV Data Management Platform.
The EMSODEV (EMSO) Data Management Platform will collect, analyze, … and publish data.
Why EGIM?
• Goal: to develop and deploy EGIMs
  • to measure a specific set of variables suitable for all sites and depths, including: temperature, conductivity (salinity), pressure (depth), turbidity, dissolved oxygen, ocean currents, and passive acoustics
• First deployment in December 2016 at Vilanova i la Geltrú
EMSODEV Data Management Platform
• The DMP includes a set of common services, compliant with the phases of the computational viewpoint of the ENVRI Reference Model v2.0:
  • Data acquisition
  • Data curation (including data storage and partitioning, data quality checking and cataloguing services, import/export utilities, query services)
  • Data publishing (query preparation, preparation for import/export of curated data)
  • Data processing services (real-time and/or batch processing computing capabilities)
  • Data use (platform authentication and authorization)
DMP API
emsodev-api is a Spring Boot-based RESTful web service.
REST API docs are available within the deployed app through Swagger.
Data gathering from OBSEA SOS
[Diagram: the EMSODEV Data Management Platform obtains OBSEA data through the SOS server API operations GetCapabilities, DescribeSensor, and GetObservation]
• Two raw data collectors exist:
  • A Pull Transfer Flow: data is retrieved via the API exposed by the SOS server available at the OBSEA observatory.
  • A Push Transfer Flow: data will be sent to a DMP service which "listens" for near-real-time updates to XML files describing sensor data and observations.
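As an illustration of the Pull Transfer Flow, the three SOS operations named above can be expressed as key-value-pair (KVP) request URLs. This is only a sketch: the endpoint, the sensor and offering identifiers, the observed property, and the description format are assumptions, not values taken from the OBSEA deployment, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real OBSEA SOS URL is not given in the slides.
SOS_ENDPOINT = "https://sos.example.org/service"


def sos_url(request, **params):
    """Build a KVP-encoded SOS 2.0 request URL (no network call is made)."""
    query = {"service": "SOS", "request": request}
    # GetCapabilities negotiates the version; the other operations state it.
    if request == "GetCapabilities":
        query["AcceptVersions"] = "2.0.0"
    else:
        query["version"] = "2.0.0"
    query.update(params)
    return SOS_ENDPOINT + "?" + urlencode(query)


# What does the server offer?
capabilities = sos_url("GetCapabilities")

# How is one sensor described? (identifier and format are assumptions)
sensor = sos_url(
    "DescribeSensor",
    procedure="urn:example:egim:ctd",
    procedureDescriptionFormat="http://www.opengis.net/sensorML/1.0.1",
)

# Which observations were made in a given time window? (names are assumptions)
observations = sos_url(
    "GetObservation",
    offering="urn:example:egim:offering",
    observedProperty="sea_water_temperature",
    temporalFilter="om:phenomenonTime,2016-12-01T00:00:00Z/2016-12-02T00:00:00Z",
)

for url in (capabilities, sensor, observations):
    print(url)
```

A pull collector would typically fetch these URLs on a schedule and hand the XML responses to the curation step.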
Data Acquisition
• Real-time data access
  • several standards, such as OGC Sensor Web Enablement (OGC SWE), specify interoperability interfaces and metadata encodings that enable real-time integration of heterogeneous sensor webs into the information infrastructure.
  • SWE specifications such as Sensor Observation Service (SOS), Sensor Model Language (SensorML), and Observations & Measurements (O&M) will be supported.
• Metadata formats
  • extended Dublin Core format
  • ISO 19139
  • …
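To make the metadata formats more concrete, here is a minimal sketch of a plain Dublin Core record built with the Python standard library. The slides do not say which extensions the "extended" format adds, so only core elements are used; the wrapper element name and all field values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 core Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values to XML."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values describing one batch of observations.
record_xml = dublin_core_record({
    "title": "EGIM temperature observations",
    "creator": "EMSO",
    "date": "2016-12-01",
    "format": "text/xml",
    "subject": "sea water temperature",
})
print(record_xml)
```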
Data Ingestion
Data Curation
• Sensor data will come from the SOS@EGIM in
  • asynchronous/batch mode (PULL)
  • real-time mode (PUSH)
• Both the PUSH and PULL flows send formatted data (HTTP POST/PUT) to data store controllers
  • Distributed File Systems, NoSQL DBs, Time Series DBs, Streaming Store Controllers
• Both the PUSH and PULL transfer flows save metadata into the Metadata and Service Repository.
• OneData was evaluated as a candidate solution to enlarge this set of data storage solutions.
• Sensor data can be either
  • Retrieved via APIs exposed by an SOS server (Pull Transfer Flow)
  • Sent to the DMP before being consolidated on the SOS server (Push Transfer Flow)
• Two main processes happen during each transfer flow:
  • data scraping: extracting parts of the marine observations coming or retrieved from the SOS server;
  • data munging/wrangling: converting data from a "raw" format into another one that allows the data to be consumed more conveniently later
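The two processes above, scraping and munging, can be sketched as follows. The XML fragment is an illustrative O&M 2.0-style observation written for this example, not actual EGIM output, and the flat JSON record layout is an assumption about what a data store controller might accept.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative O&M 2.0-style fragment; not actual OBSEA/EGIM output.
RAW_XML = """
<om:OM_Observation xmlns:om="http://www.opengis.net/om/2.0"
                   xmlns:gml="http://www.opengis.net/gml/3.2"
                   xmlns:xlink="http://www.w3.org/1999/xlink">
  <om:phenomenonTime>
    <gml:TimeInstant gml:id="t1">
      <gml:timePosition>2016-12-01T00:00:00Z</gml:timePosition>
    </gml:TimeInstant>
  </om:phenomenonTime>
  <om:observedProperty xlink:href="sea_water_temperature"/>
  <om:result uom="degC">13.42</om:result>
</om:OM_Observation>
"""

NS = {
    "om": "http://www.opengis.net/om/2.0",
    "gml": "http://www.opengis.net/gml/3.2",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def scrape(xml_text):
    """Scraping: pull the interesting parts out of one raw observation."""
    root = ET.fromstring(xml_text.strip())
    result = root.find("om:result", NS)
    return {
        "time": root.findtext(
            "om:phenomenonTime/gml:TimeInstant/gml:timePosition", namespaces=NS
        ),
        "property": root.find("om:observedProperty", NS).get(XLINK_HREF),
        "raw_value": result.text,
        "uom": result.get("uom"),
    }


def munge(scraped):
    """Munging: convert the scraped strings into a flat, typed record."""
    return {
        "time": scraped["time"],
        "parameter": scraped["property"],
        "value": float(scraped["raw_value"]),
        "unit": scraped["uom"],
    }


record = munge(scrape(RAW_XML))
line = json.dumps(record)  # one JSON line, ready to POST/PUT to a store
print(line)
```

Keeping the two steps separate means the same munging code can serve both the Push and the Pull flow, which differ only in how the raw XML arrives.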
Data Publishing
• DMP Tools will be provided in addition to the API, in order to:
  • Activate the process of importing a dataset from external data sources (EMSO regional nodes);
  • Query data curated within the EMSODEV DMP;
  • Activate the process of defining (e.g. selecting a time range and a measured parameter) and generating a dataset to be exported outside the EMSODEV DMP.
• Medium to long-term preservation is ensured by regional EMSO nodes.
• Long-term archiving will be ensured by national and international certified long-term data archives, such as those of the ICSU World Data System (PANGAEA) and the National Oceanographic Data Centers (NODC).
• A common approach for Data Preservation is to be derived.
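Defining and generating an export dataset, as described above, amounts to filtering curated records by measured parameter and time range and then serializing the selection. The sketch below assumes a flat record layout, a half-open time window, and CSV as the export format; none of these are confirmed by the slides, and the sample records are invented.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical curated records; values are invented for illustration.
RECORDS = [
    {"time": "2016-12-01T00:00:00Z", "parameter": "temperature", "value": 13.42, "unit": "degC"},
    {"time": "2016-12-01T00:10:00Z", "parameter": "pressure", "value": 20.1, "unit": "dbar"},
    {"time": "2016-12-01T00:20:00Z", "parameter": "temperature", "value": 13.40, "unit": "degC"},
    {"time": "2016-12-02T00:00:00Z", "parameter": "temperature", "value": 13.10, "unit": "degC"},
]


def parse_time(text):
    """Parse an ISO 8601 UTC timestamp of the form 2016-12-01T00:00:00Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def define_dataset(records, parameter, start, end):
    """Select records for one parameter within the window [start, end)."""
    t0, t1 = parse_time(start), parse_time(end)
    return [
        r for r in records
        if r["parameter"] == parameter and t0 <= parse_time(r["time"]) < t1
    ]


def export_csv(rows):
    """Generate the export dataset as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["time", "parameter", "value", "unit"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


selected = define_dataset(
    RECORDS, "temperature", "2016-12-01T00:00:00Z", "2016-12-02T00:00:00Z"
)
csv_text = export_csv(selected)
print(csv_text)
```

The half-open window keeps consecutive daily exports from overlapping, so the record stamped exactly at the end time belongs to the next export.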
Data Use and Reuse
• Complex interactions are mediated by virtual laboratories
  • providing a persistent context for interactions between groups of users and components within the DMP.
  • experimental laboratory: a utility/tool allowing scientists/users to deploy datasets for processing and acquiring results.
  • All laboratories must interact with a security service (AAI).
• Data produced will be available for use beyond the original purpose
  • Adopted sensors are often multi-purpose and designed for multiple users and applications.
• Selection of certified repositories for long-term preservation/curation is in progress
  • Data are to be stored together with the minimum software, metadata and documentation.
• EMSO promotes standardization and integration of Regional EMSO Nodes data
  • to improve overall accessibility and reusability of local node data via the EMSO Data Portal.
Demo
Acknowledgement
Daniele Baratta (Swing:It, Software Engineering Italia Srl)
Michał Orzechowski (CYFRONET)
Daniele Cosenza (Spacearth Technology Srl)
Riccardo Delpopolo Carciopolo (Spacearth Technology Srl)
INDIGO and EUDAT Solutions
• Currently: OneData, IAM, B2DROP, B2SHARE, B2FIND
• In the future: EUDAT services to use, DMPonline, Future Gateways, Automated Integrity Tests