Rosette Vandenbroucke, HPC Coordinator
rosette.vandenbroucke@vub.ac.be
Middleware and Managing Data and Knowledge in a Data-rich World
ASPIRE Data Panel
• Gill Davies - Online music performances
• Antonella Fresa - DCH
• Jens Jensen - HEP
• Andrew Lyall - Biomed
• Roshene McCool - Astronomy
• Rosette Vandenbroucke
Work method
Per discipline:
• List data creation/handling and associated requirements now and in the next 10 years
• Select aspects that are important for the represented disciplines
• Describe important future data and data handling expectations and common requirements
• Formulate recommendations
Aspects and types of data not covered
• Many more data aspects exist
• Not possible to handle them all
• Not covered:
  - Other scientific disciplines
  - Twitter and blog data
  - Social sites data
  - Logs of mobile phone use
  - ...
Data aspects considered
• Networking: bandwidth requirements, storage, mirrors, preservation, disaster recovery, costs
• Middleware
• Metadata
• AAI
• Data policies: availability, replication
• Data origin: authentication of source, integrity
Networking: Bandwidth (1)
3 models observed:
• SKA/HEP model
  - tier structure
• HG-DCH model
  - data transfer between large centers/depositories
  - very large number of “small” users
• Musical Performance model
  - small amount of data
  - network latency important
Networking: Bandwidth (2)
• Shared general concern
Network links below required bandwidth:
  - too expensive
  - network link not available where needed
  - no permission to connect to the national research network
Cost issues:
  - bandwidth now available for free may incur tariffs in the future
  - very high bandwidth and/or dedicated lightpath requirements can lead to high costs
  - some regions/countries have more expensive connections
  - last mile
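To make the "links below required bandwidth" concern concrete, the following is a minimal back-of-the-envelope sketch of how long a dataset takes to move over a link. The function name, the `efficiency` parameter and the example figures are illustrative assumptions, not numbers from the panel.

```python
def transfer_time_seconds(size_bytes: float,
                          link_bits_per_s: float,
                          efficiency: float = 1.0) -> float:
    """Estimate the time needed to move `size_bytes` over a network link.

    `efficiency` is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead and link sharing; 1.0 means the full nominal rate.
    """
    if link_bits_per_s <= 0:
        raise ValueError("link_bits_per_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # 8 bits per byte; divide by the usable bit rate.
    return size_bytes * 8 / (link_bits_per_s * efficiency)


if __name__ == "__main__":
    one_terabyte = 10**12          # bytes (decimal TB)
    one_gigabit_link = 10**9       # bits per second
    seconds = transfer_time_seconds(one_terabyte, one_gigabit_link)
    print(f"{seconds / 3600:.2f} hours")
```

Under these assumptions, 1 TB over an ideal 1 Gbit/s link takes 8000 seconds, a little over two hours. This estimate covers throughput only; for the musical performance model the relevant parameter is latency, which it does not capture.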
Networking: Storage, mirrors, preservation, disaster recovery
• Not all data can be stored or preserved
• Preservation schemes are under study
• Replication of data is sometimes inherent in the data structure
• Disaster recovery: not often explicitly addressed
Middleware
• Middleware is very much discipline-specific
• Expectation for generic solutions
Metadata
• Very important
• Used by all
• Many standards exist!
• Definition and usage per discipline
• No consideration for cross-disciplinary use
AAI
• Everyone agrees about the need for a globally accepted AAI system
• No consensus on how to do it
• e-IRG has made recommendations for such an AAI system
• Federations of authentication and eduGAIN are an excellent move in that direction
Data Policies
• Availability of data
• Policies on data access are discipline-specific
• General tendency to move to “open data”
• “Open data” cannot always be done, due to:
  - the costs of generating the data
  - the costs of storage and curation
  - data confidentiality
Data origin
• Integrity and source authentication are important
• No general mechanism for data-source authentication
• Metadata can help
• In some disciplines data is only relevant to experts, so considered as quite safe
• Authentication by a unique digital signature at creation
• Source authentication can add costs
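As an illustration of the integrity and source-authentication points above, here is a minimal sketch using only the Python standard library. It is not a mechanism proposed by the panel. Note the substitution: a true digital signature uses an asymmetric key pair, whereas this sketch uses an HMAC with a shared secret as a stand-in, because the standard library has no public-key signing. The function names and the sample key are hypothetical.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Integrity check: SHA-256 digest of the data, computed at creation."""
    return hashlib.sha256(data).hexdigest()


def make_tag(data: bytes, key: bytes) -> str:
    """Source-authentication stand-in: HMAC-SHA256 over the data.

    Assumption: producer and verifier share `key`. A real digital
    signature would use a private/public key pair instead.
    """
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_tag(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(data, key), expected_tag)


if __name__ == "__main__":
    key = b"hypothetical-shared-secret"
    dataset = b"observation run 1"
    tag = make_tag(dataset, key)
    print(verify_tag(dataset, key, tag))              # data unchanged
    print(verify_tag(dataset + b" edited", key, tag)) # data modified
```

The digest or tag could travel with the dataset as part of its metadata, which is one way metadata can help here; the cost the slide mentions shows up as key management and the extra computation at creation time.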
DATA is GROWING in every discipline, putting higher requirements on all the aspects we have looked at
Recommendation 1: Network related
• Collaboration between user communities and NRENs, GÉANT, ... to understand network requirements associated with the data deluge
• Adequate network services made available in a timely and economically viable way
• All important network parameters have to be studied (speed, throughput, privacy, persistence of connection, cost, ...)
Recommendation 2: Standardisation of datasets and metadata
• Define standardised data sets:
  - to profit from economy of scale for cross-discipline middleware
• Define standardised data sets, metadata, middleware and applications:
  - for easier accessibility of data
• Adopt a common metadata standard that takes into account multi-disciplinary use of data
Recommendation 3: AAI
Adopt a globally recognised AAI based on standards for the exchange of assertions and security tokens that can be used by all (user communities, e-infrastructure providers, ICT providers, ...)
Recommendation 4: Data origin
Create common mechanisms and procedures for all disciplines to certify and authenticate data.
Recommendation 5: Preservation, curation
Facilitate collaboration between disciplines to create common policies, procedures and tools to assist in the curation and preservation of data.