
Rosette Vandenbroucke, HPC Coordinator, rosette.vandenbroucke@vub.ac.be


Presentation Transcript


  1. Rosette Vandenbroucke, HPC Coordinator, rosette.vandenbroucke@vub.ac.be. Middleware and Managing Data and Knowledge in a Data-rich World

  2. ASPIRE Data Panel: Gill Davies (online music performances), Antonella Fresa (DCH), Jens Jensen (HEP), Andrew Lyall (Biomed), Roshene McCool (Astronomy), Rosette Vandenbroucke

  3. Work method. Per discipline: list data creation/handling and associated requirements, now and in the next 10 years. Select aspects that are important for the represented disciplines. Describe important future data and data handling expectations and common requirements. Formulate recommendations.

  4. Aspects and types of data not covered. Many more data aspects exist; it is not possible to handle them all. Not covered: other scientific disciplines, Twitter and blog data, social sites data, logs of mobile phone use, ...

  5. Data aspects considered • Networking: bandwidth requirements, storage, mirrors, preservation, disaster recovery, costs • Middleware • Metadata • AAI • Data policies: availability, replication • Data origin: authentication of source, integrity

  6. Networking: Bandwidth (1). 3 models observed: • SKA/HEP model: tier structure • HG-DCH model: data transfer between large centers/depositories; very large number of “small” users • Musical Performance model: small amount of data; network latency important

  7. Networking: Bandwidth (2) • Shared general concern: network links below the required bandwidth: - too expensive - network link not available where needed - no permission to connect to the national research network • Cost issues: - bandwidth now available for free may incur tariffs in the future - very high bandwidth and/or dedicated lightpath requirements can lead to high costs - some regions/countries have more expensive connections - last mile

  8. Networking: Storage, mirrors, preservation, disaster recovery. Not all data can be stored or preserved. Preservation schemes are under study. Replication of data is sometimes inherent in the data structure. Disaster recovery: not often explicitly addressed.

  9. Middleware. Middleware is very much discipline-specific. Expectation for generic solutions.

  10. Metadata. Very important. Used by all. Many standards exist! Definition and usage per discipline. No consideration for cross-disciplinary use.

  11. AAI. Everyone agrees about the need for a globally accepted AAI system. No consensus on how to do it. e-IRG has made recommendations for such an AAI system. Federations of authentication and eduGAIN are an excellent move in that direction.

  12. Data Policies • Availability of data • Policies on data access are discipline-specific • General tendency to move to “open data” • “Open data” cannot always be done, due to: • the costs of generating the data • the costs of storage and curation • data confidentiality

  13. Data origin • Integrity and source authentication are important • No general mechanism for data-source authentication • Metadata can help • In some disciplines data is only relevant to experts, so it is considered quite safe • Authentication by a unique digital signature at creation • Source authentication can add costs

  14. DATA GROWING in every discipline, putting higher requirements on all aspects we have looked at

  15. Recommendation 1: Network related • Collaboration between user communities and NRENs, GÉANT, ... to understand network requirements associated with the data deluge • Adequate network services made available in a timely and economically viable way • All important network parameters have to be studied (speed, throughput, privacy, persistence of connection, cost, ...)

  16. Recommendation 2: Standardisation of datasets and metadata • Define standardised data sets: • To profit from economy of scale for cross-discipline middleware • Define standardised data sets, metadata, middleware and applications • For easier accessibility of data • Adopt a common metadata standard that takes into account multi-disciplinary use of data

  17. Recommendation 3: AAI. Adopt a globally recognised AAI based on standards for the exchange of assertions and security tokens that can be used by all (user communities, e-infrastructure providers, ICT providers, ...)

  18. Recommendation 4: Data origin. Create common mechanisms and procedures for all disciplines to certify and authenticate data.

  19. Recommendation 5: Preservation, curation. Facilitate collaboration between disciplines to create common policies, procedures and tools to assist in the curation and preservation of data.
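
Illustration for slides 13 and 18 (not part of the original deck): the slides ask for integrity and source authentication of data at creation time without naming a mechanism. The sketch below shows one possible shape of such a check, using only the Python standard library. It is an assumption-laden stand-in: it uses a SHA-256 digest for integrity and an HMAC (a shared-secret tag) for source authentication, which is not the public-key "unique digital signature" the slide mentions. The function names `make_record` and `verify_record` and the record layout are hypothetical.

```python
import hashlib
import hmac


def make_record(data: bytes, key: bytes) -> dict:
    """Build a small provenance record when the data is created.

    'sha256' lets anyone detect accidental or malicious modification.
    'tag' is an HMAC-SHA256 over the data; only holders of the shared
    key can produce it (assumption: producer and verifier share 'key').
    """
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "tag": hmac.new(key, data, hashlib.sha256).hexdigest(),
    }


def verify_record(data: bytes, record: dict, key: bytes) -> bool:
    """Return True only if both the digest and the HMAC tag match."""
    digest = hashlib.sha256(data).hexdigest()
    expected_tag = hmac.new(key, data, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    digest_ok = hmac.compare_digest(digest, record["sha256"])
    tag_ok = hmac.compare_digest(expected_tag, record["tag"])
    return digest_ok and tag_ok


if __name__ == "__main__":
    key = b"example-shared-secret"  # hypothetical key, for illustration only
    data = b"observation block 0001"
    record = make_record(data, key)
    print(verify_record(data, record, key))          # unmodified data
    print(verify_record(data + b"x", record, key))   # tampered data
```

A cross-discipline mechanism as called for in Recommendation 4 would more likely rely on public-key signatures, so that verifiers do not need the producer's secret; the record could travel alongside the data as metadata, in line with the "Metadata can help" point on slide 13.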
