The Helmholtz Association Project „Large Scale Data Management and Analysis“ (LSDMA)

The Helmholtz Association Project „Large Scale Data Management and Analysis“ (LSDMA). Kilian Schwarz, GSI; Christopher Jung, KIT. Overview. Motivation Data Life Cycle LSDMA’s dual approach Facts and Numbers Initial Communities LSDMA, FAIR and ALICE.

Presentation Transcript


  1. The Helmholtz Association Project „Large Scale Data Management and Analysis“ (LSDMA) Kilian Schwarz, GSI; Christopher Jung, KIT

  2. Overview • Motivation • Data Life Cycle • LSDMA’s dual approach • Facts and Numbers • Initial Communities • LSDMA, FAIR and ALICE

  3. Why is Scientific Big Data important? Honestly, I do not need to explain this to you.

  4. Examples of Scientific Big Data in non-HEP Examples for sciences with Big Data: • Systems Biology: ~10 TB per day in high-throughput microscopy (zebra fish embryos) • Climate simulation: 10-100 PB per year • Brain research: 1 PB per year for brain mapping • Photon Science: XFEL 10 PB/year • and many other sciences which do not know their needs yet
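The figures on this slide mix per-day and per-year rates. A minimal sketch that puts them on a common scale (PB per year) could look like the following. The decimal unit convention (1 PB = 1000 TB), the assumption of continuous operation on 365 days a year, and all names in the code are illustrative assumptions, not part of the presentation.

```python
# Rough conversion of the quoted data rates to a common unit (PB per year).
# Assumptions (not from the slides): decimal units (1 PB = 1000 TB) and
# data taking on all 365 days of the year.
TB_PER_PB = 1000
DAYS_PER_YEAR = 365


def tb_per_day_to_pb_per_year(tb_per_day: float) -> float:
    """Convert a daily rate in TB to a yearly rate in PB."""
    return tb_per_day * DAYS_PER_YEAR / TB_PER_PB


# Values are taken from the slide; the labels are shortened for display.
rates_pb_per_year = {
    "Systems biology (microscopy)": tb_per_day_to_pb_per_year(10),
    "Climate simulation (lower bound)": 10,
    "Climate simulation (upper bound)": 100,
    "Brain mapping": 1,
    "Photon science (XFEL)": 10,
}

# Print the rates from smallest to largest.
for name, rate in sorted(rates_pb_per_year.items(), key=lambda kv: kv[1]):
    print(f"{name}: {rate:g} PB/year")
```

Under these assumptions, the microscopy figure of roughly 10 TB per day corresponds to a few PB per year, which is the same order of magnitude as the other examples.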

  5. Challenges of Big Data • Non-reproducibility of scientific data (or at high costs) • Current analysis methods scale poorly • Existing big data knowledge in the respective fields • Each discipline has its specific needs • Multidisciplinary research • Metadata • Authentication and authorization (single sign-on) • Data privacy (incl. removal of private data) • “Good scientific practice” • Cost estimation for long-term archival (at different service levels) • Data preservation • Open Access • …

  6. Data Life Cycle Inspiration for LSDMA: support the whole data life cycle!

  7. Dual approach: community-specific and generic Data Life Cycle Labs • Joint R&D with the scientific user communities • Optimization of the data life cycle • Community-specific data analysis tools and services Data Services Integration Team • Generic R&D • Interface between federated data infrastructures and DLCLs/communities • Integration of data services into scientific working process

  8. Facts and numbers • Initial project period: 1.1.2012-31.12.2016 • Funded by Helmholtz Association (13 MEUR for 5 years) • To become a part of the sustainable program-oriented funding of Helmholtz Association in 2015 • Partners: 4 Helmholtz research centers, 6 universities and the German climate research center • Leading project partner: KIT

  9. Initial communities • Energy • Smart grids, battery research, fusion research • Earth and Environment • Climate model, environmental satellite data • Health • Virtual human brain map • Key Technologies • Synchrotron radiation, nanoscopy, systems biology, electron-microscopical imaging techniques • Structure of Matter • Photon Science: Petra 3, XFEL • FAIR@GSI (14 experiments with big and small communities)

  10. LHC Computing – Prototype for FAIR • FAIR profits from computing experience within an already running experiment • ALICE can test new developments in FAIR • new FAIR developments are on the way, and to some extent they already go back to ALICE • FAIR will play an increasing role (funding, network architecture, software development and more ...)

  11. Goals for GSI/FAIR in LSDMA To be developed within LSDMA (DLCL: structure of matter) in collaboration with LSDMA – DSIT, the FAIR community, and ALICE (wherever synergy can be found) • parallel and distributed computing • triggerless “online” system • porting of needed algorithms to GPU • Grid/Cloud infrastructure • enable the possibility to submit compute jobs to Clouds • create interfaces to existing environments (AliEn, ...) • data archives • long term data archives • including concepts for xrootd and gStore • metadata catalog and data analysis • Metropolitan Area Systems • include the distributed FAIR T0/T1 centre into a global Grid/Cloud infrastructure • Federated Identity Management • Global Federations • Global File System • Optimization of Data Storage • hot versus cold data • corrupt and incomplete data sets • parallel storage • 3rd party copy Additional synergies via DSIT

  12. Next Steps at GSI • Advertise LSDMA positions (2 for FAIR DLCL) – do you know candidates? • GSI DSIT already started to hire people • Discussion with FAIR experiments and ALICE • Set-up of e-science infrastructures, first for PANDA and CBM, based on the experiences with ALICE (AliEn/xrootd/...) • Include smaller FAIR experiments • Continue to develop existing e-science infrastructure, also in close collaboration with DSIT and ALICE

  13. Summary and Outlook • There are many challenges in Scientific Big Data • LSDMA is a sustainable Helmholtz Association project, supporting the whole data life cycle, using a community-specific and a generic approach • FAIR is an important initial community in the research field ‘structure of matter’; several developments planned -> synergies w/ALICE • GSI has two open job positions for LSDMA
