The Helmholtz Association Project „Large Scale Data Management and Analysis“ (LSDMA). Kilian Schwarz, GSI; Christopher Jung, KIT. Overview. Motivation Data Life Cycle LSDMA’s dual approach Facts and Numbers Initial Communities LSDMA, FAIR and ALICE.
The Helmholtz Association Project „Large Scale Data Management and Analysis“ (LSDMA) Kilian Schwarz, GSI; Christopher Jung, KIT
Overview • Motivation • Data Life Cycle • LSDMA’s dual approach • Facts and Numbers • Initial Communities • LSDMA, FAIR and ALICE
Why is Scientific Big Data important? Honestly, I do not need to explain this to you.
Examples of Scientific Big Data in non-HEP Examples for sciences with Big Data: • Systems Biology: ~10 TB per day in high-throughput microscopy (zebra fish embryos) • Climate simulation: 10-100 PB per year • Brain research: 1 PB per year for brain mapping • Photon Science: XFEL 10 PB/year • and many other sciences which do not know their needs yet
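The rates above mix per-day and per-year figures, which makes them hard to compare at a glance. The following is a minimal sketch that puts them on one scale (PB per year). It assumes decimal units (1 PB = 1000 TB) and, as a simplification not stated on the slide, that the microscopy runs 365 days a year; the function and variable names are illustrative only.

```python
# Assumptions (not from the slide): decimal units and year-round operation.
TB_PER_PB = 1000
DAYS_PER_YEAR = 365


def tb_per_day_to_pb_per_year(tb_per_day: float) -> float:
    """Convert a daily data rate in TB to a yearly data rate in PB."""
    return tb_per_day * DAYS_PER_YEAR / TB_PER_PB


# Yearly volumes in PB as (low, high) ranges, using the figures from the slide.
systems_biology = tb_per_day_to_pb_per_year(10)  # slide: ~10 TB per day
rates_pb_per_year = {
    "Systems Biology": (systems_biology, systems_biology),
    "Climate simulation": (10, 100),
    "Brain research": (1, 1),
    "Photon Science (XFEL)": (10, 10),
}

for name, (low, high) in rates_pb_per_year.items():
    if low == high:
        print(f"{name}: about {low:g} PB/year")
    else:
        print(f"{name}: {low:g} to {high:g} PB/year")
```

Under these assumptions, ~10 TB per day corresponds to roughly 3.65 PB per year, which places high-throughput microscopy between brain mapping and XFEL in yearly volume.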
Challenges of Big Data • Scientific data cannot be reproduced (or only at high cost) • Current analysis methods scale poorly • Existing big data knowledge in the respective fields • Each discipline has its specific needs • Multidisciplinary research • Metadata • Authentication and authorization (single sign-on) • Data privacy (incl. removal of private data) • “Good scientific practice” • Cost estimation for long-term archival (at different service levels) • Data preservation • Open Access • …
Data Life Cycle Inspiration for LSDMA: support the whole data life cycle!
Dual approach: community-specific and generic Data Life Cycle Labs • Joint R&D with the scientific user communities • Optimization of the data life cycle • Community-specific data analysis tools and services Data Services Integration Team • Generic R&D • Interface between federated data infrastructures and DLCLs/communities • Integration of data services into the scientific working process
Facts and numbers • Initial project period: 1.1.2012-31.12.2016 • Funded by Helmholtz Association (13 MEUR for 5 years) • To become a part of the sustainable program-oriented funding of Helmholtz Association in 2015 • Partners: 4 Helmholtz research centers, 6 universities and the German climate research center • Leading project partner: KIT
Initial communities • Energy • Smart grids, battery research, fusion research • Earth and Environment • Climate model, environmental satellite data • Health • Virtual human brain map • Key Technologies • Synchrotron radiation, nanoscopy, systems biology, electron-microscopical imaging techniques • Structure of Matter • Photon Science: Petra 3, XFEL • FAIR@GSI (14 experiments with big and small communities)
LHC Computing – Prototype for FAIR • FAIR profits from computing experience within an already running experiment • ALICE can test new developments in FAIR • new FAIR developments are on the way, and to some extent they already flow back to ALICE • FAIR will play an increasing role (funding, network architecture, software development and more ...)
Goals for GSI/FAIR in LSDMA To be developed within LSDMA (DLCL: structure of matter) in collaboration with LSDMA – DSIT, the FAIR community, and ALICE (wherever synergy can be found) • parallel and distributed computing • triggerless “online” system • porting of needed algorithms to GPU • Grid/Cloud infrastructure • enable the possibility to submit compute jobs to Clouds • create interfaces to existing environments (AliEn, ...) • data archives • long term data archives • including concepts for xrootd and gStore • meta data catalog and data analysis • Metropolitan Area Systems • include the distributed FAIR T0/T1 centre into a global Grid/Cloud infrastructure • Federated Identity Management • Global Federations • Global File System • Optimization of Data Storage • hot versus cold data • corrupt and incomplete datasets • parallel storage • 3rd party copy Additional synergies via DSIT
Next Steps at GSI • Advertise LSDMA positions (2 for FAIR DLCL) – do you know any candidates? • GSI DSIT has already started to hire people • Discussion with FAIR experiments and ALICE • Set-up of e-science infrastructures, first for PANDA and CBM, based on the experiences with ALICE (AliEn/xrootd/...) • Include smaller FAIR experiments • Continue to develop existing e-science infrastructure, also in close collaboration with DSIT and ALICE
Summary and Outlook • There are many challenges in Scientific Big Data • LSDMA is a sustainable Helmholtz Association project, supporting the whole data life cycle, using a community-specific and a generic approach • FAIR is an important initial community in the research field ‘structure of matter’; several developments planned -> synergies w/ALICE • GSI has two open job positions for LSDMA