130 likes | 490 Views
Data Management Challenges in the Human Brain Project. Thomas Heinis Data-Intensive Applications & Systems Lab Ecole Polytechnique Federale de Lausanne. Human Brain Project. Goal: develop platform to simulate human brain!
E N D
Data Management Challenges in the Human Brain Project Thomas Heinis Data-Intensive Applications & Systems Lab EcolePolytechniqueFederale de Lausanne
Human Brain Project • Goal: develop platform to simulate human brain! • 10 year and 1 billion Euro project to integrate all available knowledge/data Single Neuron 3D Model • Involves coordination of more than 200 interdisciplinary partners • Lead by Henry Markram • Based on EPFL’s Blue Brain Project Neocortex Model
Human Brain Project Workflow Literature Research Analysis Experimental Observations Model Building Medical Records/Data Simulation Visualization
Data Deluge • Medical Data from hundreds of sources • Model Data alone: • Simulation Trace: potentially infinite Present Goal Model Data Fine grained (3D Mesh) # of Neurons Model Data Coarse grained 270GB 5.8TB 1M 86 Billion Full Brain 27PB 0.5EB
HBP Data Management Challenges • Querying of Petascale Data • Data Privacy/Anonymization • Cloud Analytics • Quick & efficient access to raw data • Distributed Workflow Execution • Provenance/Reproducibility • Data Personalization
Spatial Indexing 3D Spatial Range Query 3D Spatial Range Query 2010 Applications: Bottleneck in Spatial Analysis Model Building Analysis 2008 2006
Overlap Problem R-Trees: Hierarchy of Minimum Bounding Rectangles (MBR) STEP 2: Query Execution STEP 1: Index Construction Range Query Overlap c Overlap reduces performance, increases with level of detail
“FLAT”, A Two Phase Spatial Index* Add reachability information during index construction c 1) SEEDING: Find any one object arbitrarily inside the query region. 2) CRAWLING: Retrieve remaining results by traversing the neighbors. *joint work with Farhan Tauheed, Laurynas Biveinis and Anastasia Ailamaki
Impact Blue Brain Project: Part of the toolset used every day February 2012: first 1 million neuron model indexed Still 5 orders of magnitude smaller than human brain General Applicability: 3D surface mesh models N-Body simulation data set 2012 (270 GB) 2010 2008 2006
HBP Data Management Challenges • Querying of Petascale Data • Data Privacy/Anonymization • Cloud Analytics • Quick & efficient access to raw data • Distributed Workflow Execution • Provenance/Reproducibility • Data Personalization
Thank you • Data Management in the Human Brain Project • Thomas Heinis