
Vivien Bonazzi, Ph.D., Program Director: Computational Biology (NHGRI), Co-Chair: Software Methods & Systems (BD2K)



Presentation Transcript


  1. Biomedical Big Data Initiative (BD2K)
     Vivien Bonazzi, Ph.D., Program Director: Computational Biology (NHGRI), Co-Chair: Software Methods & Systems (BD2K)

  2. Myriad Data Types: Genomic, Other ’Omic, Imaging, Phenotypic, Exposure, Clinical

  3. Data and Informatics Working Group acd.od.nih.gov/diwg.htm

  4. What Are the Big Problems to Solve?
     1. Locating the data
     2. Getting access to the data
     3. Extending policies and practices for data sharing
     4. Organizing, managing, and processing biomedical Big Data
     5. Developing new methods for analyzing biomedical Big Data
     6. Training researchers who can use biomedical Big Data effectively

  5. Overarching Strategy and Goals
     Two initiatives being proposed to overcome roadblocks:
     • Big Data to Knowledge (BD2K): enable the biomedical research enterprise to maximize the value of biomedical data
     • InfrastructurePlus: create an adaptive environment at NIH to sustain world-class biomedical research

  6. Big Data to Knowledge (BD2K): Overview
     • Major trans-NIH initiative addressing an NIH imperative and key roadblock
     • Aims to be catalytic and synergistic
     • Overarching goal:
       • By the end of this decade, enable a quantum leap in the ability of the biomedical research enterprise to maximize the value of the growing volume and complexity of biomedical data

  7. BD2K: Four Programmatic Areas
     I. Facilitating Broad Use of Biomedical Big Data
     II. Developing and Disseminating Analysis Methods and Software for Biomedical Big Data
     III. Enhancing Training for Biomedical Big Data
     IV. Establishing Centers of Excellence for Biomedical Big Data

  8. Area 1: Data Sharing & Access
     • Facilitating usage and sharing of biomedical big data
     • New Policies to Encourage Data & Software Sharing
     • Index of Research Datasets to Facilitate Data Location & Citation
     • Community-based Development of Data & Metadata Standards
     A. Policies to Facilitate Data Sharing
     B. Data Catalog: Data Discovery, Citation, Links to Literature
     C. Frameworks for Community-Based Solutions to Developing Data Standards
     D. Enabling Research Use of Clinical Data

  9. Area 2: Software and Systems Development
     • Development of analysis methods and software
     • Software to Meet Needs of the Biomedical Research Community
     • Facilitating Data Analysis: Access to Large-scale Computing
     • Dynamic Community Engagement of Users and Developers
     A. Grants for software development
     B. Software Registry: Making biomedical software findable and citable
     C. Cloud computing: Facilitating Data Analysis
     D. Dynamic Social Engagement via social media

  10. Area 2: Software and Systems Development
      Software Grants
      Current and emerging needs for using, managing, and analyzing the larger and more complex data sets inherent to biomedical Big Data
      • Compression/Reduction
      • Visualization
      • Provenance
      • Data Wrangling

  11. Area 2: Software and Systems Development
      Big Data needs Big Computing
      Cloud Computing
      • Leveraging the cloud
      • Storing and analyzing huge data sets
      • Collaborative environment
      • Developing appropriate policies for use of controlled access data in the cloud (dbGaP)
      • Developing working relationships with major cloud providers
        • AWS, Google, Microsoft (Azure)
      HPC
      • More exploration with Supercomputing facilities

  12. Area 3: Training
      • Enhancing computational training
      • Increase Number of Computationally Skilled Trainees
      • Strengthen the Quantitative Skills of All Researchers
      • Enhance NIH Review and Program Oversight

  13. Area 4: Centers
      • Establishing centers of excellence
      • Collaborative environments & technologies
      • Data integration
      • Analysis & modeling methods
      • Computer science & statistical approaches
      A. Investigator-initiated Centers
      B. NIH-specified Centers

  14. Big Data to Knowledge (BD2K) bd2k.nih.gov

  15. Biomedical Research as Part of the Digital Enterprise
      Philip E. Bourne, Ph.D., Associate Director for Data Science, National Institutes of Health

  16. Myriad Data Types: Genomic, Other ’Omic, Imaging, Phenotypic, Exposure, Clinical

  17. Myriad Data Types: Genomic, Other ’Omic, Imaging, Phenotypic, Exposure, Clinical

  18. Components of The Academic Digital Enterprise
      • Consists of digital assets
        • E.g. datasets, papers, software, lab notes
      • Each asset is uniquely identified and has provenance, including access control
        • E.g. publishing simply involves changing the access control
      • Digital assets are interoperable across the enterprise
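
      The asset model on this slide can be illustrated with a minimal Python sketch. It is not part of the presentation. The names `DigitalAsset`, `Access`, `record`, and `publish` are hypothetical, and a random UUID stands in for whatever identifier scheme a real enterprise would use. It shows one way an asset might carry a unique identifier, a provenance log, and an access setting, so that publishing amounts to changing the access setting.

```python
from dataclasses import dataclass, field
from enum import Enum
from uuid import uuid4


class Access(Enum):
    """Hypothetical access levels; the slide does not name specific ones."""
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass
class DigitalAsset:
    """Illustrative digital asset: a dataset, paper, software, or lab note."""
    kind: str
    title: str
    access: Access = Access.PRIVATE
    # Assumption: a random UUID serves as the unique identifier.
    asset_id: str = field(default_factory=lambda: uuid4().hex)
    # Provenance kept as a simple ordered list of event descriptions.
    provenance: list = field(default_factory=list)

    def record(self, event: str) -> None:
        """Append an event to the asset's provenance log."""
        self.provenance.append(event)

    def publish(self) -> None:
        """Publish by changing the access control and logging the change."""
        self.access = Access.PUBLIC
        self.record("access changed to public")


dataset = DigitalAsset(kind="dataset", title="example cohort")
dataset.record("created")
dataset.publish()
```

      In this sketch the asset itself is never copied or moved when it is published. Only its access setting changes, and the change is added to the provenance log.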

  19. Let’s Break Down the Silos
      • New policies and regulations, e.g. data sharing
      • Economic drivers
      • The promise of shared data

  20. The NIH is Starting to Think About the Digital Enterprise Big Data to Knowledge (BD2K) bd2k.nih.gov

  21. This is great, but BD2K is just a start. What will the end product look like?

  22. To get to that end point we have to consider the complete research lifecycle

  23. The Research Life Cycle will Persist
      IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION

  24. Tools and Resources Will Continue To Be Developed
      Authoring Tools, Data Capture, Analysis Tools, Scholarly Communication, Lab Notebooks, Software, Visualization
      IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION

  25. Those Elements of the Research Life Cycle will Become More Interconnected Around a Common Framework
      Authoring Tools, Data Capture, Analysis Tools, Scholarly Communication, Lab Notebooks, Software, Visualization
      IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION

  26. New/Extended Support Structures Will Emerge
      Authoring Tools, Data Capture, Analysis Tools, Scholarly Communication, Lab Notebooks, Software, Visualization
      IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
      Discipline-Based Metadata Standards, Community Portals, Data Journals, Git-like Resources By Discipline, New Reward Systems, Commercial & Public Tools, Training, Institutional Repositories, Commercial Repositories

  27. Thank You. Questions? bonazziv@nih.gov
