
Nature Reviews/2012


Presentation Transcript


  1. Nature Reviews/2012

  2. Next-Generation Sequencing (NGS): Data Generation
  • NGS will generate more broadly applicable data for various novel functional assays
    • Protein-DNA binding
    • Histone modification
    • Transcript levels
    • Spatial interactions
  • Combination of applications into larger studies
    • E.g. the 1000 Genomes Project

  3. Next-Generation Sequencing (NGS): Data Interpretation
  • Meaningful interpretation of sequencing data is essential
    • It relies heavily on complex computation
  • Major problems
    • Low adoption of existing practices
    • Difficulty of reproducibility

  4. Problem 1: Low Adoption of Existing Practices
  Example: variant discovery
  • A set of accepted and accessible practices emerged from the 1000 Genomes Project
    • 299 articles in 2011 cited this project
    • Only 10 studies used the recommended tools
    • Only 4 studies used the full workflow
  • Not following tested practices undermines the quality of biomedical research
  • Why is adoption low?
    • Overly complicated logistics (e.g. the need to re-sort input data)
    • Limited applicability of toolkits (e.g. only a handful of well-annotated genomes)
    • Little agreement on what constitutes "best practice"

  5. Problem 2: Difficulty of Reproducibility
  Example: read mapping
  • Repeating a mapping experiment requires the primary data, the software and its version, the parameter settings, and the name of the reference genome
    • Of 19 studies citing the 1000 Genomes Project, only 6 reported all of these details
    • Of 50 randomly selected papers using the Burrows-Wheeler Aligner, only 7 provided all details
  • Most results in today's publications cannot be accurately verified, reproduced, adopted or reused
  • Why is reproducibility difficult?
    • Lack of mechanisms for documenting analytical steps
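The four details needed to repeat a mapping experiment can be captured as a machine-readable record at analysis time. A minimal Python sketch; the field names, the `bwa 0.7.17` version string and the `GRCh37` label are illustrative assumptions, not values from the slides:

```python
import hashlib
import json

def provenance_record(data_path, data_bytes, tool, version, params, reference):
    """Capture the four reproducibility details for a mapping run:
    primary data, software and its version, parameter settings, and
    the name of the reference genome (hypothetical field names)."""
    return {
        "primary_data": {
            "path": data_path,
            # a checksum pins the exact input, not just its file name
            "sha256": hashlib.sha256(data_bytes).hexdigest(),
        },
        "software": {"name": tool, "version": version},
        "parameters": params,
        "reference_genome": reference,
    }

record = provenance_record(
    "reads.fastq", b"@read1\nACGT\n+\nIIII\n",
    tool="bwa", version="0.7.17",
    params={"algorithm": "mem", "threads": 4},
    reference="GRCh37",
)
print(json.dumps(record, indent=2))
```

A framework that emits such a record alongside every result would satisfy the checklist above by construction, rather than relying on authors to transcribe the details into a methods section.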

  6. Potential of Integrative Frameworks
  • Combine diverse tools under the umbrella of a unified interface
    • E.g. BioExtract, Galaxy, GenePattern, GeneProf
  • Advantages
    • Make data analysis transparent and reproducible
    • Make use of high-performance computing infrastructure
    • Improve long-term archiving

  7. 1. Promoting Transparency and Reproducibility
  • Automatic tracking, recording and dissemination of all details of computational analyses
    • GenePattern: embeds analysis details into Microsoft Word documents while a publication is being prepared
    • Galaxy: creates interactive Web-based supplements containing analysis details
  • Both allow readers to inspect the described analysis in detail

  8. 2. Using High-Performance Computing Infrastructure
  • High-performance computing resources
    • Computing clusters at institutions or through nationwide efforts, e.g. XSEDE
    • Private and public clouds
  • These resources are not readily accessible to the broad biomedical community
    • They typically require familiarity with virtual machines or application-programming interfaces
  • With integrative frameworks, anyone can deploy a solution on any type of resource
    • E.g. CloudMan: a user interface for managing computing clusters on cloud resources

  9. 3. Improving Long-term Archiving
  • Centralized resources are inherently vulnerable: the longevity of hosted analysis services depends on external factors, e.g. the funding climate
  • With integrative frameworks
    • Create snapshots of a particular analysis
    • Compose virtual machine images from an analysis, to be stored as an archival resource, e.g. in the Dryad system or Figshare
    • Export the complete collection of analysis artifacts automatically for archiving
    • Anyone can recreate a new virtual instance from this archive
  • Result: improved reproducibility
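The "export for archival" idea can be illustrated with a short sketch: bundling an analysis's inputs, workflow description and outputs into one self-contained archive. The layout below is hypothetical, not the actual export format of Galaxy, Dryad or Figshare:

```python
import io
import json
import tarfile

def export_analysis(artifacts):
    """Pack a {name: bytes} mapping of analysis artifacts into an
    in-memory gzipped tar archive suitable for deposit in a repository."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in artifacts.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Toy artifacts standing in for a real analysis (illustrative content only).
archive = export_analysis({
    "inputs/reads.fastq": b"@read1\nACGT\n+\nIIII\n",
    "workflow.json": json.dumps({"steps": ["map", "call"]}).encode(),
    "outputs/variants.vcf": b"##fileformat=VCFv4.1\n",
})
print(len(archive), "bytes")
```

Because the archive carries the workflow description alongside the data, a new virtual instance can in principle replay the analysis from the archive alone.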

  10. Future Directions: Tool Distribution
  • Current practice
    • Tools need to be compiled, installed and supplied with associated data
      • E.g. a short-read mapper requires genome indices
  • Better practice
    • Digital platforms provide sets of tools that are installed automatically into the user's integrative-framework environment
    • Pioneering work: e.g. Gparc, Galaxy Tool Shed
      • Allow sharing of analysis workflows, data sets, visualizations and other analysis artifacts
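What "installed automatically" might look like can be sketched as a tool manifest that a distribution platform resolves into an ordered install plan, covering both the build steps and the associated data such as genome indices. The manifest schema below is entirely hypothetical, in the spirit of (but not matching) the Galaxy Tool Shed:

```python
# Hypothetical manifest a tool-distribution platform might publish for a
# short-read mapper; every name and command here is illustrative.
manifest = {
    "tool": {"name": "short-read-mapper", "version": "1.2.0"},
    "build": {"steps": ["./configure", "make", "make install"]},
    # associated data the tool cannot run without, e.g. genome indices
    "data": [
        {"name": "GRCh37 index", "build_cmd": "mapper index GRCh37.fa"},
    ],
}

def install_plan(m):
    """Flatten a manifest into the ordered commands a framework would run,
    build steps first, then the commands that create the associated data."""
    return m["build"]["steps"] + [d["build_cmd"] for d in m["data"]]

print(install_plan(manifest))
```

The point of the sketch is that once a manifest exists, the framework, not the user, is responsible for compiling the tool and preparing its data.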

  11. Future Directions: Integrate Analysis and Visualization
  • Current practice
    • Visualization is the last step of an analysis
  • Better practice
    • Visualization as an active component during analysis
  • Advantages
    • Users can see directly, in real time, how parameter changes affect the final result
    • In the context of a publication, it helps readers evaluate and inspect the results
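At its core, "visualization as an active component" means recomputing a view whenever a parameter changes. A toy Python sketch with made-up variant records and a hypothetical quality threshold; in a real framework a slider or text box would drive the parameter instead of the loop:

```python
# Toy variant records (positions and quality scores are invented).
variants = [
    {"pos": 101, "qual": 12.0},
    {"pos": 250, "qual": 38.5},
    {"pos": 412, "qual": 55.1},
    {"pos": 733, "qual": 29.9},
]

def passing(threshold):
    """Positions of variants whose quality meets the threshold."""
    return [v["pos"] for v in variants if v["qual"] >= threshold]

# An interactive view would redraw on every slider move; here we just
# print how the retained set shrinks as the cutoff rises.
for t in (10, 30, 50):
    print(t, passing(t))
```

Even this toy makes the slide's point: the effect of the parameter is visible immediately, instead of being discovered only after the full analysis has finished.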

  12. Conclusion
  • To sustain the growing application of NGS, data interpretation must become as accessible as data generation
  • It is necessary to bridge the gap between experimentalists and computational scientists
    • Experimentalists must embrace the unavoidable computational components
    • Computational scientists must ensure their software is appealing to use
  • Integrative frameworks are emerging to meet this need
    • Tracking analysis details precisely
    • Ensuring transparency and reproducibility
