Managing Next Generation Sequencing and Multiplexed Genotyping Data Using Open Source LabKey Server • Adam Rauch • adam@labkey.com
LabKey Software Company Overview • LabKey Software is a consulting company • Spun off from the McIntosh Lab (part owned by FHCRC) • Professional software engineers from Amazon, Microsoft, BEA, etc. • Work in partnership with scientists • For-profit fee-for-service contracts • Non-profit grant sub-awards • Co-investigators with a shared research agenda • All development approved by and relevant to FHCRC • Development & support around LabKey Server • Extending the base LabKey Server platform • Creating customized lab-specific solutions • Hosting LabKey Server • Support
What Is LabKey Server? • An open-source, web-based platform for organizing, analyzing & sharing scientific data • Data integration & analysis for assays • Proteomics, flow cytometry, plate-based assays, etc. • Study Data Management • Combines demographic, clinical, assay & specimen data • LabKey Server powers many deployments… • CPAS: FHCRC proteomics repository • Atlas Science Portal: SCHARP’s HIV vaccine studies • AdaptiveTCR: Customer analytics for ImmunoSEQ NGS • UW (Katze, Heinecke, et al.), USC, Markey, Harvard, IDRI, TGen, Wisconsin Primate EHR, UC Denver, etc.
Dave O’Connor Lab, University of Wisconsin • Academic research lab • Focus: understanding SIV using nonhuman primate models & applying NHP methods to human HIV disease research
O’Connor Lab SIV/HIV Research • Host Immune Genetics (Source: modified from Yewdell et al., Nature Reviews Immunology 2003) • Virus Genetics (Source: Korber et al., British Medical Bulletin 2001)
Importance of MHC Class I Host Immune Genetics • MHC class I molecules dictate immunity to disease • High degree of polymorphism within the MHC class I peptide-binding domain • Specific MHC alleles associated with superior control of HIV infection Source: modified from Yewdell et al., Nature Reviews Immunology 2003
Importance of Viral Variability Virus Genetics • HIV has fast replication cycle, high mutation rate • Evolution of the virus causes escape from immune responses • Specific mutations are associated with resistance to antiretroviral drug therapy Source: Korber et al., British Medical Bulletin 2001
Sequencing in the O’Connor Lab • 2005 – 2009 Sanger sequencing • “Prohibitively expensive” for most experiments • 2009 Roche/454 GS FLX at UIUC • 2010 Roche/454 GS Junior in lab • Roche/454 GS Junior • Long-read instrument, critical for genotyping • Identical to GS FLX, but 1/8 throughput & lower cost • ~100,000 reads per run (~1¢ per read), average ~560bp read length • 115 runs this year • MID tagging • Allows pooling multiple samples (30-100) into a single run • Galaxy server • Open-source sequence analysis tool (Giardine et al, Genome Res 2005) • Lab has built custom workflow to match sequences to known MHC alleles • Uses BLAT, transitioning to AGILE (Northwestern alignment tool)
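The MID tagging bullet above can be illustrated with a minimal Python sketch of demultiplexing a pooled run. It assumes every read begins with its sample's MID barcode. The barcode sequences, sample names, and the `demultiplex` function are illustrative assumptions, not the lab's actual tags or pipeline.

```python
from collections import defaultdict

# Illustrative MID barcodes mapped to made-up sample names (assumptions).
MID_TO_SAMPLE = {
    "ACGAGTGCGT": "animal_01",
    "ACGCTCGACA": "animal_02",
}

def demultiplex(reads, mid_to_sample, mid_length=10):
    """Group reads by the MID found at the start of each read.

    Returns a dict of sample name -> list of reads with the MID trimmed.
    Reads whose prefix matches no known MID are kept, untrimmed, under
    the key "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        mid, insert = read[:mid_length], read[mid_length:]
        sample = mid_to_sample.get(mid)
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample)

# Three made-up reads from one pooled run.
pooled_reads = [
    "ACGAGTGCGT" + "GGCTCCCACTCC",
    "ACGCTCGACA" + "TTGAGGTATTTC",
    "NNNNNNNNNN" + "GGCTCCCACTCC",
]
result = demultiplex(pooled_reads, MID_TO_SAMPLE)
```

This sketch uses exact prefix matching only. A real pipeline would also need to handle sequencing errors inside the barcode, which is outside its scope.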
Roche/454 MHC Workflow • Total RNA isolation and cDNA synthesis • RNA isolation ~4 hrs; cDNA synthesis ~2 hrs • Primary PCR amplification • plus SPRI purification, quantification, pooling ~3 hrs • emPCR • set-up ~1 hr, run ~5.5 hrs • Breaking and enrichment • ~3 hrs • Roche/454 GS Junior run • set-up ~1.5 hrs; run time ~10 hrs • Data processing and analysis • run processing ~2 hrs; analysis time varies www.454.com
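As a worked check on the timings above, the following Python sketch adds up the approximate step durations from this slide. It assumes the steps run strictly one after another with no overlap and leaves out analysis time, which the slide lists as variable. The names `WORKFLOW_HOURS` and `total_hours` are illustrative.

```python
# Approximate step durations in hours, copied from the workflow slide.
# Analysis time is omitted because the slide gives no figure for it.
WORKFLOW_HOURS = [
    ("RNA isolation", 4.0),
    ("cDNA synthesis", 2.0),
    ("Primary PCR, SPRI purification, quantification, pooling", 3.0),
    ("emPCR set-up", 1.0),
    ("emPCR run", 5.5),
    ("Breaking and enrichment", 3.0),
    ("GS Junior set-up", 1.5),
    ("GS Junior run", 10.0),
    ("Run processing", 2.0),
]

def total_hours(steps):
    """Sum the durations, assuming the steps are strictly sequential."""
    return sum(hours for _, hours in steps)

total = total_hours(WORKFLOW_HOURS)
print(f"Sample to processed run: ~{total} hours")  # ~32.0 hours
```

Under these assumptions, one run takes roughly 32 hours from RNA isolation to processed data, before any analysis begins.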
“There is a real disconnect between the ability to collect next-generation sequence data (easy) and the ability to analyze it meaningfully (hard).” (Dave O’Connor) • Problem: Data Management!
Problem: Data Management • As volume has increased, the lab has found it difficult to manage all of its sequencing data & metadata: • Run metadata • Run metrics • Sequencing reads and quality scores • Sample information and multiplex identifiers (MIDs) • Reference sequences for genotyping experiments • Genotyping matches • O’Connor asked LabKey to build a system that can: • Store sequencing and genotyping data in a single database that links all the tables, allowing arbitrary queries and reports • Provide tools for analysis, querying, visualization and export • Automate data workflows for efficiency & consistency • Eventually, link sequencing results to their primate EHR system
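The idea of a single database that links all the tables can be sketched with Python's built-in `sqlite3` module. Everything here is a hypothetical illustration: the table names, column names, and rows are assumptions made up for the example and are not the actual LabKey Server schema or any lab data.

```python
import sqlite3

# In-memory database with a hypothetical, simplified linked schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runs (run_id INTEGER PRIMARY KEY, instrument TEXT);
CREATE TABLE samples (
    sample_id INTEGER PRIMARY KEY,
    run_id INTEGER REFERENCES runs(run_id),
    animal TEXT,
    mid TEXT
);
CREATE TABLE reads (
    read_id INTEGER PRIMARY KEY,
    sample_id INTEGER REFERENCES samples(sample_id),
    sequence TEXT,
    mean_quality REAL
);
CREATE TABLE reference_alleles (allele_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE matches (
    read_id INTEGER REFERENCES reads(read_id),
    allele_id INTEGER REFERENCES reference_alleles(allele_id)
);
""")

# Made-up rows, for illustration only.
conn.execute("INSERT INTO runs VALUES (1, 'GS Junior')")
conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)",
                 [(1, 1, "animal_01", "MID1"), (2, 1, "animal_02", "MID2")])
conn.executemany("INSERT INTO reads VALUES (?, ?, ?, ?)",
                 [(1, 1, "ACGT", 35.0), (2, 1, "ACGA", 33.0), (3, 2, "TTGA", 36.0)])
conn.executemany("INSERT INTO reference_alleles VALUES (?, ?)",
                 [(1, "allele_A"), (2, "allele_B")])
conn.executemany("INSERT INTO matches VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])

# One query spanning samples, reads, matches, and reference alleles:
# how many reads from each animal matched each reference allele?
rows = conn.execute("""
    SELECT s.animal, a.name, COUNT(*) AS n_reads
    FROM matches m
    JOIN reads r ON r.read_id = m.read_id
    JOIN samples s ON s.sample_id = r.sample_id
    JOIN reference_alleles a ON a.allele_id = m.allele_id
    GROUP BY s.animal, a.name
    ORDER BY s.animal, a.name
""").fetchall()
```

Because every table carries a key into its neighbor, a question that spans run, sample, read, and genotype data becomes a single join rather than a manual cross-reference between files.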
LabKey Sequencing System (diagram) • Reads • Quality Scores • Metrics • Sample Information • Reference Sequences • Galaxy Genotyping Workflow • Sequencing and Genotyping Database • Reporting • Analysis • Visualization • Export • External Tools
Possible Future Directions • Respond to O’Connor lab’s near-term needs • Genomics-specific analytics • Additional export formats • Tighter integration with Galaxy • Support for amplicon-designated reads • Match combining • Simplify configuration and operation • Integrate with Wisconsin primate EHR • Better integration with R / Bioconductor • Visualization • Other sequencing platforms: Illumina, PacBio…
Acknowledgements • O’Connor Laboratory • David O’Connor • Simon Lank • Julie Karl • Benjamin Bimber • LabKey Software • Mark Igra • Brian Connolly • Elizabeth Nelson • Josh Eckels • Matthew Bellew • Et al