
Legume Information Network: A Component of the Virtual Plant Information Network

Gregory D. May. National Center for Genome Resources; University of Minnesota – Center for Computational Genomics and Bioinformatics; United States Department of Agriculture – Agricultural Research Service.


Presentation Transcript


  1. Legume Information Network: A Component of the Virtual Plant Information Network
     Gregory D. May
     National Center for Genome Resources; University of Minnesota – Center for Computational Genomics and Bioinformatics; United States Department of Agriculture – Agricultural Research Service
     Atlanta, October 2007

  2. Current State of Bioinformatics Resources
     • Hundreds of project web sites and databases
     • Project databases are distributed, autonomous, and ephemeral
     • Inconsistent user interfaces
     Source: Stein et al. (2006), Plant Biology Databases: A Needs Assessment by the NSF-USDA Working Group on Long-Lived Databases. (Slide image: TIGR Gene Indices.)

  3. The promises of 30+ high-throughput ‘omics’ technologies
     • Improved crops
       • nutrition, novel traits, resistance, yield, sustainability
     • Improved animal production
     • Improved human health
       • biomarker diagnostics
       • personalized medicines and therapies
     • Improved environment
       • bioremediation
       • carbon sequestration
       • energy independence

  4. The need
     • The legume biologist must still navigate multiple information resources to answer many research questions
     • “Develop a virtual, easy-to-navigate ‘one-stop’ legume information network. By ‘one-stop’ we refer by analogy to Google and how it can be seen as a single, yet non-exclusive, information resource.” Gepts et al., Report from the CATG meeting. Plant Physiology (2005) 137:1228.

  5. Virtual Plant Information Network
     • Establish an architecture based on semantic web technologies to support an interoperable (database) network
     • Standardize data formats and user interfaces to support machine-readable representation of genomes, genetic maps, polymorphisms, QTL, expression, proteins, metabolites, and phenotypes
     • Develop breeder’s toolboxes with visual interfaces similar to the one depicted in GEYSIR

  6. Goals
     • Design a solution for integrating disparate data sources
     • Develop a prototype, the Legume Information Network, demonstrating the capabilities of semantic web technologies
     • Have the legume community take a leadership role in data and tool integration using Semantic MOBY

  7. The Requirements
     Devise a way in which resources can be described, discovered, and invoked on the web using:
     • a common syntax, so machines can parse each other's data and services
     • a public semantics, so machines can make determinations on suitability-for-purpose
     • a discovery service, so machines can find data and services across the web based on the semantics of the resources being offered and the needs of the task at hand

  8. The Approach: Keep it simple
     (Diagram: Client, Discovery Server, Provider.)
     Clients, Providers, and even Discovery Servers all read and contribute to the same set of statements. All actors understand a single, mutable graph which embeds an explicit logic necessary and sufficient to describe, query, discover, invoke, and satisfy resources and requests.
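
The idea of one shared graph of statements can be illustrated with a minimal sketch. This is not the Semantic MOBY protocol itself, which the slides describe only at a high level; the `StatementGraph` class, the `accepts`/`returns`/`providedBy` predicates, and the `svc:`/`type:` identifiers are all hypothetical names chosen for illustration. Only the two service names and their providers (BlastN Legume BACs from LIS, Clustalw from CCGB) are taken from the services slide.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class StatementGraph:
    """A tiny in-memory set of statements shared by all actors (hypothetical)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Contribute one statement to the shared graph."""
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return all statements matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self._triples
            if all(want is None or want == got for want, got in zip(pattern, t))
        )


def discover(graph: StatementGraph, input_type: str, output_type: str) -> List[str]:
    """Find services whose declared input and output types fit the client's need."""
    accepts = {s for s, _, _ in graph.match(predicate="accepts", obj=input_type)}
    returns = {s for s, _, _ in graph.match(predicate="returns", obj=output_type)}
    return sorted(accepts & returns)


# Providers describe their services by adding statements (type names are assumed).
graph = StatementGraph()
graph.add("svc:blastn_legume_bacs", "accepts", "type:NucleotideSequence")
graph.add("svc:blastn_legume_bacs", "returns", "type:BlastReport")
graph.add("svc:blastn_legume_bacs", "providedBy", "LIS")
graph.add("svc:clustalw", "accepts", "type:SequenceSet")
graph.add("svc:clustalw", "returns", "type:Alignment")
graph.add("svc:clustalw", "providedBy", "CCGB")

# A client queries the same graph to discover a suitable service.
found = discover(graph, "type:NucleotideSequence", "type:BlastReport")
print(found)
```

The point of the sketch is that description, discovery, and querying all operate on the same statements, so no actor needs a private registry format. A real deployment would presumably express the statements in RDF/OWL rather than Python tuples.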

  9. Services
     Data Provider Services (service description: provider)
     • GO Annotated Transcript Sequences: LIS
     • Medicago IMGAG Annotations: CCGB
     • Precomputed BlastX against NCBI's NR: LIS
     • Blocks precomputed analysis retrieval: LIS
     • GenScan precomputed gene predictions: LIS
     • Sequence Text Retrieval: LIS
     • GO Annotations Retrieval: LIS
     • InterPro precomputed analysis retrieval: LIS
     Visualization Services (service description: provider)
     • Comparative Map and Trait Viewer: LIS
     • ISYS TableViewer: LIS
     • Alignment visualization using PFAAT: CCGB
     Analysis Services (service description: provider)
     • Clustalw Multiple Sequence Alignment: CCGB
     • BlastN LIS Transcript Contigs: LIS
     • Blast sequences against Kegg Genes: CCGB
     • Blast sequences against TIGR TOG Sequence: CCGB
     • BlastN Legume BACs: LIS
     • BlastN Lotus finished BACs: LIS

  10. LIN partners

  11. Resources
      • A running Discovery Server: www.semanticMoby.org
      • The project web site: vpin.ncgr.org
      • Discussion forum: vpin.ncgr.org/mvnforum/forum
      • Collection of ontologies: ontologies.ncgr.org
      • Protocol documentation: ontologies.ncgr.org/OWLDocs/moby
      • Publications and other docs: vpin.ncgr.org/links.shtml
      • Developers’ resources: www.semanticmoby.org/developer/index.jsp
      • Provider Developer Kit: vpin.ncgr.org/provider.shtml
      • Client Developer Kit: vpin.ncgr.org/client.shtml

  12. Generation of DNA Sequence Data
      Cost per 1000 bp:
      • 1990: ~$10.00
      • 2000: ~$3.00
      • 2005: ~$1.00
      • 2006: ~$0.10
      • 2007: ~$0.03
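
To make the scale of this cost drop concrete, the following sketch works through the arithmetic on the slide's approximate figures. The 540 Mb genome size is borrowed from the Medicago resequencing summary later in the talk; applying the per-1000-bp cost to a single 1x pass is an illustrative assumption and ignores everything except raw sequence generation. The function names are hypothetical.

```python
# Approximate cost per 1000 bp (USD), as listed on the slide.
COST_PER_KB_USD = {1990: 10.00, 2000: 3.00, 2005: 1.00, 2006: 0.10, 2007: 0.03}

# Assumption: genome size taken from the Medicago summary slide (540 Mb).
GENOME_SIZE_BP = 540_000_000


def fold_reduction(costs: dict, start_year: int, end_year: int) -> float:
    """How many times cheaper sequencing is in end_year than in start_year."""
    return costs[start_year] / costs[end_year]


def single_pass_cost(genome_bp: int, cost_per_kb: float) -> float:
    """Raw sequencing cost of reading every base once (1x), in USD."""
    return genome_bp / 1000 * cost_per_kb


for year, cost in sorted(COST_PER_KB_USD.items()):
    print(f"{year}: ${single_pass_cost(GENOME_SIZE_BP, cost):,.0f} per 1x pass")

fold = fold_reduction(COST_PER_KB_USD, 1990, 2007)
print(f"1990 to 2007: about {fold:.0f}-fold cheaper")
```

On the slide's numbers, the cost per base falls roughly 333-fold between 1990 and 2007, and a 1x pass over 540 Mb drops from millions of dollars to the order of $16,000.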

  13. Sequencing Platform Comparison

  14. Alpheus: Cyberinfrastructure for medical and agricultural resequencing
      • Nucleotide variant and splice isoform detection
      • Resequencing projects at the scale of hundreds of Gb
      • Short reads (454, Solexa, SOLiD) plus Sanger reads
      • Paired and unpaired reads
      • Alignments to genomic and transcriptomic references
      • Name from Greek mythology: the river Alpheus cleansed the Augean stables and restored life to the soil

  15. Pileup Visualization (figure labels: slidable window; overview of transcript; legend: coding domain, nsSNP, SNP, in/del; 454 reads)

  16. Dynamic Filtering

  17. CONFIDENTIAL

  18. CONFIDENTIAL

  19. CONFIDENTIAL

  20. CONFIDENTIAL

  21. Summary of Medicago ecotype F83005.5 Solexa resequencing
      • 1x coverage of a 540 Mb genome
      • One SNP per ~600 bp with no filtering
      • ~45,000 high-stringency SNPs
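
The relationship between these figures can be checked with a short calculation. It assumes that "one SNP ~600bp" means an average unfiltered density of one candidate SNP per 600 bp across the whole 540 Mb genome; the slide does not state this explicitly, and at 1x coverage not every base is actually sampled, so the result is only an order-of-magnitude illustration.

```python
# Figures from the slide; how they combine is an assumption of this sketch.
GENOME_SIZE_BP = 540_000_000      # 540 Mb genome
BP_PER_UNFILTERED_SNP = 600       # assumed: one unfiltered SNP per ~600 bp
HIGH_STRINGENCY_SNPS = 45_000     # ~45,000 SNPs after stringent filtering

# Implied number of unfiltered candidate SNPs genome-wide.
unfiltered_snps = GENOME_SIZE_BP // BP_PER_UNFILTERED_SNP

# Fraction of candidates that survive stringent filtering.
retained_fraction = HIGH_STRINGENCY_SNPS / unfiltered_snps

# Average spacing between high-stringency SNPs.
bp_per_high_stringency_snp = GENOME_SIZE_BP // HIGH_STRINGENCY_SNPS

print(f"Implied unfiltered candidates: {unfiltered_snps:,}")
print(f"Retained after filtering: {retained_fraction:.1%}")
print(f"High-stringency spacing: one per {bp_per_high_stringency_snp:,} bp")
```

Under these assumptions the unfiltered density implies on the order of 900,000 candidates, of which the ~45,000 high-stringency SNPs are about 5%, or roughly one per 12 kb.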

  22. CONFIDENTIAL

  23. Application of Next-Generation Sequencing Technologies for Variant Detection in Crop Plants and Pathogens
      • Whole transcriptome shotgun re-sequencing
        • Expressed portions (or gene space) of the genome across populations in the absence of a reference genome
      • Whole genome shotgun re-sequencing
        • Sequence across populations with available reference genomes
        • WGS skimming of transformation events
      • Target genome re-sequencing across populations
        • Area under the QTL
        • Pooled long-PCR products to walk between markers
        • Restriction enzyme-anchored

  24. GEYSIR (Genomic Explorer y Survey of Immune Response), geysir.ncgr.org
      Screenshot views: View Selected Studies (across all chromosomes); Chromosome Map; Gene Neighborhood; Gene & Nucleotide View
      Screenshot callouts:
      • Sample study 1, sample study 2
      • Click on chromosome 22
      • Marker on linkage map (cM); marker on physical map (Mb)
      • Clickable LOD scores move selection windows
      • Map region selection windows (grab and slide)
      • Zoom and pan buttons
      • Marker titles visible in this 1.5 Mb region
      • Candidate genes in blue; exons in green
      • CTRL-left mouse click takes you to the Gene detail page
      • Slide-able feature neighborhood window
      • Nucleotide slider window
      • SNP markers; clickable SNP bubbles take you to dbSNP

  25. Acknowledgements
      • USDA-ARS LIN: Randy Shoemaker, Michelle Graham
      • CCGB/U. Minn LIN: Ernest Retzel, Jim Johnson, Michael Heuer, John Crow
      • NCGR VPIN/LIN: Damian Gessler, Gary Schiltz, Bill Beavis, Andrew Farmer
      • S. Knapp, N. Young
      • NCGR LIS: Greg May, Kamal Gajendran, Andrew Farmer, Michael Gonzales, Selene Virk, Bill Beavis
      • USDA-ARS LIS: Randy Shoemaker, David Grant, Rich Wilson
      • NCGR GEYSIR: Susan Baxter, Faye Schilkey, Neil Miller, Dan Weems, Lar Mader
      • Funding
        • LIS/LIN: USDA-ARS SCA 3625-21000-038-01
        • GEYSIR: NIH-NIAID HHSN266200400064C
        • VPIN: NSF-BDI 0516487
      • LIS Steering Committee: Mark Burow, Doug Cook, Perry Cregan, Rebecca Dickstein, David Grant, Randy Shoemaker, Michael Udvardi, Nevin Young
