Creating An Allele Index For NPGS: Bioinformatic Issues
Edward Buckler, USDA-ARS at Cornell University, Ithaca, NY
AIM: Make more useful plants by conserving, finding and combining better alleles.
NEED: The National Plant Germplasm System (NPGS) conserves 464,000 accessions and may contain 100,000,000 distinct alleles, but there is no index.
Genetic mapping is the basis of the index, and QTL mapping approaches now exist for virtually all types of populations.
[Figure labels: Population structure, Familial relatedness]
• Near-gene-level resolution achieved in multiple species
• Identification of genes controlling flowering, starch, nutrients, and wood quality
• Positive results in:
  • Maize
  • Rice
  • Arabidopsis
  • Conifers
What needs to happen?
• Genotyping (0.5 million data points per accession)
• Phenotyping (500 data points per accession)
• Bioinformatics (GRIN)
• Mapping Tools
• Breeder Decision Tools
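For a rough sense of scale, the per-accession targets on this slide can be multiplied by the 464,000 accessions cited on the NEED slide. This is only a back-of-envelope sketch: it assumes "dp" means data points and that every accession is measured at the stated density, and the constant and function names are illustrative.

```python
# Back-of-envelope totals for the proposed allele index.
ACCESSIONS = 464_000                  # accessions conserved, as stated in the talk
GENOTYPE_DP_PER_ACCESSION = 500_000   # "0.5Mdp", read here as 0.5 million data points
PHENOTYPE_DP_PER_ACCESSION = 500      # "500dp", read here as 500 data points


def total_data_points(accessions: int, per_accession: int) -> int:
    """Total data points if every accession is measured at the given density."""
    return accessions * per_accession


genotype_total = total_data_points(ACCESSIONS, GENOTYPE_DP_PER_ACCESSION)
phenotype_total = total_data_points(ACCESSIONS, PHENOTYPE_DP_PER_ACCESSION)

print(f"Genotype data points:  {genotype_total:,}")
print(f"Phenotype data points: {phenotype_total:,}")
```

Under these assumptions the totals come to about 2.3 × 10^11 genotypic and 2.3 × 10^8 phenotypic data points, which is why the bioinformatics and tooling items appear on the same list as the data collection itself.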
What data is currently available outside NPGS?
• Several large NSF Plant Genome projects on diversity, with NPGS germplasm at the heart of these projects
• Numerous smaller projects (however, most data from these gets lost over time)
• Millions of genotypic and phenotypic data points in just the maize, wheat, and rice projects
• Database-aware analysis tools (e.g. TASSEL)
[Architecture diagram, by layer]
• Display: Panzea Web Data Access, Alignment & SNP Display, Upload Tools
• DBs: GDPDM (Gramene, Panzea (Maize), Rice Evol., GRIN?), Germinate, GRIN
• Middleware: GDPC
• Analysis: GDPC Data Browser, TASSEL, Other Analysis Tools
GDPDM (Genomic Diversity and Phenotype Data Model)
• Germplasm
• Genotype
• Phenotype
• Environment
• Used by maize, wheat, and rice diversity projects.
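The four GDPDM areas listed on this slide can be pictured as linked record types. The sketch below is not the GDPDM schema: every class, field name, and value is a hypothetical stand-in chosen only to show how germplasm, genotype, phenotype, and environment records might be joined to relate alleles to traits.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Germplasm:
    accession_id: str  # hypothetical identifier
    taxon: str


@dataclass(frozen=True)
class Environment:
    env_id: str        # hypothetical identifier
    location: str
    year: int


@dataclass(frozen=True)
class Genotype:
    accession_id: str
    marker: str
    allele: str


@dataclass(frozen=True)
class Phenotype:
    accession_id: str
    env_id: str
    trait: str
    value: float


def phenotypes_by_allele(genotypes: List[Genotype], phenotypes: List[Phenotype],
                         marker: str, trait: str) -> Dict[str, List[float]]:
    """Group trait values by the allele each accession carries at one marker."""
    allele_of = {g.accession_id: g.allele for g in genotypes if g.marker == marker}
    groups: Dict[str, List[float]] = {}
    for p in phenotypes:
        if p.trait == trait and p.accession_id in allele_of:
            groups.setdefault(allele_of[p.accession_id], []).append(p.value)
    return groups


# Toy records with made-up values, for illustration only.
germplasm = [Germplasm("ACC1", "Zea mays"), Germplasm("ACC2", "Zea mays"),
             Germplasm("ACC3", "Zea mays")]
field_site = Environment("ENV1", "hypothetical field site", 2005)
genotypes = [Genotype("ACC1", "m1", "A"), Genotype("ACC2", "m1", "G"),
             Genotype("ACC3", "m1", "A")]
phenotypes = [Phenotype("ACC1", "ENV1", "days_to_flowering", 70.0),
              Phenotype("ACC2", "ENV1", "days_to_flowering", 82.0),
              Phenotype("ACC3", "ENV1", "days_to_flowering", 74.0)]

groups = phenotypes_by_allele(genotypes, phenotypes, "m1", "days_to_flowering")
print(groups)
```

Keeping the accession identifier as the shared key is the design point here: it is what would let genotype and phenotype records collected by different projects be combined.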
[Architecture diagram, by layer (repeated)]
• Display: Panzea Web Data Access, Alignment & SNP Display, Upload Tools
• DBs: GDPDM (Gramene, Panzea (Maize), Rice Evol.), Germinate, GRIN
• Middleware: GDPC
• Analysis: GDPC Data Browser, TASSEL, Other Analysis Tools
Purpose
The purpose of GDPC (Genomic Diversity and Phenotype Connection) is to simplify access to the large genomic and phenotypic datasets that are becoming available in plant biology.
www.maizegenetics.net/gdpc
GDPC Data Flow Diagram www.maizegenetics.net/gdpc
Databases
• Which databases has GDPC been mapped to?
  • Panzea (GDPDM schema)
  • Gramene (GDPDM)
  • Germinate (generic schema)
  • GRIN (passport data)
www.maizegenetics.net/gdpc
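Mapping a middleware layer onto several databases usually means writing one small adapter per schema behind a shared interface. The following is a minimal sketch of that idea, not the GDPC API: the class names, the `taxa` method, and the row layouts are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class DiversitySource(ABC):
    """Hypothetical common interface a middleware layer could expose."""

    @abstractmethod
    def taxa(self) -> List[str]:
        """Return the distinct taxon names held by this source."""


class SchemaStyleSource(DiversitySource):
    """Adapter for rows that store the name under 'taxon_name' (illustrative)."""

    def __init__(self, rows: List[Dict[str, str]]) -> None:
        self._rows = rows

    def taxa(self) -> List[str]:
        return sorted({row["taxon_name"] for row in self._rows})


class PassportStyleSource(DiversitySource):
    """Adapter for passport-like records that use 'species' instead (illustrative)."""

    def __init__(self, records: List[Dict[str, str]]) -> None:
        self._records = records

    def taxa(self) -> List[str]:
        return sorted({rec["species"] for rec in self._records})


def all_taxa(sources: Iterable[DiversitySource]) -> List[str]:
    """Query every source through the shared interface and merge the results."""
    names = set()
    for source in sources:
        names.update(source.taxa())
    return sorted(names)


# Toy rows with made-up layouts, for illustration only.
sources = [
    SchemaStyleSource([{"taxon_name": "Zea mays"}, {"taxon_name": "Oryza sativa"}]),
    PassportStyleSource([{"species": "Zea mays"}]),
]
merged = all_taxa(sources)
print(merged)
```

An analysis tool written against the shared interface would not need to change when a further database is mapped; only a new adapter is added.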
GDPC Browser Demo
GDPC: Marker Data
Current GDPC Limitations
• XML is not efficient for large datasets
  • Several avenues are possible for improving efficiency
• More visualization and analysis tools need to be developed
  • Linkage Mapping
  • Breeder Decision Tools
  • Geographic interfaces
  • Pedigree Interfaces (in progress)
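The XML overhead mentioned above can be illustrated by encoding the same genotype calls two ways and comparing sizes. This is a sketch under stated assumptions: the element names and the synthetic calls are invented, and the tab-delimited format is only a convenient baseline for comparison, not one of the avenues the talk has in mind.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Call = Tuple[str, str, str]  # (accession, marker, allele), all hypothetical


def genotypes_to_xml(calls: List[Call]) -> bytes:
    """Encode each call as a <call> element with three child elements."""
    root = ET.Element("genotypes")
    for accession, marker, allele in calls:
        call = ET.SubElement(root, "call")
        ET.SubElement(call, "accession").text = accession
        ET.SubElement(call, "marker").text = marker
        ET.SubElement(call, "allele").text = allele
    return ET.tostring(root, encoding="utf-8")


def genotypes_to_tsv(calls: List[Call]) -> bytes:
    """Encode each call as one tab-delimited line."""
    return "\n".join("\t".join(call) for call in calls).encode("utf-8")


# Synthetic calls: 100 made-up accessions x 50 made-up markers.
calls = [(f"ACC{i:04d}", f"m{j:03d}", "ACGT"[(i + j) % 4])
         for i in range(100) for j in range(50)]

xml_bytes = genotypes_to_xml(calls)
tsv_bytes = genotypes_to_tsv(calls)

print(f"XML: {len(xml_bytes)} bytes")
print(f"TSV: {len(tsv_bytes)} bytes")
print(f"XML/TSV size ratio: {len(xml_bytes) / len(tsv_bytes):.1f}")
```

The repeated tag names account for most of the difference, so the ratio depends on how verbose the element names are relative to the values they wrap.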
What should GRIN consider?
• Becoming the lead repository for genotypic and phenotypic diversity data
• Leading efforts to consolidate community diversity data
• Implementing several middleware or web services standards (e.g. GDPC and perhaps others, such as IRRI's)
• Collaborating on the development of data visualization tools
All of the software can be accessed through:
www.maizegenetics.net
www.sourceforge.net
www.panzea.org
www.gramene.org