ASM Philadelphia, May 2009. How We Annotated Genomes for Free: Fast and Accurate Functional Analysis Using Subsystems Technology. Rob Edwards Depts of Computer Science And Biology, San Diego State University Mathematics and Computer Sciences Division, Argonne National Laboratory.
ASM Philadelphia, May 2009
How We Annotated Genomes for Free: Fast and Accurate Functional Analysis Using Subsystems Technology
Rob Edwards
Depts of Computer Science and Biology, San Diego State University
Mathematics and Computer Sciences Division, Argonne National Laboratory
http://rast.nmpdr.org/?page=Conference
Pigeons If it’s good enough for Google – it’s good enough for me
Annotation Servers
• Complete genomes: http://rast.nmpdr.org
• Metagenomes: http://metagenomics.theseed.org
How much has been sequenced?
[Figure: number of known sequences by year, with these milestones marked: first bacterial genome, 100 bacterial genomes, environmental sequencing, 1,000 bacterial genomes]
How much will be sequenced?
• Everybody in USA
• Everybody at an ASM meeting
• One genome from every species
• 100 people
• Most major microbial environments
• All cultured Bacteria
The SEED Family
Over 1,000 Subsystems
Three-level “hierarchy”
• Amino Acids and Derivatives
  • Alanine, serine, and glycine
    • Serine Biosynthesis
• Amino Acids and Derivatives
  • Lysine, threonine, methionine, and cysteine
    • Methionine Biosynthesis
Make your own subsystems!
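The three levels on this slide (category, subcategory, subsystem) can be pictured as a small nested mapping. The sketch below is illustrative only: the nested-dict layout and the helper names `subsystems_in` and `path_to` are assumptions made for this example, not the SEED data model, and it contains only the two example paths shown on the slide.

```python
# Toy model of the three-level subsystem "hierarchy" from the slide.
# The layout is an illustrative assumption, not the actual SEED schema.
HIERARCHY = {
    "Amino Acids and Derivatives": {
        "Alanine, serine, and glycine": ["Serine Biosynthesis"],
        "Lysine, threonine, methionine, and cysteine": ["Methionine Biosynthesis"],
    },
}


def subsystems_in(category, hierarchy=HIERARCHY):
    """Return every subsystem filed under a top-level category."""
    return [
        subsystem
        for subsystems in hierarchy.get(category, {}).values()
        for subsystem in subsystems
    ]


def path_to(subsystem, hierarchy=HIERARCHY):
    """Return (category, subcategory, subsystem), or None if it is not filed."""
    for category, subcategories in hierarchy.items():
        for subcategory, subsystems in subcategories.items():
            if subsystem in subsystems:
                return (category, subcategory, subsystem)
    return None
```

For example, `path_to("Serine Biosynthesis")` walks the mapping and returns the full three-level path for that subsystem.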
The Annotation Process
• Find the phylogenetic neighborhood of your genome
• Look for proteins that related organisms have
• Core proteins
• Subset of all subsystems
• Use those calls as a training set for CRITICA/GLIMMER
• Intrinsic training set!
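The idea of an intrinsic training set can be sketched in a few lines: collect the protein families that the neighboring genomes share, then keep the candidate genes in the new genome that match one of them. This is a toy sketch under stated assumptions, not the RAST pipeline: the function names, the `min_fraction` parameter, and the genome, family, and ORF identifiers are all hypothetical.

```python
def core_families(neighbor_genomes, min_fraction=1.0):
    """Families found in at least min_fraction of the neighbor genomes.

    neighbor_genomes maps a genome name to the protein families it contains.
    """
    counts = {}
    for families in neighbor_genomes.values():
        for family in set(families):
            counts[family] = counts.get(family, 0) + 1
    needed = min_fraction * len(neighbor_genomes)
    return {family for family, n in counts.items() if n >= needed}


def training_set(candidate_orfs, core):
    """Keep candidate ORFs whose matched family is one of the core families.

    candidate_orfs maps an ORF id to the family it matched (hypothetical input).
    """
    return sorted(orf for orf, family in candidate_orfs.items() if family in core)


# Hypothetical toy data: three close relatives and four candidate ORFs.
neighbors = {
    "relative_1": ["famA", "famB", "famC"],
    "relative_2": ["famA", "famB"],
    "relative_3": ["famA", "famB", "famD"],
}
candidates = {"orf1": "famA", "orf2": "famD", "orf3": "famB", "orf4": "famZ"}

core = core_families(neighbors)
print(training_set(candidates, core))
```

The selected ORFs would then be handed to a gene caller as training data. How RAST actually scores matches and picks neighbors is not described on the slide, so that part is left out here.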
Automatic Metabolic Reconstruction
• Subsystem, GO, and KEGG connections
• KEGG EC numbers
• KEGG reaction numbers
• SEED reaction numbers (Chris Henry)
• Metabolic flux models
• Automatically generate FBA matrices (Aaron Best/Matt DeJongh; Hope College)
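An FBA matrix is a stoichiometric matrix S with one row per metabolite and one column per reaction, and a flux vector v is at steady state when S·v = 0. The sketch below builds S from a reaction list and checks that condition. It is a minimal illustration, not the SEED model builder: the reaction and metabolite names are made up, and a real model would use the KEGG or SEED reaction numbers mentioned on the slide.

```python
def stoichiometric_matrix(reactions):
    """Build S with one row per metabolite and one column per reaction.

    reactions maps a reaction id to {metabolite: coefficient}, where
    negative coefficients are consumed and positive ones are produced.
    """
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    reaction_ids = list(reactions)
    matrix = [[reactions[r].get(m, 0) for r in reaction_ids] for m in metabolites]
    return metabolites, reaction_ids, matrix


def net_production(matrix, fluxes):
    """Compute S . v; all zeros means the flux vector is at steady state."""
    return [sum(c * v for c, v in zip(row, fluxes)) for row in matrix]


# Hypothetical toy network: A is taken up, converted to B, then C, then exported.
TOY_REACTIONS = {
    "uptake_A": {"A": 1},
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},
    "export_C": {"C": -1},
}

metabolites, reaction_ids, S = stoichiometric_matrix(TOY_REACTIONS)
print(metabolites, reaction_ids)
print(net_production(S, [1, 1, 1, 1]))
```

An FBA solver would then search for a flux vector that satisfies S·v = 0 within flux bounds while maximizing an objective such as growth. That optimization step is outside this sketch.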
The Populated Subsystem
Find And Suggest Candidate Functions
• Rapidly correct missing annotations
• Add more members to subsystems
• Improves future genome annotations! (especially with new subsystems)
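One way to picture suggesting candidate functions: list the roles a subsystem expects, find the ones the genome has no gene for, and propose genes whose best similarity hit in another genome carries a missing role. The sketch below is a simplified illustration under those assumptions. The function names, role labels, and gene ids are hypothetical, and the slide does not say how the SEED tool ranks candidates.

```python
def missing_roles(subsystem_roles, genome_annotations):
    """Roles the subsystem expects but no gene in the genome is annotated with.

    genome_annotations maps a gene id to its currently assigned role.
    """
    annotated = set(genome_annotations.values())
    return [role for role in subsystem_roles if role not in annotated]


def suggest_candidates(missing, similarity_hits):
    """Group genes by the missing role carried by their best hit.

    similarity_hits maps a gene id to the role of its best hit in
    another genome (hypothetical input).
    """
    suggestions = {}
    for gene, role in similarity_hits.items():
        if role in missing:
            suggestions.setdefault(role, []).append(gene)
    return suggestions


# Hypothetical toy data: a three-role subsystem and a partly annotated genome.
roles = ["role_A", "role_B", "role_C"]
annotations = {"gene1": "role_A", "gene2": "hypothetical protein"}
hits = {"gene2": "role_C", "gene3": "role_A", "gene4": "role_C"}

gaps = missing_roles(roles, annotations)
print(gaps)
print(suggest_candidates(gaps, hits))
```

Here `role_B` stays unfilled because no gene hits it. That kind of gap is what a curator would then investigate by hand.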
The Real Live Test
• 10 genomes submitted on Thursday at 6 pm
• First annotation complete before 8 am Friday
• Remaining annotations completed Friday before noon
• (there were others in the pipeline too!)
Subsystems Coverage
Prophages
PHANTOME: Mya Breitbart, Matt Sullivan, Jeff Elhai, Rob Edwards
NSF
[Figure: Haloferax sulfurifontis prophage]
Metagenome Comparisons
• Metagenomics RAST has 300 public metagenomes
• Compared using tblastx
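tblastx compares nucleotide sequences at the protein level by translating both the query and the database in all six reading frames before aligning them. The sketch below shows only that translation step, using the standard genetic code. It is an illustration of the concept, not a replacement for BLAST, and the function names are chosen for this example.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; codons with ambiguous bases become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate three frames on each strand, keyed "+1".."+3" and "-1".."-3"."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


print(six_frame_translations("ATGGCC"))
```

Comparing at the protein level is useful for metagenomes because short reads from divergent organisms often keep detectable amino acid similarity after their nucleotide similarity has faded.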
High Salinity Salterns, San Diego, July 2004
Thanks: Beltran Rodriguez-Mueller, Mya Breitbart, & Forest Rohwer
[Figure: low salinity salterns and high salinity salterns, July 2004 and Nov 2005]
Free workshops on NMPDR, RAST, mg-RAST, SEED
Contact Leslie McNeil (lkmcneil@ncsa.uiuc.edu) or visit http://www.nmpdr.org/
Acknowledgements
• FIG: Ross Overbeek, Veronika Vonstein, Annotators
• Annotation Servers: Rick Stevens, Ross Overbeek, Folker Meyer, Bob Olson, Daniel Paarman, Mark D'Souza, Jared Wilkening, Andreas Wilke
• Environmental Genomics: Forest Rohwer, Beltran Rodriguez-Mueller
• Artist: Paula Morris