Marie-Adèle Rajandream
The Pathogen Sequencing Unit, The Sanger Institute
The Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
The Sanger Institute
• Principally funded by the Wellcome Trust (about 96%)
• 60,000,000 bases of raw data per day
• 600 employees
• Sequencing of human, mouse, zebrafish and pathogen genomes
• Manual and automatic genome annotation (Ensembl, Artemis)
• Identification of cancer-causing mutations (most recently in the BRAF gene)
• Sequence variation and disease association
The Pathogen Sequencing Unit
Sequencing
• Small genomes (bacterial and model organisms)
• 60-70 projects
• Current capacity of 4 million reads per year, sufficient for 100 Mb of finished sequence
• Mainly whole-genome or whole-chromosome shotguns, including finishing
• Many are international collaborations
• Larger, more complex genomes (35-100 Mb) on the horizon
Informatics
• Automatic analysis
• Manual annotation by expert biologists
• Tools: finishing (Cyclops), annotation (Artemis), comparative analysis (ACT)
• Data dissemination
• Database resources
Functional Genomics
• S. pombe
• Bacterial genomes
• D. discoideum
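The capacity figure on this slide (4 million reads per year for 100 Mb of finished sequence) can be turned into a per-kilobase read budget. The sketch below does that arithmetic. The function names are illustrative, and the mean read length of 500 bases is a hypothetical placeholder, since the slide does not state one.

```python
def reads_per_kb(reads_per_year: int, finished_bases_per_year: int) -> float:
    """Reads spent per kilobase of finished sequence."""
    return reads_per_year * 1000 / finished_bases_per_year


def implied_raw_coverage(reads_per_year: int,
                         finished_bases_per_year: int,
                         mean_read_length: int) -> float:
    """Raw bases read per finished base, for an ASSUMED mean read length."""
    return reads_per_year * mean_read_length / finished_bases_per_year


READS_PER_YEAR = 4_000_000        # from the slide: 4 million reads per year
FINISHED_BASES = 100_000_000      # from the slide: 100 Mb of finished sequence
ASSUMED_READ_LENGTH = 500         # hypothetical value, not stated on the slide

print(reads_per_kb(READS_PER_YEAR, FINISHED_BASES))
print(implied_raw_coverage(READS_PER_YEAR, FINISHED_BASES, ASSUMED_READ_LENGTH))
```

The first figure follows directly from the slide. The second depends entirely on the assumed read length and will also include reads used for finishing, so it should be read as a rough upper bound on shotgun depth, not a measured coverage.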
[Diagram: GeneDB (http://www.genedb.org) and the elements shown around it: curation, project pages, analysis, sequences, annotation, BLAST, FTP site]
What is GeneDB?
• A generic organism database
• Annotated sequences as well as functional data
• Visualisation in a user-friendly environment
• Annotation and analysis of data by biologists
• Flexible enough to incorporate new data types
• Linked to external databases
• Fully curated
The GeneDB project
• Started in 2001
• Funded by the Wellcome Trust for a period of 5 years
• Initially for 3 organisms: S. pombe, Leishmania and Trypanosoma
• 2 full-time programmers, 1 part-time programmer
• One curator for each organism
• One helpdesk person / programmer
• Prototype now complete and in use
Technical Outline: Prototype
[Diagram: directory layout of the prototype, with labels in the order extracted]
• Web: jsp, cgi, blast, ominblast, asp, common, cerevisiae, pombe, malaria, leish, tryp
• Data: asp, images, serialise, indices, cerevisiae, images, serialise, indices, pombe, malaria, tryp, leish
• "Java": biojava, data, gui, minelet, mining, test, utils, web
• EMBL
Broad specifications for production version
• Relational database
• Curator / annotator interface incorporating functionality of Artemis (MESS)
• Facility for doing more complex queries
For comprehensive, detailed specs see our Functional Specifications document.
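To illustrate what a relational store with "more complex queries" could offer, here is a minimal sketch using an in-memory SQLite database. The schema, table names, gene names (gene_A, gene_B, gene_C) and rows are all hypothetical placeholders invented for the example. They are not the GeneDB schema, which is described in the Functional Specifications document.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (
            gene_id  INTEGER PRIMARY KEY,
            organism TEXT NOT NULL,
            name     TEXT NOT NULL
        );
        CREATE TABLE annotation (
            gene_id  INTEGER NOT NULL REFERENCES gene(gene_id),
            product  TEXT NOT NULL,
            evidence TEXT NOT NULL
        );
    """)
    # Placeholder rows only; these are not real GeneDB entries.
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", [
        (1, "pombe", "gene_A"),
        (2, "pombe", "gene_B"),
        (3, "leish", "gene_C"),
    ])
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)", [
        (1, "biotin carboxylase", "ISS"),
        (2, "hypothetical protein", "none"),
        (3, "biotin carboxylase", "ISS"),
    ])
    return conn


def genes_by_product(conn: sqlite3.Connection, product: str, evidence: str):
    """Cross-organism query: genes with a given product and evidence type."""
    cursor = conn.execute(
        """
        SELECT g.organism, g.name
        FROM gene AS g
        JOIN annotation AS a ON a.gene_id = g.gene_id
        WHERE a.product = ? AND a.evidence = ?
        ORDER BY g.organism, g.name
        """,
        (product, evidence),
    )
    return cursor.fetchall()


conn = build_demo_db()
print(genes_by_product(conn, "biotin carboxylase", "ISS"))
```

A join like this spans organisms in a single statement, which is the kind of question that is awkward to answer from per-organism flat files and serialised indices.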
Example annotation: "biotin carboxylase", Inferred by Sequence Similarity with a yeast sequence, SGD:S0005299 (which was itself originally annotated on the basis of a published mutant phenotype)
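An annotation like the one above carries three pieces of information: the product name, the kind of evidence, and the database entry the evidence points to. A minimal sketch of such a record follows. The class and field names are hypothetical, and the short code "ISS" is an assumption borrowed from Gene Ontology evidence codes; the slide spells the evidence type out in full and does not show how GeneDB stores it.

```python
from dataclasses import dataclass

# Maps a short evidence code to its meaning; only this one appears on the slide.
EVIDENCE_CODES = {"ISS": "Inferred by Sequence Similarity"}


@dataclass(frozen=True)
class Annotation:
    product: str        # e.g. "biotin carboxylase"
    evidence_code: str  # key into EVIDENCE_CODES
    with_ref: str       # supporting entry as "DATABASE:accession"

    def describe(self) -> str:
        """Render the annotation as a human-readable sentence."""
        database, accession = self.with_ref.split(":", 1)
        meaning = EVIDENCE_CODES[self.evidence_code]
        return f"{self.product}: {meaning} with {database} entry {accession}"


example = Annotation("biotin carboxylase", "ISS", "SGD:S0005299")
print(example.describe())
```

Keeping the supporting reference alongside the evidence type lets a curator trace an inferred annotation back to the entry it was transferred from, here a yeast sequence in SGD.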
Wellcome Trust Sanger Institute Pathogen Sequencing Unit
Project Management Bart Barrell Julian Parkhill Marie-Adèle Rajandream Al Ivens Neil Hall Sequencing Carol Churcher Karen Brooks Inna Cherevach Tracey Chillingworth Kay Clarke Paul Davies Nancy Hamlin Kay Jagels Sharon Moule Brian White Sally Whitehead Programming Rob Davies David Harper Arnaud Kerhornou Paul Mooney Kim Rutherford Adrian Tivey Ed Zuiderwijk Karen Mungall Theresa Feltwell Ian Goodhead Zahra Hance Heidi Hauser Mandy Sanders Mark Simmonds Danielle Walker Analysis Martin Aslett Steven Bentley Matthew Berriman Ana Cerdeno Christiane Hertz-Fowler Matthew Holden Keith James Rachel Lyne Arnab Pain Chris Peacock Mohammed Sebaihia Nick Thomson Valerie Wood Subcloning Ann Cronin Audrey Fraser David Johnson Mike Quail Claire Price Ester Rabbinowitsch Sarah Sharp Barbara Harris Becky Atkin Andrew Barron Carol Chillingworth Louise Clarke Craig Corton Jonathan Doggett Nicola Lennard Alexandra Line Doug Ormand David Harris Matthew Collins Nigel Fosker Arlette Goble Lee Murphy Susan O’Neil Simon Rutter David Saunders Kathy Seeger Robert Squares Steven Squares Mapping Maria Fookes John Woodward Administration Yvonne Shaw