What should a genome database do?
• Interact with other databases
• Information: genomic, proteomic, literature
• Genome browser
• Download results in multiple formats
• Search, browse, collect
• Generic: usable by everyone
GeneDB – An Overview
Aim – To provide a database to house the data from the many sequencing projects that the Sanger Institute has been involved in. The database had to be:
• Generic – flexible enough to handle sequence from diverse organisms
• Curatable – capable of being manually edited by annotators and curators
• Intuitive and user-friendly
• Expandable – capable of housing new data types
• Searchable – allowing users complete flexibility in searching, selecting and downloading whatever information they want
• Interactive – open to community feedback
GeneDB November 2004 – Datasets (www.genedb.org)
Total number of organisms – 26
Number of protozoa – 12
Kinetoplastids:
• Leishmania braziliensis – ~33600; 361k reads; 5 X; Yes
• Trypanosoma congolense – ~30000; 262k reads; ~5 X; Yes
• Trypanosoma b. gambiense – ~30000; 188k reads; ~5 X; Yes
• Basic information – on the selected gene
• Location – the chromosome number, coordinates, gene length and a graphical map
• Curated and/or automatic annotation
• Predicted peptide properties – statistics on the predicted protein, known or predicted domains and motifs
• Gene Ontology – annotation using the GO controlled vocabulary
• Database cross-references – links to other public databases
• Curated orthologs – database links to manually selected orthologous genes
• Similarity information – with the respective database links
• Swiss-Prot annotations – for this protein, and keywords
• Contact – feedback forms for curators and technical queries
Orthologs and Paralogs in GeneDB
• Tri-tryp orthologs – predicted by clustering and reciprocal BLAST
• Paralogs or families – predicted using BLASTP and TribeMCL, at 4 BLAST e-value cutoffs
TribeMCL: Enright A.J., Van Dongen S., Ouzounis C.A.; Nucleic Acids Res. 30(7):1575-1584 (2002)
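Reciprocal BLAST is often applied as a reciprocal-best-hit (RBH) test: gene a from genome A and gene b from genome B are called putative orthologs when each is the other's best-scoring hit. The sketch below assumes the BLAST results have already been parsed into (query, subject, e-value, bit score) tuples; the gene IDs and the 1e-5 default cutoff are hypothetical, and this illustrates the general RBH idea only, not GeneDB's pipeline (which also used clustering).

```python
from typing import Dict, Iterable, Set, Tuple

# One parsed BLAST hit: (query_id, subject_id, e_value, bit_score).
Hit = Tuple[str, str, float, float]


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Dict[str, str]:
    """Map each query to its highest-scoring subject that passes the e-value cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # discard hits weaker than the cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit],
                         max_evalue: float = 1e-5) -> Set[Tuple[str, str]]:
    """Return (gene_in_A, gene_in_B) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}
```

For example, if A1's best hit is B1 and B1's best hit is A1, the pair (A1, B1) is reported; if A2's best hit is B2 but B2's best hit is A1, no pair is reported for A2.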
How to access data:
• Keyword searching
• Sequence searching/motif search
• Browsable catalogues (product, domain)
• Browsable contig/chromosome maps
• GO (Gene Ontology) – AmiGO
• Complex querying across species
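A motif search can be approximated by scanning each protein sequence with a regular expression. This sketch assumes motifs are supplied as Python regular expressions (GeneDB's own motif syntax is not described here); for instance, the pattern N[^P][ST][^P] encodes the N-glycosylation sequon N-{P}-[ST]-{P}. The protein IDs and sequences are made up for illustration.

```python
import re
from typing import Dict, List, Tuple


def motif_search(proteins: Dict[str, str], pattern: str) -> List[Tuple[str, int, str]]:
    """Return (protein_id, 1-based start, matched text) for every motif hit.

    The pattern is wrapped in a capturing lookahead so that overlapping
    matches are reported, not just non-overlapping ones.
    """
    finder = re.compile(f"(?=({pattern}))")
    results = []
    for protein_id, sequence in proteins.items():
        for match in finder.finditer(sequence.upper()):
            results.append((protein_id, match.start() + 1, match.group(1)))
    return results
```

Usage: `motif_search({"p1": "MANKSAGNVTL"}, "N[^P][ST][^P]")` reports the two sequons NKSA (position 3) and NVTL (position 8).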
Searching GeneDB
• Sequence search analysis
• Browse catalogues
• Simple query
OMNIBLAST
• Searches multiple datasets over multiple organisms
• Uses more than one BLAST algorithm if appropriate
• Produces an intermediate results page listing a summary of the top 5 hits of all searches
• If a protein sequence is used, also displays the predicted Pfam protein families found
• Full BLAST search results can be accessed from the intermediate page
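The intermediate summary page can be modelled as grouping hits by (dataset, BLAST program) and keeping the five best per group, ranked by e-value. The record layout and field names below are hypothetical; this sketches only the summarising step, not how OMNIBLAST itself runs the searches.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class BlastHit:
    dataset: str   # searched dataset, e.g. one organism's proteins (hypothetical label)
    program: str   # BLAST algorithm used, e.g. "blastp" or "tblastn"
    subject: str   # ID of the matching sequence
    evalue: float


def summarise(hits: Iterable[BlastHit], top_n: int = 5) -> Dict[Tuple[str, str], List[BlastHit]]:
    """Group hits per (dataset, program) and keep the top_n lowest e-values in each."""
    groups: Dict[Tuple[str, str], List[BlastHit]] = defaultdict(list)
    for hit in hits:
        groups[(hit.dataset, hit.program)].append(hit)
    return {key: sorted(group, key=lambda h: h.evalue)[:top_n]
            for key, group in groups.items()}
```

Keeping the full hit lists elsewhere and building only this trimmed view mirrors the slide's design: a quick overview first, with the complete BLAST output one click away.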
Cross-species search for nucleoside transporter
• By name or ID
• By product
• By protein domain
Proteomics Tool
• Select the dataset
• Select the digestion enzyme (protease)
• Enter peptide mass data
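A peptide-mass tool of this kind typically digests each predicted protein in silico and compares the theoretical peptide masses with the measured ones. The sketch below assumes trypsin as the enzyme (cleave after K or R, not before P, no missed cleavages), monoisotopic residue masses, neutral peptide masses with no charge adjustment, and a hypothetical score that simply counts observed masses matched within a tolerance; GeneDB's actual matching and scoring are not described in the source.

```python
import re
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) for the 20 standard amino acids only.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def trypsin_digest(protein: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein.upper()) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass; raises KeyError on non-standard residues."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(protein: str, observed: List[float], tol: float = 0.5) -> int:
    """Number of observed masses lying within tol Da of some theoretical peptide."""
    theoretical = [peptide_mass(p) for p in trypsin_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def rank_proteins(proteins: Dict[str, str], observed: List[float]) -> List[Tuple[str, int]]:
    """Rank proteins by how many observed peptide masses they explain."""
    scores = [(pid, count_matches(seq, observed)) for pid, seq in proteins.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

For example, `trypsin_digest("MKWVTFISLLR")` yields the peptides MK and WVTFISLLR, and an observed mass near 277.15 Da matches MK.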
Data downloads
• Any search result that gives a list
• History of any Boolean queries
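A query history that supports Boolean combinations can be sketched as a list of stored result sets combined with set operations: AND is intersection, OR is union and NOT is difference. The class, method names and gene IDs below are hypothetical illustrations, not GeneDB's interface.

```python
from typing import List, Set


class QueryHistory:
    """Stores each query's result set so earlier results can be recombined."""

    def __init__(self) -> None:
        self.results: List[Set[str]] = []

    def add(self, gene_ids: Set[str]) -> int:
        """Record a result set and return its 1-based history number."""
        self.results.append(set(gene_ids))
        return len(self.results)

    def combine(self, a: int, op: str, b: int) -> int:
        """Combine history entries a and b with AND, OR or NOT; store and number the result."""
        left, right = self.results[a - 1], self.results[b - 1]
        if op == "AND":
            combined = left & right
        elif op == "OR":
            combined = left | right
        elif op == "NOT":
            combined = left - right
        else:
            raise ValueError(f"unknown operator: {op}")
        return self.add(combined)
```

Because each combination is itself stored in the history, any intermediate result can later be downloaded as a list or combined again.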
• Generate a download list by adding genes to the gene basket
• Contiguous sequence
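Once genes are collected in a basket, the download step amounts to writing their sequences in a standard format such as FASTA. The basket contents and sequence store below are hypothetical, and the 60-residue default line width is a common FASTA convention rather than a documented GeneDB setting.

```python
from typing import Dict, Iterable


def basket_to_fasta(basket: Iterable[str], sequences: Dict[str, str], width: int = 60) -> str:
    """Render the sequences of all basket genes as FASTA text, skipping unknown IDs."""
    lines = []
    for gene_id in basket:
        seq = sequences.get(gene_id)
        if seq is None:
            continue  # gene has no stored sequence; nothing to export
        lines.append(f">{gene_id}")
        # Wrap the sequence at a fixed line width.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return ("\n".join(lines) + "\n") if lines else ""
```

Genes are written in basket order, so the download matches the order in which the user collected them.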
Leishmania major stats
Trypanosoma brucei stats
More information
• http://www.genedb.org/
• GeneDB reference guide
• Feedback forms for technical and biological queries
• Papers:
  • Trends in Parasitology 18(10):465-467 (2002)
  • Nucleic Acids Research, January 2004 issue