GENBANK, SWISSPROT AND OTHERS As Problem Sources for CSE 549 Andriy Tovkach Genetics
GENBANK OVERVIEW • Part of a collaboration among EMBL, NCBI and DDBJ, which exchange sequence data • Started 10 years ago • Exponential growth (graph) • On Saturday, the 7th – 20.2 billion bases
FILE FORMAT • Header • Features • Sequence (see files)
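The three parts listed above can be pulled apart with a few lines of Python. This is a minimal sketch, not a full parser: it assumes the usual flat-file layout in which the feature table starts at a line beginning with FEATURES, the sequence starts at a line beginning with ORIGIN, and the record ends with //. The function name split_genbank_record and the toy record are made up for illustration.

```python
def split_genbank_record(text):
    """Split one GenBank flat-file record into header, features and sequence."""
    header, features, seq_chunks = [], [], []
    section = "header"
    for line in text.splitlines():
        if line.startswith("//"):          # record terminator
            break
        if line.startswith("FEATURES"):    # start of the feature table
            section = "features"
            continue
        if line.startswith("ORIGIN"):      # start of the sequence block
            section = "sequence"
            continue
        if section == "header":
            header.append(line)
        elif section == "features":
            features.append(line)
        else:
            # Sequence lines carry a position number and spaces; keep letters only.
            seq_chunks.append("".join(ch for ch in line if ch.isalpha()))
    return {"header": header, "features": features, "sequence": "".join(seq_chunks)}


# Made-up toy record for illustration only (not a real GenBank entry).
TOY_RECORD = """\
LOCUS       TOY0001                   20 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Toy sequence for illustration.
ACCESSION   TOY0001
FEATURES             Location/Qualifiers
     source          1..20
                     /organism="synthetic construct"
ORIGIN
        1 gatcctccat atacaacggt
//
"""

record = split_genbank_record(TOY_RECORD)
print(len(record["header"]), len(record["features"]), record["sequence"])
```

Real records can contain extra lines between the feature table and the sequence, so production work is better served by an established parser.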
FASTA FORMAT • Single-line description begins with > • Followed by sequence data • Can be either protein or DNA
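Because the format is this simple, a reader for it fits in a short function. The sketch below assumes plain text input with one or more records; the name parse_fasta and the two toy records are illustrative, not taken from any database.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA text."""
    records = []
    description, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                       # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:], []
        elif description is not None:
            chunks.append(line)            # sequence data may span several lines
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# Toy input: one DNA record and one protein record (made up for illustration).
EXAMPLE = """\
>toy_dna example DNA record
ATGGCGTACG
TTAGC
>toy_protein example protein record
MKTAYIAKQR
"""

records = parse_fasta(EXAMPLE)
for description, sequence in records:
    print(description, len(sequence))
```

The parser does not check whether a record is protein or DNA; the format itself does not say, so that has to be inferred from the letters or from the description.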
ENTREZ as a RETRIEVAL SYSTEM • PubMed – 12 million citations from life science journals • Nucleotide – collection of DNA sequences • Protein – protein sequences from SwissProt and other sources • Genome – genomes of over 800 organisms • Also Structure, PopSet, Taxonomy, OMIM
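Entrez databases can also be queried from a program rather than through the website. The sketch below only builds a request URL for NCBI's E-utilities efetch service, which is not mentioned in the slides and is assumed here as the programmatic route; it makes no network call. The helper name efetch_url and the identifier EXAMPLE_ID are placeholders, and the database name passed in is an assumption to check against the current NCBI documentation.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (assumed programmatic interface to Entrez).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db, record_id, rettype="fasta", retmode="text"):
    """Build an efetch URL for one record in the given Entrez database."""
    params = {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# EXAMPLE_ID is a placeholder, not a real accession number.
url = efetch_url("nucleotide", "EXAMPLE_ID")
print(url)
```

Fetching the URL with a real identifier should return the record as FASTA text, which can then be read with any FASTA parser.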
PROTEIN DATABASES • SWISS-PROT • EBI – TrEMBL • NCBI – GenPept (now historical)
GENOME DATABASES • SGD: • homepage • example 1.1 • example 1.2 • Wormbase • Ensembl Human Genome Browser
CONCLUSIONS • Sequencing projects produce a lot of data • At a minimum, these data must be organized in databases • Ideally, all sequences need high-quality human annotation • That’s why computer scientists are welcome in biology
LITERATURE • GenBank presentation by Manpreet Katari (CSE 549, Fall 2000) • Thomas Lengauer (Ed.) Bioinformatics – From Genomes to Drugs • Entrez website • Google