Bioinformatics Ayesha M. Khan 22 Feb, 2012 Lec-4
Flowchart of sequence data from labs and literature to primary sequence database and subsequent secondary databases
• Sources: Sequencing centers, Literature, Researchers
• Primary Sequence Database
  • Nucleic Acid, e.g. GenBank, EMBL, DDBJ
  • Amino Acid, e.g. SwissProt and PIR
• Secondary Sequence Database: Protein Domains & Families, Metabolic Pathways, e.g. RefSeq and Conserved Domain Database (CDD) within NCBI
Always remember that:
• The data within primary databases are only as reliable as the data submitted.
• This reliability depends primarily on the methods used to produce the data.
• Regardless of who obtains the sequence data, nucleic acid and amino acid sequencing results are subject to errors.
Protein Sequence databases
• The protein sequence database was developed at the National Biomedical Research Foundation (NBRF) in the early 1960s by Margaret Dayhoff, to investigate evolutionary relationships among proteins.
• From 1988 onwards, it has been maintained collectively by the Protein Information Resource (PIR) at NBRF, the International Protein Information Database of Japan (JIPID), and the Martinsried Institute for Protein Sequences (MIPS).
Protein Sequence databases
SWISS-PROT
• Started in 1986 by the University of Geneva and EMBL.
• It is now maintained by the Swiss Institute of Bioinformatics (SIB) and EBI/EMBL.
TrEMBL
• Started in 1996; it follows the SWISS-PROT format and contains translations of coding sequences in EMBL.
• It also contains entries such as synthetic sequences, short amino acid fragments, and coding sequences that do not encode real proteins.
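To make "translations of coding sequences" concrete, here is a minimal sketch of how a nucleotide coding sequence can be translated into a protein sequence. It is not TrEMBL's actual pipeline: the function name `translate_cds` is hypothetical, the standard genetic code is assumed, and real entries need additional handling (alternative genetic codes, partial codons, annotation).

```python
from itertools import product

# Standard genetic code, listed in TCAG order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate a DNA coding sequence, stopping at the first stop codon.

    Hypothetical helper for illustration only. Codons containing
    characters other than T, C, A, G are translated as 'X'; any
    trailing bases that do not form a full codon are ignored.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCAAATAA"))  # ATG GCC AAA TAA -> MAK
```

The compact table construction relies on the amino acid string being ordered to match the TCAG enumeration of codons, so the two must be kept in step if either is edited.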
Composite protein sequence databases
• A composite database merges a variety of different primary sources.
• Composite databases obviate the need to interrogate multiple resources.
• Redundancy can be removed at different levels: a composite database can eliminate only identical sequence copies, or eliminate both identical and highly similar sequences.
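The two redundancy levels can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular composite database is built: the function names, the toy sequences, and the 0.9 identity threshold are all hypothetical, and "highly similar" is approximated here by position-by-position identity between equal-length sequences, whereas real resources use alignment-based criteria.

```python
def identity(a, b):
    """Fraction of matching positions; 0.0 if lengths differ or a is empty.

    Simplifying assumption: no alignment is performed.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_redundant(seq, kept, drop_similar, min_identity):
    """True if seq duplicates kept, exactly or (optionally) approximately."""
    if seq == kept:
        return True
    return drop_similar and identity(seq, kept) >= min_identity


def build_composite(sources, drop_similar=False, min_identity=0.9):
    """Merge {source: {entry_id: sequence}} into one non-redundant list.

    Sources are processed in order, so earlier sources take priority
    when the same sequence appears in more than one of them.
    """
    composite = []  # (source, entry_id, sequence) tuples that were kept
    for source, entries in sources.items():
        for entry_id, seq in entries.items():
            seq = seq.upper()
            if any(is_redundant(seq, kept, drop_similar, min_identity)
                   for _, _, kept in composite):
                continue
            composite.append((source, entry_id, seq))
    return composite


# Toy data: B1 is identical to A1; B2 differs from A1 at one position.
sources = {
    "SourceA": {"A1": "MKTAYIAKQR", "A2": "MSDNELLKQA"},
    "SourceB": {"B1": "MKTAYIAKQR", "B2": "MKTAYIAKQK", "B3": "GGHHLLPPQQ"},
}

identical_only = build_composite(sources)
identical_and_similar = build_composite(sources, drop_similar=True)
print([entry_id for _, entry_id, _ in identical_only])
print([entry_id for _, entry_id, _ in identical_and_similar])
```

Because the order of the sources decides which copy of a duplicate survives, the priority given to each primary source is a design choice that shapes the resulting composite.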