The Sense of Sequense Chris Evelo BiGCaT Bioinformatics Universiteit Maastricht
The Sense of Sequense • Databases. What have we got to compare our sequences with? Chris Evelo. • Is that gene on my array? A simple question with a complicated answer. Gontran Zepeda. • Annotating a full array. Evading the EST trap and the Affymetrix challenge. Stan Gaj.
First questions to answer. • How to sequence an entire genome? • Typical errors? • Why not start with chromosome 1? • Is it useful?
How to sequence an entire genome
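One common way to sequence a whole genome is shotgun sequencing: the DNA is broken into many overlapping fragments, each fragment is read, and the reads are reassembled by their overlaps. The sketch below is a toy greedy assembler that illustrates that last step. It assumes error-free reads and a genome without repeats, and the function names are hypothetical; real assemblers have to cope with sequencing errors and repeats.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`,
    considering only overlaps of at least `min_len` bases; 0 if there is none."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept the hit only if the rest of a (to its end) is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy shotgun assembler: repeatedly merge the pair of reads with
    the longest overlap. Assumes error-free reads and no repeats."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that lie entirely inside another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = ["CGGAATAC", "ATTAGACCTG", "CCTGCCGGAA"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is simple but can join the wrong fragments as soon as the genome contains repeats longer than the overlaps, which is one reason real genome projects need more elaborate methods.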
Example trace file. DNA sequence trace showing a portion of the nucleotide sequence of the gene encoding the envelope protein of the Human Immunodeficiency Virus, HIV-1.
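A dye-terminator trace such as this one contains four intensity channels, one per dye-labelled base. The sketch below is a strongly simplified base caller, for illustration only: it finds local maxima in the summed signal and calls the brightest channel at each, writing `N` when the runner-up channel is almost as strong. The `ambiguity_ratio` threshold and the function names are hypothetical; real base callers such as phred do considerably more, for example modelling peak spacing and assigning per-base quality values.

```python
def find_peaks(trace: dict[str, list[float]], min_height: float = 10.0) -> list[int]:
    """Sample positions where the summed four-channel signal has a local maximum."""
    total = [sum(values) for values in zip(*(trace[b] for b in "ACGT"))]
    return [
        i for i in range(1, len(total) - 1)
        if total[i] > total[i - 1] and total[i] >= total[i + 1] and total[i] >= min_height
    ]


def call_bases(trace: dict[str, list[float]], peaks: list[int],
               ambiguity_ratio: float = 0.8) -> str:
    """Call the strongest channel at each peak; 'N' when the runner-up is close."""
    calls = []
    for p in peaks:
        signal = {b: trace[b][p] for b in "ACGT"}
        ranked = sorted(signal, key=signal.get, reverse=True)
        best, second = signal[ranked[0]], signal[ranked[1]]
        calls.append("N" if best == 0 or second >= ambiguity_ratio * best else ranked[0])
    return "".join(calls)


# Made-up trace: clean A, C and T peaks, plus one position where G and T overlap.
trace = {
    "A": [0, 50, 0, 0, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 60, 0, 0, 0, 0, 0],
    "G": [0, 0, 0, 0, 0, 40, 0, 0, 0],
    "T": [0, 0, 0, 0, 0, 35, 0, 45, 0],
}
peaks = find_peaks(trace)
print(peaks, call_bases(trace, peaks))  # [1, 3, 5, 7] ACNT
```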
Typical errors. • Not all base/dye combinations have the same mobility (typically corrected by software) • Poor quality at the start and end of sequences • Poor separation of the front runners (the first fragments to reach the detector) • Typically low, broad peaks at the end • As a result, peaks of consecutive identical bases overlap
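Because quality is poor at both ends of a read, reads are usually trimmed before use. The sketch below assumes that each base already has a Phred quality value Q (error probability 10^(-Q/10)), which the slide does not state. It keeps the maximum-scoring stretch of the read in the spirit of the modified Mott algorithm: each base scores `limit` minus its error probability. The cutoff `limit = 0.05` and the function name are illustrative choices, not values from the source.

```python
def mott_trim(quals: list[int], limit: float = 0.05) -> tuple[int, int]:
    """Return (start, end) of the best-scoring region of a read: each base
    scores `limit` minus its Phred error probability, and the contiguous
    segment with the highest total score is kept (end is exclusive)."""
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        score = limit - 10 ** (-q / 10)  # Phred: P(error) = 10^(-Q/10)
        if run_sum <= 0:
            run_sum, run_start = score, i  # start a new segment here
        else:
            run_sum += score  # extend the current segment
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


seq = "TTGATTACA"
quals = [5, 8, 30, 35, 40, 38, 30, 10, 4]  # made-up values: poor at both ends
start, end = mott_trim(quals)
print(start, end, seq[start:end])  # 2 7 GATTA
```

A read whose bases all score below the cutoff yields the empty range `(0, 0)`.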
Are genome databases useful? • DNA has been copied to computer disks. • Computers can read bits more easily than bases. • But why read them? • Or rather, how should we read them? We need more information.
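Copying DNA to disk is straightforward because the four-letter alphabet fits in 2 bits per base, a quarter of the space of one text character per base. The sketch below packs four bases into each byte; the particular bit assignment (A=00, C=01, G=10, T=11) and the function names are arbitrary choices for illustration. It only handles A, C, G and T; real formats built on the same idea, such as UCSC's .2bit, keep extra records for runs of N.

```python
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Store four bases per byte, 2 bits each (first base in the lowest bits)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        out[i // 4] |= CODE[base] << (2 * (i % 4))  # KeyError for N and other ambiguity codes
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack(); the original length is needed to ignore padding bits."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))


seq = "GATTACA"
packed = pack(seq)
print(len(seq), "bases ->", len(packed), "bytes")  # 7 bases -> 2 bytes
print(unpack(packed, len(seq)))                    # GATTACA
```

As the slide points out, this makes the bases easy to store and read back, but says nothing yet about what they mean.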
Gene expression. Figure 3-15. The transfer of information from DNA to protein. The transfer proceeds by means of an RNA intermediate called messenger RNA (mRNA). In procaryotic cells the process is simpler than in eucaryotic cells. In eucaryotes the coding regions of the DNA (in the exons, shown in color) are separated by noncoding regions (the introns). As indicated, these introns must be removed by an enzymatically catalyzed RNA-splicing reaction to form the mRNA. Source: Alberts et al., Molecular Biology of the Cell, 3rd edn.
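The outcome of splicing can be mimicked with simple string operations. The sketch below assumes that exon coordinates on the coding strand are already known (0-based, end-exclusive); the sequence, coordinates and function names are made up for illustration. It only reproduces the result of splicing, not the enzymatic mechanism. Most introns begin with GT and end with AG, which the example intron follows.

```python
def splice(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive coordinates on the coding
    strand) and write the result as RNA (T -> U)."""
    coding = "".join(genomic[start:end] for start, end in sorted(exons))
    return coding.replace("T", "U")


def introns(genomic: str, exons: list[tuple[int, int]]) -> list[str]:
    """Return the sequences lying between consecutive exons."""
    ordered = sorted(exons)
    return [genomic[a_end:b_start] for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])]


genomic = "ATGGC" + "GTAAGTTTAG" + "TTCTAA"  # exon 1 + intron + exon 2 (made-up sequence)
exons = [(0, 5), (15, 21)]
print(introns(genomic, exons))  # ['GTAAGTTTAG']
print(splice(genomic, exons))   # AUGGCUUCUAA
```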
Three levels • And we need them all: DNA, mRNA and protein • Protein information comes from biochemistry and physiology: • The main database is Swissprot (high quality, highly curated) • The US equivalent is PIR • Hypothetical proteins: • The main database is trEMBL • The databases are now combined in UniProt
Three levels • Protein: Swissprot + trEMBL = UniProt • DNA: Genome data • mRNA: ??
mRNA. • Measuring mRNA is easy • Use the polyA tail to isolate it • PCR and blot (using primers if the sequence is known) • Clone and sequence • And what do you know then? “It’s an expressed sequence tag…”
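Because mRNA is isolated through its polyA tail, EST reads can end in a long run of A's that does not belong to the genome and should be removed before the EST is compared with genomic sequence. The sketch below removes a 3′ run of at least `min_run` A's; the threshold of 8 and the function name are arbitrary assumptions. It will also remove any genuine A's that happen to sit directly before the tail. Reads sequenced from the other end would start with a poly(T) stretch instead, a case this sketch does not handle.

```python
import re


def trim_polya(read: str, min_run: int = 8) -> str:
    """Remove a 3' run of at least `min_run` A's (case-insensitive)."""
    match = re.search(rf"A{{{min_run},}}$", read.upper())  # pattern: A{8,}$
    return read[: match.start()] if match else read


print(trim_polya("GATTACAGGC" + "A" * 12))  # GATTACAGGC
print(trim_polya("GATTACAAA"))              # GATTACAAA (run of 3 is below min_run)
```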
Three levels • Protein: Swissprot + trEMBL = UniProt • DNA: Genome data • mRNA: ESTs (EMBL)
Annotate! • DNA: Genome data • mRNA: ESTs (EMBL), clustered in Unigene • Protein: Swissprot + trEMBL = UniProt
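Unigene groups EST sequences into clusters that should each correspond to one gene. The sketch below is a toy stand-in for that idea and not Unigene's actual procedure: two ESTs are linked when they share enough k-mers, and linked ESTs are merged with union-find (single linkage). K-mers are stored in canonical form (the smaller of a k-mer and its reverse complement), so ESTs read from opposite strands can still be linked. The parameters `k=8` and `min_shared=3`, the sequences and the function names are all hypothetical and far too small for real data.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """All k-mers of seq, each stored as the smaller of itself and its reverse
    complement, so ESTs from opposite strands still share k-mers."""
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        out.add(min(kmer, kmer.translate(COMPLEMENT)[::-1]))
    return out


def cluster_ests(ests: dict[str, str], k: int = 8, min_shared: int = 3) -> list[list[str]]:
    """Single-linkage clustering: two ESTs join the same cluster when they share
    at least `min_shared` canonical k-mers (union-find tracks the clusters)."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    profiles = {name: canonical_kmers(seq, k) for name, seq in ests.items()}
    for a, b in combinations(ests, 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters: dict[str, list[str]] = {}
    for name in ests:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


ests = {
    "est1": "ATGGCGTACGTTAGCCTAGG",
    "est2": "TTAGCCTAGGCTAACGTTGC",  # overlaps the 3' end of est1
    "est3": "GGGAAACCCTTTGGGAAACC",  # unrelated sequence
    "est4": "AACGTTAGCCTAGGCT",      # reverse complement of part of est2
}
print(cluster_ests(ests))  # [['est1', 'est2', 'est4'], ['est3']]
```

Single linkage is convenient here because overlapping fragments of one transcript chain together even when the first and last ESTs share nothing. The price is that a single chimeric or repetitive EST can merge two genes into one cluster.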