Genome databases and webtools for genome analysis • Become familiar with microbial genome databases • Use some of the tools useful for analyzing genomes • Visit the sites used in lab exercise #2
Major components of NCBI • GenBank • PubMed • Entrez • BLAST • Conserved Domain Database (CDD) • Clusters of Orthologous Groups (COGs) • OMIM
GenBank • Searchable database of DNA and protein sequences • Caution: sequences are deposited by the community and are not curated for accuracy • RefSeq: a curated, non-redundant set of reference sequences reviewed by NCBI
BLAST • Basic Local Alignment Search Tool • Compares a nucleotide or protein query sequence against sequence databases • Microbial-specific BLAST page • Focus of a future lab
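BLAST is fast because it does not align the query against every database sequence in full. It first looks for short exact "word" matches (seeds) and only then extends those seeds into scored local alignments. The sketch below shows only the seeding step for nucleotide sequences; the function name `find_word_hits` is hypothetical, and the default word size of 11 is assumed from classic blastn. Extension, scoring, and statistics are deliberately left out.

```python
from collections import defaultdict


def find_word_hits(query: str, subject: str, w: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly.

    This is only the seeding step of a BLAST-like search; a real search would
    extend each seed into a scored local alignment.
    """
    # Index every length-w word in the query by its start positions.
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)

    # Scan the subject and report every exact word match.
    hits = []
    for j in range(len(subject) - w + 1):
        # .get() does not insert missing keys into the defaultdict.
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


# Toy example with a small word size so the hits are easy to check by hand.
print(find_word_hits("ACGTACGGA", "TTACGTACG", w=5))
```

Here all three hits lie on the same diagonal (subject position minus query position is 2). Runs of seeds on one diagonal are what a BLAST-like tool would try to extend into a longer local alignment.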
OMIM • Online Mendelian Inheritance in Man • Database linking human genetic diseases to genes
TIGR • Comprehensive Microbial Resource (CMR) • Many genomes • Tools to analyze genomes
SubtiList • Website for the Bacillus subtilis genome • Features • Annotated genes • Gene region display • Regularly updated similarity searches for every protein • BLAST and pattern search capabilities • Links to journal articles and protein databases
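SubtiList's own pattern-search syntax is not covered here. As a generic illustration of what a DNA pattern search does, the hypothetical sketch below finds every position in a sequence that matches a motif written with IUPAC ambiguity codes (for example, N = any base). It searches only the strand it is given, and it reports overlapping matches.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'TTGRCN' into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [m.start() for m in pattern.finditer(seq.upper())]


print(find_motif("TTGACAATTGACGT", "TTGAC"))  # exact motif, found twice
print(find_motif("AAAA", "AA"))               # overlapping matches are kept
```

To search both strands, you would also run the search on the reverse complement of the sequence.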
RDP • Ribosomal Database Project • Curated at Michigan State University (MSU) • Contains a large compilation of ribosomal RNA gene sequences (currently over 100,000) • A second database contains information on rRNA gene copy number
KEGG • Kyoto Encyclopedia of Genes and Genomes • Frequently updated database of gene content, metabolic pathways, etc. • Excellent resource for reconstructing the pathways of an organism of interest
Genome sequencing and annotation • Week 2 reading assignments: pages 65-79 and 110-122; Boxes 2.1, 2.2 and 2.3 • Don't worry about the details of HMMs • Hughes Functional Genomics Review
Sequencing • Dideoxy method for DNA sequencing • Methods for sequencing genomes • Methods for finding and annotating genes in microbial genomes
Dideoxy sequencing (Sanger method) • Developed by Frederick Sanger; this work earned him his second Nobel Prize (Chemistry, 1980)
Two types of labeling • Radioactive • ³²P, ³⁵S • Each dideoxy base is run in a separate reaction and a separate gel lane • No longer used • Fluorescent • A different fluorophore for each of the four bases • Reactions can be mixed and run together • Chromatograms - GTSF
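The logic of reading a dideoxy gel can be made concrete with an idealized model. Each ddNTP stops extension wherever that base is incorporated, so each lane contains one fragment length for every occurrence of its base in the newly synthesized strand. Reading the bands from shortest (bottom) to longest (top) across all four lanes therefore spells out the sequence. The sketch below is hypothetical and assumes perfect termination, no band artifacts, and fragment lengths counted from the end of the primer. The same reasoning applies to fluorescent labeling, where color replaces lane position.

```python
def dideoxy_lanes(synthesized: str) -> dict[str, list[int]]:
    """Fragment lengths produced in each ddNTP reaction (primer length ignored).

    `synthesized` is the new strand, 5'->3', i.e. the complement of the template.
    A ddNTP terminates extension wherever that base is incorporated, so the
    ddA lane holds one fragment length for every A in the new strand, etc.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest (bottom of gel) to longest to recover the sequence."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = dideoxy_lanes("GATTC")
print(lanes)            # band positions per lane
print(read_gel(lanes))  # sequence recovered by reading the gel bottom to top
```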
Phred • Method for automated quality assessment of DNA sequence traces • Trace parameters used: • Variance in peak spacing in a 7-peak window • Ratio of the largest uncalled peak to the smallest called peak in 7- and 3-peak windows • Number of bases between the current base and the nearest unresolved base • Phred score Q = -10 × log10(P), where P is the estimated probability that the base call is wrong • Phred scores of 20 or higher are considered good calls. Why?
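The Phred formula and its inverse are easy to compute directly. The helper names below are illustrative. Rearranging Q = -10 × log10(P) gives P = 10^(-Q/10), so every 10 points of quality means a tenfold lower error probability. A score of 20 therefore corresponds to P = 0.01, or about one wrong call in 100 bases.

```python
import math


def phred_from_error(p: float) -> float:
    """Phred quality Q = -10 * log10(P), where P is the probability the call is wrong."""
    return -10 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Invert the Phred formula: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


for q in (10, 20, 30, 40):
    p = error_from_phred(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):.2f}%")
```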
Sequencing of genomes • Hierarchical (contig-based) sequencing • Clone smaller segments of the genome, then sequence each clone • Labor intensive, slow • Not needed for sequencing microbial genomes • Shotgun method • Randomly clone and sequence 1.5-2 kb fragments of DNA to 5-10-fold coverage • Assembly is computationally intensive
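The 5-10-fold coverage target can be motivated with the Lander-Waterman estimate. With N reads of length L from a genome of size G, the average coverage is c = N·L/G. If reads land uniformly at random, the expected fraction of the genome that no read covers is about e^(-c). The sketch below assumes that idealized random placement, and its function names and example numbers are hypothetical. Real genomes have cloning and sequencing biases, so actual gaps are usually larger.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold coverage c = N * L / G."""
    return num_reads * read_length / genome_size


def fraction_uncovered(c: float) -> float:
    """Lander-Waterman estimate of the fraction of bases hit by no read: e^(-c)."""
    return math.exp(-c)


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example: a 4 Mb genome sequenced with 800-bp reads.
genome, read_len = 4_000_000, 800
for target in (1, 5, 8, 10):
    n = reads_needed(target, read_len, genome)
    c = coverage(n, read_len, genome)
    print(f"{n} reads -> {c:.1f}x coverage, ~{100 * fraction_uncovered(c):.4f}% unsequenced")
```

Under this model, about 0.7% of the genome is still unsequenced at 5-fold coverage, compared with roughly 0.005% at 10-fold. This is why shotgun projects oversample the genome well beyond 1-fold.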
Finding genes in a genome sequence • What to look for? • Glimmer (TIGR): identifies genes using interpolated Markov models (IMMs) • ORF Finder (NCBI) • Most automated annotation engines have ORF-finding capabilities • Much more difficult in eukaryotic genomes
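The simplest answer to "what to look for" is an open reading frame (ORF): a start codon followed in frame by a stop codon, long enough that it is unlikely to have arisen by chance. The hypothetical sketch below scans the three forward-strand frames for ATG...stop ORFs above a minimum length. It makes no attempt to reproduce NCBI ORF Finder or Glimmer. Real tools also scan the reverse strand and consider alternative start codons, and Glimmer additionally scores candidates with its trained Markov models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    `start` is the 0-based index of the A in ATG; `end` is the exclusive index
    just past the stop codon. Only the longest ORF per stop (first in-frame ATG)
    is reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step codon by codon; stop when fewer than 3 bases remain.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Length in codons includes the stop codon.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


seq = "CCATGAAATTTTAGCC"
for frame, start, end in find_orfs(seq, min_codons=2):
    print(frame, start, end, seq[start:end])
```

The 100-codon default is only an illustrative cutoff. Short real genes will be missed, and long random ORFs can still pass, which is one reason annotation engines combine ORF finding with statistical gene models.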