Readings for this week
• Gogarten et al. Horizontal gene transfer…..
• Francke et al. Reconstructing metabolic networks…..
Sign up for a meeting next week for proposal feedback/progress checkup
Inferring protein function: by genomic context…
Inferring protein function: by homology…
COGs: Clusters of Orthologous Groups (eukaryotic versions are KOGs)
• Identified using all-against-all sequence comparisons on a collection of complete genomes
• Include genes with orthologous and paralogous relationships
• COGs are grouped into large-scale functional categories
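The all-against-all idea can be illustrated with a minimal sketch of reciprocal best hits between genomes. This is a simplification: the actual COG procedure looks for consistent best-hit triangles across at least three lineages, and the function names, the `(genome, gene)` tuples, and the scores below are all invented for illustration.

```python
def best_hits(scores):
    """For each query gene, find its best-scoring hit in every other genome.

    `scores` maps (query, subject) -> similarity score, where each gene is a
    (genome, gene_name) tuple. This data layout is an assumption of the sketch.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query[0] == subject[0]:
            continue  # ignore within-genome comparisons
        key = (query, subject[0])
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: hit for key, (hit, _) in best.items()}


def reciprocal_best_hits(scores):
    """Return the gene pairs that are each other's best hit across genomes."""
    best = best_hits(scores)
    pairs = set()
    for (query, _genome), subject in best.items():
        if best.get((subject, query[0])) == query:
            pairs.add(frozenset([query, subject]))
    return pairs


# Toy, made-up scores for two genomes A and B.
toy_scores = {
    (("A", "a1"), ("B", "b1")): 200,
    (("A", "a1"), ("B", "b2")): 50,
    (("B", "b1"), ("A", "a1")): 190,
    (("B", "b2"), ("A", "a1")): 55,
    (("B", "b1"), ("A", "a2")): 40,
    (("A", "a2"), ("B", "b1")): 45,
}
rbh_pairs = reciprocal_best_hits(toy_scores)
```

In this toy data only a1 and b1 pick each other, so they form the single candidate ortholog pair; b2 and a2 each point at a gene that prefers someone else, which is the kind of one-way relationship that often indicates a paralog.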
Looking at Parts of Proteins
• Domains: conserved structural entities with distinctive secondary structure content and a hydrophobic core. Example: protein kinase domain
• Motifs: patterns of amino acids that are conserved across many proteins and confer a particular function on the protein. Example: zinc finger, CX2-4C....HX2-4H
How to identify domains? PFAM: Protein Families Database
• Based on Hidden Markov Models (HMMs), statistical probability models of multiple sequence alignments
• Uses seed alignments that are manually curated (PFAM-A)
• Based on these alignments, a profile HMM is created; it plays the role of a Position Specific Scoring Matrix (PSSM) that also models insertions and deletions at each position
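PFAM: Protein Families Database
Searching a protein against PFAM returns an E-value with a meaning similar to BLAST E-values: the number of sequences expected to score that well for that domain by chance in a database of that size. For small values it is close to the probability of at least one such chance hit.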
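An E-value is an expected count of chance hits, not a probability. Under the Poisson assumption used in BLAST statistics, the two are related by P = 1 - exp(-E). The helper below is a small sketch of that conversion; the function name is made up, and applying the same formula to HMM-based searches is an assumption of this example.

```python
import math


def evalue_to_pvalue(e_value):
    """Probability of at least one chance hit, assuming hits are Poisson-distributed."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)


p_at_one = evalue_to_pvalue(1.0)     # E = 1: one chance hit expected
p_at_tiny = evalue_to_pvalue(1e-10)  # tiny E: P and E are nearly equal
```

For the small E-values that matter in practice, P and E are almost identical, which is why they are often used interchangeably. They diverge once E approaches 1 or more.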
Other Protein Databases
• SMART: uses HMMs; focus is signalling and regulatory proteins (which tend to be more divergent than enzymes)
• TIGRFAMs: TIGR-curated alignments used to generate HMMs; one advantage is that names should be functionally accurate for all proteins they represent
• PRINTS: not HMM-based; uses “fingerprints” of conserved motifs
• Ecumenical solution: InterPro, a collection of multiple databases under one umbrella
Still more kinds of BLAST
PSI-BLAST: Position Specific Iterated BLAST
• Use to find members of a protein family or to build a custom position-specific score matrix
• The most sensitive BLAST program, making it useful for finding very distantly related proteins or new members of a protein family
• 1st round: standard BLASTP search; a PSSM is then built from all hits with E values better than the inclusion threshold
• 2nd round: the PSSM replaces the standard scoring matrix for evaluating alignments in this search. Additional hits better than the inclusion threshold are incorporated into an updated PSSM
• 3rd+ rounds: as the second round. The search reaches convergence when no new hits are found
• The PSSM can be saved for use in later searches
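To make the PSSM step concrete, here is a toy sketch of how a position-specific log-odds matrix could be built from a set of aligned hits. It is not PSI-BLAST's actual procedure, which also weights sequences and uses data-dependent pseudocounts and real background frequencies. The uniform background, the flat pseudocount, the ungapped equal-length input, and the function names are all assumptions of this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumes ungapped sequences of equal length and a uniform background.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm, segment):
    """Sum the position-specific scores of a segment as long as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Made-up "hits" from a first search round.
round_one_hits = ["FGELA", "FGELS", "FGEVA"]
pssm = build_pssm(round_one_hits)
```

Residues seen in a column score above zero and unseen ones score below zero, so a later round rewards candidates that look like the family as a whole, not just like the original query. Each iteration would rebuild the matrix with the newly included hits.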
Still more kinds of BLAST
PHI-BLAST: Pattern Hit Initiated BLAST
• Find proteins similar to the query around a given pattern
• Must enter both a query sequence containing the pattern AND a pattern to search on
• Example pattern (easy): FGELA
• Example pattern (harder): [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]
• Matching peptide: FGELALMYNTPRAATIVA
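The pattern syntax above can be checked by hand, or with a small converter to a regular expression. The sketch below handles only the elements that appear in the example (single residues, bracketed alternatives, `x`, and `(n)` or `(n,m)` repeats). It is not PHI-BLAST's own parser, and the function name is made up.

```python
import re

# One pattern element: x, a residue, or [alternatives], optionally followed by (n) or (n,m).
TOKEN = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    compact = pattern.replace("-", "")  # hyphens only separate elements
    parts = []
    pos = 0
    while pos < len(compact):
        match = TOKEN.match(compact, pos)
        if match is None:
            raise ValueError(f"unsupported pattern syntax at position {pos}")
        residue, low, high = match.groups()
        regex = "." if residue == "x" else residue  # x means any residue
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
        pos = match.end()
    return "".join(parts)


hard_pattern = "[LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]"
hard_regex = prosite_to_regex(hard_pattern)
peptide_matches = re.fullmatch(hard_regex, "FGELALMYNTPRAATIVA") is not None
```

Walking through the peptide: FGELAL covers the first six elements, the five residues MYNTP fill the `x(5,11)` spacer, and RAATIVA satisfies the remaining seven elements.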
Enzyme Nomenclature
EC Numbers: a hierarchical classification scheme in which enzymes are named and classified according to the reactions they catalyze
• Oxidoreductases
• Transferases
• Hydrolases
• Lyases
• Isomerases
• Ligases
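The six classes are listed in EC order, so the first field of an EC number (1 to 6) selects the class and the remaining three fields narrow it down. A minimal parsing sketch follows; the function name and the handling of "-" for unspecified levels are choices made for this example.

```python
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec_number):
    """Split an EC number into (class name, four hierarchy levels).

    A "-" field (partial classification) is returned as None.
    """
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated fields: {ec_number!r}")
    levels = tuple(None if field == "-" else int(field) for field in fields)
    if levels[0] not in EC_CLASSES:
        raise ValueError(f"unknown top-level EC class: {ec_number!r}")
    return EC_CLASSES[levels[0]], levels


full_example = parse_ec("EC 1.1.1.1")  # alcohol dehydrogenase
partial_example = parse_ec("3.4.-.-")  # classified only to the second level
```

Because the hierarchy is encoded in the number itself, two enzymes can be compared at any depth by comparing prefixes of the level tuples.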
Putting it all together….
KEGG: Kyoto Encyclopedia of Genes and Genomes
• Collection of manually drawn metabolic/cellular pathway maps, based on the most up-to-date biochemical information
• Metabolic maps are the strongest feature: they use EC-numbered enzymes as key players, allowing pathways of different genomes to be easily mapped based on their predetermined EC content
• Also has a growing collection of signalling/cellular process maps
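Mapping a genome onto a pathway by EC content amounts to a set comparison, sketched below. This does not use KEGG's API or data files. The function name is hypothetical, and the EC sets are small illustrative placeholders, not a real pathway definition or genome annotation.

```python
def pathway_coverage(pathway_ecs, genome_ecs):
    """Return (fraction of pathway EC numbers found in the genome, missing EC numbers)."""
    if not pathway_ecs:
        raise ValueError("pathway has no EC numbers")
    present = pathway_ecs & genome_ecs
    missing = sorted(pathway_ecs - genome_ecs)
    return len(present) / len(pathway_ecs), missing


# Illustrative placeholder sets, not taken from KEGG.
toy_pathway = {"2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13"}
genome_a = {"2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13", "1.1.1.1"}
genome_b = {"2.7.1.1", "5.3.1.9", "1.1.1.1"}

coverage_a = pathway_coverage(toy_pathway, genome_a)
coverage_b = pathway_coverage(toy_pathway, genome_b)
```

The missing EC numbers are the interesting output: they point either to a genuine gap in the organism's metabolism or to an enzyme that has not yet been annotated.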