Theoretical methods for predicting gene function II: predicting protein domains and their function from sequence analysis
S. Wodak, ULB
Inter-university DEA/DES in Bioinformatics
The main steps
[3.1] Predict domains
[3.2] Predict function of individual domains
[Figure: each domain is assigned to a family with its function(s): Family G, Funct(s) Y; Family A, Funct(s) X; Family M, Funct(s) Z; Family F, Funct(s) W]
Domain analysis
Proteins tend to be modular -> domains. A first step in functional prediction/annotation can be a scan for known domains in a newly sequenced protein.
Scan databases of ‘fingerprints’ of classified domains:
- PROSITE (Bairoch et al., 1997): consensus sequence strings for more than 1000 domains
- PROFILESCAN:
- BLOCKS (Henikoff et al., 1998): ungapped alignments and pattern matching
- PRINTS (Attwood et al., 1998): a set of multiple sequence motifs separated along the sequence
- PFAM (Bateman et al., ): HMMs from multiple alignments
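Scanning a sequence against a consensus string of the PROSITE kind can be sketched as a translation into a regular expression. This is only a minimal illustration, not the PROSITE scanning software: the function names, the example pattern (written in PROSITE syntax but not taken from a specific database entry) and the toy sequence are all assumptions, and rarer syntax such as anchors inside brackets is not handled.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern string into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # '<' anchors the pattern at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # '>' anchors the pattern at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat, e.g. x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":                  # 'x' means any residue
            core = "."
        elif core.startswith("{"):       # {P} means any residue except those listed
            core = "[^" + core[1:-1] + "]"
        # [LIVM] and single letters are already valid regex as written.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def scan(sequence, pattern):
    """Return (1-based start, matched segment) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

# Illustrative pattern and toy sequence (both hypothetical).
PATTERN = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
SEQUENCE = "AACKECQRSFSRSDELTRHIRTHGG"
print(scan(SEQUENCE, PATTERN))
```

A short pattern like this will match by chance in a large database, so a hit is a hint to be checked against the other domain databases rather than an annotation by itself.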
Example: The alcohol dehydrogenase domain (Demo) (PDB code 8ADH)
- CATH: http://www.biochem.ucl.ac.uk/bsm/cath_new/domains/8adh02.html
- PDBsum
- Swiss-Prot
- PROSITE pattern associated with zinc binding/active site
- PFAM, PROSITE, etc.
Zinc binding constellation in carbonic anhydrase
Predicting function of individual domains based on sequence similarity
1- Intrinsic feature analysis
- compositional biases
- transmembrane regions (stretches of hydrophobic residues)
- coiled-coil segments (heptad repeats of polar/hydrophobic residues)
- Pro-rich, Glu-rich regions
If not eliminated first, these can lead to spurious hits, and thus to erroneous inference of function.
2- Sequence alignments
- Pairwise alignments (Blast, Fasta): >40% sequence identity
- Multiple alignments: <40% sequence identity; more sensitive
  - Psi-Blast
  - SAM-98 (HMM)/PFAM
Erroneous inference of function can still be made, because sequence similarity does not guarantee structural similarity.
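One way to flag stretches of hydrophobic residues before running alignments is a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6; these choices, the function name and the toy sequence are assumptions for illustration, and the slides do not prescribe a particular method.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(sequence, window=19, threshold=1.6):
    """Return 0-based start indices of windows whose mean hydropathy >= threshold."""
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        # Non-standard residue codes raise KeyError; filter them beforehand.
        mean = sum(KD[res] for res in segment) / window
        if mean >= threshold:
            hits.append(start)
    return hits

# Toy sequence: a 20-residue leucine stretch flanked by charged residues.
toy = "DEKR" * 5 + "L" * 20 + "DEKR" * 5
flagged = hydrophobic_windows(toy)
print(flagged[0], flagged[-1])
```

Regions flagged this way can be masked before a database search so that hits driven only by hydrophobic composition are not mistaken for homology.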
Predicting function based on sequence alignments
>40% sequence identity
Pairwise alignments (Blast, Fasta)
- can be used to ‘safely’ infer function for orthologs: close homologs, genes that evolved as a result of speciation (not duplication); likely to perform the same function in different species
-> comparison of the sequence tree and the species tree can help identify orthologs.
- Inferring function for non-orthologous homologs is much more error prone.
7/10 genes will have a homolog in the sequence DBs, and some fraction of those will have a known 3D structure.
<40% sequence identity
-> But the structural and functional features of the homolog cannot be transferred without additional analysis.
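The 40% identity cutoff can be made concrete with a small calculation on an existing pairwise alignment. The sketch below is illustrative only: identity is taken as identical residues over ungapped aligned columns, which is one of several definitions in use, and the function names and the wording of the advice strings are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over the alignment columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue                      # skip columns with a gap on either side
        compared += 1
        identical += (a == b)
    return 100.0 * identical / compared if compared else 0.0

def transfer_advice(identity, threshold=40.0):
    """Apply the 40% rule of thumb from the slide to one pairwise hit."""
    if identity > threshold:
        return "pairwise hit: function transfer plausible if the pair is orthologous"
    return "remote hit: needs multiple-alignment methods and additional analysis"

# Toy alignment (hypothetical sequences).
identity = percent_identity("ACDE-FG", "ACDQKFG")
print(round(identity, 1), transfer_advice(identity))
```

Identity alone does not separate orthologs from paralogs, so even above the cutoff the tree comparison mentioned on the slide is still needed.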
Detection of remote homologs
- Multiple alignments:
  - Psi-Blast: Position-Specific Iterated Blast
  - HMM: Hidden Markov Models
- Other:
  - ISS: Intermediate sequence search [Diagram: sequences A, B, C]
Sequence comparisons using multiple sequence alignments detect 3x as many homologs as pairwise alignments.
Park et al. (1998) J. Mol. Biol. 284, 1201-1210
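The idea behind an intermediate sequence search can be sketched as a one-step transitive lookup: if A hits B and B hits C, then C becomes a candidate remote homolog of A even when A and C do not hit each other directly. The code below is a simplified illustration under stated assumptions: the input is a hypothetical dictionary of significant pairwise hits, and checks a real procedure would need, such as requiring the two matches to overlap on the intermediate, are left out.

```python
def intermediate_sequence_search(query, direct_hits):
    """Map each indirectly linked sequence to the intermediates connecting it to query."""
    first_shell = direct_hits.get(query, set())
    linked = {}
    for intermediate in first_shell:
        for target in direct_hits.get(intermediate, set()):
            # Keep only sequences the query did not already find directly.
            if target != query and target not in first_shell:
                linked.setdefault(target, set()).add(intermediate)
    return linked

# Hypothetical pairwise results: A and C are each similar to B but not to each other.
pairwise = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(intermediate_sequence_search("A", pairwise))
```

Each candidate comes with the intermediates that support it, which makes it easier to inspect the chain before transferring any annotation.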
Benchmark results (Park et al., 1998): % homologs recognised
Method       error rate 1/100,000   error rate 1/1000
Gap-Blast    14                     19
Fasta        16                     23
Psi-Blast    27                     44
SAM-98       29                     50
ISS          24                     34
PDBD40-J: database of 935 sequences with ≤40% sequence identity and known evolutionary relationships from SCOP
NRDB90: database of 152,228 non-redundant sequences (<90% sequence identity) from other sequence DBs
Structural proteomics: extending structure information to sequences
[Figure: flow diagram with the elements: Library of known folds; New sequence; Assign known fold from library; Function; Build detailed atomic model]
Detection of remote homologs across genomes Pfam... Slide incomplete