Protein Functional Site Prediction • The identification of protein regions responsible for stability and function is an especially important post-genomic problem • With the explosion of genomic data from recent sequencing efforts, protein functional site prediction from sequence alone is an increasingly important bioinformatics endeavor.
What is a “Functional Site”? • Defining what constitutes a “functional site” is not trivial • Residues that include and cluster around known functionality are clear candidates for functional sites • We define a functional site as catalytic residues, binding sites, and the regions that cluster around them.
Phylogenetic motifs • PMs are short sequence fragments that conserve the overall familial phylogeny • Are they functional? • How do we detect them? • First we design a simple heuristic to find them • Then we see if the detected sites are functional
Scan for Similar Trees • Whole Tree • Windowed Tree • Partition Metric Score (successive windows shown): 6, 8, 4, 6, 8, 6, 6, 0, 6, 6, 8, 0, 6, 6, 6
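
The tree comparison on these slides can be illustrated with a small sketch. It assumes the partition metric is the count of leaf bipartitions (splits) found in one tree but not the other, and it represents trees as nested tuples of leaf names. The function names and the five-sequence example trees are hypothetical, not taken from the slides.

```python
from itertools import chain


def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial leaf splits induced by the tree's internal edges."""
    all_leaves = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = all_leaves - side
        # Skip trivial splits (one leaf on a side, or the root covering everything).
        if len(side) > 1 and len(other) > 1:
            # Canonical form so that the split A|B equals B|A.
            splits.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return splits


def partition_metric(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical trees over the same five sequences.
whole_tree = ((("A", "B"), "C"), ("D", "E"))
window_close = ((("A", "C"), "B"), ("D", "E"))
window_far = ((("A", "D"), "C"), ("B", "E"))

print(partition_metric(whole_tree, whole_tree))    # identical topology
print(partition_metric(whole_tree, window_close))  # one clade differs
print(partition_metric(whole_tree, window_far))    # no shared splits
```

Under this assumption a score of 0 means the windowed tree reproduces the whole-tree topology exactly, and larger scores mean fewer shared splits.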
Phylogenetic Motif Identification • Compare all windowed trees with whole tree and keep track of the partition metric scores • Normalize all partition metric scores by calculating z-scores • Call these normalized scores Phylogenetic Similarity Z-scores (PSZ) • Set a PSZ threshold for identifying windows that represent phylogenetic motifs
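
The normalization and thresholding steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a population standard deviation, assumes that lower partition metric scores mean greater similarity (so motif windows fall at or below the threshold), and uses a hypothetical threshold of -1.5. The input is the sequence of example scores shown on the tree-scanning slides.

```python
from statistics import mean, pstdev


def psz_scores(scores):
    """Convert raw partition metric scores to z-scores (PSZ)."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population standard deviation
    if sigma == 0:
        # All windows scored the same, so none stands out.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def motif_windows(scores, threshold):
    """Return indices of windows whose PSZ is at or below the threshold."""
    return [i for i, z in enumerate(psz_scores(scores)) if z <= threshold]


# Example scores from the tree-scanning slides, one per window position.
scores = [6, 8, 4, 6, 8, 6, 6, 0, 6, 6, 8, 0, 6, 6, 6]

# Hypothetical threshold; the slides do not state a value.
print(motif_windows(scores, threshold=-1.5))
```

With these inputs only the two windows that scored 0 pass the threshold, which matches the intuition that a windowed tree identical to the whole tree is the strongest motif candidate.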
Map PMs to the Structure • Set PSZ Threshold • Map
Phylogenetic Similarity and False Positive Expectation (plots shown per protein family) • TIM • Cytochrome P450 • Enolase • Glycerol Kinase • Myoglobin
Evaluating alignments • For a given alignment, compute the PMs • Determine the number of functional PMs • Alignments that yield more functional PMs are classified as better alignments
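
The comparison above can be sketched in a few lines. This is an illustration under simplifying assumptions: a PM is treated as functional when its residue range contains at least one annotated functional residue, which ignores the structural clustering included in the definition of a functional site. The alignment names, window ranges, and site positions are hypothetical.

```python
def count_functional_pms(pm_windows, functional_sites):
    """Count PM windows (start, end inclusive) containing an annotated site."""
    return sum(
        1
        for start, end in pm_windows
        if any(start <= site <= end for site in functional_sites)
    )


def rank_alignments(pms_by_alignment, functional_sites):
    """Order alignments by their number of functional PMs, highest first."""
    counts = {
        name: count_functional_pms(windows, functional_sites)
        for name, windows in pms_by_alignment.items()
    }
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical annotated residues and PM windows for two alignments.
functional_sites = {12, 45, 101}
pms_by_alignment = {
    "alignment_A": [(10, 14), (40, 46), (70, 74)],
    "alignment_B": [(10, 14), (60, 64)],
}

ranking = rank_alignments(pms_by_alignment, functional_sites)
print(ranking)
```

Here `alignment_A` ranks first because two of its PM windows contain an annotated residue, against one for `alignment_B`.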
Functional PMs • PAl = blue, MUSCLE = red, both = green • (a) enolase, (b) ammonia channel, (c) tri-isomerase, (d) permease, (e) cytochrome