1 / 46

Sequence Entropy

Sequence Entropy. Sequence Analysis. Significance of Alignment Positions. Observed occurrence of amino acids at some position in an alignment that deviates from expected may indicate some (functional) significance What ‘deviates from expected’? unlikely occurrences What is unlikely?

judah-weber
Download Presentation

Sequence Entropy

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Sequence Entropy Sequence Analysis

  2. Significance of Alignment Positions • Observed occurrence of amino acids at some position in an alignment that deviates from expected may indicate some (functional) significance • What ‘deviates from expected’? • unlikely occurrences • What is unlikely? • only (relatively) few possibilities to obtain observed result

  3. Counting… • Number of possibilities W for finding given numbers Nx of aminoacids types x: • Problem: evaluates to huge numbers even for modest numbers of sequences…

  4. Sequence Entropy • Entropy: • Using Stirlings approximation:ln M! ≈ M(ln M-1) for M≫1

  5. Shannon’s ‘Information Entropy’: • ‘A Mathematical Theory of Communication’, The Bell System Technical Journal, Vol. 27, 1948. “ Can we define a quantity which will measure, in some sense, how much information is ‘produced’ by such a process, or better, at what rate information is produced? ” • He was thinking about the Transmission of Information, i.e., from a Source through some Channel to a Destination.

  6. Choice, Uncertainty and Entropy • A set of ‘events’ with probabilities p1, p2, …, pn • Is there a measure that indicates how much ‘choice’ is possible, given those probabilities? • If there is, it should be: • continuous for all pi • monotonic in n if all probabilities are equal • additive for ‘sub-events’

  7. Additivity: 1/2 1/2 1/2 1/3 1/3 2/3 1/2 1/3 1/6 1/6 H(1/2,1/2) H(2/3,1/3) H(1/2,1/3,1/6) = H(1/2,1/2) + 1/2 H(2/3,1/3)

  8. Solution: Entropy • the entropy of a set of probabilities pi • measures information, choice and uncertainty • zero only if only one pi is not zero • there is only one choice • maximal if all pi are equal • most ‘uncertain’ situation: all options are possible

  9. Information Content • Shannon was thinking about the Transmission of Information, i.e., from a Source through some Channel to a Destination. • …but it applies equally well to any type of ‘message’ • We can use it to measure the level of conservation in columns in an alignment

  10. Simple Example: Sequence Entropy p1 = p2 = ½ p1 = 0 p2 = 0 p2 = f(‘A’) p1 = f(‘L’)

  11. Conditional Probability / Prior Knowledge • Suppose we observe two things (A and B), and we suspect a relation (A causes B) • The fundamental question is then:How likely is A when we know B? • (n.b., this is Bayes statistics…) • Or, what is the uncertainty (entropy) of A knowing B?

  12. Conditional Entropy • Entropy of joint occurrence of value i for event x and value j for event y : • and • so that • i.e., the entropy of a joint event is less than or equal to the sum of the individual entropies • it is equal only if the events are independent

  13. Co-occurrence in practice • Measure of ‘mutial information’, by relative entropy: • more often written like: • i.e., what is the entropy in x, giveny

  14. Relative Entropy in Sequence Analysis • Many biological problems relate to questions like: “Why do these proteins do this, and those proteins not?” • or “Why do these patients get sick, and those not?” • The answer can be related to similarities and differences between sequences • Similarities (conservation) relate to functionally critical positions • Differences can explain functional differences

  15. Comparing groups of Sequences • For each position i in an alignment, we calculate the relative entropy for group Avs.B from the frequencies p (observed probabilities) of all aminoacid types x, as follows:

  16. 3.0 A p å i , x A p log i , x B p 2.5 x i , x B p å B i , x p log i , x A p 2.0 x i , x Entropy / Harmony 1.5 1.0 0.5 0.0 Simple Example: Relative Entropy ‘A/B’ A B

  17. Relative Entropy • Captures similarities and differences • Is infinite for completely dissimilar positions • e.g., A vs. L • Not symmetrical: • Maybe not easy for selecting dissimilar positions

  18. 3.0 2.5 2.0 Entropy / Harmony 1.5 B p å i , x B p log i , x AB p x i , x 1.0 A p å i , x A p log i , x AB p x i , x 0.5 0.0 Simple Example: Relative Entropy ‘A/AB’ A B

  19. Relative Entropy (A/AB) • Also captures similarities and differences • Is no longer infinite for completely dissimilar positions • Only symmetrical for equal size groups • in practice, not symmetrical • Maybe still not too easy for selecting dissimilar positions

  20. Measuring Overlapping Distributions • Weigh both groups equally; take pA+pB in stead of pAB : • Fixed interval [0,1], but not completely symmetrical

  21. 3.0 2.5 2.0 A B p p å å i i , , x x A B p p log log Entropy / Harmony 1.5 i i , , x x + + A A B B p p p p x x i i , , x x i i , , x x 1.0 0.5 0.0 Entropy vs. Sequence Harmony: Example A B

  22. Sequence Harmony • Introduce symmetry by averaging: • May seem a trivial choice, but: Pirovano, Feenstra & Heringa. “Sequence Comparison by Sequence Harmony Identifies Subtype Specific Functional Sites”, Nucleic Acids Res., in press (2006).

  23. 3.0 2.5 2.0 Entropy / Harmony 1.5 1.0 0.5 0.0 Entropy vs. Sequence Harmony: Example A B

  24. Analyzing multiple groups • Relative Entropy and Sequence Harmony are defined to compare a set of groups (N=2) • problem for multiple groups (N>2) • A solution: Entropy-variability plots • variability is number of different aminoacid types in a certain alignment position • problem: variability always tends towards maximum (20) for larger number of sequences • Another solution: Two-entropy plots • Total family entropy vs. sum of sub-family entropy

  25. Two-Entropies analysis of GPCRs • Goal: to identify the function of individual positions Ye, Lameijer, Beukers & IJzerman. “A Two-Entropies Analysis to Identify Functional Positions in the Transmembrane Region of Class A G Protein-Coupled Receptors”, Proteins 63, pp1018–1030 (2006)

  26. G-protein Coupled Receptors (GPCRs) • Huge family of integral cell-membrane proteins • crucial in signal transduction • 70 subfamilies • 1935 Class A GPCRs • Three main regions: • extracellular side • transmembrane (TM) • cytoplasmic side

  27. Two-entropies vs. Entropy-variability:upper vs. lower domain Red: upper domainBlue: lower domain

  28. Two-entropies vs. Entropy-variability:relative solvent accessibility (RSA) Red: RSA < 15%Blue: RSA > 15%

  29. Two-entropies vs. Entropy-variability:ligand binding site vs. other positions Subfamily specific binding Red: <4 Å from retinalBlue: >4 Å from retinal Subfamily specific binding Common activation mechanism Common activation mechanism

  30. GPCR Two-entropies analysis Red: ligand binding Green: coupling & activation Blue: others

  31. GPCR Functional Site Prediction: Method Comparison Finding known sites for ligand binding in Bovine Rhodopsin (left) and Aminergic Receptors (right)

  32. 1 0 260 280 300 320 340 360 380 400 420 440 460 Smad-MH2 Alignment & Sequence Harmony • Walter Pirovano*, K. Anton Feenstra* and Jaap Heringa. “Sequence Comparison by Sequence Harmony Identifies Subtype Specific Functional Sites”, Nucleic Acids Res., in press (2006). • K. Anton Feenstra, Walter Pirovano and Jaap Heringa. “Sub-type Specific Sites for SMAD Receptor Binding Identified by Sequence Comparison using ‘Sequence Harmony’ ”. in: From Computational Biophysics to Systems Biology. pp. 73-78. Eds. U.H.E. Hansmann, J. Meinke, S. Mohanty and O. Zimmermann, Jülich, NIC Series, Vol. 34, 2006. • Elena Marchiori*, Walter Pirovano, Jaap Heringa and K. Anton Feenstra*. “A Feature Selection Algorithm for Detecting Subtype Specific Sites for Smad Receptor Binding”, Bio-ICMLA06, accepted (2006).

  33. 262 270 280 290 300 310 AR BR Smads: Comparing two Groups

  34. Smad-MH2 Alignment & Functionally Specific Sites • 29 known sites of functional specificity • based mostly on site-specific mutants and characterized on affinity for binding to BMPR-I vs. TBR-I receptor types

  35. Finding Low-harmony sites in Smad-MH2 270 280 290 300

  36. Finding Low-harmony sites in Smad-MH2 Pirovano, Feenstra & Heringa. “Sequence Comparison by Sequence Harmony Identifies Subtype Specific Functional Sites”, Nucleic Acids Res., in press (2006).www.few.vu.nl/~feenstra/articles/NAR 2006 Sequence Harmony.pdf

  37. Smad-MH2: Low Harmony Patches

  38. Smad-MH2: Functional Clusters R427 TbR-I/BMPR-I/ALK1/2 A323 receptor-binding M327 T430 V325 TbR-I/ALK1/2 Q284 TbR-I/BMPR-I A354 R410 V461 W368 P378 R462 C463 P360 ? Y366 Q407 FAST1, Mixer, SARA S460 R334 Q400 Q364 N381 R365 L440 ? T298 R337 F346 co-repressors A392 SARA/Mixer L297 retention & transcription factors Q309 N443 c-Ski/SnoN I341 S308 P295 F273 Q294 A272 SARA S269 T267

  39. Conclusions Smad-MH2 • 40 Sites of Low Sequence Harmony in Smad-MH2 • different between the AR (TGF-b) and BR (BMP) sub-type Smads • Low Harmony sites in Smad-MH2 are functionally relevant • Other methods cannot select all known sites! • Functional Sites are Interaction Surfaces on Protein Surface: • Next: Analyze Interaction Partners in the Pathway • 14 Low Harmony Sites in Smad-MH2 of unknown function • 11 putative functions from structural considerations • promising candidates that determine TGF-b/BMP specificity • confirm (or rebuke) putative functions?

  40. Sequence Harmony Webserver http://www.ibi.vu.nl/programs/seqharmwww1-b/

  41. Sequence Harmony Webserver: Groups

  42. Sequence Harmony Webserver: Reference

  43. Sequence Harmony Webserver: Structure

  44. SMAD Sequence Harmony: Raw Table

  45. SMAD Sequence Harmony: Results

  46. SMAD Sequence Harmony: Structure

More Related