
Position Weight Matrices for Representing Signals in Sequences

Triinu Tasa, Koke 04.02.05


Presentation Transcript


  1. Position Weight Matrices for Representing Signals in Sequences Triinu Tasa, Koke 04.02.05

  2. Definitions
     • Sequence, string: an ordered arrangement of letters from {'A', 'C', 'G', 'T'}
     • Pattern: a simplified regular expression over the alphabet {'A', 'C', 'G', 'T', '.'}, where '.' is a wild-card of length 1 (it matches 'A', 'C', 'G' or 'T')
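Because a pattern in this sense is already a valid regular expression, it can be searched for with a standard regex engine. The following is a minimal Python sketch; the function name `find_pattern` and the example sequence are illustrative and not taken from the slides.

```python
import re

def find_pattern(pattern: str, sequence: str) -> list[int]:
    """Return 0-based start positions of all matches, overlapping ones included."""
    if not set(pattern) <= set("ACGT."):
        raise ValueError("pattern may only contain A, C, G, T and '.'")
    # A zero-width lookahead reports overlapping occurrences as well.
    regex = re.compile(f"(?={pattern})")
    return [match.start() for match in regex.finditer(sequence)]

# Hypothetical example sequence containing one occurrence of the pattern.
hits = find_pattern("G.GATGAG.T", "TTGAGATGAGATCC")
print(hits)
```

The lookahead is a design choice: a plain `finditer` on the pattern would skip occurrences that overlap an earlier match.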

  3. What is a weight matrix?
     GATGAG
     GATGAT
     TGATAT
     can be written as GATGAT or [GT][AG][TA][GT]A[GT]

  4. What is a weight matrix?
     Better:
     GATGAG
     GATGAT
     TGATAT

     Alignment matrix C:
     A  0  2  1  0  3  0
     C  0  0  0  0  0  0
     G  2  1  0  2  0  1
     T  1  0  2  1  0  2

     Frequency matrix F:
     A  0    0.7  0.3  0    1    0
     C  0    0    0    0    0    0
     G  0.7  0.3  0    0.7  0    0.3
     T  0.3  0    0.7  0.3  0    0.7
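The two matrices can be computed directly from the three example sequences. This is a minimal Python sketch; the function names are illustrative, and the frequencies are exact fractions (2/3, 1/3) that the slide shows rounded to one decimal.

```python
from collections import Counter

ALPHABET = "ACGT"

def alignment_matrix(seqs):
    """Count how often each letter occurs at each position (matrix C)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    columns = [Counter(column) for column in zip(*seqs)]
    return {letter: [col[letter] for col in columns] for letter in ALPHABET}

def frequency_matrix(counts, n):
    """Divide every count by the number of sequences (matrix F)."""
    return {letter: [c / n for c in row] for letter, row in counts.items()}

seqs = ["GATGAG", "GATGAT", "TGATAT"]
C = alignment_matrix(seqs)
F = frequency_matrix(C, len(seqs))

for letter in ALPHABET:
    print(letter, C[letter], [round(f, 1) for f in F[letter]])
```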

  5. What is a weight matrix?
     Or weight matrix W: [formula not preserved in the transcript]
     where
     N: number of sequences used
     [symbol not preserved]: a priori probability of letter i

  6. What is a weight matrix?
     Importance matrix I: I(i, j) = C(i, j) * F(i, j)
     A  0    1.4  0.3  0    3    0
     C  0    0    0    0    0    0
     G  1.4  0.3  0    1.4  0    0.3
     T  0.3  0    1.4  0.3  0    1.4
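The tabulated values are consistent with multiplying each count by its frequency rounded to one decimal (2 × 0.7 = 1.4, 1 × 0.3 = 0.3, 3 × 1 = 3). Below is a minimal Python sketch under that assumption; `importance_matrix` and its `digits` parameter are illustrative names, and the rounding step is included only to match the slide's numbers.

```python
from collections import Counter

def importance_matrix(seqs, alphabet="ACGT", digits=1):
    """Count times (rounded) frequency for every letter and position."""
    n = len(seqs)
    result = {letter: [] for letter in alphabet}
    for column in zip(*seqs):
        counts = Counter(column)
        for letter in alphabet:
            # Assumption: frequency is rounded first, as in the slide's table.
            freq = round(counts[letter] / n, digits)
            result[letter].append(round(counts[letter] * freq, 2))
    return result

I = importance_matrix(["GATGAG", "GATGAT", "TGATAT"])
for letter, row in I.items():
    print(letter, row)
```

Without the rounding, 2 × 2/3 would give about 1.33 instead of 1.4; the ordering of the weights is unaffected.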

  7. Applications - Clustering
     Applications
     • Pattern clustering
      1. G.GATGAG.T   62/75    1:39/49   2:23/26    R:17.3026  BP:1.12008e-37
      2. G.GATGAG     89/110   1:45/60   2:44/50    R:10.436   BP:1.61764e-34
      3. GATGAG.T     124/148  1:52/70   2:72/78    R:7.36961  BP:2.79148e-33
      4. TG.AAA.TTT   132/145  1:53/61   2:79/84    R:6.84578  BP:1.83509e-32
      5. AAAATTTT     200/231  1:63/77   2:137/154  R:4.69239  BP:1.19109e-30
      6. TGAAAA.TTT   104/114  1:45/53   2:59/61    R:7.78277  BP:3.86086e-29
      7. AAA.TTTT     343/537  1:79/145  2:264/392  R:3.05349  BP:5.66833e-29
      8. G.AAA.TTTT   135/156  1:51/62   2:84/94    R:6.19534  BP:5.69933e-29
      9. TG.GATGAG    49/57    1:30/35   2:19/22    R:16.1117  BP:9.35765e-28
     10. TG.AAA.TTTT  86/91    1:40/43   2:46/48    R:8.87311  BP:1.1124e-27
     ...

  8. Applications - Clustering
     G.GATGAG.T:
     GAGATGAGAT
     GTGATGAGAT
     GAGATGAGGT
     ...
     A   -6.9   0.98   -6.9   1.38   -6.9   -6.9   1.38   -6.9   0.98   -6.9
     C   -6.9   -6.9   -6.9   -6.9   -6.9   -6.9   -6.9   -6.9   -6.9   -6.9
     G   1.38   -6.9   1.38   -6.9   -6.9   1.38   -6.9   1.38   0.29   -6.9
     T   -6.9   0.29   -6.9   -6.9   1.38   -6.9   -6.9   -6.9   -6.9   1.38
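The non-negative entries of this table agree with a natural-log odds score ln(F / 0.25) computed from the three sequences shown: ln(1 / 0.25) ≈ 1.386, ln((2/3) / 0.25) ≈ 0.98 and ln((1/3) / 0.25) ≈ 0.29. The Python sketch below reproduces them under that reading. It is not the slide's own formula, which is not preserved; in particular, the handling of zero counts is an assumption, modelled here by a hypothetical `zero_weight` parameter set to ln(0.001) ≈ -6.9.

```python
import math
from collections import Counter

def log_odds_matrix(seqs, prior=0.25, zero_weight=math.log(0.001), alphabet="ACGT"):
    """Log-odds weight ln(frequency / prior) per letter and position."""
    n = len(seqs)
    weights = {letter: [] for letter in alphabet}
    for column in zip(*seqs):
        counts = Counter(column)
        for letter in alphabet:
            if counts[letter] == 0:
                # Assumption: unseen letters get a fixed large penalty.
                weights[letter].append(zero_weight)
            else:
                weights[letter].append(math.log(counts[letter] / n / prior))
    return weights

W = log_odds_matrix(["GAGATGAGAT", "GTGATGAGAT", "GAGATGAGGT"])
for letter, row in W.items():
    print(letter, [round(w, 2) for w in row])
```

The value for a fully conserved position prints as 1.39, whereas the slide shows 1.38, which suggests the slide truncates rather than rounds.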

  9. Applications - Clustering
     Compare matrices with each other using the dynamic programming approach:
     [recurrence for D not preserved in the transcript]
     where
     A, B: matrices
     i, j: columns
     If D(m, n) > threshold, the matrices are different.
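Since the recurrence itself is missing from the transcript, the Python sketch below shows only one plausible reading: an edit-distance-style dynamic program over matrix columns. The Euclidean column distance, the fixed `gap` penalty and all function names are assumptions and may differ from what the slide used.

```python
import math

def column_distance(a, b):
    """Euclidean distance between two columns (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def matrix_distance(A, B, gap=1.0):
    """D(m, n) for matrices given as lists of columns (A, C, G, T values)."""
    m, n = len(A), len(B)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * gap
    for j in range(1, n + 1):
        D[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + column_distance(A[i - 1], B[j - 1]),  # pair columns
                D[i - 1][j] + gap,                                      # skip a column of A
                D[i][j - 1] + gap,                                      # skip a column of B
            )
    return D[m][n]

# Hypothetical frequency columns for "AG" and "AGT".
X = [(1, 0, 0, 0), (0, 0, 1, 0)]
Y = X + [(0, 0, 0, 1)]
threshold = 0.5  # hypothetical cut-off
print(matrix_distance(X, Y), matrix_distance(X, Y) > threshold)
```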

  10. Applications - Clustering
      G.GATGAG.T   TG.AAA.TTT    AAAATTTT
      G.GATGAG     TGAAAA.TTT    AAA.TTTT
      GATGAG.T     TG.AAA.TTTT
      We want to represent the clusters by logos:
      [sequence logo figure]
      We need to align the patterns first, that is, position the similar parts of the patterns above each other:
      G.GATGAG.T
      G.GATGAG--
      --GATGAG.T
      or the logo will look like this:
      [sequence logo figure]

  11. Applications – Multiple alignment
      Multiple Alignment
      Importance matrix I: represents the aligned patterns.
      Example:
      G.GATGAG.T
      GATGAG.T
      G.GATGAG
      1. Insert the first pattern into I ('.' gives 0.25 to each letter):
      A  0  0.25  0  1  0  0  1  0  0.25  0
      C  0  0.25  0  0  0  0  0  0  0.25  0
      G  1  0.25  1  0  0  1  0  1  0.25  0
      T  0  0.25  0  0  1  0  0  0  0.25  1
      2. Align the second pattern with I using a dynamic programming approach:

  12. Applications – Multiple alignment
      Dynamic programming matrix:
               G     .     G     A     T     G     A     G     .     T
      G  0.00  0.10  0.01  0.10  0.00  0.00  0.10  0.00  0.10  0.01  0.00
      A  0.00  0.00  0.11  0.00  0.20  0.00  0.00  0.20  0.00  0.11  0.00
      T  0.00  0.00  0.01  0.00  0.00  0.30  0.00  0.00  0.00  0.01  0.21
      G  0.00  0.10  0.01  0.11  0.00  0.00  0.40  0.00  0.10  0.01  0.00
      A  0.00  0.00  0.11  0.00  0.21  0.00  0.00  0.50  0.00  0.11  0.00
      G  0.00  0.10  0.01  0.21  0.00  0.00  0.10  0.00  0.60  0.01  0.00
      .  0.00  0.00  0.10  0.01  0.21  0.00  0.00  0.10  0.00  0.60  0.01
      T  0.00  0.00  0.01  0.00  0.00  0.31  0.00  0.00  0.00  0.01  0.70

      G.GATGAG.T
      --GATGAG.T
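The scoring rule is not stated on the slide. The table is, however, reproduced cell for cell by a gapless, diagonal-only accumulation in which an exact letter match adds 0.10, a '.' column in the matrix adds 0.01, a '.' in the pattern carries the diagonal value over unchanged, and a mismatch resets the cell to 0. The Python sketch below encodes that inferred rule, working in hundredths to avoid floating-point noise. The constants and function names are illustrative, and the first pattern is represented by its string rather than by the matrix I.

```python
MATCH = 10            # hundredths; increment read off the table for an exact match
WILDCARD_COLUMN = 1   # hundredths; increment read off the table for a '.' column

def dp_table(target, pattern):
    """Fill the table; row 0 and column 0 are the all-zero boundary."""
    D = [[0] * (len(target) + 1) for _ in range(len(pattern) + 1)]
    for i, p in enumerate(pattern, start=1):
        for j, t in enumerate(target, start=1):
            diag = D[i - 1][j - 1]
            if p == ".":
                D[i][j] = diag                    # wildcard in pattern: carry over
            elif t == ".":
                D[i][j] = diag + WILDCARD_COLUMN  # wildcard column: weak support
            elif t == p:
                D[i][j] = diag + MATCH            # exact match extends the run
            else:
                D[i][j] = 0                       # mismatch resets the run
    return D

def align(target, pattern):
    """Pad the pattern with '-' so that it ends at the best last-row cell."""
    if len(pattern) > len(target):
        raise ValueError("overhanging patterns are not handled in this sketch")
    last = dp_table(target, pattern)[-1]
    # Only end positions that keep the whole pattern inside the target.
    end = max(range(len(pattern), len(target) + 1), key=last.__getitem__)
    return "-" * (end - len(pattern)) + pattern + "-" * (len(target) - end)

D = dp_table("G.GATGAG.T", "GATGAG.T")
print(align("G.GATGAG.T", "GATGAG.T"))
print(align("G.GATGAG.T", "G.GATGAG"))
```

Cases where the pattern would extend past either end of the matrix, which would require adding columns, are deliberately left out.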

  13. Applications – Multiple alignment
      3. Add the pattern '--GATGAG.T' to I; if necessary, add columns to the matrix.
      4. Repeat the procedure for every pattern.
      Output:
      G.GATGAG.T
      G.GATGAG--
      --GATGAG.T
      Why importance matrix?

  14. Applications – Multiple alignment
      Example:
      Pattern: GATG
      So far aligned:
      GATGATGTA-
      ---GATGTGG
      We want: w(G, 4) > w(G, 1) > w(G, 9)
      Solution: importance matrix
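A small Python sketch of why a count-times-frequency weight yields this ordering. It assumes the weight is count × frequency, with the frequency taken over the non-gap letters of a column, and it reads the two aligned rows as GATGATGTA- over ---GATGTGG. The formula, the gap handling and the function name are all assumptions.

```python
from collections import Counter

def importance(aligned, letter, position):
    """Count-times-frequency weight of `letter` at a 1-based `position`."""
    column = [row[position - 1] for row in aligned if row[position - 1] != "-"]
    if not column:
        return 0.0
    count = Counter(column)[letter]
    return count * (count / len(column))

aligned = ["GATGATGTA-", "---GATGTGG"]
w4 = importance(aligned, "G", 4)  # G in both rows
w1 = importance(aligned, "G", 1)  # G in the only row covering position 1
w9 = importance(aligned, "G", 9)  # G in one of the two rows
print(w4, w1, w9)
```

A plain frequency would score positions 4 and 1 equally (both are 100% G), which is why the count is folded into the weight.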

  15. Applications – Weight matrix matching
      • Weight Matrix Matching
      Purpose: find the sequences that the weight matrix describes best in a given text file
      ...CATAGGAAATTCCACCTCTTTGGCTTTGCCCAGTCTTCCCTTGAGGATGCCTACGTTC...
      1. Calculate the score for each position
      2. If score > threshold, report a signal
      Problem: finding a good threshold
      • Threshold: the 99.5% quantile
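The two steps can be sketched in Python as follows. The sketch assumes that the score of a position is the sum of the matrix weights over a window starting there, and that the text contains only A, C, G and T. The slide does not say which score distribution the 99.5% quantile is taken from, so `quantile` is provided as a separate helper (nearest-rank definition) and the threshold is passed in explicitly. The toy matrix and all names are hypothetical.

```python
import math

def window_scores(weights, text):
    """Score every window of matrix width as the sum of per-position weights."""
    width = len(weights["A"])
    return [
        sum(weights[letter][k] for k, letter in enumerate(text[start:start + width]))
        for start in range(len(text) - width + 1)
    ]

def quantile(values, q):
    """Nearest-rank quantile, one of several common definitions."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def find_signals(weights, text, threshold):
    """Return (start, score) for every window scoring above the threshold."""
    return [
        (start, score)
        for start, score in enumerate(window_scores(weights, text))
        if score > threshold
    ]

# Toy two-column matrix that favours "AG" (hypothetical values).
weights = {"A": [1, -1], "C": [-1, -1], "G": [-1, 1], "T": [-1, -1]}
text = "CATAGGAAATTCCACCTCTTTGGCTTTGCCCAGTCTTCCCTTGAGGATGCCTACGTTC"
print(find_signals(weights, text, threshold=0))
```

In practice the threshold would be something like `quantile(background_scores, 0.995)`, where `background_scores` stands for scores collected from whatever reference sequences are chosen as background.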

  16. Questions?
