1. Error-Control and Constrained Coding Solutions for DNA Microarrays and Aptamer Arrays Design
Olgica Milenkovic
University of Colorado, Boulder
ITA Center, University of California, San Diego
6. USC, November 21st, 2006. Outline
Information theory and genetics
Brief introduction to gene regulatory networks (GRN)
Reverse engineering GRN
DNA microarrays
  Production
  Quality control coding
  Error control coding
RNA aptamer arrays
  SELEX (Systematic Evolution of Ligands by EXponential enrichment)
  Structured probe selection
  RNA folds and grammars
31. Boolean Networks and Gene Interactions
39. DNA Microarrays and Aptamer Arrays
40. DNA Microarrays
44. Probe Selection/Construction
50. Minimum Hamming, Reverse and Reverse-Complement Hamming Distance
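The three distances named on the slide above can be sketched in a few lines of Python. This is a minimal illustration, not the construction from the talk: the function names are hypothetical, and taking the reverse and reverse-complement minima over all ordered pairs (including a word paired with itself) is one common convention that the slides do not confirm.

```python
from itertools import combinations, product

# Watson-Crick complement of each DNA base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def hamming(x, y):
    """Number of positions in which two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have equal length")
    return sum(a != b for a, b in zip(x, y))

def reverse_complement(y):
    """Reverse the word, then swap A<->T and C<->G."""
    return "".join(COMPLEMENT[b] for b in reversed(y))

def reverse_distance(x, y):
    """Hamming distance between x and the reverse of y."""
    return hamming(x, y[::-1])

def rc_distance(x, y):
    """Hamming distance between x and the reverse complement of y."""
    return hamming(x, reverse_complement(y))

def min_distances(code):
    """Minimum Hamming, reverse and reverse-complement distances of a code."""
    d_h = min(hamming(x, y) for x, y in combinations(code, 2))
    # Assumption: these two minima run over all ordered pairs, x == y included.
    d_r = min(reverse_distance(x, y) for x, y in product(code, repeat=2))
    d_rc = min(rc_distance(x, y) for x, y in product(code, repeat=2))
    return d_h, d_r, d_rc

print(min_distances(["AACG", "ATCC"]))
```

A word such as ACGT equals its own reverse complement, so a code containing it has reverse-complement distance 0 under this convention.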
53. Constant GC Content
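A constant GC-content constraint requires every probe to contain the same number of G and C bases. The sketch below checks the constraint and enumerates all words of a given length and GC weight; the helper names are hypothetical and the brute-force enumeration is only meant for short words.

```python
from itertools import product

def gc_content(seq):
    """Number of G or C bases in a DNA word."""
    return sum(base in "GC" for base in seq)

def has_constant_gc(code, w):
    """True if every word in the code has exactly w G/C bases."""
    return all(gc_content(word) == w for word in code)

def constant_gc_words(n, w):
    """Yield every length-n word over {A, C, G, T} with GC content w."""
    for letters in product("ACGT", repeat=n):
        if gc_content(letters) == w:
            yield "".join(letters)

words = list(constant_gc_words(4, 2))
print(len(words))
```

The count agrees with the closed form C(n, w) · 2^n: choose the w positions that hold G or C, then pick one of two bases at each of the n positions.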
54. Probe Synthesis in Microarrays
60. Base Scheduling
Shortest asynchronous base schedule
Shortest common supersequence of a set of M sequences (NP-hard)
ESN(M,k) – expected length of a longest common subsequence of M randomly chosen sequences of length N over an alphabet of size k
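A base schedule can synthesize a set of probes only if it is a common supersequence of all of them. Since finding the shortest one is NP-hard, the sketch below uses a simple periodic schedule (ACGT repeated) and measures how many steps each probe needs when it is extended greedily. The periodic schedule and the function names are illustrative assumptions, not the scheduling method of the talk.

```python
def is_subsequence(probe, schedule):
    """True if the probe can be read off the schedule left to right."""
    steps = iter(schedule)
    return all(base in steps for base in probe)

def steps_needed(probe, period="ACGT"):
    """Steps of the periodic schedule used to grow one probe greedily."""
    if any(base not in period for base in probe):
        raise ValueError("probe contains a base missing from the period")
    steps = 0
    for base in probe:
        # Skip schedule steps until the schedule offers the base we need.
        while period[steps % len(period)] != base:
            steps += 1
        steps += 1  # this step extends the probe
    return steps

def periodic_schedule(probes, period="ACGT"):
    """Shortest prefix of the periodic schedule covering all probes."""
    length = max(steps_needed(p, period) for p in probes)
    repeats = -(-length // len(period))  # ceiling division
    return (period * repeats)[:length]

probes = ["ACG", "TTT", "CA"]
schedule = periodic_schedule(probes)
print(schedule, all(is_subsequence(p, schedule) for p in probes))
```

The probe TTT shows why periodic schedules can be wasteful: each T costs a full period, so three bases take twelve steps.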
61. Mask Design
62. Quality Control
63. Relevant Coding-Theoretic Ideas
66. Error-Correcting Microarray Design
Probe multiplexing (Khan et al., 2003; Shmulevich et al., 2004)
67. VLSIPS/Analysis for Multiplexed Arrays (Milenkovic, 2006)
Features:
Multiple polymer synthesis at one given spot
Can use two different classes of linkers sensitive to different wavelengths so as to select probes for extension (say, “blue” and “green” and “cyan”)
70. Mask Design / Scheduling
71. Quality Control Coding
72. DNA Microarrays and Aptamer Arrays
73. RNA Secondary and Tertiary Structure
75. Secondary Structures
76. Aptamers
SELEX (Systematic Evolution of Ligands by EXponential enrichment) – Archemix (Larry Gold, University of Colorado, Boulder)
80. How Many Possible Shapes for a Secondary Structure Are There?
83. Results on Secondary Structures
85. Can One Do Better?
87. Mapping Secondary Structures to Ternary Sequences
Suggested by Vauchaussade & Viennot, 1985
92. Interpretations
Dyck, Motzkin and Schroeder words
Dyck, Motzkin and Schroeder lattice paths
Incomplete rooted binary trees
…
96. Mapping RNA Folded Shapes to Lattice Paths
Definition: A lattice path of length n is a sequence of points P1, P2, …, Pn with n ≥ 1, such that each point Pi belongs to the plane integer lattice and consecutive points Pi and Pi+1 are connected by a line segment.
Definition: A Dyck path is a lattice path in the plane integer lattice consisting of steps (1,1) and (1,-1) which never passes below the x-axis.
Definition: A Motzkin path is a lattice path in the plane integer lattice consisting of steps (1,1), (1,-1), and (1,0) which never passes below the x-axis.
Definition: A Schroeder path is a lattice path in the plane integer lattice consisting of steps (1,1), (1,-1), and (2,0) which never passes below the x-axis.
97. Lattice Path Examples
( ( ) ( ) )
||| ( | ( ) ( ( || ) ) | ( ) )
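The two example words can be turned into lattice paths by reading ( as an up step (1,1), ) as a down step (1,-1) and | as a level step (1,0). The sketch below assumes the usual convention that a path starts at height 0 and also returns to height 0 at the end; the function names are hypothetical.

```python
# Height change contributed by each symbol of the ternary alphabet.
STEP = {"(": 1, ")": -1, "|": 0}

def heights(word):
    """Heights visited by the path, starting at 0; whitespace is ignored."""
    h, visited = 0, [0]
    for symbol in word:
        if symbol.isspace():
            continue
        h += STEP[symbol]
        visited.append(h)
    return visited

def is_motzkin_word(word):
    """Never below the x-axis and back on the x-axis at the end."""
    hs = heights(word)
    return min(hs) >= 0 and hs[-1] == 0

def is_dyck_word(word):
    """A Motzkin word that uses no level steps."""
    return "|" not in word and is_motzkin_word(word)

def base_pairs(word):
    """Index pairs (i, j) of matching brackets, i.e. paired bases."""
    symbols = [s for s in word if not s.isspace()]
    stack, pairs = [], []
    for j, symbol in enumerate(symbols):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    return pairs

first = "( ( ) ( ) )"
second = "||| ( | ( ) ( ( || ) ) | ( ) )"
print(heights(first), is_dyck_word(first))
print(is_motzkin_word(second), is_dyck_word(second))
```

The first word is a Dyck word; the second contains level steps (unpaired bases), so it is a Motzkin word but not a Dyck word.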
101. Main Idea:
View “plausible” secondary structures as Motzkin paths obeying certain constraints
Constraints account for biological properties of RNA molecules
Use lattice enumeration techniques based on context-free grammars to find generating functions of counted objects
“Extend” scope of constrained coding: from regular to context-free grammars
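As a small illustration of enumeration from a context-free grammar, consider unconstrained Motzkin words. A standard unambiguous grammar (assumed here, not taken from the slides) is M → ε | "|" M | "(" M ")" M. Reading each production as a term of a generating function gives M(x) = 1 + x·M(x) + x²·M(x)², and comparing coefficients gives a recurrence for the number of words of length n. The sketch checks the recurrence against brute-force enumeration for short lengths.

```python
from itertools import product

def motzkin_numbers(n_max):
    """Counts m[0..n_max] from M(x) = 1 + x M(x) + x^2 M(x)^2."""
    m = [1]  # the empty word
    for n in range(1, n_max + 1):
        level_first = m[n - 1]  # words of the form "|" M
        # Words of the form "(" M ")" M: split the remaining n - 2 symbols.
        bracket_first = sum(m[k] * m[n - 2 - k] for k in range(n - 1))
        m.append(level_first + bracket_first)
    return m

def is_motzkin_word(word):
    """Height never negative and zero at the end."""
    h = 0
    for symbol in word:
        h += {"(": 1, ")": -1, "|": 0}[symbol]
        if h < 0:
            return False
    return h == 0

def brute_force_count(n):
    """Count Motzkin words of length n by testing all 3^n candidates."""
    return sum(is_motzkin_word(w) for w in product("()|", repeat=n))

print(motzkin_numbers(7))
```

The constrained case in the talk needs a richer grammar, but the translation from productions to generating-function equations follows the same pattern.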
104. The Grammars
Regular: Production rules
Context-free: Production rules
Each Xi terminal or non-terminal
105. Regular Grammars and Constrained Coding
112. Attribute Grammars: DSV and q-Method for Context Free Languages
114. Attribute Grammars
116. The Constraints
The Stem-Loop Constraint: For a sequence over the alphabet { ( , ) , | }, let l_s denote the length of a (maximal) run of ( symbols. The sequence is said to obey a stem-loop constraint if for each such l_s, the maximal run l_l of | symbols on the right of the ( run satisfies c_1 ≤ l_l ≤ c_2 l_s.
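A checker for the stem-loop constraint can be sketched as follows. Two readings are assumed here and are not confirmed by the slides: the relation is taken to be c1 ≤ ll ≤ c2·ls, and "the run of | symbols on the right of the ( run" is taken to be the run that immediately follows it (length 0 if the next symbol is not |). The function name is hypothetical.

```python
import re

def obeys_stem_loop(word, c1, c2):
    """Check c1 <= ll <= c2 * ls for every maximal run of '(' symbols."""
    w = "".join(word.split())  # drop whitespace between symbols
    for stem in re.finditer(r"\(+", w):
        ls = stem.end() - stem.start()  # length of the '(' run
        # Assumption: the loop is the '|' run directly after the '(' run.
        loop = re.match(r"\|*", w[stem.end():])
        ll = len(loop.group())
        if not (c1 <= ll <= c2 * ls):
            return False
    return True

print(obeys_stem_loop("((|||))", c1=3, c2=2))   # stem 2, loop 3
print(obeys_stem_loop("(((|)))", c1=3, c2=2))   # loop too short
print(obeys_stem_loop("(||||||)", c1=3, c2=2))  # loop too long for the stem
```

Under this reading, a long stem tolerates a long loop while a short stem does not, which matches the intuition that the stem has to hold the loop together.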
117. Example
120. Context-Free Grammar for the Stem-Loop Constraint
M = Motzkin words
M+ = non-empty Motzkin words
T = non-empty trapezoids
Tk = non-empty trapezoids of height h>k, smallest width 3 and largest width k
Tk = non-empty trapezoids of height 2<h< k+1, smallest width 3 and largest width h
N = non-empty non-trapezoids
121. Schützenberger’s approach
122. Generating Objects Described by Grammars
123. References
O. Milenkovic and B. Vasic, “Information theory problems in genetics,” ITW 2004.
O. Milenkovic and N. Kashyap, “On the design of codes for DNA computing,” LNCS 2006.
O. Milenkovic, N. Kashyap, and B. Vasic, “Coding for DNA computers controlling gene expression levels,” CDC 2005.
O. Milenkovic, “Enumerating RNA motifs: a constrained coding approach,” Allerton 2006.
Note: Some of the pictures (cells etc) are taken from open Internet sources
124. LDPC Codes as BNs
Boolean functions depend on:
Code graph;
Decoding algorithm;
Initial state of network variables (some cases).
In contrast, in a random BN in Kauffman’s model the network graph and the functions are chosen independently.
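One way to picture an LDPC decoder as a Boolean network is a synchronous bit-flipping rule, where each variable node is a network node and its update function is fixed by the code graph. The sketch below is only an illustration under stated assumptions: the parity-check matrix H is a hypothetical toy example (variables are the edges of a complete graph on four check nodes), and the strict-majority flipping rule is a simple stand-in, not the decoder analysed in the talk.

```python
# Hypothetical toy parity-check matrix: rows are checks, columns are variables.
# Every variable takes part in exactly two checks.
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]

def syndrome(H, x):
    """Parity of each check; 1 marks an unsatisfied check."""
    return [sum(h * b for h, b in zip(row, x)) % 2 for row in H]

def bit_flip_step(H, x):
    """One synchronous update of all variable nodes (network state x)."""
    s = syndrome(H, x)
    nxt = []
    for j, bit in enumerate(x):
        # Checks adjacent to variable j in the code graph.
        checks = [s[i] for i, row in enumerate(H) if row[j]]
        # Assumed rule: flip when a strict majority of checks is unsatisfied.
        flip = 2 * sum(checks) > len(checks)
        nxt.append(bit ^ int(flip))
    return nxt

codeword = [1, 1, 0, 1, 0, 0]  # satisfies every check of H
print(bit_flip_step(H, codeword) == codeword)  # codewords are fixed points
print(bit_flip_step(H, [1, 0, 0, 0, 0, 0]))    # single error on the zero word
```

In Boolean-network terms, codewords are fixed-point attractors of the update map, and the words that decode to a given codeword form its basin of attraction.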
125. Properties of LDPC Boolean Networks
128. Robustness of LDPC Decoders
129. Density Evolution for Gene Regulatory Networks