Applications of Symbolic Logic to Gene Regulation Systems Speaker : Chuang-Chieh Lin Department of Computer Science and Information Engineering of National Chung-Cheng University
Introduction to Myself • Chuang-Chieh Lin (林莊傑) • Education Background • B.S. Department of Mathematics, National Cheng-Kung University, September 1998 – June 2002. • M.S. Department of Computer Science and Information Engineering, National Chi-Nan University, September 2002 – June 2004. • Advisor (2002 – 2004) • Professor R. C. T. Lee • Research • Biocomputing • Sequence Assembly • Evolutionary Trees • Gene Networks (recent) • Computational Geometry • Other topics in the field of Computer Algorithms. Computation Theory Laboratory in National Chung-Cheng University
Outline • Introduction and Motivations • Symbolic Logic and the Resolution-Principle Method • Boolean Gene Regulatory Network • The State Determination Problem • The Implicit Interaction Finding Problem • Previous Work • Future Work
Introduction and Motivations • Genes are specific regions of a DNA sequence, and they carry the information for manufacturing proteins. • A genome is all the DNA in an organism, including its genes. • DNA is made up of four similar chemicals (called bases and abbreviated A, T, C, and G) that are repeated millions or billions of times throughout a genome. The human genome has 3 billion pairs of bases.
Human genome sequencing was the most important target of the Human Genome Project (HGP), which formally began in 1990. • With human genome sequencing completed, the postgenomic era and the age of functional genomics have arrived. • One aspect of functional genomics is understanding how genes are expressed or regulated, which is critically important to finding ways to fight diseases. • Scientists have found that diseases are often related to how genes are expressed and regulated.
To study genes, we have to understand gene expression, the process by which the hereditary information of a gene is transformed into mRNA or protein. We also call the gene expression of a gene its “state”. • We say that a gene is activated if its process of making mRNA or a protein is being executed; otherwise, we say that the gene is inhibited. Hereafter, the gene expression, or state, of a gene A denotes whether A is activated or inhibited.
[Figure: a regulation pathway with the labels protein kinase, protein phosphatase, phosphorylated protein (P), transcription factor, DNA, and genes A, B, C, D and E.] From the figure above, we see that each gene’s expression may affect other genes’ expressions. Such effects include activation, inhibition, etc.
Suppose we have “gene A activates gene B”. Then if gene A is activated, gene B will be activated, and if gene A is not activated, gene B will not be activated. • Similarly, from “gene A inhibits gene B” we obtain that if gene A is activated, gene B will be inhibited, and if gene A is not activated, gene B will be activated. [Figure: an edge labeled “activate” from A to B, and an edge labeled “inhibit” from A to B.]
We treat “A is inhibited” as the same as “A is not activated”, and “A is activated” as the same as “A is not inhibited”. • Hence, we may consider the interactions and gene expressions as formulas in symbolic logic. • Let us first become familiar with symbolic logic.
Symbolic Logic • In symbolic logic, symbols such as A, B and C are called atoms. • Formulas are defined recursively as follows: • An atom is a formula. • If G is a formula, then ¬G is also a formula. • If G and H are formulas, then G ∨ H, G ∧ H, G → H and G ↔ H are formulas, where ∨, ∧, → and ↔ denote “or”, “and”, “imply” and “if and only if” respectively. • All formulas are generated by applying the above three rules.
For example, • “A”, “¬B” and “C” are all formulas. • “A ∨ B” and “B ∧ C” are both formulas. • “¬(A ∨ B)” and “(A ∧ B) → (B ∨ C)” are both formulas.
We define a literal to be an atom or the negation of an atom. For example, A, ¬B and C are all literals. • Suppose we have formulas F1, F2, …, Fn. Then F1 ∨ F2 ∨ … ∨ Fn is called the disjunction of F1, F2, …, Fn, while F1 ∧ F2 ∧ … ∧ Fn is called the conjunction of F1, F2, …, Fn.
A disjunction of literals is called a clause. For example, A ∨ B and X ∨ Y ∨ Z are both clauses. • A formula F is said to be in conjunctive normal form if and only if F has the form F1 ∧ F2 ∧ … ∧ Fn, n ≥ 1, where each Fi is a clause, i = 1, 2, …, n. For example, (A ∨ B ∨ C) ∧ (P ∨ Q ∨ R) is a formula in conjunctive normal form. A ∧ (Q ∨ R) is also a formula in conjunctive normal form.
Let G be a formula whose atoms are A1, A2, …, An. An interpretation of G is an assignment of truth values to A1, A2, …, An in which every Ai, 1 ≤ i ≤ n, is assigned either T or F, but not both. A formula is said to be valid if and only if it is true under all its interpretations, while a formula is said to be inconsistent if and only if it is false under all its interpretations. • For example, “(X ∧ Y) → X” is valid. “X ∧ ¬X” is inconsistent.
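The definitions of valid and inconsistent formulas can be checked mechanically by enumerating every interpretation. The following is a minimal sketch, assuming formulas are represented as Python functions over a dictionary of truth values; the helper names (interpretations, is_valid, is_inconsistent, implies) and the two test formulas (X ∧ Y) → X and X ∧ ¬X are chosen here for illustration.

```python
from itertools import product


def interpretations(atoms):
    """Yield every assignment of True/False to the given atoms."""
    for values in product([True, False], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def is_valid(formula, atoms):
    """A formula is valid if it is true under all interpretations."""
    return all(formula(i) for i in interpretations(atoms))


def is_inconsistent(formula, atoms):
    """A formula is inconsistent if it is false under all interpretations."""
    return not any(formula(i) for i in interpretations(atoms))


def implies(p, q):
    """Material implication p -> q."""
    return (not p) or q


# Illustrative formulas: (X and Y) -> X, and X and (not X).
f_valid = lambda i: implies(i["X"] and i["Y"], i["X"])
f_inconsistent = lambda i: i["X"] and not i["X"]

print(is_valid(f_valid, ["X", "Y"]))
print(is_inconsistent(f_inconsistent, ["X"]))
```

Enumeration takes 2^n steps for n atoms, so it is only practical for small formulas.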
Given formulas F1, F2, …, Fn and a formula G, G is said to be a logical consequence of F1, F2, …, Fn if and only if whenever F1 ∧ F2 ∧ … ∧ Fn is true, G is also true. That is, G is a logical consequence of F1, F2, …, Fn if and only if the formula (F1 ∧ F2 ∧ … ∧ Fn) → G is valid. • The resolution-principle method is a method for deducing logical consequences from a given set of clauses. We define the resolution-principle method as follows.
The Resolution-Principle Method • For any two clauses C1 and C2, if there is a literal L1 in C1 that is complementary to a literal L2 in C2, then delete L1 and L2 from C1 and C2 respectively, and construct the disjunction of the remaining clauses. The constructed clause, called a resolvent, is a logical consequence of C1 and C2. • For example, from C1 = P ∨ Q and C2 = ¬P ∨ R, deleting P and ¬P gives the resolvent Q ∨ R.
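A single resolution step is short to write down in code. This is a minimal sketch, assuming a clause is a frozenset of literal strings and that a leading "~" marks negation; both conventions and the function names are illustrative choices, not notation from the slides.

```python
def negate(literal):
    """Return the complementary literal: A <-> ~A."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(c1, c2):
    """Return every resolvent of clauses c1 and c2.

    For each literal in c1 whose complement occurs in c2, delete the
    complementary pair and take the union of what remains.
    """
    found = set()
    for literal in c1:
        if negate(literal) in c2:
            found.add(frozenset((c1 - {literal}) | (c2 - {negate(literal)})))
    return found


c1 = frozenset({"P", "Q"})    # P or Q
c2 = frozenset({"~P", "R"})   # not P or R
for r in resolvents(c1, c2):
    print(sorted(r))  # ['Q', 'R']
```

Resolving P against ¬P yields the empty clause, frozenset(), which signals that the two clauses contradict each other.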
From what we have discussed previously, how a gene regulates other genes can be represented simply in symbolic logic. For example, “A activates B” is represented as (A → B) ∧ (¬A → ¬B), and “A inhibits B” is represented as (A → ¬B) ∧ (¬A → B).
Note that we can also translate the following case into formulas in symbolic logic. [Figure: a network over genes A, B, C, D, E and F with edges labeled “activate” and “inhibit”.]
In this thesis, “A” stands for “gene A is activated” while “¬A” stands for “gene A is not activated”, that is, “gene A is inhibited”. • For “A → B”, “¬A → B”, “A → ¬B” and “¬A → ¬B”, we have the following explanations. • “A → B” means “If A is activated, B will be activated.” • “¬A → B” means “If A is inhibited, B will be activated.” • “A → ¬B” means “If A is activated, B will be inhibited.” • “¬A → ¬B” means “If A is inhibited, B will be inhibited.”
Note that A → B is equivalent to ¬A ∨ B. Similarly, ¬A → B is equivalent to A ∨ B, A → ¬B is equivalent to ¬A ∨ ¬B, and ¬A → ¬B is equivalent to A ∨ ¬B. • Next, we introduce a graphic model representing a system of given genes and the regulations between them.
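The four equivalences between implications and clauses can be confirmed by a truth table, and they give a direct way to turn an interaction into clauses. The sketch below assumes the reading used earlier in the talk, that “A activates B” means (A → B) ∧ (¬A → ¬B) and “A inhibits B” means (A → ¬B) ∧ (¬A → B); the function names and the "~" prefix for negation are illustrative.

```python
from itertools import product


def implies(p, q):
    """Material implication, written by cases rather than as (not p) or q."""
    return q if p else True


# Truth-table check of the four equivalences.
for a, b in product([True, False], repeat=2):
    assert implies(a, b) == ((not a) or b)                # A -> B    ==  ~A v B
    assert implies(not a, b) == (a or b)                  # ~A -> B   ==  A v B
    assert implies(a, not b) == ((not a) or (not b))      # A -> ~B   ==  ~A v ~B
    assert implies(not a, not b) == (a or (not b))        # ~A -> ~B  ==  A v ~B


def activates(a, b):
    """Clauses for 'a activates b': (~a v b) and (a v ~b)."""
    return [frozenset({"~" + a, b}), frozenset({a, "~" + b})]


def inhibits(a, b):
    """Clauses for 'a inhibits b': (~a v ~b) and (a v b)."""
    return [frozenset({"~" + a, "~" + b}), frozenset({a, b})]


print([sorted(c) for c in activates("A", "B")])
print([sorted(c) for c in inhibits("A", "B")])
```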
Boolean Gene Regulatory Network • A Boolean gene regulatory network is shown as follows. [Figure: a network over genes A, B, C, D, E, F and G with two AND gates and edges marked + or –.] • Genes A, B and C are called key regulators because no other gene affects any of them.
After the Boolean gene regulatory network is given, we can consider two problems related to this graph model. • The State Determination Problem • The Implicit Interaction Finding Problem • To simplify our discussion, we abbreviate “the Boolean gene regulatory network” to “the Boolean network”.
The State Determination Problem • Assume that we are given the states of the key regulators; determine the other genes’ states. • Given: A Boolean network and the states of the key regulators • Output: All genes’ states [Figure: the example network with the key regulators A, B and C labeled with their states; 0: inhibited, 1: activated.]
We can determine all genes’ states, that is, activated or inhibited, by the depth-first-search method or the resolution-principle method. • Note that we do not consider Boolean networks with cycles or self-loops. In addition, the only Boolean gates we use here are AND gates.
• By the depth-first-search method: [Figures: Stages 0 to 3 of the depth-first search on the example network, starting from the key regulators A, B and C and labeling each remaining gene with its state, 0 or 1.]
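The depth-first-search approach can be sketched as a recursive evaluation with memoization. The network below is hypothetical, not the one in the figures, and the gate semantics are an assumption made for this sketch: a gene is activated exactly when every "+" regulator is activated and every "-" regulator is inhibited (an AND over signed inputs). The names determine_states, inputs and key_states are likewise illustrative.

```python
def determine_states(inputs, key_states):
    """Determine every gene's state in an acyclic Boolean network.

    inputs maps each non-key gene to a list of (regulator, sign) pairs;
    key_states maps each key regulator to True (activated) or False.
    """
    states = dict(key_states)

    def visit(gene):
        # Depth-first: resolve a gene's regulators before the gene itself.
        if gene not in states:
            states[gene] = all(
                visit(regulator) == (sign == "+")
                for regulator, sign in inputs[gene]
            )
        return states[gene]

    for gene in inputs:
        visit(gene)
    return states


# Hypothetical acyclic network with key regulators A, B and C.
inputs = {
    "D": [("A", "+"), ("B", "-")],   # D = A AND (not B)
    "E": [("C", "+"), ("D", "-")],   # E = C AND (not D)
    "F": [("B", "+")],               # F = B
}
key_states = {"A": False, "B": True, "C": True}

states = determine_states(inputs, key_states)
print({gene: int(value) for gene, value in sorted(states.items())})
# {'A': 0, 'B': 1, 'C': 1, 'D': 0, 'E': 1, 'F': 1}
```

The recursion terminates only because the network is assumed to be free of cycles and self-loops, which matches the restriction stated in the talk.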
• By the resolution-principle method: [Figure: the example network written as clauses. Clauses (1) to (11) encode the original Boolean network; clauses (12), (13) and (14) give the states of the key regulators A, B and C.]
Resolution steps, listing the atoms of each resolvent:
(7) & (14): G … (15)
(1) & (12): B, F, D … (16)
(13) & (16): F, D … (17)
(5) & (13): F … (18)
(17) & (18): D … (19)
(9) & (17): C, E, F … (20)
(14) & (20): E, F … (21)
(18) & (21): E … (22)
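The same kind of derivation can be sketched as repeated unit resolution: whenever a one-literal clause is known, its complement is deleted from every clause that contains it. The clause set below encodes a hypothetical network (D = A ∧ ¬B, E = C ∧ ¬D, F = B) with key regulators ¬A, B and C; it is not the clause list (1) to (14) from the slides, and the encoding of each AND gate as clauses is an assumption of this sketch.

```python
def negate(literal):
    """Return the complementary literal: A <-> ~A."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def unit_resolution(clauses):
    """Return all one-literal clauses derivable by unit resolution."""
    clauses = {frozenset(c) for c in clauses}
    units = {next(iter(c)) for c in clauses if len(c) == 1}
    changed = True
    while changed:
        changed = False
        for clause in list(clauses):
            for unit in list(units):
                if negate(unit) in clause:
                    resolvent = clause - {negate(unit)}
                    if resolvent not in clauses:
                        clauses.add(resolvent)
                        changed = True
                        if len(resolvent) == 1:
                            units.add(next(iter(resolvent)))
    return units


network = [
    {"~D", "A"}, {"~D", "~B"}, {"D", "~A", "B"},   # D <-> (A and not B)
    {"~E", "C"}, {"~E", "~D"}, {"E", "~C", "D"},   # E <-> (C and not D)
    {"~F", "B"}, {"F", "~B"},                      # F <-> B
]
key_regulators = [{"~A"}, {"B"}, {"C"}]

units = unit_resolution(network + key_regulators)
print(sorted(units))  # ['B', 'C', 'E', 'F', '~A', '~D']
```

Each derived unit fixes one gene: a plain atom means activated and a "~" literal means inhibited.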
The result can be summarized as follows.
This problem can always be solved, by Lemma 1 and Theorem 1 below. • Lemma 1 A Boolean gene regulatory network which is free of cycles and free of self-loops has at least one node whose indegree, that is, the number of other genes that inhibit or activate it directly, is equal to 0. • Theorem 1 Assume that a Boolean gene regulatory network G and the states of all key regulators in G are given. Then the states of all the nodes in G can be determined.
Lemma 1 and Theorem 1 are easy to prove. Here we omit the details of the proofs. • Now, let us discuss the other problem: the implicit interaction finding problem.
The Implicit Interaction Finding Problem • The implicit interaction finding problem is to derive previously unknown interactions from a given Boolean gene regulatory network. • Given: A Boolean network • Output: Implicit interactions in the Boolean network [Figure: a Boolean network over genes A, B, C and D with one AND gate and edges marked + or –.]
[Figure: the Boolean network over genes A, B, C and D, with its interactions written as clauses (1) to (10).]
• By applying the resolution-principle method, we obtain the new clauses (11) to (15), derived from pairs of clauses such as (2)&(4), (1)&(3) and (3)&(7). [Figure: the Boolean network over A, B, C and D before and after the derived implicit interactions are added.]
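Finding implicit interactions can be sketched as saturating the clause set under resolution and reporting the clauses that were not there at the start. The two-edge network below (A activates B, B inhibits C) is hypothetical and much smaller than the one in the figure; the clause encoding and all function names are assumptions of this sketch.

```python
from itertools import combinations


def negate(literal):
    """Return the complementary literal: A <-> ~A."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def is_tautology(clause):
    """A clause containing both L and ~L is always true and carries no information."""
    return any(negate(literal) in clause for literal in clause)


def saturate(clauses):
    """Apply resolution until no new non-tautological clause appears."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for literal in c1:
                if negate(literal) in c2:
                    r = frozenset((c1 - {literal}) | (c2 - {negate(literal)}))
                    if not is_tautology(r) and r not in known:
                        new.add(r)
        if not new:
            return known
        known |= new


original = {
    frozenset({"~A", "B"}), frozenset({"A", "~B"}),    # A activates B
    frozenset({"~B", "~C"}), frozenset({"B", "C"}),    # B inhibits C
}
derived = saturate(original) - original
print(sorted(sorted(c) for c in derived))
# [['A', 'C'], ['~A', '~C']]
```

The two derived clauses are ¬A ∨ ¬C and A ∨ C, that is, A → ¬C and ¬A → C, which together read as the implicit interaction “A inhibits C”.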
Previous Work • In the analysis of gene regulation systems, many results are related to constructing graphic gene regulatory networks. • For instance, Andreas Wagner proposed a method to reconstruct a gene regulatory network with core structure from given perturbation data. [W2001] How to Reconstruct a Large Genetic Network from n Gene Perturbations in Fewer than n² Easy Steps, Wagner, A., Bioinformatics, Vol. 17, No. 12, 2001, pp. 1183-1197. • Note that a perturbation is an experimental manipulation performed on a gene.
perturbation-list:
0: 2 16
1:
2:
3: 0 2 5 8 12 14 16
4:
5: 0 2 12 14 16
6: 0 2 5 12 14 16
7: 2 8 17
8:
9: 0 1 2 5 6 10 12 14 15 16 18 20
10: 0 1 2 5 6 12 14 16 18 20
11: 0 2 5 6 12 14 16 18 20
12: 0 2 14 16
13: 8 17
14: 0 2 16
15: 0 2 16
16: 2
17: 8
18:
19: 8
20: 0 2 5 6 12 14 16 18
The corresponding graph G would be very complicated, so we omit it here.
The modified perturbation-list:
0: 16
1:
2:
3: 2 5 8
4:
5: 12
6: 5 12
7: 2 17
8:
9: 10 15
10: 1 20
11: 20
12: 14
13: 8 17
14: 0
15: 0
16: 2
17: 8
18:
19: 8
20: 6 18
[Figure: the corresponding graph G on nodes 0 to 20.]
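One way to see how the two lists relate is to treat the modified list as the adjacency list of a directed graph and compute, for each gene, everything reachable from it. The sketch below copies both lists from the slides and checks that the reachable sets reproduce the original perturbation list. It is only a consistency check under that reading, not an implementation of Wagner's reconstruction method; the names modified, original and reachable are illustrative.

```python
# Modified perturbation list, read as a directed adjacency list.
modified = {
    0: [16], 1: [], 2: [], 3: [2, 5, 8], 4: [], 5: [12], 6: [5, 12],
    7: [2, 17], 8: [], 9: [10, 15], 10: [1, 20], 11: [20], 12: [14],
    13: [8, 17], 14: [0], 15: [0], 16: [2], 17: [8], 18: [], 19: [8],
    20: [6, 18],
}

# Original perturbation list.
original = {
    0: [2, 16], 1: [], 2: [], 3: [0, 2, 5, 8, 12, 14, 16], 4: [],
    5: [0, 2, 12, 14, 16], 6: [0, 2, 5, 12, 14, 16], 7: [2, 8, 17], 8: [],
    9: [0, 1, 2, 5, 6, 10, 12, 14, 15, 16, 18, 20],
    10: [0, 1, 2, 5, 6, 12, 14, 16, 18, 20],
    11: [0, 2, 5, 6, 12, 14, 16, 18, 20], 12: [0, 2, 14, 16],
    13: [8, 17], 14: [0, 2, 16], 15: [0, 2, 16], 16: [2], 17: [8],
    18: [], 19: [8], 20: [0, 2, 5, 6, 12, 14, 16, 18],
}


def reachable(graph, start):
    """Return every node reachable from start by following one or more edges."""
    seen = set()
    stack = list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


closure = {gene: sorted(reachable(modified, gene)) for gene in modified}
print(closure == original)  # True
```

The shorter list therefore carries the same reachability information as the longer one, which is why its graph is simple enough to draw.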
Future Work • The identification problem • Other topics on biocomputing and computer algorithms
The Identification Problem • Given a set of genes and a set of results of perturbations performed on the genes, the identification problem is to determine whether there exists only one Boolean network consistent with the given data. • Akutsu et al. have shown that an exponential number of perturbations is needed to identify the unique Boolean network. [AKMM98] Identification of Gene Regulatory Networks by Strategic Gene Disruptions and Gene Overexpressions, Akutsu, T., Kuhara, S., Maruyama, O. and Miyano, S., Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 1998, pp. 695-702.
[Figure: a table of gene names against perturbations, and a Boolean network over genes A, B, C, D, E, F, G, H, I, J, K, M and N with an OR gate, an AND gate, intermediate nodes X1 and X2, and edges marked + or –.] This Boolean network is consistent with the given data. However, we still have to test whether another Boolean network consistent with the given data exists. Note that Boolean gates, including OR, AND, XOR, etc., are allowed in the solutions to this problem.