Reasoning on the Web: Theory, Challenges, and Applications in Bioinformatics. Contents. Motivation Beyond the web: Rules, Reasoning, Semantics, Ontologies Semantics of Deduction Rules Argumentation Semantics Fuzzy Reasoning Reaction rules Vivid Agents Prova
Reasoning on the Web: Theory, Challenges, and Applications in Bioinformatics
Contents • Motivation • Beyond the web: Rules, Reasoning, Semantics, Ontologies • Semantics of Deduction Rules • Argumentation Semantics • Fuzzy Reasoning • Reaction rules • Vivid Agents • Prova • Applications in Bioinformatics
The Web • A great success story, but… • it’s the web for humans, not machines • Many areas, such as biology, have fully embraced the web • The Human Genome Project is only the tip of the iceberg • More than 500 tools and databases online
Example: PubMed • >12.000.000 literature abstracts • Great resource if one knows what one is looking for • “Kox1” has 17 hits • But “diabetes” will produce >200.000 • Often need to automatically process abstracts
Results of PubMed (fields: author, title, journal, year) • Lorenz P, Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation. Biol Chem. 2001 Apr;382(4):637-44. • Fredericks WJ. An engineered PAX3-KRAB transcriptional repressor inhibits the malignant phenotype of alveolar rhabdomyosarcoma cells harboring the endogenous PAX3-FKHR oncogene. Mol Cell Biol. 2000 Jul;20(14):5019-31.... However, to a machine things look different!
Results of PubMed • Lorenz P, Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation. Biol Chem. 2001 Apr;382(4):637-44. • Fredericks WJ. An engineered PAX3-KRAB transcriptional repressor inhibits the malignant phenotype of alveolar rhabdomyosarcoma cells harboring the endogenous PAX3-FKHR oncogene. Mol Cell Biol. 2000 Jul;20(14):5019-31.... Solution: tag data (XML)
Results of PubMed • <author>Lorenz P</author><title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title><journal>Biol Chem</journal><year>2001</year> • <author>Lorenz P</author><title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title><journal>Biol Chem</journal><year>2001</year> • ... However, to a machine things look different!
Results of PubMed • <author>Lorenz P</author><title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title><journal>Biol Chem</journal><year>2001</year> • <author>Lorenz P</author><title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title><journal>Biol Chem</journal><year>2001</year> • ... Solution: use ontologies (Semantic Web)
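Tagging makes field extraction mechanical, even though the tag names still carry no meaning for the program. A minimal Python sketch, assuming each record is wrapped in a hypothetical <article> element (the wrapper and the variable names are illustrative and not from the slides) and that every tag is properly closed:

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <article>; the four field tags follow the slide.
record = (
    "<article>"
    "<author>Lorenz P</author>"
    "<title>Transcriptional repression mediated by the KRAB domain of the "
    "human C2H2 zinc finger protein Kox1/ZNF10 does not require histone "
    "deacetylation.</title>"
    "<journal>Biol Chem</journal>"
    "<year>2001</year>"
    "</article>"
)

article = ET.fromstring(record)

# Map each tag name to its text content.
fields = {child.tag: child.text for child in article}
print(fields["author"], fields["journal"], fields["year"])
```

The program can now separate author from title, but it still does not know that "author" denotes a person; that gap is what ontologies are meant to close.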
GeneOntology • Biologists have recognised the problem of semantic inter-operability between disparate information sources • GeneOntology (GO) is an effort to provide a common vocabulary for molecular biology • GO has >10.000 terms in three branches: “function”, “process”, “localisation”
GeneOntology: Breadth of GO • Has 13 levels • Width broadens to level 6 (3885 terms wide) then shrinks • Number of leaves per level broadens to level 6 (1223 leaves) then shrinks • Average term has 4 words • Maximal term has 29 words: Oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors
Motivation Summary • Web in the old days • HTML (for humans) • Web these days • HTML • XML, Ontologies (for machines) • Web of the future • HTML • XML, Ontologies • rules, reasoning, semantics • access to computational resources (a la grid-computing)
Open Problems • Part I: Theory of rules and reasoning on the web: • Knowledge representation: Which level of expressiveness? • Semantics: How to guarantee inter-operability • Reasoning: Fuzzy reasoning and unification • Reactivity: Vivid agents • Part II: Applications of rules and reasoning on the web: • Integration and querying of information sources • Integration: transmembrane prediction tools • Integration: protein structure DB and structure classification • Consistency checking • Ontology: If A is B and B is C, then the ontology should not explicitly mention A is C, as it is already implicit • Annotation: Do different tools agree or disagree?
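The ontology consistency check in the last bullets (an explicit "A is C" is redundant when "A is B" and "B is C" are stated) can be sketched as a reachability test. This is only an illustration in Python; the function name redundant_is_a and the encoding of the ontology as a set of (child, parent) pairs are assumptions, not part of the slides.

```python
def redundant_is_a(edges):
    """Return stated is-a pairs that are already implied by other stated pairs."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    def reachable(start, goal, skip):
        # Depth-first search from start to goal that ignores the edge `skip`.
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            for nxt in parents.get(node, ()):
                if (node, nxt) == skip:
                    continue
                if nxt == goal:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return sorted(e for e in edges if reachable(e[0], e[1], skip=e))


# The example from the slide: A is B, B is C, and an explicit A is C.
edges = {("A", "B"), ("B", "C"), ("A", "C")}
redundant = redundant_is_a(edges)
print(redundant)
```

Only the pair ("A", "C") is reported, because it is the only edge that can be reached through other edges.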
The wider Picture: www.RuleML.org • Goal: develop Web language for rules • using XML markup, • formal semantics, and • efficient implementations. • Rules: derivation rules, transformation rules, and reaction rules. • RuleML can thus specify queries and inferences in Web ontologies, mappings between Web ontologies, and dynamic Web behaviors of workflows, services, and agents. • Currently, some 30 international members and close collaboration with W3C
The wider Picture: REWERSE • Reasoning on the Web with Rules and Semantics • FP6 Network of Excellence with nearly 30 partners • Working groups on Infrastructure and Applications • Composition • Typing • Policies • Querying • Reactivity and evolution • Personalised Web sites • Calendar systems • Bioinformatics
Part I: Theory • Motivation: Expressive Knowledge Representation • Part I.a: Argumentation as LP semantics • Notions of attack and justified arguments • Hierarchy of semantics • Proof procedure • Part I.b: Fuzzy unification and argumentation • Fuzzy negation • Fuzzy argumentation • Fuzzy unification • Part I.c: Vivid Agents
Part I.a: A Hierarchy of Semantics • RuleML caters for different degrees of knowledge representation • A hierarchy of semantics is required to guarantee inter-operation. • Analogy: In HTML, <b>Michael</b> will be rendered differently in Netscape (Michael in bold type) and in the text-based browser Lynx (Michael with whatever emphasis the terminal offers). • Problem: How can we guarantee inter-operability between different interpretations of rules?
Knowledge representation • Pete earns 500.000$ p.a. • earns(pete,500000). • Cross the street if there are no cars • cross ← not car • cross ← ¬car • The fridge is quite cheap • cheap(fridge):70% • Does Mike live in Londn? • address(mike,london) = address(mike,londn): 95%
Knowledge System Cube • r: relational • f: fuzzy • d: deductive • DB: database • FB: factbase • [Figure: cube with corners rDB, rFB, fDB, fFB, dDB, dFB, fdDB, fdFB and axes fuzzy, deductive, negation]
Part I.a: Argumentation as semantics for Extended Logic Programs • f: fuzzy • d: deductive • DB: database • FB: factbase • [Figure: knowledge system cube with corners rDB, rFB, fDB, fFB, dDB, dFB, fdDB, fdFB and axes fuzzy, deductive, negation]
Extended Logic Programming • Logic Programming with 2 negations • Default negation: not p : true if all attempts to prove p fail. • Explicit negation: ¬p : falsehood of a literal may be stated explicitly. • Coherence principle: ¬p → not p
Argumentation • Interaction between agents in order to • gain knowledge • revise existing knowledge • convince the opponent • solve conflicts • Elegant way to define semantics for (extended) logic programming • Dung • Kowalski, Toni, Sadri • Prakken & Sartor • Etc.
Arguments • An argument is a partial proof, with implicitly negated literals as assumptions. • Argument = sequence of rules
Attacking arguments • Two fundamental kinds of attack: • A undercuts B = A invalidates premise of B • P: Let’s go to the lake as it is not snowing anymore • O: Hang on, it is snowing • A rebuts B = A contradicts B • P: Let’s go to the lake as it is not snowing • O: Let’s not, as I’ve got to prepare my talk • Derived notions of attack used in the literature: • A attacks B = A u B or A r B • A defeats B = A u B or (A r B and not B u A) • A strongly attacks B = A a B and not B u A • A strongly undercuts B = A u B and not B u A
Proposition: Hierarchy of attacks • Attacks = a = u ∪ r • Defeats = d = u ∪ (r − u⁻¹) • Undercuts = u • Strongly attacks = sa = (u ∪ r) − u⁻¹ • Strongly undercuts = su = u − u⁻¹
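The derived notions of attack are plain set operations on the undercut and rebut relations, so the hierarchy can be checked on small examples. A minimal Python sketch, assuming relations are encoded as sets of (attacker, attacked) pairs; the function name derived_attacks and the arguments A, B, C are hypothetical.

```python
def derived_attacks(undercuts, rebuts):
    """Derive attacks, defeats, strong attacks and strong undercuts from u and r."""
    u, r = set(undercuts), set(rebuts)
    # Inverse of u: (a, b) is in u_inv exactly when b undercuts a.
    u_inv = {(b, a) for a, b in u}
    return {
        "a": u | r,              # attacks
        "d": u | (r - u_inv),    # defeats
        "u": u,                  # undercuts
        "sa": (u | r) - u_inv,   # strongly attacks
        "su": u - u_inv,         # strongly undercuts
    }


# Hypothetical relations over three arguments.
u = {("A", "B"), ("B", "A"), ("C", "A")}
r = {("A", "C"), ("C", "A"), ("B", "C")}
rel = derived_attacks(u, r)

# The inclusions of the hierarchy: su ⊆ u ⊆ d ⊆ a and su ⊆ sa ⊆ d.
print(rel["su"] <= rel["u"] <= rel["d"] <= rel["a"])
print(rel["su"] <= rel["sa"] <= rel["d"])
```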
Fixpoint Semantics • Argumentation: • game between proponent and opponent • argument A is acceptable if opponent’s x-attack is countered by proponent’s y-attack, which proponent already accepted earlier. • Acceptable • Let x,y be notions of attack. • An argument A is x,y-acceptable w.r.t. a set of arguments S iff • for every argument B, such that (B,A) ∈ x, there is a C ∈ S such that (C,B) ∈ y • Fixpoint semantics • Fx/y (S) = { A | A is x,y-acceptable w.r.t. S } • x/y-justified arguments = Least Fixpoint of Fx/y. • x/y-overruled arguments = x-attacked by a justified argument. • x/y-defensible iff neither justified nor overruled
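For a finite set of arguments the least fixpoint can be reached by iterating Fx/y from the empty set. The following Python sketch assumes attack relations given as sets of (attacker, attacked) pairs; the names classify and acceptable and the example arguments are illustrative only.

```python
def classify(arguments, x, y):
    """Split arguments into x/y-justified, overruled and defensible ones."""

    def acceptable(a, s):
        # Every x-attacker b of a must be y-attacked by some c already in s.
        return all(
            any((c, b) in y for c in s)
            for b in arguments
            if (b, a) in x
        )

    # Iterate F from the empty set until nothing changes (least fixpoint).
    justified = set()
    while True:
        nxt = {a for a in arguments if acceptable(a, justified)}
        if nxt == justified:
            break
        justified = nxt

    overruled = {b for b in arguments if any((a, b) in x for a in justified)}
    defensible = set(arguments) - justified - overruled
    return justified, overruled, defensible


# Hypothetical chain: C attacks B, B attacks A.
chain = {("C", "B"), ("B", "A")}
justified, overruled, defensible = classify({"A", "B", "C"}, chain, chain)
print(justified, overruled, defensible)
```

In the chain, C is unattacked and therefore justified, B is overruled by C, and A becomes justified because its only attacker is countered. Two arguments that attack each other would both end up defensible.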
Theorem: Relationship of semantics • Weakening opponent or strengthening proponent increases justified arguments • Different notions of acceptability give rise to different argumentation semantics • If opponent is allowed to attack, type of defense does not matter: a/su=a/u=a/a=a/d=a/sa (Dung’s grounded argumentation semantics) • If opponent is allowed defeat, type of defense does not matter: d/su=d/u=d/a=d/d=d/sa (Prakken and Sartor’s semantics w/o priorities) • If opponent is allowed undercut, defense with (a,d,sa) or without (su,u) rebut makes a difference: u/a=u/d=u/sa (WFSX) versus u/su=u/u • Remaining classes in the diagram: su/a=su/d, su/u, su/sa, sa/u=sa/d=sa/a, sa/su=sa/sa
Proof procedure • Dialogues: • x/y-dialogue is sequence of moves such that • Proponent and Opponent alternate • Players cannot repeat arguments • Opponent x-attacks Proponent’s last argument • Proponent y-attacks Opponent’s last argument • Player wins dialogue if other player cannot move • Argument A is provably justified if proponent wins all branches of dialogue tree with root A • Concrete implementation SLXA: • Since u/a=u/d=u/sa=WFSX compute justified arguments with top-down proof procedure SLXA for WFSX [Alferes, Damasio, Pereira] • SLXA can be adapted for other notions
Part I.b: Fuzzy unification and argumentation • r: relational • f: fuzzy • d: deductive • DB: database • FB: factbase • [Figure: knowledge system cube with corners rDB, rFB, fDB, fFB, dDB, dFB, fdDB, fdFB and axes fuzzy, deductive, negation]
Classical Fuzzy Logic • Solution: • Truth values in [0,1] instead of {0,1}. • Assertions: • p:V (p a formula, V a truth value). • Conjunction: • p:V, q:W ⇒ p ∧ q : min(V,W) • Disjunction: • p:V, q:W ⇒ p ∨ q : max(V,W) • Inference: • p ← q1, …, qn ; q1:V1, …, qn:Vn ⇒ p : min(V1, …, Vn)
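The three operations amount to taking minima and maxima over truth values. A small Python sketch with hypothetical literals q1, q2 and made-up truth values; the helper names conj, disj and infer are not from the slides.

```python
def conj(*values):
    """Truth value of a conjunction: the minimum of its parts."""
    return min(values)


def disj(*values):
    """Truth value of a disjunction: the maximum of its parts."""
    return max(values)


def infer(body, facts):
    """Truth value derived for the head of a rule p <- q1, ..., qn."""
    return conj(*(facts[q] for q in body))


facts = {"q1": 0.7, "q2": 0.9}      # hypothetical truth values
p = infer(["q1", "q2"], facts)      # value derived for p <- q1, q2
print(p, disj(facts["q1"], facts["q2"]))
```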
Fuzzy Negation • Classical fuzzy negation: • L:V ⇒ ¬L : 1−V (Zadeh) • Our setting (fuzzy adaptation of WFSX): • L:V and ¬L:V’ with V’ ≠ 1−V possible • L and ¬L not directly related.
Fuzzy Coherence Principle • If ¬L:V and V > 0, and not L:V’, then V’ ≥ V. • “If there is some explicit evidence that L is false, then there is at least the same evidence that L is false by default.” • If ¬L:V and V > 0, then not L: 1.
Law of excluded contradiction and law of excluded middle • p ∧ ¬p : V with V > 0 possible (contradictory programs!) • not p ∧ ¬p : V with V > 0 possible (by coherence principle!) • not p ∧ p : V with V > 0: contradiction removal • p ∨ ¬p : V with V = 0 possible (p is unknown)
Strength of an argument • Strength of an argument: • Fact: value is given • Rule: minimum of body literals • Argument: Conclusion • Least fuzzy value of the facts contributing to the argument.
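Because every rule takes the minimum over its body, the value of the conclusion equals the least value among the facts the argument rests on. A Python sketch under simplifying assumptions that are not in the slides: one rule per head, no cycles, and hypothetical literals buy, useful and needed next to the cheap fact from the earlier example.

```python
def conclusion_value(literal, rules, fact_values):
    """Value of a literal: given for facts, minimum over the body for rules."""
    if literal in fact_values:
        return fact_values[literal]
    return min(conclusion_value(q, rules, fact_values) for q in rules[literal])


# Hypothetical argument: buy <- cheap, useful.   useful <- needed.
rules = {"buy": ["cheap", "useful"], "useful": ["needed"]}
fact_values = {"cheap": 0.7, "needed": 0.9}

strength = conclusion_value("buy", rules, fact_values)
print(strength, min(fact_values.values()))
```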
Theorems • Theorem (Soundness and Completeness) There is a justified argument of strength V for L iff There is a successful T-tree of truth value V for L • Theorem (Conservative Extension) Argumentation semantics is a conservative extension of WFSX.
Application: Fuzzy unification • Open systems: • knowledge and ontologies may not match • interaction with humans • “Does Mike live in Londn?” • Approach: • address(mike,london) = address(mike,londn): 95% • adapt unification algorithm (normalised edit distance over trees, net) • embed into argumentation framework
Finding Mismatches: Edit distance • Edit distance between strings A and B: • minimal number of delete, add, replace operations to convert A into B. • efficient implementation with dynamic programming • Example: • e(address,adresse)=2, e(007,aa7)=2 • Normalise: • ne(A,B) = e(A,B) / max{ |A|, |B| } • Trees: • net = sum of all mismatches divided by sum of all max lengths
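The definitions above can be sketched in Python with the usual dynamic-programming table. The treatment of terms is a simplifying assumption: a term is a flat tuple (functor, arg1, ..., argn) of strings and both terms have the same shape, whereas the slides speak of trees in general. All function names are illustrative.

```python
def edit_distance(a, b):
    """Minimal number of delete, add, replace operations turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # add cb
                prev[j - 1] + (ca != cb),   # replace, free if characters match
            ))
        prev = curr
    return prev[-1]


def ne(a, b):
    """Normalised edit distance e(A,B) / max{|A|, |B|}."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def net(t1, t2):
    """Sum of all mismatches divided by sum of all max lengths (flat terms only)."""
    mismatches = sum(edit_distance(x, y) for x, y in zip(t1, t2))
    lengths = sum(max(len(x), len(y)) for x, y in zip(t1, t2))
    return mismatches / lengths if lengths else 0.0


print(edit_distance("address", "adresse"), edit_distance("007", "aa7"))
distance = net(("address", "mike", "london"), ("address", "mike", "londn"))
print(distance)
```

Under these assumptions the two address terms differ by one edit over a total length of 17, so the distance is 1/17 and the match value 1 − net is a little above 0.94.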
Fuzzy unification and arguments • net is conservative extension of MGU (most general unifier) • net(t,t’) ne(t,t’) • Adapt definition of argument for fuzzy unification • V-argument: for all L in a body, there is L’ in head such that net(L,L’) ≤ 1−V • A V-undercuts B if A contains not L and B’s head is L’ and net(L,L’) ≤ 1−V • A V-rebuts B if A’s head is ¬L and B’s head is L’ and net(L,L’) ≤ 1−V • Adapt previous definitions accordingly
Comparison: Argumentation • Our framework allows us to relate existing and new argumentation semantics: • Dung = a/su=a/u=a/a=a/d=a/sa • Prakken&Sartor = d/su=d/u=d/a=d/d=d/sa • WFSX = u/a = u/d = u/sa • Dung ⊆ Prakken&Sartor ⊆ WFSX • Proof Theory and Top-down Proof Procedure adapted from Alferes, Damasio, Pereira’s SLXA
Comparison: Fuzzy Argumentation • Wagner: • Scale: -1 to +1 • Unlike WFSX, he relates F and ¬F: ¬F: −V iff F:V • We adopted his interpretation for not: not F:1 if ¬F:V, V>0 • Relates his work to stable models, but there is no top-down proof procedure for stable models [Alferes&Pereira] • Our approach conservatively extends WFSX, hence we can adapt proof procedure SLXA
Comparison: Fuzzy unification • Arcelli, Formato, Gerla • define abstract fuzzy unification/resolution framework • cannot deal with missing parameters (common problem [Fung et al.]) • no conservative extension of classical unification • we use concrete distance: edit distance • Evaluated idea on bioinfo DB
Conclusion • “A database needs two kinds of negation” (Wagner) • Argumentation is an elegant way of defining semantics • Our framework allows classification of various new and existing semantics • Efficient top-down proof procedure for justified arguments • Argumentation as basis for belief revision (REVISE) • We cover the whole knowledge system cube including fuzzy argumentation • Defined fuzzy unification, which is useful in open systems
Part I.c: Vivid Agent • A vivid agent is a software-controlled system, • whose state is represented by a knowledge base and • whose behaviour is represented by • action- and • reaction rules • Actions are planned and executed to achieve a goal • Reactions are triggered by events • Epistemic RR: Effect <- Event, Cond • Physical RR: Action, Effect <- Event, Cond • Interaction RR: Msg, Effect <- Event, Cond
Vivid Agent • [Figure: agent architecture with Interface, Perception, Events, Reaction Rules (reaction cycle), KB (beliefs and updates), Goals, Intentions, Action rules and Planner]
Agent State and Transition Semantics • Agent State: • Event queue, Plan queue, Goal queue, Knowledge base • Transition semantics • Perception • Add event to agent’s event queue • Reaction • Pop event from event queue, execute reactions including update of knowledge base • Plan execution • Execute action of plan in plan queue • Replanning • If action fails, replan • Planning • Pop goal from goal queue and generate plan
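The transitions can be read as one step function over the four parts of the agent state. The Python sketch below is only an illustration: the class VividAgent, the encoding of reaction rules as functions on the knowledge base, actions as functions returning success or failure, and the priority of events over plans over goals are all assumptions, and it is unrelated to the Prova or PVM-Prolog code.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VividAgent:
    reaction_rules: dict      # event name -> function(kb) that updates the KB
    planner: Callable         # (goal, kb) -> list of actions
    events: deque = field(default_factory=deque)
    plans: deque = field(default_factory=deque)
    goals: deque = field(default_factory=deque)
    kb: set = field(default_factory=set)

    def perceive(self, event):
        """Perception: add the event to the event queue."""
        self.events.append(event)

    def step(self):
        """Perform one transition (assumed priority: events, plans, goals)."""
        if self.events:
            # Reaction: pop an event and run its reaction rule on the KB.
            rule = self.reaction_rules.get(self.events.popleft())
            if rule is not None:
                rule(self.kb)
        elif self.plans:
            # Plan execution: run the next action of the first plan.
            goal, plan = self.plans.popleft()
            action = plan.pop(0)
            if not action(self.kb):
                plan = self.planner(goal, self.kb)   # Replanning on failure
            if plan:
                self.plans.appendleft((goal, plan))
        elif self.goals:
            # Planning: pop a goal and generate a plan for it.
            goal = self.goals.popleft()
            plan = self.planner(goal, self.kb)
            if plan:
                self.plans.append((goal, plan))


def door_opened(kb):
    kb.add("door_open")


def walk_in(kb):
    kb.add("inside")
    return True               # the action succeeded


agent = VividAgent(
    reaction_rules={"door_opens": door_opened},
    planner=lambda goal, kb: [walk_in],
)
agent.goals.append("be_inside")
agent.perceive("door_opens")
for _ in range(3):            # reaction, planning, plan execution
    agent.step()
print(sorted(agent.kb))
```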
Implementation in Prova • Original Implementation in PVM-Prolog • Coarse-grained parallelism (PVM) for each agent and Prolog threads for an agent’s components • Currently: Prova • is a Java-based rule engine • easy integration of all kinds of data sources, e.g. databases, web services, etc.
Part II: Application to Bioinformatics • NSF and EU’s strategic research workshop found that bioinformatics could play for the Semantic Web the role that physics played for the Web. • Why? • Masses of information • Masses of publicly accessible online information • (e.g. 8000 abstracts per month and over 500 tools) • Data (more and more often) published in XML • Data standards are accepted and actively developed • Much valuable information scattered (as production cheap and hence not centralised) • Systems integration and interoperation prime concern (e.g. GeneOntology)
Example: Information Agents for… • … Protein interactions • PDB, SCOP • … Protein annotation • TOPPred, HMMTOP,… • Information source • Wrapper • Mediator • Facilitator • [Figure: architecture with a Facilitator, a Mediator, three Wrappers and three Sources]
Example 1: Protein Interaction: • PDB: Protein structures • SCOP: Structure classification
Example 1: Protein Interaction: How it is currently done • PDB: 15 gigabytes in flat files • SCOP: 3 flat files • How? • Download PDB, SCOP files • Think up DB schema and populate MySQL DB • Run some Perl scripts on various machines that grind through the data and analyse it • Run some Java to visualise results • Problem: “Business logic” not separated