Structural Induction: towards Automatic Ontology Elicitation

Structural Induction: towards Automatic Ontology Elicitation Adrian Silvescu

Induction • We go around the world and we interact with it • In the stream of experiences we notice regularities which we call patterns/laws • Sometimes we lump these patterns/laws into more encompassing theories • We call knowledge the set of all patterns that we are aware of at a certain point in time.

Automatic Induction • How can we make artificial machines that are capable of induction? • We need to say: • What do we mean by knowledge? – KRP • How do we derive it from data? – LP

Outline • Introduction • What do we mean by Knowledge? • Abstraction SuperStructuring Normal Form • How to derive it from data? • Combining Abstraction + SuperStructuring • The other chapters from the thesis • Conclusions and Contributions

Computationalism • Computationalistic Assumption (CA): The most general way to represent (finite) theories is as an entity expressed in a Turing equivalent formalism. • a.k.a. Church-Turing thesis

Induction by Enumeration [Solomonoff’64] • Induction : Exp_streams → TM • Out: “smallest” TM which reproduces the data (Exp_stream) • for all Turing Machines • simulate their computations by dovetailing • If TM produces the Exp_stream • update “smallest” TM if needed
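The enumeration loop on this slide can be sketched in a few lines. This is only a toy illustration under strong assumptions: the "machines" here are bit patterns that repeat forever (a hypothetical stand-in, not Turing machines), and the names `programs`, `run` and `induce` are invented for this sketch. It does keep the two key ingredients: programs are enumerated by increasing size, and their computations are dovetailed one step at a time.

```python
from itertools import count, product

def programs():
    """Enumerate toy programs (bit patterns) in order of increasing size."""
    for size in count(1):
        for bits in product("01", repeat=size):
            yield "".join(bits)

def run(program):
    """Toy 'machine' (an assumption of this sketch): emit the pattern forever."""
    while True:
        yield from program

def induce(observed, max_stages=1000):
    """Dovetail over toy machines; return the smallest one reproducing `observed`."""
    if not observed:
        raise ValueError("need a non-empty experience stream")
    active = []                       # entries: [program, generator, symbols matched]
    source = programs()
    for _ in range(max_stages):
        prog = next(source)           # each stage admits one new machine ...
        active.append([prog, run(prog), 0])
        survivors = []
        for entry in active:          # ... and advances every live machine one step
            if next(entry[1]) != observed[entry[2]]:
                continue              # output disagrees with the data: discard
            entry[2] += 1
            if entry[2] == len(observed):
                # Every toy machine emits one symbol per step, so the first
                # machine to finish is also the smallest consistent one.
                return entry[0]
            survivors.append(entry)
        active = survivors
    return None
```

For example, `induce("010101")` returns `"01"`. With real Turing machines the running times differ, which is why the slide keeps updating the "smallest" machine instead of stopping at the first match.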

EE vs. DP - Compositionality • [Diagram: Induction from Data to Theory1, Theory2, …, Theoryn, shown in two arrangements]

Generative grammars • TM equivalent • G = (N, T, S, R) – Theory • N – NonTerminals – Internal Variables • T – Terminals – Observables • R – Rules {α → β}, α, β ∈ (N ∪ T)* – Laws • w ∈ T* – Observations Stream • Derivation S →* w, w ∈ T* – Explanation

Example • S → A|B • A → CD • B → EF • F → GH • EG → J • C|H → K • J → a • K → b • [Diagram: derivation of a b from S through B, E, F, G, H, J, K]
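To make "derivation as explanation" concrete, the example rules can be run as a string rewriting system. The sketch below assumes single-character symbols and expands the alternatives S → A|B and C|H → K into separate rules. The helper names `successors` and `derive` and the `max_len` safety bound are choices of this sketch, not part of the slides.

```python
from collections import deque

# The example grammar, with "|" alternatives expanded into separate rules.
RULES = [("S", "A"), ("S", "B"), ("A", "CD"), ("B", "EF"), ("F", "GH"),
         ("EG", "J"), ("C", "K"), ("H", "K"), ("J", "a"), ("K", "b")]

def successors(form):
    """Yield every sentential form reachable from `form` by one rule application."""
    for lhs, rhs in RULES:
        start = form.find(lhs)
        while start != -1:
            yield form[:start] + rhs + form[start + len(lhs):]
            start = form.find(lhs, start + 1)

def derive(target, start="S", max_len=8):
    """Breadth-first search for a shortest derivation start ->* target."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            path = []
            while form is not None:   # walk back to the start symbol
                path.append(form)
                form = parent[form]
            return path[::-1]
        for nxt in successors(form):
            # max_len is a safety bound for recursive grammars (sketch assumption)
            if nxt not in parent and len(nxt) <= max_len:
                parent[nxt] = form
                queue.append(nxt)
    return None
```

Calling `derive("ab")` returns a derivation that starts at `S`, passes through `B`, `EF`, `EGH` and `JH`, and ends at `ab`. The branch through A stalls because D has no rule in this example.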

Motivation • There are infinitely many rules α → β that we can invent – how can we get a finite set of atoms? • We search for a fundamental set of operations from which theories can be constructed

Fundamental Operations • Abstraction - grouping similar entities under one overarching category. • e.g., (cow, pig, monkey) → mammal • Super-Structuring - grouping into a unit topologically close entities - in particular spatio-temporally close • e.g., (chassis, [on top of] wheels) → car

Main Theorem: GEN-ASNF • Abstraction SuperStructuring Normal Form: • Any Grammar G = (N, T, S, R) can be rewritten using only rules of the form: • 1. A → B – Renaming (REN) • 2. A → BC – SuperStructure (SS) • 3. A → a – TERMINAL • 4. AB → C – Reverse SS (RSS) • Rules 2-4 can be made strongly unique
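The four rule shapes of the normal form are easy to check mechanically. The following is a minimal sketch, assuming single-character symbols and an explicit set of terminals. The function names `classify_rule` and `is_asnf` are hypothetical, and the sketch only recognizes the shapes; it does not perform the rewriting that the theorem guarantees.

```python
def classify_rule(lhs, rhs, terminals):
    """Return the ASNF type of the rule lhs -> rhs, or None if it has no ASNF shape."""
    def nonterminal(symbol):
        return symbol not in terminals

    if len(lhs) == 1 and nonterminal(lhs):
        if len(rhs) == 1:
            # A -> a is TERMINAL, A -> B is a renaming
            return "TERMINAL" if rhs in terminals else "REN"
        if len(rhs) == 2 and all(map(nonterminal, rhs)):
            return "SS"                       # A -> BC
    if len(lhs) == 2 and len(rhs) == 1 and all(map(nonterminal, lhs + rhs)):
        return "RSS"                          # AB -> C
    return None

def is_asnf(rules, terminals):
    """True if every (lhs, rhs) pair has one of the four ASNF shapes."""
    return all(classify_rule(lhs, rhs, terminals) for lhs, rhs in rules)
```

For instance, with terminals `{"a", "b"}`, the rule `("EG", "J")` is classified as `"RSS"` and `("A", "CD")` as `"SS"`, while `("A", "BCD")` is rejected.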

Renamings and Abstractions • Renamings form a directed graph G = (N, {(A,B) ∈ REN}) • A → B1, …, A → Bn – Abstraction (ABS) • A1 → B, …, An → B – Reverse Abstraction (RABS) • Example rules: S → A|B • A → CD • B → EF • F → GH • EG → J • C|H → K • J → a • K → b • [Diagram: renaming graph over S, A, B, C, H, K]
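Reading ABS and RABS off the renaming graph amounts to grouping edges by source and by target. The sketch below does this for the renamings of the running example. It assumes that a group counts as an abstraction only when it has more than one member (n > 1); the function name `abstractions` is invented for this sketch.

```python
from collections import defaultdict

def abstractions(renamings):
    """Group renaming rules A -> B into ABS (shared source) and RABS (shared target)."""
    out_edges, in_edges = defaultdict(set), defaultdict(set)
    for a, b in renamings:
        out_edges[a].add(b)
        in_edges[b].add(a)
    # Assumption of this sketch: a single renaming is not counted as an abstraction.
    abs_rules = {a: sorted(bs) for a, bs in out_edges.items() if len(bs) > 1}
    rabs_rules = {b: sorted(srcs) for b, srcs in in_edges.items() if len(srcs) > 1}
    return abs_rules, rabs_rules

# Renamings of the example: S -> A|B and C|H -> K.
REN = [("S", "A"), ("S", "B"), ("C", "K"), ("H", "K")]
abs_rules, rabs_rules = abstractions(REN)
```

Here `abs_rules` is `{"S": ["A", "B"]}` and `rabs_rules` is `{"K": ["C", "H"]}`, matching the two groups visible in the example grammar.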

Fundamental Operations • A → B|C – Abstraction (ABS) • A → BC – SuperStructure (SS) • A → a – TERMINAL • A|B → C – Reverse ABS (RABS) • AB → C – Reverse SS (RSS)

Example • S → A|B • A → CD • B → EF • F → GH • EG → J • C|H → K • J → a • K → b • [Diagram: derivation of a b from S through B, E, F, G, H, J, K]

Two types of hidden variables • Mixture models – RABS • H1 → A, H2 → A • Either cause can produce the effect • Co-occurring causes – RSS • H1H2 → A • Both causes need to be present and must also respect the topological constraints • For a complete topology this is just AND

Radical Positivism • Empirical laws only • Every contraption directly traceable to Observables • Hidden Variables are eliminated – only indirect connection • No RABS or RSS • [Diagram: S connected to w using ABS+SS only]

God’s e-mail • [Diagram labels: w, RABS+RSS, REN+SS, S] • Conjecture: ABS + SS

Hume’s Claim • “I do not find that any philosopher has attempted to enumerate or class all the principles of association [of ideas]. ... To me, there appear to be only three principles of connexion among ideas, namely, Resemblance, Contiguity in time or place, and Cause and Effect” – David Hume, Enquiry concerning Human Understanding, III(19), 1748. • Resemblance – Abstraction (ABS) • Contiguity – SuperStructuring (SS) • Cause & Effect – RABS + RSS

Theory Review • Abstraction + SuperStructuring Thesis • ABS, SS, RABS, RSS are enough for Turing equivalence • Rationales for Hidden Variables • RSS and RABS • Radical Positivism (ABS + SS) • Proof of Hume’s claim (under CA)

Induction of ABS+SS models • Abstraction and SuperStructuring only • No recursion • Radical Positivist Setup (w/o recursion) • Sequence classification setup • Superstructures are k-grams

Sequence Classification with feature construction • [Diagram: sequence S1 … S6 → Construct Features → Classifier (Naïve Bayes Multinomial) → Class]

Feature Construction – ABS + SS • [Diagram: sequence S1 … S6 → 2-grams S1S2, S2S3, S3S4, S4S5, S5S6 → abstractions A1:{S1S2,S3S4}, A2:{S2S3}, A3:{S4S5,S5S6} → Classifier → Class]
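The pipeline in this diagram can be sketched directly: SuperStructuring turns the sequence into k-grams, and Abstraction maps each k-gram to the group it belongs to. The code below reproduces the slide's example with k = 2. The function names and the dictionary encoding of the abstraction are assumptions of this sketch, and k-grams outside every group are simply kept as their own feature.

```python
from collections import Counter

def kgrams(seq, k):
    """SuperStructuring: all contiguous k-grams of the sequence."""
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]

def abstract_features(seq, k, abstraction):
    """Count k-grams after replacing each one by its abstraction, if it has one."""
    counts = Counter()
    for gram in kgrams(seq, k):
        counts[abstraction.get(gram, gram)] += 1
    return counts

# Abstraction from the slide: A1:{S1S2,S3S4}, A2:{S2S3}, A3:{S4S5,S5S6}.
ABSTRACTION = {
    ("S1", "S2"): "A1", ("S3", "S4"): "A1",
    ("S2", "S3"): "A2",
    ("S4", "S5"): "A3", ("S5", "S6"): "A3",
}
features = abstract_features(["S1", "S2", "S3", "S4", "S5", "S6"], 2, ABSTRACTION)
```

The five 2-grams collapse into three features with counts A1 = 2, A2 = 1 and A3 = 2. Count vectors of this kind are what a multinomial Naïve Bayes classifier takes as input.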

Learning Abstractions • [Diagram: abstraction hierarchy built bottom-up by repeatedly merging the most similar nodes; leaves {k1} … {k9}; internal nodes {k2,k3}, {k5,k6}, {k7,k8}, {k2,k3,k4}, {k7,k8,k9}, {k2,k3,k4,k5,k6}, {k1,k2,k3,k4,k5,k6}; root All]

Similarity Distance • Distance between P(C|f1) and P(C|f2) where f1 appears n1 times and f2 appears n2 times in the dataset
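The bottom-up merging of the most similar features can be sketched as greedy agglomerative clustering. The slide's distance formula did not survive extraction, so the sketch below substitutes a weighted Jensen-Shannon divergence between P(C|f1) and P(C|f2), with weights proportional to n1 and n2. That choice is an assumption of this sketch, not a statement of the author's formula, and the names `weighted_js` and `learn_abstractions` are invented here.

```python
import math
from itertools import combinations

def weighted_js(counts1, counts2):
    """Weighted Jensen-Shannon divergence between two class distributions.

    counts1[c] and counts2[c] are the class counts of features f1 and f2,
    so n1 = sum(counts1) and n2 = sum(counts2) act as the weights.
    (Assumed distance; both features must occur at least once.)
    """
    n1, n2 = sum(counts1), sum(counts2)
    w1, w2 = n1 / (n1 + n2), n2 / (n1 + n2)
    p1 = [c / n1 for c in counts1]
    p2 = [c / n2 for c in counts2]
    mix = [w1 * a + w2 * b for a, b in zip(p1, p2)]

    def kl(p):
        return sum(x * math.log(x / m) for x, m in zip(p, mix) if x > 0)

    return w1 * kl(p1) + w2 * kl(p2)

def learn_abstractions(class_counts, num_abstractions):
    """Greedily merge the two most similar clusters until num_abstractions remain."""
    clusters = {frozenset([f]): list(c) for f, c in class_counts.items()}
    while len(clusters) > num_abstractions:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: weighted_js(clusters[pair[0]], clusters[pair[1]]))
        # The merged cluster pools the class counts of its two parts.
        merged = [x + y for x, y in zip(clusters.pop(a), clusters.pop(b))]
        clusters[a | b] = merged
    return clusters
```

With class counts `{"k1": [9, 1], "k2": [8, 2], "k3": [1, 9], "k4": [2, 8]}` and two target abstractions, the sketch groups k1 with k2 and k3 with k4, since those pairs have nearly identical class distributions.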

Data Sets • Protein sequence classification based on sub-cellular localization • 2 datasets: • Eukaryotes (2427 sequences) – 4 classes • Prokaryotes (997 sequences) – 3 classes • Average sequence length ~300 amino acids • unigrams ~20, 2-grams ~400 (20^2), 3-grams ~8000 (20^3)

Experimental setup • UNIGRAM – Base features (~20) • ABS_ONLY – Abstractions of unigrams • SS_ONLY – k-grams (either 2 or 3) => (either ~400 or ~8000 features) • FSEL+SS – Feature Selection applied to SS (k-grams) based on Information Gain • ABS+SS - Abstraction applied to SS (k-grams)

Eukaryotes – 3-grams

Eukaryotes – 2-grams

Prokaryotes – 3-grams

Prokaryotes – 2-grams

Experiments Review • Simplest ABS+SS combination • ABS+SS is better than FSEL+SS, ABS alone or BASE features (accuracy and model size) • For a 1%-2% loss in accuracy, and sometimes even a gain, ABS+SS reduces model size by 1-3 orders of magnitude over SS alone

Temporal Boolean Networks (2) [Silvescu and Honavar 2001]

Naïve Bayes k – NB(k) (3) [Silvescu, Andorf, Dobbs and Honavar 2004], [Andorf, Silvescu, Dobbs and Honavar 2004] • [Diagram: Naïve Bayes over single elements S1 … S6 vs. NB(k) over overlapping 2-grams S1S2, S2S3, S3S4, S4S5, S5S6; label: JTT]

AVT-Learner (Abstractions) (4) [Kang, Silvescu, Zhang and Honavar, 2004] • [Diagram: abstraction hierarchy over Odor values; leaves {m}, {y}, {s}, {f}, {c}, {p}, {a}, {l}, {n}; internal nodes {s,y}, {c,p}, {a,l}, {s,y,f}, {a,l,n}, {s,y,f,c,p}, {m,s,y,f,c,p}; root Odor]

Factorization Theorem: Pairwise to Holistic Decomposability (7) [Silvescu and Honavar 2006]

Conclusions • Abstraction + SuperStructuring thesis • ABS, SS, RABS, RSS are enough to produce any Turing equivalent Grammar • And everything else becomes derivative • Experiments • SuperStructuring only (spatial + temporal) • Abstraction only • Abstraction + SuperStructuring

Future Work • Explore additional setups – e.g., model based feature evaluation • Explore additional methods and search mechanisms • Use Algebraic Geometry / Algebraic Topology as a foundation

Contributions (Theory) • Abstraction SuperStructuring Normal Forms (ABS, SS, RABS, RSS) – enough to achieve Turing eq. (answer to the What? question) • Hidden Variables Characterization • Radical Positivism Position • Hume’s Claim (Computationalism) • Factorization theorem for arbitrary functions into Abelian Groups

Contributions (Experimental) • Exploration of SuperStructuring in both the temporal (TBN - 2) and spatial (NB(k) - 3) domains • Abstraction Learning in the Multivariate case (AVTL - 4) • Abstraction and SuperStructuring combination in the Multinomial case (6)
