HNC Data Alignment Research Direction
Richard Rohwer, Senior Principal Scientist, Advanced Technologies, HNC Software / Fair Isaac
Cognition needs Semantics needs Massive Data
• Theorem: Probability distributions are the UNIQUE logically consistent knowledge representation.
• Semantics / Meaning = Association Statistics
[Diagram labels: KNOWLEDGE; Tacit Knowledge (includes Semantics / Meaning = Association Statistics); Explicit Knowledge; Statistics; Information Organization; Reasoning; Statistics; Massive Data]
From massive data to machine cognition: The technical principles
• Mathematical ingredients:
  • Association-Grounded Semantics (AGS)
    • To capture meaning mathematically.
  • Semantically-Driven Segmentation (SDS)
    • To extract the most meaningful patterns.
  • Distributional Alignment (DA)
    • To compare meanings abstractly.
  • Semantically Enriched Reasoning Engine
    • To think in terms of meanings instead of symbols.
Association-Grounded Semantics (AGS): Meaning = Usage
[Diagram: clusters of words with similar usage, from the Cables corpus]
• fro, onto, reaching, acrs, btwn, beyond, frm, inside, alg, across, via, thru, ovr, around, near, between, within, through, into, over, by, from, at
• jun, sept, apr, jul, nov, oct, dec, aug, feb, sep, jan
• bsb, msj, tng, opv, adm, atm, cpo, bdo, notal, u, b
• captain, mr, gen, msgt, ltc, tsgt, cpt, sgt, ssgt, capt, maj, lt
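The "Meaning = Usage" idea can be illustrated with a minimal sketch: represent each word by the probability distribution of its neighboring words, then compare words by comparing those distributions. Everything here is an assumption for illustration only: the toy sentences, the one-word context window, the function names, and the choice of Jensen-Shannon divergence as the comparison. The slides do not say which statistics AGS actually uses.

```python
from collections import Counter, defaultdict
import math


def context_distributions(sentences, window=1):
    """Map each word to the distribution of words seen within `window` positions."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return {
        word: {ctx: n / sum(ctr.values()) for ctx, n in ctr.items()}
        for word, ctr in counts.items()
    }


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical usage, at most 1."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mix(a):
        return sum(a[k] * math.log2(a[k] / mix[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


# Hypothetical toy corpus, loosely in the style of the cable vocabulary above.
sentences = [
    "meeting in jan with capt smith".split(),
    "meeting in feb with maj jones".split(),
    "report in jan from capt lee".split(),
    "report in feb from maj smith".split(),
]
dists = context_distributions(sentences)

same_kind = js_divergence(dists["jan"], dists["feb"])    # two month names
other_kind = js_divergence(dists["jan"], dists["capt"])  # month vs. rank
print(same_kind, other_kind)
```

In this toy corpus the two month names are used in identical contexts, so their divergence is zero, while a month and a rank diverge. Grouping words by such divergences is one plausible way clusters like those on the slide could arise.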
Distributional Alignment (DA): Abstraction ~ Structural Commonality
• Align semantic spaces by distribution of content.
  • No need to understand content.
• Transport meaning between
  • Languages
  • Dialects
  • Cultures
• Transport metaphorically between topics.
transLign algorithm:
• No language knowledge.
• No tie words.
• No aligned corpora.
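To make "align by structure, without tie words" concrete, here is a toy sketch. It is not the transLign algorithm, which the slides do not describe. It assumes each vocabulary comes with pairwise dissimilarities between its own words, summarizes each word by its sorted distances to the other words in the same space (a signature that ignores word identities), and matches words across spaces by closest signature. All names and numbers are hypothetical.

```python
def symmetric(pairs):
    """Build a symmetric nested dict from {(a, b): distance}."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def signature(word, dist):
    """Sorted distances from `word` to every other word in its own space."""
    return sorted(dist[word][other] for other in dist if other != word)


def align(dist_a, dist_b):
    """Match each word in space A to the word in space B with the closest signature."""
    sig_a = {w: signature(w, dist_a) for w in dist_a}
    sig_b = {w: signature(w, dist_b) for w in dist_b}

    def gap(s, t):
        return sum((x - y) ** 2 for x, y in zip(s, t))

    return {a: min(sig_b, key=lambda b: gap(sig_a[a], sig_b[b])) for a in sig_a}


# Hypothetical dissimilarities inside two vocabularies that share no words.
space_a = symmetric({("x", "y"): 0.10, ("x", "z"): 0.50, ("y", "z"): 0.90})
space_b = symmetric({("u", "v"): 0.12, ("u", "w"): 0.48, ("v", "w"): 0.88})

mapping = align(space_a, space_b)
print(mapping)
```

The matching uses only the internal geometry of each space; no word in one vocabulary is ever compared directly with a word in the other. Greedy nearest-signature matching is a deliberate simplification: it can map two words to the same target and breaks down when signatures are nearly identical.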
Alignment: Terminology
[Diagram: the term “bank” (“bank note”, “river bank”) and a “What'cha call it?” placed in an AGS Semantic Space. Sources shown: Cable English; Foreign Newswire; RP English; Less Commonly Taught Language Newswire; English Terror Cell Obfuscated Slang; Blog Dialects; Institutional Dialects; Professional Dialects. Other labels: fluffy, snow, Naïve Bayes.]
• Automation: AGS techniques do not require manually constructed resources, but can use them when available.
• Information Loss (Unequal expressive power)
• Polysemy (Sense resolution)
• Good solutions from NIMD:
  • Entity Disambiguation (5.5% err vs. 13.5% err in KDD)
  • General terms
Alignment: Schemata
[Diagram: two schemata, each with Natural Language Corpora, a Table name, Column names, Instances under each column, Instance Statistics (Joined across schema), and a Schema Graph. Table names and Column names are linked by Semantic Alignment; the Schema Graphs are linked by Structural Alignment.]
Alignment: Ontologies
• More complex graph structure
  • Reflecting multiple (transitive) relations
  • is-a, part-of, reports-to, prerequisite-for, …
• Implies more options for defining AGS statistics
  • More relations, more ways to define co-occurrence.
• Big Picture issue:
  • Ontological structure makes general statements about instances of relationships within data.
  • So does AGS.
  • How are these related?
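The point that more relations give more ways to define co-occurrence can be sketched as follows. This is an illustration under stated assumptions, not a method from the slides: the triples are invented, and the function names and the choice to treat a (relation, object) pair as a "context" are hypothetical. Treating a relation as transitive, or not, changes the statistics each entity receives.

```python
from collections import Counter, defaultdict


def transitive_closure(triples, relation):
    """For one relation, map each subject to everything reachable along it."""
    edges = defaultdict(set)
    for subj, rel, obj in triples:
        if rel == relation:
            edges[subj].add(obj)

    def reach(node, seen):
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                reach(nxt, seen)
        return seen

    return {subj: reach(subj, set()) for subj in list(edges)}


def relation_contexts(triples, transitive=()):
    """Count (relation, object) contexts per subject; close the listed relations transitively."""
    contexts = defaultdict(Counter)
    for subj, rel, obj in triples:
        if rel not in transitive:
            contexts[subj][(rel, obj)] += 1
    for rel in transitive:
        for subj, objs in transitive_closure(triples, rel).items():
            for obj in objs:
                contexts[subj][(rel, obj)] += 1
    return contexts


# Hypothetical ontology fragment, invented for this example.
triples = [
    ("sgt", "is-a", "nco"),
    ("ssgt", "is-a", "nco"),
    ("nco", "is-a", "soldier"),
    ("capt", "is-a", "officer"),
    ("sgt", "reports-to", "capt"),
    ("ssgt", "reports-to", "capt"),
]

direct = relation_contexts(triples)
closed = relation_contexts(triples, transitive=("is-a",))
print(dict(direct["sgt"]))
print(dict(closed["sgt"]))
```

With is-a closed transitively, "sgt" also co-occurs with ("is-a", "soldier"), which the direct counts miss. Normalizing these counts yields per-entity distributions that could be compared in the same way as word-usage distributions.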