Getting Cyc-ed about Inference Christopher Cox
What is Cyc? “The World’s Leading Provider of Formalized Common Sense” (currently ~200,000 terms, each with several assertions; over 1,000,000 rules)
What is Cyc? • Founded in 1984 by Stanford professor Doug Lenat, it was a project in the MCC (Microelectronics and Computer Technology Corporation) until 1994, when Lenat left to form Cycorp • Objective: Codify the millions of pieces of knowledge that comprise common sense • When people die, they stop buying things • Kerosene flows downhill • When a bowl is overturned, its contents fall out.
Common Sense • Cyc’s stated goal: “Break the software brittleness bottleneck once and for all by constructing a foundation of basic common-sense knowledge - a semantic substratum of terms, rules and relations, a deep layer of understanding that can be used by other programs to make them more flexible.” • Basic Common-Sense Knowledge: “In modern America, this encompasses recent history and current affairs, everyday physics, household chemistry, famous books and movies and songs and ads, famous people, nutrition, addition, weather, etc.”
Overview • What is Cyc • OpenCyc, ResearchCyc, Full Cyc • What’s in Cyc? • The Big Picture • Microtheories • Predicates and Functions • Arguments and Types • Lexicon • How do I use it? • Cyc at Stanford • Cyc Browser • Java and Applications Several examples and images come from the more extensive, online OpenCyc Tutorial www.cyc.com/doc/tut
What’s in Cyc? • A Knowledge Base (KB) consisting of terms Dog, DogFood, Doghouse, SnoopDoggyDogg • Assertions that relate these terms. • Ground Assertions: (isa MyDogSharkey BelgianSheepdog) (genls BelgianSheepdog Dog) • Rules, which derive assertions from Ground Assertions: (isa THING COL ) + (genls COL SUPERCOL) ---> (isa THING SUPERCOL)
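The isa/genls rule on this slide can be sketched in a few lines of Python. This is only an illustration of the inference pattern, not how Cyc's inference engine is implemented; the function names `all_genls` and `derive_isa` are made up for the sketch, and the facts are the two ground assertions from the slide.

```python
def all_genls(col, genls):
    """Return col plus every collection reachable from it via genls links."""
    seen, stack = set(), [col]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(genls.get(current, ()))
    return seen


def derive_isa(isa, genls):
    """Apply (isa THING COL) + (genls COL SUPERCOL) ---> (isa THING SUPERCOL)."""
    derived = set()
    for thing, col in isa:
        for supercol in all_genls(col, genls):
            derived.add((thing, supercol))
    return derived


# Ground assertions from the slide.
isa_facts = {("MyDogSharkey", "BelgianSheepdog")}
genls_facts = {"BelgianSheepdog": ["Dog"]}

derived = derive_isa(isa_facts, genls_facts)
for thing, col in sorted(derived):
    print(f"(isa {thing} {col})")
```

The derived set contains the original assertion plus (isa MyDogSharkey Dog); with a deeper genls chain the same loop would keep climbing.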
The Knowledge Base: Knowledge Base Layers • Upper Ontology: Abstract Concepts (EVENT, TEMPORAL-THING, INDIVIDUAL, THING) • Core Theories: Space, Time, Causality, … Example: For all events a and b, a causes b implies a precedes b • Domain-Specific Theories. Example: For any mammal m and any anthrax bacteria a, m’s being exposed to a causes m to be infected by a. • Facts (Database): Instances. Example: John is a person infected by anthrax.
A Dog is a ….. Agent, Agent-Generic, AirBreathingVertebrate, Animal, AnimalBLO, BilateralObject, BiologicalLivingObject, CanineAnimal, Carnivore, CarnivoreOrder, ChordataPhylum, Coelmates, Container-Underspecified, Dog, EukaryoticOrganism, Eutheria, FrontAndBackSidedObject, Heterotroph, HexelateralObject, Homeotherm, HumanScaleObject, Individual, IndividualAgent, LeftAndRightSidedObject, Location-Underspecified, Mammal, NaturalTangibleStuff, NonPersonAnimal, OrganicStuff, Organism-Whole, PartiallyTangible, PerceptualAgent, Region-Underspecified, SentientAnimal, SolidTangibleThing, SomethingExisting, SpatialThing, SpatialThing-Localized, System-Generic, TemporalThing, TerrestrialOrganism, Thing, TopAndBottomSidedObject, Trajector-Underspecified, Vertebrate
Microtheories • A way of grouping assertions and rules that share a set of assumptions about a domain, level of detail, period in time, source, topic, etc. • Each KB assertion occurs within some microtheory • These allow for a KB that copes with global inconsistency and that can focus inference according to the necessary level of detail
Microtheories • Though no monotonic contradictions are allowed inside a microtheory, assertions in different microtheories may be inconsistent • Time MT1: Mandela is an elder statesman MT2: Mandela is the President of South Africa MT3: Mandela is a political prisoner • Granularity/domain MT1: Tables are solid MT2: Tables are mostly space • Microtheories are arranged in an inheritance hierarchy
Microtheory Inheritance: genlMt • [Diagram: genlMt links connecting #$BaseKB, #$NaiveSpatialMt, #$MovementMt, #$NaturalGeographyMt, #$NaivePhysicsMt, and #$TransportationMt; the individual arrows are not recoverable from the extracted text]
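One way to picture what genlMt inheritance buys you: a query posed in a microtheory sees the assertions of that microtheory plus those of every microtheory above it, and nothing from its siblings. The Python below is a toy model under that assumption. The microtheory names (`ToyBaseMt`, `EverydayTablesMt`, `AtomicTablesMt`), the links between them, and the assertion "Tables are furniture" are hypothetical, chosen to mirror the "Tables are solid" versus "Tables are mostly space" example; they are not real Cyc constants.

```python
# Hypothetical genlMt links: each microtheory maps to its more general ones.
GENL_MT = {
    "EverydayTablesMt": ["ToyBaseMt"],
    "AtomicTablesMt": ["ToyBaseMt"],
    "ToyBaseMt": [],
}

# Assertions stored per microtheory (plain English stand-ins for CycL).
ASSERTIONS = {
    "ToyBaseMt": {"Tables are furniture"},
    "EverydayTablesMt": {"Tables are solid"},
    "AtomicTablesMt": {"Tables are mostly space"},
}


def visible_mts(mt, genl_mt):
    """Return mt together with every microtheory it inherits from."""
    seen, stack = set(), [mt]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(genl_mt.get(current, ()))
    return seen


def visible_assertions(mt, genl_mt, assertions):
    """Collect the assertions visible from mt via genlMt inheritance."""
    result = set()
    for ancestor in visible_mts(mt, genl_mt):
        result |= assertions.get(ancestor, set())
    return result


everyday_view = visible_assertions("EverydayTablesMt", GENL_MT, ASSERTIONS)
atomic_view = visible_assertions("AtomicTablesMt", GENL_MT, ASSERTIONS)
print(sorted(everyday_view))
print(sorted(atomic_view))
```

The two sibling views never see each other's claim about tables, which is how the KB as a whole can hold both without either microtheory being inconsistent.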
Predicates and Denotational Functions • Predicates are truth-functional relations which can be evaluated according to facts in the KB and used to make sentences that are true or false • Usually lowercase: (objectHasColor BrownDog Brown) (memberStatusOfOrganization Norway NATO FoundingMember) • Functions take arguments to denote Non-Atomic Terms (NATs), expressions that represent things • Usually uppercase: (FruitFn AppleTree) denotes an apple; (BorderBetweenFn Sweden Norway) denotes the border between Sweden and Norway.
Arity and Argument Types • Every predicate or function is defined with particular arity and argument types • Arity: Number of Arguments (arity mother 2) (arity MotherFn 1) • Argument Types: use isa and genl relations (arg1Isa mother Animal) (arg2Isa mother FemaleAnimal) (arg1Isa TransportViaFn ExistingObjectType) (arg1Genl treatmentTypeAppliedToConditionType MedicalTreatmentEvent)
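To make the role of arity and argument types concrete, here is a rough Python sketch of a well-formedness check. The arity and argIsa constraints for `mother` come from the slide; the individuals `Sharkey` and `Lassie`, the toy isa/genls facts, and the function names are assumptions for illustration, and Cyc's own well-formedness checking is certainly more involved.

```python
ARITY = {"mother": 2}                          # (arity mother 2)
ARG_ISA = {("mother", 1): "Animal",            # (arg1Isa mother Animal)
           ("mother", 2): "FemaleAnimal"}      # (arg2Isa mother FemaleAnimal)

# Hypothetical toy facts, not taken from the Cyc KB.
GENLS = {"FemaleAnimal": ["Animal"], "Dog": ["Animal"]}
ISA = {"Sharkey": ["Dog"], "Lassie": ["Dog", "FemaleAnimal"]}


def is_instance(term, collection):
    """True if term isa collection, directly or through genls links."""
    seen, stack = set(), list(ISA.get(term, ()))
    while stack:
        current = stack.pop()
        if current == collection:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(GENLS.get(current, ()))
    return False


def well_formed(sentence):
    """Check a (predicate arg1 arg2 ...) tuple against arity and argIsa."""
    predicate, *args = sentence
    if ARITY.get(predicate) != len(args):
        return False
    for position, arg in enumerate(args, start=1):
        required = ARG_ISA.get((predicate, position))
        if required is not None and not is_instance(arg, required):
            return False
    return True


print(well_formed(("mother", "Sharkey", "Lassie")))
print(well_formed(("mother", "Lassie", "Sharkey")))
print(well_formed(("mother", "Sharkey")))
```

Only the first sentence passes: the second puts a term that is not known to be a FemaleAnimal in argument 2, and the third has the wrong number of arguments.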
Predicates and Rules • Can be built to form meaningful, well-formed logical sentences • You can add your own, using ASSERT Mt: AgentGMt Rule: (implies (and (isa ?HELP HelpingAnAgent) (performedBy ?HELP ?HELPER) (beneficiary ?HELP ?HELPED) (positiveVestedInterest ?HELPER ?HELPED)
Specialized Content • Cyc has several specialized and useful areas of KB content: • Times and Dates: temporallyIntersects, startsAfterStartingOf, YearsDuration • Spatial Properties and Relations: constituent, ingredient, ~60 “in” predicates, ~60 Shape Attributes • Event Types, with Roles and Actors: MovementEvent, MedicalTreatmentEvent, GivingSomething
The Cyc Lexicon • Cyc also knows a lot about English • There are entries for lexical items as well: Treat-TheWord, Use-TheWord • Several predicates express relationships which translate English expressions into CycL (and vice versa) (verbSemTrans Use-TheWord 0 TransitiveNPFrame (and (isa :ACTION UsingAnObject) (performedBy :ACTION :SUBJECT) (instrument-Generic :ACTION :OBJECT)))
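The verbSemTrans template for Use-TheWord can be read as a pattern whose keywords (:ACTION, :SUBJECT, :OBJECT) get filled in from a parsed sentence. The Python below sketches that substitution step only. The nested-tuple representation, the helper names, and the bindings `Use001`, `Chris`, and `Hammer001` are all hypothetical; this is not the mechanism Cyc's own natural-language tools use.

```python
# The template from the slide, written as nested tuples.
TEMPLATE = (
    "and",
    ("isa", ":ACTION", "UsingAnObject"),
    ("performedBy", ":ACTION", ":SUBJECT"),
    ("instrument-Generic", ":ACTION", ":OBJECT"),
)


def instantiate(template, bindings):
    """Replace each keyword in the template with its bound term."""
    if isinstance(template, tuple):
        return tuple(instantiate(part, bindings) for part in template)
    return bindings.get(template, template)


def to_cycl(expr):
    """Render a nested tuple as a parenthesized CycL-style string."""
    if isinstance(expr, tuple):
        return "(" + " ".join(to_cycl(part) for part in expr) + ")"
    return expr


# Hypothetical bindings, e.g. for a sentence like "Chris used a hammer".
bindings = {":ACTION": "Use001", ":SUBJECT": "Chris", ":OBJECT": "Hammer001"}
result = to_cycl(instantiate(TEMPLATE, bindings))
print(result)
```

With these bindings the sketch produces (and (isa Use001 UsingAnObject) (performedBy Use001 Chris) (instrument-Generic Use001 Hammer001)).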
Important Lexical Predicates • denotation -- Relates a LexicalWord and SpeechPart to some denoted Thing (e.g. some Individual or Collection). • multiWordString -- Relates a list of strings (e.g. ("hot")), a LexicalWord (e.g. Dog-TheWord), and a SpeechPart to some denoted Thing (e.g. HotDog); cf. MultiWordPhrasePredicate. • verbSemTrans -- Relates a LexicalWord, sense number, and SubcategorizationFrame to an NLTemplateExpression; cf. SemTransPredicate. • nameString -- Relates a Thing to a string which (conventionally) refers to it • We’ll do some examples
The Cyc Browser • To run the Cyc KB Browser • Run an image on a ja- machine. • Move to /scr/nlp/src/cyc/cyc1.0enterprise/ • Run ./run-cyc.sh; a Cyc image will start to run on your desktop. • You can use the SubL interactor directly at the prompt • Or you can load up a browser from the ja- machine (you’ll need to forward the desktop image to your machine) and set the address to: http://localhost:3602/cgi-bin/cyccgi/cg?cb-start
Exploring Cyc • http://researchcyc.cyc.com/ • Playing around with the Browser is the only way to really learn what’s in Cyc. • Logging In • The Search Box • The Hierarchy Browser • Documentation (usr/pass: rcyc/rcyc) • Ask • Assert • Query • Toolbar • Don’t use the parser
Example Application: Cyc in RTE • We’re looking at using Cyc in the context of Recognizing Textual Entailment • Dependency parses are a good starting point for Cyc (PID 702, Hypothesis) In the late 1980s Budapest became the center of the reform movement.
RTE in a Nutshell • Graph 1: bought (subj: Chris (person), object: car) • Graph 2: purchased (subj: Chris (person), object: BMW) • Vertex matches: bought/purchased, Synonym Match, Cost: 0.2; Chris/Chris, Exact Match, Cost: 0.0; car/BMW, Hypernym Match, Cost: 0.4 • Vertex Cost: (0.0 + 0.2 + 0.4)/3 = 0.2 • Relation Cost: 0 (Graphs Isomorphic) • Match Cost: 0.55 (0.2) + 0.45 (0.0) = 0.11
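The arithmetic on this slide can be reproduced directly. The sketch below assumes, as the numbers suggest, that the vertex cost is the mean of the per-vertex match costs and that the overall match cost is a weighted sum with weight 0.55 on the vertex cost and 0.45 on the relation cost. The function and parameter names are made up for the example.

```python
def vertex_cost(match_costs):
    """Mean of the per-vertex match costs."""
    return sum(match_costs) / len(match_costs)


def match_cost(vertex, relation, vertex_weight=0.55):
    """Weighted sum of vertex cost and relation cost."""
    return vertex_weight * vertex + (1.0 - vertex_weight) * relation


# Exact match (0.0), synonym match (0.2), hypernym match (0.4).
v_cost = vertex_cost([0.0, 0.2, 0.4])
r_cost = 0.0  # the two graphs are isomorphic
total = match_cost(v_cost, r_cost)

print(round(v_cost, 2), round(total, 2))
```

Rounded to two places, this gives the slide's values of 0.2 for the vertex cost and 0.11 for the match cost.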
Cyc and Java • We clearly need a way to interact with the Cyc KB programmatically • Cyc APIs exist for Java and Python (check out /src/nlp/src/cyc/api/java/OpenCyc.jar) • Documentation is sparse • Cyc could be really valuable, if we can figure out a way to get around what’s missing • I’ve got code (soon to be in JavaNLP) for generic interactions with the Cyc KB, and for searching Cyc space along genls relationships as a measure of verb similarity • It’s a huge KB, so use your imagination
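As a rough idea of what "searching along genls relationships as a measure of verb similarity" might look like, the Python below measures how many genls steps separate two event types from their nearest shared generalization. This is not the JavaNLP code mentioned on the slide and it does not call the Cyc API; it works over a small in-memory graph. `Buying`, `Selling`, `Running`, `ExchangeEvent`, `Event`, and the genls links between them are hypothetical; only `MovementEvent` is a term named earlier in these slides.

```python
from collections import deque

# Hypothetical genls links between event types (child -> parents).
GENLS = {
    "Buying": ["ExchangeEvent"],
    "Selling": ["ExchangeEvent"],
    "ExchangeEvent": ["Event"],
    "Running": ["MovementEvent"],
    "MovementEvent": ["Event"],
}


def upward_distances(start, genls):
    """Breadth-first search upward: steps from start to each generalization."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for parent in genls.get(current, ()):
            if parent not in distances:
                distances[parent] = distances[current] + 1
                queue.append(parent)
    return distances


def genls_distance(a, b, genls):
    """Fewest genls steps from a and b to a shared generalization, or None."""
    dist_a = upward_distances(a, genls)
    dist_b = upward_distances(b, genls)
    common = dist_a.keys() & dist_b.keys()
    if not common:
        return None
    return min(dist_a[c] + dist_b[c] for c in common)


print(genls_distance("Buying", "Selling", GENLS))
print(genls_distance("Buying", "Running", GENLS))
```

A smaller distance suggests more closely related verbs: in this toy graph Buying and Selling meet after two steps, Buying and Running only after four. A real version would fetch the genls links from the KB instead of a dictionary.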