Upper Ontology Symposium. CYC Custodian Communiqué Comments. Dr. Douglas B. Lenat, 3721 Executive Center Drive, Suite 100, Austin, TX 78731. Email: Lenat@cyc.com Phone: (512) 342-4001. 2 July 2005.
OpenCyc: Open Source release of the entire 300k-term Cyc Ontology + 1M Simple Relns. + 720 Inference Engines, NL Lex/Parser/Generator • ResearchCyc: All of Cyc • publicly available • free for R&D purposes
What Needs to be Shared? • bits/bytes/streams/network… • alphabet, special characters,… • words, morphological variants,… • syntactic meta-level markups (HTML) • semantic meta-level markups (SGML, XML) • content (logical representation of doc/page/...) • context (common sense, recent utterances, and n dimensions of microtheory-space: time, space, level of granularity, the source’s purpose, etc.) Semantic Web
What Needs to be Shared? • bits/bytes/streams/network… • alphabet, special characters,… • words, morphological variants,… • syntactic meta-level markups (HTML) • semantic meta-level markups (SGML, XML) • content (logical representation of doc/page/...) • context (common sense, recent utterances, and n dimensions of metadata: time, space, level of granularity, the source’s purpose, etc.) Today's standard is a tiny vocabulary (# distinctions) of standard relations: rdf:type, subclass, label, domain, range, comment,… beyond which diversity is tolerated, which means divergence is inevitable: “What do you mean we have no standard, we have lots of standards!” DAML+OIL adds a few more distinctions: inverses, unambiguous properties, unique properties, lists, restrictions, cardinalities, pairwise disjoint lists, datatypes, … But to do the logical/arithmetic combination across information sources, we need tens of thousands of relations, not tens.
What Needs to be Shared? I.e., share a formal ontology, including a full upper ontology, large portions of a middle ontology, and relevant slivers of a lower (domain-dependent) ontology. • bits/bytes/streams/network… • alphabet, special characters,… • words, morphological variants,… • syntactic meta-level markups (HTML) • semantic meta-level markups (SGML, XML) • content (logical representation of doc/page/...) • context (common sense, recent utterances, and n dimensions of formal ontological knowledge: time, space, level of granularity, the source’s purpose, etc.)
Cyc: A Large Formal Ontology • [Figure: the Cyc ontology as a hierarchy of topic areas, from Thing and its most general divisions (intangible and partially tangible things, individuals, sets, relations, space, time, events, logic, math, agents, physical objects), through general knowledge about various domains (e.g. living things, human activities and organizations, geography, politics, business & commerce, artifacts and devices, transportation & logistics, communication, language, everyday living), to specific data, facts, and observations.] • Represented in: First Order Logic, Higher Order Logic, Context Logic, Micro-theories • Cyc contains: 15,000 Predicates, 300,000 Concepts, 3,200,000 Assertions
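Context logic and micro-theories are what let Cyc hold assertions that are true only in some contexts. As a minimal sketch, assuming a simplified model in which a microtheory sees its own assertions plus everything in the more general microtheories it is linked to (Cyc calls this link genlMt), visibility reduces to graph reachability. BaseKB and genls are Cyc names; every other microtheory, predicate, and term below is hypothetical, and real Cyc context handling is far richer than this.

```python
from collections import deque

# Hypothetical microtheory hierarchy: each Mt lists the more general Mts it
# inherits from (modeled loosely on genlMt; names other than BaseKB are invented).
GENL_MT = {
    "BaseKB": [],
    "NaivePhysicsMt": ["BaseKB"],
    "BiologyMt": ["BaseKB"],
    "CartoonWorldMt": ["NaivePhysicsMt"],
}

# Hypothetical assertions, each stored in exactly one microtheory.
ASSERTIONS = {
    "BaseKB": {("genls", "Bird", "Animal")},
    "NaivePhysicsMt": {("objectsFallWhenUnsupported",)},
    "BiologyMt": {("typicalNumberOf", "Bird", "Beak", 1)},
    "CartoonWorldMt": {("canTalk", "Parrot-Polly")},
}


def visible_mts(mt):
    """Return mt plus every microtheory reachable through genlMt links."""
    seen, queue = {mt}, deque([mt])
    while queue:
        current = queue.popleft()
        for parent in GENL_MT.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def assertions_in(mt):
    """Assertions true in mt: its own plus those inherited from more general Mts."""
    result = set()
    for m in visible_mts(mt):
        result |= ASSERTIONS.get(m, set())
    return result
```

In this sketch the cartoon context inherits the general fact (genls Bird Animal) but not the biology microtheory's facts, and its talking parrot never leaks up into BaseKB.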
Cyc is… millions of facts, rules of thumb, etc. that capture human common sense about our everyday world: • The typical bird has 1 beak, 1 heart, lots of feathers,… • Hearts are internal organs; feathers are external protrusions • Most vehicles are steered by an awake, sane, adult,… human • Tangible objects can’t be in 2 (disjoint) places at once • Badly injuring a child is much worse than killing a dog • Causes temporally precede (i.e., start before) their effects • A stabbing requires 2 cotemporal and proximate actors • etc.
Cyc is… millions of facts, rules of thumb, etc. that capture human common sense about our everyday world: • Each of these represented in formal logic • Info. about a set of hundreds of thousands of terms • Language-independent • [Figure: words from several languages (EnglishWord-Pen, EnglishWord-Plume, FrenchWord-Plume, ArabicWord-Qalam) linked to language-independent Cyc terms (WritingPen, BirdFeather, Penitentiary, Corral, Authoring, …).]
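Language independence means words and concepts are linked many-to-many: one word can denote several concepts, and one concept can be named by words in many languages. The sketch below models that with a plain dictionary. The specific word-to-concept links are assumptions drawn from common dictionary senses (English "pen" as writing pen, penitentiary, or animal corral; French "plume" as feather or pen; Arabic "qalam" as pen), not a transcription of Cyc's lexicon, and the helper functions are hypothetical.

```python
# Hypothetical word -> concept links, chosen from common dictionary senses.
DENOTES = {
    "EnglishWord-Pen": {"WritingPen", "Penitentiary", "Corral"},
    "EnglishWord-Plume": {"BirdFeather"},
    "FrenchWord-Plume": {"BirdFeather", "WritingPen"},
    "ArabicWord-Qalam": {"WritingPen"},
}


def concepts_for(word):
    """All language-independent concepts a word can denote."""
    return DENOTES.get(word, set())


def words_for(concept):
    """Invert the mapping: every word, in any language, that can denote concept."""
    return {w for w, concepts in DENOTES.items() if concept in concepts}


def shared_senses(word_a, word_b):
    """Concepts two words (possibly from different languages) have in common."""
    return concepts_for(word_a) & concepts_for(word_b)
```

Because the knowledge is stated about WritingPen rather than about any word, facts asserted once are reachable from English, French, and Arabic words alike.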
[Figure: Cyc system architecture. Components: knowledge users, who work through a user interface (with natural language dialog); knowledge authors, who use knowledge entry tools; other applications; the Cyc API; Cyc reasoning modules; the Cyc ontology & knowledge base; and an interface to external data sources (databases, web pages, text sources, other KBs).]
Lexical Entry Example: Eat
Constant: Eat-TheWord
isa: EnglishWord
Mt: EnglishMt
infinitive: “eat”  pastTense: “ate”  perfect: “eaten”  agentive-Sg: “eater”
(subcatFrame Eat-TheWord Verb 0 TransitiveNPCompFrame)
(verbSemTrans Eat-TheWord 0 TransitiveNPCompFrame
  (and (isa :ACTION EatingEvent)
       (performedBy :ACTION :SUBJECT)
       (inputsDestroyed :ACTION :OBJECT)))
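The verbSemTrans entry is a template: once a parser has identified the event, subject, and object of a transitive use of "eat", substituting them for :ACTION, :SUBJECT, and :OBJECT yields the sentence's logical meaning. A minimal sketch of that substitution follows, with the template written as nested tuples. The bindings (Eating-1, Fred, Apple-1) are hypothetical, and for simplicity the event is a named constant, whereas a real semantic translation would more likely introduce a quantified variable for it.

```python
# The verbSemTrans template from the Eat entry, written as nested tuples.
EAT_TEMPLATE = ("and",
                ("isa", ":ACTION", "EatingEvent"),
                ("performedBy", ":ACTION", ":SUBJECT"),
                ("inputsDestroyed", ":ACTION", ":OBJECT"))


def instantiate(template, bindings):
    """Recursively replace keyword variables (e.g. ':SUBJECT') with bound terms."""
    if isinstance(template, tuple):
        return tuple(instantiate(part, bindings) for part in template)
    return bindings.get(template, template)


def to_cycl(form):
    """Render nested tuples in CycL-style parenthesized syntax."""
    if isinstance(form, tuple):
        return "(" + " ".join(to_cycl(part) for part in form) + ")"
    return form


# Hypothetical parse of "Fred ate the apple".
bindings = {":ACTION": "Eating-1", ":SUBJECT": "Fred", ":OBJECT": "Apple-1"}
meaning = to_cycl(instantiate(EAT_TEMPLATE, bindings))
```

Keeping the semantics in the lexical entry, rather than in the parser, means the same parser code serves every verb whose entry supplies a frame and template.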
Lexical Entry Example: Coke
Constant: Coke-TheWord
isa: EnglishWord
Mt: EnglishMt
singular: “coke”  pnSingular: “Coke”  massNumber: “coke”  pnMassNumber: “Coke”
(denotation Coke-TheWord ProperCountNoun 0 (ServingFn CocaCola))
(denotation Coke-TheWord ProperMassNoun 0 CocaCola)
(denotation Coke-TheWord MassNoun 0 Cocaine-Powder)
(denotation Coke-TheWord MassNoun 2 ColaSoftDrink)
(denotation Coke-TheWord SimpleNoun 0 (ServingFn ColaSoftDrink))
<various other denotations of the English word “coke”>
(Three of these senses are marked SLANG.)
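Each denotation assertion ties a word, a part of speech, and a sense number to a Cyc term, so a parser that has settled the word's syntactic category only needs to consider that category's senses. The sketch below indexes the five denotations shown in the Coke entry that way. The indexing scheme and the candidate_denotations helper are assumptions for illustration, not Cyc's actual lexicon machinery.

```python
from collections import defaultdict

# (word, part of speech, sense number, denotation), transcribed from the Coke entry.
DENOTATIONS = [
    ("Coke-TheWord", "ProperCountNoun", 0, "(ServingFn CocaCola)"),
    ("Coke-TheWord", "ProperMassNoun", 0, "CocaCola"),
    ("Coke-TheWord", "MassNoun", 0, "Cocaine-Powder"),
    ("Coke-TheWord", "MassNoun", 2, "ColaSoftDrink"),
    ("Coke-TheWord", "SimpleNoun", 0, "(ServingFn ColaSoftDrink)"),
]

# Index by (word, part of speech) so senses can be narrowed once syntax is known.
INDEX = defaultdict(list)
for word, pos, sense, denot in DENOTATIONS:
    INDEX[(word, pos)].append((sense, denot))


def candidate_denotations(word, pos):
    """Denotations for a word used as pos, ordered by sense number."""
    return [denot for _, denot in sorted(INDEX.get((word, pos), []))]
```

Even after syntax narrows "coke" to a mass noun, two senses remain (cocaine powder and cola), which is where Cyc's common-sense knowledge about the surrounding context has to do the rest.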
OpenCyc: Open Source release of the entire 300k-term Cyc Ontology + 1M Simple Relns. + 720 Inference Engines, NL Lex/Parser/Generator • ResearchCyc: All of Cyc (free, for R&D purposes) • FACTory: Free online “match game” to check/add to the ontology and, more generally, to the Cyc KB
The OpenCyc Release • Runs on Windows, Linux • OpenCyc Knowledge Base: LGPL license; 47,000 terms (→ 300k); 306,000 facts (→ 1M+) • Cyc Inference Engine: free license for binary runtime engine • Application Programming Interface: Java, SubL, Python • Extensive documentation: Ontological Engineer’s Handbook; online Cyc 101 course; much more
OpenCyc Popularity • Downloaded over 90,000 times • Used for teaching university AI courses • IEEE candidate for standard upper ontology • Cited in numerous books/publications • OpenCyc Users group (monthly meetings)
Support for Modular Use • CycL: documented and open-sourced • Ontology: selected regions of microtheory space, exportable as OWL, XML • Export: tuples exportable to OWL; query results exportable in user-defined XML formats • Inference Engine: supports HL module creation; CycL-evaluatable functions using external Web services; Cyc API accessible as Web services (in progress)
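To make "tuples exportable to OWL" concrete, here is a minimal sketch that turns (genls Sub Super) tuples into an OWL class hierarchy serialized as RDF/XML, using only Python's standard library. The namespace http://example.org/cyc# is a placeholder, and this is an illustration of the idea, not Cyc's own exporter; it handles only subclass links.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/cyc#"  # hypothetical namespace for exported terms

ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)
ET.register_namespace("owl", OWL)


def export_genls(tuples):
    """Turn (genls Sub Super) tuples into an OWL class hierarchy in RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    classes = {}
    for _, sub, sup in tuples:
        # Declare each collection once as an owl:Class.
        for name in (sub, sup):
            if name not in classes:
                classes[name] = ET.SubElement(
                    root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + name})
        # Record the subsumption link on the more specific class.
        ET.SubElement(classes[sub], f"{{{RDFS}}}subClassOf",
                      {f"{{{RDF}}}resource": BASE + sup})
    return ET.tostring(root, encoding="unicode")


owl_xml = export_genls([("genls", "Bird", "Animal"), ("genls", "Animal", "Organism")])
```

Only the subset of CycL that maps onto OWL's small vocabulary survives such an export; rules, microtheory context, and higher-order assertions do not.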
Integrating OpenCyc • Acts as a server component in an integrated system • Provides a rich open-source API (Application Programming Interface) with Java bindings • Is a FIPA-OS and DARPA CoABS Grid compatible agent • Exposes the Cyc API as a Web Service • Implements a DQL (DAML Query Language) service to support the Semantic Web
73 Active ResearchCyc User Groups (approx. 500 active ResearchCyc users), grouped on the slide as Government, Government-related, Commercial, University, and NPOs: Xerox PARC • Language Computer Corporation • ANSER, Inc. • Air Force Rome Labs • NTT Communications Science Laboratories (Japan) • Stone’s Throw Technologies • 21st Century Technologies • SRI • Houston VA Medical Center • ISI • Austin Info Systems • Fraunhofer Institute • Daxtron Labs • Lockheed Martin ATLD • Sapio Systems (Denmark) • Terra Incognita • U of Illinois Urbana-Champaign • U of Maryland • Trimtab Consulting • MIT Media Lab • Stanford NLP Dept. • Northwestern U • TNO-DMV (Netherlands) • U of Pennsylvania • Rensselaer AI and Reasoning Lab • Microfabrica, Inc. • Knowledge Media Institute, Open University • LBJ School of Public Affairs • New Mexico Highlands Univ. • Institute for the Study of Accelerating Change • U of Stuttgart • Harvard U • U of Toronto • U of Minnesota • Witan International • Radboud U (Netherlands) • Tokyo Inst. of Technology • Linkoping U (Sweden) • U of Hawaii
Applying ResearchCyc • Identify as large and representative a set of task queries/challenge problems as possible • Write additional assertions. (Add new terms to the ontology as needed. In rare cases, add new specialized reasoner(s).) • Map the relevant information sources’ schemas (used in this task) to Cyc
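Mapping a source's schema, rather than its individual records, is what lets a whole database become queryable through Cyc: each column is tied once to a predicate, and every row then translates mechanically. The sketch below assumes a hypothetical regions table; the two predicate names come from the geospatial predicate list in this deck, while the column names, the region constant, and the value handling (plain strings and numbers instead of proper Cyc terms with units) are simplifying assumptions.

```python
# Hypothetical source table rows, with column names as the external database uses them.
ROWS = [
    {"region_id": "Region-17", "pop_per_km2": 412.5, "languages": "Arabic;Kurdish"},
]

# Hypothetical schema mapping: source column -> (Cyc predicate, value converter).
COLUMN_MAP = {
    "pop_per_km2": ("populationDensity", lambda v: [v]),
    "languages": ("languagesSpokenHere", lambda v: v.split(";")),
}


def row_to_assertions(row, key="region_id"):
    """Translate one database row into (predicate subject value) triples."""
    subject = row[key]
    triples = []
    for column, (predicate, convert) in COLUMN_MAP.items():
        if row.get(column) is not None:
            for value in convert(row[column]):
                triples.append((predicate, subject, value))
    return triples
```

Once the mapping exists, new rows in the source need no further knowledge-engineering effort; only new columns do.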
A few sample ResearchCyc Projects • AIML interface - requested by other ResearchCyc users • Oracle PL-SQL interface • C/C++, PHP APIs • Business Process Execution Language (BPEL) extended with CycL • Auto-generated user interfaces • “MUD” Role-playing game • Knowledge collection applications • MIT Open Mind Common Sense Project, Phase 2
[Figure: Comprehensive Geospatial Knowledge Base (CGKB). Data sources: CDE, USGS, NIMA, N-P, fused DBs 1-8, terrorism data, weather data, basic encyclopedia, terrain data, satellite data, sensor data. Components: Geospatial Knowledge Source Integrator; GSK transformation, fusion and retrieval; Cyc KB + GS ontology; reasoning modules; query formulator. Example query: “What are all structures in the suburbs of Tikrit with walls impenetrable by small arms fire?”]
Geospatial Classes: 1100 atomic types, 338 functionally specified ones • CrabFishery • LakeBed • MonsoonForest • MudFlat • USCS-Code-CL • Glacier • Ridge • Butte • Cave • MinedArea • PostalCodeRegion • Prefecture • TownSquare • Quarry • Atoll • Continent • TrueContinent • (FieldFn OliveTree) • (CityInCountryFn Cuba) • Protectorate • IndependentCountry • Colony • SchoolDistrict • Monarchy
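Functionally specified types such as (CityInCountryFn Cuba) avoid minting a separate constant for every country's cities: one function plus an argument names the collection, and membership follows from facts about the individual. A minimal sketch follows, assuming CityInCountryFn denotes the collection of cities located in the given country (a reading of the name, not a documented definition). The NAT class, the member_of helper, and the facts about Havana and Lima are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NAT:
    """A functionally specified term, e.g. (CityInCountryFn Cuba)."""
    function: str
    args: tuple

    def __str__(self):
        return "(" + " ".join([self.function, *map(str, self.args)]) + ")"


# Hypothetical facts about a few individuals.
ISA = {"Havana": {"City"}, "Lima": {"City"}, "Cuba": {"IndependentCountry"}}
IN_COUNTRY = {"Havana": "Cuba", "Lima": "Peru"}


def member_of(individual, collection):
    """Membership test for atomic collections and for CityInCountryFn terms."""
    if isinstance(collection, NAT):
        if collection.function == "CityInCountryFn":
            (country,) = collection.args
            return ("City" in ISA.get(individual, set())
                    and IN_COUNTRY.get(individual) == country)
        raise NotImplementedError(collection.function)
    return collection in ISA.get(individual, set())


cuba_cities = NAT("CityInCountryFn", ("Cuba",))
```

Because NAT is frozen (hashable and compared by value), two separately built (CityInCountryFn Cuba) terms are the same term, which is what lets the 338 functional types behave like ordinary collections.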
Predicates of Geospatial Entities: over 500 • terrainType • maximumDepth • cloudCeiling • importsFrom • regionalPastimes • populationDensity • trafficableForVehicle • freightRailTrafficRate • internetCountryCode • hasClimateType • languagesSpokenHere • highestPointInRegion • waterAreaOfRegion • canopyClosureOfRegion
Construct a Comprehensive GS Ontology Spanning the Following Areas • Mereotopology: concepts such as spatial part of, overlapping, and connected to; applicable to any space, or things located in space, according to very general senses of these terms (e.g. where a space is a topology, in the mathematical sense). • (Geo)Topography: metric and geometric concepts salient to describing regions of the earth’s surface (e.g. measurement of lengths along, and angles between, great circles and rhumb lines; geodetic models of the earth, their reference ellipsoids, and coordinate transformations between them; map projections and coordinate transformations between map projections). • (Geo)Cartography: concepts used to define natural or conventional boundaries of earth surface regions, such as continent, desert, prefecture, dam, highway, or nation; also, the loosely connected cluster of attribute dimensions that are used by maps to characterize these, or arbitrary (e.g. regular polygon), spatial regions. For example: weather attributes, trafficability attributes, degrees of soil fertility, and numbers or types of a region’s inhabitants.
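As one concrete instance of the (geo)topography area's "lengths along great circles", here is the standard haversine formula for great-circle distance. It assumes a spherical Earth with a mean radius of 6371 km; the geodetic models with reference ellipsoids that the slide mentions are more accurate and are not modeled here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical approximation, not an ellipsoid


def great_circle_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Haversine great-circle distance between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))
```

Along the equator, one degree of longitude comes out as radius × π/180, about 111.2 km, which is a quick sanity check on the implementation.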
Is this shire a forest? Global Terrain Data, GSK Transformation and Fusion • I.A.5.N.c Seasonally flooded tropical sclerophyllous leaved evergreen • I.A.5.N.d Semi-permanently flooded tropical sclerophyllous leaved evergreen • I.A.5.N.e Saturated tropical broad sclerophyllous leaved evergreen • III.A.1.N.c Sclerophyllous tropical or subtropical broad-leaved shrubland
Australian Agricultural Data, GSK Transformation and Fusion • From the mereotopology ontology: every part of the shire is part of a forest. • From the topography ontology: the shire is a subregion of a group of forests. • From the cartography ontology: three of the regular polygon regions are forests, and the sum of a contiguous group of forests is itself a forest. • Therefore: the shire is a forest.
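The shire argument combines all three ontologies: mereology (part of), topology (connected, contiguous), and a cartographic rule (the sum of contiguous forests is a forest). A minimal sketch follows, assuming regions are modeled as sets of grid cells and adding one bridging rule the slide leaves implicit: a region lying wholly inside a forest counts as forest. The grid, the square polygons, and the shire's cells are hypothetical; the point is that no single polygon contains the shire, yet the inference still goes through.

```python
def square(r0, c0):
    """Hypothetical 2x2 'regular polygon' region: a set of (row, col) grid cells."""
    return {(r0 + dr, c0 + dc) for dr in range(2) for dc in range(2)}


def part_of(a, b):
    """Mereology: a is part of b if every cell of a is also a cell of b."""
    return a <= b


def connected(a, b):
    """Topology: two regions are connected if they overlap or touch along an edge."""
    if a & b:
        return True
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return any((r + dr, c + dc) in b for r, c in a for dr, dc in steps)


def contiguous(regions):
    """A group is contiguous if its 'connected' graph forms a single component."""
    if not regions:
        return False
    reached, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for j, other in enumerate(regions):
            if j not in reached and connected(regions[i], other):
                reached.add(j)
                frontier.append(j)
    return len(reached) == len(regions)


def forest_sum(forests):
    """Cartography rule from the slide: the sum of a contiguous group of forests is a forest."""
    if not contiguous(forests):
        return None
    total = set()
    for f in forests:
        total |= f
    return total


def is_forest(region, forests):
    """Assumed bridging rule: a region lying wholly inside a forest counts as forest."""
    total = forest_sum(forests)
    return total is not None and part_of(region, total)


forests = [square(0, 0), square(0, 2), square(0, 4)]  # three contiguous forest polygons
shire = {(0, 1), (0, 2), (1, 3), (1, 4)}             # straddles all three polygons
```

If the middle polygon is dropped, the remaining two are no longer contiguous, the cartographic rule no longer applies, and the sketch declines to conclude that the shire is a forest.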
Advanced GS Question Answering • Which elevated areas in sector CX that provide good cover from aerial surveillance are nearest to trafficable roads in enemy territory? • What concrete structures have been occupied by enemy troops in the last 48 hours? • What are the coordinates of radio broadcast facilities in region X capable of supporting megawatt broadcasts, and what are the coordinates of power sources near these facilities? • List all structures with walls impenetrable by small arms fire in the suburbs of Tikrit. • List each facility in Angola of a type that typically contains drums of fuel oil, and the term for that facility’s type in the predominant local language. • List all two-story buildings with 20,000 square feet or more of floor space, with no back entrance, and whose front entrance faces the prevailing wind. • List the locations of all underground bunkers adequate to contain all the members of a unit of type T.
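Once terms like "two-story", "back entrance", and "faces the prevailing wind" have been tied to data, a question such as the two-story-building query becomes a conjunction of constraints. The sketch below hard-codes one interpretation: an entrance faces the wind if its compass bearing is within a tolerance of the bearing the wind blows from (the meteorological convention for wind direction). The Building record, its field names, and the 22.5° tolerance are hypothetical; in Cyc, such interpretations would come from common-sense knowledge rather than code.

```python
from dataclasses import dataclass


@dataclass
class Building:  # hypothetical record; field names are illustrative assumptions
    name: str
    stories: int
    floor_area_sqft: float
    has_back_entrance: bool
    front_facing_deg: float  # compass bearing the front entrance faces


def angular_difference(a, b):
    """Smallest absolute difference between two compass bearings, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def matching_buildings(buildings, prevailing_wind_from_deg, tolerance_deg=22.5):
    """Names of buildings satisfying all four constraints in the question."""
    return [
        b.name for b in buildings
        if b.stories == 2
        and b.floor_area_sqft >= 20_000
        and not b.has_back_entrance
        and angular_difference(b.front_facing_deg, prevailing_wind_from_deg) <= tolerance_deg
    ]
```

The wraparound in angular_difference matters: bearings of 350° and 10° are 20° apart, not 340°.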
Bottlenecks to Applying ResearchCyc • There are several (NLU, NLG, Learning by…, faster and easier knowledge entry/vetting by novices, etc.) • Many of these, in turn, share a common bottleneck: MAKING INFERENCE FASTER
Characterize and harness systems that employ 6 types of “tricks”: • Reasoners that exploit limitations in the expressivity of the representation language they operate over • Domain-specific (incl. context-specific) reasoners • Statistical/Bayesian reasoners • Unsound reasoners (analogy, abduction,…) • Meta-reasoners (tacticians) and Meta² reasoners (strategists) • Parallel processing, special-purpose hardware acceleration, biotech, nanotech, quantum computing…
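The first trick can be illustrated with a transitive predicate. A general theorem prover would chain a transitivity rule step by step, but if a module knows it only ever sees facts of the form (pred A B) for one transitive predicate, it can answer queries by plain graph search in time linear in the number of stored facts. The sketch below is a hypothetical stand-in for such a specialized module, using Cyc's genls (collection subsumption) as the example predicate; the class and its methods are not Cyc's HL module API.

```python
from collections import defaultdict, deque


class TransitiveModule:
    """Hypothetical specialized reasoner for a single transitive predicate."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.edges = defaultdict(set)

    def assert_fact(self, pred, a, b):
        """Store (pred a b); the module refuses predicates it does not handle."""
        if pred != self.predicate:
            raise ValueError(f"module only handles {self.predicate}")
        self.edges[a].add(b)

    def holds(self, a, b):
        """Breadth-first search: is there a chain a -> ... -> b of stored facts?"""
        seen, queue = {a}, deque([a])
        while queue:
            node = queue.popleft()
            for nxt in self.edges.get(node, ()):
                if nxt == b:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False
```

The speedup comes precisely from the restriction: the module cannot represent arbitrary rules, so it never has to search over them.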