CYC: A Large-Scale Investment in Knowledge Infrastructure
Douglas B. Lenat
Presenter: Cristina Nicolae
CYC
• An expert system that encodes, as axioms, knowledge of everyday objects and actions, such as:
  • You have to be awake to eat.
  • You can usually see people’s noses, but not their hearts.
  • Given two professions, either one is a specialization of the other or else they are likely to be independent of one another.
  • You cannot remember events that have not happened yet.
  • If you cut a lump of peanut butter in half, each half is also a lump of peanut butter; but if you cut a table in half, neither half is a table.
• These assertions embody knowledge that the CYC authors can safely assume is already known about the world.
CYC
• Makes tasks like coreference resolution easy:
  • “The police arrested the demonstrators because they feared violence”
  • vs.
  • “The police arrested the demonstrators because they advocated violence”
• or word sense disambiguation:
  • “The box is in the pen”
  • vs.
  • “The pen is in the box”
Issues and Lessons Learned
• Assertions are true only by default (and in certain contexts).
  • You can usually see people’s noses, but not their hearts.
  • Exception: a heart can be seen during surgery.
• How likely is an assertion to be true?
  • We don’t know the probabilities precisely (dozens of people, hundreds of thousands of rules).
  • Avoid numeric certainty factors.
  • Each assertion is true by default, and additional meta-assertions compare them: “Assertion A is less likely than assertion B.”
• Use first-order predicate calculus with a series of second-order extensions (instead of a frame-and-slot language).
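The default-with-exceptions idea and the relative-likelihood meta-assertions on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CYC's representation language or inference engine: the names `DEFAULTS`, `LESS_LIKELY`, `holds` and `less_likely` are hypothetical, and the only facts encoded are the nose/heart example from the slide.

```python
# Hypothetical default assertions: name -> (default truth value, contexts that override it).
DEFAULTS = {
    "nose_visible": (True, set()),
    "heart_visible": (False, {"surgery"}),
}

# Hypothetical meta-assertions: (a, b) means "a is less likely than b".
# No numeric certainty factors are stored, only the relative ordering.
LESS_LIKELY = {("heart_visible", "nose_visible")}


def holds(assertion, context=()):
    """Return the default value unless an overriding context is active."""
    default, overriding = DEFAULTS[assertion]
    if overriding & set(context):
        return not default
    return default


def less_likely(a, b, pairs=LESS_LIKELY):
    """True if a chain of meta-assertions shows that a is less likely than b."""
    stack, seen = [a], set()
    while stack:
        current = stack.pop()
        for lower, higher in pairs:
            if lower == current and higher not in seen:
                if higher == b:
                    return True
                seen.add(higher)
                stack.append(higher)
    return False


if __name__ == "__main__":
    print(holds("heart_visible"))               # default case
    print(holds("heart_visible", {"surgery"}))  # exception context
    print(less_likely("heart_visible", "nose_visible"))
```

Storing only an ordering keeps the knowledge enterers from having to agree on precise probabilities, which is the concern the slide raises.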
CYC – Numbers
• 10^6 general assertions in CYC’s knowledge base
• 10^5 atomic terms (basic concepts) in the vocabulary
• The exact numbers are not important:
Commercial Applications (1/2)
• Information retrieval
  • detailed user models (hobbies, job, family status, values, personality, ...)
  • integrating heterogeneous external information sources, so users can find information without knowing how it is stored
  • examining retrieved data and recognizing inconsistencies, contradictions with other sources, and violations of common sense
• Word processing
  • catching words misspelled as other valid words
  • grammar checking
  • content checking – possible in the future (“Later, we will address…”)
  • fleshing out incomplete (even outlined) sentences and incomplete bibliographic references
Commercial Applications (2/2)
• Simulations
  • greater fidelity in the behavior of simulated agents
  • role-playing games: computer characters have hobbies, jobs, social cliques, chores, memories, and factual knowledge; they change moods
• Speech recognition and NLU
  • final “sanity check” on the transcribed sentence
  • generate captions for email messages based on an understanding of the message body
Vaughan Pratt’s CYC Report – 1994
• CYC demos
  • consistency check of relational databases from different sources
    • peaceful/violent, communist/capitalist, date of birth
  • retrieving online images by caption
    • “someone relaxing” → 3 men in beachwear holding surfboards
    • “someone at risk for skin cancer” → girl reclining on a beach
    • non-monotonicity (reclining on beach → beach umbrella → umbrella broken → cloudy), but the image database doesn’t contain any examples
    • “a tree” did not retrieve “A girl with presents in front of a Christmas tree” (since fixed)
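The database consistency demo can be pictured with a small Python sketch. It assumes, as one reading of the slide, that peaceful/violent and communist/capitalist are pairs of labels that should not both apply to the same entity, and that date of birth should agree across sources. The record layout, the field names `labels` and `born`, and the function `find_conflicts` are hypothetical, and the sample records are made up for illustration.

```python
# Label pairs assumed to be mutually exclusive (taken from the slide's examples).
MUTUALLY_EXCLUSIVE = [{"peaceful", "violent"}, {"communist", "capitalist"}]


def find_conflicts(record_a, record_b):
    """Compare two records about the same entity and list the contradictions."""
    conflicts = []
    labels = set(record_a["labels"]) | set(record_b["labels"])
    for pair in MUTUALLY_EXCLUSIVE:
        # Both members of an exclusive pair appear across the two sources.
        if pair <= labels:
            conflicts.append(("labels", tuple(sorted(pair))))
    if record_a["born"] != record_b["born"]:
        conflicts.append(("born", (record_a["born"], record_b["born"])))
    return conflicts


if __name__ == "__main__":
    # Made-up records describing one entity in two different sources.
    source_a = {"labels": {"peaceful", "communist"}, "born": "1950-01-01"}
    source_b = {"labels": {"violent"}, "born": "1951-01-01"}
    for conflict in find_conflicts(source_a, source_b):
        print(conflict)
```

A hard-coded list of exclusive pairs stands in here for what CYC would derive from its axioms; the sketch shows only the shape of the check.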
CYC Report (contd)
• Other aspects of CYC
  • CYC’s knowledge is expressed as axioms (currently half a million), obtained manually
  • staff = 22 individuals
  • 22 axioms per person per day
CYC Report (contd) – Other experiments
• CYC: bread is food, bread is edible stuff; but even after CYC was told that bread is not a drink, it did not return that fact in a subsequent query
• CYC: people don’t need food (because there is no axiom that leads from lack of food to death)
• CYC: PlanetEarth is bigger than PlanetVenus; but it doesn’t know the exact size of the Earth
• CYC: Earth has a sky; but it doesn’t know what color it is
• CYC: the cost of a car is between $6K and $80K; it knows nothing else about it
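Two of these findings can be illustrated with a toy Python sketch: a conclusion that follows from a chain of "is a kind of" links (bread, food, edible stuff), and a conclusion that cannot be reached because the connecting axiom is missing (lack of food to death). The names `KIND_OF`, `is_a` and `derive` are hypothetical, and the code makes no claim about how CYC's actual inference engine works.

```python
# Hypothetical "is a kind of" links, following the slide's bread example.
KIND_OF = {"bread": {"food"}, "food": {"edible_stuff"}}


def is_a(term, category):
    """Follow kind-of links upward from term, looking for category."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == category:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(KIND_OF.get(current, ()))
    return False


def derive(facts, rules):
    """Forward-chain over (premise, conclusion) rules until nothing new is added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


if __name__ == "__main__":
    print(is_a("bread", "edible_stuff"))
    # With no rule linking lack of food to death, the conclusion is never derived.
    print("death" in derive({"lack_of_food"}, []))
    print("death" in derive({"lack_of_food"}, [("lack_of_food", "death")]))
```

The second call shows why a knowledge base can look as if it believes people don't need food: the system can only conclude what some axiom chain reaches.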
CYC Report (contd) – Conclusions
• The bulk of the tester’s questions were well beyond CYC’s present grasp.
• Expectations about the number of answerable questions and about CYC’s general knowledge were disappointed.
• But the impression was that CYC is well along the path to having comprehensive general knowledge. What is missing: a quantitative measure of how far along.