Taxonomies & Classifications for Organizing Content


What do we know about taxonomies?
Ontology comes from the Greek ontologia.
• Onto = being, that which exists
• Logia = the study of, or discourse about
Who gets credit for taxonomies?
• Aristotle is the founder of taxonomy.
• His ideas represent the foundation for object-oriented systems.
• He introduced a number of inference rules (syllogisms) used in modern logic-based reasoning systems.

Why is it that only in the last decade (more than 2,000 years after Aristotle) knowledge representations and ontologies have gained importance?
• Agent communication (automated data mining)
• Artificial intelligence (Cyc)
• Description of content to facilitate its retrieval (intelligent searches)
• E-commerce (Amazon)
• E-science experiments
• E-learning systems
• Information integration (personalized newspapers and journals)
• Intelligent devices (management of remote equipment)
• Knowledge management (corporate intranet)
• Speech and natural language understanding
• Web service discovery (mobile devices)
• Etc., etc., etc.: whatever humankind concocts (the MATRIX)

What do all of these things have in common?
• Automated data mining
• Artificial intelligence
• Intelligent searches
• Amazon
• E-science experiments
• E-learning systems
• Personalized newspapers and journals
• Intelligent devices
• Knowledge management
• Speech and natural language understanding
• Web service discovery
Through the use of ONTOLOGIES, they attempt to represent knowledge in such a way that a computer can understand it and use it in real time.

What are the ontological challenges?
• Multiple groups of people are conceptualizing different ways to represent knowledge, and the programs they write have different conceptual backgrounds: learning theory, psychology, philosophy, logic, computer science.
• Ontologies can differ depending on the needs and conventions of the producers and the consumers of the knowledge being represented.
• The word ontology is used to describe artifacts with different degrees of structure.

Ontologies can differ depending on the needs and conventions of the producers and the consumers of the knowledge being represented. For example, the word APPLIANCE has many different meanings:

An ontology about the domain of APPLIANCE could model:
• Household appliances (small and major): blenders, espresso machines, stoves, washers/dryers, etc.
• Computer appliances: 1U, software, virtual, etc.
• Orthodontic appliances: braces, retainers, etc.
Domain ontologies represent concepts in very specific and often eclectic ways, so they are often incompatible. Furthermore, different ontologies in the same domain can arise from different perceptions of the domain based on cultural background, education, or ideology, or because a different representation language was chosen.

The word ontology has been used to describe artifacts with different degrees of structure:
• Simple taxonomies (e.g., Yahoo)
• Metadata schemes (e.g., Dublin Core)
• Logical theories (e.g., Cyc)

Regardless of these differences, in one way or another an ontology looks at a domain in terms of:
• Classes (general things) in the many domains of interest
• The relationships that can exist among things
• The properties (or attributes) those things may have
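The three ingredients above (classes, relationships, properties) can be pictured as a small data structure. This is only a minimal sketch: the names `OntologyClass`, `Ontology`, `add_class`, and `relate`, and the appliance fragment itself, are hypothetical and not taken from any ontology toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """A general kind of thing, with the properties its members may have."""
    name: str
    properties: set = field(default_factory=set)


@dataclass
class Ontology:
    classes: dict = field(default_factory=dict)
    # Each relationship is a (subject class, relation name, object class) triple.
    relationships: set = field(default_factory=set)

    def add_class(self, name, *properties):
        self.classes[name] = OntologyClass(name, set(properties))

    def relate(self, subject, relation, obj):
        # Only allow relationships between classes that have been declared.
        if subject not in self.classes or obj not in self.classes:
            raise KeyError("both classes must be declared before relating them")
        self.relationships.add((subject, relation, obj))


# Hypothetical fragment of the household-appliance domain.
appliances = Ontology()
appliances.add_class("Appliance", "brand", "voltage")
appliances.add_class("Blender", "jar_capacity")
appliances.add_class("Kitchen")
appliances.relate("Blender", "is_a", "Appliance")
appliances.relate("Blender", "located_in", "Kitchen")
```

An orthodontic ontology could reuse the same structure with entirely different classes, which is one way to see why two "appliance" ontologies may be incompatible.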

Cyc
A project started in Austin, Texas by Doug Lenat at the Microelectronics and Computer Technology Corporation (MCC). It is an AI project that attempts to assemble a comprehensive ontology and database of everyday common-sense knowledge, with the goal of enabling AI applications to perform human-like reasoning. The original knowledge base is proprietary, but an open version is now available.

WordNet
• A semantic lexicon for the English language.
• The purpose is twofold:
  - to produce a combination of dictionary and thesaurus that is more intuitively usable
  - to support automatic text analysis and AI applications

The Dublin Core
• The Dublin Core metadata element set is a standard for cross-domain information resource description. It provides a simple and standardized set of conventions for describing things online in ways that make them easier to find.
Dublin Core is widely used to describe:
• Digital materials such as video, sound, image, and text
• Composite media like web pages

Suggested Upper Merged Ontology (SUMO)
SUMO was originally developed by the Teknowledge Corporation and is now maintained by Articulate Software. It originally concerned itself with meta-level concepts and would thereby lead naturally to a categorization scheme for encyclopedias. It has since been considerably expanded to include a mid-level ontology and dozens of domain ontologies. SUMO was first released in December 2000.

Web Ontology Language (OWL)
• The W3C is trying to define an ontology language that can be used across all domains and applications:
  - Agent communication
  - Artificial intelligence
  - Description of content to facilitate its retrieval
  - E-commerce
  - E-science experiments
  - E-learning systems
  - Information integration
  - Intelligent devices
  - Knowledge management
  - Speech and natural language understanding
  - Web service discovery

The General Formal Ontology (GFO)
• Developed by Heinrich Herre, Barbara Heller, and collaborators (the Onto-Med research group in Leipzig).
• Primarily, the ontology GFO:
  - includes objects as well as processes, and integrates both into one coherent system
  - includes levels of reality
  - is designed to support interoperability by principles of ontological mapping and reduction
  - contains several novel ontological modules, in particular a module for functions and a module for roles
  - is designed for applications, firstly in medical, biological, and biomedical areas, but also in the fields of economics and sociology

EXAMPLES of ONTOLOGIES IN ACTION
• Web Portals
A web portal can define an ontology for its community. An ontology for an information science portal includes the terms "journal paper," "publication," "person," and "author." This ontology could include definitions that state things such as "all journal papers are publications" or "the authors of all publications are people." When combined with facts, these definitions allow other facts that are necessarily true to be inferred. These inferences can, in turn, allow users to obtain search results from the portal that are impossible to obtain from conventional retrieval systems. Such a technique relies on content providers using the Web Ontology Language to capture high-quality ontology relationships.
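The portal example can be sketched as a tiny forward-chaining loop over (subject, predicate, object) triples. This is an illustration only, not how an OWL reasoner is implemented: the identifiers `paper42` and `author_a`, the dictionary `SUBCLASS_OF`, and the function `infer` are all hypothetical.

```python
# Definition 1: all journal papers are publications.
SUBCLASS_OF = {"JournalPaper": "Publication"}


def infer(facts):
    """Return the facts plus everything the two portal definitions imply."""
    known = set(facts)
    while True:
        new = set()
        for subject, predicate, obj in known:
            # Definition 1: members of a subclass are members of its superclass.
            if predicate == "type" and obj in SUBCLASS_OF:
                new.add((subject, "type", SUBCLASS_OF[obj]))
            # Definition 2: the authors of all publications are people.
            if predicate == "author" and (subject, "type", "Publication") in known:
                new.add((obj, "type", "Person"))
        if new <= known:  # nothing new was derived, so we are done
            return known
        known |= new


facts = {
    ("paper42", "type", "JournalPaper"),
    ("paper42", "author", "author_a"),
}
closure = infer(facts)

# A search for publications now finds paper42, although no one ever
# stated directly that it is a publication.
publications = sorted(s for s, p, o in closure if p == "type" and o == "Publication")
```

A keyword search over the stated facts alone would miss `paper42` when asked for publications; the inferred triples are what make the richer result possible.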

EXAMPLES of ONTOLOGIES IN ACTION
• Multimedia Collection
If an indexer selects the value "Late Georgian" for the style/period of an antique chest of drawers, it should be possible to infer that the data element "date.created" has a value between 1760 and 1811 A.D. and that the "culture" is British. Availability of this type of background knowledge significantly increases the support that can be given for indexing as well as for search. Another feature that could be useful is support for the representation of default knowledge. An example of such knowledge would be that a "Late Georgian chest of drawers," in the absence of other information, would be assumed to be made of mahogany. This knowledge is crucial for real semantic queries: for example, a user query for "antique mahogany storage furniture" could match images of Late Georgian chests of drawers even if nothing is said about wood type in the image annotation.
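The difference between inferred values and default values can be sketched as follows. The table `STYLE_KNOWLEDGE`, the field names, and the function `enrich` are hypothetical; only the Late Georgian figures come from the example above.

```python
# Background knowledge keyed by style/period (hypothetical encoding).
STYLE_KNOWLEDGE = {
    "Late Georgian": {
        # Values that follow from the style/period itself.
        "implied": {"date_created_range": (1760, 1811), "culture": "British"},
        # Values assumed only in the absence of other information.
        "defaults": {"material": "mahogany"},
    }
}


def enrich(annotation):
    """Return a copy of the annotation with implied and default values added."""
    knowledge = STYLE_KNOWLEDGE.get(annotation.get("style_period"), {})
    enriched = dict(annotation)
    enriched.update(knowledge.get("implied", {}))
    for key, value in knowledge.get("defaults", {}).items():
        # setdefault keeps whatever the indexer recorded explicitly.
        enriched.setdefault(key, value)
    return enriched


chest = {"object": "chest of drawers", "style_period": "Late Georgian"}
oak_chest = {"object": "chest of drawers", "style_period": "Late Georgian",
             "material": "oak"}

enriched_chest = enrich(chest)          # material is filled in by default
enriched_oak_chest = enrich(oak_chest)  # explicit material is preserved
```

A query for mahogany furniture can then be matched against the enriched annotations rather than the raw ones, so the first chest is found while the explicitly oak one is not.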

EXAMPLES of ONTOLOGIES IN ACTION
• Corporate Website Management
An ontology-enabled web site may be used by:
• A salesperson looking for sales collateral relevant to a sales pursuit
• A technical person looking for pockets of specific technical expertise and detailed past experience
• A project leader looking for past experience and templates to support a complex, multi-phase project, both during the proposal phase and during execution
A typical problem for each of these types of users is that they may not share terminology with the authors of the desired content. The salesperson may not know the technical name for a desired feature, or technical people in different fields might use different terms for the same concept. For such problems, it would be useful for each class of user to have a different ontology of terms, with the ontologies interrelated so that translations can be performed automatically.
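One simple way to interrelate per-group vocabularies is to map every term to a shared concept identifier and translate through it. The sketch below assumes that approach; the vocabularies, the concept identifiers, and the function `translate` are invented for illustration.

```python
# Each user group maps its own terms to shared concept identifiers
# (all terms and identifiers here are hypothetical).
SALES_TERMS = {
    "auto-save": "concept:periodic_checkpoint",
    "undo history": "concept:revision_log",
}
ENGINEERING_TERMS = {
    "periodic checkpointing": "concept:periodic_checkpoint",
    "snapshotting": "concept:periodic_checkpoint",
    "revision log": "concept:revision_log",
}


def translate(term, source_terms, target_terms):
    """Return the target group's terms for the concept behind `term`."""
    concept = source_terms.get(term.lower())
    if concept is None:
        return []  # the source vocabulary does not know this term
    return sorted(t for t, c in target_terms.items() if c == concept)


# A salesperson's query term, expanded into the engineers' vocabulary.
engineering_equivalents = translate("Auto-Save", SALES_TERMS, ENGINEERING_TERMS)
```

A search front end could run the query once for each translated term, so the salesperson finds technical documents without knowing the engineers' wording.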

Moving from the World Wide Web to the Semantic Web
Ontologies figure prominently in the emerging Semantic Web as a way of representing the semantics of documents and enabling those semantics to be used by web applications and intelligent agents. There are studies on generalized techniques for merging ontologies, but this area of research is still largely theoretical.

Language
- expandable
- language independent
- machine understandable
- understood by humans
- ambiguous
Knowledge
- changes rapidly
- may be local to an entity

Information versus Knowledge
• The World Wide Web is based mainly on documents written in Hypertext Markup Language (HTML).
• When you enter a search query such as “Information Architecture and Design Fall 2007 and UT Austin”, the search engine is programmed to pull relevant documents using a ranking algorithm that factors in metadata relevant to your query words:
  - number of keywords in the page
  - names of images
  - number of hyperlinks entering and exiting the page
  - etc.
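A toy version of such a ranking formula might look like the sketch below. It is not any real search engine's algorithm: the `Page` fields, the function `score`, and the three weights are assumptions chosen only to show how surface features of a page can be combined into a number.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    text: str
    image_names: list = field(default_factory=list)
    inbound_links: int = 0
    outbound_links: int = 0


def score(page, query, w_text=1.0, w_image=2.0, w_link=0.1):
    """Combine keyword counts, image names, and link counts into one score."""
    terms = query.lower().split()
    words = page.text.lower().split()
    # Number of times each query word appears in the page text.
    text_hits = sum(words.count(term) for term in terms)
    # Number of (image name, query word) pairs where the word occurs in the name.
    image_hits = sum(1 for name in page.image_names
                     for term in terms if term in name.lower())
    links = page.inbound_links + page.outbound_links
    return w_text * text_hits + w_image * image_hits + w_link * links


course_page = Page(
    text="Information Architecture and Design Fall 2007",
    image_names=["architecture_logo.gif"],
    inbound_links=10,
    outbound_links=5,
)
unrelated_page = Page(text="Weekly cafeteria menu")

ranked = sorted([unrelated_page, course_page],
                key=lambda p: score(p, "information architecture"),
                reverse=True)
```

Every input to `score` is a count of surface features; nothing in it captures what the page is about, which is the gap between information and knowledge that this part of the deck goes on to describe.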

Information versus Knowledge
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Information Architecture and Design Fall 2007</title>
<meta name="keywords" content="Information Architecture, Information Design, Information Architecture and Design, School of Information, Web Information Seeking, Web Site Design, Information Seeking, Information Retrieval, Fall 2006">
<meta name="description" content="Course Web Site: Information Architecture and Design, Fall 2006">
<meta content="text/html; charset=iso-8859-1">
<link href="Web_files/iabeta.css" rel="stylesheet" type="text/css">
<link rel="stylesheet" type="text/css" href="Web_files/iaprint.css" media="print">
</head>
<body>
<div id="headerlogo">
<p class="logoL1">School of Information, The University of Texas at Austin<br>
<span class="logoL2">385E Information Architecture and Design l</span><br>
<span class="logoL3">Fall 2007</span></p>
</div>
<div id="headersearch" class="noprint">
<form method="get" action="http://www.google.com/univ/utexas"><input name="q" size="30" maxlength="255" value="" align="top" type="text"><br>
<input name="btnG" value="Search" align="center" type="submit">
<a href="http://www.google.com"><img src="Web_files/GoogleLogo.gif" border="0" height="27" width="64"></a></form>
</div>
<ul id="topnavfolders" class="noprint">
<li><a href="index.html" class="selected">Overview</a></li>
<li><a href="policies.html">Policies</a></li>
<li><a href="schedule.html">Schedule</a></li>
<li><a href="assignments.html">Assignments</a></li>
<li><a href="resources.html">Resources</a></li>
</ul>
<div id="topnavsub" class="noprint"><a href="#1" class="overview">General Info</a> <a href="#2" class="overview">Description</a> <a href="#3" class="overview">Objectives</a> <a href="#4" class="overview">Textbooks</a> <a href="#5" class="overview">Mailing List</a> </div>
<div id="content"><a name="1"></a><h1>General Information:</h1>
<p>Instructor: A. Fleming Seay, PhD <br>Email: <a href="mailto:Fleming_Seay@Dell.com">Fleming_Seay@Dell.com</a><br>Phone: (412) 334-1682<br>Office Hours: by appointment</p>
<p>Class Meeting Time: Tuesday 6:30–9:30pm <br>Classroom: SZB 546<br>Course Website: <a href="http://www.ischool.utexas.edu/%7Ei385e/index.html">http://www.ischool.utexas.edu/~i385e</a><br>TA: Jade Anderson<br> Email: <a href="mailto:jade@ischool.utexas.edu">jade@ischool.utexas.edu</a>

The search program has no real understanding, NO KNOWLEDGE, of the page. It pulls INFORMATION.

Information versus Knowledge
• FACTS: what exists on the Web at the present time
• INTERPRETATION OF FACTS in light of:
  - Truths
  - Beliefs
  - Perspectives
  - Judgments
  - Methodologies
  - Know-how
= ontology



A core glossary is a simple glossary or defining dictionary that enables the definition of other concepts, especially for newcomers to a language or field of study. It contains a small working vocabulary and definitions for important or frequently encountered concepts, usually including idioms or metaphors useful in a culture. In computer science, a core glossary is a prerequisite to a core ontology. An example of this is seen in SUMO. The search engine Google provides a service that searches only web pages belonging to a glossary, thereby providing access to a kind of compound glossary of entries found on the web.

An upper ontology (or foundation ontology) is a model of the common objects that are generally applicable across a wide range of domain ontologies. It contains a core glossary in whose terms objects in a set of domains can be described. There are several standardized upper ontologies available for use, including Dublin Core, GFO, OpenCyc/ResearchCyc, SUMO, and DOLCE. WordNet, while considered an upper ontology by some, is not an ontology: it is a unique combination of a taxonomy and a controlled vocabulary.

• RDF (XML-based syntax)
• RDFS
• OWL (Web Ontology Language)
