560 likes | 587 Views
Understanding Taxonomies. Taxonomy. The classification of information entities in the form of a hierarchy, according to the presumed relationships of the real-world entities that they represent. Subclass.
E N D
Taxonomy • The classification of information entities in the form of a hierarchy, according to the presumed relationships of the real-world entities that they represent.
Subclass • A taxonomy is a semantic hierarchy in which information entities are related by either the subclassification of relation or the subclass of relation. Weaker Stronger
Physical and Functional Properties • Functional properties necessarily influence the physical properties; the physical properties depend on the functional properties. • There usually has a specific distinguishing property for each subclass of a taxonomy.
Taxonomies • Taxonomies are good for classifying your information entities. • They express at least the bare minimum of the semantics necessary to distinguish among the objects in your information space. • They are a simple model of the distinguishable items you are interested in. • They are a way of structuring and characterizing your content meta data.
Redundant • Because taxonomies are trees, sometimes there is redundant information in a taxonomy. • Because there is only one parent node for each child node, you may sometimes have to have duplicate children nodes under different parents. • For example, if you had the subclasses of Manager and Employee situated under Person, as in the example discussed previously, all managers would be placed under both nodes, since they are both managers and employees, resulting in duplication.
Ontologies vs. Taxonomies • Ontologies use taxonomies as their backbones. • The basic taxonomic subclass of hierarchies act as the skeleton of ontologies, but ontologies add additional muscle and organs—in the form of additional relations, properties/attributes, property values. • So, taxonomies provide the basic structure for the information space, and ontologies flesh it out.
Taxonomies for Searching • Because taxonomies are focused on classifying content (semantics or meaning), they enable search engines and other applications that utilize taxonomies directly to find information entities much faster and with much greater accuracy. • UDDI requires taxonomies and ontologies. A directory or registry of Web products and services absolutely needs some way of classifying those products and services.
UDDI has proposed the tModel as the placeholder for taxonomies such as UNSPSC and the North American Industry Classification System (NAICS) that can be used to classify Web products and services.
Examples • The entire Yellow Pages is a huge taxonomy. It is ordered alphabetically to be of additional assistance to a person looking for products or services, but its primary function is as a taxonomy classifying the available content. • The Yahoo and Google taxonomies act in much the same way: They assist a user looking for content by categorizing that content as naturally (as semantically realistically) as possible.
Ontology Spectrum • What is normally known as an ontology can range from the simple notion of a taxonomy (knowledge with minimal hierarchic or parent/child structure), to a thesaurus (words and synonyms), to a conceptual model (with more complex knowledge), to a logical theory (with very rich, complex, consistent, meaningful knowledge).
Ontology Spectrum: Taxonomy • In a taxonomy the semantics of the relationship between a parent and a child node is relatively underspecified or ill defined. • In some cases, the relationship is the subclass of relation; in others, it is the part of relation. In still others, it is simply undefined. • Ex: Your computer's directory structure has arbitrary relationship between any given directory and one of its specific subdirectories.
is subclassification of • In general, all you can say about the relationship between a parent and a child in a semantically weaker taxonomy is that it means is subclassification of: a fairly ill-defined semantics. • The relationship can be the stronger subclass of relation. It is a semantically stronger taxonomy that is the backbone of conceptual models and ontologies.
Ontology Spectrum: Thesaurus • A thesaurus concerns the relationships between terms (words or phrases) structured in a (semantically weak) taxonomy. • So a thesaurus is a taxonomy plus some term semantic relations. • The primary purposes of a thesaurus are to facilitate retrieval of documents and to achieve consistency in the indexing of written or otherwise recorded documents and other items.
Semantic Relations of a Thesaurus • The ANSI/NISO Monolingual Thesaurus Standard defines a thesaurus as "a controlled vocabulary arranged in a known order and structured so that equivalence, homographic, hierarchical, and associative relationships among terms are displayed clearly and identified by standardized relationship indicators
Thesaurus Usage • A thesaurus is typically used to associate the rough meaning of a term to the rough meaning of another term. • Thus, a thesaurus is controlled vocabulary designed to support information retrieval by guiding both the person assigning meta data and the searcher to choose the same terms for the same concept. • Example: Center for Army Lessons Learned Thesaurus (next page)
Thesaurus • A thesaurus ensures that: • Concepts are described in a consistent manner. • Experienced users are able to easily refine their searches to locate the information they need. • Users do not need to be familiar with technical or local terminology.
Examples • Roget's Thesaurus • WordNet thesaurus: http://www.cogsci.princeton.edu/~wn/ • In WordNet, a word is given a definition or definitions (distinct definitions for an individual word are usually called word senses for that word; the notion that a given word such as bank has multiple meanings or word senses is called polysemy ("multiple senses") in linguistics.
Associated Information • A word (or common phrase) in WordNet also has the information typically associated with it according to the ISO 2788 standard. • Synonyms—Those nodes in the taxonomy that are in the mean the same as relation • Hypernyms—Those nodes that are in the parent of or broader than relation in the taxonomy; if the taxonomy is a tree, there is only one parent node. • Hyponyms—Those nodes that are in the children of or narrower than relation;
Example: The word bank • In WordNet, the word bank has the following definitional (word sense) and hypernymic (parent of or broader than) information associated with it: • depository financial institution, bank, banking concern, banking company—(a financial institution that accepts deposits and channels the money into lending activities; "he cashed a check at the bank"; "that bank holds the mortgage on my home") • bank—(sloping land (especially the slope beside a body of water); "they pulled the canoe up on the bank"; "he sat on the bank of the river and watched the currents") • bank—(a supply or stock held in reserve for future use (especially in emergencies) • bank, bank building—(a building in which commercial banking is transacted; "the bank is on the corner of Nassau and Witherspoon") • bank—(an arrangement of similar objects in a row or in tiers; "he operated a bank of switches") • savings bank, coin bank, money box, bank—(a container (usually with a slot in the top) for keeping money at home; "the coin bank was empty") • bank—(a long ridge or pile; "a huge bank of earth") • bank—(the funds held by a gambling house or the dealer in some gambling games; "he tried to break the bank at Monte Carlo") • bank, cant, camber—(a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force) • bank—(a flight maneuver; aircraft tips laterally about its longitudinal axis (especially in turning); "the plane went into a steep bank")
Synsets • In WordNet, the various relations are really between synonym sets, or synsets. • This means that the distinct words or phrases, called terms, that are synonymous at roughly the same level of abstraction (i.e., they are not at either the parent or child level, with respect to each other) are grouped together as a set. • The synset therefore acts as a concept—a mental construct in the human being (or the information system or knowledge representation) that stands behind the term and represents the mental signification of the term.
Term vs. Concept • So, we distinguish term from concept in knowledge or semantic representations. • A term is the label or string representation for the underlying meaning indexed by that term; the underlying meaning is the concept and the attributes, attribute values, and relationships to other concepts that that concept participates in. • Figure 7.8 displays the taxonomic structure for the first word sense for bank.
The first 3 of 10 total word senses for the word bank The immediate parent of the preceding term
Ontology Spectrum: Conceptual Model • A conceptual model is a model of a subject area or area of knowledge, sometimes called a domain, that represents the primary entities (the things of the domain), the relationships among entities, the attributes and attribute values (sometimes called properties and property values) of the entities and the relationships, and sometimes rules that associate entities, relationships, and attributes (or all three) in more complicated ways.
Object-model Level • In a conceptual model, it truly is possible to define and express the subclass of relation between a parent class and a child class. • What is also important is that the definitions of a class, superclass, and subclass be semantically well specified at the meta-model level so that the object-model level classes such as Person and its subclass Employee can be well specified semantically.
Meta-model Level • The object-model level is the level that we are interested in. It is the level at which we construct our domain and system models. • The meta-model level is the level that defines the constructs such as class, relation, and attribute that we will use at the object-model level to define our content models. • The meta-model level is often the level where the conceptual modeling language (such as UML) itself is defined. What is defined at the modeling language level enables us to express things in that language (i.e., construct our own models using the language) at the object level.
ER Model • The Entity-Relational (ER) model or language (and the Enhanced or Extended ER or EER model) that is used to define a conceptual schema for a database is also considered a conceptual modeling language. • When one designs a database, one first creates a conceptual schema, reduces that to a logical schema, and finally reduces that in turn to a physical schema. • These schemas represent levels of abstraction: from the human conceptual level to the database table/column level to the actual implemented tables, columns, and keys.
Ontology Spectrum: Logical Theory • The high-end notion of an ontology: a logical theory. • Much of current ontological engineering and knowledge representation aspires to building ontologies as logical theories. • Logical theories are built on axioms (a range of primitive to complex statements asserted to be true) and inference rules, which together are used to prove theorems about the domain represented by the ontology-as-logical-theory. • The whole set of axioms, inference rules, and theorems together constitute the logical theory.
Express the Semantics • In a logical theory, we can express the semantics of a model to the highest degree possible. • The subclass of relation can become a richer relation, perhaps defined as the disjoint subclass of relation with the property of transitivity. • A class's superclass relation to its subclasses can also be defined as exhaustive—that is, the subclasses exhaustively partition the superclass. • Similar fine semantic distinctions can be made of relations and attributes, and other modeling constructs such as facets, which represent meta data associated with relations (or assertions on assertions).
Ontology • An ontology defines the common words and concepts (meanings) used to describe and represent an area of knowledge, and so standardizes the meanings. • Ontologies are used by people, databases, and applications that need to share domain information . • Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among them. They encode knowledge in a domain and also knowledge that spans domains. So, they make that knowledge reusable.
Ontology • An ontology includes the following: • Classes (general things) in the many domains of interest • Instances (particular things) • Relationships among those things • Properties (and property values) of those things • Functions of and processes involving those things • Constraints on and rules involving those things
Topic Maps • Topic Maps is a technology that has arisen in recent years to address the issue of semantically characterizing and categorizing documents and sections of documents on the Web with respect to their content. • Topic Maps provides a content-oriented index into a set of documents, much like the index of a book but with this qualification: an index of a book does not typically characterize the contents of that book as a set of linked topics, but rather as a set of mostly isolated subject references with occasional cross-references to other subjects.
Topic Maps • A Topic Map, however, does act as a set of linked topics that index a document collection. • In addition, in the Topic Maps paradigm, one can have multiple topic maps indexing the same Web document collections (much as a book may have multiple indexes, such as a subject index, a name index, and so forth; the important point here is that one can have multiple topic maps indexing the subjects in different ways).
Topic Maps • Topic maps can be viewed as information overlays on documents or arbitrary information resources. • They enable content-based navigation over these resources irrespective of the latter's form. • Topic maps thus act as taxonomies—ways of describing, classifying, and indexing an information space consisting of Web and non-Web objects.
TM Standards • The early Topic Maps standard was in fact based on SGML and used a non-XML syntax. • Topic Maps today are specified in terms of two different interchange syntaxes: a more recent one based on XML and an older one based on an SGML DTD. XML TM syntax, referred to as XTM
Topic Maps Standard • The Standard Application Model (SAM) defines the formal data model of Topic Maps and its semantics in natural language. • The Reference Model is intended to be a more abstract model of Topic Maps than SAM and to enable Topic Maps to semantically interoperate with other knowledge representation formalisms and Semantic Web ontology languages. • The Topic Map Query Language (TMQL) will be an SQL-like language for querying topic map information. • The Topic Map Constraint Language (TMCL) will give a database schemalike capability to Topic Maps enabling constraints on the meaning to be defined for Topic Maps. • Both TMQL and TMCL are dependent on the final elaboration of SAM, which is itself dependent on RM
Topic Maps Concepts • The XTM standard identifies the key concepts of Topic Maps. The key concepts are • Topic • Association • Occurrence • Subject descriptor • Scope.
Topic • Anything can be a topic—that is, any distinct subject of interest for which assertions can be made. • A subject is the what; a topic is an information representation of the what. So a topic represents the subject that is referred to.
Occurrence • An occurrence is a resource specifying some information about a topic. The resource is either addressable (using a URI) or has a data value specified inline. • For the former, resourceRef is used:<occurrence> ... <resourceRef xlink:href="http://www.ci.front-royal.va.us/"/> </occurrence> • For the latter, the inline value, resourceData, is used <occurrence> ... <resourceData>Front Royal is on the Shenandoah River </resourceData> </occurrence>
Association • An association is the relationship between (one or more) topics. • Example:the association located-in is asserted to hold between two topic references. <association> <instanceOf> <topicRef xlink:href="#located-in"/> </instanceOf> <member> <roleSpec><topicRef xlink:href="#city"/></roleSpec> <topicRef xlink:href="#Front-Royal"/> </member> <member> <roleSpec><topicRef xlink:href="#state"/></roleSpec> <topicRef xlink:href="#Virginia"/> </member> </association>
Association • An association is similar to the database notion of a relation or to the ontology notion of a predicate. • An association role specifies how a particular topic acts as a member of an association, its manner of playing in that association. If there were a uses association between Sammy Sosa and a Rawlings 34-inch Pro Model baseball bat, then Sammy would be in the batter role and the Rawlings would be in the bat role: <association> <instanceOf><topicRef xlink:href="#uses"/></instanceOf> <member> <roleSpec><topicRef xlink:href="#batter"/></roleSpec> <topicRef xlink:href="#Sammy-Sosa"/> </member> <member> <roleSpec><topicRef xlink:href="#bat"/></roleSpec> <topicRef xlink:href="#Rawlings-34-inch-Pro-Model- Baseball-Bat"/> </member> </association>
SubjectIndicator • A subject indicator is just a way of indicating subjects. Typically a subject is indicated by defining a resource. If two given topics in fact use the same resource, then their subjects are identical. • Example: Both topics (Front Royal and Front Royal, Virginia) are really about the same subject. (next page)
Scope • Scope in Topic Maps is similar to the notion of namespace in other markup languages. • Scope specifies the applicability or context of the topic, its occurrences and resources, and its associations. • The names of topics are unique within a scope. Resources specified within a particular topic have the same scope as that topic. • That is why topic maps should be merged if they have the same base name; they indicate the same subject having the same scope.