IN350 Summary and Overview Judith Molka-Danielsen Nov.28.2003
Major Topics • What is Content Management? What is Document Management and Information Steering? (2003) • What is the role of Markup Languages in Content Management? Document Properties and Markup Languages (Text + Multimedia Languages + Properties, Ch. 6, Baeza-Yates) • File Organization and Storage Structures (Connolly) • Text Properties, Zipf's Law, Heaps' Law (Text Operations, Ch. 7, Baeza-Yates) • Text Operations (Ch. 7) • Search Enhancements (older notes) and Compression of Text (Compressing, Ch. 7, Cyganski) • Oracle Text operations: creating an index and types of indices
Major Topics • Retrieval Evaluation Measures, with Precision and Recall (Read: Retrieval Evaluation, Ch. 3, Baeza-Yates) • The role of taxonomies in content management • Searching the Web (Read: Searching on the Web, Ch. 13, Baeza-Yates) • Multimedia Management (Read: Image Compression, Ch. 8, Cyganski, and Digital Video, Ch. 9, Cyganski) • Data Warehousing (Read: Data Warehousing, Ch. 13) • Large Capacity Storage (Ch. 17, Cyganski) • Document Publishing and Distribution, and older notes on Online Publishing • B2B e-commerce standards for document exchange • Ontologies in Document Exchange
Ontologies. Reference: Jon Atle Gulla. Nicola Guarino: Formal Ontology and Information Systems. Robert Jasper and Mike Uschold: A Framework for Understanding and Classifying Ontology Applications. From the IDI lecture series.
Ontology ABC • Ontology has recently attracted attention across many fields of computer science. • There is no consensus definition of ontology. • One of the most cited is “Ontology is an explicit representation of a conceptualization, the conceptualization includes a set of concepts, their definition and inter-relationships”. • In many cases, the term ontology is just another name for the result of familiar activities like conceptual analysis and domain modeling. • The roles of ontology range from knowledge management to semantic interoperability. • One important reason ontology has attracted so much attention recently is the Semantic Web, since ontology is considered the key enabler of the Semantic Web.
More terminology • Ontology: engineering artifact • Constituted by a vocabulary (concepts, relations) • Assumptions about intended meaning • Formalization: • Logical theory accounting for the intended meaning of a formal vocabulary • Committed to a particular conceptualization of the world • Ontology vs. conceptualization • Conceptualization is language-independent • Ontology is language-dependent
Example 1. Ontology of American Universities • SHOE ontology of university concepts
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE ontology SYSTEM "http://…/onto.dtd">
<ontology id="university-ont" version="2.1" description="…">
  <def-category name="Department" isa="EducationalOrganization" short="university department"/>
  <def-category name="Activity" isa="SHOEEntity" short="activity"/>
  <def-category name="Work" isa="Activity" short="work"/>
  <def-category name="Course" isa="Work" short="teaching course"/>
  …
</ontology>
Example 2. Business Process Ontology • MIT Process Handbook • Figure: a hierarchy of selling activities (Sell reserve credit, Sell credit card, Sell installment loan, Sell loan, Sell letter of credit, Sell mortgage, Sell credit line, Sell account, Sell financial service, Sell certificate of deposit, Sell savings & investment service, Sell mutual funds, Sell retirement plan, Sell management service, Sell ATM access, Sell account access services, Sell telephone access)
Example 3. Hierarchical Categories? • Can hierarchical categories be ontologies? A conceptualization of the medical domain?
More Confusion • Differences and similarities? Thesaurus, Ontology, Categories, Dictionary
The Semantic Web • Goal: evolve the Web • from sites designed for human consumption • to sites also understandable and usable by computer programs. • What would that do for us? • Query answering rather than document retrieval • Services findable, usable, and composable by automated agents • Information exchange among independently designed programs • How do we get there from here? • For services: • Service descriptions • Ontologies to provide the intended meaning of service items • For documents: • Structure, à la XML • Ontologies to provide the intended meaning of terms “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
XML Describes Document Structure • HTML • Language for describing how to display document content, e.g., tag a word to be displayed in bold or italic • XML • Language for describing the structure of document content, e.g., declare data to be a retail price, a sales tax, a book title, ... • Uniform method for describing and exchanging data using HTTP • Provides a “syntactic schema” • XML allows authors to create their own markup (e.g. <AUTHOR>), which seems to carry some semantics. However, from a computational perspective a tag like <AUTHOR> carries no more semantics than a tag like <H1>. A computer simply does not know what an author is or how the concept author is related to, e.g., the concept person.
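The point that <AUTHOR> carries no more meaning than <H1> can be made concrete with a generic XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the two tiny documents and the helper names `tags_and_text` and `shape` are hypothetical, chosen only for illustration. Once the tag names are erased, the parser sees identical structures.

```python
import xml.etree.ElementTree as ET

# Two tiny hypothetical documents that differ only in the name of one tag.
doc_author = ET.fromstring("<entry><AUTHOR>R. Goldman</AUTHOR></entry>")
doc_h1 = ET.fromstring("<entry><H1>R. Goldman</H1></entry>")


def tags_and_text(root):
    """(tag, text) pairs for the children of root: all a generic parser recovers."""
    return [(child.tag, child.text) for child in root]


def shape(elem):
    """The element tree with every tag name erased: text plus nested children."""
    text = elem.text.strip() if elem.text else ""
    return (text, [shape(child) for child in elem])


print(tags_and_text(doc_author))  # [('AUTHOR', 'R. Goldman')]
print(tags_and_text(doc_h1))      # [('H1', 'R. Goldman')]
# Apart from the label string, the parser sees exactly the same thing.
print(shape(doc_author) == shape(doc_h1))  # True
```

Nothing in the parsed result relates "AUTHOR" to a concept such as person; that link has to be supplied by something outside XML, which is the role the slides assign to ontologies.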
Bibliographic Entry in XML
<Publication URL="ftp://db.stanford … xml.ps">
  <Title> From Semistructured Data ... Language </Title>
  <Author> R. Goldman </Author>
  <Published> Proceedings of ... Databases </Published>
  <Location>
    <City> Philadelphia </City>
    <State> Pennsylvania </State>
  </Location>
  <Date>
    <Month> June </Month>
    <Year> 1999 </Year>
  </Date>
</Publication>
Location of what? When in June?
XML Is Not Enough • Language for describing the structure of document content E.g., declare data to be a retail price, a sales tax, a book title, ... • Uniform method for describing and exchanging data using HTTP • Provides a “syntactic schema” • Provides no means of specifying intended meaning of tags • Ontologies enable independently developed programs to exchange data • Ontologies specify intended meaning in a computer interpretable form
W3C Semantic Web Activity • Semantic Web Activity (http://www.w3.org/2001/sw/) • “Established to serve a leadership role, in both the design of enabling specifications and the open, collaborative development of technologies that support the automation, integration and reuse of data across various applications.” • Successor to the W3C Metadata Activity • RDF Core Working Group (http://www.w3.org/2001/sw/RDFCore/) • Responsible for the Resource Description Framework (RDF) • Web Ontology Working Group (http://www.w3.org/2001/sw/WebOnt/) • Charter: build upon the RDF Core work a language for defining structured, web-based ontologies which will provide richer integration and interoperability of data among descriptive communities • Developing the Web Ontology Language (OWL) • Based on DAML+OIL, developed in DARPA’s Agent Markup Language program
Resource Description Framework • A simple representation language for describing Web resources • All sentences are triples of the form “(Property Subject Object)” • Property is a binary relation • Subject is a URI reference • Object is either a URI reference or a literal, e.g., (creator http://www.w3.org/Lassila “Ora Lassila”) • XML external syntax • Model-theoretic semantics • Includes a resource “Class” and properties “type”, “subClassOf”, etc. • Supports classes of resources and literals, e.g., (type Clyde Elephant) • Supports subclass hierarchies, e.g., (subClassOf Elephant Mammal) • Like a primitive frame representation language
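To see how far the (Property Subject Object) triples alone take us, here is a minimal in-memory sketch in Python. It assumes the slide's order, with Clyde as the subject of the type triple; the extra `Mammal subClassOf Animal` triple and the helpers `match` and `classes_of` are hypothetical illustrations, not part of any RDF library.

```python
from typing import Optional, Set, Tuple

# A triple in the slide's order: (property, subject, object).
Triple = Tuple[str, str, str]

store: Set[Triple] = {
    ("type", "Clyde", "Elephant"),       # Clyde is an Elephant
    ("subClassOf", "Elephant", "Mammal"),
    ("subClassOf", "Mammal", "Animal"),  # hypothetical extra level
}


def match(triples: Set[Triple], prop: Optional[str] = None,
          subj: Optional[str] = None, obj: Optional[str] = None) -> Set[Triple]:
    """All triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (prop is None or t[0] == prop)
        and (subj is None or t[1] == subj)
        and (obj is None or t[2] == obj)
    }


def classes_of(triples: Set[Triple], individual: str) -> Set[str]:
    """Classes of an individual, following subClassOf links upward."""
    found = {o for (_, _, o) in match(triples, "type", individual)}
    frontier = list(found)
    while frontier:
        cls = frontier.pop()
        for (_, _, parent) in match(triples, "subClassOf", cls):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


print(sorted(classes_of(store, "Clyde")))  # ['Animal', 'Elephant', 'Mammal']
```

Pattern matching with wildcards plus a walk up the subclass links is roughly what "a primitive frame representation language" amounts to here.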
RDF • Classes • Resource • Property • Literal • Statement • Container • Bag • Seq • Alt • Properties • type • subject • predicate • object
RDF Schema • Classes • Class • ContainerMembershipProperty • Properties • subClassOf • subPropertyOf • seeAlso • isDefinedBy • comment • label • range • domain • member • Figure: class hierarchy of RDF and RDF Schema (Resource, Literal, Container, Class, Property, Statement, ContainerMembershipProperty, Bag, Seq, Alt)
RDF-S Class and Property Definitions
<rdf:Class ID="MotorVehicle">
  <rdfs:subClassOf rdf:resource="http.../PR-rdf-schema-19990303#Resource"/>
</rdf:Class>
<rdf:Class ID="PassengerVehicle">
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdf:Class>
<rdf:Class ID="Van">
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdf:Class>
<rdf:Class ID="MiniVan">
  <rdfs:subClassOf rdf:resource="#Van"/>
  <rdfs:subClassOf rdf:resource="#PassengerVehicle"/>
</rdf:Class>
Christine is a passenger vehicle. Is Christine a motor vehicle? Yes. Christine is registered to Arnie. What is Arnie? A person.
<rdf:Property ID="registeredTo">
  <rdfs:domain rdf:resource="#MotorVehicle"/>
  <rdfs:range rdf:resource="#Person"/>
</rdf:Property>
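The two questions about Christine and Arnie are answered by RDFS inference: subClassOf is transitive, instances inherit their classes' superclasses, and a property's domain and range type its subject and object. The Python sketch below applies just those four rules to a fixpoint over the vehicle example, in the deck's (property, subject, object) order. It is a hypothetical illustration of the idea, not a complete RDFS reasoner.

```python
# Triples in the slide notation: (property, subject, object).
facts = {
    ("subClassOf", "PassengerVehicle", "MotorVehicle"),
    ("subClassOf", "Van", "MotorVehicle"),
    ("subClassOf", "MiniVan", "Van"),
    ("subClassOf", "MiniVan", "PassengerVehicle"),
    ("domain", "registeredTo", "MotorVehicle"),
    ("range", "registeredTo", "Person"),
    ("type", "Christine", "PassengerVehicle"),
    ("registeredTo", "Christine", "Arnie"),
}


def rdfs_closure(triples):
    """Apply four RDFS inference rules repeatedly until nothing new is derived."""
    closed = set(triples)
    while True:
        derived = set()
        sub = [(s, o) for (p, s, o) in closed if p == "subClassOf"]
        # subClassOf is transitive.
        for (a, b) in sub:
            for (b2, c) in sub:
                if b == b2:
                    derived.add(("subClassOf", a, c))
        # An instance of a class is also an instance of its superclasses.
        for (p, x, cls) in closed:
            if p == "type":
                for (a, b) in sub:
                    if a == cls:
                        derived.add(("type", x, b))
        # domain types the subject of a property; range types its object.
        for (p, prop, cls) in closed:
            if p in ("domain", "range"):
                for (q, x, y) in closed:
                    if q == prop:
                        derived.add(("type", x if p == "domain" else y, cls))
        if derived <= closed:
            return closed
        closed |= derived


inferred = rdfs_closure(facts)
print(("type", "Christine", "MotorVehicle") in inferred)  # True
print(("type", "Arnie", "Person") in inferred)            # True
```

Note that the range declaration makes Arnie a Person by inference; RDFS never reports that a statement conflicts with a declaration, which foreshadows the limitations listed on the next slide.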
Comments on RDF and RDF-S • Severely lacking in expressive power • Domain and range constraints rather than Value-Type E.g., can’t define class of people all of whose children are male • No cardinality constraints • Particularly important for “exactly 1” and “at most 1” • No decompositions • Particularly important for “disjoint” and “exhaustive” • No axioms • No negation (!) • Not useful for checking consistency E.g., can’t prove an object is not an instance of a class • Basically a typing system • More powerful ontology representation languages are needed.
The DAML Program • Web site: http://www.daml.org/ • DAML: DARPA Agent Markup Language • Goal: achieve semantic interoperability between Web pages, databases, programs, and sensors. • DAML+OIL: • This language gets its unusual name because it was created by a joint committee of US and European researchers who were working on two different but similar languages. • DAML stands for the DARPA Agent Markup Language, a project funded by the US Defense Advanced Research Projects Agency, the same organization that funded much of the original work on the Internet (then called the ARPAnet). • OIL stands for the Ontology Inference Layer and was developed by a number of researchers, primarily a group funded by the European Union's Information Society Technologies Program. • The joint committee created a new language with the best features of SHOE, DAML, OIL and several other markup approaches. At the time of this writing, DAML+OIL is the most advanced web ontology language, and it is expected to provide the basis for future web standards for ontologies (OWL).
DAML+OIL • A representation language for user-defined ontologies • An ontology vocabulary layered on top of RDF and RDF Schema • Specification document: http://www.daml.org/2000/12/daml+oil-index.html • Expressive power analogous to: • Description logics (e.g., CLASSIC) • Monotonic frame languages (e.g., OKBC knowledge model) • Designed in collaboration with the European Community designers of the Ontology Inference Layer (OIL) • Basis for OWL, the candidate W3C standard
DAML+OIL Classes • Thing • Nothing • Restriction • List • Ontology • AbstractProperty • DatatypeProperty • TransitiveProperty • UniqueProperty • UnambiguousProperty
DAML+OIL Properties • Equivalence: equivalentTo, sameClassAs, samePropertyAs • Lists: first, rest, item • Properties: inverseOf • Ontologies: versionInfo, imports • Classes: disjointWith • Defining non-primitive classes: unionOf, disjointUnionOf, intersectionOf, complementOf, oneOf • Restrictions: onProperty, toClass, hasValue, hasClass, hasClassQ, minCardinality, maxCardinality, cardinality, minCardinalityQ, maxCardinalityQ, cardinalityQ
Property Restrictions on Classes
Figure: Person as a subclass of both “all objects all of whose parents are persons” and “all objects that have exactly 1 father”.
<Class ID="Person">
  <comment> Person is a subclass of objects whose parents are persons. </comment>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#hasParent"/>
      <daml:toClass rdf:resource="#Person"/>
    </daml:Restriction>
  </rdfs:subClassOf>
  <comment> Person is a subclass of resources that have one father. </comment>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#hasFather"/>
      <daml:cardinality> 1 </daml:cardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
</Class>
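The two restrictions read as conditions on an individual: every value of hasParent must be a Person (toClass acts as a universal, "all values from" restriction), and hasFather must have exactly one value. The Python sketch below checks those two conditions against a fixed, hypothetical set of facts under a closed-world assumption. This is a deliberate simplification: DAML+OIL has open-world semantics, so a reasoner would instead infer missing facts (for instance that Rex is a Person if Dora is one) rather than report a violation.

```python
# Hypothetical facts in the slide notation: (property, subject, object).
facts = {
    ("type", "Ann", "Person"),
    ("type", "Bob", "Person"),
    ("hasParent", "Carl", "Ann"),
    ("hasParent", "Carl", "Bob"),
    ("hasFather", "Carl", "Bob"),
    ("hasParent", "Dora", "Rex"),  # Rex is not recorded as a Person
}


def values(triples, prop, subj):
    """Known objects of prop for subj."""
    return {o for (p, s, o) in triples if p == prop and s == subj}


def all_values_in(triples, subj, prop, cls):
    """toClass-style check: every known value of prop is typed as cls."""
    return all(("type", v, cls) in triples for v in values(triples, prop, subj))


def exactly(triples, subj, prop, n):
    """cardinality-style check: exactly n known values of prop."""
    return len(values(triples, prop, subj)) == n


def satisfies_person_restrictions(triples, subj):
    """Closed-world check of the two restrictions placed on Person."""
    return (all_values_in(triples, subj, "hasParent", "Person")
            and exactly(triples, subj, "hasFather", 1))


print(satisfies_person_restrictions(facts, "Carl"))  # True
print(satisfies_person_restrictions(facts, "Dora"))  # False
```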
Formal ontology and information systems • This paper offers a systematic account of the central role ontologies may play in information systems. • Ontologies may have an impact on the three main components of information systems: information resources, user interfaces and application programs. • In AI, an ontology is an engineering artifact. In the simplest case, an ontology describes a hierarchy of concepts related by subsumption relationships; in more sophisticated cases, suitable axioms are added in order to express other relationships between concepts and to constrain their intended interpretation.
Kinds of ontologies, depending on level of generality • Top-level ontologies: general concepts like space, time, matter, object, event, etc., which are independent of a particular problem or domain. • Domain ontologies: the vocabulary related to a generic domain (e.g. medicine), obtained by specializing the terms in the top-level ontology. • Task ontologies: describe generic tasks or activities (e.g. diagnosing or selling). • Application ontologies: describe concepts depending both on a particular domain and on a particular task. An application ontology is a particular knowledge base, describing facts assumed to be always true by a community of users.
Ontology-driven information systems • An IS consists of components of three different types: application programs, information resources, and user interfaces. Ontologies can play a central role here. • Two dimensions for analysis: • Temporal dimension: using ontologies at development time or run time. • Structural dimension: impact of ontologies on different IS components.
The structural dimension: impact of ontologies on IS components • Using an ontology for the database component. • An ontology can be compared with the schema component of a database. • At development time, the conceptual model resulting from requirements analysis can be represented as a computer-processable ontology and from there mapped to concrete target platforms. • Another main use of ontologies at development time is information integration. • At run time, explicit ontologies (run-time accessible database schemas) are at the core of the mediation-based approach to information integration.
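A mediator answers a query stated in the ontology's terms by translating each source's schema at run time. The Python sketch below assumes two hypothetical sources ("A" and "B") whose field names differ, plus hand-written mappings from each schema to shared ontology terms; all names are illustrative and not drawn from the paper.

```python
# Hypothetical source data: two databases with different schemas.
SOURCES = {
    "A": [{"emp_name": "Kari", "dept": "Sales"}, {"emp_name": "Per", "dept": "IT"}],
    "B": [{"fullName": "Ola", "unit": "Sales"}],
}

# Hypothetical run-time mappings from each source schema to ontology terms.
MAPPINGS = {
    "A": {"emp_name": "name", "dept": "department"},
    "B": {"fullName": "name", "unit": "department"},
}


def to_ontology(record, mapping):
    """Rename a source record's fields to the shared ontology terms."""
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def mediated_query(**conditions):
    """Answer a query stated in ontology terms by translating every source."""
    answers = []
    for source_id, records in SOURCES.items():
        for record in records:
            row = to_ontology(record, MAPPINGS[source_id])
            if all(row.get(term) == value for term, value in conditions.items()):
                answers.append(row)
    return answers


print(mediated_query(department="Sales"))
# [{'name': 'Kari', 'department': 'Sales'}, {'name': 'Ola', 'department': 'Sales'}]
```

Because the mappings are data rather than code, a new source can be added by supplying one more mapping, which is the practical appeal of keeping the ontology explicit and accessible at run time.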
Impact (continued) • Using an ontology for the user interface component. • Allow the user to query and browse the ontology. • The user can browse the ontology to better understand the vocabulary used by the IS, and is therefore able to formulate queries at the desired level of specificity. • Another usage is vocabulary detaching: users can use their own natural language terms, which are mapped to the IS vocabulary with the help of the ontology.
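Vocabulary detaching can be pictured as a lookup from the user's words to the system's terms. In the Python sketch below, the table `USER_TERMS` and the function `detach` are hypothetical; the IS terms reuse the vehicle example from the RDF Schema slide. A real system would also handle synonyms, inflection and multi-word phrases.

```python
# Hypothetical mapping from users' everyday terms to the IS vocabulary.
USER_TERMS = {
    "car": "MotorVehicle",
    "automobile": "MotorVehicle",
    "minivan": "MiniVan",
    "owner": "registeredTo",
}


def detach(query):
    """Map each word of a user query to an IS term; report words with no mapping."""
    mapped, unmapped = [], []
    for word in query.lower().split():
        if word in USER_TERMS:
            mapped.append(USER_TERMS[word])
        else:
            unmapped.append(word)
    return mapped, unmapped


print(detach("Automobile owner"))  # (['MotorVehicle', 'registeredTo'], [])
print(detach("car colour"))        # (['MotorVehicle'], ['colour'])
```

Reporting the unmapped words, rather than silently dropping them, gives the interface a natural point to send the user to browse the ontology for a better term.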
Impact (continued) • Using an ontology for the application program component. • Application programs encode knowledge in the form of type or class declarations and procedures. • The ontological commitment of the program should be made explicit using ontologies. • Further, for ease of maintenance and flexibility, we can turn the program into a knowledge-based system.
Conclusion • Ontology-driven information systems • Different types of ontologies • The role of ontologies in IS • Time dimension • Development time vs. run time • Structural dimension • Information resources, user interfaces, and application programs
Common access to information • Use ontologies to give multiple target applications (or humans) access to heterogeneous sources of information (ontology-based information integration). • Four categories: • Human communication • Data access via shared ontology • Data access via mapped ontology • Shared services
Human communication • Promote common understanding among knowledge workers (KWs). • Supporting technologies include ontology editors and browsers. • Example: the Workflow Management Coalition reference documents. • Maturity: library classification has a long history (KWs sharing an ontology in the form of a glossary).
Data access via shared ontology • An ontology can be used as an interchange format to enable common access to operational data. • Example: Process Interchange Format (PIF) and EcoCyc • Maturity: commercial success exists in some contexts, while in others the technology is a long way from mature. • It is difficult to agree on a common ontology.
Shared services • Similar to data access via shared ontology, but different in the focus of what is being shared. • The ontology defines interfaces in multiple target languages. • Example: using UML to create an ontology for product data management; this ontology is then used to generate interface code for the client and server. • Maturity: relatively mature
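The shared-services idea is that interface code is generated from one ontology, so client and server cannot drift apart. The slide's example generates code from a UML model for several target languages; the Python sketch below swaps in a single-language stand-in, building classes from a hypothetical ontology fragment with the standard library's `dataclasses.make_dataclass`. The concepts and attributes are invented for illustration.

```python
from dataclasses import fields, make_dataclass

# Hypothetical ontology fragment: concept -> list of (attribute, type) pairs.
ONTOLOGY = {
    "Product": [("product_id", str), ("name", str), ("weight_kg", float)],
    "Supplier": [("supplier_id", str), ("country", str)],
}


def generate_interfaces(ontology):
    """Build one class per concept so client and server share the same definitions."""
    return {concept: make_dataclass(concept, attrs) for concept, attrs in ontology.items()}


classes = generate_interfaces(ONTOLOGY)
Product = classes["Product"]
bolt = Product(product_id="P-1", name="Bolt", weight_kg=0.02)
print([f.name for f in fields(Product)])  # ['product_id', 'name', 'weight_kg']
print(bolt.name)  # Bolt
```

Emitting source text for other languages from the same `ONTOLOGY` table would follow the same pattern; only the generator changes.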
Ontology-based search • Use an ontology to search an information repository for desired resources. • Example: Yahoo • Maturity: many commercial Internet portals are beginning to explore the use of concepts for ontology-based search (e.g. Endeca, Kaidara).
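One common way to use concepts in search is query expansion: a query for a concept also retrieves resources indexed under its narrower concepts. The Python sketch below assumes a hypothetical concept hierarchy (`NARROWER`) and a hypothetical repository of concept-indexed documents; it is one simple technique among several, not how any of the portals named above work.

```python
# Hypothetical concept hierarchy: concept -> narrower concepts.
NARROWER = {
    "FinancialService": ["Loan", "Account"],
    "Loan": ["Mortgage", "CreditLine"],
}

# Hypothetical repository: document id -> concepts it is indexed under.
DOCUMENTS = {
    "doc1": {"Mortgage"},
    "doc2": {"Account"},
    "doc3": {"Weather"},
}


def expand(concept):
    """The concept plus everything below it in the hierarchy."""
    result, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(NARROWER.get(c, []))
    return result


def search(concept):
    """Documents indexed under the concept or any narrower concept."""
    wanted = expand(concept)
    return sorted(doc for doc, concepts in DOCUMENTS.items() if concepts & wanted)


print(search("Loan"))              # ['doc1']
print(search("FinancialService"))  # ['doc1', 'doc2']
```

A plain keyword match on "Loan" would miss doc1, which is indexed only under Mortgage; the hierarchy is what connects the two.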
Conclusion • The paper presents a framework for understanding ontology applications. • We studied • The framework • Various ontology application scenarios (use cases). • Ontology as specification • Common access to information • Human communication • Data access via shared ontology • Data access via mapped ontology • Shared services • Ontology-based search
Summary • Ontology ABC • Motivation • Semantic Web • RDF and RDFS • Brief introduction to state-of-the-art ontology languages • In-depth introduction to one such language: DAML+OIL • Impact of ontologies on information systems • Classification of ontology applications
IN350 Document Management and Information Steering • What is Content Management? What is Document Management and Information Steering? (2003) • What is the role of Markup Languages in Content Management? Document Properties and Markup Languages (Text + Multimedia Languages + Properties, Ch. 6, Baeza-Yates) • File Organization and Storage Structures (Connolly) • Text Properties, Zipf's Law, Heaps' Law (Text Operations, Ch. 7, Baeza-Yates) • Text Operations (Ch. 7) • Search Enhancements (older notes) and Compression of Text (Compressing, Ch. 7, Cyganski) • Oracle Text operations: creating an index and types of indices • Retrieval Evaluation Measures, with Precision and Recall (Read: Retrieval Evaluation, Ch. 3, Baeza-Yates) • The role of taxonomies in content management • Searching the Web (Read: Searching on the Web, Ch. 13, Baeza-Yates) • Multimedia Management (Read: Image Compression, Ch. 8, Cyganski, and Digital Video, Ch. 9, Cyganski) • Data Warehousing (Read: Data Warehousing, Ch. 13) • Large Capacity Storage (Ch. 17, Cyganski) • Document Publishing and Distribution, and older notes on Online Publishing • B2B e-commerce standards for document exchange • Ontologies in Document Exchange