CS652 Spring 2004 Summary
Course Objectives • Learn how to extract, structure, and integrate Web information • Learn what the Semantic Web is • Learn how to build ontologies for the Semantic Web • Investigate class-related research topics • Be introduced to Semantic Web services
Generally Applicable Ideas • Semantic Understanding • Data: attribute-value pairs • Information: data in a conceptual model • Knowledge: information with agreement • Meaning: useful knowledge • Measuring Success • Recall: NrCorrect/TotalCorrect • Precision: NrCorrect/(NrCorrect+NrIncorrect) • F-measure: (β²+1)PR/(β²P+R), where P = precision and R = recall
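The three success measures above can be computed directly from the counts the slide names. The following is a minimal sketch, not course code: the function name, the zero-division handling, and the example counts are assumptions made for illustration.

```python
def precision_recall_f(nr_correct, nr_incorrect, total_correct, beta=1.0):
    """Return (precision, recall, F-measure) using the slide's definitions."""
    extracted = nr_correct + nr_incorrect
    # Guard against empty denominators (an assumption; the slide does not cover this case).
    precision = nr_correct / extracted if extracted else 0.0
    recall = nr_correct / total_correct if total_correct else 0.0
    denom = beta ** 2 * precision + recall
    f_measure = (beta ** 2 + 1) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical counts: 8 correct and 2 incorrect values were extracted,
# and 16 correct values exist in the test set.
p, r, f = precision_recall_f(nr_correct=8, nr_incorrect=2, total_correct=16)
print(p, r, f)
```

With β = 1 the F-measure reduces to the harmonic mean of precision and recall; a β above 1 weights recall more heavily, and a β below 1 weights precision more heavily.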
Information Extraction • Get relevant information • Not: • Information retrieval: get relevant pages • Web mining: discover unknown associations • Wrapper: maps data to a suitable format • Generation techniques • Machine learning (e.g. RAPIER) • Natural language processing (e.g. RAPIER) • Hidden Markov Models • By-example generation tools (e.g. Lixto) • By-pattern generation (e.g. RoadRunner) • Wrapper Maintenance
Information Extraction – BYU Ontos • Ontology-based • Data frames • Strengths • Resilient to page changes • Robust across sites within the same domain • Works well with all types of data-rich text • Weaknesses • Hand-crafted ontologies and data frames • Requires record-boundary recognition • Does not learn • Applications • Extraction • High-precision classification • Schema mapping • Semantic Web annotation • Agent communication • Ontology generation
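To give a feel for what a hand-crafted data frame does, here is a rough sketch in which each object set is recognized by a regular expression. This is not the BYU Ontos implementation: the object-set names, the patterns, and the sample ad are all hypothetical, and real data frames carry more than a single value pattern.

```python
import re

# Hypothetical data frames: one value-recognition pattern per object set.
DATA_FRAMES = {
    "Year": re.compile(r"\b(19[0-9]{2}|20[0-9]{2})\b"),
    "Price": re.compile(r"\$\s?(\d{1,3}(?:,\d{3})*|\d+)"),
    "Mileage": re.compile(r"\b(\d{1,3}(?:,\d{3})*|\d+)\s*(?:miles|mi\.?)", re.IGNORECASE),
}


def extract(record):
    """Return the first recognized value for each object set in one record."""
    found = {}
    for name, pattern in DATA_FRAMES.items():
        match = pattern.search(record)
        if match:
            found[name] = match.group(1)
    return found


# Hypothetical car ad; the record boundary is assumed to be known already.
ad = "1998 Honda Civic, 85,000 miles, runs great, $4,500 obo"
print(extract(ad))
```

Because the patterns describe the values rather than the page layout, the same recognizers apply to any car-ad page, which is the resilience the slide lists as a strength. The cost is that every pattern is written by hand.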
Semantic Web • Tim Berners-Lee • “information [has a] well-defined meaning” • “[enables] computers and people to work in cooperation” • Adds context and structure via metadata • Agent computing paradigm • Knowledge markup; semantic annotation
Ontologies • “a formal, explicit specification of a shared conceptualization” [Gruber93] • Formal: machine readable; FOL • Explicit: concepts and constraints explicitly defined • Shared: community accepted • Conceptualization: abstract model (OSM) • “shared vocabulary”
Ontology Formalism • Ontology O = <V, A>, where • V = vocabulary = predicate symbols (each with some arity) • A = axioms = formulas (constraints and rules) • Predicates: Owner(x), Vehicle(x), Car(x), Truck(x), Owner(x) owns Vehicle(y) • Formulas: ∀x(Car(x) ∨ Truck(x) ⇒ Vehicle(x)); ∀x(Owner(x) ⇒ ∃≥1y(Owner(x) owns Vehicle(y))) • Inference Rules: TruckOwner(x) :- Owner(x), Owner(x) owns Vehicle(y), Truck(y)
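The formalism can be made concrete by treating each predicate as a set of facts and each axiom or rule as a check over those sets. The sketch below is only illustrative: the individuals are invented, and it reads the second formula as "every owner owns at least one vehicle", which is an assumption about the slide's intent.

```python
# Hypothetical facts for the slide's predicates.
owners = {"alice", "bob"}
cars = {"civic"}
trucks = {"f150"}
vehicles = {"civic", "f150"}
owns = {("alice", "civic"), ("bob", "f150")}  # Owner(x) owns Vehicle(y)


def cars_and_trucks_are_vehicles():
    # Constraint: anything that is a Car or a Truck is also a Vehicle.
    return (cars | trucks) <= vehicles


def every_owner_owns_a_vehicle():
    # Constraint (assumed reading): each Owner owns at least one Vehicle.
    return all(any(o == x and v in vehicles for o, v in owns) for x in owners)


def truck_owners():
    # TruckOwner(x) :- Owner(x), Owner(x) owns Vehicle(y), Truck(y)
    return {x for x, y in owns if x in owners and y in vehicles and y in trucks}


print(cars_and_trucks_are_vehicles(), every_owner_owns_a_vehicle(), truck_owners())
```

The constraints only validate existing facts, while the inference rule derives a new predicate, TruckOwner, that is stored nowhere in the fact sets.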
Semantic Web Ontologies • RDF • DAML+OIL • OWL
Semantic Web Annotation with BYU Ontos • BYU Ontos Extraction Ontology • OWL Ontology: osm.cs.byu.edu/CS652s04/ontologies/OWL/carads.owl • Annotated Semantic Web Page: osm.cs.byu.edu/CS652s04/ontologies/annotatedPages/carSrch1_semweb.html
Ontology Generation for the Semantic Web • Necessary for the Semantic Web • Ontology engineering • Tools • Methodology • Languages (e.g. SHOE, OWL) • Semiautomatic generation • NLP + machine learning (e.g. OntoText) • Create from dictionary or lexicon (e.g. Doddle) • Generation from tables (e.g. TANGO) • Ontology maintenance
Ontology Libraries for the Semantic Web • Locating ontologies • Indexing and organization • Search mechanisms • Reusing ontologies • Find one and modify • Find several, merge and modify
Ontology Mapping, Merging, and Integration for the Semantic Web • Ontology reuse • Heterogeneous agent communication • Agent commitment to a new ontology • On the fly: map, merge, integrate (nontrivial to automate) • Can we do well enough? • Can we synergistically involve a user? • Information extraction with respect to a target • Table extraction (BYU Ontos) • Semiautomatic wrapper/mediator construction by automatically providing mappings
Schema Mapping • Schema-level matchers • Name matchers (dictionaries – WordNet) • Structural context matchers • Instance-level matchers • Value characteristics • Data-frame matchers • Mapping cardinality • 1:1 (direct) • 1:n, n:1, n:m (indirect, complex) • Multi-faceted mapping techniques
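As one way to picture how a schema-level name matcher and an instance-level value matcher combine into 1:1 mappings, consider the sketch below. It is not any particular system's algorithm: the synonym table stands in for a dictionary such as WordNet, and the scoring weights, threshold, and sample schemas are all assumptions. Each attribute is assumed to have at least one non-empty value.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table standing in for a dictionary such as WordNet.
SYNONYMS = {frozenset({"car", "auto"}), frozenset({"price", "cost"})}


def name_score(a, b):
    """Schema-level matcher: exact or synonym match scores 1, else string similarity."""
    a, b = a.lower(), b.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def value_score(values_a, values_b):
    """Instance-level matcher: compare the numeric fraction and mean value length."""
    def profile(values):
        numeric = sum(v.replace(",", "").isdigit() for v in values) / len(values)
        mean_len = sum(len(v) for v in values) / len(values)
        return numeric, mean_len

    num_a, len_a = profile(values_a)
    num_b, len_b = profile(values_b)
    return (1 - abs(num_a - num_b)) * min(len_a, len_b) / max(len_a, len_b)


def match(source, target, threshold=0.6):
    """Return direct (1:1) mappings from source attributes to target attributes."""
    mapping = {}
    for s_name, s_vals in source.items():
        scores = {
            t_name: 0.5 * name_score(s_name, t_name) + 0.5 * value_score(s_vals, t_vals)
            for t_name, t_vals in target.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            mapping[s_name] = best
    return mapping


# Hypothetical schemas with sample instance values.
source = {"Cost": ["4500", "12000"], "Model": ["Civic", "Accord"]}
target = {"Price": ["3999", "15500"], "Model": ["Focus", "Camry"]}
print(match(source, target))
```

Indirect mappings (1:n, n:1, n:m) need more than a best-score lookup, for example splitting or merging values, which is why the slide calls them complex.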
Schema Integration • FCA merge using lattices • Global as View (GAV) • Global mediator relations are views over source relations • Dynamic mediator schema – changes to accommodate new sources (hard to add new sources) • Query only requires view unfolding • Good for static, centralized systems • TSIMMIS • Local as View (LAV) • Local source relations are views over mediator relations • Fixed mediator schema – new sources identify components covered (easy to add new sources) • Complex query rewriting • Good for dynamic, distributed systems • Information Manifold
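The Global-as-View idea, where a mediator relation is a view over source relations and a query is answered by unfolding that view, can be sketched in a few lines. This is a toy illustration rather than how TSIMMIS is built; the source relations, attribute names, and the `query` helper are hypothetical.

```python
# Hypothetical source relations, each with its own local schema.
SOURCE_A = [{"make": "Honda", "cost": 4500}, {"make": "Ford", "cost": 9000}]
SOURCE_B = [{"brand": "Toyota", "price_usd": 7000}]


def car_view():
    """GAV definition: the global relation Car(make, price) as a view over the sources."""
    for row in SOURCE_A:
        yield {"make": row["make"], "price": row["cost"]}
    for row in SOURCE_B:
        yield {"make": row["brand"], "price": row["price_usd"]}


# Mediator schema: global relation name -> view definition.
GLOBAL_VIEWS = {"Car": car_view}


def query(relation, predicate):
    """Answer a mediator query by unfolding the view and filtering its rows."""
    return [row for row in GLOBAL_VIEWS[relation]() if predicate(row)]


cheap = query("Car", lambda row: row["price"] < 8000)
print(cheap)
```

Adding a third source here means editing `car_view` itself, which mirrors the slide's point that new sources are hard to add under GAV. Under LAV each source would instead describe itself in terms of the fixed mediator schema, at the cost of more complex query rewriting.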
What is your dream for the Semantic Web? • Intelligent personal agents that can: • Gather (just) the information we want and deliver it to us when we want it • Help us with scheduling • Help us buy the goods we want • Negotiate and conduct business for us • … • Intelligent business agents • Intelligent discovery agents • … What can you do to make your dreams come true?