Automatic Creation and Simplified Querying of Semantic Web Content

An Approach Based on Information-Extraction Ontologies
Yihong Ding, David W. Embley, and Stephen W. Liddle
Brigham Young University

Fundamental Problems
• Lack of semantic web content
• Difficulty of content creation
• Inability to use semantic web content easily

Proposed Solutions
• Automatically annotate data-rich web pages (turning them into semantic web pages)
• Provide for free-form, textual queries of semantic web content

A Show-Case Vision
“Find me the price and mileage of red Nissans – I want a 1990 or newer.”

Demo I: Data Extraction

Demo II: Semantic Annotation

Demo III: Free-Form Query

Explanation: How it Works
• Extraction Ontologies
• Semantic Annotation
• Free-Form Query Interpretation

Extraction Ontologies
[Figure: extraction-ontology diagram labeling its parts: object sets (lexical and non-lexical), relationship sets, participation constraints, the primary object set, aggregation, and generalization/specialization]
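To make the parts of the diagram concrete, here is a minimal Python sketch of an extraction ontology as plain data structures. The class and field names (`ObjectSet`, `RelationshipSet`, `Ontology`, `participation`, `isa`) and the car-ads fragment are hypothetical and chosen only for illustration. They are not the authors' representation, and aggregation is omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ObjectSet:
    name: str
    lexical: bool          # lexical: printable values (e.g. Year); non-lexical: object ids (e.g. Car)
    primary: bool = False  # primary object set: one record per instance (e.g. per car ad)


@dataclass
class RelationshipSet:
    name: str
    members: tuple[str, ...]  # names of the connected object sets
    # Participation constraint per member as (min, max); max=None stands for "*".
    participation: dict[str, tuple[int, int | None]] = field(default_factory=dict)


@dataclass
class Ontology:
    object_sets: dict[str, ObjectSet]
    relationship_sets: list[RelationshipSet]
    # Generalization/specialization: general object set -> its specializations.
    isa: dict[str, list[str]] = field(default_factory=dict)

    def primary(self) -> ObjectSet:
        return next(o for o in self.object_sets.values() if o.primary)


# Hypothetical fragment of a car-ads ontology.
cars = Ontology(
    object_sets={
        "Car": ObjectSet("Car", lexical=False, primary=True),
        "Year": ObjectSet("Year", lexical=True),
        "Feature": ObjectSet("Feature", lexical=True),
        "Color": ObjectSet("Color", lexical=True),
    },
    relationship_sets=[
        # Each car has at most one year; a year may belong to many cars.
        RelationshipSet("Car-Year", ("Car", "Year"), {"Car": (0, 1), "Year": (0, None)}),
        RelationshipSet("Car-Feature", ("Car", "Feature"), {"Car": (0, None), "Feature": (0, None)}),
    ],
    isa={"Feature": ["Color"]},
)
print(cars.primary().name)
```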

Formalism & Extraction Ontologies (a quick side note)
• Fully formalized in predicate calculus
• Object set ~ 1-place predicate
• N-ary relationship set ~ n-place predicate
• Constraint ~ closed predicate-calculus formula
• As a description logic ~ ALCN (Attributive Language with Complement and Number Restrictions)
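As a worked illustration of this correspondence, assume a hypothetical non-lexical object set Car, a lexical object set Price, and a binary relationship set CarHasPrice in which each car has at most one price. The constraints become closed formulas, and each one has a direct ALCN counterpart (the last is an unqualified number restriction). The names are illustrative and are not taken from the authors' ontology.

```latex
\begin{align*}
\text{object sets:}\quad & \mathit{Car}(x),\ \mathit{Price}(y)\\
\text{relationship set:}\quad & \mathit{CarHasPrice}(x,y)\\
\text{referential integrity:}\quad & \forall x\,\forall y\,\bigl(\mathit{CarHasPrice}(x,y) \Rightarrow \mathit{Car}(x) \wedge \mathit{Price}(y)\bigr)\\
\text{at most one price per car:}\quad & \forall x\,\forall y_1\,\forall y_2\,\bigl(\mathit{CarHasPrice}(x,y_1) \wedge \mathit{CarHasPrice}(x,y_2) \Rightarrow y_1 = y_2\bigr)\\
\text{in } \mathcal{ALCN}:\quad & \exists\mathit{CarHasPrice}.\top \sqsubseteq \mathit{Car},\quad \top \sqsubseteq \forall\mathit{CarHasPrice}.\mathit{Price},\quad \mathit{Car} \sqsubseteq\ {\leq}1\,\mathit{CarHasPrice}
\end{align*}
```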

Extraction Ontologies
Data Frame:
  Internal Representation: float
  Values
    External Rep.: \s*[$]\s*(\d{1,3})*(\.\d{2})?
    Left Context: $
  Key Word Phrase
    Key Words: ([Pp]rice)|([Cc]ost)| …
  Operators
    Operator: >
    Key Words: (more\s*than)|(more\s*costly)|…
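A data frame bundles the recognizers for one object set. The following Python sketch assumes that recognizers are plain regular expressions applied with the standard `re` module. It shows the three parts of the price frame above: extracting values into the internal float representation, detecting key words, and detecting operators. The class name `Frame` and its methods are hypothetical. The patterns are the ones on the slide, cut off where the slide has "…".

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Frame:
    """Sketch of a data frame for one lexical object set (hypothetical structure)."""
    value_pattern: str         # external representation of values
    keyword_pattern: str       # key words that signal the object set
    operators: dict[str, str]  # operator symbol -> key-word pattern

    def values(self, text: str) -> list[float]:
        """Extract values and convert them to the internal representation (float)."""
        out = []
        for m in re.finditer(self.value_pattern, text):
            number = re.sub(r"[^\d.]", "", m.group())  # keep digits and the decimal point
            if number:                                 # a bare "$" yields nothing
                out.append(float(number))
        return out

    def has_keyword(self, text: str) -> bool:
        return re.search(self.keyword_pattern, text) is not None

    def operators_in(self, text: str) -> list[str]:
        return [op for op, pat in self.operators.items() if re.search(pat, text)]


price = Frame(
    value_pattern=r"\s*[$]\s*(\d{1,3})*(\.\d{2})?",    # External Rep. as on the slide
    keyword_pattern=r"([Pp]rice)|([Cc]ost)",            # slide's list, cut at its "…"
    operators={">": r"(more\s*than)|(more\s*costly)"},  # slide's list, cut at its "…"
)

query = "Any cars that cost more than $3000?"
print(price.values(query), price.has_keyword(query), price.operators_in(query))
# -> [3000.0] True ['>']
```

As transcribed, the value pattern has no provision for thousands separators. For "$12,500" this sketch therefore extracts 12.0, and a production pattern would need to allow commas.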

Data-Extraction Results: Car Ads (Salt Lake Tribune)

Attribute   Recall %   Precision %
Year          100         100
Make           97         100
Model          82         100
Mileage        90         100
Price         100         100
PhoneNr        94         100
Feature        91          99

Training set for tuning the ontology: 100
Test set: 116
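The percentages in the table follow the standard definitions. Recall is the fraction of the relevant values in the test set that were extracted correctly. Precision is the fraction of the extracted values that are correct. The following Python sketch uses toy sets (not the Tribune data) to show how a missed make lowers recall while precision stays at 100%.

```python
from __future__ import annotations


def recall_precision(extracted: set, relevant: set) -> tuple[float, float]:
    """Return (recall %, precision %).

    recall    = correct / relevant   (how much of what should be found was found)
    precision = correct / extracted  (how much of what was found is right)
    Empty denominators are reported as 0.0 by convention here.
    """
    correct = len(extracted & relevant)
    recall = 100.0 * correct / len(relevant) if relevant else 0.0
    precision = 100.0 * correct / len(extracted) if extracted else 0.0
    return recall, precision


# Toy example: four makes appear in the ads, and the extractor misses "MERC".
relevant = {("ad-1", "MERC"), ("ad-2", "Nissan"), ("ad-3", "Ford"), ("ad-4", "Dodge")}
extracted = {("ad-2", "Nissan"), ("ad-3", "Ford"), ("ad-4", "Dodge")}
print(recall_precision(extracted, relevant))  # -> (75.0, 100.0)
```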

Car Ads: Comments
• Dynamic sets
  • Missed: MERC, Town Car, 98 Royale
  • Could use a lexicon of makes and models
• Unspecified variation in lexical patterns
  • Missed: 5 speed (instead of 5 spd), p.l (instead of p.l.)
  • Could adjust lexical patterns
• Misidentification of attributes
  • Classified AUTO in AUTO SALES as automatic transmission
  • Could adjust exceptions in lexical patterns
• Typographical errors
  • “Chrystler”, “DODG ENeon”, “I-15566-2441”
  • Could look for spelling variations and common typos
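Two of the suggestions above can be combined: use a lexicon of makes and models, and look for spelling variations by matching tokens against that lexicon approximately. The following sketch uses Python's standard `difflib.get_close_matches`. The lexicon `MAKES`, the helper `normalize_make`, and the cutoff of 0.75 are hypothetical choices and are not the authors' method.

```python
from __future__ import annotations

import difflib

# Hypothetical lexicon of car makes (canonical spellings).
MAKES = ["Chrysler", "Dodge", "Ford", "Mercedes", "Nissan", "Toyota"]
_BY_LOWER = {make.lower(): make for make in MAKES}


def normalize_make(token: str, cutoff: float = 0.75) -> str | None:
    """Map a possibly misspelled token to a canonical make, or None if nothing is close enough."""
    hits = difflib.get_close_matches(token.lower(), list(_BY_LOWER), n=1, cutoff=cutoff)
    return _BY_LOWER[hits[0]] if hits else None


for token in ["Chrystler", "DODG", "MERC"]:
    print(token, "->", normalize_make(token))
```

At this cutoff, "Chrystler" maps to Chrysler and "DODG" maps to Dodge. The abbreviation "MERC" is still missed, because its similarity to "mercedes" is only about 0.67. Abbreviations like this would therefore need their own lexicon entries.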

General Extraction Results
• ~ 20 domains (cars, obituaries, cameras, jobs, games, prescription drugs, …)
• Simple, unified domains: nearly 100% recall and precision
• Complex, loosely defined domains: lower (e.g., obituaries: 82% recall and 74% precision)
• Typical: 80%+ recall and precision

Generality & Resiliency of Extraction Ontologies (another quick side note)
• Assumptions about web pages (generality)
  • Data rich
  • Narrow domain
  • Document types
    • Simple multiple-record documents (easiest)
    • Single-record documents (harder)
    • Records with scattered components (even harder)
• Declarative (resiliency)
  • Still works when web pages change
  • Works for new, unseen pages in the same domain
  • Scalable, but declaring the extraction ontology takes work

Semantic Annotation

Free-Form Query Interpretation
• Parse Free-Form Query (with data extraction ontology)
• Select Ontology
• Formulate Query Expression
• Run Query Over Semantically Annotated Data

Parse Free-Form Query
“Find me the price and mileage of all red Nissans – I want a 1996 or newer”
Recognized: price, mileage, red, Nissan, 1996, “or newer” (>= operator)
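Parsing amounts to running the ontology's recognizers over the query text. In the following Python sketch the recognizers are hypothetical regular expressions for a few car-ad object sets: key words for Price and Mileage, value patterns for Color, Make, and Year, and an "or newer" key word for the >= operator. A real extraction ontology would supply these from its data frames.

```python
import re

# Hypothetical recognizers for a few car-ad object sets.
KEYWORDS = {"Price": r"\bprice\b", "Mileage": r"\bmileage\b"}  # object sets named in the query
VALUES = {                                                     # object sets given a value
    "Color": r"\b(red|blue|white|black)\b",
    "Make": r"\b(Nissan|Ford|Dodge)s?\b",
    "Year": r"\b((?:19|20)\d{2})\b",
}
OPERATORS = {">=": r"\bor\s+newer\b", "<=": r"\bor\s+older\b"}


def parse(query: str):
    """Return (mentioned object sets, recognized values, operators) found in a free-form query."""
    mentioned = [obj for obj, pat in KEYWORDS.items() if re.search(pat, query, re.IGNORECASE)]
    values = {}
    for obj, pat in VALUES.items():
        m = re.search(pat, query, re.IGNORECASE)
        if m:
            values[obj] = m.group(1)  # the captured value without a plural "s"
    ops = [op for op, pat in OPERATORS.items() if re.search(pat, query, re.IGNORECASE)]
    return mentioned, values, ops


q = "Find me the price and mileage of all red Nissans – I want a 1996 or newer"
print(parse(q))
```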

Select Ontology
“Find me the price and mileage of all red Nissans – I want a 1996 or newer”
[Figure: two candidate ontologies scored against the query, with similarity values 2 and 5]
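The slide does not define the similarity value. One plausible reading, assumed in the following sketch, is that the similarity value counts how many of an ontology's recognizers fire on the query and that the highest-scoring ontology is selected. The two recognizer lists are hypothetical, so the scores they produce are not the slide's 2 and 5.

```python
from __future__ import annotations

import re


def similarity(query: str, recognizers: list[str]) -> int:
    """Assumed measure: how many of an ontology's recognizers match the query."""
    return sum(1 for pat in recognizers if re.search(pat, query, re.IGNORECASE))


def select_ontology(query: str, ontologies: dict[str, list[str]]):
    """Score every candidate ontology and pick the highest-scoring one."""
    scores = {name: similarity(query, recs) for name, recs in ontologies.items()}
    return max(scores, key=scores.get), scores


# Hypothetical recognizer lists for two candidate ontologies.
ONTOLOGIES = {
    "car-ads": [r"\bprice\b", r"\bmileage\b", r"\bred\b", r"\bnissans?\b",
                r"\b(19|20)\d{2}\b", r"\bor\s+newer\b"],
    "real-estate": [r"\bprice\b", r"\bbedrooms?\b", r"\bsq(uare)?\s*f(ee)?t\b",
                    r"\b(19|20)\d{2}\b"],
}

q = "Find me the price and mileage of all red Nissans – I want a 1996 or newer"
print(select_ontology(q, ONTOLOGIES))
```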

Formulate Query Expression
• Conjunctive queries and aggregate queries
• All mentioned object sets are of interest in the result.
• Values and operator keywords determine conditions:
  • Color = “red”
  • Make = “Nissan”
  • Year >= 1996 (from the >= operator)
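The following Python sketch shows the formulation step for a conjunctive query. It assumes the parser has already bound each operator to the value it modifies (here, `>=` to Year). Every mentioned object set goes into the projection, and every recognized value becomes a condition, with equality as the default. `Condition` and `formulate` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    attribute: str
    op: str       # comparison operator, e.g. "=" or ">="
    value: object


def formulate(mentioned: list[str], values: dict[str, str],
              bound_ops: dict[str, str]) -> tuple[list[str], list[Condition]]:
    """Return (projection, conditions) for a conjunctive query."""
    conditions = []
    for attr, raw in values.items():
        # Numeric strings (e.g. a year) become ints so that >= compares numbers.
        val = int(raw) if raw.isdigit() else raw
        conditions.append(Condition(attr, bound_ops.get(attr, "="), val))
    return list(mentioned), conditions


projection, conditions = formulate(
    ["Price", "Mileage"],
    {"Color": "red", "Make": "Nissan", "Year": "1996"},
    {"Year": ">="},
)
print(projection, conditions)
```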

Formulate Query Expression
[Figure: the query expression in For / Let / Where / Return form]
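For illustration, the following Python sketch renders a conjunctive query as a For/Let/Where/Return string in XQuery syntax. The document source `doc("car-ads.xml")`, the element names, and the `<Result>` wrapper are hypothetical. The slide's figure shows the authors' actual expression, which this sketch does not reproduce.

```python
def to_flwor(projection, conditions, source='doc("car-ads.xml")//Car'):
    """Render (projection, conditions) as an XQuery-style For/Let/Where/Return string.

    `conditions` is a list of (attribute, operator, value) triples; the source path
    and element names are hypothetical.
    """
    def literal(v):
        # Numbers stay bare; strings are quoted.
        return str(v) if isinstance(v, (int, float)) else f'"{v}"'

    lets = [f"let ${attr.lower()} := $c/{attr}" for attr in projection]
    where = " and ".join(f"$c/{attr} {op} {literal(val)}" for attr, op, val in conditions)
    returned = ", ".join(f"${attr.lower()}" for attr in projection)
    return "\n".join([f"for $c in {source}", *lets, f"where {where}",
                      "return <Result>{" + returned + "}</Result>"])


print(to_flwor(["Price", "Mileage"],
               [("Color", "=", "red"), ("Make", "=", "Nissan"), ("Year", ">=", 1996)]))
```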

Run Query Over Semantically Annotated Data
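Once the query is formulated, running it amounts to filtering the annotated records. The following Python sketch assumes each semantically annotated record has been loaded as a dictionary that maps object-set names to values. Each record also carries a hypothetical `source` field pointing back to the original ad. The four records are toy data.

```python
import operator

# Comparison operators a condition may use.
OPS = {"=": operator.eq, ">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt}


def run_query(records, projection, conditions):
    """Return one row per record satisfying every (attribute, op, value) condition."""
    rows = []
    for rec in records:
        if all(attr in rec and OPS[op](rec[attr], val) for attr, op, val in conditions):
            row = {attr: rec.get(attr) for attr in projection}
            row["source"] = rec.get("source")  # link back to the original page
            rows.append(row)
    return rows


# Toy annotated records; "source" is a hypothetical pointer to the original ad.
records = [
    {"Make": "Nissan", "Color": "red", "Year": 1998, "Price": 4500, "Mileage": 87000, "source": "ad-1"},
    {"Make": "Nissan", "Color": "blue", "Year": 1999, "Price": 5200, "Mileage": 60000, "source": "ad-2"},
    {"Make": "Nissan", "Color": "red", "Year": 1992, "Price": 1900, "Mileage": 140000, "source": "ad-3"},
    {"Make": "Ford", "Color": "red", "Year": 2001, "Price": 7800, "source": "ad-4"},
]
conditions = [("Color", "=", "red"), ("Make", "=", "Nissan"), ("Year", ">=", 1996)]
print(run_query(records, ["Price", "Mileage"], conditions))
```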

Query Interpretation Results: Pilot Experiment with Car Ads
• 15 free-form car-ad queries from 3 volunteer CS students
• Results
  • Recognizing object sets of interest
    • Recall: 85%
    • Precision: 90%
  • Recognizing constraints
    • Recall: 61%
    • Precision: 79%
• Problems
  • Regular expressions not tuned and lexicons incomplete
  • Ambiguities: “Are there any Ford mustangs, 2002, that are red?” (Is 2002 a year, mileage, or price?)
• Caveats
  • No disjunction
  • No negation
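The "2002" ambiguity is easy to reproduce: out of context, the token fits the value patterns of several object sets. The patterns in the following sketch are hypothetical stand-ins for the car-ad data frames, and none of them looks at the surrounding words. Deciding between the candidates would require context, and the slide lists this as an open problem.

```python
from __future__ import annotations

import re

# Hypothetical value patterns; none of them looks at the surrounding words.
PATTERNS = {
    "Year": r"(19|20)\d{2}",
    "Mileage": r"\d{1,3}(,?\d{3})*",
    "Price": r"\$?\d{1,3}(,?\d{3})*",
}


def candidate_object_sets(token: str) -> list[str]:
    """Object sets whose value pattern matches the whole token."""
    return [name for name, pat in PATTERNS.items() if re.fullmatch(pat, token)]


for token in ["2002", "87,000", "$4,500"]:
    print(token, candidate_object_sets(token))
```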

General Query Interpretation Results: AskOntos
(Pilot experiment on 5 domains: cars, real estate, countries, movies, diamonds)
• Object sets of interest recognized
  • Recall: 90%
  • Precision: 90%
• Conditions recognized
  • Recall: 71%
  • Precision: 88%

Pragmatics
All is not rosy …
• Technical problems
  • Extraction and query-interpretation accuracy
  • Execution speed
• Harvesting
  • Crawling?!
  • Information behind forms on the hidden web
• Social problems
  • Cooperation from web site developers
  • End-user concerns
    • Motivation
    • Trust

Conclusions
• Automatically create semantic-web content
  • Do data extraction over an ordinary web page
  • Create a semantic-web page
    • Cache the page
    • Store an external semantic annotation with respect to an ontology
• Query semantic-web pages
  • Free-form queries
  • Return results
    • Table
    • Link to the original web page (scrolled and highlighted)
• Pragmatic considerations
www.deg.byu.edu
