Explore how semantic annotation enables machine-understandable web content, benefits of ontology-based information extraction, scalability solutions, and contributions to the Semantic Web. Dive into divide-and-conquer architecture, two-layer annotation models, large domain handling, and ontology language unification.
A New Web Semantic Annotator Enabling A Machine Understandable Web | BYU Spring Research Conference 2005 | Yihong Ding | Sponsored by NSF
Machine Understandable Web • Content is represented in commonly shared, explicitly defined, generic conceptualizations (an ontology). • Also known as the Semantic Web
Why Machine Understandable? • Meaningful data • Exchangeable information • Interoperable programs/services • “… allows data to be shared and reused across application, enterprise, and community boundaries …” (Tim Berners-Lee et al., 2001)
Semantic Annotation: A Way to Achieve a Machine Understandable Web • Add explicit, formal, and unambiguous notes to web documents • Explicit: publicly accessible • Formal: publicly agreeable • Unambiguous: publicly identifiable
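As a rough illustration of what an "explicit, formal, and unambiguous note" might look like in practice, the sketch below ties a character span in a web document to a concept URI. Everything here is an assumption made for illustration: the `Annotation` record, the `annotate` helper, the phrase lexicon, and the `example.org` URIs are hypothetical and are not taken from the talk or from the annotator it describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One note attached to a document (hypothetical structure)."""
    document_url: str  # explicit: the note points at a publicly accessible page
    start: int         # unambiguous: exact character span being described
    end: int
    concept_uri: str   # formal: a concept defined in a shared ontology


def annotate(text, document_url, lexicon):
    """Attach a concept URI to every occurrence of each phrase in `lexicon`.

    `lexicon` maps a literal phrase to a concept URI. A real annotator would
    use far richer recognizers; exact string matching keeps the sketch short.
    """
    annotations = []
    for phrase, concept_uri in lexicon.items():
        pos = text.find(phrase)
        while pos != -1:
            annotations.append(
                Annotation(document_url, pos, pos + len(phrase), concept_uri)
            )
            pos = text.find(phrase, pos + 1)
    return sorted(annotations, key=lambda a: a.start)


if __name__ == "__main__":
    page = "1999 Honda Accord"
    lexicon = {"Honda": "http://example.org/car#Make"}  # hypothetical URI
    for note in annotate(page, "http://example.org/ad1", lexicon):
        print(note)
```

The design point is only that each note names its target and its meaning by identifier, so a program can consume it without guessing.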
Semantic Annotation Using Automated IE Engines [Figure labels: Ontology-based IE Wrapper, Document; Non-ontology-based IE Wrapper, Document]
Augmentations for the Annotator Semantic annotator using data-extraction ontologies: • a two-layer annotation model to achieve fast, highly accurate, and resilient semantic annotation • a divide-and-conquer style architecture to scale the system to large domains • a web ontology language augmentation to complement OWL for semantic annotation purposes
Two-Layer Annotation Model [Figure labels: Sample Annotation Process: Document, Conceptual Annotator using ontology-based IE tool; Massive Annotation Process: Same-Layout Documents, Structural Annotator]
Two-Layer Annotation Model, Benefits • Achieves both resiliency and fast execution • Requires no training to generate structural annotators • Demands no labeling of results from structural annotators
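The two layers can be sketched in a few lines, under heavy simplifying assumptions. Here a document is just a list of rows of text cells, the conceptual layer is a pair of regular-expression recognizers, and the structural layer is a column-to-concept map derived from one annotated sample. The names `RECOGNIZERS`, `conceptual_annotate`, `learn_structural_annotator`, and `structural_annotate` are hypothetical; the talk does not say how its structural annotators are actually generated.

```python
import re
from collections import Counter

# Conceptual layer: content-based recognizers (illustrative patterns only).
RECOGNIZERS = {
    "Price": re.compile(r"^\$\d[\d,]*$"),
    "Year": re.compile(r"^(19|20)\d{2}$"),
}


def conceptual_annotate(cell):
    """Return the concept whose recognizer matches the cell text, or None."""
    for concept, pattern in RECOGNIZERS.items():
        if pattern.match(cell):
            return concept
    return None


def learn_structural_annotator(sample_rows):
    """Run the conceptual layer on a sample document and record, for each
    column position, the concept recognized there most often."""
    votes = {}
    for row in sample_rows:
        for col, cell in enumerate(row):
            concept = conceptual_annotate(cell)
            if concept is not None:
                votes.setdefault(col, Counter())[concept] += 1
    return {col: counts.most_common(1)[0][0] for col, counts in votes.items()}


def structural_annotate(rows, column_concepts):
    """Label cells of a same-layout document by position alone, without
    running any recognizer."""
    return [
        [(cell, column_concepts[col])
         for col, cell in enumerate(row) if col in column_concepts]
        for row in rows
    ]


if __name__ == "__main__":
    sample = [["1999", "Honda Accord", "$4,500"],
              ["2001", "Ford Focus", "$3,200"]]
    layout = learn_structural_annotator(sample)
    print(layout)
    print(structural_annotate([["2003", "Toyota Camry", "$5,900"]], layout))
```

In this sketch the recognizers run only on the sample, and every other same-layout document is labeled by a position lookup, which is one way to read the speed claim. No hand-labeled training data is involved because the sample's labels come from the conceptual layer.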
Scalability Issues • Large domain containing many concepts • Large annotation task dealing with many web pages
Observation • A large domain is a combination of several small domains. • Consistently clustered domains exist, where each domain of this type is • Composed of the same cluster of concepts • Consistent with any larger domain in which it participates • Usually made up of a small number of concepts
Divide-and-Conquer Style Architecture for Scalability Issue [Figure labels: Document; Collection of small atomic domain ontologies; Selected Domain Ontologies; steps (1) and (2): Text classification, Scalable annotation]
Divide-and-Conquer, Benefits • Compared with large ontologies, small ontologies are • Simpler to construct • Faster to execute • Easier to check and update • More convenient to reuse • Identify the range of an ontology dynamically at the web page level • Avoid the problem of narrowing a large domain ontology down to the web page level • Maximize the reuse of existing ontologies
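A minimal sketch of the divide-and-conquer idea, assuming each small ontology carries a keyword set for classification and a few regular-expression recognizers for annotation. The ontology contents, the keyword-overlap classifier, and the names `SMALL_ONTOLOGIES`, `select_ontologies`, and `annotate` are all hypothetical stand-ins; the talk names text classification as the selection step but does not specify the method.

```python
import re

# A collection of small domain ontologies (illustrative contents only).
SMALL_ONTOLOGIES = {
    "car": {
        "keywords": {"mileage", "sedan", "engine"},
        "recognizers": {"Year": r"\b(?:19|20)\d{2}\b"},
    },
    "contact": {
        "keywords": {"phone", "email", "call"},
        "recognizers": {"Phone": r"\b\d{3}-\d{3}-\d{4}\b"},
    },
    "real_estate": {
        "keywords": {"bedroom", "acre", "garage"},
        "recognizers": {"Bedrooms": r"\b\d+ bedrooms?\b"},
    },
}


def select_ontologies(text, ontologies, min_hits=1):
    """Step 1: pick the small ontologies relevant to this page.

    Keyword overlap stands in for a real text classifier.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [name for name, onto in ontologies.items()
            if len(words & onto["keywords"]) >= min_hits]


def annotate(text, ontologies):
    """Step 2: annotate the page with only the selected ontologies."""
    results = []
    for name in select_ontologies(text, ontologies):
        for concept, pattern in ontologies[name]["recognizers"].items():
            for match in re.finditer(pattern, text):
                results.append((name, concept, match.group(0)))
    return results


if __name__ == "__main__":
    page = "2001 sedan, low mileage. Call 801-555-1234."
    print(select_ontologies(page, SMALL_ONTOLOGIES))
    print(annotate(page, SMALL_ONTOLOGIES))
```

Only the ontologies chosen for a given page are executed, so the cost per page depends on the few small domains it touches rather than on the size of the whole collection.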
Ontology Representation • Two ontology languages • Data-extraction ontology (OSMX) • Semantic web ontology (OWL) • Language unification
Contributions • Automatic semantic annotator using an ontology-based IE wrapper • Two-layer annotation: layout-based annotator on top of conceptual annotator • Divide-and-conquer style solution to scale the annotation process to a large number of concepts • Web ontology language unification