SAC 2005 • Survey of Semantic Annotation Platforms • Lawrence Reeve, Hyoil Han
Semantic Annotation • Creating semantic labels within documents for the Semantic Web • Used to support: • Advanced searching (e.g. by concept) • Information visualization (using an ontology) • Reasoning about Web resources • Converting syntactic structures into knowledge structures (human → machine)
Semantic Annotation Concerns • Scale, Volume • Existing & new documents on the Web • Manual annotation • Expensive – economic, time • Subject to personal motivation • Schema Complexity • Storage • support for multiple ontologies • within or external to source document? • Knowledge base refinement • Access - How are annotations accessed? • API, custom UI, plug-ins
Semantic Annotation Platforms • Why semantic annotation platforms (‘SAPs’)? • Reduces human involvement • Consistent application of ontologies • Reduced cost – economic & time • Scalability • Multiple ontologies for single document
Semantic Annotation Platforms • Characteristics • Provide many services, not just annotation • Storage: ontology, KB, and annotation • Access APIs (query annotations) • Integrate information extraction methods • Support for IE (gazetteers) • Extensible
SAP Classification • Pattern-based • Pattern discovery • Iterative learning • Provide an initial seed set • Find new entities → find new patterns • Repeat • Rules • Manually define rules to find entities in text • Simple label matching
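The iterative pattern-discovery loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method of any surveyed platform: the corpus, the seed, the function name `bootstrap`, and the choice of "the word immediately to the left of an entity" as the pattern are all assumptions made for the example.

```python
import re

# Toy corpus and seed set (both invented for this sketch).
CORPUS = [
    "She flew to Paris last week.",
    "He moved to Berlin recently.",
    "They work in Berlin now.",
    "Offices opened in Madrid today.",
]

def bootstrap(corpus, seeds, rounds=2):
    """Alternate between learning patterns from known entities and
    using those patterns to find new entities."""
    entities = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # Step 1: every known entity contributes a pattern,
        # here simply the word directly to its left.
        for sentence in corpus:
            for entity in entities:
                m = re.search(r"(\w+) " + re.escape(entity) + r"\b", sentence)
                if m:
                    patterns.add(m.group(1))
        # Step 2: every pattern proposes new entities,
        # here any capitalized word directly to its right.
        for sentence in corpus:
            for left in patterns:
                regex = r"\b" + re.escape(left) + r" ([A-Z]\w+)"
                for m in re.finditer(regex, sentence):
                    entities.add(m.group(1))
    return entities, patterns

entities, patterns = bootstrap(CORPUS, {"Paris"})
print(sorted(entities), sorted(patterns))
```

Starting from the single seed "Paris", the first round learns the pattern "to" and finds "Berlin"; the second round learns "in" from "Berlin" and finds "Madrid". Patterns this loose drift quickly on real text, which is why practical systems score and filter candidate patterns between rounds.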
SAP Classification • Machine-learning based • Wrapper Induction • LP2 • Uses structural and linguistic information • Produces tagging & correction rules as output • Statistical models • Hidden Markov Model
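For the statistical-model bullet, a Hidden Markov Model tagger can be illustrated with Viterbi decoding over two states. This is a sketch under stated assumptions: the states (`O` for "outside", `ORG` for organization), every probability, and the `unseen` smoothing constant are invented for the example and are not taken from any surveyed platform, where such parameters would be estimated from annotated training text.

```python
def viterbi(tokens, states, start_p, trans_p, emit_p, unseen=1e-6):
    """Return the most probable state sequence for tokens under an HMM."""
    # best[i][s] = (probability of the best path ending in s at token i,
    #               previous state on that path)
    first = tokens[0]
    best = [{s: (start_p[s] * emit_p[s].get(first, unseen), None) for s in states}]
    for tok in tokens[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(tok, unseen), p)
                for p in states
            )
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Toy parameters (all values are assumptions for illustration).
STATES = ["O", "ORG"]
START = {"O": 0.8, "ORG": 0.2}
TRANS = {"O": {"O": 0.8, "ORG": 0.2}, "ORG": {"O": 0.4, "ORG": 0.6}}
EMIT = {
    "O": {"shares": 0.3, "of": 0.3, "rose": 0.3},
    "ORG": {"Acme": 0.5, "Corp": 0.5},
}

tokens = ["shares", "of", "Acme", "Corp", "rose"]
print(list(zip(tokens, viterbi(tokens, STATES, START, TRANS, EMIT))))
```

With these toy parameters the decoder labels "Acme" and "Corp" as `ORG` and the remaining tokens as `O`.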
SAP Classification • Multistrategy • Combine pattern and machine-learning approaches • Did not find a platform that implements this approach • Platform extensibility important for implementation
Semantic Annotation Platforms • Selection • The goal was a representative sample of platforms using various information extraction techniques • A system needed to be a platform offering services, not just an algorithm
Language Toolkits • GATE – language processing system • Component architecture, SDK, IDE • ANNIE (‘A Nearly-New IE system’) • tokenizer, gazetteer, POS tagger, sentence splitter, etc. • JAPE – Java Annotation Patterns Engine • provides regular-expression based pattern/action rules • Amilcare • adaptive IE system designed for document annotation • based on LP2 • uses ANNIE
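The two ideas behind ANNIE-style processing, gazetteer lookup and regular-expression pattern/action rules, can be approximated in plain Python. The sketch below is an analogy only: it does not use GATE or JAPE syntax, and the gazetteer entries, the rules, and the `annotate` function are hypothetical.

```python
import re

# Hypothetical gazetteer: surface string -> semantic class.
GAZETTEER = {"London": "Location", "Sheffield": "Location", "IBM": "Organization"}

# Hypothetical pattern/action rules: a regular expression (the pattern)
# and the label to attach when it matches (the action).
RULES = [
    (re.compile(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+"), "Person"),
    (re.compile(r"\b\d{1,2} (?:January|February|March) \d{4}\b"), "Date"),
]

def annotate(text):
    """Return (start, end, label, matched_text) tuples sorted by position."""
    found = []
    # Gazetteer lookup: exact whole-word matches of known names.
    for entry, label in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(entry) + r"\b", text):
            found.append((m.start(), m.end(), label, m.group()))
    # Pattern/action rules: label whatever each pattern matches.
    for pattern, label in RULES:
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), label, m.group()))
    return sorted(found)

for annotation in annotate("Dr. Smith joined IBM in London on 3 March 2004."):
    print(annotation)
```

Annotations are kept as character offsets rather than inline markup, so several rules can label the same text without modifying it.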
KIM (2003) • Ontology, KB, semantic annotation, indexing and retrieval server, front-ends (Web UI, Internet Explorer plug-in) • KIMO ontology • 250 classes, 100 properties • 80,000 entities from general news corpus in KB • (plus >100,000 aliases) • IE • Uses GATE, JAPE • Gazetteers (from KB) Source: http://www.ontotext.com/kim/SemWebIE.pdf
Ont-O-Mat (2002) • Uses Amilcare • Wrapper induction (LP2) • Extensible • Adapted in 2004 for PANKOW algorithm • Disambiguation by maximal evidence • Proper nouns + ontology linguistic phrases Source: http://www.aifb.uni-karlsruhe.de/WBS/sha/papers/kcap2001-annotate-sub.pdf
MUSE (2003) • Pipeline of processing resources (PRs) • PRs called conditionally based on text attributes • Makes use of JAPE • Adaptive rules • Can link multiple resources together • Gazetteer + part-of-speech tagger • Resolve entity ambiguities Source: http://gate.ac.uk/sale/expertupdate/muse.pdf
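A pipeline whose processing resources run only when the text has certain attributes can be sketched as a list of (condition, resource) pairs. Everything named here is hypothetical: the all-caps test, the two gazetteer variants, and the `run` function illustrate the idea of conditional PRs and are not MUSE's actual components or API.

```python
PLACES = {"London", "Paris"}  # toy gazetteer, invented for this sketch

def is_all_caps(doc):
    """Text attribute: true when every letter in the text is upper case."""
    letters = [c for c in doc["text"] if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)

def is_mixed_case(doc):
    return not is_all_caps(doc)

def case_insensitive_gazetteer(doc):
    """PR for all-caps text: normalize case before lookup."""
    for word in doc["text"].split():
        name = word.strip(".,").title()
        if name in PLACES:
            doc["annotations"].append(name)

def case_sensitive_gazetteer(doc):
    """PR for ordinary text: exact-case lookup."""
    for word in doc["text"].split():
        name = word.strip(".,")
        if name in PLACES:
            doc["annotations"].append(name)

# Each processing resource is paired with the condition that enables it.
PIPELINE = [
    (is_all_caps, case_insensitive_gazetteer),
    (is_mixed_case, case_sensitive_gazetteer),
]

def run(text):
    doc = {"text": text, "annotations": [], "trace": []}
    for condition, resource in PIPELINE:
        if condition(doc):
            resource(doc)
            doc["trace"].append(resource.__name__)
    return doc

print(run("FLOODS HIT LONDON."))
print(run("Floods hit London."))
```

Both inputs yield the annotation "London", but the trace shows that a different resource handled each one.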
SemTag (2003) • Large-scale annotation • Annotations separate from source • “Semantic Label Bureau” • Uses the TAP taxonomy • Approach is: • Find match to label in taxonomy • Save window before & after match • Perform disambiguation • Main contribution is using taxonomy for disambiguation Source: http://www.almaden.ibm.com/webfountain/resources/semtag.pdf
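The three steps listed on this slide (match a label, save a context window, disambiguate) can be sketched as follows. The disambiguation step here is a plain word-overlap count, used as a simple stand-in and not SemTag's own algorithm; the toy taxonomy, its context words, and the `tag` function are assumptions made for the example.

```python
import re

# Hypothetical taxonomy: label -> candidate nodes, each with context words.
TAXONOMY = {
    "Jaguar": {
        "Animal/Jaguar": {"cat", "jungle", "prey", "species"},
        "Company/Jaguar": {"car", "engine", "dealer", "model"},
    }
}

def tag(text, window=5):
    """Return (token_index, label, chosen_node) for each disambiguated match."""
    tokens = re.findall(r"\w+", text)
    results = []
    for i, tok in enumerate(tokens):
        # Step 1: find a match to a label in the taxonomy.
        if tok not in TAXONOMY:
            continue
        # Step 2: save a window of tokens before and after the match.
        before = tokens[max(0, i - window):i]
        after = tokens[i + 1:i + 1 + window]
        context = {t.lower() for t in before + after}
        # Step 3: disambiguate by overlap between the window and each node.
        scores = {node: len(words & context) for node, words in TAXONOMY[tok].items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:  # leave the match untagged without evidence
            results.append((i, tok, best))
    return results

print(tag("The new Jaguar model has a powerful engine"))
print(tag("A Jaguar stalks its prey in the jungle"))
```

Because results are returned as token positions plus a taxonomy node, they can be stored separately from the source text, in line with the slide's point about keeping annotations apart from the document.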
Platform Effectiveness • [Table] Effectiveness figures, as reported by platform authors
Summary • Several platforms developed in the last several years • Large implementation effort; many services • Differentiated by • IE methods used • Services provided • Future • IE integration will likely improve annotation accuracy • Extending existing platforms will allow for quicker research