OntoWeb SIG5: Language Technology in Ontology Development and Use
Paul Buitelaar, Language Technology Department, Competence Center Semantic Web, DFKI GmbH, Saarbrücken, Germany
Thierry Declerck, Computational Linguistics Department, Saarland University, Saarbrücken, Germany
LT-based Semantic Web
• Semantic Web: Knowledge Markup, Ontology Development
• Information Extraction (Extracting Meaningful Units)
• Linguistic/Semantic Annotation (Analyzing Dependency Structure)
• Part-of-Speech Tagging, Morphological Analysis, Phrase Recognition, Grammatical Functions, Semantic Tagging
LT in Knowledge Markup
• Turning the web into a semantic web implies widespread knowledge markup of web documents with ontology-based metadata
• (Semi-)automatic authoring support for knowledge markup of textual/multimedia documents will be provided by language technology tools
• Linguistic/Semantic Annotation > Information Extraction > LT-based Knowledge Markup
LT in Ontology Development
• Ontologies evolve rapidly over time and between applications, so (semi-)automatic adaptation of ontologies is essential for their efficient use
• Adaptation involves the use of NLP and ML methods
• Linguistic/Semantic Annotation > Information Extraction > Text Mining
Linguistic Annotation (MuchMore)
Balint syndrom is a combination of symptoms including simultanagnosia, a disorder of spatial and object-based attention, disturbed spatial perception and representation, and optic ataxia resulting from bilateral parieto-occipital lesions.
<text>
  <token id="w1" pos="NN">Balint</token>
  <token id="w2" pos="NN">syndrom</token>
  <token id="w3" pos="VBZ" lemma="be">is</token>
  <token id="w4" pos="DT" lemma="a">a</token>
  <token id="w5" pos="NN" lemma="combination">combination</token>
  <token id="w6" pos="IN" lemma="of">of</token>
  <token id="w7" pos="NNS" lemma="symptom">symptoms</token>
  ...
  <token id="w20" pos="JJ" lemma="spatial">spatial</token>
  <token id="w21" pos="NN" lemma="perception">perception</token>
  ...
  <chunks>
    <chunk id="c1" from="w1" to="w2" type="NP"/>
    <chunk id="c7" from="w20" to="w23" type="NP"/>
  </chunks>
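The stand-off chunk markup above points at token ids rather than wrapping the words, so a consumer has to resolve each `from`/`to` span back to tokens. The following is a minimal Python sketch of that lookup, not the MuchMore tooling itself. It assumes the excerpt is wrapped in a closed `<text>` element with `<chunks>` nested inside it, and it contains only tokens shown on the slide; the function names `token_index` and `chunk_texts` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: a closed <text> wrapper with <chunks> nested inside it.
# Only tokens shown on the slide are included; elided tokens are absent.
SAMPLE = """
<text>
  <token id="w1" pos="NN">Balint</token>
  <token id="w2" pos="NN">syndrom</token>
  <token id="w3" pos="VBZ" lemma="be">is</token>
  <token id="w20" pos="JJ" lemma="spatial">spatial</token>
  <token id="w21" pos="NN" lemma="perception">perception</token>
  <chunks>
    <chunk id="c1" from="w1" to="w2" type="NP"/>
    <chunk id="c7" from="w20" to="w23" type="NP"/>
  </chunks>
</text>
"""


def token_index(token_id):
    """Turn an id such as 'w20' into the integer 20."""
    return int(token_id[1:])


def chunk_texts(xml_string):
    """Map each chunk id to (chunk type, surface text of its tokens)."""
    root = ET.fromstring(xml_string)
    tokens = {token_index(t.get("id")): t.text for t in root.iter("token")}
    result = {}
    for chunk in root.iter("chunk"):
        start = token_index(chunk.get("from"))
        end = token_index(chunk.get("to"))
        # Token ids missing from the excerpt are skipped, not guessed.
        words = [tokens[i] for i in range(start, end + 1) if i in tokens]
        result[chunk.get("id")] = (chunk.get("type"), " ".join(words))
    return result


CHUNKS = chunk_texts(SAMPLE)
for chunk_id, (chunk_type, text) in CHUNKS.items():
    print(chunk_id, chunk_type, text)
```

Because w22 and w23 are elided in the excerpt, chunk c7 resolves only to the tokens that are present; on the full annotation it would cover the whole span.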
Semantic Annotation (MuchMore)
Balint syndrom is a combination of symptoms including simultanagnosia, a disorder of spatial and object-based attention, disturbed spatial perception and representation, and optic ataxia resulting from bilateral parieto-occipital lesions.
<umlsterm id="t7" from="w20" to="w21">
  <concept id="t7.1" cui="C0037744" preferred="Space Perception" tui="T041">
    <msh code="F2.463.593.778"/>
    <msh code="F2.463.593.932.869"/>
  </concept>
</umlsterm>
<umlsterm id="t8" from="w26" to="w26">
  <concept id="t8.1" cui="C0029144" preferred="Optics" tui="T090">
    <msh code="H1.671.606"/>
  </concept>
</umlsterm>
<semrel id="r7" term1="t7.1" term2="t8.1" reltype="issue_in"/>
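The semantic layer links token spans to UMLS concepts and then relates concepts to each other by id. As a rough illustration of how such markup could be consumed, the Python sketch below collects the concepts and turns each `semrel` into a readable triple. The `<annotation>` root element and the function names `read_concepts` and `read_relations` are assumptions made for this example; the slide shows the elements without an enclosing root.

```python
import xml.etree.ElementTree as ET

# Assumption: a hypothetical <annotation> root wraps the slide's elements.
SAMPLE = """
<annotation>
  <umlsterm id="t7" from="w20" to="w21">
    <concept id="t7.1" cui="C0037744" preferred="Space Perception" tui="T041">
      <msh code="F2.463.593.778"/>
      <msh code="F2.463.593.932.869"/>
    </concept>
  </umlsterm>
  <umlsterm id="t8" from="w26" to="w26">
    <concept id="t8.1" cui="C0029144" preferred="Optics" tui="T090">
      <msh code="H1.671.606"/>
    </concept>
  </umlsterm>
  <semrel id="r7" term1="t7.1" term2="t8.1" reltype="issue_in"/>
</annotation>
"""


def read_concepts(root):
    """Index every concept by id, keeping its token span and codes."""
    concepts = {}
    for term in root.iter("umlsterm"):
        for concept in term.iter("concept"):
            concepts[concept.get("id")] = {
                "span": (term.get("from"), term.get("to")),
                "cui": concept.get("cui"),
                "preferred": concept.get("preferred"),
                "mesh": [m.get("code") for m in concept.iter("msh")],
            }
    return concepts


def read_relations(root, concepts):
    """Resolve each semrel into (preferred name, relation, preferred name)."""
    triples = []
    for rel in root.iter("semrel"):
        left = concepts[rel.get("term1")]["preferred"]
        right = concepts[rel.get("term2")]["preferred"]
        triples.append((left, rel.get("reltype"), right))
    return triples


ROOT = ET.fromstring(SAMPLE)
CONCEPTS = read_concepts(ROOT)
TRIPLES = read_relations(ROOT, CONCEPTS)
print(TRIPLES)
```

Keeping the token span alongside each concept lets the semantic layer be joined back to the linguistic layer through the shared token ids.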
Information Extraction (MUMIS)
Ein Freistoss von Christian Ziege aus 25 Metern geht ueber das Tor.
(A 25-meter free-kick by Christian Ziege goes over the goal.)
<events>
  <event id="e1" clause="cl1" event-name="free-kick">
    <relations>
      <relation id="r1" reltype="player" player-name="Christian Ziege" in-team="Germany"/>
      <relation id="r2" reltype="location" loc-from="25-meter"/>
      <relation id="r3" reltype="time" time-val="07:00"/>
    </relations>
  </event>
</events>
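The event markup above is essentially a template with typed slots (player, location, time). A minimal Python sketch of reading it into plain records follows. It is an illustration only, not the MUMIS system's own code: it writes the attribute values with straight quotes so the sample parses as XML, and the function name `read_events` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The slide's event annotation, written with straight quotes so it parses.
SAMPLE = """
<events>
  <event id="e1" clause="cl1" event-name="free-kick">
    <relations>
      <relation id="r1" reltype="player" player-name="Christian Ziege" in-team="Germany"/>
      <relation id="r2" reltype="location" loc-from="25-meter"/>
      <relation id="r3" reltype="time" time-val="07:00"/>
    </relations>
  </event>
</events>
"""


def read_events(xml_string):
    """Return one dict per event, with one slot dict per relation type."""
    root = ET.fromstring(xml_string)
    events = []
    for event in root.iter("event"):
        record = {
            "id": event.get("id"),
            "name": event.get("event-name"),
            "clause": event.get("clause"),
        }
        for rel in event.iter("relation"):
            # Everything except the bookkeeping attributes is a slot value.
            slots = {k: v for k, v in rel.attrib.items()
                     if k not in ("id", "reltype")}
            record[rel.get("reltype")] = slots
        events.append(record)
    return events


EVENTS = read_events(SAMPLE)
print(EVENTS[0]["name"], EVENTS[0]["player"]["player-name"])
```

Keying each slot dict by `reltype` assumes at most one relation of each type per event, which holds for this example but may not hold in general.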
OntoWeb SIG5
Some History
• Proposal for SIG5 at OntoWeb2 Meeting (Dec. 2001)
• SIG5 Launch at OntoWeb3 Meeting (June 2002)
• SIG5 Portal Launch in August 2002
SIG5 Portal Development
• Repository of Best Practice, Resources, Tools
• Platform for Cooperation in R&D
• Contact Point for "Semantic Technology" Industry
SIG5 Network Activities
• Dissemination (Talks, Workshops, Summer Schools)
• Concertation (SIGs, Standardization Bodies)
Future SIG5 Network Activities
Meetings, Workshops, Summer Schools
• Networks Meeting OntoWeb/AgentLink, Barcelona, Spain
• Endorsement of the 3rd XMLNLP Workshop, Budapest, Hungary
• OntoWeb Summer School, Madrid, Spain
• 5th EuroLan Summer School (Topic: Language Technology and the Semantic Web), Romania
• SIG5 Workshop (at a Major Conference in LT, AI, and/or KM)
SIGs and Standardization Bodies
• ACL: SIGSEM, SIGLEX
• ISO/TC37/SC4
• Global WordNet Association
Portal: Knowledge Markup with LT-World Ontology
SIG5 Input/Output
Input
• Textual Documents: Text, Transcripts, HTML/XML with Metadata (also in Multimedia Context), …
• Terminology, Thesauri, Ontologies: Linguistic (e.g. WordNet), Domain-Specific
• Markup Formats: XMLS, RDFS, OWL
Output
• Enriched Documents through Linguistic/Semantic Annotation (Semi-Structured Data)
> Knowledge Markup: Ontology-Based Annotation
> Text/Data Mining, Classification Feature Extraction