Evolution of OWL 2 QL and EL Ontologies. Bernardo Cuenca Grau, Ernesto Jiménez-Ruiz (Computer Science Department, University of Oxford, UK); Evgeny Kharlamov, Dmitriy Zheleznyakov (KRDB Research Centre, Free University of Bozen-Bolzano, Italy).
Outline • Ontologies and evolution: domain ontologies, Web knowledge bases, semantic markup • Logic-based approaches: model-based approaches, formula-based approaches • Syntactic-deductive approach • Experiments • Conclusion and directions
Ontologies: schema + data • Schemas provide • standard vocabularies for data • a way to structure data • means for machines to understand data • Schemas are expressed in terms of • classes: Person, Country, ... • (binary) properties: State-of-Origin, Subclass-of, ... • Data is a collection of facts • instantiations of classes • instantiations of properties
Domain ontologies • Goal: to provide standard vocabularies to communities • Clinical sciences ontologies: • SNOMED CT: Systematized Nomenclature of Medicine - Clinical Terms • > 311k concepts • NCIt: National Cancer Institute Thesaurus • ~ 89k concepts, 200m cross links between them [NCI] • FMA: Foundational Model of Anatomy • 75k classes, 168 relations, 120k terms, 3.1m relationship instances
Languages for domain ontologies • Domain ontologies • are complex and large • are manually created • should be error free • Languages that are natural for domain ontologies are • flexible enough to capture complex interactions • logic-based (e.g., based on Description Logics) • Web Ontology Language: OWL 2 • OWL DL • OWL 2 QL • OWL 2 EL • E.g., in SNOMED: forall x: instance-of(x, Common cold) implies exists y: instance-of(y, Virus) and causative-agent(y, x)
Evolution in SNOMED • Development teams • 1 main team and • 4 geographically distributed teams • each team makes modifications • Every 2 weeks the main team • integrates changes and resolves conflicts • From 2002 to 2008 SNOMED grew from 278k to 311k concepts [SM-1] • Example of modifications: • In Jan. 2006 a number of concepts were moved from the “Clinical finding” hierarchy to the “Event” hierarchy [SM-2]
Evolution in NCI and FMA • Developers of NCI make over 900 changes per month [HKR’08] • 20 full-time editors for NCI • they work • independently • on a separate copy of the ontology • There is one curator for NCI • every 2 weeks the curator • reviews changes using a workflow management tool • approves the changes • they merge results once a month • there is one curator who curates once a month • FMA “is an evolving computer-based knowledge source ...” [FMA]
Evolution of domain ontologies • Evolution of domain ontologies is common • Ontologies are changed by • insertion of axioms • deletion of axioms • Evolution affects both • schema level • data level • Evolution of domain ontologies should be error free
Design errors: incoherency • Incoherency is a schema-level design error: • incoherent concept = empty concept • can be caused by disjointness and cardinality restrictions • incoherent role = empty role • can be caused by disjointness and cardinality restrictions • Example: EquivalentClasses( :Nothing ObjectIntersectionOf( :Airplane :Boat ) ) SubClassOf( :Amphibian :Airplane ) SubClassOf( :Amphibian :Boat )
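The incoherency example above can be checked mechanically. The following is a minimal Python sketch, assuming a toy encoding of the axioms as pairs and sets (the names `SUBCLASS_OF`, `DISJOINT`, `superclasses` and `incoherent_classes` are illustrative, not part of any OWL API). It only handles incoherency caused by disjointness, not by cardinality restrictions.

```python
# Toy encoding of the slide's axioms (hypothetical, not an OWL library).
SUBCLASS_OF = {("Amphibian", "Airplane"), ("Amphibian", "Boat")}
# Airplane and Boat have an empty intersection, i.e. they are disjoint.
DISJOINT = {frozenset({"Airplane", "Boat"})}


def superclasses(cls, subclass_of):
    """Return cls together with every class reachable via SubClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def incoherent_classes(subclass_of, disjoint):
    """Classes forced to be empty because they fall under two disjoint classes."""
    classes = {c for pair in subclass_of for c in pair}
    return {
        cls
        for cls in classes
        if any(pair <= superclasses(cls, subclass_of) for pair in disjoint)
    }


print(incoherent_classes(SUBCLASS_OF, DISJOINT))  # {'Amphibian'}
```

Dropping either SubClassOf axiom makes the result empty, which is one way to see that both axioms take part in the error.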
Design errors: inconsistency • Inconsistency is an error that involves both • data level and • schema level • Inconsistency: • disjoint concepts are instantiated • functionality is violated • number restrictions are not respected • Example: EquivalentClasses( :Nothing ObjectIntersectionOf( :Airplane :Boat ) ) ClassAssertion( :Airplane :BerievA-40 ) ClassAssertion( :Boat :BerievA-40 )
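The first kind of inconsistency listed above (disjoint concepts that share an instance) can be illustrated with a small Python sketch. The encoding and the function name `clashing_individuals` are assumptions made for this example; functionality and number restrictions are not covered.

```python
# Toy encoding of the slide's axioms (hypothetical, not an OWL library).
DISJOINT = {frozenset({"Airplane", "Boat"})}
CLASS_ASSERTIONS = {("Airplane", "BerievA-40"), ("Boat", "BerievA-40")}


def clashing_individuals(assertions, disjoint):
    """Individuals asserted to belong to two classes declared disjoint."""
    types_of = {}
    for cls, individual in assertions:
        types_of.setdefault(individual, set()).add(cls)
    return {
        individual
        for individual, classes in types_of.items()
        if any(pair <= classes for pair in disjoint)
    }


print(clashing_individuals(CLASS_ASSERTIONS, DISJOINT))  # {'BerievA-40'}
```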
Insertions bring errors • Insertions introduce errors which should be repaired • Incoherency • Inconsistency • Example (incoherency): inserting SubClassOf( :Amphibian :Boat ) into EquivalentClasses( :Nothing ObjectIntersectionOf( :Airplane :Boat ) ), SubClassOf( :Amphibian :Airplane ) • Example (inconsistency): inserting ClassAssertion( :Boat :BerievA-40 ) into EquivalentClasses( :Nothing ObjectIntersectionOf( :Airplane :Boat ) ), ClassAssertion( :Airplane :BerievA-40 ) • Challenge: how to repair the ontology after “bad” insertions?
Deletions bring headache • Deletions do not introduce (design) errors • no inconsistency • no incoherency • Contraction can provoke • restoring of implicit data • deletion of implicitly related data • Example axioms: SubClassOf( :Airplane :Transport ) ClassAssertion( :Airplane :BerievA-40 ) ClassAssertion( :Transport :BerievA-40 ) • Challenge: how to respect implicit relations while deleting knowledge?
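To make the challenge concrete, here is a hedged Python sketch of contracting an implicit fact. It assumes a toy axiom encoding as tuples and only the SubClassOf inference rule. The helper names `entailed_assertions` and `minimal_deletions` are hypothetical, and "minimal" here means smallest number of removed axioms, which is just one possible reading of minimal change.

```python
from itertools import combinations

# Toy encoding: ("SubClassOf", sub, sup) and ("ClassAssertion", cls, individual).
ONTOLOGY = {
    ("SubClassOf", "Airplane", "Transport"),
    ("ClassAssertion", "Airplane", "BerievA-40"),
}


def entailed_assertions(axioms):
    """All (class, individual) facts that follow via SubClassOf."""
    sub = {(a[1], a[2]) for a in axioms if a[0] == "SubClassOf"}
    facts = {(a[1], a[2]) for a in axioms if a[0] == "ClassAssertion"}
    changed = True
    while changed:
        changed = False
        for cls, individual in list(facts):
            for lower, upper in sub:
                if lower == cls and (upper, individual) not in facts:
                    facts.add((upper, individual))
                    changed = True
    return facts


def minimal_deletions(axioms, unwanted):
    """Smallest sets of axioms whose removal stops `unwanted` being entailed."""
    axioms = set(axioms)
    for size in range(len(axioms) + 1):
        hits = [
            set(removed)
            for removed in combinations(axioms, size)
            if unwanted not in entailed_assertions(axioms - set(removed))
        ]
        if hits:
            return hits
    return []


# Transport(BerievA-40) is implicit, so deleting it needs a real choice.
for option in minimal_deletions(ONTOLOGY, ("Transport", "BerievA-40")):
    print(option)
```

Both single-axiom removals work in this toy case, and each one loses different knowledge. That choice is exactly what a contraction operator has to settle.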
SPARQL 1.1 Update • Proposed by HP and based on SPARUL, an extension of SPARQL for • adding • deleting • updating RDF triples • Deletion without deletion effect • only explicit occurrences of triples are deleted • there is no validation of whether the triple is still there implicitly • Example axioms: SubClassOf( :Airplane :Transport ) ClassAssertion( :Airplane :BerievA-40 ) ClassAssertion( :Transport :BerievA-40 )
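The "deletion without deletion effect" can be reproduced in a few lines of plain Python. This is only a sketch under stated assumptions: triples are modelled as tuples of strings, `delete_explicit` mimics a deletion that touches asserted triples only, and `rdfs_type_closure` applies just the RDFS subclass rule. No SPARQL engine is involved.

```python
GRAPH = {
    (":Airplane", "rdfs:subClassOf", ":Transport"),
    (":BerievA-40", "rdf:type", ":Airplane"),
    (":BerievA-40", "rdf:type", ":Transport"),
}


def delete_explicit(triples, triple):
    """Remove only the asserted occurrence of a triple."""
    return triples - {triple}


def rdfs_type_closure(triples):
    """Add rdf:type triples that follow from rdfs:subClassOf."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closure):
            if p != "rdf:type":
                continue
            for s2, p2, o2 in list(closure):
                inferred = (s, "rdf:type", o2)
                if p2 == "rdfs:subClassOf" and s2 == o and inferred not in closure:
                    closure.add(inferred)
                    changed = True
    return closure


target = (":BerievA-40", "rdf:type", ":Transport")
after = delete_explicit(GRAPH, target)
print(target in after)                     # False: no longer asserted
print(target in rdfs_type_closure(after))  # True: still implied
```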
Syntactic approaches to evolution • In the ontology: • “Children are baklava fans” • “Children are not cats” • To delete: “Children are baklava fans” • To this end it is enough to delete … and … [HS’05] [JRCGHB’11] [KPSCG’06] • In the resulting ontology: • “Children are not baklava fans”: OK • “Children are not cats” is lost: not desirable
Semantic approaches to evolution • How to restore knowledge which • was semantically deleted and • is desirable? • One has to find the semantic difference between • the original and • the obtained ontology • There are a number of approaches and tools to find the semantic difference [FDCM’08] [MDM’06] [JRCGHB’11] • Collaborative Protégé • DOGMA-MESS • ContentCVS approach • ...
Limitations of current semantic approaches • Quite application and language oriented • Heuristic based • What is missing: the big picture • a general understanding of evolution of logic-based ontologies • a proper theory that explains relationships among • different types of ontology modifications • different ontology languages • feasibility and complexity of evolution computation • There are several attempts to understand logic-based evolution • We are working on that too • The 2nd part of this tutorial is about current achievements in this direction!
Summary on domain ontologies • Domain ontologies are • large • logic based • Changes in domain ontologies • are frequent • consist of insertions and deletions • Insertions easily introduce errors • incoherency • inconsistency • Deletions • do not introduce (logical) errors • are not trivial: implicit knowledge relationships should be traced
Web knowledge bases (ontologies) • Goal: gathering general purpose knowledge from the Web • DBpedia: • structured counterpart of Wikipedia • 320 classes, 1,650 different properties, 19m facts • Yago: • combines Wikipedia, WordNet, and GeoNames • 10m entities, 120m facts about them • (Open)Cyc: • started in 1984, formalizing knowledge manually • logic-based KB with reasoning • 47,000 concepts, 306,000 facts • These ontologies are not static • they constantly change, since Wikipedia does so • Yago crawls Wikipedia every couple of weeks ...
Languages for Web KBs • Web KBs • have rather simple and small schemas • should be error free • errors are rare • Languages that are natural for Web KBs • are able to describe basic things • SubClassOf, Domain, Range, etc. • These languages are: • Resource Description Framework with Schema: RDF and RDFS • a bit of OWL 2: owl:equivalentClass • some rule languages: OWL 2 RL • Evolution is performed ad hoc • each KB has its own approach
Evolution in DBpedia • DBpedia • 18 functional properties • new information is obtained from Wikipedia • new data can violate functional properties • Inconsistency is possible FunctionalObjectProperty( :netIncome) FunctionalObjectProperty( :co2Emission) FunctionalObjectProperty( :height) ...
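A functional property allows at most one value per subject, so a newly extracted fact can clash with an existing one. The Python sketch below shows one way to detect such clashes. The facts, the subject name `ex:TowerA` and the function `functional_violations` are made up for illustration, and the sketch assumes that two different value strings denote two different values.

```python
# Property names taken from the slide; everything else is hypothetical.
FUNCTIONAL = {"netIncome", "co2Emission", "height"}

existing_facts = {("ex:TowerA", "height", "324")}
new_facts = {("ex:TowerA", "height", "330"), ("ex:TowerA", "label", "Tower A")}


def functional_violations(facts, functional):
    """Map (subject, property) to its values when a functional property has several."""
    values = {}
    for subject, prop, value in facts:
        if prop in functional:
            values.setdefault((subject, prop), set()).add(value)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


print(functional_violations(existing_facts | new_facts, FUNCTIONAL))
```

Non-functional properties such as the hypothetical `label` are ignored, so they can take any number of values without being reported.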
Evolution in Yago • Yago is a clean (inconsistency-free) ontology • 95% accuracy, manually validated on 6k facts • New knowledge should not cause contradictions
Yago consistency check [Yago-1] • Yago has rules to check consistency • check uniqueness of entities and functional arguments • check domains and ranges of relations • type checking • [Figure: example graph with subclassOf, type and born edges over the nodes Singer, Guitarist, Rock Singer, Guitar, Physics and 1935]
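Domain and range checking of this kind can be sketched in Python as follows. This is not Yago's actual rule set: the class hierarchy, the entity `some_singer`, the signature of `born` and the helper names are all assumptions chosen to resemble the figure. The hierarchy is assumed to be a tree with no cycles.

```python
# Hypothetical toy knowledge base.
SUBCLASS_OF = {"RockSinger": "Singer", "Singer": "Person", "Guitarist": "Person"}
TYPE_OF = {"some_singer": "RockSinger", "1935": "Year", "Physics": "Field"}
# relation -> (domain, range); assumed signature for illustration only.
SIGNATURE = {"born": ("Person", "Year")}


def is_instance(entity, cls):
    """Walk up the subclassOf chain from the entity's asserted type."""
    current = TYPE_OF.get(entity)
    while current is not None:
        if current == cls:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def type_check(fact):
    """Accept a candidate fact only if it respects the relation's domain and range."""
    subject, relation, obj = fact
    domain, range_ = SIGNATURE[relation]
    return is_instance(subject, domain) and is_instance(obj, range_)


print(type_check(("some_singer", "born", "1935")))     # True
print(type_check(("some_singer", "born", "Physics")))  # False
```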
Summary on Web KBs • Web KBs aim at consistency • Schemas of Web KBs are rather simple and small • it is hard to make errors • Evolution is performed ad hoc
Ontologies for semantic markup • Goal: • to nest semantics within existing content on web pages • to help search engines, crawlers and browsers find the right data • Example: a Person with • name • photo • URL • ... • [Figure: text embedding semantic annotations]
Standards for semantic markup • Microformats, since 2003 • small set of fixed formats, e.g.: • hCard: people, companies, organizations, and places • XFN: relationships between people • hCalendar: calendaring and events • RDFa: Resource Description Framework in Attributes • since 2004, W3C recommendation • serialization format for embedding RDF data into HTML pages • can be used together with any vocabulary, e.g. FOAF • Microdata • alternative technique for embedding structured data • proposed in 2009, comes with HTML 5
Is semantic markup popular? [CB’12] • Yahoo crawl of 2011 • 12 billion pages were crawled • 431 million of them contain RDFa • In 2011, 3.5% of the HTML pages had structured (meta) data
Big step in promoting ontologies • Schema.org initiative: • started in June 2011 • initiated by Bing, Google, Yahoo!, Yandex • they propose: to mark up / annotate websites with metadata • they support: Microdata
Schema.org ontologies • Metadata by Schema.org: • Person • Organization • Event • Place • Product • ... • 200+ types
Semantic markup today • Common Crawl foundation • goal: building and maintaining an open crawl of the Web • current data is about 5 billion web pages • WebDataCommons.org project • goal: extracting Microformats, Microdata, RDFa from the Common Crawl corpus • Feb 2012: • processed 1.4 billion HTML pages of the CC corpus • 20.9 terabytes of compressed data • this is a big fraction of the Web
Structured Web data is fast growing • 1.4 billion HTML pages processed • 188 million of them contain structured data in Microformats, Microdata, or RDFa [CB’12] • This data amounts to 3.2 billion RDF triples • 13% of the HTML pages contain structured (meta) data • From 2011 to 2012 the fraction of pages with structured data went from 3.5% to 13%
Evolution at schema level: Schema.org • It is a very simple and coherent schema • Coherency • basic Schema.org vocabulary can be mapped to RDFS • RDFS schemas are always coherent, and so is Schema.org • What is used from RDFS: [SO-2] • subclass • domain, range restriction of properties • literal • ... • Schema can be extended • mechanism: specialization • of classes, properties, enums • Person/Engineer [SO-3] • Example class: PoliceStation: A police station. Subclass of: CivicStructure; Subclass of: EmergencyService • Example property: creator: The creator/author of this Creative Work. Domain: CreativeWork; Domain: UserComments; Range: Person; Range: Organization
Evolution at data level: Schema.org • It is RDFS embeddable, so no inconsistency is possible • Schema.org convention on range restriction [SO-1]: • each property may have 1 or more types at its range • the value(s) of the property should be instances of at least one of these types • Thus, they accept that data can be inconsistent
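The range convention above amounts to an "at least one of these types" test. A minimal Python sketch, assuming the `creator` ranges mentioned in this deck (Person, Organization) and a hypothetical helper `respects_range`, could look like this. Markup that fails the test is flagged, not rejected, in the tolerant spirit described in [SO-1].

```python
# Range types for "creator" as listed in this deck; the dict itself is a toy.
RANGES = {"creator": {"Person", "Organization"}}


def respects_range(prop, value_types, ranges=RANGES):
    """True if the value has at least one of the property's expected types."""
    expected = ranges.get(prop)
    if expected is None:  # unknown property: nothing to check
        return True
    return bool(expected & set(value_types))


def ingest(prop, value, value_types):
    """Keep every statement, but record whether it follows the convention."""
    return {"property": prop, "value": value, "ok": respects_range(prop, value_types)}


print(ingest("creator", "ex:alice", {"Person"}))
print(ingest("creator", "ex:some_text", {"Text"}))
```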
Evolution at data level: Schema.org • Is data inconsistency important? • Data gathered by crawling the Web is inconsistent by nature • data consistency is not important • data consistency is unrealistic • Data maintained locally can be consistent • consistency of data can be important • [SO-1]: “In the spirit of "some data is better than none", we will accept this [inconsistent] markup and do the best we can.”
Summary on semantic markup • Semantic markup schemas are • small • very simple • In many cases logical errors with semantic markup are simply impossible • Consistency and coherency are in general not important
Summary: ontologies and evolution • Three major groups of ontologies • domain ontologies: unification of terminology by specific communities • Web knowledge bases: storing general purpose Web content • ontologies for semantic markup: enriching Web content with information understandable by agents, e.g. crawlers (13% of Web data is enriched!) • In all these cases ontologies are dynamic • insertions and deletions happen at the level of • schema and • data
Summary: attitude to evolution (from “don’t care” to logic based) • Ontologies for semantic markup (“don’t care”) • schema is simple (RDFS): errors are (almost) impossible • data may disrespect the schema • “some data is better than none” • “do the best we can” • Web knowledge bases • schema is more involved but still no incoherency (RDFS + some OWL, e.g., functionality) • data may be inconsistent • conflicts can be detected by simple reasoning • many problems are solved by type checking • Domain ontologies (logic based) • schema is complex (OWL 2): incoherency is possible • data can easily be inconsistent • coherency + consistency are vital • logical reasoning can guarantee it
Logic-Based Evolution • The main principle of logic-based evolution is the principle of minimal change • ontologies should change as little as possible • There are two main classes of logic-based approaches: • Model-based approach (MBA) • Formula-based approach (FBA) • There are two main types of evolution: • Update (or revision), when new information is added • Contraction (or erasure), when some old information is retracted • We illustrate • update with MBA • contraction with FBA • References: [Wins’90] [WWT’10] [KZ’11] [QD’09] [LLMW’06] [DGLPR’09] [CKNZ’10] [EG’92] [KM’91]
MBA: Evolution Process • Ontology • Models • Model transformer • New data • Evolved ontology • Evolved models
MBA: Ontology to Models • [Figure: the ontology and its models: Model 1, Model 2, …]
MBA: Data Evolution • [Figure: Model 1, Model 2, …] • Model-based operators: • Winslett’s operator • Dalal’s operator • Satoh’s operator • …
MBA: Data Evolution • Winslett’s operator • [Figure: Model 1 and Model 2 with candidate evolved models Model 2.1 and Model 2.2, each marked ✔ or ✘]
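For intuition, Winslett's operator can be sketched over finite propositional models, each represented as the set of atoms that are true in it. For every model of the original knowledge, it keeps those models of the new data whose set of changed atoms is minimal under set inclusion, and the result is the union over all original models. The Python below is a simplified illustration under these assumptions, and the example data is hypothetical. Models of real ontologies are first-order interpretations, so this is not how an ontology evolution system would be implemented.

```python
def winslett_update(kb_models, new_models):
    """Winslett-style update over finite propositional models.

    Each model is a frozenset of the atoms that are true in it.
    """
    evolved = set()
    for old in kb_models:
        # Atoms whose truth value changes when moving from `old` to each candidate.
        diffs = {new: old ^ new for new in new_models}
        for new, diff in diffs.items():
            # Keep `new` unless another candidate changes a strict subset of atoms.
            if not any(other < diff for other in diffs.values()):
                evolved.add(new)
    return evolved


# Hypothetical example: the new data says "p or q".
kb_models = {frozenset({"p"}), frozenset()}
new_models = {frozenset({"p"}), frozenset({"q"}), frozenset({"p", "q"})}

evolved = winslett_update(kb_models, new_models)
print(sorted(sorted(m) for m in evolved))  # [['p'], ['q']]
```

The model {p} already satisfies the new data and stays as it is. The empty model must change and yields two equally minimal results, {p} and {q}, while {p, q} is discarded because it changes more than necessary.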