XML on Semantic Web

karik

Presentation Transcript


  1. XML on Semantic Web

  2. Outline • The Semantic Web • Ontology • XML • Probabilistic DTD • References

  3. The Semantic Web (1/4) • The first generation Web • The second generation Web: the current Web • The third generation Web: the Semantic Web • The conceptual structuring of the Web in an explicit, machine-readable way • Requirements: universal expressive power, support for syntactic interoperability, support for semantic interoperability

  4. The Semantic Web (2/4) • Syntactic interoperability concerns parsing the data; semantic interoperability means defining mappings between unknown terms and known terms in the data • Semantic interoperability requires standards for both the syntactic form of documents and their semantic content • A further representation and inference layer is needed on top of the currently available layers of the WWW: ontology

  5. The Semantic Web (3/4)

  6. The Semantic Web (4/4)

  7. Ontology (1/5) • An explicit, machine-readable specification of a shared conceptualization • Crucial role: representation of a shared conceptualization of a particular domain • Reusable • Helps find pages that contain syntactically different but semantically similar words • Constructs: concepts (usually organized in taxonomies), relations, functions, axioms, instances

  8. Ontology (2/5)

  9. Ontology (3/5) • Concepts: • Can be anything about which something is said • Also known as classes (XOL, RDF(S), OIL, DAML+OIL), objects (OML), categories (SHOE) • Taxonomies: • Used to organize ontological knowledge through generalization and specialization relationships, over which simple and multiple inheritance can be applied
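
The generalization relationships on this slide can be sketched as a small lookup structure. The example below is only an illustration: the concept names and the `TAXONOMY`, `ancestors`, and `is_a` names are invented here and do not come from any of the ontology languages listed.

```python
# Hypothetical taxonomy: each concept maps to its direct generalizations.
TAXONOMY = {
    "AmphibiousVehicle": ["Car", "Boat"],  # multiple inheritance
    "Car": ["Vehicle"],                    # simple inheritance
    "Boat": ["Vehicle"],
    "Vehicle": [],
}

def ancestors(concept, taxonomy):
    """Return every concept that generalizes `concept`, directly or not."""
    seen = set()
    stack = list(taxonomy.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(taxonomy.get(parent, []))
    return seen

def is_a(concept, candidate, taxonomy):
    """True if `candidate` is `concept` itself or one of its generalizations."""
    return concept == candidate or candidate in ancestors(concept, taxonomy)

print(sorted(ancestors("AmphibiousVehicle", TAXONOMY)))
print(is_a("Car", "Boat", TAXONOMY))
```

Storing only direct generalizations and walking them on demand keeps the structure small; a real ontology language would also attach relations, axioms, and instances to each concept.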

  10. Ontology (4/5) • Relations and functions: • Represent interactions between concepts of the domain and attributes • Called relations in SHOE and OML, roles in OIL • Functions are a special kind of relation • Axioms: • Used for constraining information, verifying correctness, and deducing new information • Also known as assertions (OML), rules, logic

  11. Ontology (5/5) • Instances: • Represent elements in the domain attached to a specific concept • Languages whose expressiveness is measured: • XOL, RDF(S), SHOE, OML, OIL, DAML+OIL

  12. XML (1/7) • As a serialization syntax for other markup languages, e.g. SMIL, XOL, SHOE • As semantic markup of Web pages • As a uniform data-exchange format

  13. XML (2/7) • Universal expressive power: anything can be encoded in XML if a grammar can be defined for it • Syntactic interoperability: an XML parser can parse any XML data and is usually a reusable component • Semantic interoperability: there is no way of recognizing a semantic unit from a particular domain of interest (not yet widely recognized)
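
A short sketch of the syntactic interoperability point, using Python's standard `xml.etree.ElementTree` parser. The two documents and the `tag_names` helper are made up for illustration. The same parser handles both vocabularies, yet nothing in its output says what `total` or `step` mean.

```python
import xml.etree.ElementTree as ET

def tag_names(xml_text):
    """Parse any well-formed XML and list element names in document order."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]

# Two unrelated, invented vocabularies parsed by the same reusable component.
invoice = "<invoice><customer>ACME</customer><total>42</total></invoice>"
recipe = "<recipe><title>Soup</title><step>Boil water</step></recipe>"

print(tag_names(invoice))
print(tag_names(recipe))
```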

  14. XML (3/7)

  15. XML (4/7) • Data exchange: • Build a model of the domain of interest • From the domain model, a DTD or an XML Schema is constructed • Advantage: reusability of the parsing software components • There are multiple ways to encode a given domain model in a DTD, so the direct connection from the DTD to the domain model is lost and cannot easily be reconstructed
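
To make the last point concrete, here is a minimal sketch in which one domain fact ("Ann works for ACME") is encoded in two different ways. The element names, attribute names, and reader functions are assumptions chosen for this example. Each encoding needs its own reader because the structure alone does not reveal the shared domain model.

```python
import xml.etree.ElementTree as ET

# Encoding A: the employer is a child element of the person.
doc_a = "<person><name>Ann</name><employer>ACME</employer></person>"
# Encoding B: the person is nested under the organization, names as attributes.
doc_b = '<organization name="ACME"><employee name="Ann"/></organization>'

def read_a(xml_text):
    """Recover (person, organization) from encoding A."""
    root = ET.fromstring(xml_text)
    return (root.findtext("name"), root.findtext("employer"))

def read_b(xml_text):
    """Recover (person, organization) from encoding B."""
    root = ET.fromstring(xml_text)
    employee = root.find("employee")
    return (employee.get("name"), root.get("name"))

print(read_a(doc_a) == read_b(doc_b))
```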

  16. XML (5/7)

  17. XML (6/7) • A direct mapping based on the different DTDs is not possible • So we have to define the mappings between the different domain models first, then between the different DTDs: • Reengineering the original domain model from the DTD or XML Schema • Establishing mappings between the entities in the domain models • Defining translation procedures for XML documents • Using a more suitable formalism than pure XML can save much of this additional effort
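
The three steps can be sketched for a toy case. Everything named here (the `Employment` class, the two document layouts, the function names) is hypothetical: the source document is read back into a small domain model, and the target document is generated from that model rather than from the source markup.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass(frozen=True)
class Employment:
    """Reengineered domain model: one person working for one organization."""
    person: str
    organization: str

def from_person_centric(xml_text):
    """Map a <person> document onto the domain model."""
    root = ET.fromstring(xml_text)
    return Employment(root.findtext("name"), root.findtext("employer"))

def to_organization_centric(fact):
    """Serialize the domain model in the other, organization-centric layout."""
    org = ET.Element("organization", {"name": fact.organization})
    ET.SubElement(org, "employee", {"name": fact.person})
    return ET.tostring(org, encoding="unicode")

source = "<person><name>Ann</name><employer>ACME</employer></person>"
translated = to_organization_centric(from_person_centric(source))
print(translated)
```

Routing the translation through the domain model is what makes the procedure reusable: a third layout needs only one new reader or writer, not a converter for every pair of DTDs.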

  18. XML (7/7)

  19. Probabilistic DTD (1/11) • A DTD that describes the most likely orderings of XML tags and contains statistical properties for each tag • Utilizes association rule discovery algorithms and sequence mining techniques

  20. Probabilistic DTD (2/11) • Objectives: tagging all text documents and deriving an appropriate preliminary flat XML DTD • A knowledge discovery in textual databases (KDT) process builds clusters of semantically similar text units, after which new documents can be converted into XML documents

  21. Probabilistic DTD (3/11) • UML schema: initially conceived by experts, it serves as a reference for the DTD, but there is no guarantee that the final DTD will be contained in or contain this schema • KDT process: • Tagging initial text documents • Domain knowledge, such as a thesaurus and the preliminary UML schema, is input to the process • Pre-processing • Iterative clustering • Post-processing • Establishing a probabilistic DTD

  22. Probabilistic DTD (4/11)

  23. Probabilistic DTD (5/11) • Pre-processing: • Setting the level of granularity • NLP processing such as tokenization, normalization, word stemming • Building text unit descriptors, which form a reduced feature space (currently chosen by the engineer) • Mapping all text units into Boolean vectors of this feature space • Extracting named entities
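
A minimal sketch of the mapping from text units to Boolean vectors. The descriptor list is invented for illustration, and `normalize` is a crude plural-stripping stand-in for real stemming, not the method used in the presentation.

```python
import re

# Hypothetical descriptors; per the slide, these are chosen by the engineer.
DESCRIPTORS = ["court", "register", "capital", "director"]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def normalize(token):
    """Crude stand-in for stemming: strip one trailing plural 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def to_boolean_vector(text_unit):
    """One Boolean per descriptor: does the text unit mention it?"""
    terms = {normalize(token) for token in tokenize(text_unit)}
    return [descriptor in terms for descriptor in DESCRIPTORS]

vector = to_boolean_vector("The directors increased the capital.")
print(vector)  # [False, False, True, True]
```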

  24. Probabilistic DTD (6/11) • Clustering: • Performed in multiple iterations, each of which outputs a set of clusters • All text unit vectors are clustered • Clusters are partitioned into “acceptable” and “unacceptable” according to quality criteria • Members of “unacceptable” clusters are the input data for the next iteration
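
The control flow of the iterations can be sketched as follows. The slides do not name the clustering algorithm or the quality criteria, so both are placeholders here: `group_by_prefix` is a toy clustering that gets coarser each round, and "at least `min_size` members" stands in for the real quality test.

```python
from collections import defaultdict

def group_by_prefix(vectors, prefix_len):
    """Toy clustering: vectors that share their first prefix_len components."""
    clusters = defaultdict(list)
    for vector in vectors:
        clusters[vector[:prefix_len]].append(vector)
    return list(clusters.values())

def iterative_clustering(vectors, min_size=2, max_iterations=3):
    """Accept good clusters; feed members of rejected ones to the next round."""
    accepted, remaining = [], list(vectors)
    n_features = len(vectors[0]) if vectors else 0
    for iteration in range(max_iterations):
        if not remaining:
            break
        prefix_len = max(n_features - iteration, 0)  # coarser every round
        next_remaining = []
        for cluster in group_by_prefix(remaining, prefix_len):
            if len(cluster) >= min_size:  # stand-in quality criterion
                accepted.append(cluster)
            else:
                next_remaining.extend(cluster)
        remaining = next_remaining
    return accepted, remaining

units = [(1, 0, 1), (1, 0, 1), (0, 1, 1), (0, 1, 0), (1, 1, 1)]
accepted, leftover = iterative_clustering(units)
print(accepted)
print(leftover)
```

With this toy data, the two identical vectors are accepted in the first round, the two vectors sharing the prefix `(0, 1)` in the second, and `(1, 1, 1)` is never placed in an acceptable cluster.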

  25. Probabilistic DTD (7/11) • Post-processing: • “Acceptable” clusters are semi-automatically assigned a label • Ultimately, cluster labels are determined by the engineer • All default cluster labels are derived from text unit descriptors • An XML DTD is derived automatically from the XML tags
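
One way to picture the last step is a function that turns the set of cluster labels into a flat DTD. This is a sketch under the assumption that "flat" means a root element containing the tags in any order and number, each holding plain text; the root name and tag names are invented.

```python
def flat_dtd(root_name, tags):
    """Build a flat DTD: the root may contain the given tags in any order."""
    unique = sorted(set(tags))
    content = f"({' | '.join(unique)})*" if unique else "EMPTY"
    lines = [f"<!ELEMENT {root_name} {content}>"]
    # Every derived tag simply wraps character data.
    lines.extend(f"<!ELEMENT {tag} (#PCDATA)>" for tag in unique)
    return "\n".join(lines)

print(flat_dtd("document", ["Capital", "Director", "Capital"]))
```

Such a DTD carries no ordering information at all, which is the gap the probabilistic DTD is meant to fill.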

  26. Probabilistic DTD (8/11)

  27. Probabilistic DTD (9/11) • Establishing a probabilistic DTD: • Deriving the most likely ordering of the tags • Computing the statistical properties of each tag inside the document type definition • Deriving the ordering of the tags: • Backward construction of DTD sequences: builds “maximal” sequences • Forward sequence construction

  28. Probabilistic DTD (10/11) • Backward construction of DTD sequences • Starts with an arbitrary tag y and then identifies the tag most likely to appear before it • If no such tag exists, the algorithm moves on to the next sequence; if there is one, the next iteration starts; if there are k such tags, the incomplete sequence is duplicated k times • Each tag Xi leads to y with a confidence Ci • If one Ci is larger than all the others, then Xi is the predecessor of y in the sequence • If C0, the confidence that y has no predecessor, is the largest, then y is the first element • Confidence is the tag’s TagSupport multiplied by the accuracy
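
A simplified sketch of the backward construction. It departs from the slide in two labeled ways: confidence is approximated by the relative frequency of each immediate predecessor (not TagSupport multiplied by accuracy, whose definitions are not given here), and ties are resolved by taking the first maximum instead of duplicating the sequence. The tag names and sample documents are invented.

```python
from collections import Counter

START = None  # marker for "no predecessor"; its share plays the role of C0

def predecessor_confidences(sequences, tag):
    """Relative frequency of each immediate predecessor of `tag`."""
    counts = Counter()
    for sequence in sequences:
        for i, current in enumerate(sequence):
            if current == tag:
                counts[sequence[i - 1] if i > 0 else START] += 1
    total = sum(counts.values())
    return {pred: n / total for pred, n in counts.items()} if total else {}

def build_backward(sequences, last_tag):
    """Grow a tag sequence leftwards from `last_tag` until a first element."""
    result = [last_tag]
    while True:
        confidences = predecessor_confidences(sequences, result[0])
        if not confidences:
            break  # tag never observed
        best = max(confidences, key=confidences.get)
        if best is START or best in result:
            break  # first element reached, or a cycle would start
        result.insert(0, best)
    return result

docs = [
    ["Name", "Capital", "Director"],
    ["Name", "Director"],
    ["Name", "Capital", "Director"],
]
print(build_backward(docs, "Director"))  # ['Name', 'Capital', 'Director']
```

Here "Capital" precedes "Director" in two of three documents, "Name" always precedes "Capital", and "Name" always comes first, so the construction stops there.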

  29. Probabilistic DTD (11/11)

  30. References • Stefan Decker, Frank van Harmelen, Jeen Broekstra, Michael Erdmann, Dieter Fensel, Ian Horrocks, Michel Klein, Sergey Melnik: The Semantic Web - on the respective Roles of XML and RDF • Weihua Li: Intelligent Information Agent with Ontology on the Semantic Web • Asuncion Gomez-Perez, Oscar Corcho: Ontology Languages for the Semantic Web • Karsten Winkler, Myra Spiliopoulou: Extraction of Semantic XML DTDs from Texts Using Data Mining Techniques
