Semantic web role and its method: Domain ontology Rung Ching Chen (陳榮靜)
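Yellow Crane Tower (黃鶴樓), by Cui Hao (崔顥) • The sage of old has already ridden the yellow crane away; here only the Yellow Crane Tower remains. The yellow crane, once gone, never returns; the white clouds drift on emptily for a thousand years. Across the sunlit river the trees of Hanyang stand out clearly; fragrant grass grows lush on Parrot Isle. At dusk, where is my hometown? The mist and waves on the river fill one with sorrow.
Seeing Meng Haoran Off to Guangling at Yellow Crane Tower (黃鶴樓送孟浩然之廣陵), by Li Bai (李白) • "There is a scene before my eyes that I cannot describe, for Cui Hao's poem is already inscribed above." • My old friend takes leave of the west at Yellow Crane Tower and, amid the mist and blossoms of the third month, goes down to Yangzhou. The distant shadow of his lone sail vanishes into the blue sky; I see only the Yangtze flowing to the horizon.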
Outline • Introduction • Literature reviews • Ontology application • Ontology construction • Experimental results • Conclusions and current research
Introduction • Introduction • Background • Motivation • Objective • Literature reviews • Ontology construction • Experimental results • Conclusions and future works
Background (1/2) • The content of web sites changes rapidly and grows very fast. • Understanding the querist’s needs and finding related web pages on the Internet are very important. • Yahoo vs. Google
Background (2/2) • The main drawback of current search engines is that they cannot interpret the actual semantics of web page content. They do not use domain-specific knowledge for web page analysis. • The concept of the Semantic Web has been proposed recently.
Motivation • Semantic web and ontology • The construction of a successful semantic web depends on whether the ontology can be constructed rapidly and easily. • In most research, ontology construction is carried out by domain experts. It is difficult to modify the concepts of an existing domain ontology for a semantic web.
Objective • A large number of ontology representation methods have been proposed. • We use a hierarchical tree structure to represent the domain ontology because it is the most general one. • Methods of constructing an ontology • Manual construction • Semi-automatic construction • Fully automatic construction
Literature reviews • Introduction • Literature reviews • Semantic web • Ontology • Information classification model • Singular value decomposition • Adaptive resonance theory network • Ontology construction • Experimental results • Conclusions and future works
Semantic web (1/2) • Drawbacks of the existing web • Information is presented in documents. • Machines are unable to process or extract the information that people actually need. • The semantic web is an extension of the existing web structure • It provides a new foundation for data description. • It promotes the automatic development of network services. • It makes information understandable to machines.
Semantic web (2/2) • It builds high-level languages on top of low-level languages progressively. • It offers information that the computer can read without revising the existing web page content.
Ontology (1/4) • The W3C has defined ontology as knowledge for describing and expressing various domains using concepts, definitions, and relations. • Ontology usually appears in the form of semantic web. • A node represents a concept or an individual entity on the semantic web.
Ontology (2/4) • Gruber definition: “An ontology is a formal, explicit specification of a shared conceptualization” • Conceptualization: an abstract model of a phenomenon in the domain, built from the relevant concepts of that phenomenon. • Shared: an ontology is shared by a group, not held by an individual. • Formal: an ontology can be read and understood by computers. • Explicit: the concepts and constraints of an ontology are expressed clearly.
Ontology (3/4) • According to Gruber, the elements of an ontology include: • Concept: a concept can represent anything in the real world. Concepts are usually organized as a tree structure in an ontology. • Relation: a relation is a connection between concepts of certain types. • Function: a function is a special case of a relation. • Axiom: an axiom is used to model a fact. • Instance: an instance is a concrete occurrence of a concept.
Ontology (4/4) • Ontology languages extend the XML (Extensible Markup Language) syntax. • The W3C is responsible for formulating and updating them.
Domain Ontology Applications Grigoris Antoniou Frank van Harmelen
Horizontal Information Products at Elsevier • Data Integration at Audi • Skill Finding at Swiss Life • Think Tank Portal at EnerSearch • E-Learning • Web Services • Other Scenarios
Elsevier – The Setting • Elsevier is a leading scientific publisher. • Its products are organized mainly along traditional lines: • Subscriptions to journals • Online availability of these journals has until now not really changed the organisation of the product line • Customers of Elsevier can take subscriptions to online content
Elsevier – The Problem • Traditional journals are vertical products • Division into separate sciences covered by distinct journals is no longer satisfactory • Customers of Elsevier are interested in covering certain topic areas that spread across the traditional disciplines/journals • The demand is rather for horizontal products
Elsevier – The Problem (2) • Currently, it is difficult for large publishers to offer such horizontal products • Barriers of physical and syntactic heterogeneity can be solved (with XML) • The semantic problem remains unsolved • We need a way to search the journals on a coherent set of concepts against which all of these journals are indexed
Elsevier – The Contribution of Semantic Web Technology • Ontologies and thesauri (very lightweight ontologies) have proved to be a key technology for effective information access • They help to overcome some of the problems of free-text search • They relate and group relevant terms in a specific domain • They provide a controlled vocabulary for indexing information
Elsevier – The Contribution of Semantic Web Technology (2) • A number of thesauri have been developed in different domains of expertise • Medical information: MeSH and Elsevier’s life science thesaurus EMTREE • RDF is used as an interoperability format between heterogeneous data sources • EMTREE is itself represented in RDF
Elsevier – The Contribution of Semantic Web Technology (3) • Each of the separate data sources is mapped onto this unifying ontology • The ontology is then used as the single point of entry for all of these data sources
Information classification model • There are three traditional information classification models: • Vector space model • Probabilistic model • Boolean model
Vector space and probabilistic model • Vector space model: • Each document is a vector whose elements give the number of times each keyword appears in the document. Cosine similarity is used to find the related web pages. • Probabilistic model: • This model uses a probabilistic approach to evaluate the relationships among web pages and to judge whether they are related.
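As an illustration of the vector space model, the following minimal Python sketch compares keyword-count vectors with cosine similarity. The vocabulary and the counts are made up for the example and do not come from the slides.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length keyword-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention used here: an empty vector matches nothing
    return dot / (norm_u * norm_v)


# Hypothetical counts over the vocabulary ["ontology", "web", "boat"].
QUERY = [1, 1, 0]
PAGE_A = [3, 2, 0]
PAGE_B = [0, 0, 4]

# Pages can then be ranked by their similarity to the query.
RANKED = sorted(
    [("page_a", PAGE_A), ("page_b", PAGE_B)],
    key=lambda item: cosine_similarity(QUERY, item[1]),
    reverse=True,
)
```

Because cosine similarity ignores vector length, a long page and a short page with the same keyword proportions score the same against a query.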
Boolean model • It is the simplest classification method, based on set theory and Boolean algebra. The Boolean model distinguishes three relations between concepts: inheritance, intersection, and independence. • [Figure: three diagrams of Concept A and Concept B illustrating inheritance, intersection, and independence]
Singular Value Decomposition (1/2) • [Figure: M × N matrix of documents and keywords] • Rows represent documents and columns represent keywords. • Each element indicates whether a keyword appears in a document.
Singular Value Decomposition (2/2) • [Figure: the matrix written as a product of three smaller matrices of dimension k] • Latent Semantic Analysis (LSA) projects documents and keywords into a low-dimensional space. • Singular Value Decomposition (SVD) is used to remove unnecessary information.
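The reduction can be written as a rank-k truncated SVD. The symbols below are assumptions chosen for illustration, since the formula on the original slide did not survive extraction: A is the M × N document-by-keyword matrix and k is the number of retained dimensions.

```latex
\[
A \;\approx\; A_k \;=\; U_k \, \Sigma_k \, V_k^{\mathsf{T}},
\qquad
U_k \in \mathbb{R}^{M \times k},\quad
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\quad
V_k \in \mathbb{R}^{N \times k},\quad
\sigma_1 \ge \dots \ge \sigma_k \ge 0 .
\]
```

Under this notation, row i of the product U_k Σ_k is a k-dimensional representation of document i, and discarding the smaller singular values is what removes the "unnecessary information".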
Adaptive resonance theory network (1/3) • The ART network is an unsupervised learning network. • Principle: • The theory of ART grew from the theory of cognition. • It is similar to a human neural system: it not only learns new examples but also preserves old memories.
Adaptive resonance theory network (2/3) • Characteristics: • It has the features of both stability and plasticity. • To resolve the conflict between stability and plasticity, the ART network adjusts the vigilance value. • Advantages: • Learning is fast. • Memory consumption is small. • The number of clusters does not have to be set beforehand.
Adaptive resonance theory network (3/3) • The structure of the ART network: • Input layer: receives the training samples as input vectors. • Output layer: presents the results of the trained network as output vectors. • Weight connections: connect the input layer and the output layer. • [Figure: input vector, input layer, connection layer, output layer, output vector]
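To make the role of the vigilance value concrete, here is a minimal single-pass sketch in the style of ART1 with fast learning on binary vectors. It is a simplified illustration and not necessarily the network used in this work; the parameter names `vigilance` and `beta` and the sample vectors are assumptions.

```python
def art1_cluster(patterns, vigilance=0.6, beta=0.5):
    """Cluster binary vectors in one pass; returns (labels, prototypes)."""
    prototypes = []  # one binary prototype (weight vector) per cluster
    labels = []
    for pattern in patterns:
        active = sum(pattern)

        def overlap(j):
            return sum(p & w for p, w in zip(pattern, prototypes[j]))

        # Rank existing clusters by the choice function, best first.
        ranked = sorted(
            range(len(prototypes)),
            key=lambda j: overlap(j) / (beta + sum(prototypes[j])),
            reverse=True,
        )
        winner = None
        for j in ranked:
            # Vigilance test: the prototype must cover enough of the input.
            if active > 0 and overlap(j) / active >= vigilance:
                winner = j
                break
        if winner is None:
            # No cluster is similar enough: create a new one (plasticity).
            prototypes.append(list(pattern))
            winner = len(prototypes) - 1
        else:
            # Resonance: keep only the features shared with the input.
            prototypes[winner] = [p & w for p, w in zip(pattern, prototypes[winner])]
        labels.append(winner)
    return labels, prototypes


# Hypothetical binary document vectors (1 = keyword present).
SAMPLE = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
LABELS, PROTOTYPES = art1_cluster(SAMPLE, vigilance=0.6)
```

The number of clusters is never specified: a higher vigilance makes the test stricter and therefore tends to produce more, smaller clusters.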
Ontology construction • Introduction • Literature reviews • Ontology construction • Analyzing web pages • Finding the TF-IDF values of terms • Reducing the matrix and converting elements to binary data • Using a recursive ART network to cluster the web pages • Applying a Boolean model to construct an ontology • Representing the ontology using the Jena package • Experimental results • Conclusions and future works
Ontology construction • [Flowchart with the steps: documents from the WWW, web page analysis, stop-word removal, finding TF-IDF, SVD operation, ART network for clustering, check whether the number of documents is low enough, use TF-IDF to find the concept of each group, Boolean method to construct relations, create ontology, produce RDF ontology]
Analyzing web pages (1/2) • After collecting the web pages, the system removes stop words. • Removing stop words avoids misjudgments caused by unimportant words that appear with high frequency.
Analyzing web pages (2/2) • Most web pages are written in HTML. HTML uses opening and closing tags to mark up web page content. • Tij = nij × Wm • Tij: the weight of concept Cj in web page di. • nij: the frequency with which concept Cj appears under a given tag. • Wm: the weight of that tag.
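The two analysis steps can be sketched with Python's standard `html.parser` module. The tag weights and the stop-word list below are hypothetical, since the slides give no concrete values, and the sketch reads the formula as adding the tag weight once per occurrence of a term, which is one possible interpretation.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag weights (Wm) and stop words; not taken from the slides.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5}
DEFAULT_WEIGHT = 1.0
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


class WeightedTermCounter(HTMLParser):
    """Accumulates a tag-weighted count for every non-stop word."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.weights = Counter()

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching tag and anything left open inside it.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i:]
                break

    def handle_data(self, data):
        # Use the heaviest enclosing tag as the weight of this text run.
        weight = max(
            [TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags],
            default=DEFAULT_WEIGHT,
        )
        for word in data.lower().split():
            word = word.strip(".,;:!?()\"'")
            if word and word not in STOP_WORDS:
                self.weights[word] += weight


def weighted_terms(html):
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.weights)


SAMPLE_HTML = (
    "<html><head><title>Ontology of the Web</title></head>"
    "<body><h1>Ontology</h1><p>The web is a <b>semantic</b> web.</p></body></html>"
)
WEIGHTS = weighted_terms(SAMPLE_HTML)
```

Here a word in the title counts three times as much as the same word in body text, so "ontology" (title and heading) ends up with the same total weight as "web" (title and two body occurrences).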
TF-IDF • Our research uses the product of TF and IDF to represent the importance of a keyword in a document. • TFi,j’: the relative frequency of keyword i in document j after the weight operation. • IDFi: the inverse document frequency of term i, that is, the reciprocal of the frequency with which term i appears across all documents. • N: the total number of documents. • ni: the number of documents, out of N, in which term i appears.
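A minimal sketch of the TF × IDF product follows. The exact formula on the original slide is not recoverable, so the code assumes the common form TF × log(N / ni) with plain (unweighted) relative term frequency; the sample documents are made up.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()  # ni: number of documents containing term i
    for tokens in docs:
        doc_freq.update(set(tokens))
    result = []
    for tokens in docs:
        counts = Counter(tokens)
        total = sum(counts.values())
        result.append({
            # Assumed form: relative frequency times log(N / ni).
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return result


# Hypothetical tokenized documents.
SAMPLE_DOCS = [
    ["transports", "boat", "boat"],
    ["transports", "airplane"],
    ["transports", "fly", "airplane"],
]
SCORES = tf_idf(SAMPLE_DOCS)
```

A term that occurs in every document, such as "transports" here, gets weight 0 under this form, which is why TF-IDF favours terms that distinguish one group of pages from another.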
Reducing the matrix and converting elements to binary data • We list the keywords and the web page documents to form a binary matrix. • If a keyword appears in a document, the corresponding element is set to 1; otherwise, it is set to 0. The SVD operation is then used to reduce the large matrix to a smaller one.
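The 0/1 encoding can be sketched as follows, with rows as documents and columns as keywords. The keyword list and documents are illustrative only.

```python
def binary_matrix(docs, keywords):
    """Rows are documents, columns are keywords; 1 means the keyword occurs."""
    matrix = []
    for tokens in docs:
        present = set(tokens)
        matrix.append([1 if keyword in present else 0 for keyword in keywords])
    return matrix


# Hypothetical keywords and tokenized documents.
KEYWORDS = ["transports", "fly", "boat", "airplane"]
DOCS = [
    ["transports", "boat"],
    ["transports", "fly", "airplane", "fly"],
    ["transports", "fly"],
]
MATRIX = binary_matrix(DOCS, KEYWORDS)
```

Repeated occurrences collapse to a single 1, so the matrix records presence rather than frequency.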
Using the recursive ART network to cluster the web pages • We propose a recursive ART network algorithm to produce a tree structure • [Figure: a tree in which the ART network is applied recursively, splitting 100 documents into progressively smaller groups]
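The slides do not spell out the recursion, so the following is only a sketch under stated assumptions: each group is clustered again until it is small enough or can no longer be split. The clustering step is passed in as a function; the stand-in `split_on_feature` used in the demo is not an ART network and exists only to keep the example short.

```python
def build_tree(doc_ids, cluster_fn, max_leaf_size=2, depth=0):
    """Recursively split doc_ids into a nested tree of groups."""
    node = {"docs": list(doc_ids), "children": []}
    if len(doc_ids) <= max_leaf_size:
        return node  # small enough: stop splitting
    labels = cluster_fn(doc_ids, depth)
    groups = {}
    for doc_id, label in zip(doc_ids, labels):
        groups.setdefault(label, []).append(doc_id)
    if len(groups) < 2:
        return node  # the clusterer could not split this group further
    for label in sorted(groups):
        child = build_tree(groups[label], cluster_fn, max_leaf_size, depth + 1)
        node["children"].append(child)
    return node


def leaves(node):
    """Collect the document groups at the leaves, left to right."""
    if not node["children"]:
        return [node["docs"]]
    return [group for child in node["children"] for group in leaves(child)]


# Hypothetical binary features per document.
VECTORS = {
    "d1": (0, 0), "d2": (0, 0), "d3": (0, 1),
    "d4": (1, 0), "d5": (1, 1), "d6": (1, 1),
}


def split_on_feature(doc_ids, depth):
    # Stand-in clusterer (NOT ART): label each document by one feature.
    return [VECTORS[d][depth] if depth < len(VECTORS[d]) else 0 for d in doc_ids]


TREE = build_tree(list(VECTORS), split_on_feature, max_leaf_size=2)
```

In a fuller version, `cluster_fn` could wrap an ART network, for example with a vigilance value that depends on `depth`; that choice is an assumption and is not stated in the slides.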
Recursive ART
Applying Boolean operation • The Boolean model is used to determine and construct the relations between different concepts. • For example, imagine ten documents involving four concepts: transports, flying, boats, and airplanes. • Documents containing “transports”: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. • Documents containing “fly”: 2, 3, 6, 7, 9, 10. • Documents containing “boat”: 1, 4, 5, 8. • Documents containing “airplane”: 6, 7, 10.
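The example can be worked through with set operations. The sketch below classifies each pair of concepts as inheritance, intersection, or independence from their document sets, and picks as parent the smallest concept that strictly contains another. The function names are illustrative, and treating set containment as inheritance is an assumption consistent with the three relations of the Boolean model.

```python
DOC_SETS = {
    "transports": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "fly": {2, 3, 6, 7, 9, 10},
    "boat": {1, 4, 5, 8},
    "airplane": {6, 7, 10},
}


def relation(docs_a, docs_b):
    """Classify two concepts by comparing the documents that contain them."""
    if docs_a <= docs_b or docs_b <= docs_a:
        return "inheritance"   # one document set contains the other
    if docs_a & docs_b:
        return "intersection"  # partial overlap
    return "independence"      # no shared documents


def parent_concepts(doc_sets):
    """Map each concept to its smallest strict superset concept, if any."""
    parents = {}
    for concept, docs in doc_sets.items():
        supersets = [o for o, other in doc_sets.items() if docs < other]
        parents[concept] = (
            min(supersets, key=lambda o: len(doc_sets[o])) if supersets else None
        )
    return parents


PARENTS = parent_concepts(DOC_SETS)
```

With the document sets above, "fly" and "boat" both fall under "transports", "airplane" falls under "fly", and "fly" and "boat" are independent because they share no documents, which yields the tree transports → {fly → airplane, boat}.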
Generating ontology through the Jena package (1/3) • The Resource Description Framework (RDF) is a framework developed by the W3C and metadata groups. • It can carry various kinds of metadata across the Internet. • RDF provides interoperability between applications that exchange machine-understandable information on the web.
Generating ontology through the Jena package (2/3) • RDF describes web resource data • Resource: anything that has a URI • Description: describes the properties of a resource • Three main elements • Subject • Predicate • Object • [Figure: a Subject node linked to an Object node by a Predicate arc]
Generating ontology through the Jena package (3/3) • A statement may be represented as an RDF graph, • where the URI http://www.cyut.edu.tw/~s9214639/ is a web resource and author is a property with the value “John”. • [Figure: RDF graph with the resource http://www.cyut.edu.tw/~s9214639/, the property author, and the value John]
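As a language-neutral illustration of the subject, predicate, object structure, the sketch below writes the example statement in N-Triples syntax using plain Python. It does not use Jena. The predicate URI `http://example.org/terms/author` is a hypothetical placeholder, because the slides name the property only as "author".

```python
def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text):
    """Quote a plain string literal, escaping backslash, quote, and newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def ntriple(subject_uri, predicate_uri, obj):
    """Format one statement; obj must already be built with uri() or literal()."""
    return f"{uri(subject_uri)} {uri(predicate_uri)} {obj} ."


# Hypothetical namespace for the "author" property (an assumption).
AUTHOR = "http://example.org/terms/author"

STATEMENT = ntriple("http://www.cyut.edu.tw/~s9214639/", AUTHOR, literal("John"))
```

The resulting line names the web resource as the subject, the author property as the predicate, and the literal "John" as the object, which is the same triple that the graph on the slide depicts.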
Experimental results • Experiment environment • Pentium 4, 2.4 GHz • 512 MB RAM • Java programming language • RDF ontology language