160 likes | 372 Views
인공지능 특강 프로젝트 - Development of Decision Tree Algorithm for Semantic Web data -. 2010313148 전동규. Agenda. Project Purpose Motivation Related Work Algorithm Description of Problem. 1. Project Purpose. Relational Data - Semantic Web Data (Linked data) -. Semantic Decision Tree.
E N D
인공지능 특강 프로젝트 - Development of Decision Tree Algorithm for Semantic Web data - 2010313148 전동규
Agenda Project Purpose Motivation Related Work Algorithm Description of Problem
1. Project Purpose Relational Data - Semantic Web Data (Linked data) - Semantic Decision Tree Decision Tree Algorithm - C4.5 Algorithm -
1. Project Purpose • Goal of project • Development of new kind of Decision Tree algorithm which supports decision making based on Semantic Web environmental information • Solve the several problems which is already solved by other related researches • Data : Linked Data(Semantic Web ontology)
1. Project Purpose • Ontology • Class : Definition of Set • Property : Relations between instances • Instance : Individuals which are belonged in classes String Int Boolean String Class name Rich Range type age name Location Workplace Datatype Property String working_at Object Property Hospital Person School hasParent Person Doctor hasChild Teacher Student Person Schema of Example Ontology
2. Motivation What is Semantic Web ? • Definition of Semantic Web • The Semantic Web is an evolving development of the World Wide Web in which the meaning (semantics) of information and services on the web is defined, making it possible for the web to understand and satisfy the requests of people and machines to use the web content • Semantic Web is based on ontologies which corresponds to Semantic Web data • Linked Data • The term Linked Data is used to describe a method of exposing, sharing, and connecting data on the Web [1] Berners-Lee, T. (2001), “ The Semantic Web,”Scientific American, Vol. 501.
2. Motivation Development status of Semantic Web • Increase of Semantic Web Data • Appearance of Semantic Web Document Search Engines • Falcons: Twenty millions overRDF/XML Documents • Swoogle : Three millions overSemantic Web Documents • Open data in Semantic Web • LINKINGOPENDATA : • The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources • The data sets consist of over 4.7 billion RDF triples, which are interlinked by around 142 million RDF links (May 2009).
2. Motivation Development status of Semantic Web • Increase of Semantic Web Data • Semantic Web based Portal Site • Twine : • Twine is a Semantic Web Portal that making networks of information based on user’s posts which consist of their own information and favorite things. • Every information composing Twine is written in RDF and OWL format. • Twine have millions of visitors in a month, and they have over millions of relationships between 3 millions of semantic tags (March 2009) • The necessity of mining useful knowledge from huge size ontology is highly expected. Therefore, Data Mining methodology for Semantic Web should be ready for this necessity.
2. Motivation Decision Tree in Semantic Web • Traditional Decision Tree algorithm is impossible to apply in Semantic Web • Semantic Web based Ontology has special characteristics for mining • Since Semantic Web document has network structure, multi-value issue is occurred • Traditional Decision Tree just uses value of variables. Therefore, additional information of Semantic Web are can not be applied • Converting Semantic Web data into single table style that used to use in traditional decision tree algorithm is impossible
3. Related Work • Arno J.Knobbe[2]developed decision tree algorithm for Multi-relational database • Selection Graphis suggested to do decision tree on RDB • Selection Graph is composed of Node, Edge, and condition and it can be expressed in SQL syntax This research suggested partial solution about multi-value issue which also happened in Semantic Web ontology. However, this methodology can not be applied to Semantic Web which contains a lot of information than RDB • David Jensen[3] suggested methodology that converting social network data to single table data which can be applied to Traditional Decision Tree algorithm • ‘QGraph' that kind of query language to get the local network from entire social network is suggested • QGraph is composed of Node, Edge, and condition and it can query many objects at once Since ontology information are manually converted to single table form, missing information will be occurred a lot [2] Arno J. Knobbe., Arno Siebes., Danil Van Der Wallen., Syllogic B. V. (1999). “Multi-Relational Decision Tree Induction,”In Proceedings of PKDD’99, [3] D. Jensen., and J. Neville.(to appear) (2002). “Data mining in networks,”Papers of the Symposium on Dynamic Social Network Modeling and Analysis.
4. Algorithm • Search procedure of algorithm follows C4.5 algorithm • New methodologies are required to learn concepts in ontology • ‘Constructor’ can be used as similar as attributes in traditional Decision tree • Related works used the terms ‘Refinement’ as an attribute in Decision Tree
4. Algorithm Refinements • What is a Refinement? • Refinement is a condition for split branches in decision tree. In this algorithm, property and class from ontology are used as a refinement. • When define a refinement, Role Constructors from Description Logic are applied to make the best use of information in Semantic Web • Type of Refinements • Concept Constructor Refinement : Applying type information of instances • Cardinality restriction Refinement : Applying cardinality information on object property • Domain restriction Refinement : Applying value of datatype property • Qualification Refinement : Applying information of quantification restrictions and range class of object property
4. Algorithm Refinements • Developed Refinements
4. Algorithm • The list of syntax information which can be expressed in ontology
5. Description of Problem Train problem Ten trains • After learning, found definition of eastbound train is as follows
5. Description of Problem Train problem • Artificial task of learning to predict whether a train is headed east or west • Data is consist of relation tuples • Relations • eastbound(T) : train T is eastbound • has-car(T,C) : C is a car of T • infront(C,D) : car C is in front of D • long(C) : car C is long • open-rectangle(C) : car C is shaped as an open rectangle similar relations for five other shapes • jagged-top(C) : C has a jagged top • sloping-top(C) : C has a sloping top • open-top(C) : C is open • contains-load(C,L) : C contains load L • 1-item(C) : C has one load item similar relations for two and three load items • 2-wheels(C) : C has two wheels • 3-wheels(C) : C has three wheels