An Empirical Investigation of Learning from the Semantic Web

Pete Edwards, Gunnar AAstrand Grimnes, Alun Preece
Computing Science Department, University of Aberdeen
{pedwards,ggrimnes,apreece}@csd.abdn.ac.uk
Semantic Web Mining Workshop @ ECML 2002

Motivation
• The Semantic Web should:
  • Facilitate learning from the Web.
  • Facilitate reuse of learning outcomes.
Hypothesis: Learning from semantically marked-up data should outperform learning from plain text.

Methods
• Compare the performance of learning from plain text and from semantic meta-data.
• Use traditional ML algorithms as the baseline approach:
  • Naïve Bayes
  • K-Nearest Neighbour
• Explore the application of more knowledge-intensive approaches, such as ILP.
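To make the two baselines concrete, here is a minimal sketch of both on bag-of-words instances. It is an illustration only, not the implementation used in the experiments: the add-one smoothing, the Jaccard similarity used for KNN, the function names, and the toy training data are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs is a list of (tokens, label) pairs; returns the counts needed to predict."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, tokens):
    """Pick the class with the highest log-posterior (add-one smoothing, an assumption)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def predict_knn(docs, tokens, k=3):
    """Majority vote among the k training instances most similar to the query.

    Similarity is Jaccard overlap of word sets (an assumption; the slides do
    not state which measure was used).
    """
    query = set(tokens)

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    ranked = sorted(docs, key=lambda d: jaccard(set(d[0]), query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set, for illustration only.
train = [
    (["agent", "mobile", "bdi"], "agents"),
    (["agent", "kqml", "integration"], "agents"),
    (["learning", "case", "based"], "ml"),
    (["learning", "bayes", "classifier"], "ml"),
]
model = train_naive_bayes(train)
print(predict_naive_bayes(model, ["mobile", "agent"]))
print(predict_knn(train, ["mobile", "agent"], k=3))
```

The same two functions can be run on either the plain-text tokens or the meta-data tokens of an instance, which is what makes them usable as a common baseline across representations.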

Datasets
• The Semantic Web is still in its infancy, so available datasets are limited.
• We need a dataset with instances represented both in plain text and in some semantic markup language.
• Forced to use artificial datasets.
• No ontological support.

ITTalks (http://ittalks.org)
• ITTalks is a real Semantic Web application.
• Information about seminars at universities in the US.
• The plain HTML and DAML+OIL versions of each talk have slightly different, but largely overlapping, content.
• The data carries no classification, so we labelled it by personal preference.

ITTalks example

<rdf:RDF>
  <rdf:Description about="http://www.ittalks.org/jsp/Controller.jsp?action=ViewTalk&as=HTML&talkid=20010620141011">
    <Talk rdf:parseType="Resource">
      <Title>PROBABILISTIC OPTIMIZATION TECHNIQUES FOR MULTICAST KEY MANAGEMENT … </Title>
      <Abstract>Multicast is a key technology to support large group communications over the Internet… </Abstract>
      <BeginTime>
        <time:Year>2001</time:Year>
        <time:Month>06</time:Month>
        <time:Day>20</time:Day>
        ...
      </BeginTime>
      ...
      <Audience>General Public</Audience>
      <DomainName>umbc</DomainName>
      <Location rdf:parseType="Resource">
        <Institution>UMBC</Institution>
      </Location>
      <Speaker rdf:parseType="Resource">
        <Name>Ali Selcuk</Name>
        <Organization>UMBC</Organization>
        <Email>aselcu1@csee.umbc.edu</Email>
      </Speaker>
    </Talk>
  </rdf:Description>
</rdf:RDF>

ResearchIndex (http://citeseer.nj.nec.com)
• ResearchIndex is a digital library of scientific literature.
• Articles from 17 different subject areas within Computing Science.
• Full text of each article and its BibTeX entry are provided.
• BibTeX converted to RDF.
• Full text is typically 6000 words.
• A BibTeX entry is typically 10 RDF statements.

BibTeX → RDF mapping

@inproceedings{ davies94agentk,
  author = "W. H. E. Davies and P. Edwards",
  title = "Agent-K: An Integration of AOP and KQML",
  booktitle = "Proceedings of the CIKM'94 Workshop on Intelligent Agents",
  address = "Gaithersburg, MD, USA",
  editor = "T. Finin and Y. Labrou",
  year = "1994",
  url = "citeseer.nj.nec.com/15298.html"
}

<inproceedings rdf:about="davies94agentk">
  <author>W. H. E. Davies and P. Edwards</author>
  <title>Agent-K: An Integration of AOP and KQML</title>
  <booktitle>Proceedings of the CIKM'94 Workshop on Intelligent Agents</booktitle>
  <address>Gaithersburg, MD, USA</address>
  <editor>T. Finin and Y. Labrou</editor>
  <year>1994</year>
  <url>citeseer.nj.nec.com/15298.html</url>
</inproceedings>
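The mapping above is mechanical: the entry type becomes the element name, the citation key becomes the rdf:about value, and each field becomes a child element. A minimal sketch of that step follows, assuming the BibTeX entry has already been parsed into a dictionary; the function name and the dictionary layout are hypothetical, and this is not the converter used for the experiments.

```python
from xml.sax.saxutils import escape, quoteattr


def bibtex_entry_to_rdf(entry_type, key, fields):
    """Render one parsed BibTeX entry as an RDF/XML fragment, one element per field."""
    lines = [f"<{entry_type} rdf:about={quoteattr(key)}>"]
    for name, value in fields.items():
        # escape() protects &, < and > inside field values
        lines.append(f"  <{name}>{escape(value)}</{name}>")
    lines.append(f"</{entry_type}>")
    return "\n".join(lines)


# The entry from the slide, written as an already-parsed dictionary (assumed layout).
fields = {
    "author": "W. H. E. Davies and P. Edwards",
    "title": "Agent-K: An Integration of AOP and KQML",
    "booktitle": "Proceedings of the CIKM'94 Workshop on Intelligent Agents",
    "address": "Gaithersburg, MD, USA",
    "editor": "T. Finin and Y. Labrou",
    "year": "1994",
    "url": "citeseer.nj.nec.com/15298.html",
}
rdf_fragment = bibtex_entry_to_rdf("inproceedings", "davies94agentk", fields)
print(rdf_fragment)
```

The fragment is not a complete RDF document: a real file would still need the enclosing rdf:RDF element and namespace declarations.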

Knowledge Sparse Learning: Representation
• For each algorithm we use 3 instance representations:
  1. Conventional plain text
  2. Meta-data as plain text
  3. Meta-data tags to feature mapping

Method 3: Meta-data tags to feature mapping

Meta-data instance:
<xml>
  <rdf>
    <talk id='mlsemweb1'>
      <title>An Empirical Investigation of Learning from the Semantic Web</title>
      <speaker>
        <name>Gunnar AAstrand Grimnes</name>
        <url>http://www.csd.abdn.ac.uk/~ggrimnes</url>
      </speaker>
      ...

Feature tags: talk, title, speaker, name, url ...

Instance representation: {}, {empirical, investigation, learning, semantic, web}, {}, {gunnar, aastrand, grimnes}, {csd, abdn, ggrimnes} ...
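The tag-to-feature mapping can be sketched as follows: every tag becomes one feature whose value is the set of words found directly inside that tag. This is an illustration under stated assumptions, not the original code: the stop-word list is chosen only so that the slide's example comes out as shown, the function name is hypothetical, and the input fragment is a closed, well-formed version of the elided instance above.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: common words and URL boilerplate are dropped. The slides do not
# list the stop words that were actually used.
STOP_WORDS = {"an", "of", "from", "the", "http", "www", "ac", "uk"}


def tag_features(xml_text):
    """Map each tag name to the set of words that appear directly inside it."""
    root = ET.fromstring(xml_text)
    features = {}
    for elem in root.iter():
        words = re.findall(r"[a-z0-9]+", (elem.text or "").lower())
        kept = {w for w in words if w not in STOP_WORDS}
        features.setdefault(elem.tag, set()).update(kept)
    return features


# Well-formed version of the instance above (closing tags added for parsing).
instance = """
<talk id="mlsemweb1">
  <title>An Empirical Investigation of Learning from the Semantic Web</title>
  <speaker>
    <name>Gunnar AAstrand Grimnes</name>
    <url>http://www.csd.abdn.ac.uk/~ggrimnes</url>
  </speaker>
</talk>
"""
features = tag_features(instance)
for tag in ["talk", "title", "speaker", "name", "url"]:
    print(tag, sorted(features[tag]))
```

Container tags such as talk and speaker hold only nested elements, so their word sets stay empty, which matches the empty sets in the instance representation above.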

Knowledge Sparse Learning: Results
[Figure panels: ITTalks, ResearchIndex]
• ITTalks:
  • Meta 2 performs poorly, caused by redundant features.
  • Text and Meta 1 perform very similarly, because the instances in this dataset are almost identical in the two representations.
• ResearchIndex:
  • KNN performs better on the full-text instances, as it is better at dealing with large numbers of features.

Knowledge Intensive Learning: Representation
• Ignore the plain-text representations.
• RDF maps to a first-order logic (Prolog) representation.
• Use the ILP algorithm Progol 4.4 to learn Prolog rules for class descriptions.
• Solve binary classification problems.

RDF → Prolog mapping

url( davies94agentk, 'citeseer.nj.nec.com/15298.html' ).
editor( davies94agentk, 'T. Finin' ).
editor( davies94agentk, 'Y. Labrou' ).
titleword( davies94agentk, 'agent' ).
titleword( davies94agentk, 'integration' ).
titleword( davies94agentk, 'aop' ).
titleword( davies94agentk, 'kqml' ).
author( davies94agentk, 'W. Davies' ).
author( davies94agentk, 'P. Edwards' ).
address( davies94agentk, 'Gaithersburg, MD, USA' ).
year( davies94agentk, '1994' ).
type( davies94agentk, '#inproceedings' ).
booktitleword( davies94agentk, 'proceedings' ).
booktitleword( davies94agentk, 'cikm94' ).
booktitleword( davies94agentk, 'workshop' ).
booktitleword( davies94agentk, 'intelligent' ).
booktitleword( davies94agentk, 'agents' ).

<inproceedings rdf:about="davies94agentk">
  <author>W. H. E. Davies and P. Edwards</author>
  <title>Agent-K: An Integration of AOP and KQML</title>
  <booktitle>Proceedings of the CIKM'94 Workshop on Intelligent Agents</booktitle>
  <address>Gaithersburg, MD, USA</address>
  <editor>T. Finin and Y. Labrou</editor>
  <year>1994</year>
  <url>citeseer.nj.nec.com/15298.html</url>
</inproceedings>
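The facts above follow a visible pattern: person lists are split on "and", title-like fields are broken into one lower-cased word fact each, and every other field becomes a single fact. A minimal sketch of that pattern is given below. The stop-word list, the rule that drops single-character tokens, and all function names are assumptions chosen to reproduce the slide's word facts. The sketch does not shorten author names (the slide turns "W. H. E. Davies" into 'W. Davies'), and its output spacing differs slightly from the slide.

```python
import re

STOP_WORDS = {"an", "and", "of", "on", "the"}   # assumption, not from the slides
PERSON_FIELDS = {"author", "editor"}            # split into one fact per person
WORD_FIELDS = {"title", "booktitle"}            # split into one fact per word


def quote(atom):
    """Quote a Prolog atom, doubling any embedded single quote."""
    return "'" + atom.replace("'", "''") + "'"


def words(text):
    """Lower-case the text, drop apostrophes, and keep informative tokens only."""
    tokens = re.findall(r"[a-z0-9]+", text.lower().replace("'", ""))
    return [t for t in tokens if len(t) > 1 and t not in STOP_WORDS]


def record_to_facts(key, entry_type, fields):
    """Turn one metadata record into a list of Prolog fact strings."""
    facts = [f"type({key}, {quote('#' + entry_type)})."]
    for name, value in fields.items():
        if name in PERSON_FIELDS:
            for person in value.split(" and "):
                facts.append(f"{name}({key}, {quote(person.strip())}).")
        elif name in WORD_FIELDS:
            for word in words(value):
                facts.append(f"{name}word({key}, {quote(word)}).")
        else:
            facts.append(f"{name}({key}, {quote(value)}).")
    return facts


record = {
    "title": "Agent-K: An Integration of AOP and KQML",
    "booktitle": "Proceedings of the CIKM'94 Workshop on Intelligent Agents",
    "editor": "T. Finin and Y. Labrou",
    "year": "1994",
}
facts = record_to_facts("davies94agentk", "inproceedings", record)
print("\n".join(facts))
```

Splitting titles into word facts is what lets a learner form rules over individual words, such as a rule that fires when a title contains both "agent" and "mobile".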

Knowledge Intensive Learning: Results

Agents experiment (155 clauses):
inClass(A) :- author(A,'A. Rao').
inClass(A) :- author(A,'D. Lambrinos').
inClass(A) :- titleword(A,agent), titleword(A,mobile).
inClass(A) :- type(A,'http://www.csd.abdn.ac.uk/~ggrimnes/exp/#misc'), textword(A,agent), titleword(A,agent).
inClass(A) :- year(A,1999), titleword(A,agents).
inClass(A) :- titleword(A,bdi).

Machine Learning (259 clauses):
inClass(A) :- publisher(A,'Morgan Kaufmann'), booktitleword(A,learning).
inClass(A) :- titleword(A,based), titleword(A,case).

Theory (279 clauses):
inClass(A) :- volume(A,18).
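Each learned clause is a conjunction of conditions on one article, and an article is in the class if any clause succeeds. The sketch below mimics that reading in Python for a subset of the Agents clauses above. It is only an illustration of how the rules classify: the fact layout, the function name, and the two example articles are hypothetical, and year values are stored as strings for simplicity.

```python
# A subset of the Agents clauses above, each written as a list of
# (predicate, value) conditions that must all hold for the clause to fire.
AGENT_RULES = [
    [("author", "A. Rao")],
    [("author", "D. Lambrinos")],
    [("titleword", "agent"), ("titleword", "mobile")],
    [("year", "1999"), ("titleword", "agents")],
    [("titleword", "bdi")],
]


def in_class(facts, rules):
    """True if at least one rule has every one of its conditions among the facts."""
    return any(all(cond in facts for cond in body) for body in rules)


# Hypothetical articles, described by the facts that hold for them.
mobile_agents_paper = {("titleword", "mobile"), ("titleword", "agent"), ("year", "2000")}
other_paper = {("titleword", "agent"), ("year", "1999")}

print(in_class(mobile_agents_paper, AGENT_RULES))
print(in_class(other_paper, AGENT_RULES))
```

Because the rules are explicit conditions over named properties, they can be inspected and carried over to other collections, unlike the weight vectors produced by the knowledge-sparse learners.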

Future work: Learning Personal Profiles

Gunnar’s profile, based on 200 manually rated articles from the ResearchIndex dataset:
inClass(A) :- titleword(A,image).
inClass(A) :- type(A,'http://www.csd.abdn.ac.uk/~ggrimnes/exp/#misc'), textword(A,learning).
inClass(A) :- booktitleword(A,mining).
inClass(A) :- author(A,'N. Jennings').
inClass(A) :- titleword(A,indexing).
inClass(A) :- pages(A,143).

Conclusion
• In terms of accuracy, learning from the Semantic Web was not superior.
• Learning from RDF requires fewer resources.
• Datasets have no ontological support.
• Learning outcomes from the Semantic Web can be real, reusable knowledge.
