Personalised Customer Services: The Impact of Semantic MetaData
  Gunnar AAstrand Grimnes, Alun Preece & Pete Edwards
  University of Aberdeen
  ggrimnes@csd.abdn.ac.uk
Personal Agents in Electronic Commerce
  We can identify two groups:
    ShopBots: BargainFinder (Krulwich, 1995), WebMate, mySimon, LatestPrices, DealTime
    Recommendations: RINGO (Shardanand, 1994), Amazon, MovieFinder, PTV (Barry Smyth, 1998)
Issues and Challenges for Personal Agents
  Profiling
  Representation
  Acquisition
  Deployment
  Trust
  Feedback
Machine Learning for personalisation
  Activities within IR, ML and Agent research communities.
  Challenges:
    Representation of training events.
    Data volume.
    Noise, redundancy, inconsistencies.
    Ill-defined semantics.
  Tools:
    Statistical analysis of text: TF/IDF, SVD.
    Naïve Bayes, Nearest Neighbour, Bagging, Boosting.
The Semantic Web
  Ontologies
  Well defined syntax: XML
  Well defined semantics: RDF/DAML+OIL
  WebServices
  Truly flexible autonomous agents assisting users.
Semantic Web & Agents
  ShopBots:
    No more “screen scraping”.
    Less human effort required.
  Recommenders:
    Semantic model of the user.
    Semantics for products and services.
    New recommender techniques which marry user models & product/service descriptions to deliver meaningful recommendations.
  Learning profiles:
    Today: sparse vector representation.
    Tomorrow: semantically enriched representation.
How can semantics help?
  Aims:
    To investigate how semantically enriched representation affects profile learning.
    To explore methods for mapping semantic markup to training instance representations.
    To explore various performance metrics (accuracy, time to learn profile, time to use profile).
  Hypothesis:
    Semantic representation should outperform simple sparse-vector (text) representation.
Datasets
  We were unable to find an electronic commerce dataset with semantic markup. We have used two datasets:
  1. ITTalks (http://www.ittalks.org)
    Binary classification (like/dislike).
    58 instances; classes average 19 and 38 instances.
    3200 different terms.
  2. CiteSeer papers (http://citeseer.nj.nec.com/directory.html)
    17 classes (subject areas of CS). We also did binary classifications.
    5066 instances; classes average 298 instances.
    400,000 distinct terms, reduced to 1500 by choosing the most significant, based on TF/IDF.
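The slides do not say how the "most significant" terms were ranked for the CiteSeer data. The following is a minimal sketch of one plausible reading, assuming each term is scored by its summed TF/IDF over all documents and the top k are kept. The function name `select_terms` and the toy documents are illustrative only and are not taken from the original experiments.

```python
import math
from collections import Counter

def select_terms(documents, k):
    """Return the k terms with the highest summed TF/IDF score.

    `documents` is a list of token lists. The scoring rule (sum of
    tf * idf over documents) is an assumption; the slides only say
    that terms were chosen "based on TF/IDF".
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    scores = Counter()
    for doc in documents:
        if not doc:
            continue  # skip empty documents to avoid dividing by zero
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]

# Toy corpus (hypothetical): "web" occurs in every document, so its idf is 0.
toy_docs = [
    ["agent", "agent", "web"],
    ["web", "auction"],
    ["web", "graphics", "graphics"],
]
top_terms = select_terms(toy_docs, 3)
```

In this toy corpus the term that appears in every document scores zero and is dropped first, which is the behaviour one would want when cutting a 400,000-term vocabulary down to 1500.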
Dataset 1 - IT Talks (www.ittalks.org)
  <Talk rdf:parseType="Resource">
    <Title>Bidding Algorithms for Simultaneous Auctions</Title>
    …
    <Abstract>This talk is concerned with computational problems … </Abstract>
    <Speaker rdf:parseType="Resource">
      <Name>Amy Greenwald</Name>
      <Organization>Department of Computer Science Brown University</Organization>
    </Speaker>
    <Host rdf:parseType="Resource">
      <Name>Timothy Finin</Name>
      <Organization>UMBC</Organization>
    </Host>
    ...
  Class: Pete - likes, Alun - dislikes, Gunnar - dislikes.
Dataset 2 - CiteSeer papers
  Class: Human Computer Interaction
  RDF generated from BibTeX:
  <?xml version="1.0"?>
  <article key="pelachaud96generating">
    <author>Catherine Pelachaud and Norman I. Badler and Mark Steedman</author>
    <title>Generating Facial Expressions for Speech</title>
    <journal>Cognitive Science</journal>
    <volume>20</volume>
    <number>1</number>
    <pages>1-46</pages>
    <year>1996</year>
    <url>citeseer.nj.nec.com/pelachaud94generating.html</url>
  </article>
Experimental Methodology
  Approaches:
    1. Conventional textual representation
    2. Treating semantic data as text
    3. Mapping semantic markup to attributes
  Classifier:
    Naïve Bayes classifier.
    Binary term vector.
  Data pre-processing:
    Stoplist, no stemming, length>=3, no numbers.
    TF/IDF.
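To make the classifier setup concrete, here is a minimal sketch of a Naïve Bayes classifier over binary term vectors. It assumes the Bernoulli variant with Laplace smoothing; the slides do not state which variant or smoothing was used. The vocabulary, training vectors and like/dislike labels are toy values made up for illustration, not data from ITTalks or CiteSeer.

```python
import math
from collections import defaultdict

def train_bernoulli_nb(vectors, labels):
    """Estimate class priors and per-term presence probabilities.

    Uses Laplace smoothing (+1 / +2), which is an assumption here.
    Returns {label: (prior, [P(term present | label), ...])}.
    """
    n_terms = len(vectors[0])
    class_counts = defaultdict(int)
    term_counts = defaultdict(lambda: [0] * n_terms)
    for vec, label in zip(vectors, labels):
        class_counts[label] += 1
        for i, bit in enumerate(vec):
            term_counts[label][i] += bit
    total = len(labels)
    model = {}
    for label, count in class_counts.items():
        probs = [(term_counts[label][i] + 1) / (count + 2) for i in range(n_terms)]
        model[label] = (count / total, probs)
    return model

def predict(model, vec):
    """Return the label with the highest log posterior for a binary vector."""
    best_label, best_score = None, -math.inf
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for bit, p in zip(vec, probs):
            # Absent terms contribute too: that is the Bernoulli model.
            score += math.log(p if bit else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data over a hypothetical vocabulary: [agent, auction, graphics]
train_vectors = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
train_labels = ["likes", "likes", "dislikes", "dislikes"]
model = train_bernoulli_nb(train_vectors, train_labels)
prediction = predict(model, [1, 0, 0])
```

Working in log space avoids numeric underflow once the term vector grows to the thousands of terms mentioned for the two datasets.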
Utilising MetaData
  <?xml version="1.0" ?>
  <talk>
    <speaker>Gunnar Grimnes</speaker>
    <title>Personalised Customer Services : The impact of the Semantic Web</title>
    <description>A talk describing some experiments with learning from metadata</description>
  </talk>
  Approach 2
    Binary vector of words that appear in the text. For example, if the term vector was made up of these words:
      talk, agent, speaker, daml, shop …
    this instance would look like this:
      Class: 1,0,1,0,0, ...
  Approach 3
    Each tag is mapped to an attribute. For example, given a list of all tags like this:
      talk, speaker, title, venue, description, date, …
    this instance would look like this:
      Class: {}, {gunnar, grimnes}, {personalised, customer, services, impact, semantic, web}, {}, {talk, describing, experiments, learning, metadata} …
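The two mappings can be sketched in a few lines of Python using the standard-library XML parser. This is an illustration under stated assumptions, not the authors' implementation: the stoplist is a tiny hypothetical stand-in (the slides do not give the real one), and the function names `approach2_vector` and `approach3_attributes` are invented here. The pre-processing follows the methodology slide: stoplist, no stemming, length >= 3, no numbers.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated stoplist; the actual list is not given in the slides.
STOPLIST = {"the", "and", "some", "with", "from"}

TALK_XML = """<talk>
  <speaker>Gunnar Grimnes</speaker>
  <title>Personalised Customer Services : The impact of the Semantic Web</title>
  <description>A talk describing some experiments with learning from metadata</description>
</talk>"""

def tokenize(text):
    """Lowercase; keep alphabetic tokens (no numbers) of length >= 3 not in the stoplist."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) >= 3 and w not in STOPLIST]

def approach2_vector(root, vocabulary):
    """Approach 2: treat tag names and element text alike as plain words."""
    words = set()
    for elem in root.iter():
        words.update(tokenize(elem.tag))
        words.update(tokenize(elem.text or ""))
    return [1 if term in words else 0 for term in vocabulary]

def approach3_attributes(root, tags):
    """Approach 3: one set-valued attribute per tag, filled from that tag's own text."""
    values = {tag: set() for tag in tags}
    for elem in root.iter():
        if elem.tag in values:
            values[elem.tag].update(tokenize(elem.text or ""))
    return [values[tag] for tag in tags]

root = ET.fromstring(TALK_XML)
vec = approach2_vector(root, ["talk", "agent", "speaker", "daml", "shop"])
attrs = approach3_attributes(
    root, ["talk", "speaker", "title", "venue", "description", "date"]
)
```

With the example talk, `vec` reproduces the 1,0,1,0,0 pattern from the slide, and `attrs` holds an empty set for tags that carry no text of their own (talk, venue, date) and a word set for speaker, title and description.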
Results
  Dataset 1 - ITTalks
  Dataset 2 - CiteSeer papers
Discussion and future directions
  Initial Hypothesis: “Semantic representation should outperform simple sparse vector (text) representation.”
  What do our results to date tell us?
  Outstanding Issues / Questions:
    Other datasets?
    How to exploit the semantics further?
    The role of ontological inference.