A Semi-Universal E-Commerce Agent

Explore how wrappers extract information from various sources, including table and list displays, for effective comparison shopping.


Presentation Transcript


  1. A Semi-Universal E-Commerce Agent. International Conference on Enterprise Information Systems (ICEIS) 2002, Ciudad Real, Spain. Aleksander Pivk, Department of Intelligent Systems, Jozef Stefan Institute, Ljubljana, Slovenia. 3 April 2002

  2. What is an (intelligent) agent?
     • An intelligent agent is a computer system capable of flexible, autonomous action in some environment.
     • Examples:
        • Environment: internet agent, OS agent, desktop agent, WWW agent, etc.
        • Task: information agent, shopping agent, interface agent, email agent, notification agent, etc.
     ICEIS 2002

  3. Information agents
     • Task: access and integrate information from a variety of data sources
     • Types:
        • Information Retrieval Agents: search engines
        • Information Filtering Agents: mail agents, news-delivery agents
        • Information Extraction Agents: wrappers
        • Information Integration Agents: meta-search engines, comparison shopping

  4. Information Extraction
     • Information extraction (IE) is the task of identifying the specific fragments of a single document that constitute its core semantic content.
     • Examples:
        a) from a weather report, identify locations, dates, and temperatures (high and low);
        b) from online stores, get product names, their images, and prices.
     • Sample extracted record:
        NAME: Casablanca Restaurant
        STREET: 220 Lincoln Boulevard
        CITY: Venice
        PHONE: (310) 392-5751
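The restaurant record on slide 4 can be illustrated with a minimal slot-filling sketch. The slide shows only the filled slots, so the comma-separated input line and the function name `extract_listing` are assumptions made for illustration, not the author's method.

```python
import re

# Hypothetical input text: the slide gives only the extracted slots,
# so this comma-separated layout is an assumption.
LISTING = "Casablanca Restaurant, 220 Lincoln Boulevard, Venice, (310) 392-5751"

# US-style phone number such as "(310) 392-5751".
PHONE_RE = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")


def extract_listing(text):
    """Fill NAME/STREET/CITY/PHONE slots, or return None if the layout differs."""
    phone = PHONE_RE.search(text)
    if phone is None:
        return None
    # Everything before the phone number is expected to hold three fields.
    fields = [f.strip() for f in text[:phone.start()].split(",") if f.strip()]
    if len(fields) != 3:
        return None
    name, street, city = fields
    return {"NAME": name, "STREET": street, "CITY": city, "PHONE": phone.group()}
```

A rule this rigid only works for one layout, which is the motivation for the wrappers discussed on the following slides.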

  5. Wrappers
     • A wrapper is:
        • a procedure or a rule that specifies how to extract information from an information source
        • tailored to a particular document collection
        • appropriate for semi-structured information sources
     • Why use wrappers?
        • information sources are heterogeneous
        • sites differ in user-interface style and in output display format

  6. Wrapper Learning
     • Why learning?
        • ad hoc formatting conventions used at one site are rarely relevant elsewhere
        • sites often change their formatting
        • scalability is the major challenge to IE
     • Automatic wrapper construction
        • a site’s wrapper is constructed from a set of example pages
     • Wrapper induction

  7. Implemented Systems
     • EMA – Employment Agent
        • memory-based approach
        • hand-coded wrappers
        • depends on the profession ontology (domain knowledge)
     • ShinA – Customized Comparison Shopping Agent
        • simple heuristic-based approach
        • little domain knowledge used

  8. ShinA – Shopping Assistant

  9. Our focus
     • Wrapper learning in real time
        • to realize a customized comparison shopper
     • Little use of domain knowledge
        • use simple heuristics instead
        • exploit the characteristics of semi-structured documents
     • Flexible and practical
        • handle both table-type and list-type displays
        • handle noisy product descriptions (missing attributes)
        • handle a single product description spread over multiple lines

  10. Learning Query Scheme Templates
      <form site="amazon.com">
        <name>searchform</name>
        <method>post</method>
        <action>www.amazon.com/exec/obidos/search-handle-form</action>
        <input type="text" name="field-keywords" size="15" />
        <input type="image" name="Go" />
        <select name="index">
          <option value="all products" selected />
          <option value="books" />
          <option value="…" />
        </select>
      </form>
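One way a learned query scheme template like the one on slide 10 might be turned into a search request is sketched below. This is an illustration under stated assumptions, not the ShinA implementation: the function name `build_query` is hypothetical, the elided option is left out, and the bare `selected` attribute is written as `selected="selected"` because a standard XML parser rejects attributes without values. No network request is made; the sketch only prepares the method, action, and encoded form body.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Template from the slide, adjusted to be well-formed XML (assumption:
# selected="selected" stands in for the bare "selected" attribute).
TEMPLATE = """
<form site="amazon.com">
  <name>searchform</name>
  <method>post</method>
  <action>www.amazon.com/exec/obidos/search-handle-form</action>
  <input type="text" name="field-keywords" size="15" />
  <input type="image" name="Go" />
  <select name="index">
    <option value="all products" selected="selected" />
    <option value="books" />
  </select>
</form>
"""


def build_query(template_xml, keywords):
    """Return (method, action, encoded_body) for a keyword search."""
    form = ET.fromstring(template_xml.strip())
    params = {}
    # Text inputs receive the user's keywords; image inputs are submit buttons.
    for field in form.findall("input"):
        if field.get("type") == "text":
            params[field.get("name")] = keywords
    # Each select contributes its pre-selected option, else its first option.
    for select in form.findall("select"):
        options = select.findall("option")
        if not options:
            continue
        chosen = next((o for o in options if o.get("selected") is not None), options[0])
        params[select.get("name")] = chosen.get("value")
    return form.findtext("method").upper(), form.findtext("action"), urlencode(params)
```

Keeping the template as data means a new store needs only a new template, not new query code.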

  11. Learning product descriptions
      • Table-type display of 5 different PDUs (PDU: Product Description Unit)
      • Task
         • recognize each PDU
         • recognize attributes within a PDU
         • learn rules to extract the attributes

  12. PDU Pattern Learning: Algorithm
      • First phase
         • remove irrelevant parts of the HTML source (header, advertisements, footer)
         • break the remaining HTML source into logical lines
      • Second phase
         • categorize each logical line
         • 9 different categories (PRICE, TITLE, IMAGE, URL_LINK, TTAG, LBTAG, etc.)
      • Third phase
         • find the most frequent pattern(s) for PDU(s) in the sequence of logical line categories
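The third phase can be sketched as a search for the most frequent substring in the string of category codes. The slides do not say how the frequent pattern is found, so everything here is an assumption: the function name `most_frequent_pattern`, the length bounds, the requirement that a candidate contain both a title code (`1`) and a price code (`0`), and the tie-break in favour of the longer pattern.

```python
from collections import Counter


def most_frequent_pattern(sequence, min_len=3, max_len=12, required="01"):
    """Return (pattern, count) for the most frequent qualifying substring.

    `sequence` is a string of one-character category codes, one per logical
    line. Only candidates containing every code in `required` are counted.
    Ties on frequency are broken in favour of the longer pattern.
    """
    counts = Counter()
    n = len(sequence)
    for length in range(min_len, min(max_len, n) + 1):
        for start in range(n - length + 1):
            candidate = sequence[start:start + length]
            if candidate in counts:
                continue  # already counted this substring
            if all(code in candidate for code in required):
                # str.count counts non-overlapping occurrences.
                counts[candidate] = sequence.count(candidate)
    if not counts:
        return None, 0
    best = max(counts, key=lambda p: (counts[p], len(p)))
    return best, counts[best]
```

For a result page whose code sequence repeats `244531950` three times with stray tag codes in between, this returns that pattern with a count of 3. The brute-force enumeration is quadratic in the number of logical lines, which is acceptable for a single result page but would need a smarter index for larger inputs.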

  13. PDU Pattern Learning: Example
      A fragment of the HTML source of the search result for the query "intelligent agent" submitted to the Amazon bookstore. Each logical line is followed by its category code.
      <img src="http://g-images.amazon.com/images/G/01/v9/130668.jpg" width="80" height="80" vspace="2" alt="">  --2
      </td>  --4
      <td>  --4
      <p>  --5
      <a href="http://www.amazon.com/book.asp?id=010101&book=130668">  --3
      Intelligent Internet Agents: Agent-Based Information Discovery on the Internet  --1
      </a>  --9
      <br>  --5
      $59.95  --0
      { 0: price; 1: title; 2: image; 3: link; 4: table tag; 5: line tag; 9: other tag }
      Extracted PDU pattern: 244531950
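The categorization step in this example can be reproduced with a small classifier over logical lines. This is a sketch, not the author's code: the slide lists only seven of the nine categories, so the tag sets `TABLE_TAGS` and `LINE_TAGS`, the regular expressions, and the rule that any non-price text counts as a title are assumptions chosen to match the codes shown on the slide.

```python
import re

# One complete HTML tag on a line, capturing an optional "/" and the tag name.
TAG_RE = re.compile(r"^<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>$")
# A currency symbol followed by a digit, e.g. "$59.95".
PRICE_RE = re.compile(r"[$€]\s*\d")

TABLE_TAGS = {"table", "tr", "td", "th"}  # assumed members of "table tag"
LINE_TAGS = {"p", "br"}                   # assumed members of "line tag"

# Logical lines of the fragment shown on the slide.
LINES = [
    '<img src="http://g-images.amazon.com/images/G/01/v9/130668.jpg" '
    'width="80" height="80" vspace="2" alt="">',
    "</td>",
    "<td>",
    "<p>",
    '<a href="http://www.amazon.com/book.asp?id=010101&book=130668">',
    "Intelligent Internet Agents: Agent-Based Information Discovery on the Internet",
    "</a>",
    "<br>",
    "$59.95",
]


def categorize(line):
    """Map one logical line to a category code from the slide's legend."""
    line = line.strip()
    tag = TAG_RE.match(line)
    if tag:
        closing = tag.group(1) == "/"
        name = tag.group(2).lower()
        if name == "img" and not closing:
            return "2"  # image
        if name == "a" and not closing:
            return "3"  # link
        if name in TABLE_TAGS:
            return "4"  # table tag
        if name in LINE_TAGS:
            return "5"  # line tag
        return "9"      # other tag, including the closing </a>
    # Plain text: a price if it looks like one, otherwise treated as a title.
    return "0" if PRICE_RE.search(line) else "1"


def pattern_of(lines):
    """Concatenate the category codes of a run of logical lines."""
    return "".join(categorize(line) for line in lines)
```

Applied to `LINES`, `pattern_of` yields the code string `244531950` given on the slide.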

  14. Simple Heuristics
      • Recognizing a title
         • contains at least one query word
         • text line at the title position of the pre-determined pattern
      • Recognizing a price
         • contains a currency symbol ($, €)
         • contains a currency token (EUR, SIT)
         • contains digit(s) with relevant delimiters (',' or '.')
      • Recognizing an image
         • unique image URL within the pattern
      • Attributes that heuristic rules can recognize
         • examples: ISBN numbers, dates, discount rates
      • Attributes that cannot be recognized
         • authors, review comments, recommendation status
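The title and price heuristics might look roughly like the sketch below. The slide does not say how the three price cues combine, so this version assumes a price needs digits plus either a currency symbol or a currency token; the function names and regular expressions are likewise illustrative.

```python
import re

CURRENCY_SYMBOLS = "$€"           # symbols named on the slide
CURRENCY_TOKENS = ("EUR", "SIT")  # tokens named on the slide
# Digits, optionally grouped by ',' or '.', e.g. "59.95" or "12.500,00".
AMOUNT_RE = re.compile(r"\d+(?:[.,]\d+)*")


def looks_like_price(text):
    """True if the text has an amount plus a currency symbol or token."""
    if not AMOUNT_RE.search(text):
        return False
    has_symbol = any(symbol in text for symbol in CURRENCY_SYMBOLS)
    has_token = any(re.search(rf"\b{token}\b", text) for token in CURRENCY_TOKENS)
    return has_symbol or has_token


def looks_like_title(text, query):
    """True if the text contains at least one whole word of the query."""
    words = set(re.findall(r"\w+", text.lower()))
    return any(word in words for word in re.findall(r"\w+", query.lower()))
```

Matching whole words keeps the title test cheap, at the cost of missing inflected forms: the query word "agent" does not match "Agents", so the example title is accepted only because "intelligent" matches.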

  15. Conclusion
      • Limitations
         • a query search box must exist
         • price information must exist
         • extracts only a few attributes (title, price, image, link)
      • Future work
         • more use of domain knowledge (ontologies)
         • extract other non-price attributes
         • use of XML-based wrappers
         • applications to other domains
