Internet Agents • Web search Agents • Information filtering agents • Off-line delivery agents • Notification agents • Service agents • Web site agents • Mobile agents
Information Search • Ways to Find Information • Browsing: Following hyperlinks that seem of interest • Searching: Sending a query to a search engine such as Lycos • Categories: Following existing categories such as those of Yahoo • Problems • Browsing takes a lot of time and effort to navigate. Can search be made more efficient? • Searching requires a query, but it is difficult to express the user’s intention accurately. • Search engines are not personalized
Search Engines: Web etiquette guidelines for spiders • Identify the name of the agent • Identify the user deploying the agent • Announce the agent by posting a message to the comp.infosystems.www.providers Usenet newsgroup • Announce the agent to the Webmasters of the servers the agent will visit • Provide additional information (using the HTTP Referer field) • Be accessible to fix problems the agent may cause • Design the agent so it does not consume a lot of resources (e.g. does not make successive hits on a single server, does not loop, runs at appointed times, etc.)
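A minimal sketch of how a spider might follow some of these guidelines, using only the Python standard library. The agent name, contact address, information URL, and the 5-second delay are hypothetical placeholders, not values from the slides; the `From` and `Referer` headers are one common way to identify the operator and point to further information.

```python
import time
import urllib.request

AGENT_NAME = "ExampleSpider/0.1"             # hypothetical agent name
OPERATOR_EMAIL = "operator@example.org"      # hypothetical contact address
INFO_URL = "http://example.org/spider.html"  # hypothetical page describing the agent

def build_request(url):
    """Build a request that identifies the agent and the person deploying it."""
    headers = {
        "User-Agent": AGENT_NAME,   # name of the agent
        "From": OPERATOR_EMAIL,     # user deploying the agent
        "Referer": INFO_URL,        # additional information about the agent
    }
    return urllib.request.Request(url, headers=headers)

class HostThrottle:
    """Track the last hit per host so the spider avoids successive hits."""

    def __init__(self, min_delay_seconds=5.0, clock=time.monotonic):
        self.min_delay = min_delay_seconds
        self.clock = clock
        self.last_hit = {}

    def wait_time(self, host):
        """Seconds the spider should still wait before hitting this host again."""
        last = self.last_hit.get(host)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (self.clock() - last))

    def record_hit(self, host):
        """Remember that the host was just visited."""
        self.last_hit[host] = self.clock()
```

A crawler loop would call `wait_time(host)`, sleep for that long, fetch with `urllib.request.urlopen(build_request(url))`, and then call `record_hit(host)`.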
Limitations of current search engines • Lack of personalization; this results in low precision of answers • Poor scalability: *the robot must visit not only new links but also old ones to keep them up to date; *the information gathering is centralized • Some solutions to scalability issues: • use specialized information brokers for building information indices • use massive replication and caching of popular information • distribute information gathering by placing gatherers on the provider’s site; information is then ready for analysis as soon as it comes in, but the provider must implement the software.
Information Filtering Agents • Information filtering agents find the content of interest to a user. • Information filtering agents could gather information from different sources • They could filter information based on the user’s personal interests • Filtering agents typically use a fixed number of information sources • Information filtering agents may use Information Retrieval techniques *Vector space models, where a document is represented as a vector of attributes *Tree structure, which represents a hierarchical view of a document
Filtering Agent Architecture • [Figure 3.4: Filtering based on word usage. Axis: word usage frequency. Marked regions: insignificant low-frequency words; insignificant high-frequency words.]
Functionality of WebMate • Learning user’s interests for information filtering • Multiple TF-IDF vectors representation • Incremental and adaptive Learning • Compile personal newspaper • Support for efficiently finding information • Automatic refinement using Trigger Pairs • Relevance feedback _____________________________ Chen, Sycara, “WebMate: A Personal Agent for Browsing and Searching”, Proceedings of the Second International Conference on Autonomous Agents, Minneapolis, MN, May 1998
Profile Representation • Multiple TF-IDF vectors representation • How many vectors are used? (Settable parameters; depends on # User’s interests, Computational complexity) • How many dimensions are used in a vector? (Computational complexity, typical lexicons in a domain)
Learning Algorithm • Preprocess: Parse HTML page, delete stop words, stemming • Extract TF-IDF vector of the current interesting document • If the number of vectors in the profile is less than predefined number, add the vector to the profile • Otherwise, calculate the cosine similarity between every two TF-IDF vectors in the profile • Combine the two vectors with the greatest similarity. • Sort the weights in the new vector in decreasing order and keep the highest several elements
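The profile update described above can be sketched as follows. This is an illustration under stated assumptions, not WebMate's actual code: vectors are plain dicts mapping terms to TF-IDF weights (preprocessing and TF-IDF extraction are taken as already done), the new document's vector is included among the candidates compared for similarity, and "combine" is taken to mean adding the weights term by term. The names `cosine`, `learn`, `max_vectors`, and `max_terms` are hypothetical; `max_vectors` is assumed to be at least 1.

```python
import itertools
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def learn(profile, doc_vector, max_vectors=3, max_terms=5):
    """Return an updated profile after the user marks a document as interesting."""
    candidates = [dict(v) for v in profile] + [dict(doc_vector)]
    # Profile not full yet: simply add the new vector.
    if len(profile) < max_vectors:
        return candidates
    # Otherwise find the most similar pair of vectors (assumption: the new
    # document's vector takes part in the comparison).
    i, j = max(
        itertools.combinations(range(len(candidates)), 2),
        key=lambda pair: cosine(candidates[pair[0]], candidates[pair[1]]),
    )
    # Combine the pair by adding weights, then keep the highest-weighted terms.
    merged = Counter(candidates[i])
    merged.update(candidates[j])
    trimmed = dict(merged.most_common(max_terms))
    rest = [v for k, v in enumerate(candidates) if k not in (i, j)]
    return rest + [trimmed]
```

Because two vectors are merged into one each time the profile is full, the profile size stays at `max_vectors` while still absorbing new interests.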
Compile Personal Newspaper • Automatically spider a list of URLs or construct a query from the profile • Calculate the similarity and check whether the similarity is greater than some threshold • Experiments: Accuracy in the top 10 is between 50% and 60%; accuracy in the top 20 is about 50%; accuracy overall is about 30%
Search Refinement • Trigger Pairs Based Automated Refinement • If a word S is significantly¹ correlated with another word T, then (S, T) is considered a “trigger pair”, with S being the trigger and T the triggered word. • Relevance Feedback • The context of the search keywords in the “relevant” pages is used to automatically refine the search • Parallel Search and Rerank • Similarity-based Query ___________________ ¹Significance is measured by mutual information (MI).
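The slide's own MI formula did not survive extraction. As an illustration only, the sketch below computes one standard form, the average mutual information between two binary events ("S occurs" and "T occurs within the window"), from co-occurrence counts. The function name and the four count parameters are hypothetical, and this is not necessarily the exact measure used by the authors.

```python
import math

def average_mutual_information(n_both, n_s_only, n_t_only, n_neither):
    """Average mutual information (in bits) between two binary events.

    The four counts are the cells of a 2x2 contingency table:
    S and T, S without T, T without S, neither.
    """
    total = n_both + n_s_only + n_t_only + n_neither
    cells = {
        (1, 1): n_both,
        (1, 0): n_s_only,
        (0, 1): n_t_only,
        (0, 0): n_neither,
    }
    # Marginal probabilities of S (present/absent) and T (present/absent).
    p_s = {1: (n_both + n_s_only) / total, 0: (n_t_only + n_neither) / total}
    p_t = {1: (n_both + n_t_only) / total, 0: (n_s_only + n_neither) / total}
    mi = 0.0
    for (s, t), count in cells.items():
        if count == 0:
            continue  # a zero cell contributes nothing to the sum
        p_joint = count / total
        mi += p_joint * math.log2(p_joint / (p_s[s] * p_t[t]))
    return mi
```

Independent words give a value near 0, while words that always appear together (or never together) give the highest values, which is why a threshold on this score can select trigger pairs.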
Examples of Trigger Pairs • Broadcast News Corpus: 140M words, Distance between S and T is 500 • Example 1: product << {maker, company, corporation, industry, incorporate, sale, computer, market, business, …} • Example 2: car << {motor, auto, model, maker, vehicle, for, buick, honda, inventory, assembly, chevrolet, sale, …} • Example 3: fare << {airline, maxsaver, carrier, discount, air, coach, flight, traveler, continental, unrestrict, ticket, …} • Example 4: music << {symphony, orchestra, composer, song, concert, tune, concerto, sound, musician, album, …}
Automatic Search Refinement • The user chooses the domain, and the system automatically expands the query using domain-specific triggers or an ontology • The user chooses the intended definition of the ambiguous words, and the system expands the query according to that definition • For a search with only one keyword, the top several triggers to the keyword are used to expand the search • For a search with more than 2 keywords, the intersection of the triggers to the keywords is used to expand the search
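The last two rules can be sketched as below. The trigger lists are the truncated examples from the trigger-pair slide; the function name `expand_query` and the `top_n` parameter are hypothetical. As a simplifying assumption, this sketch applies the intersection rule to any query with more than one keyword.

```python
# Trigger lists copied from the examples slide (truncated there with "…").
TRIGGERS = {
    "product": ["maker", "company", "corporation", "industry", "incorporate",
                "sale", "computer", "market", "business"],
    "car": ["motor", "auto", "model", "maker", "vehicle", "for", "buick",
            "honda", "inventory", "assembly", "chevrolet", "sale"],
}

def expand_query(keywords, triggers, top_n=3):
    """Expand a keyword query with trigger words."""
    if not keywords:
        return []
    if len(keywords) == 1:
        # One keyword: use its top several triggers.
        extra = triggers.get(keywords[0], [])[:top_n]
    else:
        # Several keywords: use only triggers shared by all of them,
        # ordered as in the first keyword's trigger list.
        common = set.intersection(*(set(triggers.get(k, [])) for k in keywords))
        extra = [t for t in triggers.get(keywords[0], []) if t in common][:top_n]
    return list(keywords) + [t for t in extra if t not in keywords]
```

For example, `expand_query(["product", "car"], TRIGGERS)` keeps only the triggers both words share, which narrows the query toward the sense the two keywords have in common.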
Relevance Feedback Algorithm • The context of the search keywords in the “relevant” pages is used to refine the search • Given a relevant page, the system looks for the context of the keywords, and calculates the frequency in order to use the top several frequent words to expand the query
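A minimal sketch of this step, assuming the relevant page has already been tokenized and that "context" means the words within a fixed window around each keyword occurrence. The window size, the stop-word set, and the function name `feedback_terms` are assumptions for illustration.

```python
from collections import Counter

def feedback_terms(page_tokens, keywords, window=2, top_n=3, stop_words=frozenset()):
    """Return the most frequent words found near the keywords in a relevant page."""
    keyword_set = set(keywords)
    counts = Counter()
    for i, token in enumerate(page_tokens):
        if token not in keyword_set:
            continue
        # Collect the neighbors on both sides of this keyword occurrence.
        start = max(0, i - window)
        end = min(len(page_tokens), i + window + 1)
        for neighbor in page_tokens[start:i] + page_tokens[i + 1:end]:
            if neighbor not in keyword_set and neighbor not in stop_words:
                counts[neighbor] += 1
    return [word for word, _ in counts.most_common(top_n)]
```

The returned words would then be appended to the original query before it is resubmitted.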
The Query Restart Problem • Agent A sends a query to Agent B. • Agent B can complete the query in time X, where • X = 1 with probability p. • X = c (c > 1) with probability 1 - p. Expectation: EX = p + (1 - p) c • If not done by time 1, should agent A abort and restart, or wait? • Can restarting reduce the expectation? The variance? Both? • Does it help to repeatedly restart k times? _______________________ Chalasani, Jha, Shehory, Sycara, “Query Restart Strategies for Web Agents”, Proceedings of Autonomous Agents 98, Minneapolis, MN, May 1998
A Simple Scenario: Single Restart • Strategy: restart just after time 1, if not done by then. • Let X_i = completion time of the i-th query, i = 1, 2. X_1 and X_2 are independent and identically distributed. • New completion time is Y:
$$Y = \begin{cases} 1 & \text{if } X_1 = 1, \\ 1 + X_2 & \text{if } X_1 = c. \end{cases}$$
• New expectation (using the independence of X_1 and X_2):
$$EY = p + (1 - p)(1 + EX_2) = 1 + p(1 - p) + (1 - p)^2 c$$
• If (and only if) c > 1 + 1/p, EY < EX_1!
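The expectations can be checked numerically with a short sketch. The function names are hypothetical. `expected_k_restarts` is an assumed generalization: the agent restarts just after time 1 up to k times and then waits for the final query to finish; it is not taken from the cited paper.

```python
def expected_no_restart(p, c):
    """EX = p + (1 - p) c: expected completion time without restarting."""
    return p + (1 - p) * c

def expected_single_restart(p, c):
    """EY = p + (1 - p)(1 + EX): restart once just after time 1 if not done."""
    return p + (1 - p) * (1 + expected_no_restart(p, c))

def expected_k_restarts(p, c, k):
    """Expected completion time when up to k restarts are allowed (assumed model)."""
    expectation = expected_no_restart(p, c)  # k = 0: just wait
    for _ in range(k):
        # With probability p the query is done at time 1; otherwise one time
        # unit is lost and the agent continues with one fewer restart left.
        expectation = p + (1 - p) * (1 + expectation)
    return expectation

def restart_helps(p, c):
    """True when a single restart lowers the expected completion time."""
    return expected_single_restart(p, c) < expected_no_restart(p, c)
```

With p = 0.5 the threshold is c > 1 + 1/p = 3: for c = 4 the expectation drops from 2.5 to 2.25, while for c = 2 it rises from 1.5 to 1.75.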
A Simple Scenario: k Restarts • [Figure not recovered; axis label: Number of Restarts k]
Off-Line Delivery Agents • Information filtering agents that deliver personalized information without the need for a direct Internet connection • Off-line Delivery Agent Attributes (Element: Description) • Environment: Internet, news feeds • Task skills: Information • Knowledge: Web, news, finance, sports, weather • Communication skills: HTTP, Meta tags, Desktop OS
Notification Agents A notification agent is one that notifies a user of significant events, i.e. a change in the state of information, e.g. • Content change in a particular Web page • Search engine additions for specific keyword queries • User-specified reminders for personal events (e.g. birthdays) • Notification Agent Attributes
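As an illustration of the first kind of event, a content change in a Web page can be detected by comparing a digest of the page with the digest stored on the previous visit. This is a sketch only: the class name `PageWatcher` is hypothetical, and fetching the page is left to the caller.

```python
import hashlib

class PageWatcher:
    """Remember a digest per URL and report when the content changes."""

    def __init__(self):
        self._digests = {}

    def check(self, url, content):
        """Return True if `content` (bytes) differs from the last seen version."""
        digest = hashlib.sha256(content).hexdigest()
        previous = self._digests.get(url)
        self._digests[url] = digest
        # The first visit only records a baseline; it is not reported as a change.
        return previous is not None and previous != digest
```

A notification agent would call `check` on a schedule and notify the user whenever it returns `True`.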
Other Service Agents • Announcement Agents • Business information monitoring agents • Classified ads agents: search database of ads • Direct mail agents: deliver direct mail advertising • Financial service agents: deliver e-mails with prices or other financial news • Food and wine agents • Job agents: virtual recruiters to find appropriate employees • Entertainment agents: find communities of interests similar to the user and recommend items, such as music, movies etc. • Shopping agents: comparison shopping for user-specified items • Site agents: virtual hosts at sites
Shopbots • Advantages: • Provide a unified interface to different stores, reducing the need to navigate and deal with different interfaces • Find the best price and availability of a product • Challenges: • Virtual stores block agents since they do not want to be compared on price and availability alone • User’s trust in a shopbot’s ability to notice sales and promotions • Solutions: • Cooperative vendor/agent model • Vendor form learning agent
Collaborative Filtering A collaborative filtering system makes recommendations based on the preferences of similar users. People: Yenta, Referral Web Products: Firefly, Tunes, Syskill & Webert Readings: Wisewire, Phoaks
Content vs. Collaboration • Content-based retrieval returns documents that are similar to a query (search) or a user profile (preference) • Collaborative recommendation retrieves documents liked by others with similar profiles
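The collaborative side of this contrast can be sketched as a small user-based recommender. This is an illustration under assumptions, not the method of any system named in these slides: profiles are dicts of item ratings, similarity is the cosine between rating vectors, and the names `similarity` and `recommend` are hypothetical.

```python
import math

def similarity(a, b):
    """Cosine similarity between two users' rating dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)

def recommend(target, ratings, top_n=3):
    """Recommend items liked by users whose profiles resemble the target's."""
    scores = {}
    for user, their_ratings in ratings.items():
        if user == target:
            continue
        sim = similarity(ratings[target], their_ratings)
        if sim <= 0.0:
            continue  # ignore users with nothing in common
        for item, rating in their_ratings.items():
            if item not in ratings[target]:
                # Weight each unseen item by how similar its rater is.
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]
```

Note that no document content is inspected: the recommendation depends only on overlap between user profiles, which is what separates it from content-based retrieval.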
Early Apps • GroupLens (1994): Filtered newsgroups. The news client displays predicted scores, and the user rates articles after reading. • Phoaks: Recommended web pages. Uses frequency-of-mention data within Usenet newsgroups to rate URLs.
Getting the Data • Explicit: Firefly (rate -> match -> recommend) • Implicit: Amazon (purchase -> match -> recommend) • Priming the Pump: Lifestyle Finder uses demographic data to assign users to market research categories • Over the Shoulder: Letizia uses observed browsing behavior & heuristics to recommend links
Problems in Collaborative Filtering • Incentives & Startup • Need a critical mass of users/recommenders to make meaningful predictions • Need mechanisms to maintain participation • Reliability • Spoofing: will content providers inflate their ratings? • Technical problems with clustering & similarity measures • Privacy • Once you share your profile, who else may want it?
Synthetic Agents (e.g. Julia) Julia is a chatterbot that tries to convince users of its humanlike behavior: • Repeating user’s input in questions • Admitting ignorance • Changing the topic of conversation • Using conversational statements • Using humorous statements • Providing excerpts from Usenet News • Simulating typing, mimicking a user’s imperfect performance Possible applications of chatterbots: • Visiting on-line chat rooms on topics of interest to your company • Initiating interesting conversations in chat rooms • Presenting comparison ads against your rivals • Answering information requests about your products • Serving as a site guide for finding information • Serving as a product guide on your site (e.g. demonstrate an automobile)
Intranets Business applications of intranets: • Effective communication medium for enterprises • Create virtual communities within an enterprise • Automating order tracking and transaction processing • Marketing support automation • Customer service and knowledge sharing among customers • Internal help desk to provide guidance for corporate processes and resources • Human resources support
Drawbacks and extended features Drawbacks include: ·Separate notification for each user interest, cluttering mailbox ·Do not incorporate user model for tracking user’s actions upon information delivery Advanced Features ·Recommend an agent for each new user interest topic ·Modify an existing agent, based on user’s use of agent recommended information (e.g. specialize an information agent) ·Remove an agent that the user does not use ·Temporally activate an agent based on user interest and disinterest in the agent’s recommendation
Collaboration Agents The software runs over a network and enables a team to work together and share information. It assists groups in: ·Group scheduling ·Discussion groups ·Resource tracking ·Document Management It could do some simple tasks: ·Save and re-execute shareable queries that search groupware data bases ·Perform a script under pre-specified conditions ·Perform a script according to pre-specified schedule
Example: Lotus Notes Agent definition ·Agent name with optional comment ·When the agent should run: *manually *if new mail has arrived *if documents have been created, modified, or deleted *at scheduled times, e.g. hourly, daily, etc. ·What documents should the agent act on? *all documents *all new and modified documents since the last time the agent ran *all unread documents *selected documents ·What should the agent do? *The user can enter a LotusScript program that can examine named fields and apply simple conditional logic.
Process Automation Agents The goal is to use agents to automate workflow in business applications. Differences between traditional workflow and agent-based workflow: ·Traditional workflow is centralized; agents offer a distributed infrastructure ·Traditional workflow works only in structured environments; agents could manage workflow during execution ·Traditional workflow pre-specifies paths to take for exception handling; agents can negotiate new tasks and resources dynamically
Database Agents Agents that provide Enterprise-based support ·Run scheduled database analyses in the background ·Exception reporting for operations management ·Notify of information changes in a user-specified database object
Database Agents: Enterprise data delivery system • [Figure: architecture diagram. Recoverable labels: Desktop, DSS Agent, OLAP Server, VLDB Drivers, Server; databases: Oracle, Informix, SQL Server, ...]