incorporating personal information • brent chun • sims296a-3
letizia • recommends web pages during browsing based on user profile • learns user profile using simple heuristics • passive observation, recommend on request • provides relative ordering of link interestingness • assumes recommendations “near” current page are more valuable than others • [diagram: user -> letizia (heuristics + user profile) -> recommendations]
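a minimal sketch of the kind of heuristic link scoring a letizia-style agent could use: the profile is a bag of weighted terms reinforced by pages the user actually reads, and links reachable from the current page are ranked by overlap with it. the profile representation, weights, and scoring function are illustrative assumptions, not letizia's actual code.

```python
# illustrative sketch only: profile = weighted bag of terms learned passively
from collections import Counter

def update_profile(profile, visited_page_terms, dwell_weight=1.0):
    """passively reinforce terms from a page the user actually read"""
    for term in visited_page_terms:
        profile[term] += dwell_weight

def score_link(profile, link_page_terms):
    """interestingness = share of profile weight covered by the page's terms"""
    total = sum(profile.values()) or 1.0
    return sum(profile[t] for t in set(link_page_terms)) / total

# usage: rank the links one hop from the current page, most interesting first
profile = Counter()
update_profile(profile, ["agents", "personalization", "browsing"])
links = {"paper.html": ["agents", "learning"], "scores.html": ["football"]}
ranked = sorted(links, key=lambda url: score_link(profile, links[url]), reverse=True)
```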
why is this useful? • tracks and learns user behavior, provides user “context” to the application (browsing) • completely passive: no work for the user • consequences? • useful when user doesn’t know where to go • no modifications to application: letizia interposes between the web and the browser • consequences?
consequences of passiveness • weak heuristics • example: user clicks through several uninteresting pages en route to an interesting one • example: user browses to an uninteresting page, then heads to nefeli for a coffee (long dwell time misread as interest) • example: hierarchies tend to get more hits near the root • cold start • no ability to fine tune the profile or express interest without visiting “appropriate” pages
open issues • how far can passive observation get you? • for what types of applications is passiveness sufficient? • profiles are maintained internally and used only by the application. some possibilities: • expose to the user (e.g. fine tune profile) ? • expose to other applications (e.g. reinforce belief)? • expose to other users/agents (e.g. collaborative filtering)? • expose to web server (e.g. cnn.com custom news)? • personalization vs. closed applications • others?
lifestreams • lifestream = time ordered stream of documents + filters + “agents” • filters provide views (like rdbms) called substreams • “agents” attach to the ui, streams, and documents • provide (condition, action) pairs • no machine learning • [diagram: a lifestream of documents ordered by date (Oct 19-21, 1998), with lifestream operations: new, clone, xfer, find, summ]
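a rough sketch of the model above: a time-ordered stream of documents, substreams produced by filter predicates, and agents as (condition, action) pairs attached to the stream. the class and field names are assumptions for illustration, not the actual lifestreams api.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Document:
    created: datetime
    text: str
    replied: bool = False

@dataclass
class Lifestream:
    docs: list = field(default_factory=list)
    agents: list = field(default_factory=list)   # (condition, action) pairs

    def add(self, doc):
        self.docs.append(doc)
        self.docs.sort(key=lambda d: d.created)   # keep the stream time ordered
        for condition, action in self.agents:     # fire any agent whose condition matches
            if condition(doc):
                action(doc)

    def substream(self, predicate):
        """a filter yields a view (substream), much like a query over an rdbms"""
        return [d for d in self.docs if predicate(d)]

# usage: the "all the email I haven't responded to" substream
stream = Lifestream()
stream.agents.append((lambda d: "urgent" in d.text, lambda d: print("alert:", d.text)))
stream.add(Document(datetime(1998, 10, 19), "urgent: project meeting"))
unanswered = stream.substream(lambda d: not d.replied)
```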
lifestreams assessment • a linear stream of documents is a poor metaphor if used alone • don’t tell me to abandon my hierarchies! • problems: managing complexity, large “working sets”, etc. • stated problem: too many apps, too many file xfers, too many format xlations, too many hierarchies • lifestreams don’t help with the first three and simply replace the fourth • most of the techniques used apply equally well to hierarchies • no machine learning = more work for the user
lifestreams assessment cont. • filters are nice, but how do you write one? • application-specific, but we already knew this • example: “all the email I haven’t responded to” • agents are nice, but how do you write one? • application-specific, but we already knew this • agents have limited applicability
open issues • new metaphors to manage complexity • easy ways to create filters/agents • allow “fuzzy” filters • lifestreams: filters need to be precisely specified • use machine learning + user feedback to relax this • associate actions with filters • tight integration of filters, agents w/ applications • apply ideas in lifestreams to hierarchies • others?
learning interface agents • add agents in the ui, delegate tasks to them • use machine learning to improve performance • learn user behavior, preferences • useful when: • 1) past behavior is a useful predictor of the future • 2) wide variety of behaviors amongst users • examples: • mail clerk: sort incoming messages into the right mailboxes • calendar manager: automatically schedule meeting times?
advantages • 1) less work for user and application writer • compare w/ other agent approaches • no user programming • significant a priori domain-specific and user knowledge not required • 2) adaptive behavior • agent learns user behavior, preferences over time • 3) user and agent build trust relationship gradually • claimed advantage: user constructs a model of how the agent makes decisions over time • real users: do the right thing!
machine learning • 1) learn by observation • observe user, record (situation,action) pairs • use “similar” past (situation,action) pairs to predict action for new situations • similarity = weighted difference of situation features • weights assigned based on feature/action correlations • algorithm • take n closest situations, compute scores for associated actions • recommend (or perform) action with highest score • use (situation,action) pairs to explain recommendations
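a minimal sketch of the nearest-neighbor scheme just described: record (situation, action) pairs, compute a weighted similarity over situation features, and score the actions of the n closest past situations. the feature names, weights, and mail-clerk example are illustrative assumptions.

```python
from collections import defaultdict

memory = []   # recorded (situation, action) pairs
weights = {"sender": 3.0, "subject_word": 1.0, "has_attachment": 0.5}  # stand-ins for feature/action correlations

def similarity(s1, s2):
    """weighted count of matching situation features"""
    return sum(w for f, w in weights.items() if f in s1 and s1[f] == s2.get(f))

def recommend(situation, n=3):
    """score the actions of the n closest past situations; return best + explanation"""
    nearest = sorted(memory, key=lambda pair: similarity(situation, pair[0]), reverse=True)[:n]
    scores = defaultdict(float)
    for past_situation, action in nearest:
        scores[action] += similarity(situation, past_situation)
    best = max(scores, key=scores.get) if scores else None
    return best, nearest   # the nearest (situation, action) pairs double as the explanation

# usage: a mail-clerk style agent suggesting a mailbox for a new message
memory.append(({"sender": "bnc", "subject_word": "agents"}, "sims296a-3"))
memory.append(({"sender": "mom", "subject_word": "dinner"}, "personal"))
suggested, because = recommend({"sender": "bnc", "subject_word": "clustering"})
```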
machine learning cont. • 2) learn by user feedback • indirect feedback (e.g. ignore recommendation) • direct feedback (e.g. don’t do this again) • database of priority ratings • 3) learn by being trained • train agent by giving examples of desired behavior • e.g. save all messages from bnc@cs.berkeley.edu in the sims296a-3 mailbox
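a continuation of the same sketch showing how feedback and explicit training could feed a database of priority ratings; the rating values and keys are assumptions, not the paper's actual scheme.

```python
memory = []        # (situation, action) pairs, as in the sketch above
priorities = {}    # (situation key, action) -> rating adjusted by feedback

def train(situation, action):
    """learn by being trained: a hand-given example starts with a high rating"""
    memory.append((situation, action))
    priorities[(frozenset(situation.items()), action)] = 2.0

def feedback(situation, action, accepted):
    """direct or indirect feedback nudges the rating up or down"""
    key = (frozenset(situation.items()), action)
    priorities[key] = priorities.get(key, 1.0) + (0.5 if accepted else -0.5)

# e.g. "save all messages from bnc@cs.berkeley.edu in the sims296a-3 mailbox"
train({"sender": "bnc@cs.berkeley.edu"}, "sims296a-3")
feedback({"sender": "bnc@cs.berkeley.edu"}, "sims296a-3", accepted=True)
```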
open issues • how far can black box treatment of apps get you? • example: mail clerk integration w/ ui requires access to application internals; what if this wasn’t the case? • tight integration with application user interface • access to internal events/state of significance • easy way to enable third-party developers to write personalization modules for applications? • chaining (situation,action) pairs to perform complex tasks • e.g. monitor ACM digital library -> look for interesting papers -> download them -> file them -> notify me via email -> print out. • others?
sonia • automatic construction of document clusters • categorization based on full-text comparisons • automatically classify new docs into existing clusters • multiple cluster hierarchies imposed on same data • examples: categorize search results into clusters, categorize files in user’s home directory • [diagram: create clusters (documents -> stemmer -> feature selector -> clusterer -> classes, e.g. cs298-1, is290-2, is296a-3, project, discussion); classify documents (documents -> classifier)]
creating clusters • stemmer: e.g. walking, walked, walk -> walk • feature selector • 1) remove stopwords, e.g. the, and, is, ... • 2) remove terms with freq < 3 or freq > 1000 • clusterer • 1) hierarchical agglomerative clustering • 2) iterative clustering technique • document similarity based on term overlap • cluster similarity = pairwise ave. of document similarities
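a compressed sketch of the pipeline above: stem terms, drop stopwords and rare/very frequent terms, then agglomeratively merge clusters using term overlap for document similarity and the pairwise average for cluster similarity. the toy stemmer and stopword list are simplified stand-ins; only the frequency thresholds come from the slide.

```python
from collections import Counter
from itertools import product

STOPWORDS = {"the", "and", "is", "a", "of"}

def stem(term):
    """toy stand-in for a real stemmer, e.g. walking/walked/walk -> walk"""
    for suffix in ("ing", "ed", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term

def select_features(docs, min_freq=3, max_freq=1000):
    """drop stopwords, then terms that are too rare or too frequent"""
    counts = Counter(stem(t) for doc in docs for t in doc if t not in STOPWORDS)
    return {t for t, c in counts.items() if min_freq <= c <= max_freq}

def doc_sim(d1, d2):
    """document similarity based on term overlap"""
    return len(d1 & d2) / (len(d1 | d2) or 1)

def cluster_sim(c1, c2):
    """cluster similarity = pairwise average of document similarities"""
    pairs = list(product(c1, c2))
    return sum(doc_sim(a, b) for a, b in pairs) / len(pairs)

def hac(doc_term_sets, k):
    """hierarchical agglomerative clustering: merge most similar clusters until k remain"""
    clusters = [[d] for d in doc_term_sets]
    while len(clusters) > k:
        i, j = max(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_sim(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    return clusters
```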
classifying documents • pachinko machine (bayesian classification) • uses 50 “most informative features” for each cluster • significant reduction in computational cost • claim: often sufficient for accurate classification • obvious trade-off between compute time and accuracy • exhaustive alternative: compare the new document with every document in every cluster before assigning; the extra compute time may not justify the gain in accuracy
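a naive-bayes style sketch of classifying a new document into existing clusters using only a small per-cluster feature set, in the spirit of the pachinko machine above; the frequency-based stand-in for “most informative features” and the smoothing are assumptions.

```python
import math
from collections import Counter

def top_features(cluster_docs, k=50):
    """stand-in for the 50 'most informative features': the k most frequent terms"""
    return {t for t, _ in Counter(t for d in cluster_docs for t in d).most_common(k)}

def log_likelihood(doc, cluster_docs, feats):
    """log probability of the doc under the cluster's counts over its selected features"""
    counts = Counter(t for d in cluster_docs for t in d if t in feats)
    denom = sum(counts.values()) + len(feats) + 1
    score = 0.0
    for t in doc:
        count = counts[t] if t in feats else 0       # terms outside the feature set get the floor
        score += math.log((count + 1) / denom)       # laplace smoothing
    return score

def classify(doc, clusters):
    """assign the new doc to the cluster with the highest likelihood"""
    feats = {name: top_features(docs) for name, docs in clusters.items()}
    return max(clusters, key=lambda name: log_likelihood(doc, clusters[name], feats[name]))

# usage: clusters is {name: [list of token lists]}, doc is a token list
clusters = {"agents": [["agents", "learning"], ["agents", "ui"]],
            "clustering": [["clusters", "terms"], ["clusters", "stemmer"]]}
print(classify(["agents", "learning", "mail"], clusters))
```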
why is this useful? • useful to help understand the contents of a large collection of documents (e.g. results from a database query) • useful to automatically construct multiple categorizations of the same data • e.g. a user may take the time to categorize personal files in a single hierarchy, but is unlikely to do this in multiple ways • saves time by automatically classifying documents • most applicable when the consequences of error are low
open issues • adding importance, confidence to the system • using document structure for weighting terms (e.g. terms in abstract vs. terms in text) • support for different document types (e.g. PS!) • others?