Parallelizing Random Walk with Restart for Large-Scale Query Recommendation

Parallelizing Random Walk with Restart for Large-Scale Query Recommendation • Meng-Fen Chiang, Tsung-Wei Wang and Wen-Chih Peng • Department of Computer Science, National Chiao Tung University (R.O.C.)

Outline • Introduction • Related Work • Problem Definition • Parallel RWR • Temporal following pattern mining • Recommendation graph construction • Random walk with restart for multiple queries • Experimental Results • Conclusion

Introduction • Yahoo! Asia Knowledge Plus (AKP) Question Answer

Introduction (contd.) • User access log • Consider a QA pair as an item • A sequence of items clicked by a user • Typically, what a user looks for during a short period shares certain topics • Within 4 min, 18 sec.: “Upload photos to Facebook”

Introduction (contd.) • Random Walk with Restart (RWR) • Compute relevance scores of a set of nodes for a query node [Figure: example graph with nodes 1 to 12 labeled with relevance scores]
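For reference, the relevance scores in RWR are commonly defined as the fixed point of the recurrence below. The slides do not show the formula, so the notation is an assumption: $\tilde{W}$ is a column-normalized adjacency matrix, $\mathbf{e}_q$ is the indicator vector of the query node $q$, and $c$ is the restart probability.

```latex
\mathbf{r}_q = (1 - c)\,\tilde{W}\,\mathbf{r}_q + c\,\mathbf{e}_q
```

Entry $i$ of $\mathbf{r}_q$ is the steady-state probability that a walker who restarts at $q$ with probability $c$ at every step is found at node $i$.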

Related Work • Random Walk with Restart (RWR) • Off-line mode • Pre-compute required information off-line • Pros : fast on-line recommendation for a query • Cons : prohibitive storage consumption • On-line mode • Compute recommendation for a query on-line • Pros : less storage consumption • Cons : longer response time • Fast RWR • Less storage consumption • Fast on-line response time for a query

Related Work (contd.) • Scalable recommendation • SmartMiner • Identify user sessions • Mine frequent navigation patterns • Personalized community recommendation • 312 K active users, 109 K popular communities • Training time ~ 14 mins (200 nodes) • Personalized news recommendation • Handle streaming content • No explicit runtime analysis of off-line training and on-line recommendation

Problem Definition • Goal • Given user click logs and a query item I • Recommend relevant items w.r.t. I • Requirements • Effectiveness • Mine frequent navigation patterns from click logs • Scalability • Efficiently manage large-scale click logs within a few hours • Parallelization of RWR • Parallelization of RWR for multiple query nodes

Outline • Introduction • Related Work • Problem Definition • A framework for scalable recommendation • Temporal following pattern mining • Recommendation graph construction • Random walk with restart for multiple queries • Experimental Results • Conclusion

System Architecture • Input • User access logs • Parameters: window size, bin size • Query items: Item 1, Item 2, . . . • Off-line computation • Temporal Following Pattern Mining • Recommendation Graph Construction • Random Walk with Restart • Storage • Item ID : <Item List> . . .

Mining Temporal Following Patterns in Parallel [Figure: system architecture diagram, repeated]

Temporal Following Relation • Frequent QA browsing behaviors of users within a pre-defined time window • E.g., window size = 150 sec. • User click stream: Item 1 (t = 0), Item 2 (t = 30), Item 3 (t = 70), Item 4 (t = 160) • Temporal following relations: <Item 1, Item 2> : dt = 30 • <Item 1, Item 3> : dt = 70 • <Item 1, Item 4> : dt = 160 • . . .
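The pair extraction in this example can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `temporal_following_pairs` is hypothetical, and it assumes that a pair is kept only when its time gap dt does not exceed the window size, so <Item 1, Item 4> (dt = 160) is dropped for a 150-sec. window.

```python
def temporal_following_pairs(clicks, window):
    """Return (leader, follower, dt) for every later click within `window` seconds.

    `clicks` is one user's click stream as (item, timestamp_sec) tuples,
    sorted by timestamp. Dropping pairs with dt > window is an assumption.
    """
    pairs = []
    for i, (item_i, t_i) in enumerate(clicks):
        for item_j, t_j in clicks[i + 1:]:
            dt = t_j - t_i
            if dt > window:
                break  # stream is sorted, so later clicks are even further away
            if item_j != item_i:
                pairs.append((item_i, item_j, dt))
    return pairs


# Click stream from the slide: timestamps 0, 30, 70 and 160 sec.
stream = [("Item 1", 0), ("Item 2", 30), ("Item 3", 70), ("Item 4", 160)]
pairs = temporal_following_pairs(stream, window=150)
```

Besides the pairs led by Item 1, the sketch also emits the pairs led by Item 2 and Item 3, which the slide abbreviates with ". . .".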

Temporal Following Pattern Mining • Input: user click logs, parameters • Mapper 1 . . . Mapper N: emit temporal following pairs for each item • Temporal following relations: <Itemi, Itemj:cntij> • Reducer 1 . . . Reducer N: aggregate temporal following relations for each item • Temporal following patterns: <Itemi, <Itemj:cntij, …, Itemz:cntiz>>
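The map and reduce steps can be imitated in a single process to show the data flow. This is a sketch under stated assumptions, not the authors' MapReduce job: `map_session`, `reduce_patterns` and the toy sessions are hypothetical, and the shuffle phase is replaced by a plain Python list.

```python
from collections import defaultdict


def map_session(clicks, window):
    """Mapper: emit (leader, (follower, 1)) for each pair within the window."""
    for i, (item_i, t_i) in enumerate(clicks):
        for item_j, t_j in clicks[i + 1:]:
            if t_j - t_i > window:
                break  # clicks are sorted by time
            if item_j != item_i:
                yield item_i, (item_j, 1)


def reduce_patterns(emitted):
    """Reducer: sum the counts of each follower, grouped by leader item."""
    patterns = defaultdict(lambda: defaultdict(int))
    for leader, (follower, cnt) in emitted:
        patterns[leader][follower] += cnt
    return {leader: dict(followers) for leader, followers in patterns.items()}


# Three toy sessions of (item, timestamp_sec); the data is made up.
sessions = [
    [("a", 0), ("b", 20), ("c", 50)],
    [("a", 0), ("b", 10)],
    [("b", 0), ("c", 200)],
]
emitted = [kv for s in sessions for kv in map_session(s, window=150)]
patterns = reduce_patterns(emitted)
```

In a real cluster each mapper would process a shard of the click logs and the framework would route all records with the same leader item to one reducer.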

Recommendation Graph Construction [Figure: system architecture diagram, repeated]

Recommendation Graph Construction • Goal • Transform discovered temporal following patterns to a recommendation graph • E.g., temporal following patterns: <Item 1, <Item 2:cnt12, Item 3:cnt13>> • <Item 4, <Item 3:cnt43>> • Recommendation graph: edges n1 → n2 (cnt12), n1 → n3 (cnt13), n4 → n3 (cnt43)
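One possible way to turn such patterns into a weighted graph is sketched below. The slides only state that counts become edge weights; normalizing each node's outgoing counts into transition probabilities is an assumption made here so that the graph can feed a random walk, and the function name and the counts in the example are hypothetical.

```python
def build_graph(patterns):
    """Build an adjacency map {src: {dst: weight}} from temporal following patterns.

    Weights are the follower counts normalized per source node (an assumption).
    Items that only appear as followers get an empty adjacency list.
    """
    graph = {}
    for src, followers in patterns.items():
        total = sum(followers.values())
        graph[src] = {dst: cnt / total for dst, cnt in followers.items()}
    for followers in patterns.values():
        for dst in followers:
            graph.setdefault(dst, {})
    return graph


# Same shape as the slide's example, with made-up counts.
patterns = {"n1": {"n2": 3, "n3": 1}, "n4": {"n3": 2}}
graph = build_graph(patterns)
```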

Parallelizing Random Walk with Restart [Figure: system architecture diagram, repeated]

Parallelizing Random Walk with Restart • With single query [Figure: example graph with nodes 1 to 12, shown without and with relevance scores for one query node]

Parallelizing RWR With Single Query • Input: user click logs, q (an item), parameters • Initialization: Machine 1 . . . Machine N: set initial score for q • RWR: Machine 1 . . . Machine N: calculate relevance score for each item • Convergence: Machine 1 . . . Machine N: calculate difference of relevance score vectors • Converged? No: repeat the RWR step • Yes: stop
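The initialize, iterate, check-convergence loop can be sketched on a single machine as follows. This is an illustrative sketch rather than the authors' distributed implementation: the function `rwr`, the restart probability 0.15, the tolerance and the toy graph are assumptions, edge weights are assumed to be normalized per source node, and sending the mass of nodes without outgoing edges back to the query is one convention among several.

```python
def rwr(graph, query, restart=0.15, tol=1e-10, max_iter=200):
    """Random walk with restart for one query node on {src: {dst: weight}}."""
    # Initialization: all probability mass starts on the query node.
    scores = {n: 0.0 for n in graph}
    scores[query] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in graph}
        for src, edges in graph.items():
            if not edges:
                # Dangling node: return its mass to the query (assumption).
                nxt[query] += (1 - restart) * scores[src]
                continue
            for dst, w in edges.items():
                nxt[dst] += (1 - restart) * scores[src] * w
        nxt[query] += restart  # total mass is 1, so the restart share is `restart`
        # Convergence check: L1 difference between successive score vectors.
        diff = sum(abs(nxt[n] - scores[n]) for n in graph)
        scores = nxt
        if diff < tol:
            break
    return scores


toy = {"a": {"b": 0.5, "c": 0.5}, "b": {"c": 1.0}, "c": {"a": 1.0}}
scores = rwr(toy, "a")
```

In the distributed setting described on the slide, each machine would update the scores of its share of the items and the difference would be aggregated across machines before deciding whether to iterate again.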

Parallelizing Random Walk with Restart • With multiple queries [Figure: two copies of the example graph with nodes 1 to 12, each labeled with relevance scores for a different query node]

Parallelizing RWR With Multiple Queries • Input: user click logs, Q (items), parameters • Initialization: Machine 1 . . . Machine N: set initial score for Q • RWR: Mapper 1 . . . Mapper N: calculate diffusion score for each item w.r.t. each q • Reducer 1 . . . Reducer N: sum up diffusion scores for each item w.r.t. q • Repeat until maximum iteration • Record format: <Itemi, <q1:rs1i, …, qz:rs1z> <adjacent list>>

Parallelizing RWR With Multiple Queries • Diffusion score for each item w.r.t. q • Sum up diffusion scores for each item w.r.t. q
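The diffusion and summation steps can be sketched as an in-process map/reduce loop in which every item carries one score per query. The slide's formulas are not recoverable from the transcript, so this follows the standard RWR update and is an assumption, as are the names `rwr_map`, `rwr_reduce` and `rwr_multi`, the restart value and the toy graph. For brevity, mass that reaches a node without outgoing edges is simply dropped in this sketch.

```python
from collections import defaultdict


def rwr_map(scores, edges, restart):
    """Mapper: diffuse an item's per-query scores to each of its neighbors."""
    for dst, w in edges.items():
        yield dst, {q: (1 - restart) * s * w for q, s in scores.items()}


def rwr_reduce(item, contributions, queries, restart):
    """Reducer: sum the diffusion scores an item received, per query."""
    total = defaultdict(float)
    for contrib in contributions:
        for q, s in contrib.items():
            total[q] += s
    if item in queries:
        total[item] += restart  # restart mass returns to the query's own node
    return dict(total)


def rwr_multi(graph, queries, restart=0.15, iterations=100):
    """Run a fixed number of RWR iterations for all queries at once."""
    state = {n: ({n: 1.0} if n in queries else {}) for n in graph}
    for _ in range(iterations):
        shuffled = defaultdict(list)  # stands in for the shuffle phase
        for item, edges in graph.items():
            for dst, contrib in rwr_map(state[item], edges, restart):
                shuffled[dst].append(contrib)
        state = {n: rwr_reduce(n, shuffled[n], queries, restart) for n in graph}
    return state


toy = {"a": {"b": 0.5, "c": 0.5}, "b": {"c": 1.0}, "c": {"a": 1.0}}
state = rwr_multi(toy, queries={"a", "c"})
```

Here `state[item][q]` is the relevance of `item` with respect to query `q`, which mirrors the per-item record of query scores plus adjacency list shown on the flow slide.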

Experimental Setup • Yahoo! Asia Knowledge Plus (AKP) • Duration : 1-week in July, 2009 • #clicks : 90 M • #items : 4 M • #users : 2 M • Performance evaluation • Quality study • Scalability study • Case study

Quality Study • User access logs • Train 80% • Test 20% • Ground truth • For each item I clicked by user U • The set of items clicked by U after I within T sec. • Measure the similarity with historical user click logs • Item-precision • Item-recall
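The slides do not spell out how item-precision and item-recall are computed. A minimal sketch, assuming the usual set-based definitions (the fraction of recommended items found in the ground truth, and the fraction of ground-truth items that were recommended), could look like this; the function name and the example lists are hypothetical.

```python
def item_precision_recall(recommended, ground_truth):
    """Set-based precision and recall of a recommendation list (assumed definition)."""
    rec, truth = set(recommended), set(ground_truth)
    hits = len(rec & truth)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Items recommended for a test item vs. items the user actually clicked
# within T sec. afterwards; both lists are made up.
precision, recall = item_precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
```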

Quality Study (contd.) • Top-k hot items in the category of test item (HC) • Temporal following pattern (TFP) • RWR based on temporal following pattern (RWRTFP) • Higher precision & recall

Scalability Study • Temporal following pattern (TFP) • 4.1M items • 40 sec. • RWR based on temporal following pattern (RWRTFP) • #sizes of input data • #computing nodes

Scalability Study (contd.) • Computational cost is significantly reduced as the number of machines increases • The more queries, the more efficient the computation • 0.74 sec. (2K queries) → 0.49 sec. (10K queries)

Case Study • Query Item • “What can I do if I do not have Word?”

Conclusion • Proposed a parallel RWR for multiple-query recommendation • Parallelize mining frequent navigation behavior • Parallelize RWR • Compute RWR for multiple queries in parallel • The recommender system • General • Content-agnostic

Q & A

Temporal Following Pattern Mining User click logs Parameters Mapper 1 : Emit temporal following pairs for each item Mapper N : Emit temporal following pairs for each item . . . Temporal Following Relations <Itemi, Itemj:dtij> Reducer 1 : Aggregate temporal following relation for each item Reducer N : Aggregate temporal following relation for each item . . . Temporal Following Patterns <Itemi, <Itemj:dtij, …, Itemz:dtiz>>
