Real-time recommendations for retail: Architecture, algorithms, and design

Juliet Hougland and Jonathan Natkins

Who Are We? • Jonathan Natkins • Field Engineer at WibiData • Before that, Cloudera Software Engineer • Before that, Vertica Software/Field Engineer • Juliet Hougland • Data Scientist, previously at WibiData • MS in Applied Math • BA in Math-Physics

Recommendations in Retail • Personalized versus Non-Personalized

Recommender Contexts • Taste History • Based on everything you know about a user • Interests over months/years • Current Taste • Based on a user’s immediate history • Interests over minutes/hours • Ephemeral • Extreme version of current taste • For example, location • Demographic* • Similar to taste history, but less subjective • Geographic region, age bracket, etc.

Why Does Real-Time Matter? Relevancy

I am a Special Snowflake Natty

Requirements for a Real-Time System • General System Requirements • Handle millions of customers/users • Support collection and storage of complex data • Static and event-series • Real-Time System Requirements • Quickly retrieve subsets of data for a single user • Aggregate/derive new, first-class data per user

What is Kiji? • The Kiji project is a modular, open-source framework for building real-time applications that collect, store, and analyze entity-centric data • kiji.org • github.com/kijiproject

Three Challenges • Developing models for use in real-time • Scoring models in real-time • Deploying models into a production environment

How Can We Make Real-Time Models? • Population interests change slowly • Models don’t need to be retrained frequently • Individual interests change quickly • Application of a model should be fast

A Common Workflow • Train a model over the entire dataset • Save fitted model parameters to a file or another table • Access the model parameters when generating new recommendations based on new data This is EXPENSIVE
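The three workflow steps above can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the popularity-count "model", the JSON file format, the function names, and the user "bunsen" are all assumptions made for the example.

```python
import json
import os
import tempfile
from collections import Counter


def train(events):
    """Batch step: count purchases per item over the entire dataset."""
    return dict(Counter(item for _user, item in events))


def save_params(params, path):
    """Persist the fitted parameters to a file."""
    with open(path, "w") as f:
        json.dump(params, f)


def load_params(path):
    """Read the fitted parameters back when it is time to recommend."""
    with open(path) as f:
        return json.load(f)


def recommend(params, already_bought, n=2):
    """Scoring step: most popular items the user has not bought yet."""
    candidates = [item for item in params if item not in already_bought]
    # Sort by descending count, then by name so that ties are deterministic.
    return sorted(candidates, key=lambda item: (-params[item], item))[:n]


# Hypothetical (user, item) purchase events.
events = [
    ("beaker", "banana slicer"),
    ("bunsen", "banana slicer"),
    ("bunsen", "pineapple slicer"),
    ("beaker", "lab coat"),
]
path = os.path.join(tempfile.mkdtemp(), "model.json")
save_params(train(events), path)
print(recommend(load_params(path), already_bought={"banana slicer"}))
```

Only `train` touches the whole dataset; `recommend` needs just the saved parameters and one user's history.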

Developing Models • KijiExpress • Scala interface for interacting with Kiji data • Uses Scalding for designing complex dataflows • Model Lifecycle • Allows analysts and data scientists to break apart a model into phases

Scoring Models in Real-Time • Batch isn’t real-time • [Figure: Number of Users versus Number of Interactions. A lot of users with few interactions; a few users with many interactions.]

Fresheners Compute Lazily • Client reads a column • Get from HBase • Freshness Policy: is the stored value fresh? • Yes: return to client • No: run the Scorer, return to client, and write back for next time • [Diagram components: Client, KijiScoring Server, HBase]
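The lazy read path in this diagram can be sketched roughly as below. This is not the KijiScoring API: the `Freshener` class, the age-based freshness rule, the plain dict standing in for HBase, and the injectable clock are all assumptions chosen to keep the example self-contained.

```python
import time


class Freshener:
    """Hypothetical lazy scorer: recompute only when the stored value is stale."""

    def __init__(self, store, scorer, max_age_seconds, clock=time.time):
        self.store = store            # stand-in for the HBase-backed column
        self.scorer = scorer          # function that computes a fresh value
        self.max_age = max_age_seconds
        self.clock = clock

    def is_fresh(self, written_at):
        """Freshness policy (assumed): a value is fresh if it is recent enough."""
        return self.clock() - written_at <= self.max_age

    def read(self, key):
        cached = self.store.get(key)
        if cached is not None and self.is_fresh(cached[1]):
            return cached[0]                      # yes: return to client
        value = self.scorer(key)                  # no: run the scorer
        self.store[key] = (value, self.clock())   # write back for next time
        return value


# Usage with a fake clock so the behaviour is deterministic.
now = [0.0]
calls = []


def scorer(user):
    calls.append(user)
    return f"recs-for-{user}-v{len(calls)}"


freshener = Freshener({}, scorer, max_age_seconds=60, clock=lambda: now[0])
first = freshener.read("beaker")    # nothing stored yet: scorer runs
second = freshener.read("beaker")   # still fresh: served from the store
now[0] = 120.0
third = freshener.read("beaker")    # too old: scorer runs again
```

Work is done only for users who actually show up, which matters when most users have few interactions.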

Kiji Application Stack

Deployment Challenges

Kiji Model Repository • Link between application and models • Stores Freshener metadata • FreshnessPolicy, Scorer, attached column • Location of trained model • Stores Scorer code • Code repository makes model scoring code available to the application from a central location • New models can be deployed to the Model Repository and made immediately available to the application

Kiji Model Repository

Retail Recommendation

Types of Recommenders • Recommendation Algorithms • Collaborative Filtering Methods • Memory Based • Model Based • Content Based Methods

Content-Based Recommenders • Build models around entities using features that we think reflect inherent characteristics • Example features: Orange-Nosed, Lab Assistant, Meeps a lot

Content-Based Recommenders • Example features: safer, faster, knife

Pandora: Content-Based Expertly-Characterized Music

Collaborative Filtering • Represent user-item affinities as a sparse matrix • Users ≈ Rows (e.g., Beaker) • Items ≈ Columns (e.g., Banana Slicer, Pineapple Slicer)
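One way to hold such a sparse matrix is to store only the observed cells, indexed both by row and by column. The sketch below assumes explicit numeric ratings; the rating values and the second user "bunsen" are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (user, item, rating) observations; every other cell is empty.
ratings = [
    ("beaker", "banana slicer", 5.0),
    ("beaker", "pineapple slicer", 3.0),
    ("bunsen", "banana slicer", 4.0),
]

by_user = defaultdict(dict)   # row view: user -> {item: rating}
by_item = defaultdict(dict)   # column view: item -> {user: rating}
for user, item, rating in ratings:
    by_user[user][item] = rating
    by_item[item][user] = rating


def affinity(user, item):
    """Return the stored rating, or None when the cell is empty."""
    return by_user.get(user, {}).get(item)


print(affinity("beaker", "banana slicer"))     # an observed cell
print(affinity("bunsen", "pineapple slicer"))  # an empty cell
```

Keeping both views makes it cheap to ask "what did this user rate?" and "who rated this item?", which are the two lookups similarity computations rely on.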

Aspirational Ratings I put in my queue… I actually watch

Collaborative Filtering: How It Works • Similar Users • Similar Products • Simple aggregate predictors

Similar Entities • What do we mean by similar? • Jaccard Index: a measure of set similarity • Cosine Similarity: the cosine of the angle between two vectors • Pearson Correlation: statistical measure, similar to cosine • Naively, we could compare every entity to every other… • …but that would not scale well with increasing numbers of entities
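The three measures named above can be written in a few lines each. This is a plain-Python sketch using their standard definitions; the function names and the choice to return 0.0 for empty or zero-length inputs are assumptions of the example.

```python
import math


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([p - mean_x for p in x], [q - mean_y for q in y])


print(jaccard({"banana slicer", "lab coat"}, {"lab coat", "pineapple slicer"}))
print(cosine([5.0, 3.0, 0.0], [4.0, 0.0, 1.0]))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Writing Pearson as mean-centered cosine makes the "similar to cosine" remark concrete: the only difference is subtracting each vector's mean first.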

Building the Similarity Matrix

Collaborative Filtering: Is This Useful? • Problem: Too much data! • Tracking user preferences and all their events generates huge amounts of data • Problem: Too little data! • Dimensions of user-space and item-space are usually very large • More variables make it more difficult to generate user preferences • Problem: Cold start • If you don’t know anything about a user, what should you recommend? • Problem: More ratings means slower computations • Identifying neighborhoods of entities is expensive

Collaborative Filtering: Why Is It Useful? • Because it works • Content-agnostic • All that matters is co-occurrence of events

Amazon: Item-Item Collaborative Filtering • Used for personalized recommendations • Fill screen real estate with related items • Produces specific, but non-creepy recommendations > Linden, G.; Smith, B.; York, J., "Amazon.com recommendations: item-to-item collaborative filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan/Feb 2003

Item-Item Collaborative Filtering • Beaker buys a banana slicer • Then: • Generate list of candidate items to predict ratings for • Predict ratings for candidate items • Select Top-N items
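The three steps above (candidates, predicted ratings, top-N) might look like the sketch below. The slides do not say which predictor is used; a similarity-weighted average of the user's own ratings is assumed here, and the similarity values and extra item names are invented for the example.

```python
def recommend(user_ratings, item_sims, n=2):
    """Item-item recommendation in three steps (assumed predictor: weighted average)."""
    # 1. Candidates: items similar to something the user rated, minus rated items.
    candidates = {
        neighbour
        for item in user_ratings
        for neighbour in item_sims.get(item, {})
    } - set(user_ratings)

    # 2. Predict a rating for each candidate from the user's own ratings.
    scores = {}
    for candidate in candidates:
        numerator = denominator = 0.0
        for item, rating in user_ratings.items():
            sim = item_sims.get(item, {}).get(candidate, 0.0)
            numerator += sim * rating
            denominator += abs(sim)
        if denominator > 0:
            scores[candidate] = numerator / denominator

    # 3. Select the top-N candidates by predicted rating (name breaks ties).
    return sorted(scores, key=lambda c: (-scores[c], c))[:n]


# Hypothetical precomputed similarities: item -> {similar item: similarity}.
item_sims = {
    "banana slicer": {"pineapple slicer": 0.9, "lab coat": 0.1},
    "safety goggles": {"lab coat": 0.8},
}
user_ratings = {"banana slicer": 5.0, "safety goggles": 2.0}
print(recommend(user_ratings, item_sims))
```

The similarity table is built offline; at request time the work is proportional to the number of items this one user has rated.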

Accessing External Data • KeyValueStore API enables external data access when applying a model • External data might be… • Trained model parameters • Hierarchical/Taxonomic data • Geo-lookup • Store external data flexibly • Text files, sequence files, Kiji tables, etc. • Data access is decoupled from use during execution • If the data doesn’t fit in memory, put it in a table
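The idea that data access is decoupled from use can be illustrated with a small lookup interface. This is not Kiji's KeyValueStore API; the `DictStore` and `TextFileStore` classes, the tab-separated file format, and the category data are hypothetical stand-ins.

```python
import os
import tempfile


class DictStore:
    """In-memory store, suitable when the data fits in memory."""

    def __init__(self, data):
        self._data = dict(data)

    def get(self, key, default=None):
        return self._data.get(key, default)


class TextFileStore:
    """Store backed by a text file of tab-separated key/value lines."""

    def __init__(self, path):
        self._data = {}
        with open(path) as f:
            for line in f:
                key, _, value = line.rstrip("\n").partition("\t")
                self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


def category_of(item, store):
    """Model code sees only get(); it does not know where the data lives."""
    return store.get(item, "unknown")


# The same taxonomy data, supplied two different ways.
taxonomy = {"banana slicer": "kitchen", "lab coat": "apparel"}
path = os.path.join(tempfile.mkdtemp(), "taxonomy.tsv")
with open(path, "w") as f:
    for key, value in taxonomy.items():
        f.write(f"{key}\t{value}\n")

print(category_of("banana slicer", DictStore(taxonomy)))
print(category_of("banana slicer", TextFileStore(path)))
```

Swapping the backing store (for example, to a table when the data no longer fits in memory) changes only the object passed in, not the scoring code.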

How Much Less Work Can We Do? • We can choose a predictor that allows us to truncate a sum • There are two ways terms in the sum of our predictor can be small • No rating: ignore unrated items • Small similarity: ignore dissimilar items
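Both shortcuts can be shown with a weighted-average predictor, which is an assumed choice since the slides do not name one. The sketch iterates only over items the user has rated and keeps only the k most similar neighbours of each item; the similarity values are invented.

```python
def truncate(sims, k):
    """Keep only the k most similar neighbours of each item."""
    return {
        item: dict(sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))[:k])
        for item, neighbours in sims.items()
    }


def predict(target, user_ratings, sims):
    """Similarity-weighted average over the user's rated items only."""
    neighbours = sims.get(target, {})
    numerator = denominator = 0.0
    for item, rating in user_ratings.items():  # unrated items never appear here
        sim = neighbours.get(item)
        if sim is None:                        # dissimilar items were truncated away
            continue
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None


# Hypothetical similarities for one target item.
sims = {
    "pineapple slicer": {
        "banana slicer": 0.9,
        "lab coat": 0.05,
        "safety goggles": 0.02,
    }
}
user_ratings = {"banana slicer": 5.0, "lab coat": 1.0}

full = predict("pineapple slicer", user_ratings, sims)
fast = predict("pineapple slicer", user_ratings, truncate(sims, k=1))
print(full, fast)
```

Truncation changes the prediction slightly, because the dropped terms were small but not zero; the trade is a bounded amount of work per prediction.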
