Real-Time Recommendation of Diverse Related Articles Yeung Fu Sing
Introduction • News websites provide article suggestions • Based on popularity, recency or editor’s picks • Sometimes based on the content
Problem • E.g. News article: Barack Obama elected • Suggestions: Obama wins election; Obama has four more years • The suggested articles are too similar to the one being read • Not what the user wants
Aim • Readers want relevant articles • But not articles that are very similar • Diversity is required in suggestions • E.g. News article: Obama wins election • Suggestions: Mitt Romney admits defeat; What can Obama do in four years?
How to achieve diversity? • Through reading comments • Based on the characteristics of the discussion • Usage of words • People who participated • Views on the news • Location of users
Methodology • For any article a: • Find relevant articles: compute the distance between a and every other article a_i, and keep as the candidate set those with distance < r • Compute the diversity between every pair of articles in the candidate set • Form the recommendation set by choosing, among all k-article subsets of the candidates, the one with maximum diversity
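The two steps above can be sketched as a brute-force procedure. This is only an illustrative sketch, not the presented implementation: the function names, the toy feature sets and the use of Jaccard distance for both steps are assumptions made for the example.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def recommend(article, others, rel_features, div_features, r, k):
    """Brute-force sketch: filter by relevance, then pick the most diverse k."""
    # Step 1: candidate set = articles whose relevance distance to `article` is < r.
    candidates = [
        o for o in others
        if jaccard_distance(rel_features[article], rel_features[o]) < r
    ]
    if len(candidates) <= k:
        return candidates

    # Step 2: total diversity of a subset = sum of pairwise diversities.
    def total_diversity(subset):
        return sum(
            jaccard_distance(div_features[x], div_features[y])
            for x, y in combinations(subset, 2)
        )

    # Try every k-article subset and keep the most diverse one.
    return list(max(combinations(candidates, k), key=total_diversity))


# Hypothetical toy data: entity sets of articles and of their comments.
rel_features = {
    "a": {"obama", "election"},
    "b": {"obama", "election", "victory"},
    "c": {"obama", "election", "romney"},
    "d": {"obama", "election", "economy"},
    "e": {"football"},
}
div_features = {
    "b": {"win", "celebrate"},
    "c": {"win", "party", "concede"},
    "d": {"tax", "jobs", "party"},
}
picked = recommend("a", ["b", "c", "d", "e"], rel_features, div_features, r=0.5, k=2)
print(picked)
```

Enumerating every k-article subset grows combinatorially with the size of the candidate set, which is why a faster approximation is needed in practice.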
Relevance • Computed with the Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B| • Distance between two articles: Dist_rel(a, a_i) = 1 - J(A, B), where A and B are the feature sets of a and a_i respectively
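A minimal sketch of the relevance distance, assuming article features are plain Python sets of entity strings. The function names and the example entities are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; taken as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def relevance_distance(features_a: set, features_b: set) -> float:
    """Distance used for relevance: 1 - J(A, B)."""
    return 1.0 - jaccard(features_a, features_b)


# Hypothetical entity sets for two articles.
article = {"Barack Obama", "election", "United States"}
other = {"Mitt Romney", "election", "United States"}

# 2 shared entities out of 4 distinct ones, so J = 0.5 and the distance is 0.5.
print(relevance_distance(article, other))
```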
Relevance • Features are extracted by OpenCalais • OpenCalais: text-mining software from Reuters • Analyses documents • Returns the entities of a document • Rel(a, a_i) = J(OC(a), OC(a_i))
Diversity – Entities • The authors believe users reveal and amplify differences through their comments • Seek features of user comments • Use OpenCalais again • D_div(a_i, a_j) = 1 - J(OC(a_i), OC(a_j))
Diversity - Sentiments • The positivity of comments helps identify diversity • Computed by counting positive and negative words • The difference can be calculated with a Euclidean distance or simply an average
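One way to read the sentiment measure is sketched below. The word lists, the normalisation by total word count and the function names are all assumptions for illustration; the slides do not specify the lexicon or the exact formula.

```python
import math

# Hypothetical, tiny sentiment lexicons (a real system would use a full lexicon).
POSITIVE = {"good", "great", "win", "hope"}
NEGATIVE = {"bad", "sad", "lose", "fear"}


def sentiment_vector(comments):
    """Return (fraction of positive words, fraction of negative words)."""
    pos = neg = total = 0
    for comment in comments:
        for word in comment.lower().split():
            total += 1
            if word in POSITIVE:
                pos += 1
            elif word in NEGATIVE:
                neg += 1
    if total == 0:
        return (0.0, 0.0)
    return (pos / total, neg / total)


def sentiment_diversity(comments_i, comments_j):
    """Euclidean distance between the sentiment vectors of two articles."""
    return math.dist(sentiment_vector(comments_i), sentiment_vector(comments_j))


comments_i = ["great win", "good hope"]
comments_j = ["sad lose", "bad fear"]
print(sentiment_diversity(comments_i, comments_j))
```

Splitting on whitespace ignores punctuation and negation, so this is only a rough word-counting baseline.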
Diversity – User ID • Users tend to read similar articles • D_div(a_i, a_j) = 1 - J(S_i, S_j), where S_i and S_j are the sets of users who commented on articles a_i and a_j respectively
Diversity – User Location • Users from similar locations tend to read similar articles • Similar to user ID, but using countries • D_div(a_i, a_j) = 1 - J(S_i, S_j), where S_i and S_j are the sets of countries of the users who commented on articles a_i and a_j respectively
Diversity • The diversity of each pair of articles is taken as the least diversity among the methods above • The total diversity of k articles is calculated by summing the diversity of all pairs • The k articles with the highest total diversity are recommended
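The combination rule can be sketched as follows, assuming each article is a dictionary of feature sets and that every measure is a Jaccard distance over one of those sets. The field names (`entities`, `users`, `countries`) and helper names are hypothetical, and the sentiment measure is left out to keep the example short.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# One measure per comment feature (hypothetical field names).
MEASURES = [
    lambda x, y: jaccard_distance(x["entities"], y["entities"]),
    lambda x, y: jaccard_distance(x["users"], y["users"]),
    lambda x, y: jaccard_distance(x["countries"], y["countries"]),
]


def pair_diversity(x, y, measures=MEASURES):
    """Diversity of a pair: the smallest value among all measures."""
    return min(m(x, y) for m in measures)


def total_diversity(articles, measures=MEASURES):
    """Total diversity of a set of articles: sum over all pairs."""
    return sum(pair_diversity(x, y, measures) for x, y in combinations(articles, 2))


a1 = {"entities": {"obama", "romney"}, "users": {"u1", "u2"}, "countries": {"US"}}
a2 = {"entities": {"obama", "economy"}, "users": {"u3", "u4"}, "countries": {"US", "UK"}}
a3 = {"entities": {"football"}, "users": {"u5"}, "countries": {"BR"}}

print(pair_diversity(a1, a2))         # limited by the country overlap
print(total_diversity([a1, a2, a3]))
```

Taking the minimum means a pair only counts as diverse when it is diverse under every measure.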
Computation Problem • Computing the Jaccard coefficient against every article is expensive • Use Locality-Sensitive Hashing to get an approximate set of candidates • Min-hash functions are used • Computing the diversity of all pairs is time-consuming • Compute diversity within each hash bucket first • Keep only the k most diverse articles in each bucket
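A rough sketch of min-hash based Locality-Sensitive Hashing for the candidate step is given below. It uses the common banding scheme; the number of hash functions, the number of bands, the SHA-1 based hash and all function names are assumptions for illustration and are not taken from the slides. Feature sets are assumed to be non-empty.

```python
import hashlib
from collections import defaultdict


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of `item`, salted with `seed`."""
    digest = hashlib.sha1(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(features, num_hashes=32):
    """One min-hash value per seeded hash function (features must be non-empty)."""
    return [min(_hash(seed, f) for f in features) for seed in range(num_hashes)]


def lsh_buckets(feature_sets, num_hashes=32, bands=8):
    """Split each signature into bands; articles sharing a band share a bucket."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, features in feature_sets.items():
        sig = minhash_signature(features, num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(name)
    return buckets


def lsh_candidates(query, feature_sets, num_hashes=32, bands=8):
    """Approximate candidate set: every article sharing a bucket with `query`."""
    found = set()
    for members in lsh_buckets(feature_sets, num_hashes, bands).values():
        if query in members:
            found |= members
    found.discard(query)
    return found


feature_sets = {
    "a": {"obama", "election"},
    "b": {"obama", "election"},
    "c": {"football", "goal"},
}
print(lsh_candidates("a", feature_sets))
```

Articles with a high Jaccard coefficient are likely to collide in at least one band, so only articles in the same buckets need an exact comparison. The per-bucket selection of the k most diverse articles mentioned above would then run on each bucket's members rather than on all pairs.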
Results • The algorithm is 22 times faster than the brute-force algorithm • 30.38 times faster than MMR, an algorithm proposed in another paper • Recall is affected