Hidden Concept Detection in Graph-Based Ranking Algorithm for Personalized Recommendation
Nan Li, Computer Science Department, Carnegie Mellon University
Introduction • Previous work: • Represents past user behavior through a relational graph. • Fails to represent individual differences among items of the same type. • Our work: • Detects hidden concepts embedded in the original graph. • Builds a two-level type hierarchy for explicit representation of item characteristics.
Relational Retrieval • Entity-Relation Graph G = (E, T, R): • Entity set E = {e}, entity type set T = {T}, entity relation set R = {R}. • Each entity e in E has a type e.T. • Each relation R has two entity types R.T1 and R.T2. If two entities have relation R, then R(e1, e2) = 1; otherwise 0. • Relational Retrieval Task: Query q = (Eq, Tq) • Given query entities Eq = {e'}, predict the relevance of each entity e of the target type Tq.
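The entity-relation graph above can be sketched with plain dictionaries. A minimal illustration; the class and method names (`Graph`, `add_relation`) are mine, not the paper's:

```python
# Minimal sketch of the entity-relation graph G = (E, T, R).
# Class and method names are illustrative, not from the paper.
from collections import defaultdict

class Graph:
    def __init__(self):
        self.entity_type = {}          # entity e -> its type e.T
        self.edges = defaultdict(set)  # (relation, entity) -> neighbor set

    def add_entity(self, e, t):
        self.entity_type[e] = t

    def add_relation(self, rel, e1, e2):
        # R(e1, e2) = 1; also store the inverse edge for random walks
        self.edges[(rel, e1)].add(e2)
        self.edges[(rel + "_inv", e2)].add(e1)

g = Graph()
g.add_entity("p1", "paper")
g.add_entity("a1", "author")
g.add_relation("written_by", "p1", "a1")
```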
Path Ranking Algorithm • Relational Path: • P = (R1, R2, …, Rn), where R1.T1 = T0 and Ri.T2 = Ri+1.T1. • Relational Path Probability Distribution: • hP(e) is the probability that a random walker following path P from a query entity reaches entity e.
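The path random walk described above can be sketched as step-by-step probability propagation. A hedged illustration; the `edges` layout is my assumption, not the paper's data structure:

```python
# h_P(e): probability that a random walker starting at `start` and following
# the relations in `path`, choosing neighbors uniformly at random, reaches e.
def path_distribution(edges, start, path):
    dist = {start: 1.0}
    for rel in path:
        nxt = {}
        for e, p in dist.items():
            neighbors = edges.get((rel, e), [])
            for n in neighbors:
                # split this entity's probability mass uniformly
                nxt[n] = nxt.get(n, 0.0) + p / len(neighbors)
        dist = nxt
    return dist

edges = {("cites", "p1"): ["p2", "p3"],
         ("written_by", "p2"): ["a1"],
         ("written_by", "p3"): ["a1", "a2"]}
d = path_distribution(edges, "p1", ["cites", "written_by"])
# d == {'a1': 0.75, 'a2': 0.25}
```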
PRA Model • A PRA model is specified by (G, l, θ). • Each column of the feature matrix A is a path distribution hP(e). • The scoring function: a weighted combination of the path feature values.
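The scoring function itself did not survive extraction; in standard PRA it is a weighted sum of the path feature columns. A reconstruction consistent with the feature matrix A above, not copied from the slides:

```latex
\mathrm{score}(e;\, q) = \sum_{P} \theta_P \, h_P(e)
\quad\Longleftrightarrow\quad
\mathbf{s} = A\,\theta
```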
Training PRA Model • Training data: D = {(q(m), y(m))}, where ye(m) = 1 if e is relevant to query q(m). • Parameters: the path weights θ. • Objective function: maximize the (regularized) likelihood of the relevance labels.
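The objective function was likewise lost in extraction; one standard form for PRA training is a regularized log-likelihood of the relevance labels (a reconstruction, not copied from the slides):

```latex
O(\theta) = \sum_{m} \sum_{e}
\Big[\, y_e^{(m)} \ln p_e^{(m)} + \big(1 - y_e^{(m)}\big) \ln\big(1 - p_e^{(m)}\big) \Big]
- \lambda \lVert \theta \rVert^2,
\qquad
p_e^{(m)} = \sigma\big(\mathrm{score}(e;\, q^{(m)})\big)
```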
Hidden Concept Detector (HCD) • Finds hidden subtypes of relations. • Two-Layer PRA [Figure: two-level type hierarchy over paper, gene, author, title, journal, year]
Bottom-Up HCD • Bottom-up merging algorithm, for each relation type Ri: • Step 1: Split relation Ri into subrelations Rij, one per starting node. • Step 2: HAC: repeatedly merge the two subrelations Rim and Rin that maximize the gain in the objective function, until no merge yields positive gain. [Figure: paper-author subrelations being merged]
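The Step 2 loop can be sketched as a standard greedy agglomerative merge. The `gain` function below is a stand-in for the paper's (approximated) objective-function gain, and all names are illustrative:

```python
# Greedy HAC over subrelations: repeatedly merge the pair of clusters with
# the largest positive gain; stop when no merge improves the objective.
def hac_merge(subrelations, gain):
    clusters = [frozenset([s]) for s in subrelations]
    while len(clusters) > 1:
        best, best_pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                g = gain(clusters[i], clusters[j])
                if g > best:
                    best, best_pair = g, (i, j)
        if best_pair is None:       # no merge yields positive gain
            break
        i, j = best_pair
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Toy gain: positive only when the clusters share the same label prefix,
# so "a1" and "a2" merge while "b1" stays separate.
gain = lambda a, b: 1.0 if {s[0] for s in a} == {s[0] for s in b} else -1.0
clusters = hac_merge(["a1", "a2", "b1"], gain)
```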
Approximate the Gain of the Objective Function • Calculate the maximum gain of two relations, gm and gn. • Use a Taylor series to approximate the gain.
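The approximation itself is missing from the slide. A generic expansion of the objective around the current parameters, which is the usual way such merge gains are approximated, would be (my reconstruction, not the slide's equation):

```latex
O(\theta + \Delta\theta) \approx O(\theta)
+ \nabla O(\theta)^{\top} \Delta\theta
+ \tfrac{1}{2}\, \Delta\theta^{\top} \nabla^2 O(\theta)\, \Delta\theta
```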
Experimental Results • Data Sets: • Saccharomyces Genome Database, a publication data set about the yeast organism Saccharomyces cerevisiae. • Three measurements: • Mean Reciprocal Rank (MRR): inverse of the rank of the first correct answer. • Mean Average Precision (MAP): the area under the precision-recall curve. • p@K: precision at K, where K is the actual number of relevant entities.
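The three measurements can be written down directly. A hedged sketch, assuming `ranked` is a list of entity ids ordered by predicted score and `relevant` is the gold set:

```python
# Reciprocal rank: 1 / rank of the first relevant entity, 0 if none found.
def reciprocal_rank(ranked, relevant):
    for i, e in enumerate(ranked, start=1):
        if e in relevant:
            return 1.0 / i
    return 0.0

# Average precision: mean of precision values at each relevant hit.
def average_precision(ranked, relevant):
    hits, total = 0, 0.0
    for i, e in enumerate(ranked, start=1):
        if e in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

# p@K with K = number of relevant entities, as defined on the slide.
def precision_at_k(ranked, relevant):
    k = len(relevant)
    return sum(e in relevant for e in ranked[:k]) / k if k else 0.0

rr = reciprocal_rank(["x", "a", "b"], {"a", "b"})      # first hit at rank 2
ap = average_precision(["x", "a", "b"], {"a", "b"})
pk = precision_at_k(["x", "a", "b"], {"a", "b"})
```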
Normalized Cut • Training data: • Number of clusters ↑ → recommendation quality ↑ • Test data: • NCut outperforms random
HCD • Training data: • HCD outperforms PRA in all three measurements. • Test data: • The two systems perform equally well.
Future Work • Bottom-Up vs Top Down • Improve Efficiency • Type Recovery in Non-Labeled Graph
Building an intelligent agent that simulates human-level learning using machine learning techniques
A Computational Model of Accelerated Future Learning through Feature Recognition
Nan Li, Computer Science Department, Carnegie Mellon University
Accelerated Future Learning • Accelerated future learning: learning more effectively because of prior learning. • Widely observed, but how does it arise? • Expert vs. novice: • Expert: deep functional features (e.g., the coefficient -3 in -3x). • Novice: shallow perceptual features (e.g., the digit 3 in -3x).
A Computational Model • Model Accelerated Future Learning • Use Machine Learning Techniques • Acquire Deep Feature • Integrated into a Machine-Learning Agent
Feature Recognition as PCFG Induction • Underlying structure in the problem → grammar • Feature → intermediate symbol in a grammar rule • Feature learning task → grammar induction • Error → incorrect parsing
Problem Statement • Input: a set of feature recognition records, each consisting of • an original problem (e.g., -3x) • the feature to be recognized (e.g., -3 in -3x) • Output: • a PCFG • an intermediate symbol in a grammar rule
Accelerated Future Learning through Feature Recognition • Extended a PCFG Learning Algorithm (Li et al., 2009) • Feature Learning • Stronger Prior Knowledge: • Transfer Learning Using Prior Knowledge • Better Learning Strategy: • Effective Learning Using Bracketing Constraint
A Two-Step Algorithm • Greedy Structure Hypothesizer: • Hypothesizes the schema structure. • Viterbi Training Phase: • Refines schema probabilities. • Removes redundant schemas. • Generalizes the inside-outside algorithm (Lari & Young, 1990).
Greedy Structure Hypothesizer • Structure learning: • Bottom-up. • Prefers recursive structures to non-recursive ones.
EM Phase • Step One: • Plan parse tree computation: find the most probable parse tree. • Step Two: • Selection probability update for each schema s (rules of the form ai → aj ak).
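The probability-update step can be sketched as Viterbi-style relative-frequency re-estimation over the rules used in the most probable parse trees. The rule representation here is my assumption:

```python
# Re-estimate each rule's probability as (count of rule in best parses) /
# (count of its left-hand side), the standard Viterbi training update.
from collections import Counter

def update_probs(parse_rule_uses):
    rule_counts = Counter(parse_rule_uses)            # (lhs, rhs) -> count
    lhs_counts = Counter(lhs for lhs, _ in parse_rule_uses)
    return {(lhs, rhs): c / lhs_counts[lhs]
            for (lhs, rhs), c in rule_counts.items()}

# Toy example: symbol E used three times, twice rewriting to (T,).
probs = update_probs([("E", ("E", "T")), ("E", ("T",)), ("E", ("T",))])
```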
Feature Learning • Build Most Probable Parse Trees • For all observation sequences • Select an Intermediate Symbol that • Matches the most training records as the target feature
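The selection step above amounts to a vote count over parse trees. A minimal sketch, assuming each training record lists the intermediate symbols whose parse span matches the annotated feature (`SignedNumber` is a hypothetical symbol name):

```python
# Pick the intermediate symbol that matches the annotated feature in the
# most training records.
from collections import Counter

def select_feature_symbol(records):
    votes = Counter()
    for matching_symbols in records:      # one set of symbols per record
        votes.update(matching_symbols)
    return votes.most_common(1)[0][0]

sym = select_feature_symbol([{"SignedNumber"},
                             {"SignedNumber", "Term"},
                             {"SignedNumber"}])
```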
Transfer Learning Using Prior Knowledge • GSH phase: • Build parse trees based on the previously acquired grammar, then call the original GSH. • Viterbi training: • Add rule frequencies from the previous task to the current task. [Figure: example rule probabilities 0.5, 0.33, 0.5, 0.66]
Effective Learning Using Bracketing Constraint • Force the grammar to generate a feature symbol. • Learn a subgrammar for the feature. • Learn a grammar for the whole trace. • Combine the two grammars.
Experimental Results in Algebra • Both stronger prior knowledge and a better learning strategy can yield accelerated future learning. • Stronger prior knowledge produces faster learning outcomes. • L00 generated human-like errors. Fig. 2: Curriculum one. Fig. 3: Curriculum two. Fig. 4: Curriculum three.
Learning Speed in Synthetic Domains • Both stronger prior knowledge and a better learning strategy yield faster learning. • Stronger prior knowledge produces faster learning outcomes with a small amount of training data, but not with a large amount. • Learning with subtask transfer shows a larger difference in 1) the training process and 2) low-level symbols.
Score with Increasing Domain Sizes • The base learner, L00, shows the fastest drop. • Average time spent per training record: • Less than 1 millisecond, except for L10 (266 milliseconds). • L10 needs to maintain previous knowledge and does not separate the trace into smaller traces. • Conciseness: transfer learning doubled the size of the schema.
Integrating Accelerated Future Learning in SimStudent • A machine-learning agent that acquires production rules from examples and problem-solving experience. • Integrates the acquired grammar into production rules. • Requires only weak operators (non-domain-specific knowledge). • Fewer operators needed (e.g., for x+5).
Concluding Remarks • Presented a computational model of human learning that yields accelerated future learning. • Showed that • both stronger prior knowledge and a better learning strategy improve learning efficiency; • stronger prior knowledge produced faster learning outcomes than a better learning strategy. • Some models generated human-like errors, while others did not make any mistakes.