Relational Evaluation Techniques Daniel McEnnis
Outline • Definition • Component Overview • Existing Approaches • Descriptions of the Components • Applications and Examples
Relational Evaluation Techniques Definition • Experimental setup for evaluating the performance of algorithms that use data that span more than one table or instance vector • Can use either relational algebra or hypergraph-based descriptions
Components • Data Acquisition • Ground Truth Acquisition • Cross-Validation Technique • Query Type • Scoring Metric • Significance Test
Existing Approaches • Machine Learning • Relational Machine Learning • TREC • Collaborative Filtering • ISMIR • Social Network Analysis
Machine Learning • Predetermined flat data, no sampling • Predetermined ground truth • Typically simple queries • Sophisticated cross-validation • Basic set-based metrics • No significance tests
Relational Machine Learning • Predetermined relational data • Predetermined ground truth • Predefined simple query • Sophisticated cross-validation • Basic set-based metrics • No significance tests
TREC • Predetermined flat data • Sophisticated ground truth sampling • Sophisticated queries • Machine-learning cross-validation • Ranked set-of-sets scoring • Simple significance tests
Collaborative Filtering • Predetermined flat/relational data • Predetermined ground truth • Simple, predefined query • No cross-validation • Sophisticated scoring metrics • No significance tests
ISMIR • Sampled flat data • Predetermined ground truth • Sophisticated queries • Machine-learning cross-validation • Simple set-based scoring metrics • Sophisticated significance tests
Social Network Analysis • Sophisticated data sampling • Sophisticated statistical techniques
Sequences of Choices • Plug ‘n play an experiment • Different aspects are evaluated • Some algorithms simply don’t work • Extensive algorithm rewrites sometimes needed
Data Acquisition • Data structure • Where is it? • What sampling technique to use: Random Access, Snowball, or Hypergraph Snowball • How much data is needed?
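As an illustration of the snowball option, the sketch below expands outward from a set of seed actors for a fixed number of waves. The adjacency-dictionary representation and the names `snowball_sample`, `seeds`, and `waves` are assumptions made for this example, not part of the original slides or of Graph-RAT.

```python
from collections import deque

def snowball_sample(adjacency, seeds, waves):
    """Return every actor reachable within `waves` hops of the seed actors.

    `adjacency` maps an actor to an iterable of neighbouring actors
    (a hypothetical in-memory stand-in for following foreign keys).
    """
    sampled = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        actor, depth = frontier.popleft()
        if depth == waves:
            continue  # do not expand past the last wave
        for neighbour in adjacency.get(actor, ()):
            if neighbour not in sampled:
                sampled.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return sampled

# Toy graph: a -> b, c; b -> d; d -> e
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"], "e": []}
sample = snowball_sample(graph, ["a"], waves=2)
```

With two waves from `a`, the sample stops before reaching `e`, which shows how the number of waves bounds how much data is collected.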
Ground Truth Acquisition • What is being tested? • TREC extended ground truth sampling • Structure of the output
Cross-Validation • Actor-Based • Link-Based • Graph-Based • No Cross-Validation
Graph Notation • Actor definition • Link definition • Graph definition • Database table / instance vector equivalence • Foreign key / link equivalence
Actor Cross-Validation • Traditional Machine Learning approach • Divisions by database table • Folds usually random assignment • Works well on flat data • Trouble with relational data
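A minimal sketch of actor-based folds with random assignment, plus a helper that lists the links whose endpoints land in different folds. That second helper is one way to see the trouble with relational data. The function names and the toy data are hypothetical.

```python
import random

def actor_folds(actors, k, seed=0):
    """Randomly assign actors (table rows) to k folds of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(actors)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def cross_fold_links(links, folds):
    """Return the links whose two endpoints were placed in different folds."""
    fold_of = {actor: i for i, fold in enumerate(folds) for actor in fold}
    return [(s, t) for s, t in links if fold_of[s] != fold_of[t]]

actors = ["u1", "u2", "u3", "u4", "u5", "u6"]
links = [("u1", "u2"), ("u3", "u4"), ("u5", "u6")]
folds = actor_folds(actors, k=3)
broken = cross_fold_links(links, folds)
```

On flat data `broken` has no meaning, since there are no links. On relational data every entry in `broken` is a relationship that leaks information between a training fold and a test fold, or that must be discarded.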
Link Cross-Validation • Rare machine learning approach • Divisions by foreign key reference • Less statistical independence than actor • Works for collaborative filtering • Usually random assignment
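For the collaborative filtering case, a sketch of link-based folds might hold out individual ratings (links between a user and an item) rather than whole users. The tuple layout `(user, item, rating)` and all names below are assumptions for illustration.

```python
import random

def link_folds(links, k, seed=0):
    """Randomly assign links (foreign key references) to k folds."""
    rng = random.Random(seed)
    shuffled = list(links)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def split(folds, test_index):
    """Use one fold as the test set and the remaining folds as training."""
    test = folds[test_index]
    train = [link for i, fold in enumerate(folds) if i != test_index
             for link in fold]
    return train, test

ratings = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4),
           ("u2", "i3", 2), ("u3", "i2", 1), ("u3", "i3", 5)]
rating_folds = link_folds(ratings, k=3)
train, test = split(rating_folds, test_index=0)

# Users that appear on both sides of the split: the reason link folds
# are less statistically independent than actor folds.
shared_users = {u for u, _, _ in train} & {u for u, _, _ in test}
```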
Graph Cross-Validation • Relational Machine Learning • Divisions by predetermined discrete graphs • Statistical independence • Non-learning based approaches • Clustering based fold generation
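One simple way to obtain discrete graphs is to treat each connected component as an indivisible unit and deal whole components into folds. This is a sketch under that assumption. The slides do not say which clustering method is used, and connected components are only the simplest stand-in.

```python
def connected_components(actors, links):
    """Group actors into components using undirected reachability."""
    neighbours = {actor: set() for actor in actors}
    for source, target in links:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, components = set(), []
    for start in actors:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(component)
    return components

def graph_folds(components, k):
    """Deal whole components into k folds, largest first, round-robin."""
    folds = [set() for _ in range(k)]
    ordered = sorted(components, key=len, reverse=True)
    for i, component in enumerate(ordered):
        folds[i % k] |= component
    return folds

actors = list("abcdefg")
links = [("a", "b"), ("b", "c"), ("d", "e"), ("f", "g")]
components = connected_components(actors, links)
folds = graph_folds(components, k=2)
```

Because no component is ever split, no link crosses a fold boundary, which is the statistical independence property listed above.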
No Cross-Validation • Standard overfitting problems • Useful after implied cross-validation
Query Type • Information need definition • Actor-based query • Set- or list-based query • Conditional queries
Scoring Metrics • Comparisons against ground truth • Set-based metrics • Ranked-list metrics • Ordered-list metrics
Set-Based Metrics • Recall and Precision • F-Measure • Mean Average Precision
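A short sketch of the first two set-based metrics, comparing a predicted set against a ground-truth set. The F-measure shown is the balanced (F1) variant, and the function name is hypothetical.

```python
def precision_recall_f1(predicted, relevant):
    """Compute precision, recall, and balanced F-measure for two sets."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall; 0 when nothing matches.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Two of four predictions are correct; two of three relevant items are found.
precision, recall, f1 = precision_recall_f1({"a", "b", "c", "d"},
                                            {"a", "b", "e"})
```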
Ranked List Metrics • Pearson Correlation • Spearman's Correlation • Mean Absolute Error • Linear Algebra Distance Metrics • Serendipity
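The first three ranked-list metrics can be sketched in plain Python as below. Spearman's correlation is computed as Pearson's correlation over average ranks, and the inputs are assumed to be equal-length lists of scores with non-zero variance. All function names are illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            result[order[position]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation as Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

predicted = [4.0, 3.0, 5.0, 1.0]
actual = [5.0, 3.0, 4.0, 1.0]
rho = spearman(predicted, actual)
mae = mean_absolute_error(predicted, actual)
```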
Ordered List Metrics • Half Life • Kendall Tau • NDPM • Sequence Alignment Algorithms • Hamming Distance
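Two of the ordered-list metrics are small enough to sketch directly. The Kendall tau below is the tau-a variant, which assumes no ties, and both functions assume equal-length inputs. Names are illustrative.

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall tau-a: (concordant - discordant) / total pairs, no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        product = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(rank_a) * (len(rank_a) - 1) / 2
    return (concordant - discordant) / pairs

def hamming_distance(list_a, list_b):
    """Number of positions at which two equal-length lists differ."""
    return sum(a != b for a, b in zip(list_a, list_b))

# Swapping one adjacent pair makes 1 of the 6 pairs discordant.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
distance = hamming_distance(["x", "y", "z"], ["x", "z", "y"])
```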
Significance Tests • Pairwise Student's t-test • ANOVA • ANOVA/Tukey-Kramer statistical test
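As a sketch of the simplest of these tests, the code below computes the paired t statistic for two algorithms scored on the same cross-validation folds. Pairing by fold is an assumption made for this example. The code stops at the statistic and degrees of freedom. Turning these into a p-value needs the t distribution, which a statistics library such as SciPy (`scipy.stats.ttest_rel`) provides.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-fold scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical per-fold scores for two algorithms on four folds.
algorithm_a = [0.8, 0.7, 0.9, 0.6]
algorithm_b = [0.7, 0.5, 0.8, 0.6]
t_value, dof = paired_t_statistic(algorithm_a, algorithm_b)
```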
Evaluation Questions • Does the data contain time (a global ordered sequence)? • Actor-, Link-, Graph-, or Set-based queries? • List, Set, or Set-of-Lists output? • Contextual or absolute question? • Statistical purity versus maximum information?
Music Recommendation • Example - Personalized Dynamic Tag Radio • LastFM profile data • LastFM tag data • Semantic Web data • Next-week-data ground truth • Conditional query • Graph cross-validation • Kendall Tau scoring metric • ANOVA/Tukey-Kramer statistical analysis
Conclusions • No one-size-fits-all • Data and ground-truth set the framework • Question determines the final structure • Each discipline has a piece of the answer • Graph-RAT 0.5
Future Work • Finish exploring Social Network Analysis significance tests • Fully explore set-of-sets evaluation metrics • Debugging of Graph-RAT cross-validation schedulers • Ease of use improvements to Graph-RAT