Predicting Download Directories for Web Resources
George Valkanas, Dimitrios Gunopulos
Dept. of Informatics & Telecommunications, University of Athens, Greece
4th International Conference on Web Intelligence, Mining and Semantics, June 3, 2014
Online User Activities
Facilitating Downloads
• Save Link In Folder
• Problems:
  • Predefined directories
  • Blunt approach / No learning
  • UI clutter
  • Tedious user management
A principled solution
• Associate the navigation through the hierarchy with a cost function
• One possible cost function: Hierarchical Navigation Cost (HNC), i.e., the number of clicks
• Example: HNC(imgs/, docs/) = 2
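A minimal sketch of one way to compute a click-based navigation cost, assuming one click per step up to a parent directory and one click per step down to a child. The function name `hnc` and the path-string representation are illustrative assumptions, not the authors' implementation.

```python
from pathlib import PurePosixPath


def hnc(source: str, target: str) -> int:
    """Clicks needed to navigate from `source` to `target` in a directory tree.

    Assumption: moving to a parent or to a child each costs one click.
    Relative paths are treated as sharing the same (implicit) root.
    """
    src = PurePosixPath(source).parts
    dst = PurePosixPath(target).parts
    # Length of the shared prefix, i.e. the depth of the deepest common ancestor.
    common = 0
    for a, b in zip(src, dst):
        if a != b:
            break
        common += 1
    # Climb from the source to the common ancestor, then descend to the target.
    return (len(src) - common) + (len(dst) - common)


print(hnc("imgs/", "docs/"))  # sibling directories: one click up, one click down
```

Under these assumptions, two sibling directories such as `imgs/` and `docs/` are two clicks apart, which agrees with the example on the slide.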
Problem Definition
• Given
  • The hierarchical structure
  • A target directory T, where the resource will be saved
• Goal
  • Suggest a directory S that minimizes the cost function cf(S, T)
• But if I know T, why not suggest T directly? (0 cost)
• In this setting, we don't know T until it's too late!
Casting to a classification framework
• Directories are potential class values
  • T is the true target class
  • S is the output of a classification process
• Web resource properties → classification features
• Recommend S that best matches T
• Use directories from past saves as candidate classes
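The classification view can be sketched with a nearest-neighbour lookup over past saves. This is an illustration only: the feature names (`domain`, `ext`), the overlap similarity, and the function names are assumptions, not the features or distance used in the talk.

```python
from typing import Dict, List, Optional, Tuple

Features = Dict[str, str]  # hypothetical resource properties, e.g. domain, extension


def overlap(a: Features, b: Features) -> int:
    """Number of feature names on which both resources carry the same value."""
    return sum(1 for name, value in a.items() if b.get(name) == value)


def suggest_directory(
    history: List[Tuple[Features, str]], query: Features
) -> Optional[str]:
    """Return the directory of the most similar past save, or None without history."""
    best_dir: Optional[str] = None
    best_score = -1
    # Scan newest first so that ties are resolved in favour of the latest save.
    for features, directory in reversed(history):
        score = overlap(features, query)
        if score > best_score:
            best_dir, best_score = directory, score
    return best_dir


history = [
    ({"domain": "arxiv.org", "ext": "pdf"}, "docs/papers/"),
    ({"domain": "flickr.com", "ext": "jpg"}, "imgs/"),
]
print(suggest_directory(history, {"domain": "example.com", "ext": "jpg"}))
```

Because the candidate classes are exactly the directories seen in past saves, the suggestion S can only be a directory the user has already used.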
Experimental Setup
• Implement classifier as a Firefox (FF) plugin
  • DiDoCtor approach
  • JavaScript
  • 1-NN classifier
• 6 participants
  • 4-month minimum use period
• Baseline
  • Last-by-domain (LBD), current browser approach
  • Simulated, based on submitted result
• Metrics
  • Click Distance: HNC, Breadcrumbs
  • Classification Accuracy
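A last-by-domain baseline can be simulated along the following lines. This sketch assumes LBD means "suggest the directory most recently used for the same host"; the class name, the fallback directory `Downloads/`, and the use of the URL hostname as the domain are illustrative assumptions.

```python
from urllib.parse import urlsplit


class LastByDomain:
    """Suggest the directory last used for a resource from the same domain."""

    def __init__(self, default: str = "Downloads/") -> None:
        self._last = {}          # domain -> directory of the most recent save
        self._default = default  # assumed fallback when the domain is unseen

    @staticmethod
    def _domain(url: str) -> str:
        # urlsplit(...).hostname is None for URLs without a network location.
        return urlsplit(url).hostname or ""

    def suggest(self, url: str) -> str:
        return self._last.get(self._domain(url), self._default)

    def record(self, url: str, directory: str) -> None:
        """Store where the user actually saved the resource."""
        self._last[self._domain(url)] = directory


lbd = LastByDomain()
lbd.record("http://example.com/paper.pdf", "docs/")
print(lbd.suggest("http://example.com/photo.jpg"))
```

Replaying a log of saves through `suggest` followed by `record` gives the baseline's suggestion for each save, which can then be scored with the same metrics as the classifier.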
Preliminary Result Analysis
Take Home Messages
• Users exhibit different saving patterns
• Users have high variability in their accesses to each directory
Click Distance - HNC
Take Home Message
• Significant reduction in the number of clicks to reach the target directory!
• Click distance gain is even higher when considering a breadcrumbs UI!
Running Accuracy
Take Home Message
• DiDoCtor is much more accurate than the baseline in predicting the download directory
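One plausible reading of running accuracy is the cumulative fraction of correct suggestions after each save. The sketch below computes it under that assumption; the slides do not define the metric, so the definition and the function name are illustrative.

```python
from typing import List, Sequence


def running_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> List[float]:
    """Fraction of correct suggestions among the first i saves, for each i."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = 0
    curve = []
    for i, (suggested, target) in enumerate(zip(predicted, actual), start=1):
        if suggested == target:
            correct += 1
        curve.append(correct / i)
    return curve


suggested = ["docs/", "imgs/", "docs/", "music/"]
saved_to = ["docs/", "docs/", "docs/", "music/"]
print(running_accuracy(suggested, saved_to))
```

Plotting this curve per user for both the classifier and the baseline shows how quickly each approach adapts as more saves are observed.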
Basic Model Extensions
• Feature reweighting
  • RELIEF_F
• Suggesting k directories
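Both extensions can be illustrated together: per-feature weights change how similarity is scored, and the k best distinct directories are returned instead of one. The weights below are made-up values standing in for what a feature-weighting method might produce; this sketch does not implement RELIEF_F, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, str]


def top_k_directories(
    history: List[Tuple[Features, str]],
    query: Features,
    weights: Dict[str, float],
    k: int = 3,
) -> List[str]:
    """Return up to k distinct directories, ranked by weighted feature overlap."""
    if k <= 0:
        return []
    scored = []
    for features, directory in history:
        # Matching features contribute their weight; unlisted features count as 1.0.
        score = sum(
            weights.get(name, 1.0)
            for name, value in query.items()
            if features.get(name) == value
        )
        scored.append((score, directory))
    # Stable sort: highest score first, equal scores keep their history order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    suggestions: List[str] = []
    for _, directory in scored:
        if directory not in suggestions:
            suggestions.append(directory)
        if len(suggestions) == k:
            break
    return suggestions


history = [
    ({"domain": "a.com", "ext": "pdf"}, "docs/"),
    ({"domain": "b.com", "ext": "pdf"}, "papers/"),
    ({"domain": "a.com", "ext": "jpg"}, "imgs/"),
]
weights = {"domain": 2.0, "ext": 1.0}  # illustrative values only
print(top_k_directories(history, {"domain": "a.com", "ext": "pdf"}, weights, k=2))
```

With k suggestions, a natural way to score a recommendation is the smallest navigation cost from any suggested directory to the true target, although the slides do not state how this case was evaluated.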
Alternative classifiers
Take Home Messages
• Classifiers can help!
• DiDoCtor generally performs the best
• Accuracy is affected by user behavior!
Conclusions & Future work
• Approach for facilitating downloads
• Optimization problem & classification framework
• Experimentation with real users
• Basic model extensions
• Further exploit the temporal dimension
• More informative features (e.g., entities)
• Automatic generation of directories
Thank you!
• Questions?
• Acknowledgements
  • To the evaluators of our plugin
  • Heraclitus II fellowship, THALIS-GeoComp, THALIS-DISFER, Aristeia-MMD, EU project INSIGHT