This article discusses mobile search using message-based, browser-based, and Java application methods. It also explores unique features, challenges, and scalability issues in mobile search.
Data Mining Meets Mobile Search Wen-Chih Peng (彭文志) Dept. of Computer Science National Chiao Tung University
What Is Mobile Search? • Searching via mobile devices • Types: • Message-based • Browser-based • Java application
Unique Features in Mobile Search • Mobile devices • Personal devices • Wireless networks • Positioning enabled • Services in mobile search • Local search • Static: Nearby entities (more vertical search) • Dynamic: Live traffic or find your buddy • Entertainment: • Video, images
Mobile Search@NCTU • LBS recommendation list based on blogs or Web 2.0 sites • Mobile/PDA phones with wireless interfaces and GPS
Technology Highlight • Positioning • Large-Scale Wi-Fi issue • Data content • Static data: Web Puzzle problem • Dynamic data: Live traffic data • Community structures in Web 2.0 • Behavior analysis for intelligent UI
Positioning Joint work with Prof. Y.-C. Tseng
Challenges • Provide Wi-Fi positioning techniques • A large-scale pattern-matching mechanism • Training phase: collect thousands or millions of training samples • Positioning phase: quickly estimate a location by matching against a huge location database
[Figure: pattern-matching localization. Training phase: training data collected at locations <x1, y1>, <x2, y2>, ..., <xn, yn> is stored in the location database. Positioning phase: real-time data [s1, s2, s3, s4] received from the access points (APs) is fed to the pattern-matching localization algorithm, which estimates the location <x, y>.]
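The two phases can be illustrated with a minimal sketch. It assumes nearest-neighbor matching under Euclidean distance, which the slides do not specify; the database contents and the names `location_db`, `signal_distance`, and `estimate_location` are hypothetical.

```python
import math

# Hypothetical location database built in the training phase: each
# location <x, y> maps to the signal-strength vector recorded there
# (one value per access point, in dBm).
location_db = {
    (0.0, 0.0): [-40.0, -70.0, -65.0, -80.0],
    (5.0, 0.0): [-55.0, -60.0, -70.0, -75.0],
    (0.0, 5.0): [-60.0, -75.0, -50.0, -70.0],
    (5.0, 5.0): [-70.0, -62.0, -58.0, -52.0],
}

def signal_distance(a, b):
    """Euclidean distance between two signal-strength vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def estimate_location(db, s):
    """Positioning phase: return the stored location whose training
    vector is closest to the real-time vector s (assumed matching rule)."""
    return min(db, key=lambda loc: signal_distance(db[loc], s))

# Real-time data [s1, s2, s3, s4] from four access points.
estimated = estimate_location(location_db, [-54.0, -61.0, -69.0, -77.0])
print(estimated)
```

Every query scans the whole database here, so the cost of one estimate grows with the number of training entries.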
The Scalability Problem • Huge calibration efforts in the training phase: longer system setup time • High computation cost in the positioning phase: longer system response time
The Scalability Problem • Reduce the computation cost incurred in the positioning phase • Apply a clustering technique to partition the location database into a number of clusters
Computation Cost Comparison • [Figure: searching space of typical pattern-matching (size of location database) versus cluster-based matching (number of clusters)]
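A rough sketch of why clustering shrinks the searching space follows. It assumes the clusters are already given and that a cluster is selected by its nearest centroid; both choices, the data, and the names `clusters`, `centroids`, and `locate` are illustrative assumptions, not the method from the talk.

```python
import math

def dist(a, b):
    """Euclidean distance between two signal-strength vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Hypothetical pre-computed clusters of the location database
# (two access points, RSS values in dBm).
clusters = {
    "C1": {(0.0, 0.0): [-40.0, -70.0], (1.0, 0.0): [-42.0, -68.0]},
    "C2": {(5.0, 0.0): [-55.0, -55.0], (6.0, 0.0): [-57.0, -53.0]},
    "C3": {(9.0, 0.0): [-70.0, -40.0], (10.0, 0.0): [-72.0, -42.0]},
}
centroids = {name: centroid(list(e.values())) for name, e in clusters.items()}
database_size = sum(len(e) for e in clusters.values())

def locate(s):
    """Return (location, cluster name, number of vector comparisons)."""
    # Stage 1: compare s with one centroid per cluster.
    chosen = min(centroids, key=lambda name: dist(centroids[name], s))
    # Stage 2: compare s only with the entries of the chosen cluster.
    entries = clusters[chosen]
    location = min(entries, key=lambda loc: dist(entries[loc], s))
    return location, chosen, len(centroids) + len(entries)

location, chosen, comparisons = locate([-55.5, -54.5])
print(location, chosen, comparisons, "vs full scan:", database_size)
```

With only six entries the saving is small, but the comparison count depends on the number of clusters plus one cluster's size rather than on the whole database. The sketch does nothing about a wrongly selected cluster when the signal fluctuates.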
False Cluster Selection • Considering 2 APs in the environment • Each training location <x1, y1> stores the received signals of AP 1 and AP 2 at <x1, y1> • The real-time received signal strengths s may fluctuate within a region around the stored values • If s is in the shaded region, the selected cluster differs from the cluster that contains the true location: false cluster selection occurs • [Figure: clusters C1, C2, and C3 plotted by RSS of AP 1 and RSS of AP 2]
Pieces of Le Bouquet Cake House • Name: Le Bouquet 繽紛蛋糕房 (Le Bouquet Cake House) • Address: No. 63, Sec. 2, Zhongshan N. Rd., Taipei City (台北市中山北路二段63號) • Tel: 21002856 • Reviews: {summary from source articles}
Goal • Problem: • Unstructured information about structured objects is distributed across the WWW • Unclear / vague / homogeneous pieces from heterogeneous sources together construct an object • Goal: • Unstructured sources --> structured view • New object instance discovery • Applications • Information portals for various domains • GeoGuider: portal for GeoObjects
Web Puzzle Problem • Given: the annotated corpus • Input: • Keyword: describes the conceptual tuple space (optional) • Entity: the tuple schema • E.g., Computer Science #Person #Email #Phone • Output: ranked entity tuples
[Figure: system components: Keyword Search, Data Objects, Entity Vector, Object Composition, Entity Index, Binary Relation, Entity Annotation]
Binary Relation • Counting • a -> b1: 10 times; a -> b2: 3 times • P(b1 | a) > P(b2 | a) • Context and text proximity • Customized functions
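The counting idea can be sketched as follows. The slide only states the inequality P(b1 | a) > P(b2 | a); estimating each probability as a relative frequency of co-occurrence counts is an assumption made here, and `conditional_probs` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def conditional_probs(pairs):
    """Estimate P(b | a) as count(a -> b) / count(a -> anything)."""
    counts = defaultdict(Counter)
    for a, b in pairs:
        counts[a][b] += 1
    return {
        a: {b: n / sum(c.values()) for b, n in c.items()}
        for a, c in counts.items()
    }

# The counts from the slide: a -> b1 seen 10 times, a -> b2 seen 3 times.
pairs = [("a", "b1")] * 10 + [("a", "b2")] * 3
probs = conditional_probs(pairs)
print(probs["a"])
```

Context, text proximity, or a customized function could replace the unit increment with a weighted one; that variation is not shown.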
Search • Associate keywords with object entities • P(#entity | keywords) • Rank data objects by keywords • Similarity(tuple_keyword, tuple_object)
Prototype Platform in NCTU CS • 2.7 TB disk space; 40 cores; 40 GB RAM • Distributed file system • Map/Reduce enabled (supported by Google/Yahoo)
Example: K+ • [Figure: ☆ Nop ☆ linked with f1, f2, ..., f202]
Definition • Popular co-cited community (PCC) • G = (Vc ∪ Vf, E) is a PCC if there exists a partition s.t. • Co-cited: every ni in Vf is fully connected to every nj in Vc, and • Popular: |Vf| > min_sup • E.g., • Core members: {b3, b4} • Followers: {b1, b2} • [Figure: followers b1, b2 in Vf linked to core members b3, b4 in Vc, with edge weights .7, .3, .5, .5]
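The two conditions of the definition can be checked directly for a given partition. In this sketch the edge direction (follower to core member) is an assumption, the edge weights shown in the figure are ignored, and `is_pcc` is a hypothetical helper name.

```python
def is_pcc(edges, core, followers, min_sup):
    """Check the PCC conditions for a given partition (Vc, Vf).

    edges: set of (follower, core_member) pairs.
    Co-cited: every follower is connected to every core member.
    Popular: the number of followers exceeds min_sup.
    """
    co_cited = all((f, c) in edges for f in followers for c in core)
    popular = len(followers) > min_sup
    return co_cited and popular

# The example from the slide: core members {b3, b4}, followers {b1, b2}.
edges = {("b1", "b3"), ("b1", "b4"), ("b2", "b3"), ("b2", "b4")}
print(is_pcc(edges, {"b3", "b4"}, {"b1", "b2"}, min_sup=1))
```

This only verifies a candidate partition; finding such partitions in a large blog graph is the mining problem itself and is not attempted here.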
Example: K+ • Mining PCC • [Figure: ☆ Nop ☆, Sandy, and 小昕昕 with f1, f2, ..., fi, ..., fn]
Example: K+ • Mining TPCC • [Figure: graphs G(0)1, G(0)2, G(0)3, and G(1), with nodes N1 to N4, f2, f6, f9, f10, f200, f214, ☆ Nop ☆, Sandy, and 小昕昕; edge weights between .3 and .7]
Motivation • Sharing GPS data • Cars with GPS and 3G mobile phones • Spatial-temporal databases • Mining traffic patterns
Joint Clustering in Data Streams • Highway traffic database
Joint Clustering in Data Streams • Motivation • Discover useful clusters for sensor data management • Input • Highway traffic database • Parameters • Window size w, range r, average speed error ε • Output • For each window, we generate clusters • Cluster: a set of r-connected sensors
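One possible reading of the parameters is sketched below. It assumes sensors lie along a single highway, that two consecutive sensors are r-connected when they are at most r apart, and that they belong together only if their average speeds over the window differ by at most ε. None of these definitions is confirmed by the slides, and `window_clusters` is a hypothetical name.

```python
def window_clusters(positions, speeds, r, eps):
    """Group sensors for one window.

    positions: sensor id -> position along the highway (km).
    speeds: sensor id -> list of the w readings in the current window.
    Only consecutive sensors along the road are compared (a
    simplification of general r-connectivity).
    """
    if not positions:
        return []
    avg = {s: sum(v) / len(v) for s, v in speeds.items()}
    order = sorted(positions, key=positions.get)
    clusters, current = [], [order[0]]
    for prev, cur in zip(order, order[1:]):
        close = positions[cur] - positions[prev] <= r
        similar = abs(avg[cur] - avg[prev]) <= eps
        if close and similar:
            current.append(cur)
        else:
            clusters.append(current)
            current = [cur]
    clusters.append(current)
    return clusters

positions = {"s1": 0.0, "s2": 1.0, "s3": 2.0, "s4": 5.0}
speeds = {"s1": [100, 98], "s2": [97, 99], "s3": [60, 62], "s4": [61, 63]}
result = window_clusters(positions, speeds, r=1.5, eps=5)
print(result)
```

Here w = 2 readings per sensor. s3 is split from s2 because of the speed gap, and s4 is split from s3 because it lies farther than r away.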
Traffic Estimation Problem • Input • A traffic database • Query (road segment, time) • Output • An estimated speed for the queried road segment • Example query: (e, T4)
Temporal Feature • Neighboring time slots
Spatial Feature • Nearby road segments • Similar road types • [Figure: road network with segments labeled by letters]
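The temporal and spatial features can be combined in a simple estimator. This sketch assumes an unweighted average over the neighboring time slots of the same segment and the same time slot of nearby segments; the slides do not give the actual estimator, and the data and the name `estimate_speed` are hypothetical.

```python
def estimate_speed(db, neighbors, segment, t):
    """Estimate the speed for the query (segment, t).

    db: (segment, time slot) -> observed speed.
    neighbors: segment -> list of nearby road segments.
    Returns None when no supporting reading exists.
    """
    if (segment, t) in db:
        return db[(segment, t)]
    # Temporal feature: neighboring time slots of the same segment.
    candidates = [(segment, t - 1), (segment, t + 1)]
    # Spatial feature: nearby segments in the same time slot.
    candidates += [(n, t) for n in neighbors.get(segment, [])]
    values = [db[key] for key in candidates if key in db]
    if not values:
        return None
    return sum(values) / len(values)

# Hypothetical readings around the example query (e, T4).
db = {("e", 3): 60.0, ("e", 5): 70.0, ("d", 4): 62.0, ("f", 4): 68.0}
neighbors = {"e": ["d", "f"]}
speed = estimate_speed(db, neighbors, "e", 4)
print(speed)
```

Similar road types could be handled by extending `neighbors` with segments of the same type, weighted differently from physically adjacent ones.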
Why Prediction? • Inconveniences of handheld devices • Limited keyboard • Keyword prediction • Recommendation • Small screen • Search result ranking • Segment prediction
Multi-Domain Sequence • User behavior of handheld devices • Location (moving patterns) • Searching (keywords) • Payment (transactions) • Integrate multiple user behaviors • MDS: Multi-domain sequences • More informative than a single domain sequence
Challenges • Each domain has its own sequence database • Performing join operations across sequence databases is costly
PropagatedMine • Perform sequential pattern mining in the starting domain • Then propagate the mining results to the other domains • [Figure: sequential pattern mining on D1 yields sequential patterns; the results are propagated through D2, D3, ..., Dn, each step using a propagated table and yielding multi-domain sequential patterns]
Conclusions • Data mining helps the growth of mobile search • Positioning • Web Puzzling • Community structures • CarWeb • User behavior analysis for intelligent UI • Built a mobile search prototype
Selected References • C.-Y. Lin, W.-C. Peng, and Y.-C. Tseng, "Efficient In-Network Moving Object Tracking in Wireless Sensor Networks," IEEE Trans. on Mobile Computing, Vol. 5, No. 8, pp. 1044-1056, August 2006. • L.-Y. Wei and W.-C. Peng, "Clustering Data Streams in Optimization and Geography Domains," Proceedings of the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2009), Bangkok, Thailand, April 27-30, 2009. • C.-H. Lo, W.-C. Peng, and M.-F. Chiang, "Ranking Web Pages from User Perspectives of Social Bookmarking Sites," Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence, Sydney, Australia, Dec. 9-12, 2008. • C.-H. Lo and W.-C. Peng, "Efficient Joint Clustering Algorithms in Optimization and Geography Domains," Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2008), Osaka, Japan, May 20-23, 2008. • C.-H. Lo, W.-C. Peng, C.-W. Chen, T.-Y. Lin, and C.-S. Lin, "CarWeb: A Traffic Data Collection Platform," Proceedings of the 9th International Conference on Mobile Data Management, Beijing, China, April 27-30, 2008. • M.-F. Chiang, W.-C. Peng, and C.-H. Lo, "Discovering Popular Co-Cited Communities in Blogspaces," Proceedings of the First IEEE International Workshop on Data Engineering for Blogs, Social Media, and Web 2.0 (in conjunction with the IEEE International Conference on Data Engineering), Cancun, Mexico, April 12, 2008. • S.-P. Kuo, B.-J. Wu, W.-C. Peng, and Y.-C. Tseng, "Cluster-Enhanced Techniques for Pattern-Matching Localization Systems," Proceedings of the 4th IEEE International Conference on Mobile Ad-hoc and Sensor Systems (MASS 2007), Pisa, Italy, Oct. 8-11, 2007. • Z.-X. Liao and W.-C. Peng, "Exploring Lattice Structures in Mining Multi-Domain Sequential Patterns," Proceedings of the Second International Conference on Scalable Information Systems (InfoScale), Suzhou, China, June 6-8, 2007.