How to Find APP Relationship: An Iterative Process • Ming Liu • Harbin Institute of Technology, School of Computer Science and Technology
Background • Plenty of apps are released to help users make the best use of their phones. • Faced with a massive number of available apps, users can rely on app retrieval and app recommendation to find the apps they want. • Recent methods mostly depend on user logs or on latent context similarity between apps. • They can only detect whether two apps are downloaded or installed together, or whether they provide similar functions.
APP Relationship • Apps can have deeper relationships, such as one app needing another app to cooperate with it to complete its task. • This relationship cannot be discovered from user logs or the latent contexts of apps alone. • Example: “Hotels.com” and “Alipay”. • https://play.google.com/store/apps/details?id=com.hcom.android&hl=zh_CN • https://play.google.com/store/apps/details?id=com.alipay.android.client.pad&hl=zh_CN
The Role of Reviews • Reviews contain useful information about apps, such as users’ viewpoints. • Given two apps (app1 and app2), if a review of app1 asks for a service that app1 can’t provide, and a review of app2 states that app2 provides this service, then app1 and app2 are possibly relevant. • This relationship isn’t similarity.
Challenges • Reviews are too short, so they are hard to exploit fully. • Most reviews don’t describe the app directly but only express the user’s viewpoint, so they are hard to use for attribute extraction. • Our approach: an iterative process that combines review similarity and app relationship in a single calculation.
Related Work • An app is just an entity, so the methods for calculating entity relationship can be applied directly to app relationship. • The dictionary-based approach (sometimes called knowledge-based) relies on professional thesauri to extract attributes that define relationships among entities (or apps). • The statistics-based approach (sometimes called corpus-based) mines relationships among entities from a large-scale corpus.
Defects • Dictionary-based approach • With a hierarchical structure (e.g. WordNet), one can easily tell entity relationship from the positions of the entities. • Most current thesauri don’t include apps as terms, so it’s impossible to extract attributes from them to represent apps. • Statistics-based approach • It seldom encounters the missing-data issue. • It relies on contextual similarity computed from attributes extracted from the corpus, so it can only detect entity similarity.
Organization • We use a matrix M to organize reviews and apps, and use app vectors and review vectors to represent apps and reviews respectively. • M is an n*k matrix built with the Vector Space Model. Each column of M represents one app in APP. Each row of M represents one review in RC. • The tf-idf statistical metric, which is both effective and efficient, is used to compute the value of each entry in these vectors.
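The organization above can be sketched as follows. This is a minimal illustration, not the author's implementation: the input format (`mention_counts`, a count of how often each app appears in each review), the function names, and the smoothed idf variant are all assumptions made for the example.

```python
import math

def build_tfidf_matrix(mention_counts):
    """Build an n x k tf-idf matrix (rows = reviews, columns = apps).

    mention_counts[r][a] is a hypothetical input: how many times app a
    appears in review r.
    """
    n = len(mention_counts)
    k = len(mention_counts[0]) if n else 0
    # Document frequency: number of reviews in which each app appears.
    df = [sum(1 for row in mention_counts if row[a] > 0) for a in range(k)]
    matrix = []
    for row in mention_counts:
        total = sum(row)
        out = []
        for a in range(k):
            if row[a] == 0:
                out.append(0.0)
                continue
            tf = row[a] / total
            # Smoothed idf (an assumption; the slides do not give the variant).
            idf = math.log((1 + n) / (1 + df[a])) + 1
            out.append(tf * idf)
        matrix.append(out)
    return matrix

def review_vector(matrix, i):
    """Row i of M: the vector representing review i."""
    return list(matrix[i])

def app_vector(matrix, p):
    """Column p of M: the vector representing app p."""
    return [row[p] for row in matrix]

counts = [[2, 0, 1],
          [0, 1, 0],
          [1, 1, 0]]
M = build_tfidf_matrix(counts)
```

Here `app_vector(M, 0)` has one entry per review and `review_vector(M, 0)` has one entry per app, matching the column and row roles described on the slide.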
APP Relationship Calculation • Generally speaking, the straightforward way to calculate the relationship between two apps (e.g. app_p and app_q) is to use their app vectors directly, where tf_cp and tf_cq denote the values of the c-th entry in V(app_p) and V(app_q) respectively.
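The slide's equation is not preserved in this transcript. Since Cosine is the measurement fixed in the later experiments, one plausible form is the cosine of the two app vectors, with n the number of reviews. This is offered as an assumption, not as the author's exact formula:

```latex
R(\mathrm{app}_p, \mathrm{app}_q) =
  \frac{\sum_{c=1}^{n} tf_{cp}\, tf_{cq}}
       {\sqrt{\sum_{c=1}^{n} tf_{cp}^{2}}\;\sqrt{\sum_{c=1}^{n} tf_{cq}^{2}}}
```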
Drawback • The previous equation is based on the idea that two apps that frequently appear in the same reviews are possibly similar to each other. • Consequence: many similarities are close to 0. • Reason: many apps share no common reviews even if they are really similar.
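The drawback is easy to reproduce with a small example. The sketch below assumes Cosine as the vector measurement and uses made-up app vectors (`v_p`, `v_q`, `v_r` are illustrative values, not data from the talk):

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Each entry is the weight of one review; app p and app q share no review.
v_p = [0.8, 0.5, 0.0, 0.0]
v_q = [0.0, 0.0, 0.7, 0.6]
v_r = [0.4, 0.0, 0.0, 0.3]

no_overlap = cosine(v_p, v_q)   # exactly 0.0: no common review
some_overlap = cosine(v_p, v_r)  # positive: one common review
```

However related app p and app q may be in practice, the score is zero whenever their vectors have no non-zero entry in common.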
Expand • Reviews often exhibit topic similarity. • For example, given two reviews rc_i and rc_j, corresponding to app_p and app_q respectively, if users in rc_i ask for a service that app_p doesn’t provide, and users in rc_j state that app_q provides this service, then rc_i and rc_j are topic-similar. • It follows that if two apps frequently appear in topic-similar reviews, the two apps are relevant.
Results • V’(app_p) and V’(app_q) denote the two app vectors after expansion by topic similarity among reviews. Their c-th entries can be calculated by
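The equations for the expanded entries are not preserved in this transcript. A plausible reading, stated purely as an assumption, is that each entry collects the original tf-idf values of all reviews, weighted by their topic similarity to review rc_c. The primed symbols tf'_cp and tf'_cq are notation introduced here for illustration:

```latex
tf'_{cp} = \sum_{j=1}^{n} \mathrm{Sim}(rc_c, rc_j)\, tf_{jp},
\qquad
tf'_{cq} = \sum_{j=1}^{n} \mathrm{Sim}(rc_c, rc_j)\, tf_{jq}
```

Under this form, two apps with no review in common can still obtain overlapping non-zero entries as long as their reviews are topic-similar.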
Review Similarity • To evaluate the previous equations, topic similarity among reviews must be calculated beforehand. • As with app relationship, vector-based measurements can be directly adopted to calculate topic similarity as
Expand • As with topic similarity among reviews, apps have semantic relationships among them, so two reviews that share no common apps may still present a similar topic.
Assumption • The calculations of app relationship and review similarity interact. • To calculate review similarity, app relationship must be calculated in advance: given rc_i and rc_j, calculating Sim(rc_i, rc_j) requires P_gc, which gives the app relationship between app_g and app_c. Conversely, to calculate app relationship, review similarity must be calculated in advance.
Two Ways • App relationship is obtained in two ways.
Reasons to Observations • Our process combines app relationship and review similarity into one iterative calculation. • App relationship can be mined from reviews and then used to guide review similarity calculation. • Review similarity can be found from the relationships among apps and then used to guide app relationship calculation. • Through this iterative process, deep relationships among apps can be obtained.
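The alternation described above can be sketched as a short loop. This is an illustration under stated assumptions, not the author's algorithm: it assumes Cosine as the measurement, starts from the review similarity of the raw review vectors, expands vectors by plain matrix products, and runs a fixed number of rounds. The function name `iterate_relationship` and the toy matrix `m` are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def pairwise_cosine(vectors):
    return [[cosine(u, v) for v in vectors] for u in vectors]

def iterate_relationship(m, rounds=3):
    """m is n x k: rows are review vectors, columns are app vectors."""
    sim = pairwise_cosine(m)  # initial review similarity (assumed starting point)
    rel = None
    for _ in range(rounds):
        # Expand app vectors by review similarity, then relate the apps.
        expanded_apps = transpose(matmul(sim, m))
        rel = pairwise_cosine(expanded_apps)
        # Expand review vectors by app relationship, then compare the reviews.
        expanded_reviews = matmul(m, rel)
        sim = pairwise_cosine(expanded_reviews)
    return rel, sim

# Toy matrix: app 0 and app 2 never share a review, but both co-occur with app 1.
m = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
raw = cosine([row[0] for row in m], [row[2] for row in m])
rel, sim = iterate_relationship(m, rounds=1)
```

In this toy case the raw relationship between app 0 and app 2 is zero, while `rel[0][2]` becomes positive after a single round, because their reviews are linked through reviews that mention app 1.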
Initialization • To run our two-way-alternative process, we need to set one initial parameter: either the initial app relationship R0(app_p, app_q) or the initial review similarity Sim0(rc_i, rc_j). (initial parameters) • We also need to choose the measurement used to calculate app relationship and review similarity from app vectors and review vectors, i.e. to decide how to calculate Sim(rc_i, rc_j) and R(app_p, app_q). (initial measurement)
Data Sets and Measurements • Miller-65, which includes 65 entity (or word) pairs selected from WordNet whose relationship values have been defined by experts. • Metrics: Pearson correlation and Spearman correlation. • APP Collection, which gathers 1000 apps from Google Play to form an artificial test collection. • Metric: F1.
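For reference, the three metrics can be computed as below. These are the standard textbook definitions written with the standard library only; the function names are chosen for this example, and returning 0.0 for a constant input is a convention adopted here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # convention for constant input
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def f1_score(predicted, relevant):
    """F1 of a predicted set against a ground-truth set."""
    predicted, relevant = set(predicted), set(relevant)
    tp = len(predicted & relevant)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(relevant)
    return 2 * precision * recall / (precision + recall)
```

On Miller-65 the two correlations compare computed relationship values with the expert scores; on the APP Collection, F1 compares the set of app pairs judged relevant with the labelled pairs.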
Results when varying the initial parameters with Cosine fixed as the initial measurement • Panels: Miller-65, APP Collection
Results when using Cosine to set the initial parameters while varying the initial measurement used to calculate the vectors • Panels: Miller-65, APP Collection
Reasons • Initial parameters strongly affect the results because they are the only predefined factors that influence the subsequent calculation. • Initial measurements do not affect the results because app vectors and review vectors are already expanded by the semantic relationships among apps and the topic similarity among reviews. Different initial measurements therefore cannot introduce extra semantics, and so cannot change the results.
Conclusions • To acquire high-quality results, we only need to focus on choosing an effective method for setting the initial parameters. • However, determining which method is effective is hard. • For this reason, we hope to acquire high-quality results even with weak initial parameters.
Two Definitions • Weak initial parameters or weak initial measurements • Effective initial parameters or effective initial measurements • Weak ones are obtained by simple co-occurrence-based methods; effective ones are obtained by compression-based or semantics-based methods. • For example: Cosine, KL, or Euclidean distance (weak) vs. Word Clustering, Lexical Cohesion, LSI, PCA, or LDA (effective).
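The three weak measurements named above operate directly on the raw vectors, which is why they count as simple co-occurrence-based methods. A minimal sketch follows; the normalization and the `eps` floor in the KL divergence are assumptions made so that the function is defined on arbitrary non-negative vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity; larger means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def euclidean(u, v):
    """Euclidean distance; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) after normalizing both vectors to sum to 1.

    Entries of q are floored at eps (an assumption) to avoid division by zero.
    """
    sp, sq = sum(p), sum(q)
    p = [a / sp for a in p]
    q = [b / sq for b in q]
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)
```

Note that cosine is a similarity while the other two are distances, so they have to be converted to a common direction before they can be used interchangeably as an initial measurement.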
“Booking.com” and “Alipay” • The two ways initialized by the same combined method are marked with the same symbol. • With effective initial parameters the final results are large, and with weak initial parameters the final results are small. • With effective initial parameters, the results from the two ways are closer to each other than with weak initial parameters.
Observations and Conclusions • Observations: • The results with effective initial parameters are closer to the optimal results than those with weak initial parameters. • The range of the results with weak initial parameters in the early stages contains the range with effective initial parameters. • Conclusions: • When the initial parameters for the two ways are both optimal, the results should be the same at any time. • The optimal results lie between the two results obtained by the two ways of our two-way-alternative process.
Two Ways Combination • Since the results with effective initial parameters are closer to the optimal results, it is reasonable to treat the results from the smoother way as more credible and to give them more weight in the combined result. • With effective initial parameters the tracks are smooth, whereas with weak initial parameters the tracks are rough.
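The slides do not give the combination formula, so the following is only one hypothetical way to give the smoother way more weight. It assumes a track is the list of results from successive iterations, measures roughness as the sum of absolute second differences, and weights each way's final result by the inverse of its roughness. The names `roughness` and `combine` and the sample tracks are made up for the example.

```python
def roughness(track):
    """Sum of absolute second differences; near 0 for a steadily changing track."""
    return sum(abs(track[i + 1] - 2 * track[i] + track[i - 1])
               for i in range(1, len(track) - 1))

def combine(track_a, track_b, eps=1e-9):
    """Weighted mean of the two final results; the smoother track weighs more."""
    w_a = 1.0 / (roughness(track_a) + eps)
    w_b = 1.0 / (roughness(track_b) + eps)
    return (w_a * track_a[-1] + w_b * track_b[-1]) / (w_a + w_b)

smooth_track = [0.2, 0.3, 0.4, 0.5]  # changes steadily between iterations
rough_track = [0.9, 0.1, 0.8, 0.3]   # oscillates between iterations
combined = combine(smooth_track, rough_track)
```

Because the result is a weighted mean of the two final values, it always lies between them, which is consistent with the conclusion that the optimal result lies between the results of the two ways.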
End Thank you!