GDM@FUDAN

Which Topic will You Follow? Modeling Topic Diffusion in Scientific Collaboration Networks
Deqing Yang, Yanghua Xiao, Bo Xu, Hanghang Tong, Wei Wang and Sheng Huang
GDM@FUDAN, Graph Data Management Lab, School of Computer Science, Fudan University, Shanghai, China (http://gdm.fudan.edu.cn)
{yangdeqing, shawyh}@fudan.edu.cn
ECML/PKDD 2012, Bristol, UK

Introduction
• Motivations
  • Who are the most appropriate candidates to receive a call for papers or a call for participation?
  • What session topics should we propose for next year's conference?
  • To address these questions, we study authors' topic-following behavior in the Scientific Collaboration Network (SCN), i.e., an author following others in publishing papers on a given topic
• Basic Idea
  • Scientific Collaboration Network: a graph whose vertices represent authors and whose edges represent coauthor relationships extracted from the DBLP dataset. It is a temporal graph G_t, whose vertices and edges increase as time t elapses
  • An author's topic-following behavior is a process of topic diffusion in social networks, driven by two typical forces: social influence and homophily
  • We look for variables that precisely capture social influence and homophily in this scenario, and use them to predict an author's future topic-following behavior
• Challenges
  • How to distinguish social influence from homophily?
  • Topic definition and identification
  • Sample sparseness
• Contributions
  • Uncover the effects of social influence and homophily on topic diffusion
  • Propose a Multiple Logistic Regression (MLR) model to predict authors' topic-following behavior
  • Extensive experiments show that the MLR model outperforms the baseline

Empirical Study
• Driving Forces of Topic-Following
  • U1: users affected by both social influence and homophily
  • U2: users affected only by social influence
  • U3: users affected only by homophily
  • U4: users affected by neither
  • Results:
    • The two forces jointly impact topic-following
    • Their impacts are time-sensitive and decay exponentially
• Social Influence
  • An author is more likely to adopt a topic when more of his neighbors have already followed it
  • x: the number (or proportion) of affected neighbors; p(x): the probability that the author follows the topic
  • An author is more likely to follow topics adopted by neighbors (direct propagation) who have coauthored more papers with him

Model
• Predicting whether an author will follow a given topic is a two-class classification problem
• A Multiple Logistic Regression (MLR) model suits this scenario; the probability of topic-following is formalized as
  $\pi(\mathbf{x}) = \frac{e^{\alpha + \sum_i \beta_i x_i}}{1 + e^{\alpha + \sum_i \beta_i x_i}}$
  where each x_i is an explanatory variable, and α and the β_i are parameters estimated by training
• Baseline model: defined in terms of a, the number of neighbors who have followed the topic
• Explanatory Variables
  • Social influence (FSI): author u's tendency to follow topic s in year t is composed from the tendencies of all of u's neighbors v toward s, taking their coauthor strengths into account
  • Homophily (FTS): we use topic similarity to capture homophily among users in the context of topic-following
    • A 25-dimensional vector u represents an author's topic history; each dimension is the number of his papers on a given topic
    • Topic similarity between users u and v is then defined over these vectors
    • u's homophily is measured with respect to the users who followed topic s before t
• Full MLR model
  $\pi(\mathbf{x}) = \frac{e^{\alpha + \beta_1 \mathrm{FSI} + \beta_2 \mathrm{FTS}}}{1 + e^{\alpha + \beta_1 \mathrm{FSI} + \beta_2 \mathrm{FTS}}}$
  where π(x) = P(Y = 1 | x), with Y = 1 if u follows s or its related topics and Y = 0 otherwise
• Parameter Estimation
  • Parameters are estimated by maximum likelihood on the training set
  • β2 has a larger Wald statistic than β1, indicating that FTS (homophily) has a stronger impact on topic-following behavior than FSI (social influence)
• Model Evaluation
  • Metrics: recall (sensitivity), specificity, precision, accuracy, AUC, and Fβ with β = 1.1 to favor recall slightly
  • For topic XML, the area under the ROC curve (AUC) is 0.743 for MLR vs. 0.638 for the baseline
  • For four other representative topics, MLR outperforms the baseline in both accuracy and Fβ
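The temporal SCN G_t from the Basic Idea and the affected-neighbor measure x from the empirical Social Influence study can be sketched in Python. This is a minimal sketch under stated assumptions: the paper records are hypothetical toy data rather than DBLP, G_t is assumed to contain every coauthor edge from papers published up to and including year t, and the joint-paper count stands in for coauthor strength. The helper names (scn_at, followers, affected_neighbors, weighted_affected_share) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy records (year, authors, topic); not real DBLP data.
PAPERS = [
    (2008, ["A", "B"], "XML"),
    (2009, ["B", "C"], "XML"),
    (2009, ["A", "C", "D"], "OLAP"),
    (2010, ["C", "D"], "XML"),
]


def scn_at(papers, t):
    """Build G_t as an adjacency map author -> {coauthor: joint papers up to year t}."""
    graph = {}
    for year, authors, _topic in papers:
        if year > t:
            continue  # G_t only contains collaborations that have already happened
        for author in authors:
            graph.setdefault(author, {})  # single-author papers still add a vertex
        for u, v in combinations(sorted(set(authors)), 2):
            graph[u][v] = graph[u].get(v, 0) + 1
            graph[v][u] = graph[v].get(u, 0) + 1
    return graph


def followers(papers, topic, t):
    """Authors who published on `topic` strictly before year t."""
    return {a for year, authors, s in papers if s == topic and year < t for a in authors}


def affected_neighbors(graph, followed, u):
    """Return (count, proportion) of u's neighbors that already followed the topic."""
    neighbors = graph.get(u, {})
    if not neighbors:
        return 0, 0.0
    count = sum(1 for v in neighbors if v in followed)
    return count, count / len(neighbors)


def weighted_affected_share(graph, followed, u):
    """Hypothetical variant: share of u's joint papers written with affected neighbors.

    Illustrates weighting by coauthor strength; this is NOT the poster's FSI formula.
    """
    neighbors = graph.get(u, {})
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return sum(w for v, w in neighbors.items() if v in followed) / total


if __name__ == "__main__":
    g_2009 = scn_at(PAPERS, 2009)
    xml_before_2010 = followers(PAPERS, "XML", 2010)
    # D's neighbors in G_2009 are A and C, both of whom published on XML before 2010.
    print(affected_neighbors(g_2009, xml_before_2010, "D"))  # prints (2, 1.0)
```

In this toy data, author D, who publishes on XML in 2010, has every G_2009 neighbor already affected, which is consistent with the trend reported under Social Influence.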
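The poster uses topic similarity over 25-dimensional topic-history vectors as the homophily signal (FTS), but the transcript omits the exact formulas. The Python sketch below is illustrative only: cosine similarity, and averaging that similarity over the users who followed the topic earlier, are assumptions rather than the authors' definitions, and the topic names and the topic_index mapping are hypothetical.

```python
import math

NUM_TOPICS = 25  # from the poster: one dimension per topic


def topic_vector(paper_topics, topic_index):
    """Topic history: number of the author's papers on each topic.

    topic_index is a hypothetical mapping from topic name to a dimension in [0, 25).
    """
    vec = [0] * NUM_TOPICS
    for s in paper_topics:
        vec[topic_index[s]] += 1
    return vec


def cosine(u, v):
    """Assumed similarity measure; returns 0.0 when either history is empty."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def homophily(u_vec, follower_vecs):
    """Assumed aggregate: mean similarity between u and users who followed s before t."""
    if not follower_vecs:
        return 0.0
    return sum(cosine(u_vec, f) for f in follower_vecs) / len(follower_vecs)


if __name__ == "__main__":
    index = {"XML": 0, "OLAP": 1, "Data Mining": 2}  # hypothetical topic names
    u = topic_vector(["XML", "XML", "OLAP"], index)
    earlier = [topic_vector(["XML"], index), topic_vector(["Data Mining"], index)]
    print(round(homophily(u, earlier), 3))  # prints 0.447
```

Other aggregates, such as the maximum similarity to any earlier follower, would fit the same interface; which one the poster uses is not recoverable from the transcript.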
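The MLR model, its maximum-likelihood estimation, and the metrics listed under Model Evaluation can be sketched together in plain Python. Assumptions: the feature rows are hypothetical (FSI, FTS) values rather than real data; plain batch gradient ascent on the log-likelihood stands in for the Newton-type solvers that statistical packages normally use; and AUC is computed with the Mann-Whitney pairwise-ranking formulation. The function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(params, x):
    """pi(x) = P(Y = 1 | x), with params = [alpha, beta_1, ..., beta_k]."""
    alpha, betas = params[0], params[1:]
    return sigmoid(alpha + sum(b * xi for b, xi in zip(betas, x)))


def log_likelihood(params, X, y):
    """Bernoulli log-likelihood of labels y under the logistic model."""
    total = 0.0
    for x, target in zip(X, y):
        p = predict_proba(params, x)
        total += math.log(p) if target == 1 else math.log(1.0 - p)
    return total


def fit_mlr(X, y, lr=0.1, epochs=5000):
    """Maximum-likelihood fit by batch gradient ascent.

    d logL / d alpha = sum(y - pi);  d logL / d beta_j = sum((y - pi) * x_j).
    """
    params = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(params)
        for x, target in zip(X, y):
            err = target - predict_proba(params, x)
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        params = [p + lr * g / len(X) for p, g in zip(params, grad)]
    return params


def metrics(y_true, y_pred, beta=1.1):
    """Recall (sensitivity), specificity, precision, accuracy and F_beta.

    beta > 1 weights recall more heavily than precision; the poster uses beta = 1.1.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": (tp + tn) / len(pairs),
        "f_beta": (1 + b2) * precision * recall / denom if denom else 0.0,
    }


def auc(y_true, scores):
    """P(a random positive scores above a random negative); ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both positive and negative examples")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical (FSI, FTS) feature rows and follow / not-follow labels.
    X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.7], [0.8, 0.9], [0.6, 0.4],
         [0.9, 0.8], [0.4, 0.6], [0.05, 0.1], [0.7, 0.8], [0.15, 0.3]]
    y = [0, 0, 1, 1, 0, 1, 1, 0, 0, 1]
    params = fit_mlr(X, y)
    scores = [predict_proba(params, x) for x in X]
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print("alpha, beta1, beta2 =", [round(p, 3) for p in params])
    print(metrics(y, preds), "AUC =", round(auc(y, scores), 3))
```

The demo scores the same rows it was fitted on only to keep the example short; in practice the metrics would be computed on data not used for fitting. Wald statistics, which the poster uses to compare β1 and β2, would additionally require the inverse of the Fisher information matrix and are omitted here.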
