Group Meeting, 2011-9-13 • Presenter: 徐波 (Xu Bo)
Outline • Research • Project
Research • Diversity
Introduction • An individual’s social ties are diverse if he or she maintains connections to different communities or groups • User A is more diverse than User B
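As a rough illustration of the definition above, the sketch below counts how many distinct communities a user's direct ties reach. The toy data, the community labels, and the function name `distinct_communities` are assumptions made for illustration; this is not the GD or LD measure from the talk, whose definitions are not given in these slides.

```python
def distinct_communities(user, neighbors, community):
    """Count the distinct communities among a user's direct contacts."""
    return len({community[v] for v in neighbors[user]})

# Toy data (hypothetical): A's contacts span three groups, B's only one.
neighbors = {
    "A": ["c1", "c2", "c3"],
    "B": ["c1", "c4"],
}
community = {"c1": "C1", "c2": "C2", "c3": "C3", "c4": "C1"}

diversity_a = distinct_communities("A", neighbors, community)
diversity_b = distinct_communities("B", neighbors, community)
```

Under this toy count, User A comes out as more diverse than User B because A's contacts fall into more groups.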
Interesting Problems • What is the distribution of diversity? • Is there any correspondence between an individual’s diversity and his/her status in the social network? • Is diversity trivially correlated with existing vertex measures? • Do individuals with significantly diverse social ties tend to connect to each other? • Are individuals with diverse social ties more competitive than others?
Diversity of Social Networks • Diversity and structural holes • SCN extracted from DBLP • Quantifying diversity
Diversity and structural holes • Structural hole • A separation between non-redundant contacts • If links a, b, c are removed from the network shown in FIG. 1, structural holes will form among groups C1, C2, C3
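The effect of removing the bridging links can be sketched by counting connected components before and after. FIG. 1 is not reproduced here, so the toy network below (an "ego" vertex tied to three two-person groups) and the helper `count_components` are assumptions for illustration only.

```python
from collections import defaultdict

def count_components(nodes, edges):
    """Count connected components with an iterative depth-first search."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components

# Hypothetical toy network: "ego" reaches groups C1, C2, C3 only via links a, b, c.
nodes = ["ego", "x1", "x2", "y1", "y2", "z1", "z2"]
group_edges = [("x1", "x2"), ("y1", "y2"), ("z1", "z2")]
bridges = [("ego", "x1"), ("ego", "y1"), ("ego", "z1")]  # links a, b, c

before = count_components(nodes, group_edges + bridges)
after = count_components(nodes, group_edges)
```

With the three links in place the toy network is a single component; without them the groups fall apart, which is the separation the slide calls a structural hole.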
SCN extracted from DBLP • Dataset • DBLP • G(V, E, ℓ) • ℓ is a labeling function that assigns each vertex in V a unique label from L
Quantifying diversity • Global Diversity • Local Diversity • Relationship
Empirical Analysis of Diversity • Distribution • Correlation • Assortative mixing • Structural holes • Top-k Analysis
Assortative mixing • Assortative mixing by degree • Large-degree individuals preferentially attach to large-degree individuals • Rich club • Individuals with rich connections are well connected to each other • Measure • $k_{nn}(k)$, where $k_i$ is the GD of vertex $i$, $N_k$ is the number of vertices with GD $k$, and $E_{k k_i}$ counts the connections between vertex $i$ and vertices with GD $k$.
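One common way to check this kind of assortative mixing is to average, for each GD value k, the mean GD of the neighbors of vertices whose GD is k. The sketch below uses that standard average-nearest-neighbor formulation; it is an assumption and may differ from the exact measure on the slide, whose formula is not preserved. The toy graph, the GD values, and the name `knn_by_gd` are hypothetical.

```python
from collections import defaultdict

def knn_by_gd(neighbors, gd):
    """Map each GD value k to the mean, over vertices with GD k,
    of the average GD of their neighbors."""
    per_k = defaultdict(list)
    for v, nbrs in neighbors.items():
        if not nbrs:
            continue  # isolated vertices have no neighbor average
        avg = sum(gd[u] for u in nbrs) / len(nbrs)
        per_k[gd[v]].append(avg)
    return {k: sum(vals) / len(vals) for k, vals in sorted(per_k.items())}

# Hypothetical coauthorship graph and GD values.
neighbors = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
gd = {"a": 3, "b": 3, "c": 2, "d": 1}
knn = knn_by_gd(neighbors, gd)
```

A curve that rises with k, as in this toy example, indicates that high-GD vertices tend to be linked to other high-GD vertices.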
Result • The result confirms that authors with large GD tend to coauthor with each other. $k_{nn}(k)$ increases roughly linearly with GD, with some exceptions at GD = 9.
Top-k Analysis • The top-k list includes: • some very prolific authors • some outliers
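Reviewer Comments • The paper's method requires a hard partition of the network's community structure, i.e., each node can belong to only one community. Given the diversity of relationships between nodes in social networks, this constraint is too strong. The authors are advised to generalize the method to the more general case of social networks with overlapping community structure.
Reviewer Comments • The paper states that the global diversity measure GD can be used to gauge the importance of network nodes. However, the Top-k diversity ranking in Table 3 shows that many nodes share the same GD value, so the measure cannot effectively distinguish the importance of these nodes. The authors are advised to combine the GD and LD measures to rank importance based on the diversity of relationships between nodes.
Revision • An author is no longer forced to belong to a single field; an author is said to belong to every field in which he or she has published K papers (K is a user-defined threshold). • Different values of K give very different results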
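A minimal sketch of this overlapping assignment, assuming "K papers" means at least K and that each paper carries a single field label. The data, the field names, and the function `fields_of` are hypothetical.

```python
from collections import Counter

def fields_of(author_papers, k):
    """Return, per author, the set of fields with at least k papers."""
    membership = {}
    for author, paper_fields in author_papers.items():
        counts = Counter(paper_fields)
        membership[author] = {f for f, n in counts.items() if n >= k}
    return membership

# Hypothetical data: one field label per paper.
author_papers = {
    "alice": ["DB", "DB", "DB", "ML", "ML", "IR"],
    "bob": ["ML", "ML"],
}
loose = fields_of(author_papers, k=1)
strict = fields_of(author_papers, k=3)
```

Even in this tiny example the choice of K matters: with K = 1 alice belongs to three fields, while with K = 3 she belongs to one and bob belongs to none.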
Fast Calculation • Community • Incorrect: some communities are too large. • Graph label • People in different communities tend to have very different labels • DFS • BFS • Heuristics (most neighbors already visited)
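Project • Crawling Sina Weibo data
Sina API • Social Graph interface • friends/ids: returns the list of uids of the users a user follows • followers/ids: returns the list of uids of a user's followers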
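One plausible way to use friends/ids for crawling is a breadth-first traversal of the follow graph. The sketch below is an assumption, not the crawler described in the talk: `fetch_friend_ids` is a hypothetical caller-supplied function standing in for the real API call, and no request format or authentication details are shown.

```python
from collections import deque

def crawl_follow_graph(seed_uid, fetch_friend_ids, max_users):
    """Breadth-first crawl of the follow graph starting at seed_uid.

    fetch_friend_ids is a caller-supplied function (hypothetical here) that
    returns the uids a user follows, e.g. by calling friends/ids.
    At most max_users users are fetched.
    """
    edges = []
    seen = {seed_uid}
    queue = deque([seed_uid])
    fetched = 0
    while queue and fetched < max_users:
        uid = queue.popleft()
        fetched += 1
        for friend in fetch_friend_ids(uid):
            edges.append((uid, friend))
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return edges

# Stand-in for the real API call, so the sketch runs offline.
fake_graph = {1: [2, 3], 2: [3], 3: [1, 4], 4: []}
edges = crawl_follow_graph(1, lambda uid: fake_graph.get(uid, []), max_users=3)
```

Passing the fetch function as a parameter keeps the traversal logic testable without network access; a real fetcher would also need to respect the API's request limits.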
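API Rate Limits • The Weibo API allows only a certain number of requests per hour. Limits apply per user and per IP, as follows: • Request limit per server IP • Standard authorization: • 10000 requests/hour • Intermediate authorization: • 20000 requests/hour • Request limit per user per application • Standard authorization: • Total: 150 requests/hour per user per application • Posting: 30 requests/hour per user per application • Commenting: 60 requests/hour per user per application • Following: 60 requests/hour and 200 requests/day per user per application • Intermediate authorization: • Total: 300 requests/hour per user per application • Posting: 60 requests/hour per user per application • Commenting: 120 requests/hour per user per application • Following: 120 requests/hour and 400 requests/day per user per application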
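To stay within these quotas, a crawler can space its requests evenly; for example, 150 requests/hour works out to one request every 3600 / 150 = 24 seconds. The sketch below is one simple approach under that assumption. `min_interval_seconds` and `Throttle` are hypothetical names, and how the server actually counts its hourly window is not stated in the slides.

```python
import time

def min_interval_seconds(requests_per_hour):
    """Spacing between requests that stays within an hourly quota."""
    return 3600.0 / requests_per_hour

class Throttle:
    """Sleep as needed so calls to wait() are at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous call, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now

standard_user = min_interval_seconds(150)    # per-user total, standard authorization
standard_ip = min_interval_seconds(10000)    # per-IP limit, standard authorization
```

Calling `Throttle(standard_user).wait()` before each request would keep a single user and application under the standard total limit; the clock and sleep functions are injectable so the pacing can be tested without real delays.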