PhD Research Proficiency Exam: Social Network Analysis using Link Mining. Jing Xia, Laboratory for Knowledge Discovery in Databases, Department of Computing and Information Sciences, Kansas State University. http://www.kddresearch.org http://www.cis.ksu.edu/~xiajing
Outline • Social Network Introduction • Networks in Biological Systems • Mining on Social Networks • Link Mining • Multi-relational Mining • Problem Specification • Proposed approach
Social Network Introduction • What is a social network? • A social network is a heterogeneous, multi-relational data set represented by a graph. • Characteristics of Social Networks • “Natural” Networks and Universality • Quantitative Measures • Mining Social Networks • Link Mining: Tasks and Challenges
Society • Nodes: individuals • Links: social relationships (family/work/friendship/etc.) • S. Milgram (1967): the “natural” network appears to be universal: Six Degrees of Separation • Society networks: many individuals with diverse social interactions between them. [Data Mining: Concepts and Techniques]
Communication • The Earth is developing an electronic system: a network whose diverse nodes and links are computers, routers, satellites, phone lines, TV cables, and EM waves. • Communication networks: many non-identical components with diverse connections between them.
Epidemiology • Nodes: doctors, patients, geographical locations • Links: contact relationships (direct/indirect infectiousness)
Characteristics of Social Networks • Consider many kinds of networks: • social, technological, business, economic, content,… • These networks tend to share certain informal properties: • multi-relational interactions • temporal (time-evolving) • large scale; continual growth • distributed, organic growth: vertices “decide” whom to link to • mixture of local and long-distance connections • abstract notions of distance: geographical, content, social,…
Social Network Theory • Do natural networks share more quantitative universals? • What would these “universals” be? • How can we make them precise and measure them? • How can we explain their universality? • This is the domain of social network theory • Sometimes also referred to as link analysis
Quantitative Measures • Connected components: • how many, and how large? • Network diameter: • maximum (worst-case) or average? • exclude infinite distances? (disconnected components) • the small-world phenomenon
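The component and diameter measures on this slide can both be computed with breadth-first search on an unweighted, undirected graph. The sketch below assumes the graph is a Python dict mapping each node to an iterable of neighbors; the function names and the toy graph are illustrative, not taken from the original slides. It answers the slide's "exclude infinite distances?" question one particular way, by skipping pairs that lie in different components.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(adj):
    """Return a list of node sets, one per connected component."""
    seen, components = set(), []
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            components.append(component)
    return components


def diameter_stats(adj):
    """(worst-case, average) shortest-path length over reachable ordered pairs.

    Infinite distances between disconnected components are excluded.
    """
    lengths = [d for s in adj for t, d in bfs_distances(adj, s).items() if t != s]
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)


# Toy graph: a path a-b-c-d plus a separate edge e-f.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"],
       "e": ["f"], "f": ["e"]}
print([len(c) for c in connected_components(adj)])  # [4, 2]
worst, avg = diameter_stats(adj)
print(worst, round(avg, 3))  # 3 1.571
```

Running one BFS per node costs O(n·m) time, which is fine for illustration but not for very large networks, where sampled or approximate diameter estimates are the usual alternative.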
Quantitative Measures • Clustering: • to what extent do links tend to cluster “locally”? • what is the balance between local and long-distance connections? • what roles do the two types of links play? • Degree distribution: • what is the typical degree in the network? • what is the overall distribution?
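One common way to quantify "local" clustering is the local clustering coefficient: the fraction of a node's neighbor pairs that are themselves linked. The sketch below computes it, along with the degree distribution, assuming an adjacency dict of neighbor sets. Giving nodes with fewer than two neighbors a coefficient of 0.0 is a convention chosen for this sketch, and the toy graph is hypothetical.

```python
from collections import Counter


def local_clustering(adj, node):
    """Fraction of pairs of node's neighbors that are linked to each other.

    Nodes with fewer than two neighbors get 0.0 by convention.
    """
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    closed = sum(1 for i in range(k) for j in range(i + 1, k)
                 if nbrs[j] in adj[nbrs[i]])
    return 2.0 * closed / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def degree_distribution(adj):
    """Map each degree to the fraction of nodes having that degree."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {deg: c / len(adj) for deg, c in sorted(counts.items())}


# Toy graph: triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(local_clustering(adj, "c"))   # 0.3333333333333333
print(degree_distribution(adj))     # {1: 0.25, 2: 0.5, 3: 0.25}
```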
Outline • Social Network Introduction • Networks in Biological Systems • Problem Specification • Mining on Social Networks • Link Mining • Multi-relational Mining
Bio-Map (figure): GENOME: protein-gene interactions • PROTEOME: protein-protein interactions • METABOLISM: biochemical reactions (e.g., the citrate cycle)
Protein-Protein Interaction Network (figure: PROTEOME, protein-protein interactions)
Protein-Protein Interaction Network • Nodes: proteins • Links: multi-relational • physical interactions (binding) • complex membership • pathways • P. Uetz et al., Nature 403, 623-627 (2000).
Outline • Social Network Introduction • Networks in Biological Systems • Mining on Social Networks • Link Mining • Multi-relational Mining • Problem Specification • Proposed approach
Link Mining • Traditional machine learning and data mining approaches assume that data is flat: a single set of independent instances • In typical real data sets, instances are linked to form networks • Link Mining • A newly emerging research area at the intersection of social network and link analysis, hypertext and web mining, graph mining, relational learning, and inductive logic programming
Link Mining Tasks • Object-Related Tasks • Link-based object ranking • Link-based object classification • Object clustering (group detection) • Object identification (entity resolution) • Link-Related Tasks • Link prediction • Graph-Related Tasks • Subgraph discovery • Graph classification
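Link prediction, one of the tasks listed above, can be illustrated with the common-neighbors heuristic: unlinked pairs that share many neighbors are ranked as likelier future links. This is a simple baseline chosen for illustration, not a method proposed in these slides, and the co-author names are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(adj):
    """Rank currently unlinked node pairs by their number of shared neighbors.

    adj maps each node to a set of neighbors (undirected graph).
    Pairs with no shared neighbor are omitted.
    """
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        shared = len(adj[u] & adj[v])
        if shared:
            scores[(u, v)] = shared
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical co-author network.
adj = {
    "alice": {"bob", "carol", "erin"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"alice"},
}
for pair, score in common_neighbor_scores(adj):
    print(pair, score)
# ('alice', 'dave') 2
# ('bob', 'carol') 2
# ('bob', 'erin') 1
# ('carol', 'erin') 1
```

Raw counts favor high-degree nodes; normalized variants such as Jaccard or Adamic-Adar weighting are common refinements of the same idea.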
Multi-relational Link Mining • Traditional link mining assumes there is only one kind of relation in the network: links are flat • In reality there exist multiple, heterogeneous social networks, each representing a particular kind of relationship • Multi-relational & heterogeneous
Multi-relational Network • Multi-relational & heterogeneous network • Multiple object and link types • Example networks • Medical network: patients, doctors, diseases, contacts, treatments • Bibliographic network: years, publications, authors, venues • Epidemic transmission network (involves temporal data; multi-relational: airborne transmission, patients’ contacts)
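A multi-relational, heterogeneous network can be stored by tagging each node with an object type and each edge with a relation type; extracting the edges of one relation then yields a single-relation network. The class below is a hypothetical minimal sketch of that idea (undirected edges, no self-loops), and the medical example data is invented for illustration.

```python
from collections import defaultdict


class MultiRelationalGraph:
    """Undirected graph with typed nodes and typed (relation-labeled) edges."""

    def __init__(self):
        self.node_type = {}              # node -> object type
        self.edges = defaultdict(set)    # relation -> {frozenset({u, v}), ...}

    def add_node(self, node, node_type):
        self.node_type[node] = node_type

    def add_edge(self, u, relation, v):
        if u == v:
            raise ValueError("self-loops are not supported in this sketch")
        self.edges[relation].add(frozenset((u, v)))

    def relations(self):
        return sorted(self.edges)

    def relation_network(self, relation):
        """Adjacency (node -> set of neighbors) restricted to one relation."""
        adj = defaultdict(set)
        for edge in self.edges.get(relation, ()):
            u, v = tuple(edge)
            adj[u].add(v)
            adj[v].add(u)
        return dict(adj)


# Hypothetical medical network with three relation types.
g = MultiRelationalGraph()
for node, kind in [("p1", "patient"), ("p2", "patient"),
                   ("d1", "doctor"), ("flu", "disease")]:
    g.add_node(node, kind)
g.add_edge("p1", "contact", "p2")
g.add_edge("p1", "treated_by", "d1")
g.add_edge("p2", "treated_by", "d1")
g.add_edge("p1", "diagnosed_with", "flu")
print(g.relations())  # ['contact', 'diagnosed_with', 'treated_by']
print(sorted(g.relation_network("treated_by")["d1"]))  # ['p1', 'p2']
```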
Outline • Social Network Introduction • Networks in Biological Systems • Mining on Social Networks • Link Mining • Multi-relational Mining • Problem Specification • Proposed approach
Problem Specification • Phenomenon: heterogeneity and multiple relations exist in many real networks • Rationale: they might be useful for link mining • Problem • Can we utilize multiple relations to help link analysis? • How can we extract relations as relation networks (RNs)? • How can we identify relationships among relation networks (correlated, independent, etc.)? • Are RNs time-evolving? Which relation plays an important role?
Problem Example 1 • Application domain: epidemic disease • Pre-condition 1: given multiple relations: patients’ contact networks over a timeline • Pre-condition 2: sequential relationships among the relations • Pre-condition 3: another medium of disease transmission • Problem: can we predict whether a person will be infected by mining these multi-relational networks?
Problem Example 2 • Application domain: bibliographic network • Pre-condition 1: given multiple relations: the co-author networks of a conference over several years • Problem 1: what is the relationship among these relation networks? • Problem 2: how can we utilize this relationship to answer a user’s query? • Deng Cai, Zheng Shao, Xiaofei He, Xifeng Yan, and Jiawei Han, “Mining Hidden Community in Heterogeneous Social Networks,” Report No. UIUCDCS-R-2005-2538, UILU-ENG-2005-1731, March
Problem Example 3 • Application domain: bibliographic network • Pre-condition 1: given multiple relations: the co-author networks of a conference over several years • Pre-condition 2: topics of publications • Problem: can we predict whether two researchers will become co-authors in the future, based on these two types of networks?
Outline • Social Network Introduction • Networks in Biological Systems • Mining on Social Networks • Link Mining • Multi-relational Mining • Problem Specification • Proposed approach
Proposed approach • Random walk with restart (figure: an example graph with nodes 1–12, each annotated with its RWR score) • Nearby nodes get higher scores; redder nodes are more relevant
Proposed approach • Basic idea • RWR serves as a measure of proximity between two nodes in a network • Model the relationships among multiple relations using RWR • Purpose • Facilitate mining more interesting patterns • Increase prediction accuracy
Measure Relationship • Q: which conference is most related to ICDM? • A: RWR! Neighborhood Formation [Sun, ICDM 2005]
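The ICDM question can be sketched as random walk with restart on a bipartite conference-author graph. A walker starting at the query node repeatedly moves to a random weighted neighbor and, with probability c, jumps back to the query; the steady-state scores satisfy r = (1 - c) W r + c e_q, and a node's score is its proximity to the query. This is an illustrative sketch, not the cited paper's implementation. The restart probability c = 0.15, the edge weights (how many times an author is listed for a conference), the handling of isolated nodes, and the author lists are all assumptions.

```python
def random_walk_with_restart(adj, query, restart=0.15, tol=1e-10, max_iter=1000):
    """Iterate r <- (1 - c) * W r + c * e_query until the L1 change is below tol.

    adj maps node -> {neighbor: weight} (symmetric). W moves a walker from u to
    neighbor v with probability w(u, v) / (sum of u's edge weights); c = restart.
    """
    nodes = list(adj)
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    r = {u: 1.0 if u == query else 0.0 for u in nodes}
    for _ in range(max_iter):
        new_r = {u: restart if u == query else 0.0 for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Isolated node: return its mass to the query (a choice of this sketch).
                new_r[query] += (1 - restart) * r[u]
                continue
            for v, w in adj[u].items():
                new_r[v] += (1 - restart) * r[u] * w / out_weight[u]
        delta = sum(abs(new_r[u] - r[u]) for u in nodes)
        r = new_r
        if delta < tol:
            break
    return r


def conference_author_graph(authors_by_conf):
    """Bipartite graph joining each conference to the authors listed for it."""
    adj = {}
    for conf, authors in authors_by_conf.items():
        c = ("conf", conf)
        adj.setdefault(c, {})
        for a in authors:
            node = ("author", a)
            adj.setdefault(node, {})
            adj[c][node] = adj[c].get(node, 0) + 1
            adj[node][c] = adj[node].get(c, 0) + 1
    return adj


def related_conferences(authors_by_conf, query_conf, restart=0.15):
    """Rank the other conferences by RWR relevance to query_conf."""
    adj = conference_author_graph(authors_by_conf)
    r = random_walk_with_restart(adj, ("conf", query_conf), restart)
    others = [c for c in authors_by_conf if c != query_conf]
    return sorted(others, key=lambda c: -r[("conf", c)])


# Hypothetical author lists (not real data).
authors_by_conf = {
    "ICDM": ["a1", "a2", "a3"],
    "KDD": ["a1", "a2", "a4"],
    "ICML": ["a3", "a5"],
    "PKDD": ["a6"],
}
print(related_conferences(authors_by_conf, "ICDM"))  # ['KDD', 'ICML', 'PKDD']
```

Because the walk keeps returning to the query, conferences reachable through many short author paths accumulate the most mass; PKDD, which shares no authors with the others, stays at zero.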
Multi-Relational Model (figure: a relation network whose nodes are the KDD, ICDM, ICML, and PKDD author networks)
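The slides do not say how the edges between author networks in the relation network are weighted; the stated plan is to model them with RWR. Purely as a stand-in assumption, the sketch below links two conferences' co-author networks by the Jaccard overlap of their co-author edge sets. The edge sets are hypothetical, and the weighted adjacency it returns (relation -> {relation: weight}) is the kind of input a weighted RWR routine accepts.

```python
from itertools import combinations


def pair(u, v):
    """Undirected co-author edge."""
    return frozenset((u, v))


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_relation_network(author_networks):
    """author_networks: relation name -> set of co-author edges.

    Returns relation name -> {other relation name: weight}, keeping only
    relations whose edge sets overlap.
    """
    rn = {name: {} for name in author_networks}
    for a, b in combinations(sorted(author_networks), 2):
        weight = jaccard(author_networks[a], author_networks[b])
        if weight > 0:
            rn[a][b] = weight
            rn[b][a] = weight
    return rn


# Hypothetical co-author edge sets for four conferences.
networks = {
    "KDD": {pair("x", "y"), pair("y", "z")},
    "ICDM": {pair("x", "y"), pair("z", "w")},
    "ICML": {pair("p", "q")},
    "PKDD": {pair("y", "z")},
}
rn = build_relation_network(networks)
print(rn["KDD"])   # {'ICDM': 0.3333333333333333, 'PKDD': 0.5}
print(rn["ICML"])  # {}
```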
Other Applications • Content-based Image Retrieval [He] • Personalized PageRank [Jeh], [Widom], [Haveliwala] • Anomaly Detection (for nodes and links) [Sun] • Link Prediction [Getoor], [Jensen] • Semi-supervised Learning [Zhu], [Zhou] • …
Summary • Social Network Analysis • Link mining • Problem: multi-relational networks • Proposed approach