1 / 1

Work partially funded by Rhône-Alpes and Saint-Étienne Métropole. August 2012.

Combining relations and text in scientific network clustering David COMBE 1 , Christine LARGERON 1 , El ő d EGYED-ZSIGMOND 2 , Mathias GÉRY 1. 1. CNRS , UMR 5516, Laboratoire Hubert Curien, Université de Saint-Étienne, Jean-Monnet, Saint-Étienne, France

keola
Download Presentation

Work partially funded by Rhône-Alpes and Saint-Étienne Métropole. August 2012.

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Combining relations and text in scientific network clustering David COMBE1, Christine LARGERON1, Előd EGYED-ZSIGMOND2, Mathias GÉRY1 1. CNRS, UMR 5516, Laboratoire Hubert Curien, Université de Saint-Étienne, Jean-Monnet, Saint-Étienne, France Email : {david.combe, christine.largeron, mathias.gery}@univ-st-etienne.fr 2. Université de Lyon UMR 5205 CNRS, LIRIS Email : elod.egyed-zsigmond@insa-lyon.fr • Detection of communities in social networks using the relationships between the actors and attributes describing them. • Construction of a dataset with ground truth: a scientific social network in which textual data is associated to each vertex and the classes are known. • Proposition of different scenarios for the detection of communities and evaluation of the performance. Aim of the work Problemstatement Dataset Let G = (V, E) a graph describing a social network. V is the set of the vertices and E is the set of edges such that each vertex vi V is associated with a vector of textual attributes where wij is the tf-idf weight of the term tj in the document di. The attributed graph clustering problem is to build a partition of V such that: there should be many edges within each cluster and relatively few between the clusters; two vertices belonging to the same cluster are more similar in terms of attributes, than two vertices belonging to two different clusters. • Vertices: 99 authors from 2 conferences: SAC’09 & IJCAI’09. • Edges: co-participation network generated from DBLP. • Vertices document: the abstracts and titles of the articles were extracted from the websites of the conferences. • Ground truths partitions: • PS= A  B,C  D (conferences) • PT= A,B C, D (research areas) • PTS = A,B,C,D (sessions) SAC SAC IJCAI IJCAI A A Bioinformatics Bioinformatics B C B C Robots Robots Linearcombination D D Constraints Constraints TS1: Structure-based clustering on attribute weighted graph TS3: Linearcombination TS2: Attribute-based clustering on structural distance Cosine distance matrix Vertices distance matrix Textual distance (cosine) matrix Textual distance (cosine) matrix Shortestpathdistance matrix Combined distances matrix Shortestpathprocessing Weighted-graphs clusteringalgorithm  1- Information network Graph valuedwithtextual distance Graph valuedwithtextual distance Graph valuedwithtextual distance Information network Results Conclusion and perspectives Accuracycriterion • Simplestmethodswith no parameter tend to achieve a good classification. • Scenarios considerdifferent aspects of data and givedifferentresults. • Properties of the produced classes depend on the scenario (production of disconnected classes…). • Weintend to help choosing the best clustering scenario for a givendataset. Hierarchicalagglomerative clustering Hierarchicalagglomerative clustering Workpartiallyfundedby Rhône-Alpes and Saint-Étienne Métropole. August 2012.

More Related