POLYPHONET: An Advanced Social Network Extraction System from the Web (WWW2006) • Yutaka Matsuo, National Institute of Advanced Industrial Science and Technology, y.matsuo@aist.go.jp • Junichiro Mori, University of Tokyo, Hongo 7-3-1, Tokyo 113-8656, Japan, jmori@mi.ci.i.u-tokyo.ac.jp • Masahiro Hamasaki, National Institute of Advanced Industrial Science and Technology, hamasaki@ni.aist.go.jp • Finding Social Network for Trust Calculation (ECAI 2004): Yutaka Matsuo, Hironori Tomobe, Koiti Hasida and Mitsuru Ishizuka
ABSTRACT • Social networks in the Semantic Web: • Knowledge management, • Information retrieval, • Ubiquitous computing. • POLYPHONET: • Extracts relations between persons • Detects groups of persons • Obtains keywords for a person.
Introduction and Related work – 1/3 • Social Network : • “Please indicate which persons you would regard as your friend.” • Social networking services (SNSs) • Friendster : http://www.friendster.com/ • Orkut : http://www.orkut.com/ • Imeem : http://www.imeem.com/ • Yahoo! 360° : http://360.yahoo.com/ • Web of trust • Ontology construction
Introduction and Related work – 2/3 • Referral Web (1995): • Social network extraction system from the Web • Estimates the strength of the relation between two persons X and Y by putting the query “X and Y” to a search engine. • Flink : • Extracts online social networks of a Semantic Web community • Given a set of names as input, the component uses a search engine to obtain hit counts.
Introduction and Related work – 3/3 • Name disambiguation probability model • Co-occurrence information • provided by a search engine • to detect the proof of relations • Google-Hacks [book] • PageRank, HITS • Web graphs • Link structure of Web pages is seen as a social network.
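Social Network Extraction – 1/4 • Nodes and Edges • Nodes: a list of persons is given beforehand • JSAI2003, JSAI2004, JSAI2005 and UbiComp2005 • Edges between nodes are added using a search engine. • Co-occurrence measures ($n_X$, $n_Y$: hit counts for X and for Y; $n_{X \wedge Y}$: hit count for “X and Y”; $n_{X \vee Y}$: hit count for “X or Y”) • matching coefficient, $n_{X \wedge Y}$ • mutual information, $\log(n_{X \wedge Y} / (n_X n_Y))$ • Dice coefficient, $2 n_{X \wedge Y} / (n_X + n_Y)$ • Jaccard coefficient, $n_{X \wedge Y} / n_{X \vee Y}$ • overlap coefficient, $n_{X \wedge Y} / \min(n_X, n_Y)$ [ECAI 2004] • cosine, $n_{X \wedge Y} / \sqrt{n_X n_Y}$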
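The co-occurrence measures listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the POLYPHONET implementation: the function name `cooccurrence_measures` and its parameters are hypothetical, the hit counts are assumed to come from a search engine, and the “X or Y” count is assumed to be derived by inclusion-exclusion rather than queried separately.

```python
import math


def cooccurrence_measures(n_x, n_y, n_xy):
    """Compute the six co-occurrence measures from raw hit counts.

    n_x, n_y: hit counts for the names X and Y alone.
    n_xy:     hit count for the conjunctive query "X and Y".
    """
    if n_x <= 0 or n_y <= 0:
        raise ValueError("n_x and n_y must be positive")
    if not 0 <= n_xy <= min(n_x, n_y):
        raise ValueError("n_xy must lie between 0 and min(n_x, n_y)")

    # Assumption: pages mentioning X or Y, obtained by inclusion-exclusion.
    n_or = n_x + n_y - n_xy

    return {
        "matching": n_xy,
        # The logarithm is undefined when the names never co-occur.
        "mutual_information": (
            math.log(n_xy / (n_x * n_y)) if n_xy > 0 else float("-inf")
        ),
        "dice": 2 * n_xy / (n_x + n_y),
        "jaccard": n_xy / n_or,
        "overlap": n_xy / min(n_x, n_y),
        "cosine": n_xy / math.sqrt(n_x * n_y),
    }


if __name__ == "__main__":
    # Made-up hit counts, for illustration only.
    for name, value in cooccurrence_measures(100, 400, 50).items():
        print(f"{name}: {value:.4f}")
```

The overlap coefficient divides by the smaller count, so a person with few pages is not penalised for co-occurring with a much more visible one, whereas Dice and Jaccard shrink as the counts become unbalanced.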
Advanced Extraction • Relationship: • Relationships between people • 30 kinds of relationships • http://vocab.org/relationship • POLYPHONET • Co-author: co-authors of a technical paper • Lab: members of the same laboratory or research institute • Proj: members of the same project or committee • Conf: participants in the same conference or workshop
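Advanced Extraction - Class of Relation 1/2 • GoogleTop(“X Y”, 5): the top five pages returned for the query “X Y” • Classifier: C4.5 • Evaluation: five-fold cross validation (JSAI case) • Word groups: high tf-idf terms chosen from a manually categorized data set.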
Advanced Extraction – Scalability 1/3 • For example: • The network density of the JSAI2003 social network is 0.0196 with a threshold of 0.2.
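To make the density figure concrete, the sketch below thresholds pairwise scores into edges and computes the density of the resulting network. It assumes the usual definition of density for an undirected graph (edges present divided by edges possible) and assumes an edge is kept when its score is at least the threshold. The function names and the sample scores are hypothetical.

```python
def threshold_edges(scores, threshold):
    """Keep the pairs whose score reaches the threshold.

    scores: dict mapping an unordered pair (a, b) to a co-occurrence score.
    Assumption: an edge is kept when score >= threshold.
    """
    return [pair for pair, score in scores.items() if score >= threshold]


def network_density(num_nodes, num_edges):
    """Fraction of possible undirected edges that are present."""
    if num_nodes < 2:
        raise ValueError("density needs at least two nodes")
    possible = num_nodes * (num_nodes - 1) // 2
    return num_edges / possible


if __name__ == "__main__":
    # Made-up scores for five persons, for illustration only.
    scores = {
        ("A", "B"): 0.35,
        ("A", "C"): 0.05,
        ("B", "D"): 0.22,
        ("C", "E"): 0.10,
    }
    edges = threshold_edges(scores, 0.2)
    print(edges, network_density(5, len(edges)))
```

A low density means most pairs share no edge, so a scalable extractor should avoid issuing one search query per pair when most of those queries will fall below the threshold.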
Advanced Extraction – Intellectual link 1/6 • Intellectual link : • A relation between a pair of persons with similar interests or citations • Evaluation : • They plot the probability that the two persons will attend the same session at a JSAI conference. • Idea : • If two persons research very similar topics, their distributions of word co-occurrences will be similar.
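The idea on this slide can be illustrated by comparing two persons' word co-occurrence vectors. The slides do not say which similarity measure is used, so cosine similarity is an assumption here, as are the function name `cosine_similarity` and the sample weights.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors given as word -> weight dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical rows of a person-word matrix (word -> co-occurrence weight).
    person_a = {"semantic web": 3.0, "ontology": 4.0}
    person_b = {"semantic web": 4.0, "ontology": 3.0}
    person_c = {"robotics": 5.0}
    print(cosine_similarity(person_a, person_b))
    print(cosine_similarity(person_a, person_c))
```

Two researchers who never co-occur on any page can still score highly under this measure, which is what separates an intellectual link from the co-occurrence edges extracted earlier.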
Advanced Extraction – Intellectual link 2/6 • Keyword extraction • Termex [37]
Advanced Extraction – Intellectual link 3/6 • Keyword extraction • 567 researchers with 3981 pages • They gave questionnaires to 10 researchers and defined the correct set of keywords.
Advanced Extraction – Intellectual link 4/6 • $\chi^2$ • idf • hit
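Reading the three labels on this slide as ways of scoring a candidate keyword for a person is an assumption, since the slide gives only the names. Under that reading, a minimal sketch computes the textbook χ² statistic on a 2×2 contingency table, the textbook idf, and the raw co-occurrence hit count. The function names are hypothetical, and the total page count `n_total` is a parameter that a real system would have to estimate.

```python
import math


def chi_square(n_x, n_w, n_xw, n_total):
    """Chi-square statistic for the 2x2 table of person X versus word w.

    n_x:     pages mentioning the person
    n_w:     pages mentioning the word
    n_xw:    pages mentioning both
    n_total: assumed size of the page collection
    """
    a = n_xw                # person and word
    b = n_x - n_xw          # person without word
    c = n_w - n_xw          # word without person
    d = n_total - a - b - c  # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return n_total * (a * d - b * c) ** 2 / denom


def idf(n_w, n_total):
    """Inverse document frequency of a word."""
    return math.log(n_total / n_w)


def hit(n_xw):
    """Raw co-occurrence hit count, used directly as a score."""
    return n_xw


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(chi_square(20, 20, 10, 100), idf(20, 100), hit(10))
```

χ² is zero when the word appears with the person exactly as often as chance predicts, so very common words score low even when their raw hit count is high.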
Conclusion • This paper describes a social network mining approach that uses the Web and organizes its methods into short pseudocode fragments. • New aspects of social networks are investigated: classes of relations, scalability, and a person-word matrix. • Every algorithm is implemented in POLYPHONET.