Name Disambiguation in Digital Libraries

Tan Yee Fan. 2005 October 19, WING Group Meeting.

Presentation Transcript


  1. Name Disambiguation in Digital Libraries
     Tan Yee Fan
     2005 October 19, WING Group Meeting

  2. Digital libraries
     • DBLP, Citeseer, etc.
     • Information is stored as metadata records to facilitate searching
       • Author names
       • Titles
       • Publication titles
     • Inconsistency in metadata records hinders searching
       • Abbreviation of names and publication titles
       • Typographical errors

  3. Are they the same author?
     • Danny Poo
       • Danny C. C. Poo, Teck-Kang Toh, Christopher S. G. Khoo, Glenn Hong. Development of an Intelligent Web Interface to Online Library Catalog Databases. APSEC 1999: 64-7
       • Danny Chiang Choon Poo, Isaac K. C. Tan. Design of an Automatic Annotation Framework for Corporate Web Content. APSEC 2004: 384-391
     • Hui Yang
       • Maan A. Kousa, Ahmed K. Elhakeem, Hui Yang. Performance of ATM networks under hybrid ARQ/FEC error control scheme. IEEE/ACM Trans. Netw. 7(6): 917-925 (1999)
       • Hui Yang, Tat-Seng Chua. QUALIFIER: Question Answering by Lexical Fabric and External Resources. EACL 2003: 363-370

  4. Who am I, I am who?
     • Author name disambiguation
       • Given a large number of citations, how do we determine which name refers to which author?
     • Closely related problem: citation matching
       • Given a large number of citations, how do we determine which citations refer to the same paper?
     • Solutions must be scalable
       • DBLP has more than 660,000 citations
       • Citeseer has more than 730,000 documents
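To make the scalability requirement concrete, the short sketch below counts the unordered pairs a naive all-pairs comparison would have to examine. The function name `pair_count` is introduced here for illustration, and the input is the lower bound quoted on the slide, so the real figure would be somewhat higher.

```python
import math


def pair_count(n: int) -> int:
    """Number of unordered pairs among n records: n * (n - 1) / 2."""
    return math.comb(n, 2)


# Lower bound from the slide: DBLP has more than 660,000 citations.
print(f"{pair_count(660_000):,} pairwise comparisons")
```

At this size the count is on the order of 10^11, which is why an all-pairs approach is impractical even with a fast comparison function.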

  5. Ideas
     • Idea 1: determine the research field
       • Unfortunately, paper titles contain few words, and some conferences are broad in scope
     • Idea 2: use coauthor information
       • An author is likely to collaborate with a select group of people
       • This group is likely to publish a number of papers together
       • So measure the similarity of coauthor lists
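One simple way to score the similarity of two coauthor lists is the Jaccard coefficient of the two name sets. The slides do not say which measure is used, so this is only a sketch under that assumption; the function name `coauthor_similarity` and the example names are hypothetical.

```python
def coauthor_similarity(list_a, list_b):
    """Jaccard similarity of two coauthor lists, ignoring case and outer spaces."""
    set_a = {name.strip().lower() for name in list_a}
    set_b = {name.strip().lower() for name in list_b}
    if not set_a and not set_b:
        return 0.0  # two empty lists give no evidence either way
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical coauthor lists (the target author's own name is left out).
paper_1 = ["Alice Tan", "Bob Lim"]
paper_2 = ["bob lim", "Carol Ng"]
print(coauthor_similarity(paper_1, paper_2))
```

Exact string matching treats "B. Lim" and "Bob Lim" as different people, so in practice the names would need normalising first, which is itself an instance of the disambiguation problem.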

  6. Forward direction: M. Kan = M.-Y. Kan = Min-Yen Kan
     • Problem
       • Pairwise comparison of all the coauthor lists is very expensive (it does not finish even after a few days)
     • Solution
       • Soft clustering on the coauthor lists using some cheap distance measure
       • Then perform pairwise comparison within the clusters
     • What is a good soft clustering algorithm?
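The slide leaves the choice of soft clustering algorithm open. One candidate is canopy clustering, which groups records into overlapping canopies with a cheap measure and keeps the expensive comparison inside each canopy. The sketch below is an illustration of that technique, not the author's method: the cheap measure (Jaccard overlap of coauthor surnames), the thresholds `loose` and `tight`, and all function and record names are assumptions.

```python
from itertools import combinations


def surnames(coauthors):
    """Cheap key: the lower-cased last token of each coauthor name."""
    return frozenset(name.split()[-1].lower() for name in coauthors if name.strip())


def cheap_similarity(a, b):
    """Jaccard overlap of two surname sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def canopies(records, loose=0.2, tight=0.6):
    """Group record indices into overlapping canopies.

    Every record at least `loose`-similar to a centre joins its canopy;
    records at least `tight`-similar are also dropped as future centres.
    """
    keys = [surnames(r) for r in records]
    remaining = list(range(len(records)))
    result = []
    while remaining:
        centre = remaining[0]
        sims = [cheap_similarity(keys[centre], k) for k in keys]
        result.append([i for i, s in enumerate(sims) if i == centre or s >= loose])
        remaining = [i for i in remaining if i != centre and sims[i] < tight]
    return result


def candidate_pairs(records, loose=0.2, tight=0.6):
    """Pairs that share a canopy; only these get the expensive comparison."""
    pairs = set()
    for canopy in canopies(records, loose, tight):
        pairs.update(combinations(sorted(canopy), 2))
    return pairs


# Hypothetical coauthor lists, one per citation.
records = [
    ["A. Tan", "B. Lim"],
    ["Alice Tan", "Bob Lim"],
    ["C. Smith"],
]
print(candidate_pairs(records))
```

Because a record can fall into more than one canopy, a borderline record is not forced into a single cluster. This sketch still scans every record for each centre; a version meant for hundreds of thousands of citations would more likely look up candidates through an inverted index on the cheap keys.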

  7. Backward direction: This Hang Cui is not that Hang Cui
     • Difficult to determine using the metadata alone without external resources
       • Many authors have several distinct research areas
       • Each research area with different collaborators
     • Currently investigating what kind of external resource to use
       • Goooooooooogle for URLs?

  8. The end
     • But the research has just begun…