
Improving the performance of personal name disambiguation using web directories


RoyLauris




Presentation Transcript



    Slide 1:Improving the performance of personal name disambiguation using web directories

    Quang Minh Vu, Atsuhiro Takasu, Jun Adachi. IPM, 2008. Presented by Hung-Yi Cai, 2010/09/01.


    Slide 2:Outline

    Motivation, Objectives, Methodology, Experiments, Conclusions, Comments

    Slide 3:Motivation

    Searching the Internet for information about a person is an increasingly common need in information retrieval. Search results returned by search engines for a personal-name query often contain documents about several different people, because a name is usually shared by several people. Because of this name ambiguity, users have to inspect the result documents manually to filter out the people in whom they have no interest.

    Slide 4:Previous Studies


    Slide 5:Objectives

    We propose Similarity via Knowledge Base (SKB), which uses web directories to improve disambiguation performance in a Name Disambiguation System (NDS). SKB consists of two components: first, web directories are used as a knowledge base to find common contexts in documents by TF-IDF; then, the common-context measure is used to determine document similarities.

    Slide 6:TF-IDF

    Term weights are calculated from the occurrences of terms in the document concerned and in a set of documents. Tf(t, doc) is the number of times term t appears in the document doc.
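The slide defines only the tf part of the weight. As a rough illustration, here is a minimal Python sketch of TF-IDF weighting; the idf factor log(N / df) is the common textbook form and is an assumption here, since the exact variant used in the paper is not shown on the slides. Documents are assumed to be already tokenized into lists of terms.

```python
import math


def tf(term: str, doc: list[str]) -> int:
    """Tf(t, doc): the number of times term t appears in doc, as defined on the slide."""
    return doc.count(term)


def idf(term: str, docs: list[list[str]]) -> float:
    """Assumed standard inverse document frequency log(N / df); 0 for terms in no document."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


def tfidf_vector(doc: list[str], docs: list[list[str]]) -> dict[str, float]:
    """Weight each distinct term of doc by tf * idf computed over the collection docs."""
    return {t: tf(t, doc) * idf(t, docs) for t in set(doc)}
```

A term that occurs in every document gets idf 0 and therefore drops out of the comparison, which is why a shared personal name on its own does not make two documents look similar under this weighting.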

    Slide 7:Methodology

    In SKB, web directories are used to measure the features of terms in a document:
    1. Measurement of term weights using a knowledge base
       - A knowledge base
       - Modification of term weights in documents
       - Modification of term weights in directories
    2. Measurement of document similarities
       - Find directories close in topic to the document
       - Measure document similarities
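The two SKB stages can be sketched roughly as follows. This is a hypothetical Python illustration, not the authors' method: it assumes each directory is represented by a term-weight vector, picks the k directories closest to a document by cosine similarity (k = 20 matches the number of representative directories used in the experiments), multiplies the weights of document terms that also occur in those directories by an invented `boost` factor, and compares the reweighted documents by cosine similarity. The function names `representative_directories`, `boost_common_context` and `skb_similarity` are made up for this sketch, and the directory-side weight modification is omitted.

```python
import math

Vector = dict[str, float]  # term -> weight, e.g. TF-IDF


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def representative_directories(doc: Vector, directories: dict[str, Vector], k: int = 20) -> list[str]:
    """Names of the k directories whose term vectors are closest in topic to doc."""
    return sorted(directories, key=lambda name: cosine(doc, directories[name]), reverse=True)[:k]


def boost_common_context(doc: Vector, directories: dict[str, Vector], reps: list[str], boost: float = 2.0) -> Vector:
    """Hypothetical reweighting: multiply weights of terms shared with the representative directories."""
    shared = set().union(*(directories[name] for name in reps))  # union of directory terms
    return {t: w * boost if t in shared else w for t, w in doc.items()}


def skb_similarity(a: Vector, b: Vector, directories: dict[str, Vector], k: int = 20, boost: float = 2.0) -> float:
    """Cosine similarity after boosting each document's common-context terms."""
    a2 = boost_common_context(a, directories, representative_directories(a, directories, k), boost)
    b2 = boost_common_context(b, directories, representative_directories(b, directories, k), boost)
    return cosine(a2, b2)
```

The boost is only a stand-in for whatever weight modification the paper actually applies; the point of the sketch is that terms backed by a topically close directory count for more than incidental terms when two documents are compared.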

    Slide 8:Name Disambiguation System

    The operational details are as follows:
    1. Preprocessing of documents
    2. Calculation of document similarities
    3. Discrimination by reranking documents
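The three NDS steps can be sketched as a small pipeline. This is a hypothetical Python illustration under loud assumptions: preprocessing is reduced to lowercasing and tokenizing, Jaccard overlap stands in for the document similarity (SKB would replace it), and "reranking" is read as reordering the search results by similarity to a seed document the user is interested in. None of these choices is confirmed by the slides.

```python
import re
from typing import Callable


def preprocess(text: str) -> set[str]:
    """Step 1 (simplified): lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Step 2 placeholder similarity; an SKB-style measure would be plugged in here."""
    return len(a & b) / len(a | b) if a | b else 0.0


def rerank(seed: str, results: list[str],
           sim: Callable[[set[str], set[str]], float] = jaccard) -> list[str]:
    """Step 3: reorder search results by decreasing similarity to the seed document."""
    seed_terms = preprocess(seed)
    return sorted(results, key=lambda doc: sim(seed_terms, preprocess(doc)), reverse=True)
```

Passing the similarity function as a parameter keeps the reranking step independent of the measure, so the same pipeline can compare a baseline against an SKB-style similarity.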

    Slide 9:Experiments

    Step 1. Data sets
    - Documents of people
    - Creation of pseudo namesake document sets and real namesake document sets
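The slides do not describe how the pseudo namesake sets were built. A common way to build such sets is to merge documents about different people and replace each real name with one shared placeholder, keeping the true identity as the gold label. The Python sketch below follows that assumed approach; `make_pseudo_namesake_set` and the `PSEUDO_NAME` placeholder are hypothetical.

```python
import re


def make_pseudo_namesake_set(docs_by_person: dict[str, list[str]],
                             pseudo_name: str = "PSEUDO_NAME") -> list[tuple[str, str]]:
    """Merge documents of different people, replacing each real name with one shared pseudo-name.

    Returns (text, true_person) pairs; true_person is kept as the gold label for evaluation.
    """
    merged = []
    for person, docs in docs_by_person.items():
        pattern = re.compile(re.escape(person), re.IGNORECASE)
        for doc in docs:
            # A function replacement avoids backslash escapes in pseudo_name being interpreted.
            merged.append((pattern.sub(lambda _m: pseudo_name, doc), person))
    return merged
```

Because the gold labels are known by construction, pseudo sets can be made as large as needed, while real namesake sets test the method on genuine ambiguity.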

    Slide 10:Experiments

    Step 2. Web directory structures

    Slide 11:Experiments

    Step 3. Baseline methods
    SKB is compared with two conventional methods:
    - VSM: the weights of the terms are calculated by TF-IDF and used to build the feature vectors of the documents.
    - NER: entity names are extracted from the documents with the LingPipe software, and these names are used to construct the feature vectors of the documents (the vector components are binary values).
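The NER baseline's binary feature vectors can be illustrated with a short Python sketch. It assumes the entity names have already been extracted (LingPipe itself is not called here) and uses cosine similarity, which the slides do not name, to compare two binary vectors. The helper names are hypothetical.

```python
import math


def binary_vector(entities: set[str], vocabulary: list[str]) -> list[int]:
    """1 if the entity name occurs in the document, else 0, for each name in the vocabulary."""
    return [1 if name in entities else 0 for name in vocabulary]


def cosine_binary(u: list[int], v: list[int]) -> float:
    """Cosine similarity of binary vectors: shared names / sqrt(|names_u| * |names_v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(u)) * math.sqrt(sum(v))  # for 0/1 entries, sum equals sum of squares
    return dot / norm if norm else 0.0
```

For example, two documents that each mention two entities and share one of them get a similarity of 0.5, and documents with no entity in common get 0.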

    Slide 12:Experiments

    Step 4. Evaluation metrics
    We recorded the precision values at 11 recall points (0%, 10%, ..., 90%, and 100%) and denoted them P(doc_i, 0%), P(doc_i, 10%), ..., P(doc_i, 90%), and P(doc_i, 100%), respectively.
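The 11 values P(doc_i, 0%), ..., P(doc_i, 100%) for one query document doc_i can be computed from a ranked result list. The Python sketch below assumes the usual interpolated definition (the best precision reached at any recall level at or above each point); the slides do not state whether interpolation was used, nor how the values are averaged over documents, so both are left as assumptions.

```python
def eleven_point_precision(ranked_relevance: list[bool], total_relevant: int) -> list[float]:
    """Interpolated precision at recall 0%, 10%, ..., 100% for one ranked list.

    ranked_relevance[i] is True if the document at rank i+1 refers to the same person
    as the query document doc_i; total_relevant counts all such documents in the set.
    """
    if total_relevant == 0:
        return [0.0] * 11
    points = []  # (recall, precision) after each rank
    hits = 0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
        points.append((hits / total_relevant, hits / rank))
    result = []
    for level in range(11):
        r = level / 10
        # Interpolation: best precision achieved at any recall >= r (0 if r is never reached).
        candidates = [p for rec, p in points if rec >= r]
        result.append(max(candidates) if candidates else 0.0)
    return result
```

For the ranking relevant, non-relevant, relevant, non-relevant with two relevant documents, the sketch gives precision 1.0 up to 50% recall and 2/3 from 60% to 100% recall.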

    Slide 13:Experiments

    Step 5. Experimental results: the overall performance of each method. In this experiment, we set the window size n = 50 and the number of representative directories k = 20. We set the frequency document ratio threshold for SKB2 to r = 5.
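The slides do not define the window size n. One plausible reading, sketched below in Python as an assumption, is that only the n tokens before and after each occurrence of the person's name are kept as the document's context, with the name tokens themselves excluded. `context_window` is a hypothetical helper.

```python
def context_window(tokens: list[str], name_tokens: list[str], n: int = 50) -> list[str]:
    """Keep up to n tokens before and after every occurrence of the name, excluding the name itself."""
    m = len(name_tokens)
    keep: set[int] = set()
    name_positions: set[int] = set()
    for i in range(len(tokens) - m + 1):
        if tokens[i:i + m] == name_tokens:
            name_positions.update(range(i, i + m))
            keep.update(range(max(0, i - n), min(len(tokens), i + m + n)))
    # Return the surviving tokens in their original order.
    return [tokens[j] for j in sorted(keep - name_positions)]
```

Under this reading, a larger n brings in more context terms but also more text unrelated to the person, which is the trade-off the window-size experiment explores.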

    Slide 14:Experiments

    Step 5. Experimental results: performance of SKB2 when varying the frequency ratio threshold

    Slide 15:Experiments

    Step 5. Experimental results: performance of the SKB systems when varying the window size

    Slide 16:Experiments

    Step 5. Experimental results: performance of the SKB systems when varying the number of representative directories

    Slide 17:Experiments

    Step 5. Experimental results: performance of each method on the real namesake document sets


    Slide 18:Conclusions

    Disambiguation of people will be a trend in web search, and we propose a new method that uses web directories as a knowledge base to improve disambiguation performance. The experimental results showed a significant improvement of our system over the other methods, and we also verified the robustness of our methods experimentally with different web directory structures and different parameter values.


    Slide 19:Comments

    Advantages
    - Requires little preparation
    - Covers a broad range of people
    Shortcomings
    - Cost of computation is proportional
    - Some mistakes
    Applications
    - Information retrieval
