Applying Key Phrase Extraction to aid Invalidity Search Manisha Verma, Vasudeva Varma SIEL, LTRC, IIIT Hyderabad
Outline • Introduction • Related Work • Motivation and Contribution • Approaches • Experiments and Results • Future Work • Questions ???
Invalidity Search • The task is to uncover patents or other published prior art that may render a granted patent invalid • Find prior art that the patent examiner overlooked so that a patent can be declared invalid.
Input and Process • INPUT • A patent application • PROCESS • Use existing search engines to find similar work. • MANUALLY create queries, go through many documents (articles, granted patents, etc.) and identify similar ones.
Related Work • Two ways of approaching the problem: • Create a query from a patent and try different retrieval models • Use different models to create a query from a patent, then use an existing retrieval model. • Our work employs the second approach.
Approach 1 • Use the claim text or abstract to create a query from the patent. • The following have been used to improve recall and precision: • Re-ranking using several features • Cluster-based pseudo relevance feedback • Scoring based on subtopics, etc.
Approach 2 • Select words/phrases from different sections of a patent • Find out which section yields the best queries • Select words from a patent using tf-idf. • Assign a weight to each word to mark its importance. • Common weighting methods explored are tf and tf-idf. • Identify the optimal query length, i.e. the number of words to keep in a query generated from a patent. • Determine this value empirically.
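The tf-idf selection and weighting steps above can be sketched as follows. This is a minimal illustration, not the method of any cited work: it assumes regex tokenization, a tiny hypothetical stopword list, and a smoothed idf. The function name `tfidf_query` is hypothetical, and `k` stands in for the empirically tuned query length.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "for", "is", "on", "with", "said"}

def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def tfidf_query(patent_text: str, collection: list[str], k: int = 10) -> list[tuple[str, float]]:
    """Return the k highest tf-idf terms of patent_text, each paired with its weight."""
    n_docs = len(collection)
    df = Counter()
    for doc in collection:
        df.update(set(tokenize(doc)))  # document frequency: count each term once per doc
    tf = Counter(tokenize(patent_text))
    # Smoothed idf (assumption) so terms absent from the collection get a finite weight.
    scores = {t: f * math.log((n_docs + 1) / (df[t] + 1)) for t, f in tf.items()}
    # Sort by descending weight; break ties alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

The returned (term, weight) pairs can be passed to a retrieval model as a weighted query; sweeping `k` over a range is one way to find the query length empirically.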
Motivation and Contribution • Explore and evaluate different ways to select phrases for building queries from patents. • Although several key phrase extraction approaches have been proposed in the literature, they have not been used to create queries for the invalidity search task. • Evaluate and analyze the performance of queries created with state-of-the-art unsupervised and supervised key phrase extraction techniques.
Key Phrase Extraction Techniques • Unsupervised • TextRank (R. Mihalcea et al.) • SingleRank (X. Wan et al.) • Tf-Idf • Tf • Supervised • RankPhrase (X. Jiang et al.) • KEA (I. H. Witten et al.)
Unsupervised Approaches • TextRank • Represent the text as a graph using co-occurrence statistics. • Run an iterative algorithm to find the dominant nodes (words) in the graph. • SingleRank • Same approach as TextRank • While TextRank selects only phrases containing the top-ranked words, SingleRank does not filter out any low-scoring words.
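The TextRank steps above (co-occurrence graph, iterative ranking, phrase selection from top-ranked words) can be sketched as below. This is a simplified sketch under stated assumptions: input is already tokenized, and the part-of-speech filter of the original TextRank (which keeps only nouns and adjectives as graph nodes) is skipped. The window of 2, damping factor 0.85, and keeping the top third of words follow Mihalcea and Tarau; the fixed iteration count, the phrase score (sum of word scores), and the function names `textrank_words` and `textrank_phrases` are illustrative assumptions.

```python
from collections import defaultdict

def textrank_words(tokens, window=2, d=0.85, iters=50):
    """Score words with PageRank over an undirected co-occurrence graph (TextRank-style)."""
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        # Link w to every other word that falls within the co-occurrence window.
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        # Each new score is computed from the previous iteration's scores.
        scores = {
            w: (1 - d) + d * sum(scores[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }
    return scores

def textrank_phrases(tokens, top_fraction=1 / 3, **kwargs):
    """Keep the top-ranked words, then merge runs of adjacent kept words into phrases."""
    scores = textrank_words(tokens, **kwargs)
    n_keep = max(1, int(len(scores) * top_fraction))
    # Rank by descending score; ties are broken alphabetically for determinism.
    keep = set(sorted(scores, key=lambda w: (-scores[w], w))[:n_keep])
    phrases, current = {}, []
    for tok in list(tokens) + [None]:  # None is a sentinel that flushes the last run
        if tok in keep:
            current.append(tok)
        elif current:
            phrase = " ".join(current)
            # Phrase score = sum of its word scores (an illustrative choice).
            phrases[phrase] = sum(scores[w] for w in current)
            current = []
    return sorted(phrases.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, `textrank_phrases("fluid pump impeller fluid pump valve".split(), top_fraction=0.5)` merges the two top-ranked words into the single phrase "fluid pump", because they always appear next to each other.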
Supervised Approaches • KEA • Represent candidate key phrases with features. • Train a classifier on manually annotated data. • RankPhrase • Treats key phrase extraction as a ranking problem • Uses the same features as KEA
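A minimal sketch of the feature step, assuming the two classic KEA features: TF×IDF and the relative position of a phrase's first occurrence. The +1 smoothing in the idf term and the function name `kea_features` are assumptions for illustration; `doc_freq` is a hypothetical mapping from a phrase to the number of collection documents containing it.

```python
import math

def kea_features(phrase, doc_tokens, doc_freq, n_docs):
    """Return KEA-style TF x IDF and first-occurrence features for one candidate phrase.

    doc_freq maps a phrase to the number of collection documents containing it.
    Returns None if the phrase does not occur in doc_tokens.
    """
    words = phrase.split()
    n = len(words)
    # Token offsets where the phrase occurs as a contiguous word sequence.
    positions = [i for i in range(len(doc_tokens) - n + 1) if doc_tokens[i:i + n] == words]
    if not positions:
        return None
    size = len(doc_tokens)
    tf = len(positions) / size
    # +1 smoothing (an assumption here) keeps the log defined for unseen phrases.
    idf = -math.log2((doc_freq.get(phrase, 0) + 1) / (n_docs + 1))
    return {"tfidf": tf * idf, "first_occurrence": positions[0] / size}
```

KEA discretizes such features and trains a Naive Bayes classifier on them; RankPhrase instead learns a ranking over the same features, so only the learning step differs between the two.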
Training Supervised Approaches • To annotate patents with key phrases, take some applications that have relevance judgments. For every phrase in the document: • Fire it as a query. • Calculate the MAP and recall of that phrase (using the relevance judgments). • Select phrases with high MAP and recall. • Prune phrases based on tf-idf scores. • Use the remaining phrases as the key phrases for the document. • Use sample documents annotated this way to train the supervised approaches.
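The per-phrase scoring step above can be sketched as follows, assuming each phrase has already been fired as a query and its ranked results collected. For a single query, MAP reduces to average precision (AP). The thresholds `min_ap` and `min_recall` and all function names are hypothetical, and the tf-idf pruning step is not shown.

```python
def average_precision(ranked, relevant):
    """AP of one ranked list: mean precision at each relevant hit, over all relevant docs."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def recall(ranked, relevant):
    """Fraction of relevant documents that appear anywhere in the ranked list."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0

def select_phrases(results, relevant, min_ap=0.1, min_recall=0.2):
    """results: dict mapping phrase -> ranked doc ids returned when it is fired as a query."""
    selected = []
    for phrase, ranked in results.items():
        ap, rec = average_precision(ranked, relevant), recall(ranked, relevant)
        if ap >= min_ap and rec >= min_recall:
            selected.append((phrase, ap, rec))
    # Best phrases first: by AP, then recall, then alphabetically.
    return sorted(selected, key=lambda x: (-x[1], -x[2], x[0]))
```

With relevant documents {"d1", "d3"}, a phrase whose query returns ["d1", "d2", "d3"] gets AP = (1/1 + 2/3) / 2 = 5/6 and recall 1.0, so it is kept under these example thresholds.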
Our Data • 1.3 million patents (NTCIR) • 1000 patent applications • For each application, a list of patents that claim the same invention is provided.
Results • The experiments indicate that key phrase extraction techniques improve invalidity search results. • Queries created with unsupervised and supervised key phrase extraction approaches perform better than those formed using tf or tf-idf. • Among the supervised approaches, queries built from phrases extracted by KEA show 29% and 37% improvement in MAP over TextRank and tf-idf, respectively.
Future Work • Weight the queries generated by both approaches • Try the approaches on different patent collections • Explore combining the two approaches for query construction
References • X. Xue and W. B. Croft. Automatic query generation for patent search. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM '09), pages 2037–2040, New York, NY, USA, 2009. ACM. • R. Mihalcea and P. Tarau. TextRank: Bringing order into texts. In Proceedings of EMNLP, 2004. • X. Xue and W. B. Croft. Transforming patents into prior-art queries. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '09), pages 808–809, New York, NY, USA, 2009. ACM. • X. Jiang, Y. Hu, and H. Li. A ranking approach to key phrase extraction. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '09), pages 756–757, New York, NY, USA, 2009. ACM.