Automatic Categorization of Patent Applications
Presentation to the 3rd IPC Workshop, WIPO, Feb. 25-26, 2013
• The need for automatic categorization of patent applications
• The purpose of automatic categorization
• How does automatic categorization work?
• A quick word about the IPCCAT technology
• Measuring categorization accuracy
• Strategy to use an automatic categorizer
• IPCCAT Demo
The Need for Automatic Categorization of Patent Applications
• Growth in number of patent applications
  • 2’140’600 patent applications worldwide in 2011
  • Up from 1’050’700 in 1995 (more than doubled in 16 years)
  • Source: WIPO IP Indicators
• Large (and growing) number of IPC categories
  • 631 Sub-Classes in IPC 2011
  • 7’392 Main Groups in IPC 2011
  • Source: IPCCAT Help File
The Purpose of Automatic Categorization
• Accelerate patent application processing at Patent Offices
• Should not be used as a fully automated categorizer
  • Average error rate of 5 to 10% at Class level, up to 20% or more at Main Group level
  • Batch classification is possible but requires downstream elimination of predictions that fall below a given confidence threshold
• Rather, an assistant for human examiners
  • Suggests the most probable IPC categories
  • Interacts with the examiner (who can ask for more predictions at a different IPC level, force a given domain, etc.)
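The downstream elimination step mentioned for batch classification could look like the sketch below. The `Prediction` structure, the 0-to-1 confidence scale, the 0.8 threshold and the sample data are all assumptions made for illustration; they are not taken from IPCCAT.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    """One categorizer output (hypothetical structure, for illustration)."""
    doc_id: str
    ipc_code: str
    confidence: float  # assumed to lie in [0, 1]


def split_by_confidence(predictions, threshold=0.8):
    """Separate predictions kept automatically from those left to an examiner."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    for_review = [p for p in predictions if p.confidence < threshold]
    return accepted, for_review


# Made-up batch of three applications.
batch = [
    Prediction("app-001", "A61K", 0.93),
    Prediction("app-002", "H04L", 0.41),
    Prediction("app-003", "G06F", 0.80),
]
accepted, for_review = split_by_confidence(batch, threshold=0.8)
print([p.doc_id for p in accepted])    # app-001 and app-003 pass the threshold
print([p.doc_id for p in for_review])  # app-002 goes to a human examiner
```

Raising the threshold keeps fewer automatic predictions and sends more applications to examiners, which matches the "assistant, not replacement" role described on this slide.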
How Does Automatic Categorization Work?
• Train an artificial intelligence program to recognize typical examples for each IPC category
  • Provide already-classified patents for training
  • Essential: balance the number of examples across categories
  • The more examples, the better
• Test the program
  • Submit patent applications whose IPC categories are already known
  • Calculate categorization accuracy
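The train-then-test loop above can be illustrated with a deliberately tiny sketch. The word-overlap "model" below is a stand-in chosen for brevity, not the IPCCAT algorithm, and the texts, helper names and the cap of two examples per category are invented for the example.

```python
from collections import Counter, defaultdict


def balance(examples, per_category):
    """Keep at most `per_category` examples for each category."""
    kept, seen = [], Counter()
    for text, category in examples:
        if seen[category] < per_category:
            kept.append((text, category))
            seen[category] += 1
    return kept


def train(examples):
    """Count how often each word occurs in each category."""
    word_counts = defaultdict(Counter)
    for text, category in examples:
        word_counts[category].update(text.lower().split())
    return word_counts


def predict(model, text):
    """Return the category whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {cat: sum(counts[w] for w in words) for cat, counts in model.items()}
    return max(scores, key=scores.get)


def accuracy(model, test_set):
    """Fraction of test documents whose known category is predicted."""
    hits = sum(predict(model, text) == category for text, category in test_set)
    return hits / len(test_set)


# Toy data: (text, known IPC sub-class). Real training uses full patent texts.
classified = [
    ("pharmaceutical composition comprising an active compound", "A61K"),
    ("tablet formulation of a medicinal compound", "A61K"),
    ("oral dosage form with medicinal coating", "A61K"),
    ("data packet routing in a communication network", "H04L"),
    ("network protocol for packet transmission", "H04L"),
]
held_out = [
    ("medicinal compound in tablet form", "A61K"),
    ("packet transmission over a network", "H04L"),
]

model = train(balance(classified, per_category=2))
print(accuracy(model, held_out))
```

The held-out documents play the role of applications whose IPC categories are already known: they are never shown to `train`, so the accuracy figure reflects unseen input.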
A Quick Word About The IPCCAT Technology
• The IPCCAT project was designed, managed and financed by WIPO from 2002 to 2004
• A strictly statistical approach
  • No linguistic or other human-defined rules
  • So it is language independent
  • But the indexing method is adapted to each supported language (English, French, Spanish) so as to process collocations correctly
• Categorization algorithm: Neural Networks of the Winnow type, improved by Simple Shift with the help of WIPO
• Validated through several competitions on the Internet (latest one: the CLEF-2010 project)
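For readers unfamiliar with Winnow, here is a minimal sketch of the textbook binary Winnow learner (multiplicative promotion and demotion of weights on mistakes). It is not the IPCCAT implementation, whose improvements are not described in these slides; the six-word vocabulary, the threshold of half the feature count and the training samples are assumptions for illustration.

```python
def winnow_train(samples, n_features, alpha=2.0, epochs=10):
    """Train a binary Winnow classifier.

    samples: list of (active_feature_indices, label) with label in {0, 1}.
    Returns the weight vector and the decision threshold.
    """
    weights = [1.0] * n_features
    threshold = n_features / 2
    for _ in range(epochs):
        for active, label in samples:
            predicted = 1 if sum(weights[i] for i in active) >= threshold else 0
            if predicted == label:
                continue  # Winnow only updates on mistakes
            factor = alpha if label == 1 else 1.0 / alpha
            for i in active:  # promote or demote only the active features
                weights[i] *= factor
    return weights, threshold


def winnow_predict(weights, threshold, active):
    """Return 1 if the summed weights of the active features reach the threshold."""
    return 1 if sum(weights[i] for i in active) >= threshold else 0


# Hypothetical vocabulary; each document is the list of word indices it contains.
vocabulary = ["compound", "tablet", "network", "packet", "method", "system"]
samples = [
    ([0, 1, 4], 1),  # label 1: belongs to the target category
    ([2, 3, 5], 0),  # label 0: does not belong
    ([0, 4], 1),
    ([3, 4], 0),
]
weights, threshold = winnow_train(samples, n_features=len(vocabulary))
print(winnow_predict(weights, threshold, [0, 1]))  # document with "compound", "tablet"
print(winnow_predict(weights, threshold, [2, 3]))  # document with "network", "packet"
```

A multi-category setup would typically train one such learner per IPC category and rank categories by score; that extension is an assumption here, not something the slides state.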
Measuring Categorization Accuracy
• Counting the correct categories found is not enough: to look really good we would only have to predict all the categories all the time! So we need two different ratios:
  • Precision: of all the predictions made, how many were correct
  • Recall: of all the correct categories which should have been predicted, how many did we actually find
• The prediction accuracy depends directly on:
  • The number of categories at each IPC level
  • The number of available training documents for each category
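The two ratios can be computed as in the short sketch below, which also shows why predicting every category is not a winning strategy. The function name and the five-category universe are made up for the example.

```python
def precision_recall(predicted, correct):
    """Return (precision, recall) for one document's predicted categories."""
    predicted, correct = set(predicted), set(correct)
    hits = len(predicted & correct)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


all_categories = ["A61K", "A61P", "C07D", "G06F", "H04L"]
correct = ["A61K", "A61P"]

# Predicting everything finds every correct category but is mostly wrong.
print(precision_recall(all_categories, correct))  # (0.4, 1.0)

# A single cautious prediction is always right here but misses one category.
print(precision_recall(["A61K"], correct))  # (1.0, 0.5)
```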
Strategy To Use An Automatic Categorizer
• If you don’t know which section or class is the most relevant:
  • Ask for a direct prediction at the finest possible level (Main Group); or
  • Ask for a prediction at a coarser level (Class) and refine it down to Sub-Class, then to Main Group
• If you know which section or class is the most relevant:
  • Force a prediction under the relevant section or class (reduces the risk of error)
  • Refine the prediction at the next level(s)
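The three strategies above (direct, coarse-to-fine, forced) can be compared on a toy example. The sketch assumes a hypothetical categorizer that returns a score per Main Group and scores a coarser category by summing the scores beneath it; neither the scores nor the aggregation rule come from IPCCAT. The prefix lengths rely on the standard IPC symbol layout (Section = 1 character, Class = 3, Sub-Class = 4).

```python
# Prefix length of an IPC symbol at each level, e.g. "A61K 31/00":
# Section "A", Class "A61", Sub-Class "A61K", Main Group = full symbol.
LEVEL_PREFIX = {"section": 1, "class": 3, "subclass": 4, "main_group": None}


def predict(scores, level, under=""):
    """Best category at `level`, restricted to symbols starting with `under`.

    scores: hypothetical {main_group_symbol: score} output of a categorizer.
    A coarser category is scored by summing its main groups (an assumption).
    """
    width = LEVEL_PREFIX[level]
    totals = {}
    for symbol, score in scores.items():
        if symbol.startswith(under):
            key = symbol[:width]  # slicing with None keeps the full symbol
            totals[key] = totals.get(key, 0.0) + score
    return max(totals, key=totals.get)


# Made-up scores for one application.
scores = {
    "A61K 31/00": 0.35,
    "A61K 9/00": 0.30,
    "A61P 35/00": 0.20,
    "C07D 401/00": 0.40,
    "C07D 213/00": 0.05,
}

# Strategy 1: direct prediction at the finest level.
direct = predict(scores, "main_group")

# Strategy 2: predict a Class, then refine to Sub-Class and Main Group.
best_class = predict(scores, "class")
best_subclass = predict(scores, "subclass", under=best_class)
refined = predict(scores, "main_group", under=best_subclass)

# Strategy 3: the examiner already knows the class and forces it.
forced = predict(scores, "main_group", under="C07")

print(direct, "|", best_class, best_subclass, refined, "|", forced)
```

With these made-up scores the direct and coarse-to-fine strategies disagree: a single strong Main Group wins the direct prediction, while the class with more combined evidence wins the refinement path.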