Learn about automatic categorization of patent applications, its benefits, and accuracy. Dive into the working mechanism of IPCCAT technology, strategies for better categorization, and measuring accuracy levels. Gain insights into the growth of patent applications worldwide and the need for efficient categorization methods.
Automatic Categorization of Patent Applications
Presentation to the 3rd IPC Workshop, WIPO, Feb. 25-26, 2013
• The need for automatic categorization of patent applications
• The purpose of automatic categorization
• How does automatic categorization work?
• A quick word about the IPCCAT technology
• Measuring categorization accuracy
• Strategy to use an automatic categorizer
• IPCCAT Demo
The Need for Automatic Categorization of Patent Applications
• Growth in number of patent applications
  • 2’140’600 patent applications worldwide in 2011
  • Up from 1’050’700 in 1995 (more than doubled in 16 years)
  • Source: WIPO IP Indicators
• Large (and growing) number of IPC categories
  • 631 Sub-Classes in IPC 2011
  • 7’392 Main Groups in IPC 2011
  • Source: IPCCAT Help File
The Purpose of Automatic Categorization
• Accelerate patent application processing at Patent Offices
• Should not be used as a fully automated categorizer
  • Average error rate of 5 to 10% at Class level, rising to 20% or more at Main Group level
  • Batch classification is possible, but predictions below a given confidence threshold must be eliminated downstream
• Better used as an assistant for human examiners
  • Suggests the most probable IPC categories
  • Interacts with the examiner (asking for more predictions at a different IPC level, forcing a given domain, etc.)
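The downstream filtering step for batch classification can be sketched as follows. This is a minimal illustration, not IPCCAT's actual interface: the `Prediction` record, the `split_by_confidence` helper, the sample scores and the 0.75 threshold are all assumptions made for the example.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    """Hypothetical record for one batch prediction."""
    doc_id: str
    ipc_code: str
    confidence: float  # assumed to lie in [0, 1]


def split_by_confidence(predictions, threshold):
    """Return (accepted, for_review).

    Predictions at or above the threshold are accepted; the rest are
    set aside so that a human examiner can look at them.
    """
    accepted = [p for p in predictions if p.confidence >= threshold]
    for_review = [p for p in predictions if p.confidence < threshold]
    return accepted, for_review


# Made-up batch of three applications.
batch = [
    Prediction("app-001", "A61K", 0.92),
    Prediction("app-002", "H04L", 0.41),
    Prediction("app-003", "G06F", 0.77),
]
accepted, for_review = split_by_confidence(batch, threshold=0.75)
print([p.doc_id for p in accepted])    # ['app-001', 'app-003']
print([p.doc_id for p in for_review])  # ['app-002']
```

In practice the threshold would be tuned per IPC level, since the error rates quoted above differ between Class and Main Group.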
How Does Automatic Categorization Work?
• Train an artificial intelligence program to recognize typical examples for each IPC category
  • Provide already-classified patents for training
  • Essential: balance the number of examples across categories
  • The more examples the better
• Test the program
  • Submit patent applications whose IPC categories are already known
  • Calculate categorization accuracy
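Two of the steps above, balancing the training examples and calculating accuracy on documents with known categories, can be sketched as follows. Downsampling every category to the size of the smallest one is only one possible way to balance (it trades away some of "the more examples the better"); the function names and the toy data are assumptions for illustration.

```python
import random
from collections import defaultdict


def balance_examples(examples, seed=0):
    """Downsample every category to the size of the smallest one.

    `examples` is a list of (text, category) pairs.
    """
    by_category = defaultdict(list)
    for text, category in examples:
        by_category[category].append((text, category))
    target = min(len(items) for items in by_category.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for category in sorted(by_category):
        balanced.extend(rng.sample(by_category[category], target))
    return balanced


def accuracy(predicted, actual):
    """Fraction of test documents whose predicted category is the known one."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy training set: three examples in one category, one in another.
training = [
    ("tablet coating", "A61K"),
    ("drug delivery", "A61K"),
    ("ointment base", "A61K"),
    ("packet routing", "H04L"),
]
balanced = balance_examples(training)
print(len(balanced))  # 2 (one example per category)

# Two of three test documents are categorized correctly.
score = accuracy(["A61K", "H04L", "G06F"], ["A61K", "H04L", "H04L"])
```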
A Quick Word About The IPCCAT Technology
• The IPCCAT project was designed, managed and financed by WIPO from 2002 to 2004
• A strictly statistical approach
  • No linguistic or other human-defined rules
  • So it is language independent
  • But the indexing method is adapted to each supported language (English, French, Spanish) so as to process collocations correctly
• Categorization algorithm: Neural Networks of the Winnow type, improved by Simple Shift with the help of WIPO
• Validated through several competitions on the Internet (latest one: the CLEF-2010 project)
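For readers unfamiliar with Winnow, here is a minimal sketch of the textbook algorithm (the multiplicative-update variant often called Winnow2) for a single yes/no category over binary word features. It is not the improved IPCCAT variant, whose details the slides do not give; the class name, the threshold choice and the toy data are assumptions.

```python
class Winnow:
    """Textbook Winnow for one binary decision over binary features."""

    def __init__(self, n_features, alpha=2.0):
        self.weights = [1.0] * n_features   # all weights start at 1
        self.threshold = float(n_features)  # a common default choice
        self.alpha = alpha                  # promotion/demotion factor

    def predict(self, active):
        """`active` lists the indices of the features present in the document."""
        return sum(self.weights[i] for i in active) >= self.threshold

    def update(self, active, label):
        """Change weights only on a mistake, and only for active features."""
        if self.predict(active) == label:
            return
        # Missed positive: promote. False alarm: demote.
        factor = self.alpha if label else 1.0 / self.alpha
        for i in active:
            self.weights[i] *= factor


# Toy data: a document belongs to the category exactly when feature 0 is present.
data = [((0, 1), True), ((2, 3), False), ((0, 3), True), ((1, 2), False)]

model = Winnow(n_features=4)
for _ in range(5):  # a few passes over the training data
    for active, label in data:
        model.update(active, label)

print(all(model.predict(a) == y for a, y in data))  # True
```

A multi-category system could run one such model per IPC category and rank categories by score, which is one plausible reading of "Neural Networks of the Winnow type".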
Measuring Categorization Accuracy
• Predicting every category every time would trivially find all the correct ones, so a single score is not enough. We need two different ratios:
  • Precision: of all the predictions made, how many were correct
  • Recall: of all the correct categories which should have been predicted, how many we actually found
• The prediction accuracy depends directly on:
  • The number of categories at each IPC level
  • The number of available training documents for each category
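The two ratios can be computed as in the sketch below, for a single document whose predicted and correct categories are given as sets. The function name and the IPC codes are made up for the example. The second call illustrates why recall alone is not enough: predicting every category gives perfect recall but poor precision.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for one document.

    predicted: categories the categorizer proposed
    relevant:  categories that are actually correct
    """
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)  # correct predictions
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = ["A61K", "C07D", "C07K", "C12N"]

# Three predictions, two of them correct.
p, r = precision_recall(["A61K", "A61P", "C07D"], relevant)
print(round(p, 2), r)  # 0.67 0.5

# Predicting all eight categories of a toy scheme: every correct one is found.
all_categories = ["A61K", "A61P", "C07D", "C07K", "C12N", "G06F", "H04L", "B60K"]
p_all, r_all = precision_recall(all_categories, relevant)
print(p_all, r_all)  # 0.5 1.0
```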
Strategy To Use An Automatic Categorizer
• If you don’t know which section or class is the most relevant:
  • Ask for a direct prediction at the finest possible level (Main Group); or
  • Ask for a prediction at a coarser level (Class) and refine it down to Sub-Class, then to Main Group
• If you know which section or class is the most relevant:
  • Force a prediction under the relevant section or class (reduces the risk of error)
  • Refine the prediction at the next level(s)
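The three ways of querying a categorizer described above can be sketched with a small helper that picks the best-scoring code under a given prefix. The scores are invented, the `best_under` helper is hypothetical, and the example stops at Sub-Class level for brevity; a real categorizer would supply its own scores for each IPC level.

```python
def best_under(scores, prefix=""):
    """Return the highest-scoring IPC code that starts with `prefix`."""
    candidates = {code: s for code, s in scores.items() if code.startswith(prefix)}
    if not candidates:
        raise ValueError(f"no candidate code under prefix {prefix!r}")
    return max(candidates, key=candidates.get)


# Hypothetical scores for one patent application at two IPC levels.
class_scores = {"A61": 0.55, "C07": 0.40, "G06": 0.05}
subclass_scores = {"A61K": 0.35, "A61P": 0.15, "C07D": 0.45, "G06F": 0.05}

# 1. Direct prediction at the finer level.
direct = best_under(subclass_scores)

# 2. Coarse prediction first, then refinement under the chosen Class.
top_class = best_under(class_scores)
refined = best_under(subclass_scores, prefix=top_class)

# 3. Forcing a prediction under a section the examiner already knows.
forced = best_under(subclass_scores, prefix="C")

print(direct, refined, forced)  # C07D A61K C07D
```

With these made-up scores the direct and the refined strategies disagree, which is the kind of case where asking the categorizer both ways gives the examiner useful extra information.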