
Report on IPCCAT-Neural: Text Categorization in the IPC

This report discusses the implementation and advancements of the IPCCAT-Neural system, which allows for automatic prediction of IPC symbols based on text input. It covers the training collection, IPC coverage, precision, and potential future developments.


Presentation Transcript


  1. 9.a Report on IPC-related IT systems
     IPC Committee of Experts 50
     Patrick Fiévet, Head of IT Systems Section, International Classifications and Standards Division
     Geneva, February 08, 2018

  2. Agenda
     • Artificial Intelligence: text categorization in the IPC, i.e. IPCCAT-Neural
     • What is it / what is new?
     • Demonstration
     • What it could be for
     • What comes next in the short term
     • What comes next in the longer term

  3. IPCCAT-neural text categorization in the IPC
     • What is it about?
     • Automatic prediction (guess) of the most appropriate IPC symbols on the basis of a text input (e.g. a patent abstract), i.e. 3 guesses among N categories, each with an associated level of confidence
     • Implementation based on several neural networks
     • The IP knowledge and the technology applied in processing the training collection are as important as the technology used in the classifier
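The "3 guesses among N categories with a level of confidence" output described on this slide can be illustrated with a minimal Python sketch. It is not the IPCCAT implementation: the function name `top_guesses`, the example scores, and the choice to express confidence as a score normalised over all categories are assumptions made only for illustration.

```python
from typing import Dict, List, Tuple


def top_guesses(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k best-scoring symbols, each with a confidence in [0, 1].

    Assumption: confidence is the raw score divided by the sum of all
    scores. The slides do not say how IPCCAT derives its confidence.
    """
    total = sum(scores.values())
    if total <= 0:
        return []
    # Sort categories from highest to lowest score and keep the first k.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(symbol, score / total) for symbol, score in ranked[:k]]


# Hypothetical classifier scores for one abstract (symbols are illustrative).
example_scores = {
    "G06F 16/35": 5.0,
    "G06N 3/08": 3.0,
    "G06F 40/20": 1.5,
    "H04L 9/00": 0.5,
}

guesses = top_guesses(example_scores)
for symbol, confidence in guesses:
    print(f"{symbol}: {confidence:.2f}")
```

With 72,137 subgroup categories the same ranking step applies; only the size of the score dictionary changes.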

  4. IPCCAT-neural text categorization in the IPC at subgroup level!
     • IPCCAT neural 2016 at IPC main group level:
       • Number of categories: 7,374
       • Precision (three guesses): 80%
       • Number of neural networks: ~700
     • IPCCAT neural 2018 at subgroup level:
       • Number of categories: 72,137
       • Precision (three guesses), based on 1.5 million test cases: 82%
       • Number of neural networks: ~8,000
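The slides do not define "precision (three guesses)". One plausible reading is that a test case counts as correct when at least one of the three guesses matches a symbol actually assigned to the document. The Python sketch below computes the metric under that assumption; the function name `precision_at_k` and the toy data are hypothetical.

```python
from typing import List, Sequence, Set


def precision_at_k(
    predictions: Sequence[List[str]],
    gold: Sequence[Set[str]],
    k: int = 3,
) -> float:
    """Share of test cases where one of the first k guesses is a gold symbol.

    Assumption: this is what "precision (three guesses)" means on the slide.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        1
        for guesses, truth in zip(predictions, gold)
        if any(guess in truth for guess in guesses[:k])
    )
    return hits / len(predictions)


# Toy test set: ranked guesses per document and the symbols assigned to it.
toy_predictions = [["A", "B", "C", "D"], ["B", "C", "D"], ["D", "A", "B", "C"], ["C"]]
toy_gold = [{"C"}, {"A"}, {"C"}, {"C"}]

three_guess_precision = precision_at_k(toy_predictions, toy_gold, k=3)
print(three_guess_precision)
```

In the toy data the third document is only matched by its fourth guess, so it counts as a miss at k=3 and as a hit at k=4.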

  5. IPCCAT-neural text categorization in the IPC at subgroup level
     • Why was it actually doable?
     • Recent evolution of the IPCCAT classifier (available on demand as open source from the Olanto foundation)
     • Added value in data processing:
       • Training based on patent documents computed from DOCDB XML excerpts
       • Computation of both IPC and CPC classifications
     • Progress in computing power opens new R&D horizons, e.g. GPU, text processing, …

  6. Evolution of IPCCAT R&D over the years (timeline figure)
     • 2003-2008: IPC Main Group level (~7,000 categories)
     • 2017
     • 2018: IPC Group level (~73,000 categories)

  7. IPCCAT-neural 2018: text categorization in the IPC at subgroup level
     Training collection, IPC coverage and precision:
     • Training collection: 27.7 million in EN and 4.4 million in FR
     • Coverage of the IPC (using IPC and CPC through concordance):
       • 99% at subgroup level (EN)
       • 91% at subgroup level (FR)
     • Precision (three guesses):
       • 82.5% at subgroup level (EN) !!
       • 72% at subgroup level (FR)
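The coverage figures on this slide could be computed along the following lines. This is a sketch under stated assumptions, not the actual WIPO pipeline: coverage is taken to mean the share of IPC subgroups that have at least one training document, CPC symbols are assumed to map to IPC subgroups through a simple lookup table, and all names and symbols (`ipc_coverage`, `cpc_to_ipc`, the example data) are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def ipc_coverage(
    all_subgroups: Set[str],
    documents: Iterable[Dict[str, List[str]]],
    cpc_to_ipc: Dict[str, str],
) -> float:
    """Fraction of IPC subgroups with at least one training document.

    Each document carries its own "ipc" symbols and, optionally, "cpc"
    symbols that are mapped to IPC through the concordance table.
    """
    covered: Set[str] = set()
    for doc in documents:
        covered.update(doc.get("ipc", []))
        for cpc_symbol in doc.get("cpc", []):
            ipc_symbol = cpc_to_ipc.get(cpc_symbol)
            if ipc_symbol is not None:  # CPC symbols without concordance are skipped
                covered.add(ipc_symbol)
    if not all_subgroups:
        return 0.0
    # Only count symbols that exist in the current IPC scheme.
    return len(covered & all_subgroups) / len(all_subgroups)


# Made-up scheme, documents and concordance entries for illustration only.
scheme = {"A01B 1/02", "A01B 1/04", "A01B 1/06", "A01B 1/08"}
training_docs = [
    {"ipc": ["A01B 1/02"], "cpc": []},
    {"ipc": [], "cpc": ["A01B 1/065"]},
    {"ipc": [], "cpc": ["X99Z 9/99"]},
]
concordance = {"A01B 1/065": "A01B 1/06"}

coverage = ipc_coverage(scheme, training_docs, concordance)
print(f"{coverage:.0%}")
```

Using the CPC symbols as a second source is what lets a subgroup count as covered even when no document carries that IPC symbol directly, as with the second document in the example.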

  8. IPCCAT-neural 2018: text categorization in the IPC at subgroup level
     Training collection, IPC coverage and precision:
     • Side effects of n-gram improvements on precision at IPC main group level (three guesses):
       • 89% at main group level (EN)
       • 83% at main group level (FR)

  9. IPCCAT-neural text categorization in the IPC at subgroup level
     • Demonstration
     • http://icwscommonacc.wipo.int/classifications/ipc/ipcpub/?searchmode=ipccat

  10. Artificial Intelligence / IPCCAT-neural: on the way to assist IPC reclassification
      • Chronology (still a long way to go):
        • Evidence that text categorization works at IPC subgroup level with acceptable precision: done
        • Integration of IPCCAT neural at subgroup level into IPCPUB v7.5 (February 2018)
        • Confirmation that cross-lingual text categorization can assist in languages other than EN, even in the absence of large training collections: to be prototyped with a commercial CAT tool and limited testing (for cost containment reasons)

  11. Artificial Intelligence / IPCCAT-neural: on the way to assist IPC reclassification
      • Chronology (still a long way to go):
        • Incentives for R&D in automated text categorization: WIPO DELTA training collection (bilateral EPO-WIPO discussion in progress), Q2 2018?
        • Propose alternatives to Default Transfer, e.g. more than one symbol based on IPCCAT guesses and confidence levels
        • CE decisions, WIPO resource planning, etc. (2019)
        • Development of the production-scale solution integrating neural cross-lingual text categorization (based on IPCCAT neural and WIPO Translate?) (202x)
        • Integration into IPCWLMS for Stage 3 reclassification (202x)
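The proposed alternative to Default Transfer, "more than one symbol based on IPCCAT guesses and confidence levels", might look like the following Python sketch. The selection rule, the threshold of 0.2, the cap of three symbols, and the name `propose_symbols` are all assumptions for illustration; the slides do not specify how guesses would be turned into reclassification proposals.

```python
from typing import List, Tuple


def propose_symbols(
    guesses: List[Tuple[str, float]],
    default_symbol: str,
    min_confidence: float = 0.2,  # hypothetical threshold
    max_symbols: int = 3,         # hypothetical cap
) -> List[str]:
    """Pick target symbols for a document whose subgroup is being revised.

    Keeps every guess at or above the confidence threshold (best first),
    and falls back to the default transfer symbol when none qualifies.
    """
    ranked = sorted(guesses, key=lambda pair: pair[1], reverse=True)
    selected = [symbol for symbol, conf in ranked if conf >= min_confidence]
    return selected[:max_symbols] or [default_symbol]


# Hypothetical guesses for two documents (symbols are placeholders).
confident = propose_symbols([("B", 0.30), ("A", 0.55), ("C", 0.05)], default_symbol="Z")
uncertain = propose_symbols([("A", 0.10), ("B", 0.08)], default_symbol="Z")
print(confident, uncertain)
```

Keeping the default symbol as a fallback means the sketch never leaves a document without a target, which is the guarantee Default Transfer already provides.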

  12. Incentive to R&D in text categorization: WIPO-Alpha training collection

  13. Incentive to R&D in text categorization: WIPO-Delta training collection
      • Short-term perspective:
        • Further AI incentives for research and development institutes interested in automatic text categorization, e.g. in patent classification
        • Fully specified XML format (done)
        • Complement the public WIPO-ALPHA training collection with a WIPO-DELTA XML collection from IPCWLMS (upload into a database for R&D purposes, then export as an XML training collection)? See http://www.wipo.int/classifications/ipc/en/ITsupport/Categorization/dataset/index.html

  14. Text categorization in the IPC
      • Other 2018 perspectives:
        • Cross-lingual text categorization in the IPC at subgroup level
        • Confirmation of expectations by prototyping ES, FR, EN, DE and RU support using automatic translation by a commercial product (bound by budget limitations), e.g. DE text translated into EN and submitted to IPCCAT neural trained with EN documents
        • Available through the IPCPUB interface or a web service (Q2 2018)
        • IPCCAT retraining based on IPC 2018.01 (Q3 2018)
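The cross-lingual approach on this slide, translating a DE text into EN and submitting it to a classifier trained on EN documents, amounts to a two-step pipeline. The Python sketch below shows only that control flow. The translator and the classifier are passed in as callables because the slides name neither an API nor a product; `fake_translate` and `fake_classify` are stand-ins invented for the example and do not reflect any real service.

```python
from typing import Callable, List, Tuple

Guess = Tuple[str, float]


def classify_cross_lingual(
    text: str,
    source_lang: str,
    translate: Callable[[str, str, str], str],
    classify: Callable[[str], List[Guess]],
    trained_lang: str = "en",
) -> List[Guess]:
    """Translate into the classifier's training language, then classify."""
    if source_lang.lower() != trained_lang:
        text = translate(text, source_lang.lower(), trained_lang)
    return classify(text)


# Stand-ins for the machine translation product and the IPCCAT classifier.
received: List[str] = []


def fake_translate(text: str, source_lang: str, target_lang: str) -> str:
    return f"[{source_lang}->{target_lang}] {text}"


def fake_classify(text: str) -> List[Guess]:
    received.append(text)  # record what the classifier was given
    return [("G06F 40/58", 0.9)]  # placeholder guess


result = classify_cross_lingual("Ein Verfahren zur Klassifikation", "DE", fake_translate, fake_classify)
print(received[-1], result)
```

Passing the two steps in as callables keeps the sketch independent of whichever translation product is eventually chosen, and EN input skips the translation step entirely.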

  15. Thank you for your attention!
      • Questions? Contact WIPO at ipc.mail@wipo.int
