ST22 revision proposal

This proposal suggests revising the outdated WIPO Standard ST.22 to incorporate advancements in OCR technology over the past decade. The revised standard aims to improve efficiency, reduce costs, and enhance the quality of full-text search results for patent applications. The text highlights the benefits of updating the standard, based on the PCT International Bureau's successful experience with an internal OCR system. The proposal includes key points for implementing a more effective OCR process, such as using commercial products, developing scalable services, and enhancing electronic products with OCR results. Various examples of challenging pages are provided to guide the revision process. The conclusion invites the SDWG to consider establishing a task force for revising WIPO Standard ST.22.


Presentation Transcript

  1. ST22 revision proposal • June 2006 • WIPO SDWG meeting, Geneva

  2. Agenda • Reasons for the revision of the ST22 • Age of current standard • Expected benefits • PCT International Bureau experience • Examples of pages difficult to OCR • Conclusion • Discussion / Questions

  3. Age of current standard • Inadequate title: “Recommendation for the presentation of patent applications typed in optical character recognition (OCR) format” • Contains valid recommendations, but they are expressed in outdated terminology (ribbons, typewriters, …). Some recommendations need to be made more precise. • A few new recommendations should be added to take into account the progress in OCR technology in the last 10 years. • Not widely followed by agents/applicants: some promotion is required

  4. Expected benefits • Experience shows that if documents follow simple layout rules, automatic OCR procedures are effective enough to yield a satisfactory result for full-text search purposes (i.e. an average accuracy above 98.5%). • An updated standard ST22 would lead to: • Significant cost reductions for the OCR procedures performed by the regional/national IP offices and the IB • Better quality for the full-text published documents built from OCR procedures • More efficient and precise search procedures for the IP community
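The slide quotes an average accuracy above 98.5% without saying how accuracy is measured. As an illustration only, the sketch below assumes character-level accuracy, computed as one minus the edit distance between a reference text and the OCR output divided by the reference length. The function names and the per-page averaging are assumptions made for this example, not the PCT's actual metric.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # drop ca
                               current[j - 1] + 1,       # add cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Assumed metric: 1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


def meets_threshold(pages, threshold=0.985):
    """pages is a list of (reference, ocr_output) pairs, one per page.

    Returns True when the mean per-page accuracy reaches the threshold.
    The 0.985 default mirrors the 98.5% figure quoted on the slide.
    """
    accuracies = [character_accuracy(ref, ocr) for ref, ocr in pages]
    return sum(accuracies) / len(accuracies) >= threshold
```

Under this assumed definition, a 2,000-character page could contain at most 30 character errors and still reach 98.5%.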

  5. PCT International Bureau Experience • An internal automatic OCR system and a Quality Checking system have been developed by the PCT • The system was tested for 6 months and then put into production. It has been in operation since January 1, 2006 and OCRs the pamphlets published weekly by the PCT.

  6. Internal OCR: key points • Use an off-the-shelf commercial product and adapt it to the PCT's needs • Build a generic and scalable service so that the OCR function can be used from different applications (online or batch) and fulfill the PCT's future needs • Operate the service in house to reduce costs and gain flexibility in the publication process (discontinue the outsourcing contract)

  7. Internal OCR: key points • OCR the description and claims sections of the published PCT pamphlets each week (approximately 50,000 pages to OCR weekly) • Provide the results as ST36 XML files that are used to feed the indexing engine of the Patentscope site and the espacenet site (see http://www.wipo.int/pctdb/en/browse.jsp) • Enrich the PCT electronic products with the results of the OCR (searchable PDFs added to the Rule 87 DVD)

  8. Internal OCR: some figures • With our hardware configuration, the OCR of a complete publication week takes around 16 hours (it runs during weekends). • Five staff members perform part-time Quality Checking operations every Monday (around 3 to 4 man-days are spent each week on quality checking) in order to correct the worst cases.
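The figures on slides 7 and 8 imply a throughput that is easy to work out. The short sketch below does the arithmetic, taking the weekly volume (about 50,000 pages) and the run time (around 16 hours) as given on the slides. Both are rounded values, so the results are estimates; the function name is chosen for this example.

```python
PAGES_PER_WEEK = 50_000  # approximate weekly volume, from slide 7
OCR_HOURS = 16           # approximate weekend run time, from slide 8


def ocr_throughput(pages: int, hours: float) -> tuple[float, float]:
    """Return (pages per hour, seconds per page) for a batch OCR run."""
    pages_per_hour = pages / hours
    seconds_per_page = hours * 3600 / pages
    return pages_per_hour, seconds_per_page


pages_per_hour, seconds_per_page = ocr_throughput(PAGES_PER_WEEK, OCR_HOURS)
print(f"{pages_per_hour:.0f} pages/hour, {seconds_per_page:.2f} s/page")
```

With these inputs the run averages about 3,125 pages per hour, or roughly 1.15 seconds per page across the whole hardware configuration.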

  9. Quality Checking system

  10. Quality Checking system

  11. Some examples of difficult pages, submitted on paper or in image form, that the revised ST22 standard should discourage...

  12. Narrow fonts, justified paragraphs

  13. Underline, italic, bold text

  14. Subscripts too small

  15. Mathematical formulae embedded in text

  16. Handwritten text or cursive fonts

  17. Gray or coloured backgrounds

  18. Conclusion • We invite the SDWG: • (a) to consider the proposal to revise WIPO Standard ST.22; and • (b) to consider establishing a task for the revision of WIPO Standard ST.22 and setting up a Task Force to handle that revision.

  19. Agenda • Reasons for the review of the ST22 • Age of current standard • Expected benefits • PCT International Bureau experience • Examples of applications difficult to OCR • Conclusion • Discussion / Questions
