OVERCOMING LANGUAGE BARRIERS IN PATENT INFORMATION SEARCHING – A COMMERCIAL PERSPECTIVE FROM THOMSON REUTERS
Rob Willows
Vice President - Patent Offices and Special Accounts
Thomson Reuters IP Solutions
September 2010
PRESENTATION FRAMEWORK
Thomson Reuters today
The Language Challenge
• The languages of patents
Thomson Reuters Foundation approach – translation as a component of Derwent World Patents Index® (DWPI℠) value-added patent information
• Translation of patents from 44 Authorities into 1 common language – English
• Value added Abstracts, Titles, Keywords and Coding
• Creation of the unique DWPI patent family
PRESENTATION FRAMEWORK
First introduction of technology = Machine Assisted Translation of Japanese patent full text
Recent approaches
• Human translation (dedicated resources) of source text
• Bulk and “on the fly” Machine translation of source text
• Local language interfaces
Outlook for the future
Lost in translation; the impact of errors in original data
Commercial challenges
Conclusions
THOMSON REUTERS TODAY
Organization chart: Markets Division, Professional Division, Sales & Trading, Investment & Advisory, Healthcare & Science, Tax & Accounting, Enterprise, Media, Legal, IP Solutions
Largest Provider of Intelligent Information
Thomson Reuters is the largest provider of intelligent information to business and professional customers in the world, generating total revenue of $13.0bn in 2009.
True Global Presence
We operate in 300 cities in over 100 countries across the world.
Publicly Traded
We hold ourselves accountable through compliance with Sarbanes-Oxley and a stringent code of business ethics.
Strong Brand
Named #40 in the BusinessWeek 2009 ranking of the 100 Best Global Brands.
THOMSON REUTERS IP SOLUTIONS
Powering the Intellectual Property Lifecycle with the world’s most comprehensive resources…
TRADEMARKS & BRAND MANAGEMENT TM
PATENTS & SERVICES
IP LAW
INTELLIGENT INFORMATION
ADVANCED TOOLS & ANALYTICS
EXPERT IP SERVICES
MARKET DYNAMIC: GLOBALIZATION IMPACT ON IP
Global nature of IP is impacting how organizations maintain competitive advantage in emerging growth markets
Increased Asian patent and trademark filings
China & Korea are increasingly a source of innovation, creating opportunities & risks
Patent & non-patent prior art research is demanding improved global coverage
THE LANGUAGES OF PATENTS
Asia & Middle East
North America
Europe
Africa
Australia & Oceania
South America
FOUNDATION APPROACH - DWPI
Abstracts for all countries in English
Abstracts written by analysts all using the same guidelines – provides consistency and removes legal jargon
Abstracts based on entire patent specification, including drawings
THE LANGUAGES OF DWPI
North America
Languages: English, French
Patent authorities: CA, US
THE LANGUAGES OF DWPI
Europe
Languages: Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Slovak, Spanish, Swedish
Patent authorities: CZ, CS, DK, SE, NL, BE, GB, IL, CH, FI, FR, LU, DE, DD, AT, HU, IT, NO, PT, RO, SK, ES, EP, WO (RD, TP)
THE LANGUAGES OF DWPI
Africa
Languages: Afrikaans, English
Patent authorities: ZA
THE LANGUAGES OF DWPI
South America
Languages: Portuguese, Spanish
Patent authorities: BR, AR, MX
THE LANGUAGES OF DWPI
Australia & Oceania
Languages: English, Maori
Patent authorities: AU, NZ
THE LANGUAGES OF DWPI
Asia & Middle East
Languages: Chinese (Mandarin), English, Hebrew, Hindi, Japanese, Korean, Russian
Patent authorities: CN, TW, IN, IL, PH, SG, JP, KR, RU, SU, DD, WO
THE LANGUAGES OF DWPI
44 sources
23 languages
7 major linguistic families
THOMSON REUTERS ASIA PACIFIC COVERAGE
Derwent World Patents Index®
Thomson Innovation
FIRST INTRODUCTION OF TECHNOLOGY – JP MACHINE-ASSISTED TRANSLATION
Machine translation algorithms and dictionaries developed over decades
Team of MAT analysts based in Japan
JP MAT - WHAT IS IT?
Human Machine Assisted Translations (MAT) of JP full text documents
Human intervention, i.e. manual correction
• Specific tagged fields (author abstract, claims, use, advantage, etc.) in the output are scanned for: (i) non-translated JP text (ii) failed translation
• Manual enhancement consisting of fixing errors/non-translations in records according to ranking and priority
• Term registration, dictionary enhancement and addition of new rules take place
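The scan for non-translated JP text described on this slide can be illustrated with a minimal Python sketch. It is not the production MAT workflow: the field names, the Unicode ranges used as a signal for leftover Japanese, and the ranking by leftover character count are all assumptions made for illustration.

```python
import re

# Hiragana, Katakana and the basic CJK Unified Ideographs block, used here as a
# rough signal of untranslated Japanese (the choice of ranges is an assumption).
JP_CHARS = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]+")

# Hypothetical field names modeled on the tagged fields named on the slide.
CHECKED_FIELDS = ("abstract", "claims", "use", "advantage")


def find_untranslated(record):
    """Return {field: [leftover Japanese runs]} for fields that need correction."""
    flagged = {}
    for field in CHECKED_FIELDS:
        runs = JP_CHARS.findall(record.get(field, ""))
        if runs:
            flagged[field] = runs
    return flagged


def rank_for_review(records):
    """Return ids of records with leftovers, most leftover characters first."""
    scores = {}
    for rec_id, record in records.items():
        leftovers = find_untranslated(record)
        scores[rec_id] = sum(len(run) for runs in leftovers.values() for run in runs)
    flagged_ids = [rec_id for rec_id in scores if scores[rec_id] > 0]
    return sorted(flagged_ids, key=lambda rec_id: -scores[rec_id])
```

A check like this only catches text left in Japanese script; a failed translation that produced English words would still need a human reviewer or a separate quality signal.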
JP MAT – TRANSLATION VARIANTS
Multiple translations for single term can exist
• Occurs when differing meaning possible
• Terms separated by ¦
• Example: “..application of the MIMO transmission technique using several antennas and OFDM strong against multipass|multipath transmission is performed briskly conventionally..”
All terms fully searchable, so improved recall
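A minimal Python sketch of how variant terms can improve recall: each token carrying alternatives is split so that every alternative becomes a search term. The tokenization and function names are hypothetical and do not describe how Thomson Innovation indexes text. Both "¦" and "|" are treated as separators because both appear on the slide.

```python
import re

# A token may carry alternatives joined by "|" or the broken bar "¦".
VARIANT_SEP = re.compile(r"[|¦]")


def index_terms(text):
    """Return the set of lowercase search terms, expanding variant groups."""
    terms = set()
    for token in text.split():
        for variant in VARIANT_SEP.split(token):
            word = variant.strip(".,;:\"'()").lower()
            if word:
                terms.add(word)
    return terms


def matches(query, text):
    """True if the single-word query equals any indexed term or variant."""
    return query.lower() in index_terms(text)
```

With the slide's example, a search for either "multipass" or "multipath" finds the record, which is the recall gain the slide describes.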
HOW JP MAT HAS BEEN INTEGRATED INTO IP SOLUTIONS PRODUCTS AND SERVICES
Provides the source material for the value-added processing of JP patents into DWPI
As searchable English language full text in the Asia patents collection on Thomson Innovation
As data feed and Web service options for customers to integrate into their legacy in-house systems
CURRENT APPROACHES
Human translation (dedicated resources) of source text
Machine translation of source text – bulk and on the fly
Local language interfaces
Seamless link between the DWPI value add record and translated and/or original language full text documents on our delivery platforms e.g. Thomson Innovation
HUMAN TRANSLATION – CHINESE PATENTS
Translated title, abstract and claims from January 2007 onwards for:
• Applications
• Utility models
MACHINE TRANSLATION - KOREAN PATENTS
Includes applications, grants & utility models
Coverage from January 2008 onwards
MT translation of complete document
TRANSLATION ON THE FLY
Chinese, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish
LOCAL LANGUAGE INTERFACES
Searching Japanese patent collections in Japanese
Search local language data and global content in market-leading platform
NON-PATENT LITERATURE
Non-patent prior art research demands improved global coverage
Index of journal literature of the sciences, published in Chinese
English-language bibliographic data and abstracts from 2002
1,200 journals in all areas of science, 2,000,000 records
Coverage of Agricultural Sciences, Biology, Chemistry, Computer Science, Engineering, Geosciences, Management, Mathematics, Medicine
OUTLOOK FOR THE FUTURE
Enhanced machine translation of bulk data
Query translation/search against original text
Enhanced machine translation on the fly
• Into English
• From English into local language (already in place in Thomson Innovation)
Monitor ongoing and future developments for implementation when viable
IMPACT OF ERRORS IN THE ORIGINAL DATA ON MACHINE TRANSLATION
Quality of the original data is essential and the translations are affected by
• misleading punctuation
• misspellings
• wrong word order
• missing or repeated words
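Some of these source errors can be flagged mechanically before translation. The Python sketch below is illustrative only and uses hypothetical function and rule names. It covers just two of the listed error types, repeated words and doubled punctuation. Misspellings, wrong word order and language-specific spacing errors would need dictionaries or linguistic tools that are outside this sketch.

```python
import re

# The same word twice in a row, e.g. "is is".
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b")
# The same comma, semicolon or colon doubled, e.g. ",,".
REPEATED_PUNCT = re.compile(r"([,;:])\1")


def source_warnings(text):
    """Return (kind, evidence) pairs for simple source-text problems."""
    warnings = []
    for match in REPEATED_WORD.finditer(text):
        warnings.append(("repeated word", match.group(1)))
    for match in REPEATED_PUNCT.finditer(text):
        warnings.append(("repeated punctuation", match.group(0)))
    return warnings
```

Records that raise warnings could be routed for correction before machine translation rather than after it, since the errors propagate into the output.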
LOST IN TRANSLATION I
Spacing error in the original Korean text
Erroneous Korean original text: 하프톤 마스크의 반투과부 결함 수정 방법 및 이를 이용한 리페어된 하프톤 마스크
Machine translation: The method of repairing defect in semi-premerable portion and the halftone mask which becomes this with the usage heartburnings repair of the halftone mask.
Korean original text after correction of errors: 하프톤 마스크의 반투과부 결함 수정 방법 및 이를 이용한리페어된 하프톤 마스크
Machine translation: The method of repairing defect in semi-premerable portion of the Halftone mask and the repaired Halftone mask using the same.
LOST IN TRANSLATION II
Misspelling in original Korean text
Erroneous Korean original text: 그리고 사용중에 파손이 안돼도록 어느정도 압력으로 되었을때 끊어질수 있도록 고안을 해서 연결관을 만든다.
Machine translation: And in order to be cut when damage to some extent consisted of pressure in busy with the An DwaeDo lockit designs and the connection pipe is made.
Korean original text after correction of errors: 그리고 사용중에 파손이 되지 않도록 어느정도 압력으로 되었을때 끊어질수 있도록 고안을 해서 연결관을 만든다.
Machine translation: And the connection pipe is made to be cut when reached to some extent pressure in order not to be damaged in use.
LOST IN TRANSLATION III
Misuse of comma in original Korean text
Erroneous Korean original text: 밀봉 플레이트(70)는, 제 1 측면(72), 대향하는 제 2 측면(74) 및 상기 제 2 측면 상에 위치된 밀봉부(76)를 포함하며, 밀봉부는 밀봉 플레이트를 둘러싸고 있다
Machine translation: The seal plate (70), is the first side surface (72),and the faced second side (74) and the encapsulant (76) located on the second side are included. And encapsulant surrounds the seal plate.
Korean original text after correction of errors: 밀봉 플레이트(70)는제 1 측면(72), 대향하는 제 2 측면(74) 및 상기 제 2 측면 상에 위치된 밀봉부(76)를 포함하며, 밀봉부는 밀봉 플레이트를 둘러싸고 있다.
Machine translation: The seal plate (70) includes first side surface (72), and the faced second side (74) and the encapsulant (76) located on the second side. And encapsulant surrounds the seal plate.
LOST IN TRANSLATION IV
Misspelling in the original Korean text
Erroneous Korean original text: 본 발명은 냉장고에 관한 것으로, 더욱 상세하게는 송풍팬의 구동으로 냉기를 흡입하여 고내 냉기순환이 이루어지도록 제어하는 냉장고의 냉기순환장치 및 방법에 관한 것이다.
Machine translation: The invention relates to refrigerator, more specifically, to the cold air circulating apparatus and method of the refrigerator which inhales the cool air to the drive of the ventilation fan and controlled so that the inside of refrigerator cooling air circulation be made.
Korean original text after correction of errors: 본 발명은 냉장고에 관한 것으로, 더욱 상세하게는 송풍팬의 구동으로 냉기를 흡입하여 냉장고내 냉기순환이 이루어지도록 제어하는 냉장고의 냉기순환장치 및 방법에 관한 것이다.
Machine translation: The invention relates to refrigerator, more specifically, to the cold air circulating apparatus and method of the refrigerator which inhales the cool air to the drive of the ventilation fan and controlled so that the cooling air circulation be made in the inside refrigerator.
TRANSLITERATION ERRORS
KVAERNER MASA YARDS OY
SAEIKKOE J; VEIKKOLAINEN M
KEVANAL MASHA-YADES OY
J. SACO; M. WIKLANIN
No priority data
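One generic way to reconnect a mis-transliterated name with a known form is approximate string matching. The sketch below uses Python's standard `difflib` module. It is an illustration only, not the method used to build DWPI families, and the normalization rule and function names are assumptions.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Uppercase and keep only letters, so spacing and punctuation are ignored."""
    return "".join(ch for ch in name.upper() if ch.isalpha())


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(candidate, known_names):
    """Return the known name most similar to the candidate."""
    return max(known_names, key=lambda known: similarity(candidate, known))
```

For the assignee on this slide, `best_match("KEVANAL MASHA-YADES OY", ["KVAERNER MASA YARDS OY", "AKER FINNYARDS OY"])` selects the KVAERNER form. Short inventor names such as "J. SACO" are distorted too heavily for character similarity alone, which is one reason editorial review remains necessary.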
NON CONVENTION EQUIVALENTS IN DWPI PATENT FAMILY
WPI Acc no: 2002-306437/200235
XRPX Acc No: N2002-239587
Welding structure formation for building applications, involves controlling welding of the constituents arranged on support surface, based on position of weld points determined from recorded image
Patent Assignee: KVAERNER MASA YARDS OY (KVAE-N); KVAERNER MASA-YARDS OY (KVAE-N); SAIKKO J (SAIK-I); VEIKKOLAINEN M (VEIK-I); AKER FINNYARDS OY (AKER-N)
Inventor: SAEIKKOE J; SAIKKO J; VEIKKOLAINEN M
(1) Basic (2) Equivalents (E) # Non-convention Equivalents (NCE)
COMMERCIAL CHALLENGES
• Investment planning
• Enhancements to coverage and treatment
• China; India; Korea; Switzerland; Taiwan; Brazil; Spain
• Increases in volumes
• The number of basics has doubled over the past 10 years
• 1.491 million basics projected in 2010
• Sourcing and managing original patent data
• Primary duty of patent offices is to grant patents; information dissemination is secondary
• Data provided in multiple different formats
• GIGO – much effort required to identify and correct errors in source material
CONCLUSIONS
The many different languages of patent documents present unique challenges
Translation is essential for extracting useful information, but costly
Tools and techniques are improving, BUT…
We will continue to rely on high quality translation of patent information, by various techniques, reinforced by the skill sets of our value-add production team in order to deliver the Thomson Reuters value-add proposition
THANK YOU THOMSON REUTERS – IP SOLUTIONS IP.THOMSONREUTERS.COM