Database evaluation: Part 2 © Magister Ltd 2004, 2005
Two case studies
• 1. How many pamphlets (WO-A) did WIPO publish in 2000?
  • Fundamental question of database content
  • Quantitative?
• 2. How many patents/applications from 2002 refer to uses of elemental gold or its compounds?
  • Database searchability
  • Qualitative?
Sources for question 1
• WIPO data
  • Press release, paper PCT Gazette
  • IPDL PCT Gazette
• esp@cenet
• ESPACE-ACCESS CD-ROM
• Questel-Orbit
  • WOPATENT, PCTFULL, PlusPat
• STN
  • PCTFULL
“The PCT in 2000”
Source: “The Patent Cooperation Treaty in 2000”, Geneva: WIPO, 2001
Basic test no.1 - Definition
• What does this figure represent?
• Quote: “The number of international applications published in 2000 in each of the languages of publication was as follows:...”
• Quote: “In 2000, the Gazette included entries relating to the 79,947 international applications which were published in 2000 in the form of PCT pamphlets…”
Variations on a theme
• No explicit mention of
  • reprinted or correction documents,
  • delayed search reports,
  • cases withdrawn after allocation of publication number.
• The use of the term “PCT pamphlets” may mean that only complete specifications are being counted, not WO-A3 or similar.
Paper PCT Gazette
• Lowest entry = 00/00001
• Highest entry = 00/79858
• Implication: 79,858 cases were published.
• WIPO says: 79,947 cases were published.
• Where are the missing 89?
• NOTE: if the difference were due to withdrawal, we would expect the highest Gazette entry to be higher than the number of actual publications - but it’s lower.
IPDL PCT Gazette
• Difficult to locate year truncation feature:
  • DP/*/*/2000
  • Help refers to (*) specifically as RIGHT-hand truncation operator
  • Result = 98,644 records (deviation = 18,697)
  • this includes WO-A3 and other correction documents
• DP/*/*/2000 AND ( KI/A1 OR KI/A2 )
  • Result = 85,873 records (deviation = 5,926)
• No way of expanding the KI field to locate other possibilities.
Other comments on IPDL
• Relevance score = 0 ?
esp@cenet ® / ESPACE-ACCESS
• Worldwide file:
  • Publication no. = WO and Publication date = 2000 (no truncation)
  • Result = 79,858 (exactly the same as the paper Gazette range; deviation from WIPO = -89)
• ESPACE-ACCESS
  • Publication no. = WO2000*
  • Result = 79,850
  • Publication no. = WO2000* & (KI=A1, A2)
  • Result = 68,635 !
  • Does this imply that KD codes are not accurately applied to all records?
STN/MicroPatent file PCTFULL
Strategy: 2000/PY
Comment: Very close to the 79,858 suggested by the paper Gazette
STN/MicroPatent file PCTFULL
Strategy: 2000/PY & CC/LA
STN/MicroPatent file PCTFULL
Strategy: 2000/PY & CC/LA & FT/FA
STN/Univentio file PCTFULL
Strategy: 2000/PY
STN/Univentio file PCTFULL
Strategy: 2000/PY & CC/LA
Note: In both MicroPatent and Univentio, the sum over languages yields a different total from the publication-year search. This could imply that the /LA field is not being accurately filled, that documents are missing, or both.
STN/Univentio file PCTFULL
Strategy: 2000/PY & CC/LA & DETD/FA
Note: In both MicroPatent and Univentio, the availability of bibliographic data in the appropriate language is still no guarantee of availability of full text.
Questel WOTEXT file
• EPO file, now withdrawn - replaced by PCTFULL file from Univentio
• Drawn up using different selection criteria
  • up to mid-2000, preference given to an English-language representative document, e.g.
    • fast-publishing US-B replaced an equivalent WO-A
  • after mid-2000, WIPO XML full-text for all English, French and German cases
Questel/WOTEXT
Questel WOPATENT
• Bibliographic file only, data supplied by WIPO
• Publication date and kind can be searched in PN field:
  • /PN 2000 AND (A1/PN OR A2/PN)
  • Result = 79,857 (1 fewer than the Gazette total)
  • of which 68,024 were WO-A1 + 11,833 were WO-A2
  • Total = 79,857 !
• Not possible to analyse by publication language? - PNL field not on summary sheet
Questel WOPATENT
?..IND /APL A
Beginning of the index.
 1    5114  CN
 2      14  CRO
 3      34  CS
 4      13  CZE
 5    2686  DA
 6  133141  DE
 7       1  DK
 8  624721  EN
 9    5493  FI
10   43696  FR
11      41  HR
12      79  HU
13      21  HUN
14    1594  IT
15     368  ITA
…….
 1      26  TR
 2       2  TUR
 3       1  US
End of the index.
• The APL field is the only obvious field for language, and does not correspond to the language of PUBLICATION.
• INID codes 25 (filing language) and 26 (publication language) exist for this purpose - why are they not being used?
Questel PCTFULL
• Full text file, data supplied by Univentio (same as STN version)
• Publication date and kind can be searched as for WOPATENT
  • /PN 2000 AND (A1/PN OR A2/PN)
  • Result = 79,857 (1 fewer than the Gazette total and identical to WOPATENT)
  • Analysis by Kind Code also the same as WOPATENT
• Full text available = 76,750
Questel PCTFULL
Strategy: 2000/PN & CC/LA & DESC=YES
Notes:
• Same data supplier, different numbers of texts available.
• Questel total with language code = 10 more than strategy without language code
Questel PlusPat
• Bibliographic - file producer = Questel
• Publication date and kind can be searched as for WOPATENT: two (apparently) equivalent command strings -
  • /PN 2000 AND WO = 83,229
    • presumably including WOA3 documents
  • /PN WO AND PD=2000 = 79,858
    • of which WOA1 = 68,022, WOA2 = 11,836
• Language analysis available using 3-letter codes
Questel PlusPat
Strategy: (WOA1/PN OR WOA2/PN) & PD=2000 & CCC/LA
Notes: Closest match yet to the official WIPO totals - but language code is still apparently causing data loss
Summary
• A simple search on publication year and kind code yields substantial variation
  • but the exact explanation requires more research
• It appears that the language code is not being applied correctly
  • the same possibly applies to the Kind Codes
• In a real search situation, missing full-texts would cause significant data loss
“The truth is rarely pure - and never simple” Oscar Wilde
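
The spread of results in case study 1 can be tabulated against the official WIPO figure. The sketch below is only a restatement of the numbers quoted on the preceding slides; the dictionary labels and the helper names `deviations` and `kind_split_consistent` are shorthand invented for this illustration, not field names or commands from any of the hosts.

```python
# Official figure quoted from "The Patent Cooperation Treaty in 2000".
WIPO_OFFICIAL = 79_947

# Hit counts as reported on the slides; the labels are informal shorthand.
RESULTS = {
    "Paper PCT Gazette (highest entry number)": 79_858,
    "IPDL, DP/*/*/2000": 98_644,
    "IPDL, A1 or A2 only": 85_873,
    "esp@cenet Worldwide": 79_858,
    "ESPACE-ACCESS, WO2000*": 79_850,
    "ESPACE-ACCESS, A1 or A2 only": 68_635,
    "Questel WOPATENT, A1 or A2": 79_857,
    "Questel PCTFULL, A1 or A2": 79_857,
    "Questel PlusPat, /PN 2000 AND WO": 83_229,
    "Questel PlusPat, /PN WO AND PD=2000": 79_858,
}

# (WO-A1 count, WO-A2 count, reported total) for the files that gave a split.
KIND_SPLITS = {
    "Questel WOPATENT": (68_024, 11_833, 79_857),
    "Questel PlusPat": (68_022, 11_836, 79_858),
}


def deviations(results, reference):
    """Return each result minus the reference figure."""
    return {source: count - reference for source, count in results.items()}


def kind_split_consistent(a1, a2, total):
    """Check that the A1 and A2 counts add up to the reported total."""
    return a1 + a2 == total


if __name__ == "__main__":
    ranked = sorted(deviations(RESULTS, WIPO_OFFICIAL).items(),
                    key=lambda item: abs(item[1]))
    for source, delta in ranked:
        print(f"{delta:+8,d}  {source}")
    for source, split in KIND_SPLITS.items():
        print(source, "kind-code split consistent:", kind_split_consistent(*split))
```

Sorting by absolute deviation makes it easy to see which sources cluster around the Gazette range and which ones drift once a kind-code or language restriction is added.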
How many English WOs in 2000?
Case study 2
• How many patents/applications from 2002 refer to uses of elemental gold or its compounds?
  • Sub-question: what proportion are US publications?
• Evaluation factors:
  • Database searchability
  • Qualitative?
Sources for question 2
• USPTO.gov
• WIPO IPDL
• Chemical Abstracts
• World Patent Index
• IFI Claims ®
• (esp@cenet ®)
Sample search: USPTO.gov
• USPTO granted patents:
  • ‘Quick Search’ option:
  • Term 1 = 1/1/2002->12/31/2002 (Issue Date) AND Term 2 = gold (All Fields) = 8,854 patents
  • Range from US 6334244-B granted Jan 1, 2002 to US 6502221-B granted Dec 31, 2002
• Initial impression - large number of electronics cases (films, connectors etc.)
• High recall, low precision
Second search
• USPTO published applications:
  • ‘Quick Search’ option:
    • PD/1/1/2002->12/31/2002 AND gold (all fields) = 0 applications (?!)
  • Re-run in ‘Advanced Search’
    • PD/1/1/2002->12/31/2002 and SPEC/gold = 11,140 applications
    • highest number = US 2002/0199221-A (unable to browse to other hit-lists)
• Maybe it was having an ‘off-day’?
Refine the search
• False drops:
  • US 6499593-B (Golf Bag)
    • Refs. cited: “Article, New Gold Accessories for 2000, Golf Illustrated--by: Laurie Lee Dovey, Equipment Editor (No date).”
  • US PP 13443-P2 (Nectarine tree named `Burnectfive`)
    • Specification: “Floral nectaries.-- Color. -- A dull orange-gold (RHS Greyed Red Group 178 B).”
Refine the search
• False drops:
  • US 6404519-B (Method of advertising on a motor vehicle)
    • Inventor City: Gold Hill, NC
  • US 6465189-B (Systematic evolution of ligands by exponential enrichment: blended selex)
    • Inventor: Larry Gold
  • US D464586-S (Sock sculpture)
    • Attorney: Gold & Rizvi, P.A.
“Experience is what you get when you don’t get what you want” Dan Stanford (1850-94)
Initial conclusion
• Many of the false drops are due to the definition of “all fields”
  • it literally includes all text fields, plus all front-page bibliographic data fields as well.
• Lesson
  • always be clear about what is included in the ‘basic index’ of your database
Limiting field of search
• Granted patents file:
  • ISD/20020101->20021231 and ACLM/gold = 1,321 patents
• By using the ‘claims’ field, we achieve two improvements:
  • substantial increase in precision
  • Design Patents are substantially eliminated
    • their ACLM field is smaller: typically only one claim in the form “The ornamental design for […], as shown and described”
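
The effect of field restriction on the false drops quoted earlier can be imitated in a few lines. This is a toy sketch, not the USPTO search engine: the record layout, the field names and the `search` helper are assumptions made for illustration, and because the source does not reproduce the claims of these patents, the `claims` values are placeholders. The real claims could of course still contain the word.

```python
import re

# Toy records built from the false drops on the slides. Only the field that
# produced the hit is filled in; "claims" is a placeholder (assumption).
RECORDS = [
    {"pn": "US 6499593-B", "title": "Golf Bag",
     "refs_cited": "Article, New Gold Accessories for 2000, Golf Illustrated",
     "claims": "(claims not reproduced)"},
    {"pn": "US 6404519-B", "title": "Method of advertising on a motor vehicle",
     "inventor_city": "Gold Hill, NC",
     "claims": "(claims not reproduced)"},
    {"pn": "US 6465189-B",
     "title": "Systematic evolution of ligands by exponential enrichment: blended selex",
     "inventor": "Larry Gold",
     "claims": "(claims not reproduced)"},
    {"pn": "US D464586-S", "title": "Sock sculpture",
     "attorney": "Gold & Rizvi, P.A.",
     "claims": "(claims not reproduced)"},
]


def search(records, term, fields=None):
    """Return publication numbers whose chosen fields contain the whole word.

    With fields=None every field except the publication number is searched,
    which mimics an 'all fields' basic index.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    hits = []
    for rec in records:
        names = fields if fields is not None else [k for k in rec if k != "pn"]
        if any(pattern.search(rec.get(name, "")) for name in names):
            hits.append(rec["pn"])
    return hits


if __name__ == "__main__":
    print("all fields :", search(RECORDS, "gold"))
    print("claims only:", search(RECORDS, "gold", fields=["claims"]))
```

In this toy data every record is retrieved by the all-fields search purely through names, addresses or cited references, and none survives the claims-only search, which is the precision gain the slide describes.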
Alternative document types
• Re-issue Patents:
  • ISD/20020101->20021231 and gold (all fields) and APT/2 = 26 patents
  • Not linked to their original issue patent in this file
• Unconventional topics:
  • US 6422036 (Jewelry clasp)
  • unlikely to be covered by Chemical Abstracts?
• Lesson:
  • don’t assume that your database includes all candidate answers of all types
WIPO IPDL full-text
• Lesson: clarify timeliness criteria of file before starting to evaluate:
  • File ‘help’ notes some delay in release of text (typically 2-3 weeks)
• Lesson: clarify multilingual search capability in a multilingual file...
  • DP/20020101->20021231 and (ET/gold or ABE/gold or DEE/gold or CLE/gold)
  • German and Spanish field labels not available
  • 319 hits, including reprinted documents (A3)
Abbreviation searching
• DP/20020101->20021231 and (ET/gold or ABE/gold or DEE/gold or CLE/gold or DEE/Au)
  • 121,584 hits
  • Field DEE includes front page data, and retrieves every designation of Australia!
• Lesson:
  • consider ambiguity of search terms, especially in light of field contents.
Question: How many other chemical element symbols correspond to ST.3 country codes? Answer coming up at the end of this session...
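
The Au/AU collision can be checked mechanically before a symbol is used as a search term. The sketch below assumes the searcher has a list of element symbols and a list of two-letter country codes to hand; the two sample lists are deliberately tiny and incomplete, so the output illustrates the method only and is not the answer to the question above. The function name `ambiguous_symbols` is made up for this example.

```python
def ambiguous_symbols(element_symbols, country_codes):
    """Return the element symbols that, upper-cased, equal a country code."""
    codes = {code.upper() for code in country_codes}
    return sorted(sym for sym in element_symbols if sym.upper() in codes)


# Small, incomplete samples chosen for illustration only (assumption).
SAMPLE_ELEMENTS = ["Au", "Ag", "Pt", "Cu", "Fe", "Zn"]
SAMPLE_COUNTRY_CODES = ["AU", "AT", "DE", "FR", "JP", "US", "CU", "PT"]

if __name__ == "__main__":
    for symbol in ambiguous_symbols(SAMPLE_ELEMENTS, SAMPLE_COUNTRY_CODES):
        print(f"{symbol} would also match country code {symbol.upper()}")
```

Running the same function over complete lists would give the full set of symbols to avoid in any field that also carries designated-state data.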
Further search terms
• Up to now, based on a very crude strategy
  • basic words, abbreviations, synonyms
• Database evaluation with respect to subject-based searching should always include strategies optimised for each file:
  • CAS - RNs
  • IFI - Uniterms, linking and role indicators
  • WPI - Manual Codes, subscriber abstracts
  • etc.
Chemical Abstracts
• Registry file:
  • Au/ELS = 39,006 records (L1)
  • HELP RNYEAR shows 2002 registrations = 380148-72-1 to 477930-11-3
  • S L1 RAN=(380148-72-1,477930-11-3) = 1,964 compounds registered during 2002 (L2)
  • N.B. not the same as ‘compounds registered from documents published in 2002’
  • safer to use L1 RAN=(380148-72-1,) = 2,420 (L3)
• Answers will include isotopes, compounds and alloys
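
The reason the open-ended range is safer can be seen by comparing the Registry Numbers of the example hits with the 2002 boundaries. The sketch assumes, as the RNYEAR help implies, that RNs are assigned in ascending order, so the digits in front of the check digit can be compared as an integer. The function names are invented for this illustration and are not STN commands; the check-digit routine uses the standard CAS weighting (rightmost body digit times 1, next times 2, and so on, modulo 10).

```python
# 2002 registration boundaries quoted from HELP RNYEAR on the slide.
RN_2002_LOW = "380148-72-1"
RN_2002_HIGH = "477930-11-3"


def rn_sequence(rn):
    """Return the RN without its check digit, as an integer for ordering."""
    head, middle, _check = rn.split("-")
    return int(head + middle)


def check_digit_ok(rn):
    """Validate the CAS check digit (weights 1, 2, 3... from the right)."""
    digits = rn.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def in_range(rn, low, high=None):
    """Mimic RAN=(low,high); high=None gives the open-ended RAN=(low,)."""
    seq = rn_sequence(rn)
    if seq < rn_sequence(low):
        return False
    return high is None or seq <= rn_sequence(high)


if __name__ == "__main__":
    # RNs cited in the example hits for documents published in 2002.
    for rn in ["496877-84-0", "433295-32-0", "12244-57-4", "461667-30-1"]:
        print(rn,
              "valid" if check_digit_ok(rn) else "INVALID",
              "closed range:", in_range(rn, RN_2002_LOW, RN_2002_HIGH),
              "open-ended:", in_range(rn, RN_2002_LOW))
```

On these inputs 496877-84-0 falls above the 2002 upper boundary even though the citing Polish patent was published in January 2002, so the closed range would miss it while the open-ended range keeps it; 12244-57-4 predates the range altogether and is only reachable through the unrestricted element search.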
Chemical Abstracts
• CAPlus file
  • Cross L3 from Registry (L4); & P/DT & 2002/PY.B:
    • 132 documents, each citing one or more Au compounds registered >=2002
  • Compare P/DT & 2002/PY.B & (GOLD/TI OR GOLD/AB) = 519
  • Compare L1 & P/DT & 2002/PY.B NOT (GOLD/TI OR GOLD/AB) = 2,735
    • both include new uses of older compounds
Example hits
• PL 182430-B1, pub. 20020131
  • “Method of making ohmic contacts in III-V semiconductor radiation sources.”
  • 496877-84-0 : 95% Au, 4.5% Zn alloy
• JP 2002-161327-A2, pub. 20020604
  • “Sintered electric contact material, its manufacture, and circuit breaker.”
  • 433295-32-0 : 70% W, 30% Au, 0.1% Sb alloy.
Example hits
• RU 2188430-C2, pub. 20020827
  • “Method for predicting arterial hypertension development during anti-inflammatory therapy in rheumatoid arthritis patients.”
  • cites 12244-57-4, Tauredon
• JP 2002274841-A2, pub. 20020925
  • “Superconductor materials”
  • cites 461667-30-1, Gold magnesium boride ((Au,Mg)B2)
Derwent WPI
• Available fields:
  • Text fields
    • Basic index, Titles, Extension Abstracts
  • Manual Codes
    • CPI subscriber only, EPI open to everyone
  • Fragment Codes
    • subscriber only
• Lesson:
  • WPI includes a range of search options - not all open to all users
  • evaluate with the customer’s access in mind
Extension Abstracts
=> S (GOLD OR AU) AND 2002/PY.B
L1    4313 (GOLD OR AU) AND 2002/PY.B
=> S (GOLD OR AU)/BI,ABEX AND 2002/PY.B
L2    4530 (GOLD OR AU)/BI,ABEX AND 2002/PY.B
=> S L2 NOT L1
L3     217 L2 NOT L1
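
The three answer sets behave like ordinary sets: L3 is the set difference L2 minus L1, and because every L1 answer is also in L2, its size is simply 4530 - 4313 = 217. The sketch below imitates this with two toy records. The first uses the title and an extension-abstract fragment of AN 2003-271291 as quoted on the following slides; the second is a hypothetical record invented so that L1 is not empty. The `search` helper and the field names `bi` and `abex` are stand-ins, not the STN implementation.

```python
import re

RECORDS = {
    # Text taken from the WPIX record shown on the next two slides.
    "2003-271291": {
        "bi": ("Catalyst for carboxylate-ester synthesis contains metal "
               "ultrafine particle having preset average particle diameter "
               "supported on inorganic oxide support."),
        "abex": ("a metal support (gold/gamma-alumina) having metal "
                 "supported on the alumina support, was obtained."),
    },
    # Hypothetical record, invented for this illustration.
    "HYPOTHETICAL-1": {
        "bi": "Gold plated electrical connector.",
        "abex": "",
    },
}


def search(records, terms, fields):
    """Return accession numbers where any term occurs as a whole word."""
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return {
        an for an, rec in records.items()
        if any(pattern.search(rec.get(field, "")) for field in fields)
    }


l1 = search(RECORDS, ["gold", "au"], ["bi"])          # basic index only
l2 = search(RECORDS, ["gold", "au"], ["bi", "abex"])  # plus extension abstract
l3 = l2 - l1                                          # L2 NOT L1

if __name__ == "__main__":
    print("L1:", sorted(l1))
    print("L2:", sorted(l2))
    print("L3:", sorted(l3))
```

The record that lands in L3 is the one whose only mention of gold sits in the worked example of the extension abstract, which is the situation the next two slides show in full.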
No record in Basic Abstract
AN 2003-271291 [27] WPIX
TI Catalyst for carboxylate-ester synthesis contains metal ultrafine particle having preset average particle diameter supported on inorganic oxide support.
PI JP2002361086 A 20021217 (200327)* 11p B01J-023-52
AB JP2002361086 A UPAB: 20030429
   NOVELTY - A catalyst for carboxylate-ester synthesis…..
   DETAILED DESCRIPTION - An INDEPENDENT CLAIM is included for manufacture...
   USE - For synthesis of carboxylate ester…
   ADVANTAGE - The catalyst has excellent catalytic activity….
   TECHNOLOGY FOCUS - INORGANIC CHEMISTRY - Preferred Support: The inorganic oxide support...
Extension Abstract
ABEX JP 2002361086 A UPTX: 20030429
   EXAMPLE - 10 mmol/L chloroauric-acid aqueous solution (500 ml) was maintained at 65-70degreesC and pH was adjusted to 7 using 0.5N sodium hydroxide aqueous solution. gamma-alumina AC-12R (40 g) was added to the aqueous solution with stirring…... The metal fixation material obtained by filtration was dried at 100degreesC for 10 hours, then bake-processed at 300degreesC in air for 3 hours and a metal support (gold/gamma-alumina) having metal supported on the alumina support, was obtained. The amount of metal on the support was 4.6 weight% with respect to the support…….
Effective use of Manual Codes
• Many of the CPI Manual Codes are too wide in scope to give precise retrieval for this search
• However, they can be used in combination with other search terms (e.g. text, IPC) to set a context for retrieval
  • e.g. N02-E04/MC AND GOLD/TI, AB limits retrieval specifically to gold in the context of catalysis.
Fragment Codes
• Applied only to certain chemical patents, primarily to aid retrieval of compounds disclosed only in generic form.
• A679 is the code for gold
=> S A679/M0,M1,M2,M3,M4,M5,M6 AND 2002/PY.B
L6     531 A679/M0,M1,M2,M3,M4,M5,M6 AND 2002/PY.B
=> S L6 NOT (L1 OR L2)
L7      81 L6 NOT (L1 OR L2)
Example answer
AN 2003-209212 [20] WPIX
TI Emulsions useful as solid dosage forms comprises a mixture of a drug-containing emulsion and a solid particle adsorbent.
PI US2002160049 A1 20021031 (200320)* 15p A61K-009-00
AB US2002160049 A UPAB: 20030324
   NOVELTY - An emulsion composition (I) in….
M2 *02* A679 A960 A970 B415 B720 B743 B770 B815 B831 C710 H4 H401 H481 H8 J0 J014 J2 J273 J4 J471 J490 J9 K0 L8 L814 L821 L831 M210 M211 M212 M250 M262 M283 M315 M321 M332 M344 M349 M381 M391 M411 M431 M510 M520 M530 M540 M620 M782 M904 M905 N103
   DCN: R09330-K; R09330-M; R11043-K; R11043-M