Quality of Classification
Recall

$$\text{Recall} = \frac{\#\,\text{retrieved relevant documents}}{\#\,\text{existing relevant documents}} = 1$$

What to achieve?
• Optimum: all documents pertaining to a specific technical area (concept) are found by classification search
• For concepts defined in IPC:
  - Priority 1: documents have all appropriate symbols
  - Priority 2 (efficiency): documents have no inappropriate symbols
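As a minimal sketch, the recall target above can be computed for a single concept as follows. Reading "efficiency" (no inappropriate symbols) as precision is our interpretation, not a definition from the slide, and the document IDs in the usage lines are hypothetical.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Share of all existing relevant documents found by the classification search."""
    if not relevant:
        return 1.0  # nothing to find: treated as perfect recall (a convention)
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Share of retrieved documents that are relevant (our reading of 'efficiency')."""
    if not retrieved:
        return 1.0  # nothing retrieved, nothing inappropriate (a convention)
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical example: D4 is missing the symbol, D9 carries it wrongly.
relevant = {"D1", "D2", "D3", "D4"}
retrieved = {"D1", "D2", "D3", "D9"}
print(recall(retrieved, relevant))     # 0.75
print(precision(retrieved, relevant))  # 0.75
```

A missing symbol lowers recall (Priority 1), while an inappropriate symbol lowers precision (Priority 2).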
Phenomenology of quality issues
• document is unclassified
• document has wrong / inappropriate classification
• document has outdated / invalid classification
• non-exhaustive / incomplete classification:
  - appropriate symbols are missing
  - given symbols are not specific enough
• varying classifications of family members
• excessive classification
Different aspects
• individual document / publication:
  - classification by the publishing IPO
  - and by other IPOs, e.g. EPO > ECLA, DPMA > "ICP", JPO, … ?
  - > examiners create their own search files
• different publication levels:
  - unexamined (unsearched) applications
  - granted patents
• families: in MCD, reclassification is done at family level
• data held in different databases
Unclassified documents
Published before 1.1.2006: many documents in MCD still unclassified / not reclassified:
• 92% of all documents in MCD*
• 87% of all documents of EPO members
Published after 1.1.2006:
• 97% of all documents in MCD
• 91% of all WO
Each week, 6-8% of WO publications are not classified at all.
*cf. IPC/CE/40/4
Unclassified WO documents
• Publication week 50 (13.12.2007): 260 of 3272 WO publications (7.9%) unclassified
• by ISA: EP 218 (84%), KR 27 (10%), AU 5, US 5, RU 2, SE 2, CA 1
• by Receiving Office: US 177, IB 31, EP 26, GB 9, KR 3, DE 2, FR 2, IL 2, …
Lesson: There are still many documents without any valid classification.
> Top priority: All documents should have at least one valid classification.
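The percentages for week 50 can be re-derived from the counts on the slide. This short sketch only checks that arithmetic; the helper name `shares` is our own.

```python
# Week-50 counts of unclassified WO publications per ISA (from the slide).
unclassified_by_isa = {"EP": 218, "KR": 27, "AU": 5, "US": 5, "RU": 2, "SE": 2, "CA": 1}
total_published = 3272  # all WO publications in week 50


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage share of each office in the total, rounded to one decimal."""
    total = sum(counts.values())
    return {office: round(100 * n / total, 1) for office, n in counts.items()}


unclassified = sum(unclassified_by_isa.values())
print(unclassified, round(100 * unclassified / total_published, 1))  # 260 7.9
print(shares(unclassified_by_isa)["EP"])  # 83.8 (shown as 84% on the slide)
print(shares(unclassified_by_isa)["KR"])  # 10.4 (shown as 10% on the slide)
```

The ISA counts add up to exactly 260; the Receiving Office counts listed (252) do not, because the slide's list continues beyond IL.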
Wrong classification
• A61N 1/00: Electrotherapy; Circuits therefor
(courtesy of M. Meier, Audi)
Wrong classification
• B60K: Arrangement or mounting of propulsion units or of transmissions in vehicles
(courtesy of M. Meier, Audi)
Lesson: Completely wrong classifications do occur.
Wrong classification
• Example: WO2007126503
  - ISR: G01L 19/02
  - Espacenet: G10L 19/02
Lesson: Typos may occur; concordance tables may be flawed.
Wrong classifications are difficult to investigate because they are difficult to find; feedback from users is needed.
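Typos like G01L vs G10L can sometimes be caught automatically. The sketch below is a hypothetical consistency check: it flags two symbols that differ only by one swap of adjacent characters. It can point out a likely typo but cannot decide which of the two symbols is correct, and it will not catch errors introduced by concordance tables.

```python
def normalize(symbol: str) -> str:
    """Ignore spacing differences such as 'G01L 19/02' vs 'G01L19/02'."""
    return symbol.replace(" ", "").upper()


def is_adjacent_transposition(a: str, b: str) -> bool:
    """True if a and b differ only by swapping two neighbouring characters."""
    a, b = normalize(a), normalize(b)
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


# ISR symbol vs. database symbol for WO2007126503
print(is_adjacent_transposition("G01L 19/02", "G10L 19/02"))  # True
print(is_adjacent_transposition("G01L 19/02", "G01L 19/04"))  # False
```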
Outdated / invalid classification
• Business methods: G06F 17/60 → G06Q [2006.01]
  - in Espacenet: 0 WO docs with a:G06F17/60
  - in Patentscope: 1506 WO docs with G06F17/60
  - e.g. WO2007004271: reclassified in Espacenet only to ECLA
Lesson: Classification data may differ between databases.
In Espacenet:
• many documents outside the PCT minimum documentation ("non-PCT min") are not reclassified, e.g. CZ, UY, NZ, AR
• not all of the PCT minimum documentation is reclassified, e.g. only 678 of 14543 KR docs reclassified in ECLA/IPC
Lesson: Reclassification following revision is still incomplete.
Outdated / invalid classification
• Traditional medicine: A61K 35/78 → A61K 36/.. [2006.01]
  - in Espacenet: 10413 docs still have A61K 35/78 as ECLA
  - only 7412 of these have A61K 36/..
Lesson: Reclassification to valid IPC is incomplete.
• Further example: WO1998039019
  - in Espacenet: A61K 36/02 as IPC-AL, A61K 35/80 as ECLA
  - in Patentscope: A61K 35/80 as IPC
Lesson: Classification data may differ between databases.
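Outdated symbols can be detected by checking each record against the transfers made in a revision. In this sketch, `TRANSFERS` is a hypothetical excerpt that holds only the two transfers named on these slides. A real check would load the official revision concordance data instead.

```python
# Hypothetical excerpt: old symbol -> successor prefix (IPC 2006.01 revision).
TRANSFERS = {
    "G06F 17/60": "G06Q",      # business methods
    "A61K 35/78": "A61K 36/",  # traditional medicine
}


def outdated_symbols(symbols: list[str]) -> list[tuple[str, str]]:
    """Return (old symbol, successor prefix) for every outdated symbol found."""
    return [(s, TRANSFERS[s]) for s in symbols if s in TRANSFERS]


def reclassified(symbols: list[str], old: str) -> bool:
    """True if a document carrying `old` also carries a successor symbol."""
    successor = TRANSFERS[old]
    return any(s.startswith(successor) for s in symbols)


doc = ["A61K 35/78", "A61K 36/02"]
print(outdated_symbols(doc))            # [('A61K 35/78', 'A61K 36/')]
print(reclassified(doc, "A61K 35/78"))  # True

# Completeness ratio from the Espacenet counts quoted above
print(round(7412 / 10413, 3))  # 0.712
```

Applied to a whole database, the share of records for which `reclassified` returns True measures how complete the reclassification is.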
Varying classifications in family
• Example: Aircraft cargo loading logistics system
  - US 2005246132 A1 (3.11.2005)
  - US 7100827 B2 (5.9.2006)
  - DE 102005019194 A1 (24.11.2005)
  - FR 2871269 A1 (9.12.2005)
Lesson: Classification of granted patents may be very different.
Lesson: Assessment of the main classification varies.
Varying classifications in family
Lesson: Classification data from subsequent publications may not be in MCD.
Lesson: Some reclassification data may not be in MCD; they exist as ECLA only.
Varying classifications of single document
• Example: WO2007126503
  - ECLA: G01L 19/00B (roll-up to IPC: G01L 19/00)
  - IPC: G01L 19/02
Lesson: Different classifiers hold different views.
• US7258017 B1 (granted family member)
  - IPC: G01L 19/04
Lesson: Classification of granted patents may be different.
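Comparing an ECLA symbol with an IPC symbol requires rolling the ECLA symbol up to its IPC group first. The sketch below assumes that an ECLA symbol is an IPC group followed by an optional ECLA-only extension that starts with a letter, as in G01L 19/00B → G01L 19/00. This assumption does not cover every ECLA pattern.

```python
import re

# Assumed shape: subclass, group "main/sub", optional ECLA extension starting with a letter.
ECLA_PATTERN = re.compile(r"^([A-H]\d{2}[A-Z])\s*(\d{1,4}/\d{2,6})([A-Z].*)?$")


def roll_up(ecla: str) -> str:
    """Strip the ECLA-only extension and return the IPC group, e.g. 'G01L 19/00'."""
    match = ECLA_PATTERN.match(ecla.strip())
    if match is None:
        raise ValueError(f"not an ECLA/IPC-style symbol: {ecla!r}")
    subclass, group, _extension = match.groups()
    return f"{subclass} {group}"


def same_ipc_group(ecla: str, ipc: str) -> bool:
    """True if both symbols end up in the same IPC group after roll-up."""
    return roll_up(ecla) == roll_up(ipc)


print(roll_up("G01L 19/00B"))                       # G01L 19/00
print(same_ipc_group("G01L 19/00B", "G01L 19/02"))  # False
```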
Current problems in classification (I): IPC consistency (by courtesy of H. Wongel)
All documents claim priority KR20050060661:
• KR20070005367 A: "Multifocal lens and manufacture method thereof"; IPC (AL): G02B3/10
• JP2007017937 A: "Multifocal lens and method for manufacturing the same"; IPC (AL): G02F1/13; G02B3/14; G02F1/1334
• US2007008599 A: "Multifocal lens and method for manufacturing the same"; IPC (AL): G02B5/32
• CN1892258 A: "Multifocal lens and method for manufacturing the same"; IPC (AL): G02B3/10
• EP1742100 A1: "Multifocal lens and method for manufacturing the same"; IPC (AL): G02F1/1334
Lesson: Classifiers may have different views of the subject matter to be classified, or may interpret IPC groups differently.
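One way to quantify such divergence is to compute a pairwise overlap score (Jaccard similarity) between family members, both at group level and at subclass level. The symbols below are taken from the family listed above. Choosing Jaccard similarity as the measure is our own assumption; the slide does not propose a metric.

```python
from itertools import combinations

# IPC (AL) symbols of the family with priority KR20050060661
family = {
    "KR20070005367": {"G02B3/10"},
    "JP2007017937": {"G02F1/13", "G02B3/14", "G02F1/1334"},
    "US2007008599": {"G02B5/32"},
    "CN1892258": {"G02B3/10"},
    "EP1742100": {"G02F1/1334"},
}


def subclasses(symbols: set[str]) -> set[str]:
    """Reduce symbols to their subclass, i.e. the first four characters."""
    return {s[:4] for s in symbols}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two symbol sets: 1.0 = identical, 0.0 = disjoint."""
    return len(a & b) / len(a | b) if a | b else 1.0


def pairwise_agreement(fam: dict[str, set[str]], level=lambda s: s) -> dict:
    """Jaccard similarity for every pair of family members at the given level."""
    return {
        (x, y): jaccard(level(fam[x]), level(fam[y]))
        for x, y in combinations(sorted(fam), 2)
    }


groups = pairwise_agreement(family)
subs = pairwise_agreement(family, subclasses)
print(groups[("CN1892258", "KR20070005367")])  # 1.0
print(groups[("EP1742100", "US2007008599")])   # 0.0
print(subs[("EP1742100", "JP2007017937")])     # 0.5
```

Low scores at subclass level (G02B vs G02F) indicate a disagreement about the subject matter, not just a difference in detail.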
Non-exhaustive classification
• Example: secondary scheme A01P [2006.01] "Biocidal, pest repellant, … activity of chemical compounds" is not in ECLA!
• Espacenet: [screenshot]
Lesson: Incompatibility of IPC and ECLA may cause non-exhaustive classification.
Non-exhaustive classification
• Example: A61K 36/..
  - ECLA: 22440 documents
  - IPC: only 17847 of these have A61K 36/..
Lesson: Relevant classifications may not be given / available as IPC.
• Example: EP1881839
  - ECLA: A61K 36/487
  - IPC: A61K 36/00
• Example: C12Q 1/68
  - Espacenet: > 100,000 docs
  - ECLA: > 40 subgroups
  - IPC: 0 subgroups
Lesson: Classifications could be more specific.
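Cases like EP1881839, where the IPC symbol stops at the main group while ECLA gives a subgroup, can be flagged mechanically. This sketch assumes that a subgroup part of "00" denotes a main group. It compares symbols only by their text and does not look up the real IPC hierarchy, which the scheme defines by dots rather than by subgroup numbers.

```python
def parse(symbol: str) -> tuple[str, str, str]:
    """Split 'A61K 36/487' into ('A61K', '36', '487')."""
    subclass, group = symbol[:4], symbol[4:].strip()
    main, sub = group.split("/")
    return subclass, main, sub


def less_specific(ipc: str, ecla: str) -> bool:
    """True if `ipc` is only the main group of which `ecla` is a subgroup."""
    ipc_sc, ipc_main, ipc_sub = parse(ipc)
    ec_sc, ec_main, ec_sub = parse(ecla)
    return (ipc_sc, ipc_main) == (ec_sc, ec_main) and ipc_sub == "00" and ec_sub != "00"


print(less_specific("A61K 36/00", "A61K 36/487"))   # True (EP1881839)
print(less_specific("A61K 36/487", "A61K 36/487"))  # False
```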
Causes / sources of deficiencies
• "wrong" or varying intellectual classification:
  - rules too complicated
  - drawbacks of the classification scheme (too much overlap)
  - interpretation of subject matter
  - differing national practice
  - lack of expertise or diligence, time pressure
• granted claims may differ
• incompatibility ECLA - IPC; USPC concordance tables
• lack or delay of reclassification:
  - insufficient resources for intellectual reclassification
• data exchange / management problems
• data input (typos)
Options for improvement
• on IPO level:
  - allocate resources
  - adapt / harmonize classification practice / training
  - develop classification assistance tools
• on user level:
  - knowing the deficiencies > adapt search strategies
• on IPC level:
  - improve user-friendliness (e.g. definitions)
  - simplify IPC scheme, rules
More liberal approach when classifying? One more symbol better than one symbol missing?
Do we need to be worried about varying classifications?
Options for improvement
On MCD / database level:
• cross-check the content of databases
• pool / compile classification data (in one searchable field / on family level?) from:
  - family members
  - subsequent publications
  - other sources (DE: ICP, …)
• process such compilations of classifications of different origin, e.g. compare the classifications of subsequent publications (A, B, …) > create "trusted" classifications (e.g. class (A) = class (B))?
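Both ideas, pooling symbols from several sources into one field and deriving "trusted" symbols from agreement between the A and B publications, can be prototyped in a few lines. The rule "trusted = symbols on which A and B agree" is one possible reading of class (A) = class (B), and the input symbols below are hypothetical.

```python
from collections import Counter


def pooled(sources: dict[str, set[str]]) -> Counter:
    """Pool symbols from several sources, counting how many sources give each one."""
    counts = Counter()
    for symbols in sources.values():
        counts.update(set(symbols))
    return counts


def trusted(class_a: set[str], class_b: set[str]) -> set[str]:
    """Hypothetical rule: keep the symbols on which the A and B publications agree."""
    return class_a & class_b


# Hypothetical A and B publications of one application
a_pub = {"G01L 19/02", "G01L 19/14"}
b_pub = {"G01L 19/02", "G01L 19/04"}
print(sorted(trusted(a_pub, b_pub)))  # ['G01L 19/02']
print(pooled({"A": a_pub, "B": b_pub, "ECLA": {"G01L 19/02"}})["G01L 19/02"])  # 3
```

The pooled counts keep every view searchable, while the intersection yields a smaller, more reliable set. Which of the two to expose is a policy choice.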
Learn from / go Web 2.0?
• "Folksonomy", "social tagging", "cooperative, collaborative classification"
> include broader user community? e.g. any searcher?
> implement feedback channels?
Feedback form (click opens):
Are you satisfied with classification in A61N 1/00? Yes / No
Would you like to suggest further classifications:
.......................
.......................
.......................
[Submit]
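Suggestions collected through such a form still need to be combined before they can feed a classification. A minimal sketch, assuming a simple support threshold: a suggested symbol counts as "trusted" only when at least `min_share` of the contributors proposed it. Both the threshold rule and the sample suggestions are hypothetical.

```python
from collections import Counter


def aggregate_tags(suggestions: list[set[str]], min_share: float = 0.5) -> set[str]:
    """Keep symbols suggested by at least `min_share` of all contributors."""
    if not suggestions:
        return set()
    votes = Counter(symbol for user_set in suggestions for symbol in user_set)
    needed = min_share * len(suggestions)
    return {symbol for symbol, n in votes.items() if n >= needed}


# Hypothetical suggestions from three searchers for one document
suggestions = [{"A61N 1/00", "A61N 1/36"}, {"A61N 1/36"}, {"A61N 1/36", "A61B 5/00"}]
print(sorted(aggregate_tags(suggestions)))  # ['A61N 1/36']
```

A higher `min_share` favours reliability; a lower one favours completeness, which matches the question of whether one more symbol is better than one symbol missing.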
Learn from / go Web 2.0?
• "Folksonomy", "social tagging", "cooperative, collaborative classification"
> include broader user community
> compile varying views, i.e. classifications
• process such data; create "trusted" classifications
• broader participation in scheme development, in particular definitions?
Tagging of IPC entries?
Thank you
• Top priority: all documents should have at least one valid classification
• Priority 1: documents have all appropriate symbols
• Priority 2: documents have no inappropriate symbols
• More liberal approach when classifying? One more symbol better than one symbol missing?
• Do we need to be worried about varying classifications?
• Include broader user community? e.g. any searcher?
• Implement feedback channels?
• Create "trusted" classifications (e.g. class (A) = class (B))?