Patent Classifications as ‘Knowledge’ …towards a more conscious (auto)categorization of patents

Presentation Transcript


  1. Patent Classifications as ‘Knowledge’ …towards a more conscious (auto)categorization of patents. Arcanum Development, 2013

  2. A usual hierarchic categorization task…
     • Given a hierarchic taxonomy (classification system)
     • Provide, for a document, a list of taxonomy nodes (classification symbols) whose subject matter best matches the document
     • “Best” is based on…
       • For experts: understanding the subject matter and considering classification rules
       • For computers: providing the hierarchy, and training with sample categorized documents …and the rules?!
     • However, the patent classification task is somewhat more complicated…

  3. Typical methods of categorization
     [Figure: taxonomy tree with roots (sections) and non-classifying (‘preclassification’) levels … during training…]
     • flat: the best is the winner on each level
     • greedy hierarchic: traversing down on best only
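
The two strategies named on this slide can be contrasted with a minimal sketch. The toy tree, the scores and the function names below are illustrative assumptions, not part of the presentation; a real categorizer would produce the scores from trained models.

```python
# Hypothetical toy taxonomy (parent -> children) and per-node scores.
CHILDREN = {"ROOT": ["A", "B"], "A": ["A01", "A61"], "B": ["B60", "B65"]}
SCORES = {"A": 0.6, "B": 0.4, "A01": 0.3, "A61": 0.2, "B60": 0.1, "B65": 0.9}


def levels(root):
    """Group the nodes below root by depth (the root itself is excluded)."""
    result, frontier = [], CHILDREN.get(root, [])
    while frontier:
        result.append(list(frontier))
        frontier = [kid for node in frontier for kid in CHILDREN.get(node, [])]
    return result


def flat_winners(root, scores):
    """Flat: on each level, the best-scoring node wins, whatever its parent."""
    return [max(level, key=scores.__getitem__) for level in levels(root)]


def greedy_path(root, scores):
    """Greedy hierarchic: descend from the root, following only the best child."""
    path, node = [], root
    while CHILDREN.get(node):
        node = max(CHILDREN[node], key=scores.__getitem__)
        path.append(node)
    return path
```

With these toy scores the flat strategy can still pick B65 on the second level, while the greedy traversal commits to section A and never sees B65. That is why failures on the preclassification levels need separate handling.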

  4. Common features of patent classification schemes
     • Hierarchic
       • covered subject matter of a higher level contains subject matter of a lower level
       • but: a document may be assigned to a higher level when none of the lower levels fit
     • Some nodes (symbols) cannot be used (in general or alone) for classification
       • hierarchy levels
       • indexing schemes
     • Schemes contain specific rules (relations) between symbols
       • Place / priority / precedence / limiting rules
       • Indexing rules
       • References to symbols to be taken into consideration
     • Rules given in the scheme are extended by definitions / manuals of classification
     • Schemes can be multilingual
       • Used by various offices and cultures (maybe slightly differently)

  5. Using relations in patent categorization
     [Figure labels: last place rule; takes precedence; hierarchic, rules, references]

  6. Why is a (more) formal analysis and presentation advantageous?
     • At present, the rules are presented as text in various master files, in a machine-readable but not machine-interpretable way (it is ‘content’ but not yet ‘knowledge’)
     • Lots of complex rules are spread over multiple sources (e.g. definitions) and places (e.g. reverse references)
     • For both humans and computer programs, it is difficult to collect and apply all the rules systematically
     • It is therefore worth converting IPC content into more explicit IPC ‘knowledge’…

  7. Hypotheses
     • Tests were made
       • to verify whether the confusions
         • of patent examiners
         • of various autocategorizers
       • …correlate with the relations given in the IPC
     • Assumption
       • more references in the IPC: higher overlap between subject matter areas
     • Hypotheses
       • the more references in the IPC between two areas, the higher the confusion of humans and computers
       • the knowledge coded in the IPC is, indeed, used by patent categorizers

  8. Testing the hypotheses
     • If patent examiners take references in the IPC seriously
       • the more references between two symbols, the higher the number of co-classifications
     • Practice at two offices can differ
       • the more references in the IPC, the higher the likelihood of a different decision
     • Confusion of autocategorizers
       • more failures if subject matter areas overlap
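
One way to quantify such a relationship is a rank correlation between the number of IPC references and the number of co-classifications per pair of classes. The sketch below is only an illustration of that idea. The presentation does not say which statistic was used, so the choice of Spearman correlation and all function names are assumptions.

```python
from math import sqrt
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(reference_counts, coclassification_counts):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(reference_counts), ranks(coclassification_counts))
```

A value near 1 would support the first hypothesis for the class pairs examined; the inputs would be one count per class pair, taken from the IPC reference statistics and from the classified documents.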

  9. Cocategorization vs. IPC references
     • When there are references in the IPC, patent examiners take them seriously
       • References mark overlapping subject matter areas and/or
       • References propose the use of secondary (indexing) symbols
     • At class level, the frequency of references in the IPC is similar to the frequency with which symbols of both classes are used together in patent documents
     [Chart labels: A47, A61; B65; B60-B65; A61 vs C07 and C12; C07-C12; F16; G-H]

  10. Differences in examiners’ practice
     • When there are references in the IPC, patent examiners may assign them differently
       • References mark overlapping subject matter areas
     • At class level, the frequency of references in the IPC is similar to the differences between the selected first symbols (pre-reform practice, to simulate preclassification)
     [Chart labels: A47, A61; B65; B60-B65; A61 vs C07 and C12; C07-C12; F16; G-H]

  11. Confusion of autocategorizers
     • When there are references in the IPC, autocategorizers fail more frequently
       • the “first symbol” may be selected differently
     • At class level, the frequency of references in the IPC is similar to the differences between the first symbols selected by an autocategorizer (2002 data, to simulate preclassification)
     [Chart labels: A47, A61; B65; B60-B65; A61 vs C07 and C12; C07-C12; F16; G-H]

  12. Conclusion
     • The following show similar characteristics on higher levels of the IPC:
       • Reference statistics in the IPC
       • Co-classification
       • Human classification differences
       • Preclassification autocategorization errors
     • This may be even more important on lower levels, where the rules are more complex
     • Therefore, easier access to the rules may be welcomed by both human and machine categorizers

  13. Presentation of IPCInfo
     • Analysis and data preparation were performed as in-house research
       • defining relevant relation types (about 15 main relations and a further ~20) (excerpts below)
       • parsing the IPC scheme, definitions, catchwords and RCL
       • building a relation graph in an RDBMS (>1.5 million relations)
     • The result is presented on a user interface
     • Convertible to RDF or OWL for further use
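
A relation graph of this kind can be pictured as a single table of (source, relation type, target) rows. The sketch below is not the IPCInfo schema, which the presentation does not show. The table and column names, the `http://example.org/ipc/` namespace and the helper functions are assumptions; the three sample rows are taken from the relation samples in this presentation.

```python
import sqlite3
from urllib.parse import quote

# (source, relation type, target): the note appears at source and points to target.
ROWS = [
    ("A01B 1/00", "reference", "A01G 3/06"),
    ("A01G 3/06", "reference", "A01D 43/16"),
    ("A01B 3/24", "precedence", "A01B 3/04"),
]


def build_db(rows):
    """Create an in-memory database holding the relation graph."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE relation (source TEXT, rel_type TEXT, target TEXT)")
    con.executemany("INSERT INTO relation VALUES (?, ?, ?)", rows)
    return con


def relations_of(con, symbol):
    """All (relation type, target) pairs recorded for one symbol."""
    cur = con.execute(
        "SELECT rel_type, target FROM relation WHERE source = ? "
        "ORDER BY rel_type, target",
        (symbol,),
    )
    return cur.fetchall()


def to_ntriples(con, base="http://example.org/ipc/"):
    """Serialize every row as one N-Triples line under a made-up namespace."""
    lines = []
    query = "SELECT source, rel_type, target FROM relation ORDER BY source, rel_type, target"
    for source, rel_type, target in con.execute(query):
        s = f"<{base}symbol/{quote(source, safe='')}>"
        p = f"<{base}relation/{rel_type}>"
        o = f"<{base}symbol/{quote(target, safe='')}>"
        lines.append(f"{s} {p} {o} .")
    return lines
```

Storing every relation type in one table keeps the later RDF conversion mechanical: each row becomes exactly one triple.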

  14. Patent taxonomy relations, samples
     • reference: (transitive!)
         A01B 1/00 Hand tools (edge trimmers for lawns A01G 3/06)
         A01G 3/06 Hand-held edge trimmers or shears for lawns (mowers combined with lawn edgers A01D 43/16)
     • precedence: (over 600 transitive cases, e.g. A61M 3/00 → A61M 5/00 → A61M 36/00 [in definitions!])
         A01B 3/24 Tractor-drawn ploughs (A01B 3/04 takes precedence)
         A01B 3/04 Animal-drawn ploughs
     • limiting:
         A01N PRESERVATION OF BODIES…; BIOCIDES, e.g. AS DISINFECTANTS, AS PESTICIDES OR AS HERBICIDES; …
         in Definitions for A01N subclass: Fungicidal, bactericidal, insecticidal, disinfecting or antiseptic paper D21H
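
Because references are transitive, a categorizer that lands on A01B 1/00 may also need to look at everything reachable from it. The following is a minimal sketch of that traversal using the two reference samples from this slide; the dictionary layout and the function name are assumptions made for illustration.

```python
from collections import deque

# Reference edges taken from the samples on this slide.
REFERENCES = {
    "A01B 1/00": ["A01G 3/06"],
    "A01G 3/06": ["A01D 43/16"],
}


def transitive_targets(start, edges):
    """Breadth-first walk over reference edges; each symbol is visited once,
    so cyclic references cannot cause an endless loop."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in edges.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order
```

Starting from A01B 1/00, the walk reaches A01G 3/06 first and then A01D 43/16, which A01B 1/00 never mentions directly.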

  15. Patent taxonomy relations, samples
     • indexing: guidance heading before A61K 101/00
         Indexing scheme associated with group A61K 51/00, relating to the nature of the radioactive substance
     • placerule: note before A01N 25/00, even specifying an exception…
         In groups A01N 27/00-A01N 65/00, in the absence of an indication to the contrary, an active ingredient is classified in the last appropriate place.
     • priorities (standardseq): for main groups in IPC where no place rule is applied
     • cooccurrence: e.g. in catchwords; the text of the IPC also mentions the reference
         CONDITIONING harvested crops A01D 43/10, A01D 82/00
         A01D 43/10 with means for crushing or bruising the mown crop
         A01D 82/00 Crop conditioners, i.e. machines for crushing or bruising stalks (mowers combined with means for crushing or bruising the mown crop A01D 43/10)
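
The last place rule quoted above can be sketched as a filter over candidate groups. This is a simplified illustration, not the presenters' implementation. It assumes that subgroup numbers order like decimal fractions (so 37/005 precedes 37/01), it ignores the "indication to the contrary" exception, and the function names are invented here.

```python
import re

SYMBOL = re.compile(r"([A-H]\d{2}[A-Z]) (\d+)/(\d+)")


def sort_key(symbol):
    """Order by subclass, then main group as a number, then the subgroup
    digits as text (assumption: subgroups read like decimal fractions)."""
    match = SYMBOL.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a group symbol: {symbol!r}")
    subclass, main_group, subgroup = match.groups()
    return (subclass, int(main_group), subgroup)


def in_interval(symbol, first, last):
    """True if symbol lies in the closed interval [first, last]."""
    return sort_key(first) <= sort_key(symbol) <= sort_key(last)


def apply_last_place(candidates, first, last):
    """Within [first, last] keep only the last fitting candidate;
    candidates outside the interval are left untouched."""
    inside = [c for c in candidates if in_interval(c, first, last)]
    outside = [c for c in candidates if not in_interval(c, first, last)]
    kept = [max(inside, key=sort_key)] if inside else []
    return outside + kept
```

For example, if an oracle proposes A01N 31/00, A01N 43/00 and A01N 25/00, applying the rule for A01N 27/00-A01N 65/00 keeps A01N 43/00, drops A01N 31/00, and leaves A01N 25/00 alone because it lies outside the range.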

  16. Presentation of IPCInfo / 2

  17. Thank you… And keep reading if interested…

  18. Formalization
     • With mathematical notations
     • Targeted at an audience not familiar with the IPC

  19. The ‘patent’ (auto)categorization task
     • Regular multiclass hierarchic categorization task
       • Given a hierarchic taxonomy (a patent classification) with categories
       • Given a set of training documents, each associated with multiple categories …or… an expert who knows both the state of the art of the field and the taxonomy
       • For a document, provide a list of potential categories (preferably with relevance)
       • Categorization level may be fixed (preclassification) or full
     • But…

  20. The ‘patent’ (auto)categorization task, but…
     • Really a regular multiclass hierarchic categorization task?
     • Taxonomy: text and definitions (manuals or handbooks) and revisions, and therefore:
       • known relations between categories (rules of classification, e.g. last place rule, takes precedence)
       • secondary categories, non-primary categories (indexing codes, ‘not used as first symbol’)
       • some categories are excluded from ‘final’ categorization (top levels of the hierarchy) but required in preclassification (where secondary categories cannot be used)
     • Documents
       • contain metadata (priorities, inventor, applicant)
       • have various “fields” (title, abstract, description, claims)
       • some fields are subject to independent categorization (claims), some fields may be used only globally (abstract, description)
     • Changes: subject matter of a symbol, classification rules and procedures
       • provided categories may require revision, since the taxonomy can be revised at regular intervals or immediately
       • e.g. there is no longer a ‘main classification symbol’
       • preclassification may help to reduce the scope but requires handling failures

  21. Notations: Hierarchic taxonomy
     • Taxonomy: T
     • Category: C; supercategory: ⊗ ∉ C
     • Parent function: p: C → C ∪ ⊗, describing a non-directed tree graph
     • Ancestors: p⁺: C → C⁺ ∪ ⊗, transitive closure of p
     • Child function (subcategories): c: C → C*, c = p⁻¹
     • Descendants: c⁺: C → C*, transitive closure of c
     • Roots of taxonomy (‘sections’): C_⊗ ⊂ C, C_⊗ = {r ∈ C | p(r) = ⊗}
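
The notation on this slide maps directly onto a few small functions. The sketch below uses a toy parent table and represents the supercategory ⊗ as `None`; both choices, and the function names, are assumptions made for illustration.

```python
SUPER = None  # stands for the supercategory ⊗, which is not itself a category

# Toy parent function p, stored as a table: category -> parent.
PARENT = {
    "A": SUPER, "B": SUPER,
    "A01": "A", "A61": "A", "B60": "B",
    "A01B": "A01",
}


def ancestors(category):
    """p+: repeated application of p, nearest ancestor first (⊗ excluded)."""
    out = []
    node = PARENT[category]
    while node is not SUPER:
        out.append(node)
        node = PARENT[node]
    return out


def children(category):
    """c = p^-1: every category whose parent is the given one."""
    return sorted(k for k, parent in PARENT.items() if parent == category)


def descendants(category):
    """c+: transitive closure of c, in depth-first order."""
    out = []
    for child in children(category):
        out.append(child)
        out.extend(descendants(child))
    return out


def roots():
    """Sections: the categories whose parent is ⊗."""
    return sorted(k for k, parent in PARENT.items() if parent is SUPER)
```

Keeping only the parent table and deriving children, ancestors and descendants from it mirrors the slide, where c is defined as the inverse of p.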

  22. Notations: Patent taxonomy
     • Level of category: L, l: C → L (e.g. ‘subclass’)
     • Classifying category level: L_c ⊂ L
     • Classifying category: C_c ⊂ C, C_c = {c ∈ C : l(c) ∈ L_c}
     • Non-classifying category: C̄_c ⊂ C, C̄_c = C ∖ C_c
     • Category symbol: s: C ↔ $ ($ stands for string)
     • Category sort relation: c1 < c2 ⇔ s(c1) < s(c2); min and max are also applicable for C⁺
     • Category interval: [f, t] = {c ∈ C | f ≤ c ∧ c ≤ t}
     • Usually, descendants form a contiguous interval, i.e. ∀ a ∈ C: d ∈ [min(c⁺(a)), max(c⁺(a))] ⇔ d ∈ c⁺(a)
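
The contiguity property in the last bullet can be checked mechanically. The sketch below compares symbols as plain strings, as the sort relation on this slide suggests; this happens to work for the section, class and subclass symbols used here, but it is an assumption that would need a proper sort key for group symbols. The toy tables and function names are invented for illustration.

```python
def descendants(parent, a):
    """c+(a) as a set, for a taxonomy given as a category -> parent table."""
    direct = {c for c, par in parent.items() if par == a}
    out = set(direct)
    for c in direct:
        out |= descendants(parent, c)
    return out


def interval(parent, f, t):
    """[f, t]: every category whose symbol lies between f and t inclusive."""
    return {c for c in parent if f <= c <= t}


def is_contiguous(parent, a):
    """True if the descendants of a are exactly [min(c+(a)), max(c+(a))]."""
    d = descendants(parent, a)
    return not d or interval(parent, min(d), max(d)) == d


# Toy taxonomy where the property holds for every category.
GOOD = {"A": None, "A01": "A", "A01B": "A01", "A61": "A", "A61K": "A61",
        "B": None, "B60": "B"}
# Deliberately broken toy taxonomy: A45 sorts between A01 and A61 but hangs under B.
BAD = {"A": None, "A01": "A", "A61": "A", "A45": "B", "B": None}
```

When the property holds, a descendant test reduces to two comparisons against the interval bounds, which is convenient for range queries in a database.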

  23. Notations: category relations
     • Relation types: R ⊂ (C → (℘(C) ∪ ⊗))
     • All relations in a taxonomy: T_R ⊂ C × C × R
     • All relations for a category: r_∀: C → (R × C)*
     • Obvious relation types in hierarchies: {parent, child, ancestor, descendant} ⊂ R, defined as parent ≈ p, child ≈ c etc.
     • Further obvious relation: sibling (s), as child of parent (c)
     • Interval and set relations: union of the single-category form, e.g. descendant({c1, [c2, c3]})
       • result abbreviated as an interval or set: descendant(a) = [min(c⁺(a)), max(c⁺(a))]

  24. Patent taxonomy relations on a single version
     • Invertible relations
       • Simple reference: category ‘refers’ to another
       • ‘Takes precedence’ reference
       • limiting references, very similar to precedence
       • Allowed indexing symbols on an interval
     • Precedence relations on siblings
       • placerule: first place rule or last place rule
       • priority: siblings prioritized by ‘standardized sequence’
     • cooccurrence of references (commutative)

  25. Patent taxonomy relations, samples
     • reference: (may refer further!)
         A01B 1/00 Hand tools (edge trimmers for lawns A01G 3/06)
         A01G 3/06 Hand-held edge trimmers or shears for lawns (mowers combined with lawn edgers A01D 43/16)
     • precedence:
         A01B 3/24 Tractor-drawn ploughs (A01B 3/04 takes precedence)
         A01B 3/04 Animal-drawn ploughs
     • limiting:
         A01N PRESERVATION OF BODIES…; BIOCIDES, e.g. AS DISINFECTANTS, AS PESTICIDES OR AS HERBICIDES; …
         in Definitions for A01N subclass: Fungicidal, bactericidal, insecticidal, disinfecting or antiseptic paper D21H

  26. Patent taxonomy relations, samples
     • indexing: guidance heading before A61K 101/00
         Indexing scheme associated with group A61K 51/00, relating to the nature of the radioactive substance
     • placerule: note before A01N 25/00, even specifying an exception…
         In groups A01N 27/00-A01N 65/00, in the absence of an indication to the contrary, an active ingredient is classified in the last appropriate place.
     • priorities (stand.seq.): main groups in IPC where no place rule is applied
     • cooccurrence: in catchwords; the text of the IPC also mentions the reference
         CONDITIONING harvested crops A01D 43/10, A01D 82/00
         A01D 43/10 with means for crushing or bruising the mown crop
         A01D 82/00 Crop conditioners, i.e. machines for crushing or bruising stalks (mowers combined with means for crushing or bruising the mown crop A01D 43/10)

  27. Patent taxonomy relations, multiple versions
     • Patent taxonomies change over time
     • A former category (or a set of categories) may be
       • transferred to a single new category or a set of new categories, or
       • recognized as covered by a single existing category or a set of existing categories
     • In the newer version, all the categories associated with a single former category or a set of former categories are in a concordance relation
       • the concordance relation may be computed by transitively traversing category changes over multiple versions

  28. Patent taxonomy relations: concordance relation sample
     • 2011:
         B24B 49/00 Measuring or gauging equipment for controlling the feed movement of the grinding tool or work; Arrangements of indicating or measuring equipment, e.g. for indicating the start of the grinding operation
     • 2012: B24B 49/00 → B24B 37/005 - 37/015, B24B 49/00
         B24B 37/005 . Control means for lapping machines or devices
         B24B 37/013 . . Devices or means for detecting lapping completion
         B24B 37/015 . . Temperature control
         B24B 49/00 Measuring or gauging equipment for controlling the feed movement of the grinding tool or work; Arrangements of indicating or measuring equipment, e.g. for indicating the start of the grinding operation (B24B 33/06, B24B 37/005 take precedence; if applicable to other machine tools, B23Q 15/00-B23Q 17/00 take precedence)
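
Computing a concordance over several versions amounts to replaying the category changes in version order. In the sketch below the 2012 entry is the B24B 49/00 sample from this slide, expanded only to the three B24B 37 groups that the slide lists. The 2099 entry and its symbol ZZZZ 1/00 are invented placeholders that exist only to show a second, transitive step; the data layout and function name are likewise assumptions.

```python
CHANGES_BY_VERSION = {
    # From the sample on this slide.
    2012: {
        "B24B 49/00": ["B24B 37/005", "B24B 37/013", "B24B 37/015", "B24B 49/00"],
    },
    # Invented placeholder revision, not a real IPC change.
    2099: {
        "B24B 37/013": ["ZZZZ 1/00"],
    },
}


def concordance(symbol, changes_by_version):
    """Follow a symbol through all revisions in chronological order and
    return the set of symbols it corresponds to in the newest version.
    Assumes the symbol belongs to a version older than every listed revision."""
    current = {symbol}
    for version in sorted(changes_by_version):
        changes = changes_by_version[version]
        following = set()
        for s in current:
            # Symbols without a recorded change carry over unchanged.
            following.update(changes.get(s, [s]))
        current = following
    return current
```

Symbols without a recorded change carry over as they are, so a document classified before the first revision can still be mapped to the newest version in one call.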

  29. Effect of relations on categorization
     • A weighted directed graph can be built between categories
     • Whenever an ‘oracle’ (e.g. a flat categorizer, a fielded search etc.) proposes a category, related categories must be evaluated and verified, possibly in a given order and also considering weights
     • Training may also benefit from knowing in advance
       • the order of evaluation, e.g. standardized sequences, priority rules
       • relations:
         • to enhance a good hit or suppress a false hit
         • or to co-classify
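
A rough sketch of how such a graph could adjust an oracle's proposals follows. It is one possible reading of this slide, not the presenters' method: the weights, the scoring formulas and the function name are assumptions, while the two relation samples come from this presentation.

```python
# (source, relation type, target, weight); the weights are invented for illustration.
RELATIONS = [
    ("A01B 3/24", "precedence", "A01B 3/04", 1.0),  # A01B 3/04 takes precedence
    ("A01B 1/00", "reference", "A01G 3/06", 0.5),
]


def rerank(proposals, relations):
    """Adjust oracle scores using relations between categories.

    precedence: if both symbols were proposed, damp the one that yields.
    reference:  add the referred symbol as a candidate to be verified.
    """
    scores = dict(proposals)
    for source, rel_type, target, weight in relations:
        if source not in proposals:
            continue
        if rel_type == "precedence" and target in proposals:
            scores[source] = proposals[source] * (1.0 - weight)
        elif rel_type == "reference":
            scores[target] = max(scores.get(target, 0.0), proposals[source] * weight)
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With proposals for A01B 3/24, A01B 3/04 and A01B 1/00, the precedence edge pushes A01B 3/24 to the bottom and the reference edge brings in A01G 3/06 as an extra candidate to verify.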
