Patent Classifications as ‘Knowledge’ …towards a more conscious (auto)categorization of patents. Arcanum Development 2013.
A usual hierarchic categorization task…
• Given a hierarchic taxonomy (classification system)
• Provide, for a document, a list of taxonomy nodes (classification symbols) whose subject matter best matches the document
• “Best” is based on…
  • For experts: understanding the subject matter… and considering classification rules
  • For computers: providing the hierarchy, and training with sample categorized documents… and the rules?!
• However, the patent classification task is somewhat more complicated…
Typical methods of categorization
• Flat: the best is the winner on each level
• Greedy hierarchic: traversing down on the best only
[Diagram labels: Roots (sections); Non-classifying (‘preclassification’) levels; … during training…]
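The difference between the two methods can be sketched on a toy example. Everything below is an assumption made for illustration: the six-node taxonomy, the per-node scores, and the reading of "flat" as "score all nodes of the target level directly". It is not the presenters' implementation.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: child -> parent (None marks a root, i.e. a section).
PARENT: Dict[str, Optional[str]] = {
    "A": None, "B": None,
    "A01": "A", "A61": "A",
    "B60": "B", "B65": "B",
}

# Hypothetical scores for one document, e.g. produced by a trained model.
SCORES: Dict[str, float] = {
    "A": 0.55, "B": 0.45,
    "A01": 0.20, "A61": 0.30,
    "B60": 0.10, "B65": 0.90,
}

def children(node: Optional[str]) -> List[str]:
    """Direct subcategories of a node (roots when node is None)."""
    return sorted(c for c, p in PARENT.items() if p == node)

def flat(level_nodes: List[str]) -> str:
    """Flat: score every node of the target level; the best one wins."""
    return max(level_nodes, key=lambda n: SCORES[n])

def greedy_hierarchic() -> List[str]:
    """Greedy: at each level, descend only into the best-scoring child."""
    path: List[str] = []
    node: Optional[str] = None
    while children(node):
        node = max(children(node), key=lambda n: SCORES[n])
        path.append(node)
    return path

leaves = [n for n in PARENT if not children(n)]
print(flat(leaves))
print(greedy_hierarchic())
```

With these invented scores the greedy traversal commits to section A at the top and can no longer reach B65, which the flat method picks. An early wrong turn of this kind is what makes failure handling necessary in preclassification.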
Common features of patent classification schemes
• Hierarchic
  • the subject matter covered by a higher level contains the subject matter of a lower level
  • but: a document may be assigned to a higher level when none of the lower levels fit
• Some nodes (symbols) cannot be used (in general or alone) for classification
  • hierarchy levels
  • indexing schemes
• Schemes contain specific rules (relations) between symbols
  • Place / priority / precedence / limiting rules
  • Indexing rules
  • References to symbols to be taken into consideration
• Rules given in the scheme are extended by definitions / manuals of classification
• Schemes can be multilingual
• Used by various offices and cultures (maybe slightly differently)
Using relations in patent categorization
[Diagram labels: Last place rule; Takes precedence; Hierarchic, rules, references]
Why is a (more) formal analysis and presentation advantageous?
• Currently, the rules are presented as text in various master files, in a machine-readable but not machine-interpretable way (it is ‘content’ but not yet ‘knowledge’)
• Lots of complex rules are spread over multiple sources (e.g. definitions) and places (e.g. reverse references)
• Both humans and computer programs have trouble collecting and applying all the rules systematically
• It is therefore worth converting IPC content to more explicit IPC ‘knowledge’…
Hypotheses
• Tests were made to verify whether the confusions
  • of patent examiners
  • of various autocategorizers
  correlate with the relations given in IPC
• Assumption
  • more references in IPC: higher overlap between subject matter areas
• Hypotheses
  • the more references in IPC between two areas, the higher the confusion of humans and computers
  • the knowledge coded in IPC is, indeed, used by patent categorizers
Testing the hypotheses
• If patent examiners take references in IPC seriously
  • the more references between two symbols, the higher the number of co-classifications
• Practice can differ between two offices
  • the more references in IPC, the higher the likelihood of a different decision
• Confusion of autocategorizers
  • more failures if subject matter areas overlap
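One way such a test could be run is a rank correlation between the number of IPC references linking two classes and the number of documents co-classified in both. The sketch below is only an assumption about the procedure, since the presentation does not say which statistic was used. The class pairs and all counts are invented placeholders, not the presentation's data.

```python
from typing import Dict, List, Tuple

def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented placeholder counts per (hypothetical) class pair:
# (number of IPC references, number of co-classified documents).
TOY: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("X01", "X02"): (12, 340),
    ("X03", "X04"): (5, 90),
    ("X05", "X06"): (30, 1200),
    ("X07", "X08"): (2, 15),
}

references = [float(r) for r, _ in TOY.values()]
co_classified = [float(c) for _, c in TOY.values()]
print(spearman(references, co_classified))
```

A value near 1 would support the hypothesis that heavily cross-referenced areas are also frequently co-classified. The same function could be applied to examiner disagreement counts or autocategorizer error counts.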
Co-categorization vs. IPC references
• When there are references in IPC, patent examiners take them seriously
  • References mark overlapping subject matter areas and/or
  • References propose the use of secondary (indexing) symbols
• On class level, the frequency of references in IPC is similar to the frequency of common use of symbols of both classes in patent documents
[Chart labels: A47, A61; B65; B60-B65; A61 vs C07 and C12; C07-C12; F16; G-H]
Differences in examiners’ practice
• When there are references in IPC, patent examiners may assign them differently
  • References mark overlapping subject matter areas
• On class level, the frequency of references in IPC is similar to the frequency of differences in the selected first symbol (pre-reform practice, to simulate preclassification)
[Chart labels: A47, A61; B65; B60-B65; A61 vs C07 and C12; C07-C12; F16; G-H]
Confusion of autocategorizers
• When there are references in IPC, autocategorizers fail more frequently
  • the “first symbol” may be selected differently
• On class level, the frequency of references in IPC is similar to the frequency of differences in the first symbol selected by an autocategorizer (2002 data, to simulate preclassification)
[Chart labels: A47, A61; B65; B60-B65; A61 vs C07 and C12; C07-C12; F16; G-H]
Conclusion
• The following show similar characteristics on higher levels of IPC:
  • Reference statistics in IPC
  • Co-classification
  • Human classification differences
  • Preclassification autocategorization errors
• This may be even more important on lower levels, where the rules are more complex
• Therefore, easier access to the rules may be welcomed by both human and machine categorizers
Presentation of IPCInfo
• An analysis and data preparation was performed as in-house research
  • defining relevant relation types (about 15 main relations and ~20 further ones) (excerpts below)
  • parsing the IPC scheme, definitions, catchwords and RCL
  • building a relation graph in an RDBMS (>1.5 million relations)
• The result is presented on a user interface
• Convertible to RDF or OWL for further use
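As a rough illustration of a relation graph held in an RDBMS and exported as triples, the sketch below uses Python's built-in `sqlite3`. The single-table schema, the column names, the `http://example.org/ipc/` namespace and the IRI spelling are all assumptions. The presentation does not describe the actual IPCInfo schema or its RDF mapping. The three rows come from the relation samples in this deck.

```python
import sqlite3

# Hypothetical single-table schema; the real IPCInfo schema is not described.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relation ("
    " source TEXT NOT NULL,"
    " target TEXT NOT NULL,"
    " rtype  TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO relation VALUES (?, ?, ?)",
    [
        ("A01B 1/00", "A01G 3/06", "reference"),
        ("A01G 3/06", "A01D 43/16", "reference"),
        # Stored here as: target takes precedence over source.
        ("A01B 3/24", "A01B 3/04", "precedence"),
    ],
)

def incoming(symbol):
    """Reverse lookup: which categories point at this symbol, and how."""
    cur = conn.execute(
        "SELECT source, rtype FROM relation WHERE target = ? ORDER BY source",
        (symbol,),
    )
    return cur.fetchall()

BASE = "http://example.org/ipc/"  # placeholder namespace, not an official one

def to_triples():
    """Render each stored relation as an N-Triples-style line."""
    def iri(symbol):
        return "<" + BASE + symbol.replace(" ", "_").replace("/", "-") + ">"
    rows = conn.execute("SELECT source, rtype, target FROM relation ORDER BY rowid")
    return [f"{iri(s)} <{BASE}rel/{r}> {iri(t)} ." for s, r, t in rows]

print(incoming("A01G 3/06"))
for line in to_triples():
    print(line)
```

The reverse lookup is the point of storing relations explicitly: a reference printed only under A01B 1/00 becomes visible from A01G 3/06 as well.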
Patent taxonomy relations, samples
• reference: (transitive!)
  A01B 1/00 Hand tools (edge trimmers for lawns A01G 3/06)
  A01G 3/06 Hand-held edge trimmers or shears for lawns (mowers combined with lawn edgers A01D 43/16)
• precedence: (over 600 transitive cases, e.g. A61M 3/00 A61M 5/00 A61M 36/00 [in definitions!])
  A01B 3/24 Tractor-drawn ploughs (A01B 3/04 takes precedence)
  A01B 3/04 Animal-drawn ploughs
• limiting:
  A01N PRESERVATION OF BODIES…; BIOCIDES, e.g. AS DISINFECTANTS, AS PESTICIDES OR AS HERBICIDES; …
  in Definitions for A01N subclass: Fungicidal, bactericidal, insecticidal, disinfecting or antiseptic paper D21H
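Because references are transitive, a category reached through one reference may itself refer further. A minimal sketch of following such a chain, using only the two references quoted in the sample above; the dictionary layout and the function name `follow` are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List

# The two references from the sample: source symbol -> referenced symbols.
REFERENCES: Dict[str, List[str]] = {
    "A01B 1/00": ["A01G 3/06"],
    "A01G 3/06": ["A01D 43/16"],
}

def follow(start: str, edges: Dict[str, List[str]]) -> List[str]:
    """All categories reachable from start, in breadth-first order.

    The seen set guards against reference cycles.
    """
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

print(follow("A01B 1/00", REFERENCES))
```

Starting from A01B 1/00 the traversal also reaches A01D 43/16, which is not mentioned in the text of A01B 1/00 itself.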
Patent taxonomy relations, samples
• indexing: guidance heading before A61K 101/00
  Indexing scheme associated with group A61K 51/00, relating to the nature of the radioactive substance
• placerule: note before A01N 25/00, even specifying an exception…
  In groups A01N 27/00-A01N 65/00, in the absence of an indication to the contrary, an active ingredient is classified in the last appropriate place.
• priorities (standardseq): for main groups in IPC where no place rule is applied
• cooccurrence: e.g. in catchwords; the text of IPC also mentions the reference
  CONDITIONING harvested crops A01D 43/10, A01D 82/00
  A01D 43/10 with means for crushing or bruising the mown crop
  A01D 82/00 Crop conditioners, i.e. machines for crushing or bruising stalks (mowers combined with means for crushing or bruising the mown crop A01D 43/10)
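The last place rule quoted above can be sketched as a filter over candidate symbols. This is a simplified reading: the "indication to the contrary" exceptions are ignored and the rule is applied to whole group symbols. The candidate symbols are illustrative inputs, and the helper names are assumptions.

```python
import re
from typing import List, Optional, Tuple

_GROUP = re.compile(r"([A-H]\d{2}[A-Z]) (\d+)/(\d+)")

def parse(symbol: str) -> Tuple[str, int, str]:
    """Split a group symbol into (subclass, main group number, subgroup digits)."""
    m = _GROUP.fullmatch(symbol)
    if m is None:
        raise ValueError(f"not a group-level symbol: {symbol!r}")
    subclass, main, sub = m.groups()
    return subclass, int(main), sub

def last_place(candidates: List[str], first: str, last: str) -> Optional[str]:
    """Among candidates inside the main-group range first..last, pick the last one."""
    f_sub, f_main, _ = parse(first)
    l_sub, l_main, _ = parse(last)
    if f_sub != l_sub:
        raise ValueError("range must stay within one subclass")
    in_range = [
        c for c in candidates
        if parse(c)[0] == f_sub and f_main <= parse(c)[1] <= l_main
    ]
    return max(in_range, key=parse) if in_range else None

# Illustrative candidates, e.g. proposed by a categorizer for one document.
candidates = ["A01N 31/02", "A01N 43/40", "A01N 37/10", "A01N 25/04"]
print(last_place(candidates, "A01N 27/00", "A01N 65/00"))
```

A01N 25/04 lies outside the range named in the note and is therefore not affected by the rule; among the remaining candidates the one appearing last in the scheme is kept.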
Thank you… And keep reading if interested…
Formalization
• With mathematical notations
• Targeted at an audience not familiar with IPC
The ‘patent’ (auto)categorization task
• Regular multiclass hierarchic categorization task
  • Given a hierarchic taxonomy (a patent classification) with categories
  • Given a set of training documents, each associated with multiple categories… or… an expert knowing both the state of the art of the field and the taxonomy
  • For a document, provide a list of potential categories (preferably with relevance)
  • Categorization level may be fixed (preclassification) or full
• But…
The ‘patent’ (auto)categorization task, but…
• Really a regular multiclass hierarchic categorization task?
• Taxonomy: text and definitions (manuals or handbooks) and revisions, and therefore:
  • known relations between categories (rules of classification, e.g. last place rule, takes precedence)
  • secondary categories, non-primary categories (indexing codes, ‘not used as first symbol’)
  • some categories are excluded from ‘final’ categorization (top levels of the hierarchy) but required in preclassification (where secondary categories cannot be used)
• Documents
  • contain metadata (priorities, inventor, applicant)
  • have various “fields” (title, abstract, description, claims)
  • some fields are subject to independent categorization (claims), some fields may be used only globally (abstract, description)
• Changes: subject matter of a symbol, classification rules and procedures
  • provided categories may require revision, since the taxonomy can be revised at regular intervals or immediately
  • e.g. there is no longer a ‘main classification symbol’
  • preclassification may help to reduce the scope but requires handling failures
Notations: Hierarchic taxonomy
• Taxonomy: T
• Category: C, supercategory: ⊗ ∉ C
• Parent function: p: C → C ∪ {⊗}, describing a non-directed tree graph
• Ancestors: p⁺: C → C⁺ ∪ {⊗}, transitive closure of p
• Child function (subcategories): c: C → C*, c = p⁻¹
• Descendants: c⁺: C → C*, transitive closure of c
• Roots of taxonomy (‘sections’): C_⊗ ⊂ C, C_⊗ = { r ∈ C | p(r) = ⊗ }
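These definitions map directly onto a few small functions. In the sketch below `None` plays the role of the supercategory ⊗, and the six-node taxonomy fragment is a hand-picked illustration. The function names are assumptions that mirror the notation (p, p⁺, c, c⁺).

```python
from typing import Dict, List, Optional

ROOT = None  # stands in for the supercategory ⊗, which is not itself a category

# Parent function p as a lookup table over a small illustrative fragment.
PARENT: Dict[str, Optional[str]] = {
    "A": ROOT,
    "A01": "A",
    "A01B": "A01",
    "A01B 1/00": "A01B",
    "A01B 3/00": "A01B",
    "B": ROOT,
}

def p(category: str) -> Optional[str]:
    """Parent function."""
    return PARENT[category]

def ancestors(category: str) -> List[str]:
    """p⁺: repeated application of p, nearest ancestor first, excluding ⊗."""
    result: List[str] = []
    current = p(category)
    while current is not ROOT:
        result.append(current)
        current = p(current)
    return result

def children(category: Optional[str]) -> List[str]:
    """c = p⁻¹: all categories whose parent is the given one."""
    return [x for x, parent in PARENT.items() if parent == category]

def descendants(category: str) -> List[str]:
    """c⁺: children, their children, and so on (depth-first order)."""
    result: List[str] = []
    for child in children(category):
        result.append(child)
        result.extend(descendants(child))
    return result

roots = children(ROOT)  # the set of sections: { r | p(r) = ⊗ }
print(ancestors("A01B 1/00"), descendants("A01"), roots)
```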
Notations: Patent taxonomy
• Level of category: L, l: C → L (e.g. ‘subclass’)
• Classifying category level: L_c ⊂ L
• Classifying category: C_c ⊂ C, C_c = { c ∈ C : l(c) ∈ L_c }
• Non-classifying category: C̄_c ⊂ C, C̄_c = C ∖ C_c
• Category symbol: s: C ↔ $ ($ stands for string)
• Category sort relation: c1 < c2 ⇔ s(c1) < s(c2); min and max are also applicable to C⁺
• Category interval: [f, t] = { c ∈ C | f ≤ c ∧ c ≤ t }
• Usually, descendants form a contiguous interval, i.e. ∀ a ∈ C : d ∈ [min(c⁺(a)), max(c⁺(a))] ⇔ d ∈ c⁺(a)
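The sort relation and the interval test can be sketched for group-level symbols. One caveat: a plain string comparison of symbols misorders main groups (it would place 13/00 before 3/04). The key below therefore compares the main group numerically and the subgroup digit by digit, assuming the IPC convention that subgroup digits after the second are read as a decimal subdivision. The key layout and function names are assumptions, and higher-level symbols (section, class, subclass) are not handled.

```python
import re
from typing import Tuple

_GROUP = re.compile(r"([A-H]\d{2}[A-Z]) (\d+)/(\d+)")

def sort_key(symbol: str) -> Tuple[str, int, str]:
    """Key realizing c1 < c2 for group-level symbols.

    Main group: numeric. Subgroup: compared as a digit string, so that
    37/005 < 37/013 < 37/015 < 37/02 (decimal reading of the digits).
    """
    m = _GROUP.fullmatch(symbol)
    if m is None:
        raise ValueError(f"not a group-level symbol: {symbol!r}")
    subclass, main, sub = m.groups()
    return (subclass, int(main), sub)

def in_interval(c: str, f: str, t: str) -> bool:
    """Membership in the category interval [f, t]."""
    return sort_key(f) <= sort_key(c) <= sort_key(t)

symbols = ["B24B 37/015", "B24B 37/02", "B24B 37/005", "B24B 37/013"]
print(sorted(symbols, key=sort_key))
print(in_interval("B24B 37/013", "B24B 37/005", "B24B 37/015"))
print(in_interval("B24B 37/02", "B24B 37/005", "B24B 37/015"))
```

Comparing the subgroup as an integer instead would wrongly place 37/02 inside the interval 37/005 to 37/015, since 2 < 15.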
Notations: category relations
• Relation types: R ⊂ (C → (℘(C) ∪ {⊗}))
• All relations in a taxonomy: T_R ⊂ C × C × R
• All relations for a category: r_∀: C → (R × C)*
• Obvious relation types in hierarchies: { parent, child, ancestor, descendant } ⊂ R, defined as parent ≈ p, child ≈ c etc.
• Further obvious relation: sibling (s), as child (c) of parent
• Interval and set relations: union of the single-category form, e.g. descendant({c1, [c2, c3]})
  • result abbreviated as an interval or set: descendant(a) = [min(c⁺(a)), max(c⁺(a))]
Patent taxonomy relations on a single version
• Invertible relations
  • Simple reference: a category ‘refers’ to another
  • ‘Takes precedence’ reference
  • Limiting references, very similar to precedence
  • Allowed indexing symbols on an interval
• Precedence relations on siblings
  • placerule: first place rule or last place rule
  • priority: siblings prioritized by ‘standardized sequence’
• Cooccurrence of references (commutative)
Patent taxonomy relations, samples
• reference: (may refer further!)
  A01B 1/00 Hand tools (edge trimmers for lawns A01G 3/06)
  A01G 3/06 Hand-held edge trimmers or shears for lawns (mowers combined with lawn edgers A01D 43/16)
• precedence:
  A01B 3/24 Tractor-drawn ploughs (A01B 3/04 takes precedence)
  A01B 3/04 Animal-drawn ploughs
• limiting:
  A01N PRESERVATION OF BODIES…; BIOCIDES, e.g. AS DISINFECTANTS, AS PESTICIDES OR AS HERBICIDES; …
  in Definitions for A01N subclass: Fungicidal, bactericidal, insecticidal, disinfecting or antiseptic paper D21H
Patent taxonomy relations, samples
• indexing: guidance heading before A61K 101/00
  Indexing scheme associated with group A61K 51/00, relating to the nature of the radioactive substance
• placerule: note before A01N 25/00, even specifying an exception…
  In groups A01N 27/00-A01N 65/00, in the absence of an indication to the contrary, an active ingredient is classified in the last appropriate place.
• priorities (stand.seq.): main groups in IPC where no place rule is applied
• cooccurrence: in catchwords; the text of IPC also mentions the reference
  CONDITIONING harvested crops A01D 43/10, A01D 82/00
  A01D 43/10 with means for crushing or bruising the mown crop
  A01D 82/00 Crop conditioners, i.e. machines for crushing or bruising stalks (mowers combined with means for crushing or bruising the mown crop A01D 43/10)
Patent taxonomy relations, multiple versions
• Patent taxonomies change over time
• A former category (or a set of categories) may be
  • transferred to a single new category or a set of new categories, or
  • recognized as subject matter already covered by a single existing category or a set of existing categories
• In the newer version, all the categories associated with a single former category or a set of former categories are in concordance relation
• The concordance relation may be computed by transitively traversing category changes over multiple versions
Patent taxonomy relations: concordance relation sample
• 2011:
  B24B 49/00 Measuring or gauging equipment for controlling the feed movement of the grinding tool or work; Arrangements of indicating or measuring equipment, e.g. for indicating the start of the grinding operation
• 2012: B24B 49/00 → B24B 37/005 - 37/015, B24B 49/00
  B24B 37/005 . Control means for lapping machines or devices
  B24B 37/013 . . Devices or means for detecting lapping completion
  B24B 37/015 . . Temperature control
  B24B 49/00 Measuring or gauging equipment for controlling the feed movement of the grinding tool or work; Arrangements of indicating or measuring equipment, e.g. for indicating the start of the grinding operation (B24B 33/06, B24B 37/005 takes precedence; if applicable to other machine tools, B23Q 15/00-B23Q 17/00 take precedence)
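Transitive traversal of category changes can be sketched as repeated application of per-revision transfer tables. The first table below encodes the 2011 to 2012 sample above. The second table and the symbol `X99Z 1/00` are purely hypothetical, added only to show a second step. The data layout and the function name are assumptions as well.

```python
from typing import Dict, List, Set

# One transfer table per revision: former symbol -> symbols in the newer version.
TRANSFERS: List[Dict[str, List[str]]] = [
    # 2011 -> 2012, from the sample (interval 37/005 - 37/015 listed explicitly).
    {"B24B 49/00": ["B24B 37/005", "B24B 37/013", "B24B 37/015", "B24B 49/00"]},
    # Hypothetical later revision, invented for illustration only.
    {"B24B 37/013": ["X99Z 1/00"]},
]

def concordance(symbol: str, transfers: List[Dict[str, List[str]]]) -> List[str]:
    """Symbols in the newest version that are in concordance with a former symbol.

    A symbol without an entry in a revision's table is carried over unchanged.
    """
    current: Set[str] = {symbol}
    for table in transfers:
        following: Set[str] = set()
        for c in current:
            following.update(table.get(c, [c]))
        current = following
    return sorted(current)

print(concordance("B24B 49/00", TRANSFERS[:1]))
print(concordance("B24B 49/00", TRANSFERS))
```

A document classified under the former B24B 49/00 would then have to be checked against every symbol in the returned set when reclassified.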
Effect of relations on categorization
• A weighted directed graph can be built between categories
• Whenever an ‘oracle’ (e.g. a flat categorizer, a fielded search etc.) proposes a category, related categories must be evaluated and verified, possibly in a given order, also considering weights
• Training may also benefit from knowing in advance
  • the order of evaluation, e.g. standardized sequences, priority rules
  • relations
    • to enhance a good hit or suppress a false hit
    • or to co-classify
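A minimal sketch of how oracle proposals might be post-processed with such a graph. The edge weights, the oracle scores and both policies are assumptions chosen for illustration: related categories are queued for verification by weight times score, and a category is dropped when the one taking precedence over it is also proposed. The presentation does not prescribe a concrete procedure.

```python
from typing import Dict, List, Tuple

# Weighted directed edges: (source, target, relation type, hypothetical weight).
# For "precedence", the target takes precedence over the source.
RELATIONS: List[Tuple[str, str, str, float]] = [
    ("A01B 3/24", "A01B 3/04", "precedence", 1.0),
    ("A01B 1/00", "A01G 3/06", "reference", 0.5),
]

def to_verify(proposals: Dict[str, float]) -> List[Tuple[str, str]]:
    """Related but not yet proposed categories, most strongly linked first."""
    queue = []
    for source, target, rtype, weight in RELATIONS:
        if source in proposals and target not in proposals:
            queue.append((weight * proposals[source], target, rtype))
    queue.sort(key=lambda item: -item[0])
    return [(target, rtype) for _, target, rtype in queue]

def apply_precedence(proposals: Dict[str, float]) -> Dict[str, float]:
    """Suppress a proposal when the category taking precedence is proposed too."""
    kept = dict(proposals)
    for source, target, rtype, _ in RELATIONS:
        if rtype == "precedence" and source in kept and target in kept:
            del kept[source]
    return kept

# Hypothetical oracle output: category -> score.
proposals = {"A01B 3/24": 0.8, "A01B 3/04": 0.6, "A01B 1/00": 0.7}
print(to_verify(proposals))
print(apply_precedence(proposals))
```

Here the reference from A01B 1/00 puts A01G 3/06 on the verification list, while A01B 3/24 is suppressed because A01B 3/04, which takes precedence, was proposed as well.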