Dynamic Classification Workshop Roadmap & Quality Metrics Claude Vogel
Outline = Roadmap • Definitions • Step by step • Phase 1 • Taxonomy design [QA] • Implementation & Tests • Lexicon extraction [QA] • Meta data generation [QA] • Phase 2 • Classification design [QA] • Implementation & Tests • Portal generation [QA] • Conclusion
Your Problem • Hit lists are inefficient • Information is unstructured • Information structure is irrelevant
Define “Find” • I’m looking for an “APARTMENT in CARLSBAD” • I end up with a STUDIO in OCEANSIDE • “Find” is a result, not a starting point • Find is not a search-and-retrieval system • Find is a dynamic process (Diagram: Apartment, Studio; Carlsbad, Oceanside)
Relate available information to OUR decision-making processes
Dynamic Classification Rationale: Associate a semantic signature with structured and unstructured sources, then use this semantic representation to slice and dice sources. • Example 1: Endeca • Meta-data index • Parametric classification • Example 2: Convera • Taxonomic index • Topical classification
Reduce Complexity Domestic Sales and Marketing ? Jobs and Marketing ?
Bonus Domestic Sales Marketing Jobs Categorize…
Bonus Domestic Sales Marketing Jobs …And Classify! Domestic Sales and Marketing ?
Bonus Domestic Sales Marketing Jobs …And Classify Again! Jobs and Marketing ?
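The "categorize, then classify, then classify again" idea on the three slides above can be sketched in a few lines. This is only an illustration, not the workshop's implementation: the document ids and their tag assignments are hypothetical, and `classify` is a name invented here. Documents are tagged once with taxonomy categories; each question ("Domestic Sales and Marketing?", "Jobs and Marketing?") is then answered by a fresh classification over the same tags.

```python
# Hypothetical documents, each categorized once with taxonomy tags.
TAGGED_DOCS = {
    "doc1": {"Domestic Sales", "Marketing"},
    "doc2": {"Jobs", "Marketing"},
    "doc3": {"Bonus", "Jobs"},
    "doc4": {"Domestic Sales"},
}


def classify(tagged_docs, required_tags):
    """Return the sorted ids of documents that carry every required tag."""
    required = set(required_tags)
    return sorted(doc for doc, tags in tagged_docs.items() if required <= tags)


# Two different classifications built on the same, stable categorization.
domestic_sales_and_marketing = classify(TAGGED_DOCS, ["Domestic Sales", "Marketing"])
jobs_and_marketing = classify(TAGGED_DOCS, ["Jobs", "Marketing"])
```

The categorization step runs once and stays stable; only the cheap `classify` call changes with the user's question.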
TAGS Leverage K-Assets
Categories = Essential Knowledge • “A reasonably stable definition of the basic components of the world” • Genus to species: Africa > Somalia; Munitions > Bombs
Classification = Accidental Knowledge • “A relevant answer to a practical problem” • (Diagram: Africa, Missiles, Whatever)
A Twofold Process • Taxonomy driven categorization • Steady • Accurate • Scalable • Classification driven user interface • Flexible • Relevant • Focused
Glossary • Paradigmatic models • Ontology, Taxonomy • Practical models • Inventory, Catalog, Classification
The Semiotic Triangle • Concept: Mammals > Carnivora > Canidae > Canids > Boxer • Word: “Boxer” • Reference: the Boxer itself: “… It stands about 56 to 61 cm (about 22 to 24 in) high and weighs about 30 kg (about 66 lb)” (Source: Microsoft Encarta)
Taxonomy Lexicon Catalog Lexicon, Taxonomy, Catalog
Ontology • An ontology is a foundation of categories representing a view of the world. An ontology reflects the commonly used and trusted breakdown of categories. For example, the breakdown of news items into categories of ‘World’, ‘Sports’, ‘Politics’, etc. is ontological.
Taxonomy • A taxonomy is a hierarchical system describing genera and species. Species derive from a common genus and are hierarchically represented according to their essential characteristics and differences. For example, animals are categorized with the "Taxonomy of Life" which separates mammals from birds and spiders from insects, based on proper features and relative differences. This genus to species nomenclature is highlighted by terminology which moves from generic terms to binomial terms through lexical derivation and compounding. • A taxonomy doesn’t deal with things, but with the essence of things: a taxonomy is based on an ontology.
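A genus-to-species hierarchy like the one described above can be held as a simple child-to-parent map. The sketch below is a minimal illustration using the Boxer chain from the Semiotic Triangle slide; the names `PARENT`, `lineage`, and `is_a` are assumptions made for this example, not part of any product mentioned in the deck.

```python
# Child -> parent links, taken from the Boxer chain on the Semiotic Triangle slide.
PARENT = {
    "Carnivora": "Mammals",
    "Canidae": "Carnivora",
    "Canids": "Canidae",
    "Boxer": "Canids",
}


def lineage(term, parent=PARENT):
    """Return the path from the top of the taxonomy down to `term`."""
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))


def is_a(term, ancestor, parent=PARENT):
    """True if `ancestor` is `term` itself or lies above it in the taxonomy."""
    return ancestor in lineage(term, parent)


boxer_lineage = lineage("Boxer")
```

Because each term has exactly one parent here, every term has exactly one place in the hierarchy, which is what separates a taxonomy from the more flexible classifications discussed next.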
Inventory, Catalog • Inventory • List of things which stand for themselves, as they are, where they are. • Catalog • Consolidated inventory, introducing for that purpose some kind of elementary classification. • In both cases, the things listed have a unique and non-ambiguous name: e.g. URL, serial number, etc.
Classification • Arrangement of things according to some of their properties • Arrangement of types of things according to some of their properties • Multiple classification systems might combine multiple ontologies in multiple ways. • Things might have multiple locations in any given classification.
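As a rough sketch of "arrangement of things according to some of their properties", the snippet below groups the same items under two different properties. The items and their property values are hypothetical (the region and weapon labels echo earlier slides), and `arrange` is a name introduced only for this example.

```python
from collections import defaultdict

# Hypothetical items; each one carries several properties.
ITEMS = [
    {"id": "report-1", "region": "Africa", "weapon": "Missiles"},
    {"id": "report-2", "region": "Africa", "weapon": "Bombs"},
    {"id": "report-3", "region": "Asia", "weapon": "Missiles"},
]


def arrange(items, prop):
    """Group item ids by the value of one chosen property."""
    groups = defaultdict(list)
    for item in items:
        groups[item[prop]].append(item["id"])
    return dict(groups)


by_region = arrange(ITEMS, "region")
by_weapon = arrange(ITEMS, "weapon")
```

The same report appears in one place in `by_region` and in another in `by_weapon`: each choice of property yields a different, equally valid classification.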
Glossary • ANSI/NISO Z39.19-1993 • A thesaurus is a controlled vocabulary arranged in a known order and structured so that equivalence, homographic, hierarchical, and associative relationships among terms are displayed clearly and identified by standardized relationship indicators that are employed reciprocally. • The primary purposes of a thesaurus are (a) to facilitate retrieval of documents and (b) to achieve consistency in the indexing of written or otherwise recorded documents and other items, mainly for postcoordinate information storage and retrieval systems.
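The phrase "relationship indicators that are employed reciprocally" can be illustrated with a small data structure. The sketch below uses the conventional thesaurus indicators BT/NT (broader/narrower term), RT (related term), and USE/UF (use/used for); the `Thesaurus` class is an assumption made for this example, and the entry term "AA Defense" is hypothetical. The other terms come from the Defense example later in the deck, with their nesting assumed.

```python
from collections import defaultdict

# Each indicator and the indicator recorded on the other term.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    """Stores term relationships and always records the reciprocal link."""

    def __init__(self):
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, term, indicator, other):
        """Record `term indicator other` plus the matching reverse relation."""
        self.relations[term][indicator].add(other)
        self.relations[other][RECIPROCAL[indicator]].add(term)

    def related(self, term, indicator):
        """Return the sorted terms linked to `term` by `indicator`."""
        return sorted(self.relations[term][indicator])


thesaurus = Thesaurus()
thesaurus.add("Air Defense", "BT", "Defense Systems")
thesaurus.add("Antiaircraft Defense Systems", "BT", "Air Defense")
# "AA Defense" is a hypothetical non-preferred entry term.
thesaurus.add("AA Defense", "USE", "Antiaircraft Defense Systems")
```

Adding only the BT side is enough: asking for the narrower terms of "Defense Systems" returns "Air Defense" because the NT link was written at the same time.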
Outline = Roadmap • Definitions • Step by step • Phase 1 • Taxonomy design [QA] • Implementation & Tests • Lexicon extraction [QA] • Meta data generation [QA] • Phase 2 • Classification design [QA] • Implementation & Tests • Portal generation [QA] • Conclusion
Vertical Cartridges: Plug and Play • Terrorism • Geography • Weapons
Example 1: Geography • Africa: Algeria, Angola • Asia: Afghanistan, Armenia • Europe: Albania, Andorra • Middle East: Bahrain, Iran • North and Central America: Antigua and Barbuda, Bahamas • Pacific: Australia, Fiji • South America: Argentina, Bolivia • U.S.: Alabama, Alaska
Example 2 : Defense Defense Communications Satellite Communications Tactical Communications Defense Systems Air Defense Antiaircraft Defense Systems Gun Air Defense Systems Antimissile Defense Systems Forward Area Air Defense Systems Terminal Defense Aircraft Defense Systems Antisubmarine Defense Systems Antiswimmer Defense Systems Countermeasures Acoustic Countermeasures
Taxonomy Design Canon • Unique Beginner: Ordnance • Life Form: Fire Control Systems • Generic: Sights • Specific: Gun Sights • Varietal: Radar Gun Sights
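The rank ladder on the slide above (Unique Beginner, Life Form, Generic, Specific, Varietal) suggests one simple design check: no path in the taxonomy should be deeper than the canon allows, and each level can be labeled with its rank. The sketch below is one possible way to express that; the function name `rank_path` and the five-level limit as a hard rule are assumptions for illustration.

```python
# Ranks from the Taxonomy Design Canon slide, top to bottom.
RANKS = ["Unique Beginner", "Life Form", "Generic", "Specific", "Varietal"]

# The example path from the same slide.
PATH = ["Ordnance", "Fire Control Systems", "Sights", "Gun Sights", "Radar Gun Sights"]


def rank_path(path, ranks=RANKS):
    """Pair each term on a root-to-leaf path with its rank; reject paths that are too deep."""
    if len(path) > len(ranks):
        raise ValueError(f"path has {len(path)} levels, canon allows {len(ranks)}")
    return list(zip(ranks, path))


ranked = rank_path(PATH)
```

A shorter path is accepted and simply stops at a higher rank; a path with a sixth level raises an error, flagging a branch that needs redesign.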
Mass Nouns • Linnaeus: Higher taxa are artefacts: “An order is a subdivision of classes needed to avoid placing together more genera than the mind can follow.” Philosophia Botanica • Some life-form categories are created to group objects together. Terms associated with these are often mass nouns (versus count nouns) like “furniture”: “a kind of things of different kinds made by people to etc.”
Synonyms (WordNet) • Person > Unwelcome person > Unpleasant person > Selfish person > Opportunist > Backscratcher
Cycles • Life-form (mass noun) • Genus (having derived forms) • Species (derived from genus)
Ontology Vacuum Acceptance Product Acceptance Accountability Social Responsibility Social Investing Accountants Public Accountants CPAs Attorney CPAs Accounting Firms Big Five Accounting Firms Big Six Accounting Firms
Unbalanced derivation Acceptance Product Acceptance Accidents Accident Prevention Aircraft Accidents and Safety Air Traffic Control Hijacking Boating Accidents and Safety Construction Accidents and Safety Electrocutions Falls Firearm Accidents and Safety Household Accidents and Safety Nuclear Accidents and Safety Occupational Accidents Industrial Accidents Occupational Safety Indoor Air Quality Railroad Accidents and Safety Ship Accidents and Safety Lighthouses Swimming Accidents and Safety Drownings Traffic Accidents and Safety Hit and Run Accidents
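Unbalanced derivation can be made visible with two simple numbers per top-level category: how many descendants it has and how deep it goes. The sketch below is illustrative only. The slide's indentation is lost in this transcript, so the nesting in `TREE` is an assumed reading of a fragment of the list above, and `size` and `depth` are names chosen for this example.

```python
# Assumed nesting for a fragment of the slide's list (original indentation is lost).
TREE = {
    "Acceptance": {"Product Acceptance": {}},
    "Accidents": {
        "Accident Prevention": {},
        "Aircraft Accidents and Safety": {"Air Traffic Control": {}, "Hijacking": {}},
        "Occupational Accidents": {
            "Industrial Accidents": {},
            "Occupational Safety": {"Indoor Air Quality": {}},
        },
    },
}


def size(children):
    """Count all descendants below a node."""
    return sum(1 + size(sub) for sub in children.values())


def depth(children):
    """Count the levels below a node (0 for a leaf)."""
    return 0 if not children else 1 + max(depth(sub) for sub in children.values())


# (descendant count, depth) for each top-level category.
profile = {name: (size(sub), depth(sub)) for name, sub in TREE.items()}
```

Under this assumed nesting, "Acceptance" has a single child while "Accidents" fans out over several levels; a large gap between sibling profiles is a hint that one branch has been derived far more thoroughly than the other.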
Split Paradigms in Multiple Taxonomies • Taxpayers: Individuals; Organizations (Assoc., Corporations) • Tax items: Assets; Liabilities (Debts, Loans)