Location Terminologies. ASIS&T Annual Meeting Austin, TX November 7, 2006. Agenda. Who we are Overview Using ISO 3166 Accommodating special needs. Who we are: Ron Daniel, Jr. Over 15 years in the business of metadata & automatic classification Principal, Taxonomy Strategies
Location Terminologies ASIS&T Annual Meeting Austin, TX November 7, 2006
Agenda Who we are Overview Using ISO 3166 Accommodating special needs
Who we are: Ron Daniel, Jr. • Over 15 years in the business of metadata & automatic classification • Principal, Taxonomy Strategies • Standards Architect, Interwoven • Senior Information Scientist, Metacode Technologies (acquired by Interwoven, November 2000) • Technical Staff Member, Los Alamos National Laboratory • Doctoral and post-doctoral research in pattern recognition • Metadata and taxonomies community leadership • Chair, PRISM (Publishing Requirements for Industry Standard Metadata) working group • Acting chair, XML Linking working group • Member, RDF working groups • Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports.
Agenda Who we are Overview Using ISO 3166 Accommodating special needs
Potential facets in the petroleum industry
[Figure: facet diagram. Legend: Moderately related to location; Strongly related to location; Community Standard; Should be part of community standard. Facets shown: Wells, Disciplines, Facilities, Lease Mgmt, Orgs., Process Mgmt, Reserves, Human Resources, Maint., Production, Content Types, E&P Lifecycle, Hydrocarbon System, Geologic Age, Basins, Reservoirs & Fields, Locations, Company Org, Company Facets.]
Location names serve as surrogates for other things • Company divisions • Company facilities • Regulatory regimes • Currency regions • Product marketing areas • Sales territories • Customer locations
What is a good taxonomy? • A means to an end, and not the end in itself. • Not perfect, but it does the job it is supposed to do—such as improving search and navigation. • Improved over time, and maintained. • Incremental, extensible process that identifies and enables owners, and engages stakeholders. • Quick implementation that provides measurable results as quickly as possible. • Not monolithic—has separately maintainable facets. • Re-uses existing IP as much as possible.
Location names are used for different purposes • Typical correspondence and shipping • “Libya” • “South Korea” • Official correspondence with government ministers • “Great Socialist People's Libyan Arab Jamahiriya” • “Republic of Korea” • Corporate division of responsibility • “Western Region” – does that include Montana?
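One way to picture the point above is a single country record that carries a different label for each purpose, keyed by an ISO 3166-1 alpha-2 code. This is only a minimal sketch: the `CountryLabels` class, the `label_for` function, and the purpose names `"shipping"` and `"official"` are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountryLabels:
    """Hypothetical record: one place, several purpose-specific names."""
    alpha2: str       # ISO 3166-1 alpha-2 code, used here as the shared key
    short_name: str   # typical correspondence and shipping
    formal_name: str  # official correspondence with government ministers


# The two examples quoted on the slide.
LABELS = {
    "LY": CountryLabels("LY", "Libya",
                        "Great Socialist People's Libyan Arab Jamahiriya"),
    "KR": CountryLabels("KR", "South Korea", "Republic of Korea"),
}


def label_for(alpha2: str, purpose: str) -> str:
    """Return the name appropriate to a purpose (purpose names are assumed)."""
    entry = LABELS[alpha2]
    if purpose == "shipping":
        return entry.short_name
    if purpose == "official":
        return entry.formal_name
    raise ValueError(f"unknown purpose: {purpose!r}")
```

For example, `label_for("KR", "shipping")` returns the short name, while `label_for("KR", "official")` returns the formal one. Corporate regions such as “Western Region” would need a separate mapping, because their membership is a business decision rather than a property of the country.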
Location terminologies may be used to organize different collections of information
Example facets for ABC Computers.com:
• Content Type: Award, Case Study, Contract & Warranty, Demo, Magazine, News & Event, Product Information, Services, Solution, Specification, Technical Note, Tool, Training, White Paper, Other Content Type
• Competency: Business & Finance, Interpersonal Development, IT Professionals Technical Training, IT Professionals Training & Certification, PC Productivity, Personal Computing Proficiency
• Industry: Banking & Finance, Communications, E-Business, Education, Government, Healthcare, Hospitality, Manufacturing, Petrochemicals, Retail / Wholesale, Technology, Transportation, Other Industries
• Service: Assessment, Design & Implementation; Deployment; Enterprise Support; Client Support; Managed Lifecycle; Asset Recovery & Recycling; Training
• Product Family: Desktops, MP3 Players, Monitors, Networking, Notebooks, Printers, Projectors, Servers, Services, Storage, Televisions, Other Brands
• Audience: All, Business, Employee, Education, Gaming Enthusiast, Home, Investor, Job Seeker, Media, Partner, Shopper, First Time, Experienced, Advanced, Supplier
• Line of Business: All; Home & Home Office; Gaming; Government, Education & Healthcare; Medium & Large Business; Small Business
• Region-Country: All, Asia-Pacific, Canada, EMEA, Japan, Latin America & Caribbean, United States
Location terminologies may be used to limit search results • Category • Company • City • State • Salary
Problems with location vocabularies
• Placenames change over time
• Codes may be reused over time
• Familiarity leads to proliferation
  • Many versions of pseudo-standard lists
  • Guessing what the standard will become (e.g. KOS as a code for Kosovo)
• Approximate alignment between placenames and business functions leads to errors when mapping data from one purpose to another
  • Geopolitical names get applied to sales territories with different company history and importance (e.g. Japan vs. Asia-Pac)
• Natural messiness of human affairs
  • States vs. Provinces vs. Protectorates, Territories, Possessions, Tribal territories, …
  • Disputed territories (Palestine, Kashmir, Taiwan, Kurdistan)
  • Proto-states (Kosovo, Somaliland)
• Complexity tradeoff in software
  • Very few invariant properties of countries and their groupings
• Passions
  • Boycotts and death threats have been received by people who do or do not list particular places in their lists of ‘countries’
Agenda Who we are Overview Using ISO 3166 Accommodating special needs
ISO 3166 is a fundamental vocabulary for dealing with locations • UPS maintains a central World Wide Code Repository (WWCR) to store the metadata used throughout the corporation • Based on the data identified in the enterprise data models • They also have a Corporate Code Table Database, populated via extract files from the WWCR. • These tables contain the complete list of standardized corporate code values for each code type. • Country codes are ISO 3166-1, with local extensions obeying ISO restrictions. • The data modeler for the Corporate Code Table Database is the primary contact from UPS to ISO and the UN with respect to codes for countries. Source: Barbara LaRobardier, “Taxonomy and Metadata at United Parcel Service (UPS): World Wide Code Repository and Corporate Code Tables”; Semantic Technologies Conference, San Francisco, 2005.
ISO 3166 is the world’s most widely-used list of country names
• ISO 3166 is divided into 3 parts:
  • 3166-1: Countries
  • 3166-2: Country subdivisions
  • 3166-3: Formerly used country names (changes)
• ISO 3166-1 gives three different codes for the same places: alpha-2, alpha-3, and numeric-3
• The source for the list is the UN Statistics Division
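To make the three code types concrete, here is a minimal sketch that stores a handful of ISO 3166-1 entries and resolves any of the three codes to the same record. The `CountryCode` type, the `SAMPLE` list, and the `find` function are illustrative names, and the four rows are only a small sample of the standard, not a complete list.

```python
from typing import NamedTuple, Optional


class CountryCode(NamedTuple):
    alpha2: str
    alpha3: str
    numeric3: str  # kept as a string so leading zeros survive (e.g. "004")


# A few well-known ISO 3166-1 entries, for illustration only.
SAMPLE = [
    CountryCode("AF", "AFG", "004"),
    CountryCode("DE", "DEU", "276"),
    CountryCode("JP", "JPN", "392"),
    CountryCode("US", "USA", "840"),
]


def find(code: str) -> Optional[CountryCode]:
    """Look up an entry by alpha-2, alpha-3, or numeric-3 code."""
    key = code.strip().upper()
    for entry in SAMPLE:
        if key in (entry.alpha2, entry.alpha3, entry.numeric3):
            return entry
    return None
```

Storing the numeric code as text is a deliberate choice in this sketch: an integer column would turn `"004"` into `4`, and the value would no longer match the published code.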
ISO 3166 codes change, and are even re-assigned! * ISO 3166 first published in 1974. Czechoslovakia dates from 1918.
What is the code for Kosovo? • No code currently exists for Kosovo, but “KS” is unassigned. Should we use it in the expectation that eventually it will be assigned? • No. • To quote from ISO 3166-1:1997, clause 8.1.3 User-assigned code elements: "If users need code elements to represent country names not included in this part of ISO 3166, the series of letters AA, QM to QZ, XA to XZ, and ZZ, and the series AAA to AAZ, QMA to QZZ, XAA to XZZ, and ZZA to ZZZ respectively and the series of numbers 900 to 999 are available."
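The ranges in the quoted clause can be turned into a simple check that a locally invented code stays inside the user-assigned space. This is a sketch based only on the ranges quoted above; the function name `is_user_assigned` is hypothetical.

```python
import re


def is_user_assigned(code: str) -> bool:
    """True if the code lies in the user-assigned ranges quoted from
    ISO 3166-1:1997, clause 8.1.3."""
    c = code.strip().upper()
    if re.fullmatch(r"[A-Z]{2}", c):
        # alpha-2: AA, QM to QZ, XA to XZ, ZZ
        return (c in ("AA", "ZZ")
                or (c[0] == "Q" and "M" <= c[1] <= "Z")
                or c[0] == "X")
    if re.fullmatch(r"[A-Z]{3}", c):
        # alpha-3: AAA to AAZ, QMA to QZZ, XAA to XZZ, ZZA to ZZZ
        return (c[:2] in ("AA", "ZZ")
                or (c[0] == "Q" and "M" <= c[1] <= "Z")
                or c[0] == "X")
    if re.fullmatch(r"[0-9]{3}", c):
        # numeric-3: 900 to 999
        return 900 <= int(c) <= 999
    return False
```

With this check, `is_user_assigned("KS")` is `False`, which matches the slide's advice not to use “KS”, while a code such as `"XA"` or `"QM"` passes.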
There are many categories of ISO 3166-1 alpha-2 codes • The user-assigned codes are reserved for local extensions. Use them when you need a new code! • Source: http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/iso_3166-1_decoding_table.html#AW
Agenda Who we are Overview Using ISO 3166 Accommodating special needs
Usual and unusual requirements for handling country names • One client needed to maintain multiple country lists: • ISO 3166 used in most systems • Maintained a separate editorial style list for correspondence and reports • Still other lists were used for statistical information on country subdivisions and multi-country regions • Organization maintained a variety of historical information on countries and regions: • Effective dates for codes were needed (note – dates were for codes within a system, not for the countries) • Mappings from old countries to successors were also needed
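The last two requirements (effective dates for codes within a system, and mappings from old countries to successors) can be sketched as a small effective-dated table. Everything here is illustrative: the `CodeRecord` type, the `active_on` and `current_codes` functions, and especially the dates, which stand for when a hypothetical system started and stopped accepting a code, not for when the countries themselves existed. The only fact taken from the real world is that Czechoslovakia (CS) was succeeded by the Czech Republic (CZ) and Slovakia (SK).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CodeRecord:
    code: str
    name: str
    valid_from: date            # first day the code is accepted in this system
    valid_to: Optional[date]    # last day accepted; None means still active
    successors: Tuple[str, ...] = ()  # codes that replace this one


# Hypothetical system dates, for illustration only.
RECORDS = [
    CodeRecord("CS", "Czechoslovakia", date(1990, 1, 1), date(1993, 6, 30),
               ("CZ", "SK")),
    CodeRecord("CZ", "Czech Republic", date(1993, 7, 1), None),
    CodeRecord("SK", "Slovakia", date(1993, 7, 1), None),
]


def active_on(code: str, day: date) -> Optional[CodeRecord]:
    """Return the record for `code` that is valid on `day`, if any."""
    for rec in RECORDS:
        if (rec.code == code and rec.valid_from <= day
                and (rec.valid_to is None or day <= rec.valid_to)):
            return rec
    return None


def current_codes(code: str, day: date) -> List[str]:
    """Map a possibly retired code to the codes that are valid on `day`."""
    if active_on(code, day) is not None:
        return [code]
    retired = [r for r in RECORDS
               if r.code == code and r.valid_to is not None and r.valid_to < day]
    if not retired:
        return []
    latest = max(retired, key=lambda r: r.valid_to)
    result: List[str] = []
    for successor in latest.successors:
        result.extend(current_codes(successor, day))
    return result
```

Keying validity on the code rather than the country also leaves room for a code that is later re-assigned: the same `code` value can appear in several records with non-overlapping date ranges.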
Enterprise taxonomy governance environment
[Figure: Taxonomy Governance Environment. Labels shown: ISO 3166-1, Other External, Other Internal, Notifications, Custodians, Vocabulary Management System, CVs, Other Controlled Items, Change Requests & Responses, Published Facets, Consuming Applications (Web CMS, Archives, Intranet Search, ERMS, Intranet Nav., ERP, DAM, …).]
1: External vocabularies change on their own schedule, with some advance notice.
2: Team decides when to update facets within Taxonomy.
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of facets published to consuming applications.
CV (Controlled Vocabulary) – The list of values for one facet in the Taxonomy.
The client defined a process for country vocabulary changes • The different vocabularies had different processes. • Custodians of the different vocabularies communicate so that if one changes, the others know about it. [Figure: process flow diagram; the only recoverable step label is “Notify Board”.]
Conclusion • Location terminologies are commonly used • They fulfill many different purposes • Keeping up-to-date is an ongoing effort • The rate of change is low, but ongoing • The issues can be complex • Anything out of the ordinary will not be well-served by off-the-shelf software • Most organizations have a proliferation of pseudo-3166 vocabularies. Start there to get things under control.
Questions? Ron Daniel 925-368-8371 rdaniel@taxonomystrategies.com