Learn how to effectively organize content using taxonomy and metadata design. This includes identifying objectives, inventorying content, specifying metadata and vocabularies, and developing procedures and workflows.
Tagging, Interfaces & Content Organization Infrastructures
Joseph A. Busch, Principal
Who I am
• Over 25 years in the business of organized information
• Founder & Principal, Taxonomy Strategies
• Director, Solutions Architecture, Interwoven
• VP, Infoware, Metacode Technologies
• Program Manager, Getty Foundation
• Manager, Pricewaterhouse
• Assistant Director for Technical Services, Hampshire College
• Chief, Technical Services, Paul Weiss Rifkind Wharton & Garrison
• Metadata & taxonomies community leadership
  • President, American Society for Information Science & Technology
  • Trustee, Dublin Core Metadata Initiative
  • Co-Founder, Networked Knowledge Organization Systems/Services
  • Adviser, National Research Council Computer Science and Telecommunications Board
  • Reviewer, National Science Foundation Division of Information and Intelligent Systems
Recent & current projects: Government, Commercial, Not-for-Profit
What I do: Organize stuff
For us, taxonomy work includes:
• Metadata specification defines the properties needed to describe content so that it can be found & used.
• Vocabularies are collections of terms that are used to specify some of the metadata properties.
• Some vocabularies are big and hierarchical; others are small and flat.
• An application profile specifies which metadata & vocabularies are required, and then represents them formally.
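The application-profile idea can be sketched as a small Python validator: the profile states which properties are required and which are restricted to a controlled vocabulary, and a record is checked against it. The `ApplicationProfile` class, the property names, and the vocabulary terms are hypothetical illustrations, not a published profile; values are lists so that multi-valued properties fit naturally.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationProfile:
    """Hypothetical application profile: required properties plus vocabularies."""
    required: set[str]
    # Property name -> allowed terms (a controlled vocabulary).
    vocabularies: dict[str, set[str]] = field(default_factory=dict)

    def validate(self, record: dict[str, list[str]]) -> list[str]:
        """Return a list of problems; an empty list means the record conforms."""
        problems = []
        for prop in sorted(self.required):
            if not record.get(prop):          # absent or empty list
                problems.append(f"missing required property: {prop}")
        for prop, allowed in self.vocabularies.items():
            for value in record.get(prop, []):
                if value not in allowed:
                    problems.append(f"{prop}: '{value}' is not in the vocabulary")
        return problems


# Hypothetical profile: a small flat vocabulary for "type", free text elsewhere.
profile = ApplicationProfile(
    required={"title", "type", "creator"},
    vocabularies={"type": {"Report", "Policy", "Presentation"}},
)

record = {"title": ["Q3 Forecast"], "type": ["Memo"]}
print(profile.validate(record))
```

Returning a list of problems, rather than raising on the first one, lets a tagging form show every gap at once.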
Seven phases of taxonomy and metadata design
1. Identify Objectives: Conduct interviews
2. Inventory Content: ID sources, spider assets & extract metadata
3. Specify Metadata: Define fields & purpose
4. Model Content: Define content chunks & XML DTDs
5. Specify Vocabularies: Compile controlled vocabularies
6. Specify Procedures: Develop workflow, rules & procedures
7. Train Staff: Develop materials & train staff
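As an illustration of phase 2, here is a minimal sketch of spidering a file share and extracting basic metadata, assuming the content lives in a local directory tree. The `inventory` function and its field names (loosely modeled on Dublin Core's Identifier and Format) are hypothetical, and only file-system metadata is captured; descriptive metadata still has to come from later phases.

```python
import os
import tempfile
from datetime import datetime, timezone


def inventory(root: str) -> list[dict]:
    """Walk a directory tree and extract basic file-system metadata per asset."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            rows.append({
                "identifier": os.path.relpath(path, root),
                # File extension as a rough format indicator.
                "format": os.path.splitext(name)[1].lower().lstrip(".") or "unknown",
                "bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
            })
    return rows


if __name__ == "__main__":
    # Demo on a throwaway directory so the example is self-contained.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "reports"))
        with open(os.path.join(root, "reports", "q3.PDF"), "wb") as f:
            f.write(b"%PDF-demo")
        for row in inventory(root):
            print(row)
```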
Use metadata to support core purposes
• Metadata can provide enough information for any user, tool, or program to find and apply any piece of content.
• Subject metadata (What, Where & Why): Subject, Type, Coverage
• Use metadata (When & How): Date, Language, Rights
• Asset metadata (Who): Identifier, Creator, Title, Description, Publisher, Format, Contributor
• Relational metadata (Links between and to): Source, Relation
• Diagram labels: Complexity, Enabled Functionality
• Benefits: Better navigation & discovery; More efficient editorial process
Source: http://dublincore.org/documents/dces/
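The four groups above together cover all 15 elements of the Dublin Core element set. A minimal Python sketch, assuming a flat record keyed by lowercase element names, that splits a record into those groups; the `group_record` helper and the sample record are hypothetical.

```python
# The 15 Dublin Core elements, grouped by the purposes listed on the slide.
DC_GROUPS = {
    "subject": ["subject", "type", "coverage"],                 # what, where & why
    "use": ["date", "language", "rights"],                      # when & how
    "asset": ["identifier", "creator", "title", "description",
              "publisher", "format", "contributor"],            # who
    "relational": ["source", "relation"],                       # links between and to
}

ELEMENT_TO_GROUP = {el: group for group, els in DC_GROUPS.items() for el in els}


def group_record(record: dict[str, str]) -> dict[str, dict[str, str]]:
    """Split a flat Dublin Core record into the four metadata groups."""
    grouped: dict[str, dict[str, str]] = {g: {} for g in DC_GROUPS}
    for element, value in record.items():
        group = ELEMENT_TO_GROUP.get(element)
        if group is None:
            raise ValueError(f"not a Dublin Core element: {element}")
        grouped[group][element] = value
    return grouped


# Hypothetical record.
record = {"title": "Annual Report", "creator": "Finance Dept.",
          "date": "2024-01-15", "subject": "Budgets",
          "relation": "Annual Report 2023"}
print(group_record(record))
```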
Agenda • Tagging • Interface • Content Organization
Tagging Overview
• Tagging is better than the words that happen to occur in a piece of content.
• All tagging is useful:
  • End user tagging
  • Tagging by librarians
  • Automated tagging by operating systems and algorithms
• Content should be tagged throughout its lifecycle, each time it is handled and used, so that it accrues value; otherwise its significance diminishes.
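One way to picture tags that accrue over a content lifecycle is an append-only tag log that records who tagged the item and at which stage. This is a sketch under assumptions: the `ContentItem` class, the stage names, and the source labels are hypothetical, and tags are normalized to lowercase so that the same tag from different sources is merged.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """A content item that accumulates tags each time it is handled."""
    title: str
    # Each entry: (lifecycle stage, tag source, normalized tag).
    tag_log: list[tuple[str, str, str]] = field(default_factory=list)

    def tag(self, stage: str, source: str, *tags: str) -> None:
        for t in tags:
            self.tag_log.append((stage, source, t.strip().lower()))

    def tags(self) -> set[str]:
        """All distinct tags accrued so far."""
        return {t for _stage, _source, t in self.tag_log}

    def sources_for(self, tag: str) -> set[str]:
        """Which kinds of taggers have assigned this tag."""
        return {src for _stage, src, t in self.tag_log if t == tag}


item = ContentItem("Pesticide exposure fact sheet")
item.tag("create", "automated", "pdf")                  # e.g. set by the OS or a parser
item.tag("create", "end user", "Pesticide", "Kids")     # the author
item.tag("review", "librarian", "pesticide", "Health Risk")
item.tag("use", "end user", "school safety")            # a later reader
print(sorted(item.tags()), item.sources_for("pesticide"))
```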
MS Office: File Properties
How many people fill this in?
Flickr: Organize
How many people click on this?
Agenda • Tagging • Interface • Content Organization
Requirements for a tagging interface
• Automated form fill-in (automatically fills in known data)
• Tagging precedents (see tags already assigned by others)
• Controlled vocabularies, e.g., with pull-down list
• Multi-valued tags
• Geo-tagging
• Group tagging
• Clean-up tag tools, e.g., alpha list
• Batch editing
• Share/Don’t share (Public/Private)
• Identified owner (who can be emailed)
• Almost immediate feedback, e.g., tag cloud
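Two of the requirements above, clean-up tools such as an alpha list and almost immediate feedback via a tag cloud, can be combined in a short sketch. Assumptions: tags are normalized by case and whitespace only, and counts are log-scaled into five size buckets; both choices, and the `tag_cloud` name, are illustrative rather than a standard.

```python
import math
from collections import Counter


def clean(tag: str) -> str:
    """Normalize case and whitespace so near-duplicate tags collapse together."""
    return " ".join(tag.lower().split())


def tag_cloud(tags: list[str], sizes: int = 5) -> list[tuple[str, int]]:
    """Return (tag, size bucket 1..sizes) pairs in alphabetical (alpha list) order.

    Counts are log-scaled so one very popular tag doesn't flatten the rest.
    """
    counts = Counter(clean(t) for t in tags if t.strip())
    if not counts:
        return []
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = (hi - lo) or 1.0              # all counts equal -> every tag gets size 1
    cloud = []
    for tag in sorted(counts):
        weight = (math.log(counts[tag]) - lo) / span     # 0.0 .. 1.0
        cloud.append((tag, 1 + round(weight * (sizes - 1))))
    return cloud


raw = ["Taxonomy", "taxonomy ", "metadata", "Metadata", "metadata",
       "metadata", "XML", "Dublin  Core"]
print(tag_cloud(raw))
```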
Form fill-in: Automatically fill in known data
• Manual form fill-in with check boxes, pull-down lists, etc.
• Auto keyword & summarization
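Automatic keyword suggestion can be as simple as counting words that are not stop words. This sketch is a deliberately naive term-frequency ranker with a hypothetical stop-word list; real tools generally use richer statistics (such as TF-IDF against a reference corpus), and summarization is not covered here.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "be", "by", "can", "for", "in",
              "is", "of", "on", "or", "that", "the", "to", "with"}


def suggest_keywords(text: str, k: int = 3) -> list[str]:
    """Suggest k keywords by raw term frequency, ignoring stop words.

    Ties are broken alphabetically so the suggestions are deterministic.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _count in ranked[:k]]


doc = ("Metadata specification defines the properties needed to describe "
       "content. Vocabularies supply terms for metadata properties, and "
       "controlled vocabularies keep metadata consistent.")
print(suggest_keywords(doc))
```

The suggestions would pre-fill a keyword field that the author can accept or edit.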
Form fill-in: Automatically fill in known data
• Auto-categorization
• Rules & pattern matching
• Parse & lookup (recognize names)
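Rules & pattern matching and parse & lookup can be sketched together as a form-fill helper. The regular-expression rules, the category names, and the name list are all hypothetical. The name lookup is plain substring matching against an authority list, not real named-entity recognition.

```python
import re

# Hypothetical rules: regex pattern -> category to assign.
RULES = [
    (re.compile(r"\binvoice\b|\bpurchase order\b", re.I), "Finance"),
    (re.compile(r"\bhiring\b|\bpayroll\b", re.I), "Human Resources"),
    (re.compile(r"\bcontract\b|\blitigation\b", re.I), "Legal"),
]

# Hypothetical authority list used for "parse & lookup".
KNOWN_NAMES = {"Acme Corporation", "City of Springfield", "Jane Doe"}


def categorize(text: str) -> list[str]:
    """Every category whose rule matches, in rule order."""
    return [category for pattern, category in RULES if pattern.search(text)]


def recognize_names(text: str) -> list[str]:
    """Names from the authority list that appear verbatim in the text."""
    return sorted(name for name in KNOWN_NAMES if name in text)


def auto_fill(text: str) -> dict[str, list[str]]:
    """Pre-fill form fields that the author can then confirm or correct."""
    return {"category": categorize(text), "names": recognize_names(text)}


text = "Acme Corporation sent an invoice under the new contract."
print(auto_fill(text))
```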
Bulk Tagging
• Identify a collection of related content items by pattern or context
• Then apply the same attributes to all of those content items
Tag a folder
• Drag & drop content items into a folder
• Then the content items inherit the folder's properties
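A minimal bulk-tagging sketch, assuming items are identified by repository paths and the "pattern" is a glob: every matching item gets the same attributes. The `bulk_tag` helper and the repository layout are hypothetical. `fnmatchcase` is used so matching is case-sensitive on every platform; note that its `*` also matches `/`.

```python
from fnmatch import fnmatchcase


def bulk_tag(items: dict[str, dict], pattern: str, attributes: dict) -> list[str]:
    """Apply the same attributes to every item whose path matches a glob pattern.

    Returns the sorted paths that were tagged.
    """
    tagged = []
    for path, metadata in items.items():
        if fnmatchcase(path, pattern):
            metadata.update(attributes)
            tagged.append(path)
    return sorted(tagged)


# Hypothetical repository: path -> metadata dict.
repository = {
    "finance/2023/invoice-001.pdf": {},
    "finance/2023/invoice-002.pdf": {},
    "finance/2023/budget.xlsx": {},
    "hr/handbook.docx": {},
}
tagged = bulk_tag(repository, "finance/*/invoice-*.pdf",
                  {"type": "Invoice", "function": "Finance"})
print(tagged)
```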
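Folder tagging can be modeled as inheritance along the folder path: an item picks up the properties of every ancestor folder, with nearer folders overriding farther ones. This is a sketch under assumptions; the folder paths, properties, and override rule are hypothetical, and properties are computed at lookup time rather than copied when an item is dropped into a folder.

```python
from pathlib import PurePosixPath

# Hypothetical folder-level properties.
FOLDER_PROPERTIES = {
    "/projects": {"function": "Projects"},
    "/projects/acme": {"client": "Acme", "rights": "Confidential"},
    "/projects/acme/contracts": {"type": "Contract", "rights": "Privileged"},
}


def inherited_properties(item_path: str) -> dict[str, str]:
    """Merge properties from every ancestor folder, outermost first,
    so that nearer folders override farther ones."""
    merged: dict[str, str] = {}
    parents = list(PurePosixPath(item_path).parents)   # nearest folder first
    for folder in reversed(parents):                   # outermost folder first
        merged.update(FOLDER_PROPERTIES.get(str(folder), {}))
    return merged


print(inherited_properties("/projects/acme/contracts/msa.docx"))
```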
Workflow
• Create Content, Add Metadata, Publish, with repeated Review & Improve steps
• Approve & improve mindset
Interactive rewards
• Almost instantaneous exposure of tags in simple web user interfaces provides positive reinforcement for user tagging that simply did not exist before.
• For example:
  • Most popular
  • Tag clouds
  • Alerts
Most popular
• Another example is a "most emailed" list, such as the one on the New York Times website.
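A "most popular" or "most emailed" list is essentially a count of actions per item. A minimal sketch, assuming a hypothetical log of (item, action) events; `most_popular` is an illustrative name.

```python
from collections import Counter


def most_popular(events: list[tuple[str, str]], action: str,
                 n: int = 3) -> list[tuple[str, int]]:
    """Rank items by how often a given action (e.g. 'email') was performed on them."""
    counts = Counter(item for item, act in events if act == action)
    return counts.most_common(n)


# Hypothetical activity log.
events = [
    ("story-a", "email"), ("story-b", "email"), ("story-a", "view"),
    ("story-a", "email"), ("story-c", "email"), ("story-b", "email"),
    ("story-a", "email"),
]
print(most_popular(events, "email"))
```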
Alerts
• New (content selected by date)
• Subscriptions (content selected by tags)
• Interest (content selected by other people)
• Individual (content selected for you by other people)
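The four alert types can be sketched as four filters over the same items. Assumptions, all hypothetical: "interest" means items liked by people the user follows, and "individual" means items someone explicitly recommended to the user; the `Item` fields and the `alerts` function are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Item:
    id: str
    published: date
    tags: set[str]
    liked_by: set[str] = field(default_factory=set)        # hypothetical "interest" signal
    recommended_to: set[str] = field(default_factory=set)  # hypothetical "individual" signal


def alerts(items: list[Item], user: str, since: date,
           subscriptions: set[str], following: set[str]) -> dict[str, list[str]]:
    """Select item ids for each of the four alert types."""
    return {
        # New: content selected by date.
        "new": [i.id for i in items if i.published >= since],
        # Subscriptions: content selected by tags.
        "subscriptions": [i.id for i in items if i.tags & subscriptions],
        # Interest: content selected by other people (here, people the user follows).
        "interest": [i.id for i in items if i.liked_by & following],
        # Individual: content selected for this user by other people.
        "individual": [i.id for i in items if user in i.recommended_to],
    }


items = [
    Item("a", date(2024, 5, 2), {"taxonomy"}, liked_by={"bob"}),
    Item("b", date(2024, 4, 20), {"xml"}, recommended_to={"ann"}),
    Item("c", date(2024, 5, 3), {"metadata", "taxonomy"}),
]
print(alerts(items, "ann", date(2024, 5, 1), {"metadata"}, {"bob"}))
```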
Agenda • Tagging • Interface • Content Organization
Content organization models: The Information Architect
• Richard Saul Wurman’s 5 ways to categorize things:
  • By location (spatially)
  • By alphabet (alphabetically)
  • By time (chronologically)
  • By category (subject)
  • By hierarchy (broader/narrower terms, BT/NT, etc.)
Source: Richard Saul Wurman, Information Architects (1996)
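Wurman's five schemes can be read as five different sort keys over the same set of items. A sketch with hypothetical documents and field names; the hierarchy key is a broadest-first path, so sorting by it lists a broader term before its narrower terms.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    title: str
    location: str
    created: date
    category: str
    path: tuple[str, ...]   # position in a BT/NT hierarchy, broadest term first


# The five organizing schemes expressed as sort keys.
SCHEMES = {
    "location": lambda d: d.location,
    "alphabet": lambda d: d.title.lower(),
    "time": lambda d: d.created,
    "category": lambda d: d.category,
    "hierarchy": lambda d: d.path,
}


def organize(docs: list[Doc], scheme: str) -> list[str]:
    return [d.title for d in sorted(docs, key=SCHEMES[scheme])]


# Hypothetical documents.
docs = [
    Doc("Zoning permit", "Boston", date(2021, 3, 1), "Permits & Licenses",
        ("Administration", "Permits & Licenses")),
    Doc("Annual budget", "Chicago", date(2020, 1, 15), "Plans & Forecasts",
        ("Finance", "Plans & Forecasts")),
    Doc("Expense report", "Atlanta", date(2022, 7, 9), "Employee Expense Reports",
        ("Finance", "Expense Reports")),
]
for scheme in SCHEMES:
    print(f"{scheme:>9}: {organize(docs, scheme)}")
```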
Content organization models: The Records Manager
• Archives & business records
• By function (business purpose)
• By genre (document type)
Example facets: Brands & Varieties, Events, Ingredients, Locations, Nutrients, Organizations, Functions, Doc Types
• Doc Types: Account Listings, Acquisitions, Cash Disbursements, Cash Receipts, Contract Accounting Records, Credit Advices, Credit Card Charges, Donations, Employee Expense Reports, Invoices, Petty Cash Records, Permits & Licenses, Plans & Forecasts, Royalty Payments, Sales Receipts
• Functions: Accounting, Administration, Environment, Finance, Human Resources, Legal, Marketing & Sales, Plant Operations, Projects, Public Relations, Research & Development, Tax, Treasury
Content organization models: The Product Manager
• Management (for general business operational purposes)
• By products and services
Example product navigation:
• Top level: Systems, Peripherals, Services, Support, My Account
• Second level: Handhelds, Monitors, Printers, Projectors, TVs, Parts, All Electronics & Accessories
• Monitors: CRT Monitors, LCD Monitors
• Printers: All-in-One & Photo Printers, B/W & Multifunction Laser Printers, Color Laser Printers, Ink & Printer Accessories
• TVs: LCD TVs, Plasma TVs
• All Electronics & Accessories: Desktop Accessories, Notebook Accessories, Digital Photography, Handhelds, Memory, Monitors, MP3 Players, Networking, Power, Printers & Ink, Projectors, Software & Games, Storage & Drives, TVs & Home Theater
Content organization models: Marketer
• Marketing & sales
• By psychosocial profiles such as lifestyle stages, personas, etc.
• By industry
• By location
Example facets: Audience, Intention, Lifecycle, Industry, Location
• Audience & Intention values: Age Group, Aisles, Business, Consumer, Financial Risk, Service Standard, Inquiry, Research, Support, Upgrade
• Lifecycle: Pre-Sales, Early Life, Purchase Experience & Sales Process, Set Up / Installation, Billing Experience, Support, Retain & Renew
• Industry: Construction & Building, Field Services, Finance & Insurance, Financial Services, Government, Healthcare, Higher Education, Hospitality Services, Insurance, K-12 Education, Manufacturing, Professional Services, Real Estate, Retail, Transportation & Distribution
• Location: Regions, ZIP Code
Content organization models: Editor
• Editorial
• By content lifecycle
Source: Social Aspects of Digital Libraries: Final Workshop Report (Nov 1996), http://is.gseis.ucla.edu/research/dl/UCLA_DL_Report.doc
Faceted taxonomy theory & practice
• How many terms are needed to provide sufficient granularity?
• Not as many as you think
• Post-coordinate indexing allows several simple controlled vocabularies to be combined at search time, rather than using a single large pre-coordinated vocabulary.
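Post-coordinate indexing can be sketched as an AND across independent facets: each item carries terms from several small vocabularies, and a query combines them at search time. The item ids and the `post_coordinate` helper are hypothetical; the facet terms echo the environmental-health example facets used in this deck.

```python
# Each item is tagged with one or more terms from several small, flat facets.
ITEMS = {
    "fs-101": {"audience": {"Kids"}, "substance": {"Pesticide"}, "health": {"Exposure"}},
    "fs-102": {"audience": {"General Public"}, "substance": {"Ozone"}, "health": {"Health Risk"}},
    "fs-103": {"audience": {"Kids", "Students"}, "substance": {"Ozone"}, "health": {"Sun Protection"}},
}


def post_coordinate(items: dict[str, dict[str, set[str]]], **selected: str) -> list[str]:
    """Return ids of items matching every selected facet term (AND across facets)."""
    return sorted(
        item_id for item_id, facets in items.items()
        if all(term in facets.get(facet, set()) for facet, term in selected.items())
    )


print(post_coordinate(ITEMS, audience="Kids", substance="Ozone"))
```

No pre-coordinated "Kids + Ozone" node has to exist; the combination is formed only when someone asks for it.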
The power of faceted taxonomy
• 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
• Easier to maintain
• Easier to tag by content authors
• Can be easier to navigate
Example facets:
• Audience: Advocacy, Contractors & Grantees, Environmental Professionals, Federal Facilities, General Public, Industry, Kids, Researchers & Scientists, Small Business, Students
• Health: Advisory, Exposure, Food Safety, Health Assessment, Health Effect, Health Risk, Occupational Health, Pesticide Effects, Sun Protection, Toxicity
• Industry: Agriculture & Cattle, Automobile Repair, Chemical, Dry Cleaning, Electronics & Computer, Energy, Extractive Industries, Food Processing, Leather Tanning & Finishing, Metal Finishing
• Substance: Allergen, Biological Contaminant, Carcinogen, Chemical, Explosive, Liquid Waste, Microorganism, Ozone, Pesticide, Radioactive Waste
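The arithmetic behind the 10^4 figure, as a quick check: with independent facets, the number of distinct tag combinations is the product of the facet sizes, while the number of terms to maintain is only their sum.

```python
from itertools import product
from math import prod

# Four independent facets of 10 terms each, as on the slide (sizes only).
facet_sizes = {"Audience": 10, "Health": 10, "Industry": 10, "Substance": 10}

terms_to_maintain = sum(facet_sizes.values())        # 4 * 10 = 40
distinct_combinations = prod(facet_sizes.values())   # 10 ** 4 = 10,000

# Sanity check by enumerating every combination explicitly.
enumerated = sum(1 for _ in product(*(range(n) for n in facet_sizes.values())))
assert enumerated == distinct_combinations

print(terms_to_maintain, distinct_combinations)      # 40 10000
```

This is the "easier to maintain" point in numbers: 40 terms to curate instead of 10,000 nodes.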
[Figure: Impact on collection size of increasing the number of terms per facet]
Faceted tagging
• How well can end users (content authors) do this?
• Incentives help, such as almost instantaneous feedback (AIF)
• Importance of workflow
  • Tagging & re-tagging throughout the content lifecycle
  • Approve & improve mindset
  • Test & improve
Summary
• There are lessons to be learned from web tagging about how to get good metadata in document and content management applications.
• Document and content management system tagging must be simple, and it must make relevant work products easier to find almost immediately.
Questions?
Joseph A. Busch, 415-377-7912, jbusch@taxonomystrategies.com
Tagging Overview
• Tagging, any kind of tagging, is better than the words that happen to occur in a piece of content. End user tagging is useful, as is tagging by librarians and tagging assigned automatically by operating systems and language processing algorithms. Content should be tagged throughout its lifecycle, each time it is handled and used, so that it accrues value; otherwise its significance diminishes.
• Almost instantaneous exposure of tags in simple web user interfaces provides positive reinforcement for user tagging that simply did not exist before. It should not be surprising that a good user interface improves usability.
• As content users flock to websites that help organize content on the web, advertisements and value-added content services follow. The bottleneck for the semantic web has been a shortage of tagged content. The end user tagging revolution may begin to address this shortcoming.
• There are lessons to be learned from web tagging about how to get good metadata in document and content management applications. Document and content management system tagging must be simple, and it must make relevant work products easier to find almost immediately.