Classification and Taxonomy Greg Argo
Brief origins of the organization of information • Large amounts of information became difficult to store and retrieve. • Although the classes used vary wildly across cultures, grouping based on the class level is nearly universal. • Organizational structures provide the context in which humans transform information into knowledge. • It’s not just handy, it’s essential.
Humans classify “with a pronouncedly mental scalpel that helps us carve discrete mental slices out of reality” because “reality is not made up of insular chunks unambiguously separated from one another by sharp divides, but, rather, of vague, blurred-edge essences that often spill over into one another.” -Eviatar Zerubavel (1991), from The fine line: Making distinctions in everyday life
“Cognitive scientists have noticed that much of our mental commerce with an environment deals with classes of things rather than with unique events and objects.” -Mark Stefik (1995), from Introduction to knowledge systems. For example, the people seen below could probably all be placed in both the class “Cognitive Scientists” and the class “Nerds”. Can you think of other possible classes? Possible relationships? Clinical vs. academic cognitive scientists? Beards and nerds?
Why consider classification and taxonomy together? • Both are methods for grouping objects or ideas sharing useful, although sometimes superficial, similarities • Both group to make retrieval easier • Both are very basic and pervasive elements of information architecture • It is often difficult to tell them apart • It is often unnecessary to tell them apart
Why tell them apart then? • To become knowledgeable about the different limitations and possibilities in their interaction • Differential demand on and payoff for users • It is important to understand the specific qualities by which each can achieve organizational objectives
Specific qualities presented as keywords and key-dichotomies • Organization • Retrieval • Controlled vocabulary/thesauri • Ambiguous vs. Exact • Searching vs. Browsing • Content-based vs. User-based • Descriptive vs. Navigational • Precision vs. Recall • Structures vs. Applications • Concise vs. Broad
Classifications, Taxonomies, and Ontologies -Classifications • Relationships expressed are not essential, but are based on arbitrary, external attributes (color, genre, format, geography, subject, alphabetical order) • Created broadly from the top-down, based on conceptual frameworks • Created by subject experts • Usually don’t change significantly after their creation • Generally applicable to specific domains
Classifications, Taxonomies, and Ontologies -Taxonomies • Relationships expressed are usually essential, based on internal properties of the related pieces of information • Created concisely from the bottom-up from actual content • Created by multidisciplinary teams • Are process-oriented, and so are updated frequently • Oftentimes can be used and reused in different situations and environments • Relationships commonly represented hierarchically • Can include many classifications connected together
Example of internal properties of a taxonomic relationship • All zippers are clothes fasteners • Not all clothes fasteners are zippers • Because of the essential nature of their relationship, “zippers” is a subclass of “clothes fasteners”, and “clothes fasteners” is a superordinate class of “zippers”
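The zipper example can be sketched in code. This is only an illustration, assuming hypothetical Python classes `ClothesFastener`, `Zipper`, and `Button` that do not appear in the original slides; class inheritance stands in here for the essential (is-a) relationship.

```python
class ClothesFastener:
    """Superordinate class: anything that fastens clothes."""


class Zipper(ClothesFastener):
    """Subclass: every zipper is a clothes fastener."""


class Button(ClothesFastener):
    """Another subclass, showing that not all fasteners are zippers."""


# All zippers are clothes fasteners.
all_zippers_are_fasteners = issubclass(Zipper, ClothesFastener)

# Not all clothes fasteners are zippers.
all_fasteners_are_zippers = issubclass(ClothesFastener, Zipper)

print(all_zippers_are_fasteners)   # True
print(all_fasteners_are_zippers)   # False
```

The relationship holds in one direction only, which is what makes "zippers" a subclass rather than a synonym.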
Classifications, Taxonomies, and Ontologies -Ontologies • Like taxonomies, relationships expressed are also essential • Scope is more overarching due to inclusion of supplemental information • Descriptions and definitions of concepts and their corresponding relationships • Can include many sub-class taxonomies connected together
Classifications, Taxonomies, and Ontologies • Classifications guide users to a body of information • Taxonomies guide users through a body of information • Ontologies guide users in becoming proficient in the retrieval and understanding of a particular body of information
Classification • To classify something is to identify it as a member of a known class • On the Web, information architects divide classification schemes into exact and ambiguous schemes • Classification problems begin with data and identify predetermined classes as solutions
Exact classification schemes • Items are categorized mutually exclusively • Useful to users who know exactly what they are looking for • By definition, are easier to create and maintain than ambiguous schemes • Alphabetical, chronological, geographical
Alphabetical schemes • Directories and lists • User must have a good idea of what they are searching for and be able to spell it • On the Web, usually utilized deeper in the scheme inside of sub-sites
Chronological schemes • Have an intuitive advantage for users because they are organized in the same linear scheme in which humans experience the dimension of time • Yearbooks, historical sites, and news headline sites • eBay offers results organized by a few different types of chronologies
Geographical schemes • Have intuitive appeal to rich spatial faculties and needs of users in their experience of reality • Geographical divisions coincide with governing bodies which restrict and encourage behaviors through law and language • Requires knowledge of geographical divisions and map reading on the part of the user
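The three exact schemes above share one property: each item lands in exactly one bucket. A minimal sketch of that idea follows, assuming a hypothetical catalogue of `(title, date, region)` entries and a hypothetical helper `group_by`; none of these names come from the slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical catalogue entries: (title, publication date, region).
items = [
    ("Zebra Crossing Guide", date(2003, 5, 1), "Europe"),
    ("Atlas of Rivers", date(2001, 9, 15), "Asia"),
    ("Annual Report", date(2004, 1, 10), "Europe"),
]


def group_by(entries, key):
    """Place each entry in exactly one bucket, chosen by key(entry)."""
    buckets = defaultdict(list)
    for entry in entries:
        buckets[key(entry)].append(entry[0])
    return dict(buckets)


alphabetical = group_by(items, lambda e: e[0][0].upper())  # first letter
chronological = group_by(items, lambda e: e[1].year)       # year
geographical = group_by(items, lambda e: e[2])             # region

print(alphabetical)
print(chronological)
print(geographical)
```

Because the key function returns a single value per item, the buckets are mutually exclusive by construction, which is part of why exact schemes are comparatively easy to maintain.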
Ambiguous classification schemes • Items are categorized into intellectually meaningful groups • Useful to users who don’t know quite what information they are searching for • Facilitate iterative, serendipitous learning • Audience-based, Subject-based, Task-based • Each should be based on scheme specific research and development processes (e.g. user and task analyses)
Audience-based classification schemes • Makes sense if the informational domain caters to clearly delineated audiences • Homepage becomes a filter that leads to sub-sites organized by some other scheme • Suggests customization/personalization • Recommendations are sometimes powerful, sometimes failures
IA research for audience-based classification schemes • Map services and applications to their appropriate group • Discern what types of technology-use are associated with specific populations • Find points of overlap between audience categories • User research sessions, usage statistics, search log analysis, focus groups, critical incident reports
Subject-based classification schemes • Most immediately recognized are the library classification schemes (DDC, LC) • When used in IA, they generally work best when hybridized with other types of schemes • Are challenging to implement because different words, symbols, and idioms mean different things to different people • Breadth of subjects included should be decided early on because these parameters will affect much of the rest of the IA and content work for the Web site
IA research for subject-based classification schemes • Ask the development team to write down each content item that will be part of the site • IAs perform a card-sorting exercise to establish initial subject categories • Take it to the user • Further card sorting • Survey with questions about navigation • Continually refine
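One common way to summarize card-sorting sessions is to count how often participants put two cards in the same pile. The sketch below assumes hypothetical content items and a hypothetical helper `pair_counts`; the slides do not prescribe any particular analysis, so treat this as one possible approach.

```python
from collections import Counter
from itertools import combinations

# Hypothetical card-sort results: each participant groups content items into piles.
sorts = [
    [{"Tuition", "Financial Aid"}, {"Library Hours", "Course Catalog"}],
    [{"Tuition", "Financial Aid", "Course Catalog"}, {"Library Hours"}],
    [{"Tuition", "Financial Aid"}, {"Course Catalog"}, {"Library Hours"}],
]


def pair_counts(all_sorts):
    """Count how many participants placed each pair of cards in the same pile."""
    counts = Counter()
    for piles in all_sorts:
        for pile in piles:
            # Sort the pile so each pair is always stored in the same order.
            for a, b in combinations(sorted(pile), 2):
                counts[(a, b)] += 1
    return counts


counts = pair_counts(sorts)
for pair, n in counts.most_common():
    print(pair, f"{n}/{len(sorts)} participants")
```

Pairs that most participants group together (here "Financial Aid" and "Tuition") are candidates for a shared subject category; pairs that are rarely grouped suggest separate categories.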
Task-based classification schemes • Useful for action and transaction related Web sites • Rarely drive a Web site on their own, but are typically embedded deeper as part of a hybrid scheme • Desire of businesses to remove labor costs will likely increase their ubiquity
IA research for task-based classification schemes • The field of usability arose from the need to research the success and value of tools and their applications • Traditional usability tests are a good fit • Analyses of video-taped sessions, navigation logs, heuristic reviews, surveys, critical incident reports
Taxonomies • Information architects have two major types to utilize: descriptive and navigational • They contrast well and each excels for different organizational and user needs • Central ideas include creating hierarchies, controlled vocabularies, and variant/preferred term and synonym relationships • Build on classifications by supporting applications and many different types of content, including images, email, search engines, process funnels, and site registration
Descriptive taxonomies • Operate outside of a user’s immediate awareness • Supplement information retrieval during keyword searching • IAs create controlled vocabularies and synonym rings which they use to maintain consistency across applications and departments • By analyzing emerging content and search logs, IAs maintain currency and map alternative terminology used by searchers back to the preferred form
Controlled vocabularies in descriptive taxonomies • Done by attaching tags to content with metadata derived from controlled vocabulary usage logs • The resulting thesaurus with related and variant terms makes a descriptive taxonomy more robust
Using the controlled vocabulary to increase recall or precision • A user’s search can be expanded to increase recall by mapping the search term to its variants • Or a user’s search can be narrowed to increase precision by mapping a user’s term to the preferred term in the controlled vocabulary
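The two mappings above can be sketched as follows. The vocabulary, its terms, and the function names `narrow_to_preferred` and `expand_to_variants` are all hypothetical; a real controlled vocabulary would be derived from the site's own content and search logs.

```python
# Hypothetical controlled vocabulary: preferred term -> variant terms.
VOCABULARY = {
    "automobile": ["car", "auto", "motorcar"],
    "physician": ["doctor", "md"],
}

# Reverse index: any variant (or the preferred term itself) -> preferred term.
PREFERRED = {
    variant: preferred
    for preferred, variants in VOCABULARY.items()
    for variant in [preferred, *variants]
}


def narrow_to_preferred(term):
    """Precision: replace the user's term with the single preferred term."""
    return PREFERRED.get(term.lower(), term.lower())


def expand_to_variants(term):
    """Recall: search on the preferred term plus all of its variants."""
    preferred = narrow_to_preferred(term)
    return [preferred, *VOCABULARY.get(preferred, [])]


print(narrow_to_preferred("Car"))   # automobile
print(expand_to_variants("Car"))    # ['automobile', 'car', 'auto', 'motorcar']
```

A term that is not in the vocabulary passes through unchanged, so the search still runs even when no mapping exists.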
More about descriptive taxonomies • Created from the bottom-up • Are called descriptive because they are derived directly from the content that is being used • Data management vocabularies allow workers in disparate domains to report information using the same terminology • Makes it easier for management to mine information from this data in the future
Navigational taxonomies • Have a lot of overlap with exact and ambiguous classification schemes • In contrast to descriptive taxonomies, navigational taxonomies command the user’s conscious awareness • Allow the user to guide the seeking process themselves by browsing instead of searching
Navigational taxonomies cont’d • Created from the top-down based on mental models of users • Hierarchical structures visually imply sequences of events and relationships • These relationships provide context similar to words in a sentence • Works best when users are unsure of what they are seeking
Breadth vs. Depth • Breadth is how many categories are contained in each level • Depth is how many levels are contained in the hierarchy • A hierarchy that is too broad and shallow gives the user too many choices and not enough content • A hierarchy that is too narrow and deep makes the user click more than they will tolerate • It is best to err on the side of broad and shallow to allow for add-ons and to avoid restructuring the home page
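Breadth and depth as defined above can be measured directly on a hierarchy. The sketch below assumes a hypothetical site structure stored as nested dictionaries and hypothetical helpers `depth` and `breadth_per_level`; it is one way to make the trade-off concrete, not a prescribed method.

```python
# Hypothetical site hierarchy as nested dicts: category -> subcategories.
site = {
    "Home": {
        "Products": {"Laptops": {}, "Phones": {}},
        "Support": {"Downloads": {}},
        "About": {},
    }
}


def depth(tree):
    """Number of levels in the hierarchy (an empty tree has depth 0)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


def breadth_per_level(tree):
    """Number of categories found at each level, from the top down."""
    widths = []
    level = [tree]
    while True:
        count = sum(len(node) for node in level)
        if count == 0:
            break
        widths.append(count)
        # Move down one level: collect the children of every node.
        level = [child for node in level for child in node.values()]
    return widths


print(depth(site))              # 3
print(breadth_per_level(site))  # [1, 3, 3]
```

Tracking these two numbers as categories are added shows whether a structure is drifting toward too many choices per level or too many clicks per task.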
Summary • Distinction is more pronounced in theory than in practice because both are essentially controlled vocabularies structured by logical relationships • Generally, as one moves from classifications to taxonomies to ontologies, the structures, relationships, and supplemental descriptions become more complex
Summary cont’d • Since humans seem to perform all three of these innately, it matters less what they are called than how their elements can be tailored to specific scenarios to improve retrieval of information, consistency of communication, and creation of knowledge
References
Adams, K. (2000). Immersed in structure: the meaning and function of taxonomies. Internetworking, 3.2. Retrieved October 25, 2004 from: http://www.internettg.org/newsletter/aug00/article_structure.html
Brown, J., & Duguid, P. (2002). The social life of information. Boston: Harvard Business School Press.
Conway, S., & Sligar, C. (2002). Unlocking knowledge assets. Redmond, Washington: Microsoft Press.
Edols, L. (2001). Taxonomies are what? FreePint, 97, 9-11. Retrieved October 25, 2004 from the FreePint Web site: http://www.freepint.com/issues/041001.pdf
Goodall, G. (2003). Business taxonomies and bibliographic objective: Facetation. Retrieved October 25, 2004 from: http://www.deregulo.com/facetation/pdfs/businessTaxomies_goodall.pdf
Nielsen, J. (2001). Designing web usability. Indianapolis, IN: New Riders Publishing.
Rosenfeld, L., & Morville, P. (2002). Information architecture for the World Wide Web. Cambridge; Sebastopol, CA: O'Reilly.
References cont’d
Shank, P. (2004). Get organized or get lost. OnlineLearningMag. Retrieved October 25, 2004 from: http://www.onlinelearningmag.com/onlinelearning/magazine/article_display.jsp?vnu_content_id=1108349
Stefik, M. (1995). Introduction to knowledge systems. San Francisco: Morgan Kaufmann.
Svenonius, E. (2001). The intellectual foundation of information organization. Cambridge, MA: The MIT Press.
Taylor, A. G. (1999). The organization of information. Englewood, CO: Libraries Unlimited.
van Duyne, D. K., Landay, J. A., & Hong, J. I. (2003). The design of sites. Cambridge: Addison-Wesley.
van Rees, R. (2003). Clarity in the usage of the terms ontology, taxonomy and classification. CIB73 2003 Conference Paper. Retrieved October 25, 2004 from http://vanrees.org/research/papers/cib2003.pdf
Zerubavel, E. (1991). The fine line: Making distinctions in everyday life. New York: Free Press.