Classification Systems. Spring 2006, 3 April Bharat Mehra IS 520 (Organization and Representation of Information) School of Information Sciences University of Tennessee. Objectives: to understand different subject access methods to compare these methods Part I. Controlled Vocabulary
Classification Systems Spring 2006, 3 April Bharat Mehra IS 520 (Organization and Representation of Information) School of Information Sciences University of Tennessee
Assignment 4: Subject Access
Objectives:
• to understand different subject access methods
• to compare these methods
Part I: Controlled Vocabulary
• In the UTK OPAC, select the subject index to browse April First and Holidays. Look at the LC Authority Records for the two concepts to understand the structure of the controlled vocabulary: authorized heading, lead-in terms (Use For), narrower terms, broader terms, and the corresponding LCC number (similar to the Relative Index in DDC).
• Look at the use of the heading Holidays in pre-coordinated subject cataloging in the UTK OPAC: What types of subdivisions are being used? Find examples of topical, geographical, chronological, and form subdivisions.
• Browse the list forward (Next Page button) and backward (Previous Page button) to see how various holidays (New Year’s Day and Thanksgiving Day) are dispersed in the alphabetical listing. Are headings in near proximity always related concepts?
Part II: Classifications
• Take a tour of DDC at http://www.oclc.org/dewey/resources/tour/default.htm
• Read the comparison of DDC and LCC, both enumerative classifications, at http://staff.oclc.org/~vizine/Intercat/vizine-goetz.htm
• Read “Was Ranganathan a Yahoo!?” about the Colon Classification, a faceted classification, at http://scout.wisc.edu/Projects/PastProjects/toolkit/enduser/archive/1998/euc-9803.html
Assignment 4: Subject Access
Report
The report should read like a well-organized essay. There is no need to answer the specific questions above; just use the results you obtained as examples to illustrate or back up your arguments. You must use some examples from the above activities to make your points and to show that you gained some understanding while performing the above tasks. Your essay should have sections that include the following parts:
• Summarize your understanding of the roles of controlled vocabularies in providing subject access to intellectual works
• Summarize your understanding of the roles of classifications in organizing information objects in physical libraries
• Compare classification systems with alphabetical subject headings or thesauri (controlled vocabulary) in providing subject access (pros and cons)
• Discuss the new roles of controlled vocabularies and classifications in organizing electronic resources on the Web
Subject Analysis and Classification • Subject analysis is the part of creating metadata that deals with the conceptual analysis of an information object to determine what it is about • The “aboutness” of the information object is then translated into controlled vocabulary terms for subject headings and into classification notations
Knowledge Classification • A logical system for the organization of knowledge • The division of knowledge into classes usually is based on disciplines • Classes are arranged into a hierarchical and coherent framework
Knowledge Classification: Multistage Process • Identifying a property of interest • Distinguishing objects that possess that property from those that lack it • Grouping objects that have the property into one class • Identifying relationships between classes • Finding distinctions within classes to arrive at subclasses • Classical theory: from general to specific • Problems?
Fuzzy Set Theory (Lotfi Zadeh) • Some categories are well defined, others are not • Continuum of property rather than discrete marks • If categories defined by properties members share, then no member should be “better” than the others (prototypes) • Categories should be independent of humans doing the categorization • Ad hoc categories: on the spur of the moment
Classificatory Structure (Tree)
Philosophy
Natural sciences
    Math
        Algebra
        Geometry
    Astronomy
    Physics
    Chemistry
    ……
    Plants
Literature
This may be arranged using indentation, as is often seen in printed schedules.
Classification (Print Format)
Philosophy
Natural sciences
Literature
****************
Natural sciences
- Math
- Astronomy
- Physics
- Chemistry
……
- Plants
Literature
1
Natural sciences
- Math
-- Algebra
-- Geometry
- Astronomy
……
201
Linearization using notations
100 Philosophy
500 Natural sciences
    510 Math
        512 Algebra
        516 Geometry
    520 Astronomy
    530 Physics
    540 Chemistry
    ……
    580 Plants
800 Literature
The linear order of these concepts using numeric notation: 100 ... 500 510 512 ... 516 ... 520 ... 530 ... 540 ... 580 ... 800
Classification vs. Alphabetical Order
Alphabetical order: Algebra 512, Astronomy 520, Chemistry 540, Geometry 516, Literature 800, Math 510, Natural sciences 500, Philosophy 100, Physics 530, Plants 580
Classified order: 100 (Philosophy) ... 500 (Natural sciences) 510 (Math) 512 (Algebra) ... 516 (Geometry) ... 520 (Astronomy) ... 530 (Physics) ... 540 (Chemistry) ... 580 (Plants) ... 800 (Literature)
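The contrast between the two orders can be sketched in a few lines of Python. The concept names and notations are the ones on the slide; the `neighbors` helper is a hypothetical illustration, not part of any library system.

```python
# Concepts and notations as listed on the slide.
concepts = {
    "Philosophy": 100,
    "Natural sciences": 500,
    "Math": 510,
    "Algebra": 512,
    "Geometry": 516,
    "Astronomy": 520,
    "Physics": 530,
    "Chemistry": 540,
    "Plants": 580,
    "Literature": 800,
}

# Alphabetical order sorts by the name of the concept.
alphabetical = sorted(concepts)

# Classified order sorts by the notation assigned to the concept.
classified = sorted(concepts, key=concepts.get)


def neighbors(order, name):
    """Return the entries immediately before and after `name` in `order`."""
    i = order.index(name)
    return order[max(i - 1, 0):i] + order[i + 1:i + 2]


print(neighbors(alphabetical, "Algebra"))  # next to Astronomy
print(neighbors(classified, "Algebra"))    # between Math and Geometry
```

In the alphabetical list Algebra sits next to Astronomy, an unrelated concept; in the classified list it sits between Math and Geometry, its parent and its sibling.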
Algorithm for Browsing and Searching • Traversing the hierarchical (tree) structure is a top-down process • At each level of the hierarchy the searcher must select one node to expand to the next level • Think about how you find information using a Web directory: what is the path?
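A minimal sketch of this top-down browsing, assuming the small example tree from the earlier slides. The `Node` class, the `browse` function, and the root label "All knowledge" are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def browse(root: Node, choices: List[str]) -> List[str]:
    """Walk down from the root, selecting one child per level.

    Returns the path of labels visited, which is the "path" a searcher
    follows in a Web directory.
    """
    path = [root.label]
    node = root
    for choice in choices:
        match = next((c for c in node.children if c.label == choice), None)
        if match is None:
            raise KeyError(f"{choice!r} is not listed under {node.label!r}")
        path.append(match.label)
        node = match
    return path


# Example tree from the slides; the root label is an assumption.
tree = Node("All knowledge", [
    Node("Philosophy"),
    Node("Natural sciences", [
        Node("Math", [Node("Algebra"), Node("Geometry")]),
        Node("Astronomy"),
        Node("Physics"),
        Node("Chemistry"),
        Node("Plants"),
    ]),
    Node("Literature"),
])

print(" > ".join(browse(tree, ["Natural sciences", "Math", "Geometry"])))
```

The searcher never sees Geometry until both Natural sciences and Math have been chosen, which is why a wrong guess at an upper level hides the target entirely.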
Library Classification • A way to organize information objects by grouping subjects in the manner that is most useful to the users • The most-used systems are • LCC: Library of Congress Classification • DDC: Dewey Decimal Classification • Both are hierarchical and enumerative: they attempt to assign a designation to every subject concept needed in the system • LCC is more enumerative than DDC • UDC: now faceted
Classification Schemes • Verbal description (topic by topic) of things/concepts that can be represented • Arrangement of verbal descriptions in classed or logical order • Notational system alongside each verbal description (schedules) • Cross-references for navigation within the schedules • Alphabetical index of terms used in schedule (and synonyms) • Instructions for use • Organization that maintains classification scheme
Ranganathan’s Colon Classification: A Faceted Approach • Parts of the whole: faces of a diamond • Notations for subparts strung together • 5 fundamental categories of a subject • Personality (focal or most specific subject) • Matter (material) • Energy (activity, operation, process) • Space (place) • Time • e.g., design of wooden furniture in eighteenth century America • Facet indicators: not convenient for shelves • Convenient in the age of the Internet. Why?
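As a rough sketch, the slide's example subject can be broken into the five fundamental categories and stored as a record. Plain words stand in for real Colon Classification notation, and the " : " separator is only a placeholder; `citation_string` and `matches` are hypothetical helpers.

```python
# The five fundamental categories, in citation order.
FACETS = ("personality", "matter", "energy", "space", "time")

# "Design of wooden furniture in eighteenth century America",
# analyzed into facets. Words are used instead of real notation.
example = {
    "personality": "furniture",
    "matter": "wood",
    "energy": "design",
    "space": "America",
    "time": "eighteenth century",
}


def citation_string(subject):
    """String the facet values together in citation order."""
    return " : ".join(subject[f] for f in FACETS if f in subject)


def matches(subject, **wanted):
    """True if the subject has every requested facet value."""
    return all(subject.get(f) == v for f, v in wanted.items())


print(citation_string(example))
print(matches(example, matter="wood", space="America"))
```

On a shelf only the single strung-together order is available, whereas a record like this can be searched by any facet or combination of facets.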
Library Classification: Functions • Arrange items in a logical manner on the shelves • Locate known work through call number: shared mark on item and catalog • Collocate “like” items: chosen property is subject • Provide systematic display of bibliographic entries in printed catalogs, indexes, etc. • Help in direct retrieval
Basics • Successive stages of classes and subclasses with a chosen property as the basis of each stage • Hierarchical tree structure: Genus and species • Facets, arrays, chain, citation order
Classification Concepts • Broad vs. Close Classification • Classification of Knowledge vs. Classification of a Particular Collection (Literary warrant) • Integrity vs. Keeping Pace with Knowledge • Fixed vs. Relative Location • Closed vs. Open Stacks • Location Device (call number) vs. Collocation Device (classification notation)
Library of Congress Classification
A -- GENERAL WORKS
B -- PHILOSOPHY. PSYCHOLOGY. RELIGION
C -- AUXILIARY SCIENCES OF HISTORY
D -- HISTORY (GENERAL) AND HISTORY OF EUROPE
E -- HISTORY: AMERICA
F -- HISTORY: AMERICA
G -- GEOGRAPHY. ANTHROPOLOGY. RECREATION
H -- SOCIAL SCIENCES
J -- POLITICAL SCIENCE
K -- LAW
L -- EDUCATION
M -- MUSIC AND BOOKS ON MUSIC
N -- FINE ARTS
P -- LANGUAGE AND LITERATURE
Q -- SCIENCE
R -- MEDICINE
S -- AGRICULTURE
T -- TECHNOLOGY
U -- MILITARY SCIENCE
V -- NAVAL SCIENCE
Z -- BIBLIOGRAPHY. LIBRARY SCIENCE. INFORMATION RESOURCES (GENERAL)
Library of Congress Classification • Subclass B Philosophy (General) • Subclass BC Logic • Subclass BD Speculative philosophy • Subclass BF Psychology • Subclass BH Aesthetics • Subclass BJ Ethics • Subclass BL Religions. Mythology. Rationalism • Subclass BM Judaism • Subclass BP Islam. Bahaism. Theosophy, etc. • Subclass BQ Buddhism • Subclass BR Christianity • Subclass BS The Bible • Subclass BT Doctrinal Theology • Subclass BV Practical Theology • Subclass BX Christian Denominations
Library of Congress Classification
Subclass B
• B1-5802 Philosophy (General)
• B69-99 General works
• B108-5802 By period (including individual philosophers and schools of philosophy)
• B108-708 Ancient
• B720-765 Medieval
• B770-785 Renaissance
• B790-5802 Modern
• B808-849 Special topics and schools of philosophy
• B850-5739 By region or country
• B5800-5802 By religion
Subclass BC
• BC1-199 Logic
• BC11-39 History
• BC25-39 By period
• BC60-99 General works
LCC—Some Features Notations • lack of built-in hierarchy • alphanumeric--linearization Advantages • comprehensive • flexible • inclusive • adaptive / hospitable Cons • difficult to search hierarchically
Dewey Decimal Classification • From the divine to the mundane (except 000) • Using decimals for its categories allows the notation to be purely numerical and infinitely hierarchical • Some aspects of a faceted classification: combines elements from different parts of the structure to construct a number representing the subject content • Except for general works and fiction, works are classified principally by subject, with extensions for subject relationships, place, time, or type of material. Classification numbers have at least three digits but are otherwise of indeterminate length, with a decimal point before the fourth digit, where present • Classmarks are to be read as decimal numbers, in the order: 050, 220, 330.973, 331, etc.
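The reading order of classmarks can be checked with a short sketch that sorts the slide's examples as decimal numbers. The comparison with a digits-only reading is an added illustration of what goes wrong when the decimal point is ignored.

```python
from decimal import Decimal

# Classmarks from the slide, deliberately out of order.
marks = ["331", "330.973", "050", "220"]

# Read each classmark as a decimal number.
shelf_order = sorted(marks, key=Decimal)

# Ignoring the decimal point treats 330.973 as 330973,
# which wrongly files it after 331.
digits_only = sorted(marks, key=lambda m: int(m.replace(".", "")))

print(shelf_order)
print(digits_only)
```

Read as decimals, 330.973 files before 331, so everything on the economy of a particular place stays inside the 330s.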
Dewey Decimal Classification
Main classes => divisions => sections
The system is made up of ten main classes:
• 000 Computers, information and general reference
• 100 Philosophy and psychology
• 200 Religion
• 300 Social sciences
• 400 Language
• 500 Science and mathematics
• 600 Technology
• 700 Arts and recreation
• 800 Literature
• 900 History and geography
Examples of combining elements:
• 330 for economy + 94 for Europe = 330.94, European economy
• 973 for United States + 005 form division for periodicals = 973.005, periodicals concerning the United States generally
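The two combinations above follow the same simple pattern, sketched here. This only reproduces the arithmetic shown on the slide; actual DDC number building follows the instructions in the schedules and tables, and `build_number` is a hypothetical helper.

```python
def build_number(base: str, extension: str) -> str:
    """Append an extension to a three-digit base number.

    The decimal point falls after the third digit, as on the slide.
    """
    if len(base) != 3 or not base.isdigit():
        raise ValueError("base must be a three-digit class number")
    if not extension.isdigit():
        raise ValueError("extension must consist of digits only")
    return f"{base}.{extension}"


print(build_number("330", "94"))   # economy + Europe
print(build_number("973", "005"))  # United States + periodicals
```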
Dewey Decimal Classification
• 000 Generalities
001 Knowledge
002 The book
003 Systems
004 Data processing; computer science
005 Computer programming, programs
006 Special computer methods
007 Not assigned or no longer used
010 Bibliography
011 Bibliographies
012 Bibliographies of individuals
• 100 Philosophy & psychology
101 Theory of philosophy
102 Miscellany of philosophy
103 Dictionaries of philosophy
104 Not assigned or no longer used
105 Serial publications of philosophy
106 Organizations of philosophy
107 Education, research in philosophy
• 200 Religion
201 Philosophy of Christianity
202 Miscellany of Christianity
203 Dictionaries of Christianity
204 Special topics
205 Serial publications of Christianity
206 Organizations of Christianity
207 Education, research in Christianity
208 Kinds of persons in Christianity
209 History & geography of Christianity
210 Natural theology
211 Concepts of God
212 Existence, attributes of God
Comparison of DDC & LCC
DDC: • Knowledge • Arabic numerals • Universal • Uneven classes • Logical placement of subjects • Developer (“generalist”) • Mnemonic
LCC: • Literary warrant • Alphanumeric • American • Hospitable • Logical hierarchies often lost • Developer (“specialists”) • Confusing notation
Call Numbers (LCC) • Every book is given a unique call number to serve as an address for locating the book on the shelf • A call number has two parts: the classification number (Library of Congress Classification or Dewey Decimal Classification) and the Cutter number or book number
Call Numbers (DDC)
• Every book is given a unique call number to serve as an address for locating the book on the shelf
• A call number has two parts: the Dewey Decimal Classification number and the Cutter number or book number
• The Cutter number for a book usually consists of the first letter of the author's last name and a series of numbers (from a table designed to help maintain an alphabetical arrangement of names):
Conley, Ellen C767
Conley, Robert C768
Cook, Robin C77
Cook, Thomas C773
• How do we keep the call number unique if the library has several works by the same author? Add a work mark (work letter) to the Cutter number:
813.54 Cook, Robin
C77a Acceptable Risk
C77f Fever
C77fa Fatal Cure
Call Numbers (DDC)
813.54 L52f Farthest shore (Ursula Le Guin)
813.54 L52fo Four ways to forgiveness (Ursula Le Guin)
813.54 L52p Planet of Exile (Ursula Le Guin)
813.54 L52Z B54 Approaches to the Fiction of Ursula Le Guin (James Bittner)
813.54 is the Dewey number for American literature after 1945; L52 is the Cutter number for Ursula Le Guin, and the Z added to it marks a work of criticism; B54 is for James Bittner, the author of Approaches... The capital Z, the last letter of the alphabet, ensures that all criticism is shelved after the author's own works.
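The work marks in the Cook and Le Guin examples are consistent with one simple rule: take the first letter of the title, and add further letters only when the shorter mark is already in use. The sketch below assumes that rule and assumes works are processed in the order listed; local practice varies, and `work_mark` and `book_numbers` are hypothetical names.

```python
ARTICLES = {"a", "an", "the"}


def work_mark(title, taken):
    """Return the shortest unused lowercase prefix of the title.

    `taken` is the set of work marks already used under this Cutter
    number; the chosen mark is added to it.
    """
    words = title.lower().split()
    # Skip a leading article so "The Farthest Shore" files under f.
    if len(words) > 1 and words[0] in ARTICLES:
        words = words[1:]
    letters = "".join(ch for ch in "".join(words) if ch.isalpha())
    for length in range(1, len(letters) + 1):
        candidate = letters[:length]
        if candidate not in taken:
            taken.add(candidate)
            return candidate
    raise ValueError(f"no unused work mark available for {title!r}")


def book_numbers(cutter, titles):
    """Attach a unique work mark to the author's Cutter number."""
    taken = set()
    return [cutter + work_mark(title, taken) for title in titles]


print(book_numbers("C77", ["Acceptable Risk", "Fever", "Fatal Cure"]))
print(book_numbers("L52", ["Farthest shore", "Four ways to forgiveness",
                           "Planet of Exile"]))
```

The Cutter numbers themselves (C77, L52) come from the Cutter table and are taken as given here.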
Assign Call Numbers • Select appropriate class number from the schedule • Add auxiliary number from tables or based on rules to extend the class number • Add cutter number as book mark (use cutter tables)
Call Numbers using LCC • Example: QE534.2.B64 • Call numbers can begin with one, two, or three letters • The first letter of a call number represents one of the 21 major divisions of the LCC system. In the example, the subject "Q" is Science. • The second letter "E" represents a subdivision of the sciences, Geology. All books in the QE's are primarily about Geology. • Books in categories E, United States History, and F, Local U.S. History and American History, do not have a second letter (exception: in Canada, FC is used for Canadian history). • Books about Law, K's, can have three letters, such as KFH, Law of Hawaii. Some areas of history (D) also have three-letter call numbers.
Call Numbers using LCC: Numbers after letters • The first set of numbers in a call number helps to define a book's subject. "534.2" tells us more about the book's subject. The range QE 500-625 covers books about "Dynamic and Structural Geology" • Books with call numbers QE534.2 are specifically "Earthquakes, Seismology - General Works - 1970 to Present" • One of the most frequently used numbers in call numbers is "1", which is often used for general periodicals in a given subject area. • For example, Q1.S3 is the call number for the journal Science. • Journals are also given call numbers based on the specific subject. • For example, QE531.E32 is the call number for the journal Earthquake Spectra, as QE531 is the call number for periodicals about "Earthquakes, Seismology"
Call Number using LCC • In QE534.2.B64, the B64 is taken from the two-number table and represents the last name of the author, Bruce A. Bolt. • The book is Earthquakes. • Some books have two Cutters; the first one is usually a further breakdown of the subject matter. • QA 76.76 H94 M88 is a book located in the Mathematics section of the Q's. • QA 76 is about Computer Science • The ".76" indicates Special Topics in Automation • "H94" tells us that this is a book about HTML • "M88" represents the last name of the first author, “Musciano” • The book is HTML: The Definitive Guide
Call Number using LCC • Class mark: letters, numbers, decimal ... • Cutter numbers: letters plus numbers • Single Cutter: used as a book mark • Double Cutters: a first Cutter number as class extension (by topic, geography, etc.) and a second Cutter number as book number
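The structure just described (letters, numbers with an optional decimal, then one or two Cutters) can be pulled apart with a regular expression. This is a minimal sketch that handles only the call numbers shown on the slides; real call numbers may also carry dates, volume numbers, and other elements, and `parse_lcc` and `LccCallNumber` are hypothetical names.

```python
import re
from typing import NamedTuple, Tuple


class LccCallNumber(NamedTuple):
    letters: str              # one to three class letters, e.g. "QE"
    number: str               # class number, e.g. "534.2"
    cutters: Tuple[str, ...]  # one or two Cutter numbers, e.g. ("B64",)


# One to three capital letters, then digits with an optional decimal part,
# then whatever follows (the Cutter numbers).
_CLASS_MARK = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)(.*)$")


def parse_lcc(call_number: str) -> LccCallNumber:
    match = _CLASS_MARK.match(call_number)
    if match is None:
        raise ValueError(f"not an LCC-style call number: {call_number!r}")
    letters, number, rest = match.groups()
    # A Cutter number is a capital letter followed by digits.
    cutters = tuple(re.findall(r"[A-Z]\d+", rest))
    return LccCallNumber(letters, number, cutters)


for example in ["QE534.2.B64", "QA 76.76 H94 M88", "Q1.S3", "QE531.E32"]:
    print(parse_lcc(example))
```

With a double Cutter such as QA 76.76 H94 M88, the first element of `cutters` is the class extension and the second is the book number.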
Call Number in MARC • 050 00 $a Q184 $b I87 • 050 00 $a QA76.9.C64 $b C36
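In MARC field 050 (Library of Congress Call Number), subfield $a holds the classification number and subfield $b the item number. The sketch below splits the textual display form used on the slide; it does not read binary MARC records, it assumes both indicator characters are visible (no blanks), and the function names are hypothetical.

```python
def parse_marc_field(line):
    """Split a displayed MARC field into tag, indicators, and subfields."""
    head, *chunks = line.split("$")
    tag, indicators = head.split()
    # Each chunk starts with the one-character subfield code.
    subfields = [(chunk[0], chunk[1:].strip()) for chunk in chunks if chunk]
    return {"tag": tag, "indicators": indicators, "subfields": subfields}


def call_number(field):
    """Join $a (classification number) and $b (item number)."""
    return " ".join(value for code, value in field["subfields"]
                    if code in ("a", "b"))


for line in ["050 00 $a Q184 $b I87", "050 00 $a QA76.9.C64 $b C36"]:
    field = parse_marc_field(line)
    print(field["tag"], call_number(field))
```

In the second record the first Cutter (C64) is part of the classification number in $a, and only the book number (C36) is carried in $b.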
Application -- One • How to organize periodicals on the shelves? • Method 1. Alphabetical by title • Method 2. Classification • Pros and cons of each method?
Application – Two • How to organize monographs in a series? • Method 1. ASIST conference proceedings as a monograph series (see record 1) • Method 2. as a serial: journals or magazines (see record 2)
Definitions • Serials: publications issued in successive parts that are intended to continue indefinitely • Monograph series: contain individual objects that are complete bibliographic units (not intended to be continued indefinitely) • Pros and cons of each practice?
Well-organized subject headings -- beyond listing • Medical Subject Headings (MeSH): http://www.nlm.nih.gov/mesh/2005/MeSHtree.A.html
Purpose of Classification • Provides meaningful subject access via retrieval tool • Provides collocation of objects of a like nature (Cutter) • Provides a logical location for similar objects • Saves user time
Purpose of Classification Because books are classified by subject, you can often find several helpful books on the same shelf, or nearby
Other subject access tools • Facet -- synthetical classification was developed to overcome the limitations of enumerative hierarchical classifications to allow combination of classes • Taxonomy -- organization or subject oriented: classification of things, or the principles underlying the classification • Ontology -- building shareable knowledge structures (among people, computer, …): "What are the fundamental categories of being?"
Semantic Web • The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. The first steps in weaving the Semantic Web into the structure of the existing Web are already under way. In the near future, these developments will usher in significant new functionality as machines become much better able to process and "understand" the data that they merely display at present. • -- Tim Berners-Lee et al., Scientific American, May 17, 2001