1. Thesaurus Management Tools Introduction
Who am I and what do I do
--taxonomy development
--information retrieval
Why this presentation
--needed a tool
--not enough engineering staff for development, so given go-ahead to purchase out-of-the-box software
--created spreadsheet comparing characteristics of candidate products
Plan
--discuss controlled vocabulary concepts
--why buy?
--what to consider
--product taste testing
2. Definitions
3. Why control? People just don’t say or spell things the same way
4. From a recent gathering of variants in our query logs
5. Relationships
6. Hierarchical
7. Equivalent
8. Associative
9. Classification
10. Classification
11. Classification IE Call letters
IE Controlled Vocabularies
12. Controlled Vocabularies
13. Authority Lists
14. Gazetteers
15. Gazetteers
16. Glossaries
17. Taxonomies vs. Thesauri Both are hierarchical (trees) and usually have associative and equivalent relationships as well
Both have applications for indexing, navigation, and search
Both typically are built with a specific topic area or collection in mind
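The shared structure described above (a tree plus associative and equivalent links) can be sketched as a small data model. This is only an illustration, not how any particular product stores its data: the class names, the `resolve` helper, the top term "Music", and the misspelled variant are all assumptions made up for the example. "Classic Rock" and "1970s Music" are borrowed from a later slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term with the three traditional relationship types."""
    label: str
    broader: set[str] = field(default_factory=set)   # hierarchical (BT)
    narrower: set[str] = field(default_factory=set)  # hierarchical (NT)
    related: set[str] = field(default_factory=set)   # associative (RT)
    used_for: set[str] = field(default_factory=set)  # equivalent (UF) variants


class Vocabulary:
    """Hypothetical in-memory vocabulary; real tools add persistence and rules."""

    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}
        self.variants: dict[str, str] = {}  # lowercased variant -> preferred label

    def add_term(self, label: str) -> Term:
        return self.terms.setdefault(label, Term(label))

    def add_hierarchy(self, broader: str, narrower: str) -> None:
        # Record both directions so BT and NT always mirror each other.
        self.add_term(broader).narrower.add(narrower)
        self.add_term(narrower).broader.add(broader)

    def add_related(self, a: str, b: str) -> None:
        # Associative links are symmetric.
        self.add_term(a).related.add(b)
        self.add_term(b).related.add(a)

    def add_variant(self, preferred: str, variant: str) -> None:
        self.add_term(preferred).used_for.add(variant)
        self.variants[variant.lower()] = preferred

    def resolve(self, text: str) -> str | None:
        """Map a user's wording to the preferred term, if it is known."""
        wanted = text.lower()
        for label in self.terms:
            if label.lower() == wanted:
                return label
        return self.variants.get(wanted)


vocab = Vocabulary()
vocab.add_hierarchy("Music", "Classic Rock")          # "Music" is an assumed top term
vocab.add_related("Classic Rock", "1970s Music")
vocab.add_variant("Classic Rock", "classic rok")      # assumed misspelling
print(vocab.resolve("CLASSIC ROK"))
```

The same model serves indexing (assign `label`), navigation (walk `broader` and `narrower`), and search (call `resolve` on the query).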
18. Taxonomies vs. Thesauri
19. Information Access
20. Information Access
21. Information Retrieval
22. Trees & Webs
23. Topic Maps Based on traditional indexing concepts.
--Knowledge structures: topics and relations/associations
--Information Resources: occurrences
Just as SGML was originally developed for print publishing (this is a header, this is body text), topic maps were originally conceived for representing indexes for complex information. They evolved into a navigational aid that encompasses the characteristics of taxonomies and thesauri, with particular utility for electronic documentation.
Topic map sits above the occurrences. It is not built in response to a body of documents. It is a stand-alone structure to which occurrences attach. Nodes.
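The split between the knowledge structure and the resources can be sketched roughly as follows. This is a simplified illustration, not the Topic Maps standard's data model: the class names, the association labels, and the file path are assumptions invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    occurrences: list[str] = field(default_factory=list)  # pointers to resources


@dataclass(frozen=True)
class Association:
    kind: str    # free-form label, not restricted to a fixed set of types
    source: str
    target: str


class TopicMap:
    """Hypothetical stand-alone map: it exists before any document is attached."""

    def __init__(self) -> None:
        self.topics: dict[str, Topic] = {}
        self.associations: set[Association] = set()

    def topic(self, name: str) -> Topic:
        return self.topics.setdefault(name, Topic(name))

    def associate(self, kind: str, source: str, target: str) -> None:
        self.topic(source)
        self.topic(target)
        self.associations.add(Association(kind, source, target))

    def attach(self, name: str, resource: str) -> None:
        # Occurrences hang off the topic; the map itself does not change shape.
        self.topic(name).occurrences.append(resource)

    def neighbours(self, name: str) -> list[tuple[str, str]]:
        return sorted((a.kind, a.target) for a in self.associations if a.source == name)


tm = TopicMap()
tm.associate("developed-for", "SGML", "Print publishing")  # assumed association label
tm.attach("SGML", "docs/sgml-intro.html")                   # assumed resource path
print(tm.neighbours("SGML"))
```

Note that the map is fully built before `attach` is ever called, which mirrors the point that it is not derived from a body of documents.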
24. Topic Maps
25. Topic Maps Variety of relationships/associations available. Not limited to the three traditional ones (hierarchical, equivalent, associative).
26. Why buy? Government and Politics vs. Politics and Government
Classic Rock
1970s Music
Classic Rock
27. Why buy?
28. Why buy? Variety of products starting at less than $500
Average full-time worker:
--Between $50,000 and $100,000 per year, or $4,167 to $8,333 per month.
--Bureau of Labor Statistics National Compensation Survey: $28-$39 per hour (plus benefits, capital expenses, and other forms of compensation)
Quickly exceed budget
Can be painful, as you have to live with a product that is still being developed
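The arithmetic behind these figures can be checked with a short sketch. The 40-hour week, 52 paid weeks, and 260 working days per year are assumptions of this example, not figures from the survey, and salary here excludes benefits and other compensation.

```python
HOURS_PER_YEAR = 40 * 52  # assumption: 40-hour week, 52 paid weeks


def monthly_cost(annual_salary: float) -> int:
    """Salary per month, rounded to the nearest dollar."""
    return round(annual_salary / 12)


def annual_from_hourly(rate: float) -> int:
    """Annualize an hourly rate under the HOURS_PER_YEAR assumption."""
    return round(rate * HOURS_PER_YEAR)


def days_to_match(price: float, annual_salary: float, workdays_per_year: int = 260) -> float:
    """Working days of salary that equal a product's purchase price."""
    return price / (annual_salary / workdays_per_year)


print(monthly_cost(50_000), monthly_cost(100_000))      # 4167 8333
print(annual_from_hourly(28), annual_from_hourly(39))   # 58240 81120
print(round(days_to_match(500, 50_000), 1))             # 2.6
```

Under these assumptions, a $500 product costs about as much as 2.6 working days of a $50,000 salary.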
29. Choices --Automated:
*little or no human intervention, usually uses rules or training sets, derives vocabulary from
collection itself
*Sometimes comes with its own built-in vocabulary – very broad.
--Manual
*You do all the work
*Only automatic characteristic is cross-checking references, global changes, report-generating,
and sometimes spell-checking, etc.
--Bundled
*Vocabulary module as part of a larger classification/management package.
*However, sometimes the vocabulary module can be purchased separately.
--Stand-alone
*The product does vocabulary management only.
--Single-user:
*Can mean only one workstation (or client)
*Can mean data generally is stored on that workstation (although it can be stored on a server)
*Means that only one user at a time can use the tool (no collision monitoring available)
--Multi-user:
*many users at one time (collisions detected and managed)
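One common way a multi-user tool can detect collisions is version checking: each save states which version of the term the editor last read. The sketch below is a single-process illustration of that idea only. `TermStore`, `EditCollision`, and the scope-note field are hypothetical names, and no specific product is claimed to work this way.

```python
from __future__ import annotations


class EditCollision(Exception):
    """Raised when a term changed after the editor last read it."""


class TermStore:
    """Hypothetical store keeping a version counter per term."""

    def __init__(self) -> None:
        self._records: dict[str, tuple[int, str]] = {}  # label -> (version, scope note)

    def read(self, label: str) -> tuple[int, str]:
        return self._records.get(label, (0, ""))

    def write(self, label: str, seen_version: int, scope_note: str) -> int:
        current_version, _ = self.read(label)
        if seen_version != current_version:
            # Someone else saved in between: refuse rather than overwrite.
            raise EditCollision(
                f"{label!r} is at version {current_version}, editor saw {seen_version}"
            )
        self._records[label] = (current_version + 1, scope_note)
        return current_version + 1


store = TermStore()
first_seen, _ = store.read("Classic Rock")    # first editor opens the term
second_seen, _ = store.read("Classic Rock")   # second editor opens the same term
store.write("Classic Rock", first_seen, "note from first editor")

collided = False
try:
    store.write("Classic Rock", second_seen, "note from second editor")
except EditCollision:
    collided = True  # the tool would now ask the second editor to reload
print(collided)
```

A single-user tool skips the version check entirely, which is why two people working on the same file can silently overwrite each other.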
30. Tasks Vocabulary construction and maintenance
Obvious
Editing, creating
Reporting
Term usage
Term history
Search and indexing
Exposed to end users for querying and browsing
Exposed to indexers for term assignment
Candidate term suggestion
31. Criteria Technical
*Operating system, platform
*database software or off-site storage
*Technical support: availability?
*Who is the developer? Are there IS people on staff?
Pricing and licenses
*one time purchase or yearly
*maintenance fees
*price of new versions or other updates?
*Extra services at additional cost? Customization? Formatting and
importing an existing thesaurus?
Acceptance
*who uses it? Widely adopted?
*is it a new product? Well-tested?
*product reviews
*can you contact current users?
32. Criteria Documentation
*Printed?
*Online? Searchable?
*call center? 24/7?
User experience
*interface: can you look at it all day?
*usability: easy to use, not needing a million clicks to
accomplish a task, navigation
*input style: drag and drop? All manual typing?
*accessibility for disabled persons?
*error and feedback messaging understandable? Cryptic?
*confirmation messages before major changes?
Data integrity
*backup copies to roll back?
*administrative access levels: read only, limit who can add
and delete?
33. Criteria Structural
*field character limits and data types
*pre-defined fields and relationship types
*user defined fields and relationships?
*Notation?
*limit levels (depth)?
*polyhierarchical or multiple relationships between terms, such
as a term being synonymous to more than one preferred term?
Editing
*how easy to change status or relationships of a term?
*deletion. Global? Is term archived or completely removed?
*automatic relationship validation? spell-checking?
Importing, Exporting, Reports
*special import format?
*mapping for heterogeneous or multilingual vocabularies
*import/export formats: proprietary or standard? MARC? ASCII? XML?
*report configurations: KWIC & KWOC? Alpha, Hierarchical?
By date added or last edited? By notation?
*user/use statistics?
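As an illustration of what "automatic relationship validation" might check, the sketch below looks for missing reciprocal links: every BT should have a matching NT, and every RT should point back. The dictionary layout, the BT/NT/RT keys, and the function name are assumptions of this example. A real product may validate far more, such as cycles or orphan terms.

```python
from __future__ import annotations

# Each relationship type and the type expected on the other end.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT"}


def missing_reciprocals(vocab: dict[str, dict[str, list[str]]]) -> list[tuple[str, str, str]]:
    """Return (term, relationship, target) entries that should exist but do not."""
    problems = []
    for term, relations in vocab.items():
        for kind, targets in relations.items():
            back = RECIPROCAL[kind]
            for target in targets:
                if term not in vocab.get(target, {}).get(back, []):
                    problems.append((target, back, term))
    return sorted(problems)


# Hypothetical vocabulary with one broken link: the RT is recorded on one side only.
sample = {
    "Music": {"NT": ["Classic Rock"]},
    "Classic Rock": {"BT": ["Music"], "RT": ["1970s Music"]},
    "1970s Music": {},
}
print(missing_reciprocals(sample))
```

A tool with this kind of check built in would either add the missing link automatically or flag it in a report.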
34. Products Referenced in This Presentation
35. The End