
Indexing & retrieval


jesperanza

Presentation Transcript


  1. Indexing & retrieval

  2. Approaches to indexing
     • Key word indexing
     • Concept indexing
     • Social indexing
     • Non-text indexing

  3. Keyword Indexing

  4. Keyword indexing (1)
     Entity-oriented: draw terms from the entity itself
     Advantages:
     • Quick
     Example title: How to succeed in graduate school

  5. Keyword indexing (1)
     Entity-oriented: draw terms from the entity itself
     Advantages:
     • Quick
     • Inexpensive
     • No vocabulary lag
     • Multiple access points
     • Accuracy
     • No intellectual effort needed

  6. Keyword indexing (2)
     Disadvantages:
     • No control over synonyms or near-synonyms
     • No control over homographs

  7. Keyword indexing (3)
     Disadvantages:
     • Dependent on authors for informative and accurate titles
     Example titles:
     Artificial metalloenzymes based on the biotin-avidin technology: enantioselective catalysis and beyond
     The golden peaches of Samarkand

  8. Keyword indexing (4)
     Disadvantages:
     • No control over word forms
     Example: "Communicating in the library" or "Communications in libraries"

  9. Keyword indexing (5) Disadvantages: • No cross reference structure

  10. Historical key word indexing methodologies
      • Uniterm cards
      • Edge-notched cards
      • Optical coincidence cards
      • Key word in context (KWIC)
      • Spatial indexing

  11. Pre- versus post-coordinate indexing (Mortimer Taube)
      Pre-coordinate headings (12 terms):
      China—Folklore, China—History, China—Politics
      France—Folklore, France—History, France—Politics
      Germany—Folklore, Germany—History, Germany—Politics
      Russia—Folklore, Russia—History, Russia—Politics
      Post-coordinate terms (7 terms):
      China, France, Germany, Russia, Folklore, History, Politics
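
The 12-versus-7 count on this slide can be reproduced with a short sketch. It only uses the place and topic names from the slide; the variable names are illustrative.

```python
from itertools import product

places = ["China", "France", "Germany", "Russia"]
topics = ["Folklore", "History", "Politics"]

# Pre-coordinate: the indexer combines the concepts in advance,
# so every place/topic pair needs its own heading.
pre_coordinate = [f"{place}—{topic}" for place, topic in product(places, topics)]

# Post-coordinate: each concept is a separate term, and the searcher
# combines terms at search time.
post_coordinate = places + topics

print(len(pre_coordinate), "pre-coordinate headings")
print(len(post_coordinate), "post-coordinate terms")
```

The pre-coordinate list grows as the product of the two facets (4 × 3), while the post-coordinate list grows as their sum (4 + 3).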

  12. Post-coordinate index searching
      History of France → France * History
      Two sets of documents: France, History
      Boolean AND search yields the intersection of the two sets: France AND History

  13. Advantages to Taube's system
      • No need to develop a list of authorized terms: terms are pulled from the documents themselves
      • No need to articulate rules of punctuation for representing complex concepts (France—History)
      • No need to delineate citation order (France—History v. History—France)
      • No need to formulate rules for subheadings ("May subdivide geog.")

  14. Uniterm cards
      One card per term
      Document no. 102: "Arrest statistics of the Arizona State Police"
      state:  31 102 53 24 75 96 107 68 49 70 34 95 117 59 115 147 109
      police: 11 102 23 85 96 87 68 49 60 91 115 107 79

  15. Searching with uniterm cards
      Query: looking for documents about state police
      state:  31 102 53 24 75 96 107 68 49 70 34 95 117 59 115 147 109
      police: 11 102 23 85 96 87 68 49 60 91 115 107 79
      Matching documents shown:
      102 Arrest statistics of the Arizona State Police.
      107 A short history of the Wisconsin State Police.
      115 The modern police state.
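
Comparing the two cards by eye amounts to a Boolean AND over two sets of document numbers. The sketch below uses the numbers and titles from the slides; the function name `search_and` is illustrative.

```python
# Postings copied from the two uniterm cards on the slides.
cards = {
    "state": {31, 102, 53, 24, 75, 96, 107, 68, 49, 70, 34, 95, 117, 59, 115, 147, 109},
    "police": {11, 102, 23, 85, 96, 87, 68, 49, 60, 91, 115, 107, 79},
}

# Titles shown on the slide for three of the matching documents.
titles = {
    102: "Arrest statistics of the Arizona State Police",
    107: "A short history of the Wisconsin State Police",
    115: "The modern police state",
}

def search_and(cards, *terms):
    """Return the document numbers that appear on every requested card."""
    postings = [cards[term] for term in terms]
    return sorted(set.intersection(*postings))

hits = search_and(cards, "state", "police")
for doc_no in hits:
    print(doc_no, titles.get(doc_no, "(title not shown on the slide)"))
```

Document 115 is retrieved even though it concerns police states rather than state police. The cards record only that both words occur, not how they relate, which is the false coordination problem that vocabulary control is meant to avoid.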

  16. Edge-notched cards
      One card per bibliographic item
      Edge labels: pet-care, bears, pterodactyls
      Card: Whirdeaux, Ima. Caring for your pet pterodactyl / by Ima Whirdeaux. Call no. Q54321 .W45
      Card: Turner, Paige. Caring for your pet grizzly / by Paige Turner. Call no. Q12345 .T8

  17. Pyramid coding for edge-notched cards
      Coding the year 1947*
      [Figure: a 20-dot layout with the digits 0-9 printed twice, compared with a 10-dot pyramid layout of the digits 0-9]
      *They hadn't heard of the Y2K problem yet.

  18. Optical coincidence cards
      Pre-printed cards with numbers for entire database
      [Figure: card for the term "fleas" with a grid of positions numbered 0-99]

  19. Key Word in Context (KWIC) Index
      Doc 15 title: "A comparison of OCLC and WLN hit rates for monographs and an analysis of the types of records retrieved"
      (Two words in the title are labeled "Stop word" on the slide.)
                         CONTEXT | KEY WORDS                      | POINTER
       ttems of remote users: an | analysis of the types of       | 15
       hit rates for monograph/A | comparison of OCLC and WLN     | 15
      comparison of OCLC and WLN | hit rates for monographs and / | 15
      OCLC and WLN hit rates for | monographs and an analysi/     | 15
      onographs/ A comparison of | OCLC and WLN hit rates for     | 15
      arison of OCLC and WLN hit | rates for monographs and /     | 15
      n analysis of the types of | records retrieved. A com/      | 15
       s of the types of records | retrieved. A comparison /      | 15
      phs and an analysis of the | types of records retrieve/     | 15
        A comparison of OCLC and | WLN hit rates for monogra/     | 15
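
A KWIC index of this kind can be sketched in a few lines. The stop list, the 26-character column width, and the function name `kwic` are assumptions for illustration. Unlike the slide, this sketch does not wrap the end of the title around into the context column.

```python
STOP_WORDS = {"a", "an", "and", "for", "of", "the"}  # assumed stop list

def kwic(title, doc_id, width=26):
    """Return (key, left context, key-word text, pointer) rows sorted by key."""
    words = title.split()
    rows = []
    for i, word in enumerate(words):
        key = word.strip('.,:;"').lower()
        if key in STOP_WORDS:
            continue  # stop words never become access points
        left = " ".join(words[:i])[-width:]   # text before the key word
        right = " ".join(words[i:])[:width]   # key word and what follows
        rows.append((key, left, right, doc_id))
    return sorted(rows)

TITLE = ("A comparison of OCLC and WLN hit rates for monographs "
         "and an analysis of the types of records retrieved")

for key, left, right, doc_id in kwic(TITLE, 15):
    print(f"{left:>26}  {right:<26}  {doc_id}")
```

Each non-stop word in the title yields one row, so this single title produces ten access points, all pointing back to document 15.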

  20. Key Word Out of Context (KWOC) Index
      aardvark 101
      baggage 123
      banyan 128, 159, 179
      coconut 955, 654
      driving 196, 488, 788
      elementary 455, 785
      elephant 128, 465, 783
      garage 678, 398
      hardware 849, 483, 399
      meter 768
      nadir 877
      noxious 112
      opium 289
      opus 985, 159, 849
      people 629, 458
      quark 137, 492
      radar 968, 295
      radio 430, 206, 749
      stereo 294, 837, 873
      television 745, 727, 883
      ultraviolet 958, 774
      zebra 276
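
A KWOC index is an alphabetical list of key words, each followed by document pointers. As a rough sketch, one can be built from the three titles used on the uniterm-card slides; the stop list and the function name `kwoc` are assumptions.

```python
from collections import defaultdict

STOP_WORDS = {"a", "of", "the"}  # assumed stop list

# Titles borrowed from the uniterm-card slides.
titles = {
    102: "Arrest statistics of the Arizona State Police",
    107: "A short history of the Wisconsin State Police",
    115: "The modern police state",
}

def kwoc(titles):
    """Map each non-stop word to the sorted documents whose title contains it."""
    index = defaultdict(set)
    for doc_no, title in titles.items():
        for word in title.lower().split():
            if word not in STOP_WORDS:
                index[word].add(doc_no)
    # Alphabetical order by key word, as in a printed KWOC index.
    return {word: sorted(docs) for word, docs in sorted(index.items())}

for word, docs in kwoc(titles).items():
    print(word, ", ".join(map(str, docs)))
```

Unlike KWIC, the output carries no surrounding context, so the user has to follow the pointer to see how a word is used.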

  21. Vector space model (VSM)
      Each document represented by a vector
      [Figure: axes "assistive", "technology", "libraries"; vector for document entitled "Assistive technology for libraries"]

  22. Vector space model matching
      Similarity between query and document vectors
      [Figure: axes "assistive", "technology", "libraries"; vector for document 1, vector for document 2, vector for query]
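
The slide does not say which similarity measure is used. Cosine similarity is a common choice in vector space retrieval, and the sketch below assumes it. The term weights for the two documents and the query are invented for illustration.

```python
import math

TERMS = ("assistive", "technology", "libraries")  # one axis per term

def cosine(u, v):
    """Cosine of the angle between two term vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_u * norm_v)

# Hypothetical term weights, in the order given by TERMS.
doc1 = (1.0, 1.0, 1.0)   # e.g. "Assistive technology for libraries"
doc2 = (0.0, 1.0, 1.0)   # a document about technology in libraries only
query = (1.0, 1.0, 0.0)  # a query about assistive technology

ranked = sorted({"doc1": doc1, "doc2": doc2}.items(),
                key=lambda item: cosine(query, item[1]), reverse=True)
for name, vec in ranked:
    print(name, round(cosine(query, vec), 3))
```

Because the result is a graded score rather than a yes/no match, documents can be ranked by closeness to the query instead of being split into matching and non-matching sets.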

  23. VSM term weighting
      Assign high weights to terms that appear frequently in the document but infrequently in the database
      Term          Freq. w/in document   No. of documents with term
      conclusion    low                   high
      information   high                  high
      blind         high                  low
      Query: "I'm looking for articles about assistive technology for the blind."
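
One widely used scheme that follows this rule is tf-idf weighting. The slides do not name a specific formula, so the version below is an assumption, and the counts and database size are invented to mirror the low/high pattern in the table.

```python
import math

def tf_idf(term_freq, doc_freq, n_docs):
    """Frequency in the document times log(database size / documents with term)."""
    return term_freq * math.log(n_docs / doc_freq)

N_DOCS = 1000  # hypothetical database size

# (frequency within document, number of documents with term): invented figures
stats = {
    "conclusion":  (1, 900),    # low / high
    "information": (10, 900),   # high / high
    "blind":       (10, 20),    # high / low
}

weights = {term: tf_idf(tf, df, N_DOCS) for term, (tf, df) in stats.items()}
for term, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:<12} {weight:.2f}")
```

With these figures "blind" receives by far the largest weight, because it is frequent in the document yet rare in the database.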

  24. VSM refinements
      Adding semantic and syntactical parsing.
      • Bill is going to the store to make a purchase.
      • Bill is going to purchase the store.
      • Bill is going to store his purchase.

  25. Concept indexing

  26. Concept indexing
      • Rather than pulling terms from documents, assign a concept identifier (e.g. France—History) to documents dealing with the history of France
      • Requires intellectual effort
      • Takes more time than key word indexing, so less economical
      • Avoids problems of false coordination and synonymy through use of vocabulary control

  27. Vocabulary control (1)
      One indexing term or phrase to represent a concept
      • Unidentified flying objects, not flying saucers
      • Point user to correct term with "use" reference
      • Reduces number of searches needed to find items about a particular topic

  28. Vocabulary control (2)
      One form of a word to represent the concept
      • Dictionaries, not dictionary

  29. Vocabulary control (3)
      One usage of a homographic term
      • Fault (geologic), not fault (responsibility for error)
      • Usage identified through scope note
      • Consistency among indexers as well as one indexer over time
      • Helps user to avoid false drops

  30. Vocabulary control (4)
      Syndetic structure
      • Broader terms
      • Narrower terms
      • Related terms (see also)
      • User can negotiate structure to find most appropriate term, as well as identify additional related terms of potential use in finding relevant documents
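
The "use" references and syndetic structure described on the vocabulary control slides can be modeled as two lookup tables. This is only a sketch: the preferred terms come from the slides, but the broader, narrower and related terms are invented and do not come from any published thesaurus.

```python
# "Use" references: non-preferred entry term -> preferred indexing term.
USE = {
    "flying saucers": "Unidentified flying objects",
    "dictionary": "Dictionaries",
}

# Syndetic structure for a preferred term. These relationships are invented
# for illustration and do not come from any published thesaurus.
THESAURUS = {
    "Unidentified flying objects": {
        "BT": ["Aerial phenomena"],        # broader terms (hypothetical)
        "NT": ["UFO sightings"],           # narrower terms (hypothetical)
        "RT": ["Extraterrestrial life"],   # related terms / see also (hypothetical)
    },
}

def preferred_term(entry):
    """Follow a 'use' reference if one exists; otherwise return the entry unchanged."""
    return USE.get(entry.lower(), entry)

def related(term):
    """Return the broader, narrower and related terms recorded for a preferred term."""
    return THESAURUS.get(term, {"BT": [], "NT": [], "RT": []})

term = preferred_term("Flying saucers")
print(term)
print(related(term))
```

A searcher who enters "flying saucers" is redirected to the single authorized heading, and can then move up, down or sideways through the structure to find a better-fitting term.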

  31. Social network indexing
      • Tags
      • Tag clouds
      • User-created tags providing access to library resources

  32. flickr http://www.flickr.com/

  33. Tags

  34. Tags architecture Bohemian South Country Czech Republic Europe European historical medieval old Old Town Other Keywords River Snow town Vltava Tags

  35. Tags

  36. Tags

  37. Tags (177,583 photos)

  38. Tags

  39. Tag clouds

  40. Geotagging

  41. Librarian tagging

  42. Library using flickr

  43. Peace Palace Library (PPL)

  44. Social bookmarking: http://www.delicious.com

  45. http://www.delicious.com/mauicclibrary

  46. http://www.delicious.com/mauicclibrary
      Tag: technology
      • The economic case for open access in academic publishing
      • Portable software for USB drives
      • CU Researcher Finds 10,000-Year-Old Hunting Weapon in Melting Ice Patch

  47. University of Pennsylvania http://www.library.upenn.edu/

  48. PennTags

  49. Item list with PennTags

  50. Adding a PennTag Add to PennTags
