
Text information storage and retrieval and the CDS/ISIS program

Text information storage and retrieval and the CDS/ISIS program. Paul NIEUWENHUYSEN, pnieuwen@vub.ac.be, University Library, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel, Belgium.

Presentation Transcript


  1. *** Text information storage and retrieval and the CDS/ISIS program Paul NIEUWENHUYSEN pnieuwen@vub.ac.be University Library, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel, Belgium

  2. *** What is a database? • A database is a collection of similar data records stored in a common file (or collection of files).

  3. *** Software type = information retrieval software • Software for information storage and retrieval (ISR software) • Text(-oriented) database management systems (Text-DBMS) • Text information management systems (TIMS) • Document retrieval systems • Document management systems

  4. *** Information retrieval: via a database to the user [Diagram labels: Information content, Linear file, Inverted file, Database, Search engine, Search interface, User]

  5. *** Information retrieval: the basic processes in search systems [Diagram labels: Information problem, Text documents, Representation, Representation, Query, Indexed documents, Comparison, Retrieved documents, Evaluation and feedback]

  6. *** Information retrieval systems: many components make up a system • Any retrieval system is built up of many more or less independent components. • These components can be modified to increase the quality of the results more or less independently.

  7. *** Information retrieval systems: important components • the information content • system to describe formal aspects of information items • system to describe the subjects of information items • concrete descriptions of information items = application of the used information description systems • information storage and retrieval computer program(s) • computer system used for retrieval • type of medium or information carrier used for distribution

  8. *** Information retrieval systems: the information content • The information content is the information that is created or gathered by the producer. • The information content is independent of software and of distribution media. • The information content is input into the retrieval system using • a system (rules) to describe the formal aspects • a system (rules) to describe the contents (classification, thesaurus,...)

  9. *** Information retrieval systems: media used for distribution • Hard copy (for information retrieval systems only in the broad sense) • Print • Microfiche • For computers (for information retrieval systems stricto sensu) • Magnetic tape • Floppy disk; optical disk (CD-ROM, CD-i, Photo-CD,...) • Online

  10. *** Information retrieval systems: the computer program The information retrieval program consists of several modules, including: • The module that allows the creation of the inverted file(s) = index file(s) = dictionary file(s). • The search engine provides the search features and power that allow the inverted file(s) to be searched. • The interface between the system and the user determines how they (can) interact to search the database (using menus and/or icons and/or templates and/or commands).
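The first two modules named on this slide can be illustrated with a minimal Python sketch. It assumes records are plain strings keyed by record number and that words are separated by whitespace; the function names `build_inverted_file` and `search_all` are hypothetical and are not taken from any real retrieval program.

```python
from collections import defaultdict

def build_inverted_file(records):
    """Indexing module: map each word to the set of record numbers containing it."""
    inverted = defaultdict(set)
    for record_number, text in records.items():
        for word in text.lower().split():
            inverted[word].add(record_number)
    return inverted

def search_all(inverted, *terms):
    """Search engine: return the records that contain every term (Boolean AND)."""
    postings = [inverted.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

# Illustrative records only.
records = {
    1: "text retrieval software",
    2: "database management software",
    3: "text database systems",
}
inverted_file = build_inverted_file(records)
print(search_all(inverted_file, "text", "database"))  # [3]
```

The search never scans the record texts themselves; it only intersects the sets stored in the inverted file, which is why searching via an index can be fast. The third module, the user interface, is left out of this sketch.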

  11. *** What determines the results of a search in a retrieval system? • the information retrieval system (= contents + system) • the user of the retrieval system and the search strategy applied to the system Together these determine the result of a search.

  12. *** Characteristics / definition of structured text-information • The text information is structured (files, records, fields, sub-fields, links/relations among records,...). • The length of records and fields can be “long”. • Some fields are multi-valued, i.e. they occur more than once.

  13. *** Layered structure of a database • Database • File • Records • Fields • Characters + in many systems: relations / links between records

  14. *** Structure of a bibliographic file Record No. 1: • Title • Author 1: name + first name • Author 2: ... • Source • Descriptor 1 • Descriptor 2 • ... Record No. 2: ... [Diagram annotations: Sub-fields (name + first name), Repeated fields (Author, Descriptor)]
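A record with repeated fields and sub-fields can be modelled, as a rough sketch, with ordinary Python dictionaries and lists. The field names, the sub-field names and all values below are illustrative assumptions, and the helpers `occurrences` and `subfield_values` are hypothetical.

```python
# Illustrative record: every field holds a list of occurrences (repeatable),
# and an occurrence may be split into named sub-fields.
record = {
    "title": ["An example title"],
    "author": [
        {"name": "Author", "first_name": "One"},
        {"name": "Writer", "first_name": "Two"},
    ],
    "source": ["An example source"],
    "descriptor": ["descriptor 1", "descriptor 2"],
}

def occurrences(rec, field):
    """Return all occurrences of a field; an absent field has none."""
    return rec.get(field, [])

def subfield_values(rec, field, subfield):
    """Collect one sub-field from every occurrence that has it."""
    return [
        occ[subfield]
        for occ in occurrences(rec, field)
        if isinstance(occ, dict) and subfield in occ
    ]

print(subfield_values(record, "author", "name"))  # ['Author', 'Writer']
print(len(occurrences(record, "descriptor")))     # 2
```

Storing every field as a list, even when it usually occurs once, keeps the handling of single and repeated fields uniform.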

  15. *** Thesaurus: description • Thesaurus = • system to control a vocabulary + • the contents of this vocabulary • Thesaurus program = program to create, manage, modify and/or search a thesaurus using a computer

  16. *** Thesaurus relations [Diagram centred on a term] • BT (= Broader Term): term(s) with broader meaning • NT (= Narrower Term): term(s) with narrower meaning • RT (= Related Term): other term(s) • UF (= Use For): synonym(s)

  17. *** Thesaurus applications • To find/choose index terms to add these to items, when terms are taken from a controlled vocabulary • To find more and/or better terms to search a database (to increase recall and precision) • To find more and/or better terms during writing • To understand the meaning of a term, by inspecting • the scope note of the term and/or • the relations with other terms
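The second application, finding more search terms, can be sketched in Python as follows. The thesaurus is assumed to be a dictionary keyed by term, with the relation codes BT, NT, RT and UF as sub-keys; the entries and the function `expand_query` are invented for illustration only.

```python
# Hypothetical thesaurus fragment; each term maps relation codes to lists of terms.
thesaurus = {
    "database": {
        "BT": ["information system"],
        "NT": ["text database", "bibliographic database"],
        "RT": ["index"],
        "UF": ["data base"],
    },
    "text database": {"BT": ["database"]},
}

def expand_query(term, relations=("UF", "NT")):
    """Return the term plus the terms reached through the chosen relations."""
    entry = thesaurus.get(term, {})
    expanded = [term]
    for relation in relations:
        for related in entry.get(relation, []):
            if related not in expanded:
                expanded.append(related)
    return expanded

print(expand_query("database"))
# ['database', 'data base', 'text database', 'bibliographic database']
print(expand_query("database", relations=("BT",)))
# ['database', 'information system']
```

Adding synonyms and narrower terms tends to raise recall, whereas moving to a broader term widens the search; which relations to follow is a choice left to the searcher in this sketch.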

  18. *** Database systems: why study this subject briefly? • To achieve a better understanding of the inner workings of the external information retrieval systems that you use, so that you can exploit these more efficiently • To be able to evaluate the quality of database systems you are confronted with, so that you can • make better choices among available systems, • offer constructive suggestions to the manager, • ...

  19. **- Database systems: why study this subject in detail? To acquire the knowledge and skills to create / set up / manage your own local database system on a computer

  20. *** Database systems: definition A database (management) system is a program or set of programs, providing a means by which a user can easily store and retrieve data in the form of “databases”.

  21. **- Information retrieval software: related terms • Software for information storage and retrieval (ISR software) • Text(-oriented) database management systems (Text-DBMS) • Text information management systems (TIMS) • Document retrieval systems • Document management systems

  22. **- Information retrieval software: applications (Part 1) • Documentation centres: Documents • Archives: Archived documents • Libraries: Books / Documents • Museums: Objects / Books / ... • Medical files: Patients' histories • Marketing departments: Clients / Potential clients • Schools: Courses / Teachers • Bibliographic databases: Publications / ...

  23. **- Information retrieval software: applications (Part 2) • Meeting calendars: Meetings = conferences • Product information: Product descriptions • Laboratories: Recipes • Personal documentation: Documents • Patent office: Patents • Co-operating information networks: Documents / Persons / Institutes / Events / ... • ...

  24. **- Cataloguing: hard copy versus computer-based • Hard copy • “Input”, i.e. cataloguing, on cards directly determines the “output”, i.e. the format of the data on the card as presented to the user • Summarized: INPUT = OUTPUT • Computer-based • Input in the database in fields allows later output in various formats for presentation • Summarized: 1. INPUT, 2. various OUTPUTs

  25. *** Text-information management systems: characteristics and definition The information in the database is text oriented. Therefore, several features are required: • ability to store relatively long blocks of text • ability to retrieve items in which specific words or terms occur anywhere

  26. **- Text-information management: from free-form to structure • From: free-form text information without structure • To: text database with information structured in files, records, fields, sub-fields, with links/relations among records,... (Ideally, each field is repeatable = can be multi-valued = can occur more than once in each record.)

  27. *** Text-information management: types of software (Software type: features) • Word processing software: Must be learnt anyway. Slow sequential searching. • Free-form or structured text information database software: Additional software to be purchased and learnt. Fast searching via index(es).

  28. **- Advantages of structured text-retrieval versus X-base systems Feature (Text-retrieval / X-base systems): • Many long fields, forming long records (Yes / No) • Repeatable fields (Yes / No) • Subfields (Yes / No) • Variable field lengths (Yes / No) • Fast searching any word in all fields (Yes / No) • Thesaurus to help searching (Yes / No)

  29. *** Hierarchy in the use of a database • Database structure • Input / Editing • Searching / Output

  30. *** Functions of database management software • Input / edit using keyboard or batch input • Indexing of the database(s) • Browse / Search / Select / Retrieve data from database • Output (Sort / Display / Print to file / Print to paper) + • Export / Import

  31. *** !? Question !? Task !? Problem !? What advantages does a document management system on a computer offer?

  32. *** Advantages of a document system on computer, for the user(s) • Access to information is easier. • Access to information is faster. • Online access is possible even when the centre is closed. • Online access is possible from a distance. • Data on loan status can be integrated in the search module. • More elements of the records can serve as search terms. • Combinations of search terms can be used. • Results / selections can be stored as computer files.

  33. **- The CDS/ISIS text database management program • Software to create and manage local, in-house databases with primarily structured text as contents (NOT numbers, graphics, sound,...) • Versions available for • Mainframes (IBM) • Minicomputers (Digital VAX) • Microcomputers (DOS)

  34. *-- Micro-CDS/ISIS: original main menu on the display

  35. *-- CDS/ISIS database definition services: display menu

  36. *-- CDS/ISIS database definition table: display of an example

  37. *-- CDS/ISIS manual data entry, editing / input services: display menu

  38. **- Batch input / Import • Is batch input possible? • Is a format conversion program included or available? • ...

  39. **- Activities related to indexing (Activity: who does it?; concrete action) • Intellectual, human indexing: Database producer / Thesaurus producer; attribute subject terms to records • Develop an automatic indexing method: Database producer / Software features; making an index method file • Automatic indexing: Computer with program; making inverted file(s)

  40. **- Indexes in books and databases: a comparison Book (printed): • Index_term_1: page x1, y1, z1,... • Index_term_2: page x2, y2, z2,... • ... Database (invisible): • Index_term_1: record nr. x1 / field type nr. x1 / field occurrence x1 / position x1; record nr. y1 / field type nr. y1 / field occurrence y1 / position y1; ... • Index_term_2: record nr. x2 / field type nr. x2 / field occurrence x2 / position x2; record nr. y2 / field type nr. y2 / field occurrence y2 / position y2; ... • ...
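The database side of this comparison can be made concrete with a small Python sketch in which every index entry records the record number, field type number, field occurrence and word position. The field type numbers 24 and 69, the sample texts and the names `Posting` and `index_records` are assumptions made for illustration; this is not the storage format of any particular program.

```python
from collections import defaultdict
from typing import NamedTuple

class Posting(NamedTuple):
    record: int      # record number
    field: int       # field type number (tag)
    occurrence: int  # which occurrence of a repeatable field
    position: int    # word position within that occurrence

def index_records(records):
    """Create one posting for every word occurrence in every field."""
    index = defaultdict(list)
    for record_number, fields in records.items():
        for tag, occurrences in fields.items():
            for occ_number, text in enumerate(occurrences, start=1):
                for position, word in enumerate(text.lower().split(), start=1):
                    index[word].append(Posting(record_number, tag, occ_number, position))
    return index

# Hypothetical records: field type number -> list of occurrences.
records = {
    1: {24: ["Text retrieval"], 69: ["retrieval", "text databases"]},
    2: {24: ["Database retrieval systems"]},
}
index = index_records(records)
for posting in index["retrieval"]:
    print(posting)
# Posting(record=1, field=24, occurrence=1, position=2)
# Posting(record=1, field=69, occurrence=1, position=1)
# Posting(record=2, field=24, occurrence=1, position=2)
```

In this sketch every word occurrence yields its own posting of four numbers, so the number of index entries grows with the total number of words in the database rather than with the number of records.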

  41. **- Index in a text retrieval system (such as CDS/ISIS) Terminology: Index = Inverted file = Dictionary [Diagram labels: database, dictionary on display, database, complete inverted file]

  42. **- Methods of inverted file creation → Word indexing (+) Simple / automatic / no indication required (-) Loss of word context (+) A field structure is not required → Phrase indexing (-) Indication of phrases during input is required (+) Richer than separate words (+) A field structure is not required → Field indexing (+) Simple / automatic / no indication required (+) Context is better preserved (-) A field structure is required
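The three methods differ mainly in which index terms they extract from the same input. The Python sketch below shows one possible reading of each; marking phrases with angle brackets is an assumed convention for this example, and the function names are hypothetical.

```python
import re

def word_index_terms(value):
    """Word indexing: every word becomes a separate index term."""
    return re.findall(r"\w+", value.lower())

def field_index_terms(value):
    """Field indexing: the whole field content becomes one index term."""
    return [value.strip().lower()]

def phrase_index_terms(value):
    """Phrase indexing: only phrases marked during input become index terms.

    Assumption: phrases are marked as <phrase> when the text is entered.
    """
    return [phrase.strip().lower() for phrase in re.findall(r"<([^<>]+)>", value)]

plain = "Inverted file creation"
marked = "Methods of <inverted file> creation for <text retrieval>"

print(word_index_terms(plain))     # ['inverted', 'file', 'creation']
print(field_index_terms(plain))    # ['inverted file creation']
print(phrase_index_terms(marked))  # ['inverted file', 'text retrieval']
```

The word index loses the fact that "inverted" and "file" stood next to each other, the field index keeps the context but needs a field to delimit it, and the phrase index depends on someone marking the phrases at input time.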

  43. *-- CDS/ISIS inverted file services: display menu

  44. **- Automatic indexing (file inversion) • Word indexing? with proximity indexing? • Field indexing? • Sub-field indexing? • Phrase indexing? → Maximum length of index entry? → List of stopwords available? → Immediately after input or in batch? (Slow down...?) → Indexing speed? → Adding prefixes/tags possible? → Modification of indexing possible? Possible? Obligatory?

  45. **- !? Question !? Task !? Problem !? Why can the index of a database be so large in comparison with the size of the database?

  46. *-- CDS/ISIS information retrieval services: display menu

  47. *-- CDS/ISIS information retrieval: example of a dictionary on the display

  48. **- Output from a database to various “devices” • to video display • to printer • to computer file (“printing” to a file)

  49. *-- CDS/ISIS output (sorting and printing) services: display menu

  50. **- Formatting of data within each record in output • Independent of output device: • Determine the sequence of the fields in each record. • Omit specific fields from each record. • Add field names or tags to the fields in each record. • Indicate the search term(s) in each record. • Dependent on output device: • Specify character formats in each (sub)field: typeface + size + bold/italic/underline
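The device-independent formatting steps can be sketched in Python as a single function that is given the field sequence, optional labels and an optional search term to mark. The record layout (field name mapped to a list of occurrences), the labels and the asterisk marking are assumptions for this example, and `format_record` is a hypothetical name.

```python
def format_record(record, field_order, labels=None, highlight=None):
    """Format one record as text.

    field_order : fields to print, in sequence; fields not listed are omitted.
    labels      : optional mapping of field name -> label printed before the value.
    highlight   : optional search term to mark with asterisks.
    """
    lines = []
    for field in field_order:
        for value in record.get(field, []):
            if highlight:
                value = value.replace(highlight, f"*{highlight}*")
            prefix = f"{labels.get(field, field)}: " if labels else ""
            lines.append(prefix + value)
    return "\n".join(lines)

# Illustrative record; "note" is never printed because no format lists it.
record = {
    "title": ["Text retrieval basics"],
    "author": ["Author One", "Author Two"],
    "note": ["internal remark"],
}

print(format_record(record, ["author", "title"]))
print(format_record(record, ["title"], labels={"title": "TI"}, highlight="retrieval"))
```

The same stored record yields different presentations depending only on the arguments, which is the "one input, various outputs" idea of computer-based cataloguing; typeface, size and bold/italic/underline would be handled separately for each output device.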
