Information Access I An Introduction to Information Retrieval

Information Access I: An Introduction to Information Retrieval
GSLT, Göteborg, September 2003
Barbara Gawronska, Högskolan i Skövde

Topics
1st intensive week:
• Introduction
• Knowledge representations for IA
• Text categorization, indexing (theory + labs)
• Text summarization
• User aspects
2nd intensive week:
• Interactivity
• Multilingual systems and resources
• Evaluation

Schedule and content
Thursday 11/9
8-10 BG: An Introduction to Information Retrieval
Central notions:
• Information/knowledge/data/metadata
• Information Retrieval vs. Data Retrieval
• Information Extraction, abstracting, summarization
A survey of the history of Information Retrieval (Standard IR-models)

Schedule and content 2
10-12 BG: Representation of information and identification of significant text features (Standard IR-models)
Different types of information and knowledge representation:
• classical knowledge representation methods (top-down and bottom-up hierarchical classifications, thesauri)
• weighting techniques and co-occurrence-based techniques
The notion of Retrieval Status Value (rsv) and methods for rsv-evaluation.

Schedule and content 3
15-17 HD: User aspects in information retrieval and text categorization
• presentation of search results using KWIC (key-word-in-context)
• marking up words in documents
• automatic dynamic spell checking of the search query
• term expansion and synonym search
• categorization and clustering of texts
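The KWIC presentation mentioned above can be illustrated with a minimal sketch. It is not taken from the lecture; the function name `kwic`, the `width` parameter and the bracket markup are assumptions made for this example only.

```python
import re

def kwic(text, keyword, width=30):
    """Return one context line per whole-word occurrence of keyword.

    Hypothetical helper: matching is case-insensitive, and `width` is the
    number of characters of context shown on each side of the hit.
    """
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so that all hits line up in one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

if __name__ == "__main__":
    sample = "Information retrieval differs from data retrieval in several ways."
    for line in kwic(sample, "retrieval", width=20):
        print(line)
```

Aligning the keyword in a fixed column is what lets a user scan many hits quickly; a real system would work on tokens and document identifiers rather than on raw character offsets.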

Schedule and content 4
Friday 12/9
8-10 HD: Information extraction and automatic text summarization
• information extraction techniques for text summarization (statistics, linguistics and heuristics)
• a demonstration of SweSum
• evaluation of automatic text summarization systems
10-12, 13-15 ED: Testing automatic indexing with predefined categories (labs)

Shannon and Weaver's definition of information (1949)

Data – Knowledge – Information... (Fuhr 1995, modified)

Information Access
• Information Recovery
• Information Retrieval (Information Recovery ≠ Information Retrieval; Information Retrieval includes a selection process)
• Information Refinement:
  • Multilingual Information Retrieval
  • Information Extraction
  • Abstracting, summarization

Data Retrieval vs. IR (van Rijsbergen 1979)

Data Retrieval vs. IR (2) (the German IR Research Group)
IR systems have to handle "uncertain knowledge" ("unsicheres Wissen"):
• Vague queries; reformulation is frequently required
• The problem of the user's own understanding of his or her information need
• Limitations of knowledge representations

IR – Main Issues
How to
• Represent
• Interpret
• Categorize
• Evaluate

The history of IR in brief
• Early IR: "a history of how indexes were created and searched" (Meadow et al. 2000:20)
• Index, broadly defined: a systematic scheme that places like material together. Thus, books arranged in alphabetical order can also be seen as an index.

The history of IR in brief (2)
• Pre-alphabetical systems:
  • ca. 2700 B.C., the Sumerian culture: grouping by similarity among initial ideograms (birds, bowls, trees...)
  • ca. 1500 B.C.: first phonetically based systems (syllable-based, later phoneme-based)

The history of IR in brief (3)
• First attempts to utilize letter frequency in search: Arabs, 9th century
• Categorization of documents in ancient libraries:
  • Babylonian "libraries", 12th-7th century B.C.: categories like astronomy, geography, history, mathematics, natural science, laws and... linguistics
  • The Alexandrian library; Callimachus (310-240 B.C.): 8 categories, with subject matter and genre as criteria (history, laws, medicine, philosophy, lyric poetry, oratory, tragedy); a catalogue in 120 scrolls, the Pinakes

The history of IR in brief (4)
• Callimachus in Aitia: "A big book is a big disaster..."

The history of IR in brief (5)
• 1876: Melvil Dewey, USA – Dewey Decimal Classification (DDC)
• Universal Decimal Classification (UDC, Otlet & La Fontaine)
  • 10 main classes
  • hierarchical organization
  • max. 10 branches from one node
  • today 130 000 classes

The history of IR in brief (6)
• Universal Decimal Classification, an example:
  3        Social science, laws, administration
  33       National economics
  336      Finances
  336.7    Banking
  336.76   Stock exchange
  336.763  Share market
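The example above shows why a decimal notation encodes the hierarchy: every broader class is a prefix of the narrower one. A minimal sketch of walking up that hierarchy follows. The table contains only the six classes from the slide; the function name `ancestors` and the rule of writing the point after the third digit are simplifying assumptions that suffice for this sample but do not cover full UDC notation.

```python
# The six classes from the slide; a real UDC table is far larger.
UDC_SAMPLE = {
    "3": "Social science, laws, administration",
    "33": "National economics",
    "336": "Finances",
    "336.7": "Banking",
    "336.76": "Stock exchange",
    "336.763": "Share market",
}

def ancestors(code, table=UDC_SAMPLE):
    """Return (notation, label) pairs from the main class down to `code`.

    Hypothetical helper: each digit prefix of the code names a broader class.
    """
    digits = code.replace(".", "")
    path = []
    for i in range(1, len(digits) + 1):
        prefix = digits[:i]
        # Assumption for this sample: a single point after the third digit.
        notation = prefix if len(prefix) <= 3 else prefix[:3] + "." + prefix[3:]
        if notation in table:
            path.append((notation, table[notation]))
    return path

if __name__ == "__main__":
    for notation, label in ancestors("336.763"):
        print(notation, label)
```

Because broader classes are prefixes, a search for everything under "Finances" reduces to matching all codes that start with 336.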

The history of IR in brief (7)
• Card catalogues: 18th/19th century
• How to represent the content of a document in an index? Precoordination vs. postcoordination of index terms

The history of IR in brief (8)
• 1950 onwards:
  • M. Taube: the Uniterm system
  • W.E. Batten: the optical coincidence system (in both systems, the TERM serves as the starting point)
  • C. Mooers: the Zatocoding system (cards represent DOCUMENTS and are provided with descriptors, coded as series of holes at the edge of the card)
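The difference between systems that start from the TERM and systems whose cards represent DOCUMENTS can be sketched in a few lines. This is an illustration of the two access paths only, not a model of how the historical card systems worked mechanically; the document set and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical collection: document id -> set of descriptors.
DOCS = {
    1: {"steel", "corrosion"},
    2: {"steel", "welding"},
    3: {"corrosion", "paint"},
}

def build_term_index(doc_descriptors):
    """Term entry: one 'card' per term, listing the documents that carry it."""
    index = defaultdict(set)
    for doc_id, terms in doc_descriptors.items():
        for term in terms:
            index[term].add(doc_id)
    return dict(index)

def search_term_entry(index, query_terms):
    """Postcoordinate search: combine single-term cards at query time."""
    postings = [index.get(term, set()) for term in query_terms]
    return set.intersection(*postings) if postings else set()

def search_item_entry(doc_descriptors, query_terms):
    """Item entry: scan every document card and keep those with all descriptors."""
    wanted = set(query_terms)
    return {doc_id for doc_id, terms in doc_descriptors.items() if wanted <= terms}

if __name__ == "__main__":
    index = build_term_index(DOCS)
    print(search_term_entry(index, ["steel", "corrosion"]))
    print(search_item_entry(DOCS, ["steel", "corrosion"]))
```

Both searches return the same documents; they differ in what is looked up first. Term entry touches only the cards for the query terms, while item entry has to pass over the whole collection.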

The history of IR in brief (9)
• 1950 onwards:
  • First computerized IR systems (special purpose computers):
    • The Western Reserve Rapid Searching Selector (1957, Shera, Kent & Berry)
      • Based on human-created telegraphic abstracts
      • Aimed at technical texts
      • Semantic categories like product, process, material...
    • The Minicard Selector (1959, Kessel & DeLucia)

The history of IR in brief (10)
• Late 1950s to early 1970s:
  • first IR systems on general purpose computers (Bracken & Tillit 1957)
  • computerized IR should become more than simple string matching: the idea of utilizing word frequencies and inverse document frequency (idf); Luhn, Bar-Hillel
  • first online IR services (MAC at MIT, MEDLINE, Lexis/Nexis)
  • The Internet (ARPANET: 4 hosts in 1969)
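The idea of weighting by word frequency and inverse document frequency can be made concrete with a small sketch. It assumes the common formulation idf(t) = log(N / df(t)), where N is the number of documents and df(t) the number of documents containing t; the slides do not say which variant the lecture used, and the function names here are hypothetical.

```python
import math
from collections import Counter

def idf(documents):
    """Map each term to log(N / df), where df counts documents containing it.

    `documents` is a list of token lists; this formula is one common variant.
    """
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log(n / count) for term, count in df.items()}

def tf_idf(doc, idf_weights):
    """Weight each term of one document by raw term frequency times idf."""
    tf = Counter(doc)
    return {term: tf[term] * idf_weights.get(term, 0.0) for term in tf}

if __name__ == "__main__":
    docs = [["ir", "index"], ["ir", "search"], ["ir", "index", "index"]]
    weights = idf(docs)
    print(weights)
    print(tf_idf(docs[2], weights))
```

A term that occurs in every document gets weight zero, which is the point of the measure: such a term cannot help to tell documents apart, however often it occurs.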

IR today and in the future
• From simple string matching towards NLP techniques (statistics/heuristics/morphology/semantics/pragmatics)
• Natural language in queries
• Integrating speech technology
• Multilingual retrieval and extraction
• Multimedia retrieval

A General Model of an IR system (Fuhr 1995:11)

A Basic Model of a Document Retrieval System (Fuhr 1995:11)

A document from different perspectives (Meghini et al. 91, modified)

Different aspects of a search
