Advanced Information Retrieval Topics
Hassan Bashiri
Agenda
• Information filtering
• Automatic profile learning
• Social filtering
• Training strategies
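Information Access Problems
[Figure: Information Need (stable vs. different each time) plotted against Collection (stable vs. different each time). Retrieval: need different each time, collection stable. Filtering: need stable, collection different each time. Data Mining is also placed on the grid.]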
Agenda
• Inverted indexes
• Computational complexity
An Example

Term    Doc 1  Doc 2  Doc 3  Doc 4  Doc 5  Doc 6  Doc 7  Doc 8  Postings        Inverted File
aid     0      0      0      1      0      0      0      1      4, 8            A / AI
all     0      1      0      1      0      1      0      0      2, 4, 6         A / AL
back    1      0      1      0      0      0      1      0      1, 3, 7         B / BA
brown   1      0      1      0      1      0      1      0      1, 3, 5, 7      B / BR
come    0      1      0      1      0      1      0      1      2, 4, 6, 8      C
dog     0      0      1      0      1      0      0      0      3, 5            D
fox     0      0      1      0      1      0      1      0      3, 5, 7         F
good    0      1      0      1      0      1      0      1      2, 4, 6, 8      G
jump    0      0      1      0      0      0      0      0      3               J
lazy    1      0      1      0      1      0      1      0      1, 3, 5, 7      L
men     0      1      0      1      0      0      0      1      2, 4, 8         M
now     0      1      0      0      0      1      0      1      2, 6, 8         N
over    1      0      1      0      1      0      1      1      1, 3, 5, 7, 8   O
party   0      0      0      0      0      1      0      1      6, 8            P
quick   1      0      1      0      0      0      0      0      1, 3            Q
their   1      0      0      0      1      0      1      0      1, 5, 7         T / TH
time    0      1      0      1      0      1      0      0      2, 4, 6         T / TI
The Finished Product

Term    Inverted File  Postings
aid     A / AI         4, 8
all     A / AL         2, 4, 6
back    B / BA         1, 3, 7
brown   B / BR         1, 3, 5, 7
come    C              2, 4, 6, 8
dog     D              3, 5
fox     F              3, 5, 7
good    G              2, 4, 6, 8
jump    J              3
lazy    L              1, 3, 5, 7
men     M              2, 4, 8
now     N              2, 6, 8
over    O              1, 3, 5, 7, 8
party   P              6, 8
quick   Q              1, 3
their   T / TH         1, 5, 7
time    T / TI         2, 4, 6
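As a rough illustration of how the postings column can be derived from the term-document rows, here is a minimal Python sketch. The incidence rows are copied from the example for a handful of terms; the function names `build_postings` and `intersect` are illustrative choices, not taken from the slides. The merge-style intersection is one common way to answer an AND query over sorted postings and is included only as an assumption about how the index would be used.

```python
# Term-document incidence rows copied from the example (Doc 1 .. Doc 8).
INCIDENCE = {
    "aid":   [0, 0, 0, 1, 0, 0, 0, 1],
    "all":   [0, 1, 0, 1, 0, 1, 0, 0],
    "good":  [0, 1, 0, 1, 0, 1, 0, 1],
    "men":   [0, 1, 0, 1, 0, 0, 0, 1],
    "party": [0, 0, 0, 0, 0, 1, 0, 1],
}


def build_postings(incidence):
    """Turn each 0/1 row into a sorted list of document numbers (1-based)."""
    return {
        term: [doc_id for doc_id, bit in enumerate(row, start=1) if bit]
        for term, row in incidence.items()
    }


def intersect(p1, p2):
    """Merge two sorted postings lists; cost is linear in their total length."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


postings = build_postings(INCIDENCE)
print(postings["aid"])                                  # [4, 8]
print(intersect(postings["good"], postings["party"]))   # [6, 8]
```

Because both lists are sorted, the intersection never revisits an entry, which is the main reason postings are usually kept in document order.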
Agenda
• Cross-language IR
• Controlled vocabulary
• Automatic indexing
• Free text
• Evaluation
• User interface design
What is CLIR?
Users enter a query in one language, and the search engine retrieves relevant documents in other languages.
[Figure: English Query → Retrieval System → French Documents]
Cross-Language Text Retrieval
[Figure: taxonomy tree; the labels at each level are listed below]
• Query Translation, Document Translation
• Text Translation, Vector Translation
• Controlled Vocabulary, Free Text
• Knowledge-based, Corpus-based
• Ontology-based, Dictionary-based, Thesaurus-based
• Term-aligned, Sentence-aligned, Document-aligned, Unaligned
• Parallel, Comparable
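One branch of the taxonomy, dictionary-based query translation, can be sketched in a few lines of Python. The lexicon `EN_FR` below is a tiny hypothetical English-to-French word list made up for illustration, and `translate_query` is an assumed helper name; a real system would load a full bilingual dictionary and handle ambiguity more carefully.

```python
# Hypothetical toy lexicon: each English term maps to candidate French terms.
EN_FR = {
    "dog": ["chien"],
    "time": ["temps", "fois", "heure"],
    "party": ["fête", "parti"],
}


def translate_query(query, lexicon):
    """Replace each query term with all of its dictionary translations."""
    translated = []
    for term in query.lower().split():
        # Terms missing from the lexicon are kept unchanged
        # (proper names often match across languages).
        translated.extend(lexicon.get(term, [term]))
    return translated


print(translate_query("Party time", EN_FR))
# ['fête', 'parti', 'temps', 'fois', 'heure']
```

Keeping every candidate translation inflates the query with unintended senses, which is the usual weakness of a purely dictionary-based approach.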
Agenda
• Query interface
• Selection interface
• Examination interface
• Document delivery
Retrieval System Model
[Figure: User → Query Formulation → Detection → Selection → Examination → Delivery, with Docs → Indexing → Index feeding Detection]
Query Formulation
[Figure: User → Query Formulation → Detection, with Index feeding Detection]
The Different Levels of Language Analysis
1. Phonetic or Phonological Level
2. Morphological Level
3. Syntactic Level
4. Semantic Level
5. Discourse Level
How Information Retrieval Works
Step 1: Document Processing
Step 2: Query Processing
Step 3: Query Matching
Step 4: Ranking & Sorting
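The four steps can be sketched end to end in Python. This is a deliberately simplified illustration: the stopword list, the sample documents, and the scoring rule (summing the counts of matching query terms) are all assumptions made for the example, not the method described in the slides.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "an", "is", "of", "and"}


def process(text):
    """Lowercase, tokenize on letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def search(query, docs):
    q_terms = set(process(query))                  # Step 2: query processing
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(process(text))            # Step 1: document processing
        score = sum(counts[t] for t in q_terms)    # Step 3: query matching
        if score > 0:
            scores[doc_id] = score
    # Step 4: rank by descending score, break ties by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


DOCS = {
    1: "The quick brown fox",
    2: "The lazy dog and the brown dog",
    3: "Now is the time",
}
print(search("brown dog", DOCS))  # [(2, 3), (1, 1)]
```

In practice Step 1 runs once at indexing time rather than on every query; it is done inline here only to keep the sketch short.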
What Is Different From IR?
• IR is more concerned with words and concepts.
• IIR or KBIR is more concerned with relations.
• Most IR models assume term independence.
• IIR or KBIR acknowledges the existence of relationships.
• IR is more suitable for large-scale, general retrieval.
• IIR or KBIR is more suitable for domain-specific tasks.
IIR-KBIR
• Expectation or Interaction With User
• Objects
• KB
• Relations Between the Objects
• Reasoning
• Learning
• Relation Extraction
Retrieval Models Investigated
• Fuzzy Logic
• MMM, Paice
• Vector Space
• Probabilistic, BM25
• N-Grams
• Combinational
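To make one of the listed models concrete, here is a minimal sketch of BM25 scoring in Python. The slides do not give a formula or parameters, so the details are assumptions: the parameter values `k1=1.2` and `b=0.75` are common defaults, the IDF uses the "plus one inside the logarithm" variant that keeps weights non-negative, and the function name `bm25_scores` and the sample documents are invented for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            # Assumed IDF variant: log((N - n + 0.5) / (n + 0.5) + 1).
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            # Length-normalized term-frequency saturation.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical pre-tokenized documents.
DOCS = [
    ["quick", "brown", "fox"],
    ["lazy", "brown", "dog", "dog"],
    ["good", "men", "come"],
]
scores = bm25_scores(["brown", "dog"], DOCS)
ranking = sorted(range(len(DOCS)), key=lambda i: -scores[i])
print(ranking)  # [1, 0, 2]
```

The second document ranks first because it matches both query terms and "dog" is rarer than "brown"; the third matches nothing and scores zero.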