Building an Augmented Index for Genomic Information Retrieval. Hohyon Ryu, Xiangming Mu, Kun Lu. University of Wisconsin-Milwaukee, School of Information Science, Information Intelligence & Architecture Research Lab. Problems in Genomic Information Retrieval. Introduction.
Building an Augmented Index for Genomic Information Retrieval
Hohyon Ryu, Xiangming Mu, Kun Lu
University of Wisconsin-Milwaukee, School of Information Science, Information Intelligence & Architecture Research Lab
Augmented Index using Neural Networks
Baseline path: Document Collection; TF, IDF; Baseline Index
Training: Neural Networks; Author-assigned Keyphrase
Features: Part of speech; Word Location (FO, LO, WD); TF, IDF
Output: Keywords and Keyphrases (A: information, B: science); Augmented Index
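The slide does not say how extracted keywords enter the index. The sketch below shows one plausible reading, in which terms with a high keyword score receive extra occurrences on top of their baseline counts. The function names and the `threshold` and `boost` parameters are hypothetical and do not come from the slides.

```python
from collections import Counter


def baseline_index(tokens):
    """Term frequencies for one document (the baseline representation)."""
    return Counter(tokens)


def augment_index(term_counts, keyword_scores, threshold=0.5, boost=1):
    """Return a copy of term_counts with extra occurrences for likely keywords.

    threshold and boost are illustrative assumptions, not values from the slides.
    """
    augmented = Counter(term_counts)
    for term, score in keyword_scores.items():
        if score >= threshold:
            augmented[term] += boost
    return augmented


# Toy document and made-up keyword scores, for illustration only.
tokens = "information science studies information retrieval".split()
base = baseline_index(tokens)
scores = {"information": 0.9, "science": 0.7, "studies": 0.2}
aug = augment_index(base, scores, threshold=0.5, boost=2)
```

Under this reading, a document whose important terms are recognized as keywords ends up with higher counts for them, which is one way the ranking could change between the two indexes.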
Keyword Extraction: Learning (Training)
Input: Various Features of Each Word (TF*IDF, Part of Speech, First Occurrence, Last Occurrence, Word Distribution)
Hidden Layer
Output: Author Assigned Keyword? (1 or 0)
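The slides name the per-word features but do not define them. The following sketch computes four of them under assumed definitions: TF*IDF with a natural-log IDF, first and last occurrence as positions normalized by document length, and word distribution as the span between the two. Part of speech is left out because it needs a tagger. The function name and all definitions are assumptions.

```python
import math


def word_features(word, doc_tokens, doc_freq, num_docs):
    """Compute illustrative features for one word in one document.

    The definitions below are assumptions; the slides only name the features.
    """
    positions = [i for i, tok in enumerate(doc_tokens) if tok == word]
    if not positions:
        raise ValueError(f"{word!r} does not occur in the document")
    n = len(doc_tokens)
    tf = len(positions) / n                      # relative term frequency
    idf = math.log(num_docs / doc_freq[word])    # natural-log inverse document frequency
    first = positions[0] / n                     # normalized first occurrence
    last = positions[-1] / n                     # normalized last occurrence
    return {
        "tfidf": tf * idf,
        "first_occurrence": first,
        "last_occurrence": last,
        "word_distribution": last - first,       # assumed: spread between first and last
    }


# Toy document and made-up collection statistics.
doc = "cftr is a gene cftr mutation causes cystic fibrosis cftr".split()
feats = word_features("cftr", doc, doc_freq={"cftr": 10}, num_docs=100)
```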
Keyword Extraction: Learning (Training)
Input: Various Features of Each Word (TF*IDF, Part of Speech, First Occurrence, Last Occurrence, Word Distribution)
Hidden Layer
Output: Keyword Suitability Score (0 ≤ x ≤ 1)
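A network with one hidden layer and a sigmoid output is one simple way to turn word features into a score between 0 and 1. The sketch below shows only the forward pass. The layer sizes, the sigmoid activation, and the all-zero weights are placeholders, since the slides do not give the architecture details.

```python
import math


def sigmoid(z):
    """Logistic function; maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def suitability_score(features, hidden_weights, hidden_biases,
                      output_weights, output_bias):
    """Forward pass: features -> one hidden layer -> single sigmoid output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Placeholder parameters: 4 input features, 2 hidden units, all weights zero.
features = [0.69, 0.0, 0.9, 0.9]
score = suitability_score(
    features,
    hidden_weights=[[0.0] * 4, [0.0] * 4],
    hidden_biases=[0.0, 0.0],
    output_weights=[0.0, 0.0],
    output_bias=0.0,
)
```

In training, the weights would be fitted so that the output approaches 1 for author-assigned keywords and 0 otherwise; the fitting procedure is not described in the slides and is not shown here.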
Text Retrieval Experiment
Queries: 26 TREC Queries
Search engine: Indri Search Engine (Based on Lemur and Language Modeling)
Compared: Baseline Index and Augmented Index
Measure: MAP (Mean Average Precision)
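For reference, the standard definition of Mean Average Precision can be sketched as follows: average precision is computed for each query from its ranked results, and the per-query values are then averaged. The document IDs below are made up, and the function names are illustrative; this is not the evaluation script used in the experiment.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision values at each rank where a relevant document appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Divide by all relevant documents, so unretrieved ones count against the score.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two toy queries with made-up document IDs.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["b", "a"], {"a"}),
]
map_score = mean_average_precision(runs)
```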
Results
+4.54% (df = 25, p < 0.01)
+3.12% (df = 25, p < 0.05)
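The slides do not name the significance test. With 26 queries, df = 25 is consistent with a paired t-test on per-topic scores, so the sketch below computes that statistic under that assumption. The score lists are invented for illustration and are not the experiment's data.

```python
import math
from statistics import mean, stdev


def paired_t(baseline, augmented):
    """Paired t statistic and degrees of freedom for per-topic score differences.

    Assumes the differences are not all identical (non-zero standard deviation).
    """
    if len(baseline) != len(augmented):
        raise ValueError("both runs must cover the same topics")
    diffs = [a - b for a, b in zip(augmented, baseline)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Made-up per-topic average precision values for four topics.
baseline_ap = [0.2, 0.4, 0.6, 0.8]
augmented_ap = [0.3, 0.6, 0.7, 0.9]
t_stat, df = paired_t(baseline_ap, augmented_ap)
```

The p-value would then be read from a t distribution with `df` degrees of freedom, for example from a statistics table or a statistics library.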
AP Difference by Topic (for top 50 returned documents)
Topic 176
Panels: Retrieval by the augmented Index; Baseline Retrieval
MAP=0.25, MAP=0.32
Rank positions shown in each panel: 1 2 … 6 7 8 … 14 15 16 17 18 19 20
12426234: CFTR: 94, cystic: 59, degradation: 3, fibrosis: 60, Sec61: 0
16166089: CFTR: 191, cystic: 21, degradation: 9, fibrosis: 21, Sec61: 20
Neural network processing
16166089: CFTR: 182, cystic: 6, degradation: 9, fibrosis: 6, Sec61: 14
12426234: CFTR: 106, cystic: 71, degradation: 3, fibrosis: 72, Sec61: 0
Legend: Irrelevant Document; Relevant Document
Topic 170
Panels: Retrieval by the augmented Index; Baseline Retrieval
MAP=0.87, MAP=1
Rank positions shown in each panel: 1 2 3 4 5 … 17 18 19 20 21 … 38 39 40
11799116: CFTR: 247, endoplasm: 15, reticulum: 16
11799116: CFTR: 238, endoplasm: 3, reticulum: 4
Neural network processing
15459206: CFTR: 12, endoplasm: 5, reticulum: 5
15459206: CFTR: 12, endoplasm: 5, reticulum: 5
Legend: Irrelevant Document; Relevant Document
Topic 183
Panels: Retrieval by the augmented Index; Baseline Retrieval
MAP=0.59, MAP=0.56
Rank positions shown in each panel: … 10 … 23 … 30 31 32 33 34 35 36 37 38 … 44
16106028: NM23: 14, development: 4, gene: 14, mutation: 0, tracheal: 0
10952986: NM23: 91, development: 1, gene: 1, mutation: 2, tracheal: 0
Neural network processing
14960567: NM23: 174, development: 9, gene: 9, mutation: 45, tracheal: 0
16106028: NM23: 2, development: 4, gene: 2, mutation: 0, tracheal: 0
10952986: NM23: 109, development: 1, gene: 1, mutation: 2, tracheal: 0
14960567: NM23: 159, development: 9, gene: 9, mutation: 21, tracheal: 0
Legend: Irrelevant Document; Relevant Document