Korean script searching in Korean Library OPACs Junglim Chae Yonsei University
Indexing Method • N-Gram • Morphological Analysis
N-Gram Indexing • N-gram: unigram, bigram, trigram, ..., n-gram • E.g.) 아버지가 방에 들어가신다 • Bigram segmentation yields 12 index terms (0 marks a space) • 아버, 버지, 지가, 가0, 0방, 방에, 에0, 0들, 들어, 어가, 가신, 신다 • Many index terms give many results, but with a lot of noise • High recall ratio but low precision ratio
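The bigram segmentation on this slide can be sketched in a few lines of Python. This is only an illustration, not the code of any particular OPAC: the function name `bigrams` and the `space_marker` parameter are hypothetical, and the input is assumed to use precomposed Hangul syllables (it is normalized to NFC to make that explicit).

```python
import unicodedata


def bigrams(text: str, space_marker: str = "0") -> list[str]:
    """Return the overlapping two-character index terms of `text`, in order."""
    # Assumption: one Hangul syllable = one code point, so normalize to NFC.
    text = unicodedata.normalize("NFC", text)
    # Show spaces with a visible marker, as the slide does with "0".
    marked = text.replace(" ", space_marker)
    # Slide a two-character window over the string, one position at a time.
    return [marked[i:i + 2] for i in range(len(marked) - 1)]


terms = bigrams("아버지가 방에 들어가신다")
print(len(terms))  # 13 characters including spaces give 12 bigrams
print(terms)
```

Every adjacent pair of characters becomes an index term, including pairs such as 지가 and 가신 that are not words, which is where the noise comes from.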
Morphological Analysis • Requires a morphological analysis dictionary • E.g.) 아버지가 방에 들어가신다 • Morphological analysis yields three index terms • 아버지, 방, 들어가다 • Ability to match linguistically similar terms • Faster performance with a smaller index • Accurate matches that meet user expectations • High precision ratio but low recall ratio
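As a rough illustration of dictionary-based indexing, the sketch below maps each space-separated word to its dictionary form by plain lookup. Real morphological analyzers segment stems, particles, and endings; the lookup table `TOY_DICTIONARY` and the function `morph_index` are hypothetical stand-ins that cover only the example sentence plus one extra inflected form.

```python
# Hypothetical toy dictionary: inflected surface form -> dictionary (base) form.
TOY_DICTIONARY = {
    "아버지가": "아버지",
    "아버지는": "아버지",
    "방에": "방",
    "들어가신다": "들어가다",
}


def morph_index(text: str, dictionary: dict[str, str]) -> list[str]:
    """Return the base forms of the words in `text` that the dictionary knows."""
    terms: list[str] = []
    for word in text.split():
        base = dictionary.get(word)
        # Words missing from the dictionary are not indexed at all,
        # which is one reason recall can suffer with this method.
        if base is not None and base not in terms:
            terms.append(base)
    return terms


print(morph_index("아버지가 방에 들어가신다", TOY_DICTIONARY))
print(morph_index("아버지는", TOY_DICTIONARY))
```

Both 아버지가 and 아버지는 reduce to the single index term 아버지, which is the sense in which linguistically similar terms match.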
A Case Study: Yonsei University Library • Library System: Maestro-Y • Search Engine: K2 by Verity • Indexing Method • N-Gram (bigram) + Morphological Analysis • Indexing Rules • Rule 1: Divide the string at spaces • Rule 2: Extract index terms from each segment using the bigram indexing method • Rule 3: Add the whole string with the spaces removed • Rule 4: Add words found in the Korean morphological analysis dictionary
A Case Study: Yonsei University Library • E.g.) ‘국어문법의 이해’ • 국어문법의 / 이해 (Rule 1) • 국어, 어문, 문법, 법의, 이해 (Rule 2) • 국어문법의이해 (Rule 3) • 국어문법 (Rule 4) • Index: 국어, 어문, 문법, 법의, 이해, 국어문법, 국어문법의이해
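The four indexing rules can be combined into one small function, shown here as a sketch rather than as the actual behavior of K2 or Maestro-Y. The function name `build_index` and the one-word dictionary are hypothetical, and Rule 4 is assumed to mean "add any dictionary word that occurs inside a segment", which the slides do not spell out.

```python
def build_index(title: str, dictionary: set[str]) -> list[str]:
    """Apply the four indexing rules to `title` and return the index terms."""
    index: list[str] = []

    def add(term: str) -> None:
        # Keep first-seen order and avoid duplicate terms.
        if term not in index:
            index.append(term)

    # Rule 1: divide the string at spaces.
    segments = title.split()
    # Rule 2: bigrams within each segment (never across a space).
    for segment in segments:
        for i in range(len(segment) - 1):
            add(segment[i:i + 2])
    # Rule 3: the whole string with the spaces removed.
    add("".join(segments))
    # Rule 4 (assumed matching rule): dictionary words found inside a segment.
    for word in sorted(dictionary):
        if any(word in segment for segment in segments):
            add(word)
    return index


index = build_index("국어문법의 이해", {"국어문법"})
print(index)
```

For the example title this produces the same seven terms as the slide, although not necessarily in the same order.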
Search Tips(1) • Keyword Search • 키워드검색, 임의검색 (keyword search, free search) • Default search option • Use at most 3 keywords • Use Boolean operators • Omit stop words
Search Tips(2) • Keyword Search • Follow the Korean word division rules • E.g.) 동해물과 백두산이(O), 동해물과백두산이(X)
Search Tips(3) • Keyword Search • Compound Nouns • Do not put spaces between the nouns • E.g.) 서울대학교(O), 서울 대학교(X)
Search Tips(4) • Browse Search • Begins-with (prefix) or truncation search • 전방일치검색, 우측절단검색 (prefix-match search, right-truncation search) • Use when you already know the first word of the title, author, or publisher • E.g.) 한글과
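A browse (begins-with) search can be sketched as a prefix lookup in a sorted list of headings. The function name `browse` and the sample headings below are hypothetical and serve only as an illustration; a real OPAC would use its own collation rules instead of Python's default code-point ordering.

```python
from bisect import bisect_left


def browse(sorted_headings: list[str], prefix: str) -> list[str]:
    """Return all headings that begin with `prefix` (list must be sorted)."""
    # Headings sharing a prefix sit next to each other in sorted order,
    # so jump to the first candidate and read until the prefix stops matching.
    start = bisect_left(sorted_headings, prefix)
    matches: list[str] = []
    for heading in sorted_headings[start:]:
        if not heading.startswith(prefix):
            break
        matches.append(heading)
    return matches


# Hypothetical title headings, sorted by code point.
headings = sorted(["한글과 컴퓨터", "한글과 세종", "한국 역사", "난중일기"])
print(browse(headings, "한글과"))
```

Only headings whose first characters are exactly the typed string are returned, so knowing the first word of the title is enough to land in the right part of the list.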
Search Tips(5) • Browse Search • Korean Classics • E.g.) 열여춘향슈절가라
Search Tips(6) • Exact Match • Precise search • 완전일치검색 (exact-match search) • Known items • E.g.) 난중일기
Search Tips(7) • Exact Match • Single character words • E.g.) ‘산’, ‘흙’, ‘C’
Search Tips(8) • Support Hangul/Hancha Searching • E.g.) 中國歷史文選/중국역사문선
Search Tips(9) • Japanese Kana • Archaic Korean • Russian • Special characters: choose scripts from the Multi-language Input Table
Search Tips(10) • Japanese Kana • 日本の歷史 / 일본の역사 / 일본노역사 • 日本デザイン論 / 일본デザイン론 / 일본데자인론
Search Tips(11) • Personal names • 윤동주 • 이광수 ; 춘원 • Shakespeare ; 셰익스피어 • Murakami, Haruki ; 村上春樹 ; 촌상춘수 ; 무라카미 하루키
Search Tips(12) • Space • Treated as AND • E.g.) 한국 역사 = 한국 AND 역사 • In some OPACs, spaces in the character fields do make a difference in retrieval
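Treating a space as AND amounts to intersecting the result sets of the individual keywords. The sketch below assumes a simple inverted index from term to record numbers; the function name `search` and the postings data are hypothetical.

```python
def search(query: str, postings: dict[str, set[int]]) -> set[int]:
    """Return the records that contain every space-separated term in `query`."""
    terms = query.split()
    if not terms:
        return set()
    # Start from the first term's records, then keep only those
    # that also appear under each remaining term (Boolean AND).
    result = set(postings.get(terms[0], set()))
    for term in terms[1:]:
        result &= postings.get(term, set())
    return result


# Hypothetical postings: index term -> record numbers.
postings = {"한국": {1, 2, 3}, "역사": {2, 3, 4}}
print(search("한국 역사", postings))  # records containing both terms
print(search("한국", postings))
```

Under this model the query 한국 역사 returns only records indexed under both 한국 and 역사, while either word alone returns a larger set.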
謝謝 Thank You 감사합니다 ありがとうございます junglim.chae@yale.edu