Faster and Smaller N-Gram LMs
Adam Pauls and Dan Klein
Presented by SUN Jun
Overview • N-gram LMs • A short review of LM implementations • Trie • Array: implicit trie • This work: a combination of multiple techniques • Implicit encoding of the query word • Variable-length encoding for compression • Speed-ups for the decoder
Back-Off LM • LM: an n-gram LM assigns a probability to each word given its history, i.e., the preceding n-1 words • Back-off: trust the highest-order model that contains the n-gram; if the full n-gram was never seen, fall back to a lower-order model, scaled by a back-off weight (see the formula below)
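A standard Katz-style statement of the back-off rule, where \hat{P} is the (discounted) probability stored with a seen n-gram and \alpha is the back-off weight attached to the history:

P(w_i \mid w_{i-n+1}^{i-1}) =
\begin{cases}
  \hat{P}(w_i \mid w_{i-n+1}^{i-1}) & \text{if } w_{i-n+1}^{i} \text{ is in the model} \\
  \alpha(w_{i-n+1}^{i-1}) \, P(w_i \mid w_{i-n+2}^{i-1}) & \text{otherwise}
\end{cases}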
Implementation of Back-off LM • File based • Trie • Reverse trie (words stored in reverse order) • Array (a): implicit trie • Array (b): implicit trie with a reverse index to the parent • A minimal trie sketch follows below
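A minimal sketch of the explicit-trie representation (class and field names are hypothetical, not taken from the paper's code): each node corresponds to an n-gram prefix and stores a probability and a back-off weight, and looking up an n-gram means walking word IDs down from the root.

import java.util.HashMap;
import java.util.Map;

// One node per n-gram prefix; children are indexed by the next word ID.
class TrieNode {
    float prob;      // log P(w | history) for the n-gram ending here
    float backoff;   // log back-off weight alpha(history)
    Map<Integer, TrieNode> children = new HashMap<>();
}

class TrieLM {
    private final TrieNode root = new TrieNode();

    // Walk the word IDs from the root; returns null if the n-gram is unseen.
    TrieNode lookup(int[] ngram) {
        TrieNode node = root;
        for (int word : ngram) {
            node = node.children.get(word);
            if (node == null) return null;
        }
        return node;
    }
}

The implicit-trie arrays below keep the same parent/child structure but replace node objects and pointers with positions in sorted arrays, which is where the memory savings come from.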
This paper • This work: a combination of multiple techniques • Implicit encoding of the query word • Variable-length encoding for compression (an illustrative sketch follows below) • Speed-ups for the decoder
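The compression idea is to encode differences between consecutive sorted keys with a variable-length code, so small deltas take few bytes. The block-based scheme the paper actually uses is more involved; the simple variable-byte coder below only illustrates the core idea (names are hypothetical):

import java.io.ByteArrayOutputStream;

class VByte {
    // Encode a non-negative delta, 7 bits per byte;
    // the high bit marks the final byte of a number.
    static void encode(long delta, ByteArrayOutputStream out) {
        while (delta >= 128) {
            out.write((int) (delta & 127)); // low 7 bits, more to come
            delta >>>= 7;
        }
        out.write((int) (delta | 128));     // final byte, stop bit set
    }

    // Decode one number starting at pos[0]; advances pos[0] past it.
    static long decode(byte[] in, int[] pos) {
        long value = 0;
        int shift = 0;
        while (true) {
            int b = in[pos[0]++] & 255;
            value |= (long) (b & 127) << shift;
            if ((b & 128) != 0) return value;
            shift += 7;
        }
    }
}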
Implicit Encoding of query word • Sorted array: each n-gram is packed into a single integer key combining the word ID and its context offset; the keys are kept sorted and looked up by binary search (see the sketch below)
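A sketch of the sorted-array variant (bit widths and names are assumptions, not the paper's exact layout): pack the context offset and word ID into one 64-bit key, keep one sorted key array per n-gram order, and binary-search on lookup.

import java.util.Arrays;

class SortedArrayLM {
    // One sorted array of packed keys per n-gram order.
    long[] keys;     // sorted; keys[i] goes with probs[i]
    float[] probs;

    // Pack (context offset, word ID) into one key;
    // assumes word IDs fit in 24 bits.
    static long pack(long contextOffset, int wordId) {
        return (contextOffset << 24) | wordId;
    }

    // Returns the offset of the n-gram in this order's array, or -1 if unseen.
    int lookup(long contextOffset, int wordId) {
        int i = Arrays.binarySearch(keys, pack(contextOffset, wordId));
        return i >= 0 ? i : -1;
    }
}

The returned offset doubles as the context offset when the next-higher order is queried, which is what makes these flat arrays behave like a trie.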
Implicit Encoding of query word • Hash table: the same packed keys are stored in an open-addressing hash table, trading some empty slots for constant-time lookup (see the sketch below)
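A sketch of the hash-table variant with linear probing (a simplification; key 0 is reserved as the empty marker and the table is assumed to never fill up):

class HashLM {
    long[] keys;    // packed (context offset, word ID) keys; 0 = empty slot
    float[] probs;

    HashLM(int capacity) {
        keys = new long[capacity];
        probs = new float[capacity];
    }

    void put(long key, float prob) {
        int i = slot(key);
        keys[i] = key;
        probs[i] = prob;
    }

    // Returns Float.NaN if the n-gram is absent.
    float get(long key) {
        int i = slot(key);
        return keys[i] == key ? probs[i] : Float.NaN;
    }

    // Linear probing from the hashed slot.
    private int slot(long key) {
        int i = (int) ((key ^ (key >>> 32)) & 0x7fffffff) % keys.length;
        while (keys[i] != 0 && keys[i] != key) i = (i + 1) % keys.length;
        return i;
    }
}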
Implicit Encoding of query word • We can exploit this redundancy by storing only the context offsets in the main array, using as many bits as needed to encode all context offsets (32 bits for Web1T). • In auxiliary arrays, one per n-gram order, we store the beginning and end of the range of the trie array in which all (w_i, c) keys are stored for each word w_i (a lookup sketch follows below).
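Putting those two pieces together, a lookup for (w, c) becomes a binary search for the context offset c restricted to word w's range (a sketch under the same assumptions as above):

import java.util.Arrays;

class ImplicitWordLM {
    // Main array stores ONLY context offsets; the word itself is implicit
    // because all entries for a given word are contiguous.
    long[] contextOffsets;   // sorted within each word's range
    float[] probs;
    // Auxiliary per-word ranges: entries for word w live in
    // contextOffsets[rangeBegin[w] .. rangeEnd[w]).
    int[] rangeBegin, rangeEnd;

    // Returns the offset of the n-gram (w, c), or -1 if unseen.
    int lookup(int wordId, long contextOffset) {
        int i = Arrays.binarySearch(contextOffsets,
                rangeBegin[wordId], rangeEnd[wordId], contextOffset);
        return i >= 0 ? i : -1;
    }
}

Dropping the word from every key is what saves the space: the main array shrinks to just the context-offset bits (32 for Web1T), while the auxiliary range arrays stay tiny because they have one entry per vocabulary word, not per n-gram.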
Speed up decoder • Repetitive queries: a decoder asks for the same n-grams over and over, so a small cache in front of the LM answers repeats cheaply (see the sketch below) • Scrolling queries: consecutive queries overlap, since the context of the next query is a suffix of the current n-gram, so the LM can hand back a context offset along with each probability and skip the redundant lookup work
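A minimal sketch of a direct-mapped query cache (a simplification of whatever cache a real decoder would use; on a hash collision the old entry is simply overwritten):

class LMCache {
    private final long[] keys;   // hashed n-gram keys
    private final float[] vals;  // cached log-probabilities

    LMCache(int size) {
        keys = new long[size];
        vals = new float[size];
    }

    // Returns a cached probability, or NaN on a miss.
    float get(long ngramKey) {
        int i = index(ngramKey);
        return keys[i] == ngramKey ? vals[i] : Float.NaN;
    }

    void put(long ngramKey, float prob) {
        int i = index(ngramKey);
        keys[i] = ngramKey;
        vals[i] = prob;
    }

    private int index(long key) {
        return (int) ((key ^ (key >>> 32)) & 0x7fffffff) % keys.length;
    }
}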