Making Watson Fast Daniel Brown HON111
Watson needs to be fast to play Jeopardy! successfully • All computations have to be done in a few seconds • Initial application speed: 1-2 hours of processing time per question
Unstructured Information Management Architecture (UIMA): framework for NLP applications; facilitates parallel processing • UIMA-AS: Asynchronous Scaleout • UIMA chosen at start for these reasons; other optimization work only began after 2 years (after QA accuracy/confidence improved)
UIMA implementation of DeepQA • Type System • Common Analysis Structure (CAS) • Annotator • CAS multiplier (CM): creates new “children” CASes • Flow Controller • CASes can be spread across multiple systems (processed in parallel) for efficiency
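A minimal Python sketch of the CAS multiplier idea, not the UIMA API (UIMA itself is a Java/C++ framework). The dict-based "CAS", the function names `cas_multiplier`, `score_annotator`, and `process`, and the word-overlap score are all assumptions made for illustration. It shows one parent CAS being split into child CASes that are then annotated in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def cas_multiplier(question_cas):
    """Split one parent CAS (a plain dict here) into one child CAS per candidate."""
    return [
        {"question": question_cas["question"], "candidate": candidate}
        for candidate in question_cas["candidates"]
    ]


def score_annotator(child_cas):
    """Toy annotator (hypothetical): score = words shared by question and candidate."""
    question_words = set(child_cas["question"].lower().split())
    candidate_words = set(child_cas["candidate"].lower().split())
    return {**child_cas, "score": len(question_words & candidate_words)}


def process(question_cas, workers=4):
    """Fan the child CASes out to a worker pool and collect the annotated results."""
    children = cas_multiplier(question_cas)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input children.
        return list(pool.map(score_annotator, children))
```

Here a thread pool stands in for the separate machines; in the deck's setting the child CASes are spread across multiple systems.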
Scaling out • Two systems: • Development (+question processing) • Meant to analyze many questions accurately • Production (+speed) • Meant to answer one question quickly
Scaling out: UIMA-AS • (UIMA-AS: Asynchronous Scaleout) • Manages the multithreading and interprocess communication needed for parallel processing • Feasibility test: simulated production system with 110 processes, 110 8-core machines • Goal: less than 3 seconds; actual: more than 3 seconds • Two sources of latency: CAS serialization, network communication • Optimizing CAS serialization resulted in a runtime of <1s
Scaling out: Deployment • 400 processes, 72 machines
How to find time bottlenecks in such a system? • Monitoring tool • Integrated timing measurements (in flow controller component)
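One way to picture timing measurements integrated into a flow controller is the sketch below. It is an illustration only: the `TimedFlow` class and its methods are hypothetical names, not part of UIMA or Watson. The controller runs each stage in order and records how long each one takes, so the slowest stage can be identified.

```python
import time
from collections import defaultdict


class TimedFlow:
    """Hypothetical flow controller that times every stage it runs."""

    def __init__(self, stages):
        self.stages = stages              # list of (name, callable) pairs
        self.totals = defaultdict(float)  # accumulated seconds per stage
        self.calls = defaultdict(int)     # number of invocations per stage

    def run(self, item):
        """Pass the item through every stage, recording elapsed time per stage."""
        for name, stage in self.stages:
            start = time.perf_counter()
            item = stage(item)
            self.totals[name] += time.perf_counter() - start
            self.calls[name] += 1
        return item

    def slowest(self):
        """Name of the stage with the largest accumulated time."""
        return max(self.totals, key=self.totals.get)
```

Because the measurement lives in the controller, individual stages need no timing code of their own.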
RAM Optimizations • Wanted to avoid disk read/write time delays, so all (production system) data was put into RAM • Some optimizations: • Reference size reduction • Java object size reduction • Java object overhead • String size • Special hash tables • Java garbage collection with large heap sizes • *Full GC between games
Indri Search Optimizations • Indri search: used to find most relevant 1-2 sentences from Watson database • Using single processor, primary search takes too long (i.e. 100s) • Supporting evidence search even longer • Solution? • Divide corpus (body of information to search) into chunks, then assign each search daemon a chunk • (specifically: 50GB corpus of 6.8 million documents; 79 chunks of 100,000 documents each; 79 Indri search daemons with 8 CPU cores each; end result: 32 passage queries could be run at once)
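The chunk-per-daemon idea can be sketched as follows. This is not Indri's API: `split_corpus`, `search_chunk`, and `search` are hypothetical helpers, the term-count score is a stand-in for real passage ranking, and threads stand in for the separate search daemons. Each chunk is searched independently, and the per-chunk top hits are merged into one global top-k list.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_corpus(docs, chunk_size):
    """Divide a list of (doc_id, text) pairs into fixed-size chunks."""
    return [docs[i:i + chunk_size] for i in range(0, len(docs), chunk_size)]


def search_chunk(chunk, query_terms, k):
    """Toy scorer: count query-term occurrences; return the chunk's top-k hits."""
    scored = []
    for doc_id, text in chunk:
        words = text.lower().split()
        score = sum(words.count(term) for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)


def search(chunks, query, k=2):
    """Search every chunk in parallel, then merge the partial results."""
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = pool.map(lambda chunk: search_chunk(chunk, terms, k), chunks)
        merged = [hit for part in partials for hit in part]
    return heapq.nlargest(k, merged)
```

Each worker only has to return its own top-k, so the merge step handles at most k hits per chunk regardless of corpus size.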
Preprocessing and Custom Content Services • Watson must first analyze the passage texts before being able to use them • Deep NLP analysis - semantic/structural parsing, etc. • Since Watson had to be self-contained, its content was known in advance, so this analysis could be done before run time (preprocessed) • Used Hadoop (distributed data processing framework) • 50 machines, 16GB/8 cores each
Preprocessing and Custom Content Services • Retrieving the preprocessed data? • Preprocessed data much larger than unprocessed corpus (~300GB total) • Built custom content server – allocated data to 14 machines, ~20GB each • Documents then were accessed from these servers
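A rough sketch of how preprocessed documents might be spread over 14 in-memory servers and looked up again. The deck does not say how data was allocated, so the hash-based placement, the `server_for` function, and the `ContentCluster` class are all assumptions for illustration; dicts stand in for the server processes.

```python
import zlib

NUM_SERVERS = 14  # machine count from the slide; the placement scheme is assumed


def server_for(doc_id, num_servers=NUM_SERVERS):
    """Map a document ID to a server index with a stable hash (CRC32)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_servers


class ContentCluster:
    """Hypothetical content service: one in-memory dict per server."""

    def __init__(self, num_servers=NUM_SERVERS):
        self.servers = [dict() for _ in range(num_servers)]

    def put(self, doc_id, analysis):
        """Store a document's preprocessed analysis on its assigned server."""
        index = server_for(doc_id, len(self.servers))
        self.servers[index][doc_id] = analysis

    def get(self, doc_id):
        """Fetch the analysis from the one server that can hold this document."""
        index = server_for(doc_id, len(self.servers))
        return self.servers[index].get(doc_id)
```

With a stable hash, any client can compute which server holds a document without a central lookup table.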
End result • Parallel processing combined with a number of other performance optimizations resulted in a final average latency of less than 3 seconds. • No one “silver bullet” solution