
Making Watson Fast

Making Watson Fast. Daniel Brown, HON111. Watson needs to be fast to play Jeopardy successfully: all computations have to be done in a few seconds. Initial application speed: 1-2 hours of processing time per question.


Presentation Transcript


  1. Making Watson Fast Daniel Brown HON111

  2. Need for Watson to be fast to play Jeopardy successfully • All computations have to be done in a few seconds • Initial application speed: 1-2 hours processing time per question

  3. Unstructured Information Management Architecture (UIMA): framework for NLP applications; facilitates parallel processing • UIMA-AS: Asynchronous Scaleout • UIMA chosen at start for these reasons; other optimization work only began after 2 years (after QA accuracy/confidence improved)

  4. UIMA implementation of DeepQA

  5. UIMA implementation of DeepQA • Type System • Common Analysis Structure (CAS) • Annotator • CAS multiplier (CM): creates new “children” CASes • Flow Controller • CASes can be spread across multiple systems (processed in parallel) for efficiency
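The roles listed on this slide can be sketched in a few lines. This is a toy illustration only, not the UIMA API: the names `Cas`, `cas_multiplier`, `length_annotator`, and `flow` are hypothetical, the sample question and candidates are made up, and a thread pool stands in for CASes being spread across multiple systems.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Cas:
    """Toy stand-in for a Common Analysis Structure: text plus annotations."""
    text: str
    annotations: dict = field(default_factory=dict)


def cas_multiplier(question: Cas, candidates: list[str]) -> list[Cas]:
    """Create one child CAS per candidate (hypothetical split rule)."""
    return [Cas(text=c, annotations={"question": question.text}) for c in candidates]


def length_annotator(cas: Cas) -> Cas:
    """Toy annotator: records one feature on the CAS it is given."""
    cas.annotations["length"] = len(cas.text)
    return cas


def flow(question: Cas, candidates: list[str], workers: int = 4) -> list[Cas]:
    """Toy flow controller: multiply, then annotate the children in parallel."""
    children = cas_multiplier(question, candidates)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the child CASes
        return list(pool.map(length_annotator, children))


results = flow(Cas("This city hosted the 1896 Olympics"), ["Athens", "Paris", "Rome"])
for cas in results:
    print(cas.text, cas.annotations)
```

Because each child CAS is independent, the annotation step needs no coordination between workers, which is what makes this kind of design easy to scale out.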

  6. Scaling out • Two systems: • Development (+question processing) • Meant to analyze many questions accurately • Production (+speed) • Meant to answer one question quickly

  7. Scaling out: UIMA-AS • (UIMA-AS: Asynchronous Scaleout) • Manages the multithreading and inter-process communication necessary for parallel processing • Feasibility test: simulated production system with 110 processes, 110 8-core machines • Goal: less than 3 seconds; actual: more than 3 seconds • Two sources of latency: CAS serialization, network communication • Optimizing CAS serialization resulted in runtime of <1s

  8. Scaling out: Deployment • 400 processes, 72 machines

  9. How to find time bottlenecks in such a system? • Monitoring tool • Integrated timing measurements (in flow controller component)
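One way to picture integrated timing measurements is a wrapper that times every component call as it passes through the flow. The sketch below is an assumption about how such measurements could be collected, not Watson's actual monitoring code; `TimingFlow` and the two sample components are hypothetical.

```python
import time
from collections import defaultdict


class TimingFlow:
    """Hypothetical flow wrapper that accumulates wall-clock time per component."""

    def __init__(self):
        self.totals = defaultdict(float)  # component name -> total seconds
        self.calls = defaultdict(int)     # component name -> number of calls

    def run(self, name, component, item):
        start = time.perf_counter()
        try:
            return component(item)
        finally:
            # Record the elapsed time even if the component raises
            self.totals[name] += time.perf_counter() - start
            self.calls[name] += 1

    def slowest(self):
        """Return the name of the component with the largest total time."""
        return max(self.totals, key=self.totals.get)


timing = TimingFlow()
tokens = timing.run("tokenize", str.split, "who wrote this line")
timing.run("slow_step", lambda _: time.sleep(0.05), tokens)
print(timing.slowest(), dict(timing.calls))
```

Keeping the measurement in the component that routes the work means individual annotators do not need their own timing code.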

  10. RAM Optimizations • Wanted to avoid disk read/write time delays, so all (production system) data was put into RAM • Some optimizations: • Reference size reduction • Java object size reduction • Java object overhead • String size • Special hash tables • Java garbage collection with large heap sizes • *Full GC between games

  11. Indri Search Optimizations • Indri search: used to find most relevant 1-2 sentences from Watson database • Using single processor, primary search takes too long (i.e. 100s) • Supporting evidence search even longer • Solution? • Divide corpus (body of information to search) into chunks, then assign each search daemon a chunk • (specifically, 50GB corpus of 6.8 million documents, 79 chunks of roughly 100,000 documents each, 79 Indri search daemons with 8 CPU cores each; end result, 32 passage queries could be run at once)
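The divide-and-search idea on this slide can be sketched as follows. This does not use Indri: the word-overlap `score` function is a toy relevance measure, the documents are made up, and threads stand in for search daemons that would run on separate machines. All function names are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_corpus(docs, chunk_size):
    """Divide the corpus into consecutive chunks of at most chunk_size documents."""
    return [docs[i:i + chunk_size] for i in range(0, len(docs), chunk_size)]


def score(query, doc):
    """Toy relevance: number of distinct query terms that appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def search_chunk(query, chunk, k):
    """What one search daemon does: return the top-k hits from its own chunk."""
    return heapq.nlargest(k, ((score(query, d), d) for d in chunk))


def search(query, chunks, k=2):
    """Query every chunk in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partial = pool.map(lambda c: search_chunk(query, c, k), chunks)
        merged = [hit for hits in partial for hit in hits]
    return heapq.nlargest(k, merged)


docs = [
    "the capital of France is Paris",
    "Paris hosted the 1900 Olympics",
    "Rome is the capital of Italy",
    "bananas are yellow",
    "the Olympics began in Greece",
]
chunks = split_corpus(docs, chunk_size=2)
top = search("capital of France", chunks, k=1)
print(top)
```

Each daemon only returns its own top-k, so the merge step handles at most k hits per chunk regardless of how large the corpus is.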

  12. Preprocessing and Custom Content Services • Watson must first analyze the passage texts before being able to use them • Deep NLP analysis - semantic/structural parsing, etc. • Since Watson had to be self-contained, this analysis could be done before run time (preprocessed) • Used Hadoop (open-source framework for distributed storage and processing) • 50 machines, 16GB/8 cores each

  13. Preprocessing and Custom Content Services • Retrieving the preprocessed data? • Preprocessed data much larger than unprocessed corpus (~300GB total) • Built custom content server – allocated data to 14 machines, ~20GB each • Documents then were accessed from these servers
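The slide does not say how documents were assigned to the 14 content servers. The sketch below assumes a simple stable hash of the document ID, which is one common way to do it; `server_for`, `put`, `get`, and the sample document ID are hypothetical, and in-memory dictionaries stand in for the servers.

```python
import zlib

NUM_SERVERS = 14  # from the slide
TOTAL_GB = 300    # approximate preprocessed size, from the slide


def server_for(doc_id: str, num_servers: int = NUM_SERVERS) -> int:
    """Map a document ID to a server index with a stable hash (assumed scheme)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_servers


# Each dict stands in for one content server's in-memory store.
servers = [{} for _ in range(NUM_SERVERS)]


def put(doc_id: str, analysis: str) -> None:
    servers[server_for(doc_id)][doc_id] = analysis


def get(doc_id: str) -> str:
    # The same hash finds the right server again, so no lookup table is needed.
    return servers[server_for(doc_id)][doc_id]


put("doc-000042", "parsed passage data")
per_server_gb = TOTAL_GB / NUM_SERVERS  # about 21.4 GB, in line with "~20GB each"
print(get("doc-000042"), round(per_server_gb, 1))
```

An even split of about 300 GB over 14 machines gives roughly 21 GB per machine, which matches the slide's figure of about 20 GB each.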

  14. End result • Parallel processing combined with a number of other performance optimizations resulted in a final average latency of less than 3 seconds. • No one “silver bullet” solution
