Text Analytics on UIMA and UIMA Semantic Search Engine • ISM209 • David Lewis • Student Project Presentation • 2006-12-05
What • Learn about UIMA • UIMA Origins and Applications • UIMA Architecture and Components • Juru Extended for XML Document Search • Demonstration
UIMA Origins and Goals • Developed by IBM Research over 4 years • Offered by IBM as open source at the end of 2005 • developerWorks: WebSphere production • alphaWorks: early adopters • SourceForge: handoff in progress • “Bridge from the unstructured world to the structured world” • “UIMA SDK supports development, discovery, composition and deployment of multi-modal analytics for the analysis of unstructured information”
UIMA Applications • WebSphere Information Integrator OmniFind Edition (search engine) • Lotus Notes search • DARPA UIMA Working Group (WWW mining) • Unstructured Information Management (UIM) Research and Instruction • CMU, Stanford, UMass Amherst • Others • SAIC, BBN, Mayo Clinic, MITRE Corp • “14 Software Vendors” (per press coverage of the open source announcement)
Architecture and Components • UIMA Framework: run-time environment • UIMA SDK: all-Java implementation of the framework with Eclipse IDE integration
Components • UIMA Framework Core • Externalized Framework Plug-ins • Common Analysis Structure (CAS) • Type System (Person, Organization, Bank, etc.) • Document Annotators, Analysis Engines • Collection Processing Engine • CAS Sources and Sinks • Resource and Configuration Manager, Logger, etc.
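To make the CAS, type system, and annotator roles concrete, here is a minimal Python sketch. It is not the UIMA API (the SDK is Java): `ToyCas`, `Annotation`, and `organization_annotator` are hypothetical names invented for illustration, and the regular expression is only a stand-in for a real entity recognizer. The point is that a CAS holds the document text plus typed annotations over character spans, and an annotator reads the text and adds annotations.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """A typed span over the document text (hypothetical, not a UIMA class)."""
    type_name: str  # a type from the type system, e.g. "Organization"
    begin: int      # start offset, inclusive
    end: int        # end offset, exclusive


@dataclass
class ToyCas:
    """Toy stand-in for a CAS: document text plus its annotations."""
    text: str
    annotations: list = field(default_factory=list)

    def add(self, type_name, begin, end):
        self.annotations.append(Annotation(type_name, begin, end))

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]

    def select(self, type_name):
        return [a for a in self.annotations if a.type_name == type_name]


def organization_annotator(cas):
    """Toy annotator: tags a capitalized word followed by 'Corp' or 'Research'."""
    for m in re.finditer(r"\b[A-Z][A-Za-z]+ (?:Corp|Research)\b", cas.text):
        cas.add("Organization", m.start(), m.end())


cas = ToyCas("UIMA was developed by IBM Research and used by MITRE Corp.")
organization_annotator(cas)
orgs = [cas.covered_text(a) for a in cas.select("Organization")]
print(orgs)
```

Because annotations store offsets rather than copies of the text, several annotators can add their own types to the same CAS without interfering with one another.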
Aggregate Analysis Engines • Analysis engines may be composed into aggregate engines • Analysis Engine Assembler • Distributed execution support
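The composition idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the UIMA assembler: the CAS is reduced to a plain dictionary, and `AggregateEngine`, `tokenizer`, and `capitalized_name_annotator` are hypothetical names. An aggregate runs its delegate engines in order over the same CAS, and since an aggregate is itself an engine, aggregates can be nested.

```python
import re


def new_cas(text):
    """Toy CAS as a dict: document text plus (type, begin, end) annotations."""
    return {"text": text, "annotations": []}


def tokenizer(cas):
    """Primitive engine: adds one Token annotation per word."""
    for m in re.finditer(r"\w+", cas["text"]):
        cas["annotations"].append(("Token", m.start(), m.end()))


def capitalized_name_annotator(cas):
    """Primitive engine that depends on Token annotations already in the CAS."""
    text = cas["text"]
    for type_name, begin, end in list(cas["annotations"]):
        if type_name == "Token" and text[begin].isupper():
            cas["annotations"].append(("Name", begin, end))


class AggregateEngine:
    """Runs delegate engines in a fixed order over the same CAS."""

    def __init__(self, *engines):
        self.engines = engines

    def __call__(self, cas):
        for engine in self.engines:
            engine(cas)


pipeline = AggregateEngine(tokenizer, capitalized_name_annotator)
nested = AggregateEngine(pipeline)  # an aggregate is itself an engine

cas = new_cas("David presented UIMA")
nested(cas)
names = [cas["text"][b:e] for t, b, e in cas["annotations"] if t == "Name"]
print(names)
```

The fixed ordering matters here: the name annotator finds nothing if the tokenizer has not run first, which is the kind of dependency an aggregate's flow has to respect.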
UIMA Tools and Utilities • CAS Save/Restore • Configuration Editors • Annotation Viewer • CAS Visual Debugger • Document Analyzer • Graphical tool for applying analysis engines and viewing results • Juru-based Semantic Search Engine
Exploiting Analysis Results • Semantic Search • Contribute analysis results (CASes) to the “Juru” XML search engine indexer • Typed-entity recognizers (e.g., named-entity) • XML Fragments query language • Database Insert/Update Stream • Contribute analysis results to a database
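One way to picture how typed annotations reach an XML search engine is to render them as inline XML elements, so that an XML fragment such as `<Person>Jane Doe</Person>` can match the entity type as well as the words. The Python sketch below assumes non-overlapping annotations given as `(type, begin, end)` tuples; `to_inline_xml` and `span` are hypothetical helpers, and the actual hand-off between UIMA and the Juru indexer may differ.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def span(text, phrase, type_name):
    """Helper: build a (type, begin, end) annotation for the first occurrence of phrase."""
    begin = text.index(phrase)
    return (type_name, begin, begin + len(phrase))


def to_inline_xml(text, annotations):
    """Wrap each annotated span in an element named after its type.

    Assumes annotations do not overlap.
    """
    parts = []
    pos = 0
    for type_name, begin, end in sorted(annotations, key=lambda a: a[1]):
        parts.append(escape(text[pos:begin]))  # plain text before the span
        parts.append(f"<{type_name}>{escape(text[begin:end])}</{type_name}>")
        pos = end
    parts.append(escape(text[pos:]))  # trailing plain text
    return "<doc>" + "".join(parts) + "</doc>"


text = "Jane Doe joined Acme & Co"
annotations = [
    span(text, "Acme & Co", "Organization"),
    span(text, "Jane Doe", "Person"),
]
xml = to_inline_xml(text, annotations)

# The result is well-formed XML, so typed entities can be looked up by tag.
root = ET.fromstring(xml)
people = [el.text for el in root.iter("Person")]
print(xml)
print(people)
```

Escaping the text before wrapping it keeps characters such as `&` from breaking the generated XML.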
Juru Search Engine Extensions for XML • Extended Vector Space Model • Compound index items: (context, word) • Cosine similarity extended with context • Relaxed match on context (context resemblance measure)
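The extended model can be sketched as follows. Index items are (context, word) pairs, where the context is the path of enclosing element names, and the cosine score multiplies each matching word pair by a context resemblance value between 0 and 1. This Python sketch makes two simplifying assumptions that are not taken from Juru itself: weights are raw term counts rather than tf-idf, and the resemblance function is one possible "partial match" choice, (1 + |query context|) / (1 + |document context|) when the query path is a subsequence of the document path and 0 otherwise.

```python
import math
from collections import Counter


def context_resemblance(query_ctx, doc_ctx):
    """Relaxed context match (an assumed choice of measure).

    Returns 1.0 for identical paths, a value below 1 when the query path is a
    proper subsequence of the document path, and 0.0 otherwise.
    """
    remaining = iter(doc_ctx)
    if all(tag in remaining for tag in query_ctx):  # subsequence test
        return (1 + len(query_ctx)) / (1 + len(doc_ctx))
    return 0.0


def score(query_items, doc_items):
    """Cosine-style score over (context, word) items, weighted by context resemblance."""
    q = Counter(query_items)
    d = Counter(doc_items)
    dot = 0.0
    for (q_ctx, q_word), q_weight in q.items():
        for (d_ctx, d_word), d_weight in d.items():
            if q_word == d_word:
                dot += q_weight * d_weight * context_resemblance(q_ctx, d_ctx)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


query = [(("book", "title"), "search")]
doc_a = [(("book", "title"), "search"), (("book", "author"), "lewis")]
doc_b = [(("book", "chapter", "title"), "search"), (("book", "author"), "lewis")]
doc_c = [(("article", "body"), "search")]

score_a = score(query, doc_a)  # exact context match
score_b = score(query, doc_b)  # relaxed context match, scored lower
score_c = score(query, doc_c)  # same word, unrelated context
print(score_a, score_b, score_c)
```

With an exact-match resemblance function (1 for identical contexts, 0 otherwise) this reduces to a plain vector space model over (context, word) terms; the relaxed version lets `doc_b` still be retrieved, only ranked below `doc_a`.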
Demonstrations • Running an Analysis Engine • Building a Collection Processing Engine • Running Semantic Search
References • UIMA SDK User's Guide and Reference • http://dl.alphaworks.ibm.com/technologies/uima/UIMA_SDK_Users_Guide_Reference.pdf • An Extension of the Vector Space Model for Querying XML Documents via XML Fragments • http://xml.coverpages.org/CarmelFragments.pdf