Aspire Document Processing
Document Processing – “Aspire”
• Very High Performance
• Structured Document Processing Architecture
• Dynamic configuration and deployment
• Based on Open Source Technologies
• Well Supported (wiki, javadoc)
• Administration interface built-in
• Vendor Neutral (CMS and search engine)
Top-Level Overview
[Diagram labels: Aspire, Document Processing Pipelines, Data Sources, Index, Feeders, Indexing]
Components In Aspire (today)
[Diagram: Aspire Component Manager and Pipeline Manager, with component groups Feeders, SubJob Extractors, Enhancers, Metadata Manipulation, Output. Component labels: RSS Push XML to REST Get CCD Metadata Unload CSV Date Chooser Hot Folder Split Multi-valued data Error Job Handler RDB Enhancer Unload ARC Files Single Page Host to Domain Debug Output RDB Fetch URL RDB Unloader Text Extraction JMS Groovy Scripting Common Resources Feed One Category Tagger JDBC Connection Content Control DB Content Boost]
Functions Handled by Aspire
• Threading
• Collection Deployment
• Error handling and notification
  • Including individual sub-job notifications
• Collection Configuration
• Component Scripting
• Job Processing
• Admin I/F, performance, live system status
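To make the threading and per-job error notification items concrete, here is a minimal sketch in plain Java. It is not Aspire code: `JobRunnerSketch`, `process`, and `runAll` are hypothetical names, and the "bad" job prefix only simulates a failure. It shows one possible shape of the idea, where a framework owns the thread pool and reports each failed job individually instead of aborting the whole batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class JobRunnerSketch {
    // Hypothetical per-job work: any job id starting with "bad" fails.
    static String process(String jobId) {
        if (jobId.startsWith("bad")) {
            throw new IllegalStateException("simulated failure");
        }
        return jobId + " done";
    }

    // Runs every job on a fixed-size pool and returns one message per failed job.
    static List<String> runAll(List<String> jobIds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> failures = new ArrayList<>();
        try {
            List<Future<String>> results = new ArrayList<>();
            for (String jobId : jobIds) {
                Callable<String> task = () -> process(jobId);
                results.add(pool.submit(task));
            }
            // Collect results in submission order; a failure in one job
            // does not stop the others from completing.
            for (int i = 0; i < results.size(); i++) {
                try {
                    results.get(i).get();
                } catch (ExecutionException e) {
                    failures.add(jobIds.get(i) + ": " + e.getCause().getMessage());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    failures.add(jobIds.get(i) + ": interrupted");
                }
            }
        } finally {
            pool.shutdown();
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(runAll(List.of("doc-1", "bad-2", "doc-3"), 2));
    }
}
```

The component code (`process`) contains no threading logic at all, which is the point of having the framework handle it.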
Benefits
• Much lower lifecycle cost
• File processing no longer an ad-hoc collection of Java objects and methods
• Encourages re-use of components
• New collections with no programming
  • Just re-configure existing components
• Flexibility: deploy collections individually
• Much better visibility into the file processing internals, performance, and queuing
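The "new collections with no programming" point can be illustrated with a small sketch, assuming a collection is described only by an ordered list of stage names. Everything here is hypothetical (`PipelineSketch`, the `STAGES` registry, the stage names `trim-title` and `lowercase-title`) and does not reflect Aspire's actual configuration format; it only shows how two collections might reuse the same components through configuration alone.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class PipelineSketch {
    // Hypothetical registry of reusable stages, keyed by the name a
    // collection configuration would refer to.
    static final Map<String, UnaryOperator<Map<String, String>>> STAGES = new LinkedHashMap<>();

    static {
        STAGES.put("trim-title", doc -> {
            doc.computeIfPresent("title", (k, v) -> v.trim());
            return doc;
        });
        STAGES.put("lowercase-title", doc -> {
            doc.computeIfPresent("title", (k, v) -> v.toLowerCase());
            return doc;
        });
    }

    // Applies the configured stages, in order, to a copy of the document.
    static Map<String, String> run(List<String> stageNames, Map<String, String> input) {
        Map<String, String> doc = new HashMap<>(input);
        for (String name : stageNames) {
            UnaryOperator<Map<String, String>> stage = STAGES.get(name);
            if (stage == null) {
                throw new IllegalArgumentException("Unknown stage: " + name);
            }
            doc = stage.apply(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("title", "  Annual Report ");
        // Two "collections" differ only in their configuration.
        System.out.println(run(List.of("trim-title"), doc));
        System.out.println(run(List.of("trim-title", "lowercase-title"), doc));
    }
}
```

Adding a third collection in this sketch means writing another list of names, not another class.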
Typical Installation Structure
[Diagram labels: Machine #1, Machine #2, Crawler, Aspire (other feeders and doc processing), Search Engine]
Aspire and OSGi Components
[Diagram: an Aspire Component is "Manufactured By" an Aspire Component Factory; "ISA" links connect to OSGi Bundle and Java Jar File]
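The "manufactured by" relationship is the classic factory pattern. The sketch below uses plain Java with hypothetical interfaces (`Component`, `ComponentFactory`, `PrefixFactory`); it does not use the OSGi API or Aspire's real interfaces, and the packaging of a factory inside a bundle (a jar file) is outside what the code shows.

```java
import java.util.Map;

class FactorySketch {
    // Hypothetical component contract: a named unit that transforms text.
    interface Component {
        String name();
        String process(String input);
    }

    // Hypothetical factory contract: builds a configured component on request.
    interface ComponentFactory {
        Component create(String name, Map<String, String> config);
    }

    // One factory can manufacture many differently configured components.
    static class PrefixFactory implements ComponentFactory {
        @Override
        public Component create(String name, Map<String, String> config) {
            String prefix = config.getOrDefault("prefix", "");
            return new Component() {
                @Override
                public String name() {
                    return name;
                }

                @Override
                public String process(String input) {
                    return prefix + input;
                }
            };
        }
    }

    public static void main(String[] args) {
        ComponentFactory factory = new PrefixFactory();
        Component a = factory.create("tagA", Map.of("prefix", "A:"));
        Component b = factory.create("tagB", Map.of("prefix", "B:"));
        System.out.println(a.name() + " -> " + a.process("doc"));
        System.out.println(b.name() + " -> " + b.process("doc"));
    }
}
```

Separating the factory from the component is what lets a single deployed unit serve several collections, each with its own configuration.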
Deployment
• Architected to the latest deployment standards
  • Distribution Archetypes
  • Component Repositories
• Redeploy collections independently
  • In a live running system
• Redeploy and update components
  • In a live running system
• Ready for the cloud
Deployment Structure
[Diagram: the Administrator has Aspire load/reload configuration; Configuration Control holds multiple Collection Configs (Resources, Feeders & Pipelines); reusable components come from a Component Repository]
Deployment Implications
• Collections are configured independently
• Collections use standard components
• Can be dynamically and remotely deployed
[Diagram: Aspire (always running) loads remote configurations (Collection Config) from a Remote System, under remote admin control]
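Independent, live redeployment of collections can be sketched as an atomic swap of one entry in a shared registry. This is an illustration under stated assumptions, not Aspire's mechanism: `ReloadSketch`, `CollectionConfig`, `deploy`, and `current` are hypothetical names, and fetching a configuration from a remote system is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReloadSketch {
    // Hypothetical immutable description of one collection.
    static final class CollectionConfig {
        final String name;
        final int version;
        final List<String> stages;

        CollectionConfig(String name, int version, List<String> stages) {
            this.name = name;
            this.version = version;
            this.stages = List.copyOf(stages);
        }
    }

    // Live registry: one entry per collection, safe to update while in use.
    private final Map<String, CollectionConfig> live = new ConcurrentHashMap<>();

    // Deploys or redeploys a single collection; other collections are untouched.
    // Returns the configuration that was replaced, or null on first deployment.
    CollectionConfig deploy(CollectionConfig config) {
        return live.put(config.name, config);
    }

    CollectionConfig current(String name) {
        return live.get(name);
    }

    public static void main(String[] args) {
        ReloadSketch system = new ReloadSketch();
        system.deploy(new CollectionConfig("news", 1, List.of("fetch")));
        system.deploy(new CollectionConfig("wiki", 1, List.of("fetch")));
        // Redeploy only "news"; "wiki" keeps running with its old configuration.
        system.deploy(new CollectionConfig("news", 2, List.of("fetch", "extract")));
        System.out.println("news v" + system.current("news").version);
        System.out.println("wiki v" + system.current("wiki").version);
    }
}
```

Because each configuration object is immutable, a job that already holds version 1 can finish with it while new jobs pick up version 2.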