Alvis status report: Index Data
Annual meeting, 23 January 2006
Mike Taylor <mike@indexdata.com>

Check out the exciting things to come!
1. Technical contribution
2. Status of tasks
3. Status of milestones
4. Status of deliverables
5. Contribution to other work packages
1. Technical contribution
Metadata formats:
  - Fat Peer description format complete
  - Enriched Document format complete
Architecture:
  - Many details of the fat-peer architecture resolved
1. Technical contribution
Indexing Engine (Zebra):
  - Addressed performance bottlenecks
  - Support for 2^64 word and document occurrences
  - Improved indexer performance (approx. 1,000 docs/s)
  - Improved Boolean 'and' search performance
  - Implemented approximate hit counts
  - Created XML/XSLT indexing input filter
  - Fixed truncation error found in WP8 testing
  - Set up HP 64-bit dual AMD Opteron box for load testing
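Two of the Zebra items above, faster Boolean 'and' searches and approximate hit counts, can be illustrated with a generic sketch. This is not Zebra's code, whose internals this report does not describe. It shows one common technique: intersecting sorted posting lists, using galloping (exponential) search to skip ahead. It also includes a hypothetical estimator that intersects only a prefix of the shorter list and scales the result up. The function names and the `sample` parameter are illustrative assumptions.

```python
from bisect import bisect_left
from typing import List, Sequence


def gallop_to(postings: Sequence[int], target: int, lo: int) -> int:
    """Return the first index >= lo whose docid is >= target (len if none).

    Probes at exponentially growing distances, then binary-searches the
    bracketed range, so skipping far ahead in a long list stays cheap.
    """
    step, hi = 1, lo
    while hi < len(postings) and postings[hi] < target:
        lo = hi + 1  # everything up to hi is known to be < target
        hi += step
        step *= 2
    return bisect_left(postings, target, lo, min(hi, len(postings)))


def intersect(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Boolean AND of two sorted, duplicate-free docid lists."""
    if len(a) > len(b):
        a, b = b, a  # drive the loop from the shorter list
    out: List[int] = []
    j = 0
    for doc in a:
        j = gallop_to(b, doc, j)
        if j == len(b):
            break  # nothing left in b can match
        if b[j] == doc:
            out.append(doc)
            j += 1
    return out


def approx_and_count(a: Sequence[int], b: Sequence[int], sample: int = 1000) -> int:
    """Hypothetical estimate of |a AND b|: intersect only the first `sample`
    docids of the shorter list and scale by the fraction examined."""
    if len(a) > len(b):
        a, b = b, a
    if len(a) <= sample:
        return len(intersect(a, b))  # small enough to count exactly
    hits = len(intersect(a[:sample], b))
    return round(hits * len(a) / sample)


evens, threes = list(range(0, 100, 2)), list(range(0, 100, 3))
print(intersect(evens, threes))
print(approx_and_count(evens, threes, sample=10))
```

With `sample=10` the estimate for this toy data is exact (17), because multiples of 6 are spread evenly. On real data, prefix sampling is biased whenever matching documents cluster by docid, which is one reason to report such counts as approximate.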
2. Status of tasks
Task 3.1 - Network Node Metadata Framework
  - Designed and documented (see D3.1), not yet deployed
Task 3.2 - Semantic Document Metadata Framework
  - Designed, documented and tested, except for the WP5 contribution
Task 3.3 - Database Engine Framework
  - Several releases made
  - Documentation and further development work required
2. Status of tasks
Task 3.4 - Semantic Indexing Support
  - Prototype facilities are complete
  - Provided XML/XSLT-based indexing specification
  - Documentation and further development work required
Task 3.5 - Distributed Network Integration
  - Indexing engine integrated with the processing pipeline
  - Integration between fat peers still to be done
3. Status of milestones
MS3.1. (M6) First version of network node metadata framework.
MS3.2. (M12) First version of semantic document format.
MS3.3. (M12) Database engine framework initial release.
MS3.4. (M18) Semantic indexing support complete in DB engine.
MS3.5. (M18) Network node metadata framework complete.
MS3.6. (M20) Semantic document metadata framework complete.
MS3.7. (M24) Database engine participating in ALVIS network.
All achieved except MS3.4: “complete” is overstating it.
MS3.8 (feature lockdown) is still to come.
4. Status of deliverables
D3.1. (M24) Report on metadata frameworks, including concrete representations, for network nodes and semantic document analyses.
  - Delivered this morning :-)
D3.2. (M36) Database engine framework extended to support external semantic indexing modules and the ALVIS network architecture, fully documented and packaged.
  - To follow
5. Contribution to other WPs
WP2 (Document Probability Model)
  - Implemented static ranking plugin API for Zebra
  - Dynamic relevance-scoring plugin API for Zebra
  - Experimental support for various TF-IDF algorithms
  - Fuzzy set ranking and hit-set computation
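WP2's TF-IDF and fuzzy-set items can be illustrated with a toy in-memory version. This is not the Zebra ranking plugin API, whose interface the report does not describe. The sketch assumes one specific TF-IDF variant out of the many in use: length-normalised term frequency times log(N/df). It also assumes the classic fuzzy-set Boolean model, in which AND takes the minimum and OR the maximum of per-term memberships. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Weights = Dict[str, Dict[str, float]]  # doc_id -> {term: weight}


def term_weights(docs: Dict[str, List[str]]) -> Weights:
    """TF-IDF weight per (doc, term): tf = count / doc length, idf = log(N / df)."""
    n = len(docs)
    df: Counter = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document
    return {
        doc_id: {t: (c / len(tokens)) * math.log(n / df[t])
                 for t, c in Counter(tokens).items()}
        for doc_id, tokens in docs.items()
    }


def sum_scores(query: List[str], weights: Weights) -> Dict[str, float]:
    """Vector-space style relevance: sum of the query terms' weights per doc."""
    return {d: sum(w.get(t, 0.0) for t in query) for d, w in weights.items()}


def fuzzy_scores(query: List[str], weights: Weights, op: str = "and") -> Dict[str, float]:
    """Fuzzy-set Boolean model. A doc's membership in a term's set is its
    weight for that term divided by the term's largest weight in the
    collection; AND = min over query terms, OR = max."""
    if not query:
        return {d: 0.0 for d in weights}
    top: Dict[str, float] = {}
    for w in weights.values():
        for t, v in w.items():
            top[t] = max(top.get(t, 0.0), v)
    combine = min if op == "and" else max
    scores = {}
    for d, w in weights.items():
        members = [w.get(t, 0.0) / top[t] if top.get(t, 0.0) > 0 else 0.0
                   for t in query]
        scores[d] = combine(members)
    return scores


def hit_set(scores: Dict[str, float]) -> List[str]:
    """Documents with a non-zero score, best first."""
    return sorted((d for d, s in scores.items() if s > 0), key=lambda d: -scores[d])


docs = {"d1": "peer search peer".split(),
        "d2": "search engine".split(),
        "d3": "peer network".split()}
w = term_weights(docs)
print(hit_set(fuzzy_scores(["peer", "search"], w, "and")))
```

Here only d1 contains both query terms, so it is the only member of the fuzzy AND hit set. Its score is 2/3, the smaller of its two normalised memberships. The fuzzy OR would admit all three documents.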
5. Contribution to other WPs
WP4 (Distributed Search)
  - Contribution to architecture decisions
  - Wrote use-case document (digital library system)
  - CQL query-trickling design
  - Prototype P2P hit-set merge module
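The report gives no detail of the prototype P2P hit-set merge module, so what follows is only a minimal sketch of one way to merge ranked hit lists returned by several peers. It first normalises each peer's scores so that its best hit is 1.0. This is a crude assumption, since raw scores from different indexes are rarely comparable. It then merges the descending lists lazily with `heapq.merge`, skips documents already returned by another peer, and stops after k hits. The function names and the normalisation rule are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def normalise(hits: List[Hit]) -> List[Hit]:
    """Rescale one peer's hits (sorted best first) so the top score is 1.0."""
    if not hits or hits[0][0] <= 0:
        return list(hits)
    top = hits[0][0]
    return [(score / top, doc) for score, doc in hits]


def merge_hit_sets(per_peer: Dict[str, List[Hit]], k: int) -> List[Hit]:
    """Merge per-peer ranked lists into one top-k list without duplicates."""
    if k <= 0:
        return []
    streams = [normalise(hits) for hits in per_peer.values()]
    # Each stream is already sorted by descending score, as heapq.merge
    # requires with reverse=True; merging is lazy, so we stop after k hits.
    merged = heapq.merge(*streams, key=lambda hit: hit[0], reverse=True)
    seen = set()
    result: List[Hit] = []
    for score, doc in merged:
        if doc in seen:  # same document held by more than one peer
            continue
        seen.add(doc)
        result.append((score, doc))
        if len(result) == k:
            break
    return result


peers = {"A": [(10.0, "a1"), (5.0, "x"), (1.0, "a2")],
         "B": [(4.0, "b1"), (3.0, "x"), (2.0, "b2")]}
print(merge_hit_sets(peers, 4))
```

Document "x" is held by both peers. It appears once, at peer B's higher normalised score of 0.75. Because the merge is lazy, a peer's low-ranked hits are never examined once k results have been collected.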
5. Contribution to other WPs
WP7 (Topic Specific Crawl)
  - Debian packaging of WP7 Crawler
  - Integration of crawler into pipeline
  - ZOOM-Perl API for feeding harvested documents into indexer
5. Contribution to other WPs
WP8 (Integration and Evaluation)
  - Pipeline protocols and implementation
  - Alvis::Pipeline Perl module
  - Pipeline-testing GUI client, “DC-TUNES”
  - Built local testing database containing 7 GB of Wikipedia
  - Snippet generation plugin API for Zebra
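The report does not document the snippet generation plugin API. As a hypothetical illustration of what such a plugin might compute, this sketch picks the fixed-width window of words that contains the most query terms and brackets the matches. The function names, the default window width and the bracket markup are all assumptions.

```python
import re
from typing import List


def _norm(word: str) -> str:
    """Lower-case a word and strip punctuation so 'Zebra,' matches 'zebra'."""
    return re.sub(r"\W+", "", word).lower()


def make_snippet(text: str, query: List[str], width: int = 8) -> str:
    """Return the width-word window with the most query-term hits, matches in [brackets]."""
    words = text.split()
    if not words:
        return ""
    terms = {q.lower() for q in query}
    hits = [1 if _norm(w) in terms else 0 for w in words]
    width = max(1, min(width, len(words)))
    # Sliding window: add the word entering on the right, drop the one leaving on the left.
    best_start = 0
    best = current = sum(hits[:width])
    for start in range(1, len(words) - width + 1):
        current += hits[start + width - 1] - hits[start - 1]
        if current > best:  # strict '>' keeps the earliest best window
            best, best_start = current, start
    window = words[best_start:best_start + width]
    marked = [f"[{w}]" if _norm(w) in terms else w for w in window]
    prefix = "... " if best_start > 0 else ""
    suffix = " ..." if best_start + width < len(words) else ""
    return prefix + " ".join(marked) + suffix


print(make_snippet("The Alvis project builds a peer to peer semantic search "
                   "engine using Zebra for indexing", ["zebra", "indexing"], width=4))
```

The sliding window makes the scan linear in document length. A production snippet generator would more likely work from the indexer's stored term positions than re-tokenise the text.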
5. Contribution to other WPs
WP10 (Dissemination and Exploitation)
  - Paper presented to the IDDI session of DEXA 2005: "Searching very large bodies of data using a transparent peer-to-peer proxy"
  - ZOOM-Perl and Alvis::Pipeline modules on CPAN
5. Contribution to other WPs
WP11 (Demonstration)
  - Preparation of software to participate in the demonstration
  - Guidance of demo-system integration efforts
The End ... finally!
I can't believe we have to sit through eleven of these presentations.