
What won’t change


terah




Presentation Transcript


  1. What won’t change • Harvest’s basic design • SOIF (Summary Object Interchange Format) for inter-component communication • Development model http://harvest.sourceforge.net/

  2. General Goals • Increase search speed • Shift focus to HTTP and HTML • Internationalisation • Improve scalability • Increase availability • Improve access control

  3. General Goals • Integrate other search systems into the Harvest system • Remove all non-GPLed components • Improve ranking • Promote Harvest to attract more users and developers

  4. Gatherer • Shift focus to HTTP • Improve gathering over slow connections • Improve the HTTP gatherer • Create multiple Gatherers “on the fly” where possible • Evaluate larbin and curl • Migrate from GDBM to Sleepycat’s Berkeley DB for local disc cache management

  5. Gatherer • Remove the local disc cache • Implement a candidate selection filter for the HTTP enumerator based on MIME type • Trust the MIME type sent by HTTP servers • Add HTTPS support • Evaluate the improvements of HTTP/1.1 over HTTP/1.0 • Replace unnesters with exploders

  6. Gatherer • Improve object storage system • Improve expiring objects • Evaluate viability of an expire daemon • Split file: and news: rootnodes into leafnodes • Remove All-Templates • Make SOIF objects shareable between Gatherer and Broker if possible

  7. Summarizer • Shift focus to HTML • Improve existing HTML summarizers • Create an HTML summarizer that “understands” HTML • Improve support for Microsoft Office documents

  8. Broker • Add Index Data’s Zebra as a full-text indexer • Implement a method to retrieve an SOIF object by URL • Improve temporary file/directory handling used for paging search results • Improve SOIF object storage • Extend “shell indexer” functionality

  9. Broker • Implement a user interface in PHP • Separate data from metadata when storing SOIF objects • Minimise the size of the Registry • Use cookies to save user preferences for the search interface • Evaluate and write an SOIF filter for Namazu

  10. Broker • Evaluate RDBMSs (PostgreSQL, MySQL) • Evaluate XQuery and SOAP

  11. Documentation • Switch from LinuxDoc to DocBook for the manual and FAQ

  12. Problems • PostScript and PDF summarizers • Apache’s MultiViews • IMS Gathering • Stemming and Soundex are language-dependent • Language recognition • No free thesauri available
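
Slide 12 lists stemming and Soundex as language-dependent. As an illustration only (this is the classic American Soundex, not Harvest's own code), the sketch below shows why: the letter-to-digit table groups consonants by how they sound in English spelling, and any character outside A to Z is simply dropped. The function name `soundex` is our own choice.

```python
# Classic American Soundex: an illustrative sketch, not Harvest's implementation.
# The consonant groups below reflect English pronunciation.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of name (letters outside a-z are ignored)."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        raise ValueError("name must contain at least one ASCII letter")
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")  # the first letter's code still suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels (and y) map to "" and act as separators
        if code and code != prev:
            result += code
        prev = code
        if len(result) == 4:
            break
    return result.ljust(4, "0")  # pad short codes with zeros
```

For example, `soundex("Robert")` and `soundex("Rupert")` both yield `R163`, the kind of fuzzy match Soundex is meant for; a non-English name gets a code, but one built from English sound groupings.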

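SOIF is named on slide 1 as the inter-component format, and slides 6, 8 and 9 plan changes to how SOIF objects are shared, retrieved and stored. The following is a minimal sketch of writing and reading one SOIF object, assuming the layout of the published Harvest SOIF format: an `@TEMPLATE-TYPE { URL` header line, one `Attribute{size}:<TAB>value` entry per attribute with the size given in bytes, and a closing `}`. The names `SoifObject`, `dump_soif` and `parse_soif` are hypothetical, not Harvest's API, and streams holding several objects or malformed input are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class SoifObject:
    """Hypothetical in-memory form of one SOIF object."""
    template_type: str                                    # e.g. "FILE"
    url: str
    attributes: dict[str, bytes] = field(default_factory=dict)  # name -> raw value


def dump_soif(obj: SoifObject) -> bytes:
    """Serialise one object: header line, length-prefixed attributes, closing brace."""
    parts = [f"@{obj.template_type} {{ {obj.url}\n".encode("utf-8")]
    for name, value in obj.attributes.items():
        # The {size} prefix is the value length in bytes, so values may contain newlines.
        parts.append(f"{name}{{{len(value)}}}:\t".encode("utf-8") + value + b"\n")
    parts.append(b"}\n")
    return b"".join(parts)


def parse_soif(data: bytes) -> SoifObject:
    """Parse a single SOIF object in the layout above (no error recovery)."""
    header_end = data.index(b"\n")
    header = data[:header_end].decode("utf-8")
    if not header.startswith("@"):
        raise ValueError("SOIF object must start with '@'")
    template_type, _, url = header[1:].partition("{")
    obj = SoifObject(template_type.strip(), url.strip())
    pos = header_end + 1
    while pos < len(data):
        if data[pos:pos + 1] == b"}":
            return obj  # closing brace ends the object
        brace = data.index(b"{", pos)
        close = data.index(b"}:\t", brace)
        name = data[pos:brace].decode("utf-8")
        size = int(data[brace + 1:close])
        start = close + 3  # skip "}:\t"
        obj.attributes[name] = data[start:start + size]
        pos = start + size + 1  # skip the newline after the value
    raise ValueError("unterminated SOIF object")
```

Because each value is read by its byte count rather than up to the next newline, multi-line content such as a document body passes through unchanged, which is what makes a simple format like this usable between Gatherer and Broker.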