Cold Fusion Verity, Oracle Enterprise Search, Acrobat PDF Index Assistant Comparison. Arden O. Weiss, ardenweiss@verizon.net Arden Weiss
Scope of Presentation:
• The Application using these Tools
• The Menu Structure
• Tool Pros/Cons
• Output Examples
• Conclusions
The Application Purposes:
• A Cold Fusion Application that:
  • Organizes all archival Facilities, Goals, and Documents by source and content.
  • Helps users do archival research using SQL queries on fields and full-text search logic.
  • Reports on the quantity and types of archival documents.
The Application Attributes:
• Searches the content of:
  • Archival PDF files
  • Several Oracle Databases
  • Email Archive
  • Other Public/shared Folders
• Displays Search Results
• Does bean-counting type reports
• Access controlled by Database/App.
The Analyses Tree Structure:
• Facilities (name/address/contact data)
• Research Goals (description/conflicts/dates)
• Research Documents, Interviews, categorization data (detailed info)

Full-text Verity Search includes all Oracle fields plus PDF documents.
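The three-level tree above (Facilities, then Research Goals, then Research Documents) can be pictured as three related tables. The following is only a minimal sketch in Python with an in-memory SQLite database. The table names, column names, and sample rows are all hypothetical and are not taken from the actual Cold Fusion/Oracle application. It shows a top-down lookup and a simple "bean-counting" report by document type.

```python
import sqlite3

# Hypothetical schema: names and sample rows are illustrative assumptions only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facility (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE goal (id INTEGER PRIMARY KEY,
                   facility_id INTEGER REFERENCES facility(id),
                   description TEXT);
CREATE TABLE document (id INTEGER PRIMARY KEY,
                       goal_id INTEGER REFERENCES goal(id),
                       title TEXT, doc_type TEXT);
""")
conn.execute("INSERT INTO facility VALUES (1, 'Plant A', '1 Main St')")
conn.execute("INSERT INTO goal VALUES (1, 1, 'Trace site history')")
conn.executemany(
    "INSERT INTO document VALUES (?, ?, ?, ?)",
    [(1, 1, "Site survey", "PDF"),
     (2, 1, "Manager interview", "Interview"),
     (3, 1, "Deed record", "PDF")],
)

def documents_for_facility(name):
    """Walk the tree top-down: facility -> goals -> documents."""
    rows = conn.execute(
        """SELECT d.title FROM document d
           JOIN goal g ON d.goal_id = g.id
           JOIN facility f ON g.facility_id = f.id
           WHERE f.name = ? ORDER BY d.id""",
        (name,),
    )
    return [row[0] for row in rows]

def counts_by_type():
    """Bean-counting report: number of documents per document type."""
    return dict(conn.execute(
        "SELECT doc_type, COUNT(*) FROM document GROUP BY doc_type"))
```

Calling `documents_for_facility("Plant A")` lists the documents reachable from that facility, and `counts_by_type()` gives the per-type totals.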
The Cold Fusion Program Menu:
The CF Program In Action:
• Accessing documents from the top-down.
• Searching for Facilities.
• Searching for Goals.
• Searching for Documents.
• Searching for words in Verity Index.
Cold Fusion Verity Pros/Cons:
• Cold Fusion Verity Pros:
  • Search is under Cold Fusion Program control.
  • Searching and displaying results are both fast.
  • Drilldown logic can be used to add specificity.
  • Output can be formatted as desired.
  • Output can be redirected to other data stores.
  • Index updates are under Cold Fusion Program control.
• Cold Fusion Verity Cons:
  • Problems indexing large data stores.
  • Context for search results is not always obvious.
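The "drilldown" item in the list above means searching within the results of a previous search so that each extra term narrows the set. The sketch below illustrates that idea only. It is plain Python with a naive substring match and does not use or imitate the Verity engine or any Cold Fusion tag. The function names and the sample documents are made up for illustration.

```python
def search(docs, term):
    """Return ids of documents whose text contains term (case-insensitive)."""
    needle = term.lower()
    return {doc_id for doc_id, text in docs.items() if needle in text.lower()}

def drill_down(docs, terms):
    """Apply each term to the previous result set, narrowing it step by step."""
    current = dict(docs)
    for term in terms:
        hits = search(current, term)
        current = {doc_id: current[doc_id] for doc_id in hits}
    return sorted(current)

# Hypothetical sample data, keyed by document id.
sample_docs = {
    1: "Archive survey of Plant A",
    2: "Archive interview with plant manager",
    3: "Budget memo",
}
```

With this data, `drill_down(sample_docs, ["archive"])` keeps documents 1 and 2, and adding a second term such as `"interview"` narrows the result to document 2.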
Oracle Search Pros/Cons:
• Oracle Enterprise Search Pros:
  • Search can do a global search of the LAN/WAN.
  • Indexing of large data stores is excellent.
  • Search speed is fast even for large data stores.
  • Output is shown in a Google-like display.
  • Search can be run external to/parallel w/CF App.
• Oracle Enterprise Search Cons:
  • Search logic is limited to simple queries.
  • Output cannot be formatted as desired.
  • Output is shown in a Google-like display.
  • Search results export is manual via copy/paste.
PDF Index Assistant Pros/Cons:
• Index Assistant Search Pros:
  • Indexing of large folders of PDF files works well.
  • Search speed is fast even for a big set of PDF files.
  • Results are displayed in a well-organized manner.
  • Results are displayed/highlighted in full context.
  • Search can be run external to/parallel w/CF App.
• Index Assistant Search Cons:
  • Search is limited to contents of indexed PDF files.
  • All included files must be in PDF format.
  • Keeping the index current is a manual process.
  • Search results export is manual via copy/paste.
Oracle Search Screen:
Oracle Search Example Results:
Results can be:
- Grouped by: Source, Date, Author, File Format
- Sorted by: Relevance, Date, Author, File Format, Title, Path, Language
Oracle Example Search Results:
Matching Attribute Names Include (any or all): Author, Description, Headline 1, 2, or 3, Host, Keywords, Language, Last Modified Date, Mimetype, Reference to Text, Subject, Title, Urldepth, Url
Loading Acrobat Index Builder:
Opening PDF Index (PDX) File:
Selecting Folders to Include:
Rebuild recreates the PDX file from scratch (about 8 min for 1471 PDF files). Build updates the existing PDX file (took seconds when changes were minimal).
Finding PDF Files to Include:
This is a rebuild operation: it first looks for files to include.
Building Acrobat PDX Index:
The Build (update) operation is faster than the Rebuild operation.
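Why an update can take seconds while a full rebuild takes minutes can be illustrated with a toy index that records a modification stamp per file. This is a hedged sketch of the general incremental-indexing idea, not a description of how Acrobat works internally. The functions `rebuild` and `build`, the file names, and the stamps are all hypothetical. The last line just restates the timing quoted earlier for Rebuild (about 8 min for 1471 PDF files) as a per-file cost.

```python
def rebuild(files):
    """Rebuild: discard any old index and process every file again."""
    new_index = dict(files)
    return new_index, len(files)          # (index, files processed)

def build(index, files):
    """Build (update): process only files that are new or whose stamp changed.

    Files missing from `files` are dropped from the returned index.
    """
    changed = [path for path, stamp in files.items() if index.get(path) != stamp]
    new_index = dict(files)
    return new_index, len(changed)        # (index, files processed)

# Hypothetical folder contents: path -> modification stamp.
folder = {"a.pdf": 1, "b.pdf": 1, "c.pdf": 1}
index, processed_full = rebuild(folder)

# One file edited and one added since the last run.
folder["b.pdf"] = 2
folder["d.pdf"] = 1
index, processed_update = build(index, folder)

# Average Rebuild cost implied by the quoted figures: roughly a third of a second per file.
seconds_per_file = 8 * 60 / 1471
```

Here the rebuild touches all three files, while the later update touches only the two that changed.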
Scheduled PDX Index Updates:
• Use a catalog batch PDX file (.bpdx) to schedule when to automatically build, rebuild, update, and purge an index.
• A BPDX text file contains a list of platform-dependent catalog index file paths and flags.
• Use a scheduling application, such as Windows Scheduler, to display the BPDX file in Acrobat.
• Acrobat re-creates the index according to the flags in the BPDX file.
Searching PDX Index (1 of 3):
• On Acrobat’s Main Menu, click on “Edit” then “Search”, or press <Shift> <Ctrl> F, to display the Search Window.
• Click on the Advanced Search Options link at the bottom of the screen.
• Click on Select Index at the top of the Window to display:
Searching PDX Index (2 of 3):
• The Search Window then changes to show “Currently Selected Indexes” with excellent search options.
• Enter criteria and press the “Search” button to display results.
Searching PDX Index (3 of 3):
• Search for “WEISS” in the “Currently Selected Indexes” with “whole words only” checked. Results shown below.
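The effect of the "whole words only" option can be illustrated with a regular-expression word boundary. The sketch below is an illustration in Python, not Acrobat's search implementation. The function name, the sample sentence, and the choice to ignore case are all assumptions made for the example.

```python
import re

def find_matches(text, word, whole_words_only=True):
    """Return every occurrence of word in text, ignoring case (an assumption).

    With whole_words_only set, the word must be delimited by word boundaries,
    so it will not match inside a longer word.
    """
    pattern = re.escape(word)
    if whole_words_only:
        pattern = r"\b" + pattern + r"\b"
    return [m.group(0) for m in re.finditer(pattern, text, flags=re.IGNORECASE)]

# Hypothetical sample text.
sample = "Arden Weiss wrote this; Weissman did not."
```

With whole words only, a search for "WEISS" matches the surname but skips "Weissman". With the option off, both occurrences are returned.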
Conclusions and Thoughts:
• All three search technologies co-exist well.
• Oracle Search is not PDF-centric and may be too broad a search function to easily control.
• Oracle Search may be a good way to discover what was left out of the CF Archive.
• SQL Server may have functionality similar to Oracle Enterprise Search.
• Acrobat Index Search gets you immediately closer to the real PDFs (Verity does not highlight search words in displayed PDFs).
• Verity and Acrobat are much cheaper dates.
Tha-Tha-That’s All Folks