Clients: Ken Bolton Lynn Brown Angela K. Horne Don Schneder Doris Smith JGSM Library Reference Team Project Team: Jonathan Gong Benson Lee Man Fai Matthew Lee Greg Leedberg Liz Xu Johnson Graduate School of Management Library Project JGSM Library Project - CS 501
Since the last presentation…
Tasks accomplished:
• Decided on using PHPDig as the backend
• Implemented many functional requirements
• Adjusted PHPDig code to improve ranking based on client requirements
• Discussed with the client additional functionality to be added to the system

Presentation Outline
• New Requirements / Why PHPDig?
• Implemented Functionality
  • Abstract Display
  • Advanced Search
  • Administrative Features
  • Ranking Adjustments
• Task List for Final Milestone (Things to Do)
• Demo of Current System

New Requirements
• Boosting
• Display Statistics Page
• Batch Adding
• Search Results Display
• Add/Remove Categories

Why PHPDig? Non-technical
• Client prefers PHP/MySQL since both technologies are already on their web server
• JGSM Library site has fewer than 300 HTML pages
• A database is a requirement
• Client was involved in the decision to continue with PHPDig
• Focus on maintainability and usability

Why PHPDig? Technical
• PHPDig code is relatively short
• PHPDig = Open Source = Free to modify
• Florida State University, Dept. of Biology
  • www.bio.fsu.edu/phpdig
• The Kiwi Search Engine http://www.linknz.co.nz/
  • 123,000+ web sites indexed
• Ranking is similar to Lucene since both use tf-idf weighting
• PHPDig version 1.8.7 www.phpdig.net

Implemented Functionality: Abstract Display
• Purpose
  • Users can get a description written by a librarian/administrator
• Implementation
  • Modified PHPDig code to look for an abstract
  • Added a table to the database: auxiliary
    • spider_id : int
    • full_url : string
    • abstract : string
    • category : string

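A minimal sketch of the auxiliary table described above, assuming the four listed columns map to an integer key and three text fields. The project used MySQL; Python's built-in sqlite3 module stands in here only so the example runs without a database server. The helper names, the sample row, and the choice of spider_id as primary key are hypothetical.

```python
import sqlite3


def create_auxiliary_table(conn):
    """Create the auxiliary table with the four columns named on the slide."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS auxiliary (
            spider_id INTEGER PRIMARY KEY,  -- assumed key: id of the indexed page
            full_url  TEXT NOT NULL,
            abstract  TEXT,                 -- librarian-written description
            category  TEXT
        )
        """
    )


def get_abstract(conn, spider_id):
    """Return the abstract stored for a page, or None if the page has no row."""
    row = conn.execute(
        "SELECT abstract FROM auxiliary WHERE spider_id = ?", (spider_id,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_auxiliary_table(conn)
conn.execute(
    "INSERT INTO auxiliary VALUES (?, ?, ?, ?)",
    (1, "http://example.org/databases.html", "Guide to business databases.", "Research"),
)
print(get_abstract(conn, 1))
```

When a search result has no row in the table, get_abstract returns None, so the results page can fall back to the automatically generated excerpt.
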
Example of Abstract Display

Example of Abstract Display (Cont’d)

Our Current Working Interface
• We now have a functional interface that can perform searches and display results.
• The interface has evolved from the prototype previously presented, based on feedback from our clients.

Evolved Interface
• Started with the prototype presented for progress report 1 as the target design.
• Once we started working with PhpDig’s template system, we made some slight changes to the original target interface to fit what PhpDig can handle.

Evolved Interface

Evolved Interface
• After presenting this design to our clients and discussing possible alternatives, we jointly came up with the current working design:

Our Current Working Interface: Advanced Search

Our Current Working Interface: Search Results

How We Implemented the Interface
• PhpDig uses a template system
  • Allows us to write HTML for the search page and use special PhpDig tags to generate form controls, results, etc., within that page

How We Implemented the Interface
• Some problems came up during this process:
• Problem: Some of the static HTML generated automatically by PhpDig tags to produce the search form does not match our desired style.
• Solution: We do not depend on PhpDig to generate all of the form HTML; we hand-code some of it to match our style.

How We Implemented the Interface
• Some problems arose during this process:
• Problem: Some of the dynamic HTML generated by PhpDig tags also does not match our style.
• Solution: We cannot hand-code this HTML (category drop-down, etc.), so we modified the PhpDig source code that runs in response to these tags so that the generated HTML matches our desired style.

Where To Go From Here
• Based on future discussions with our client, we will continue to refine the interface towards an ideal goal.
• More source-level changes to PhpDig to get the details right
  • Example: The context excerpt currently cuts off words in the middle

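One way the mid-word cut-off could be avoided is to widen the excerpt window to the nearest whitespace on each side. The sketch below is illustrative Python, not PhpDig's code; the function name and the character-offset interface are assumptions.

```python
def trim_to_word_boundaries(text, start, end):
    """Widen the slice [start, end) so it never begins or ends mid-word."""
    start = max(0, start)
    end = min(len(text), end)
    # Move start left until it sits at the beginning of a word.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    # Move end right until it sits just past the end of a word.
    while end < len(text) and not text[end].isspace():
        end += 1
    return text[start:end].strip()


text = "The library provides business databases"
# A raw slice text[6:18] would give "brary provid".
print(trim_to_word_boundaries(text, 6, 18))  # library provides
```

Widening rather than shrinking keeps the matched search term inside the excerpt, at the cost of a slightly longer context line.
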
Administrative Features
Implemented:
• Add a page
  • Options: abstract & category
• Remove a page from database
• Update a page in database
  • Options: update abstract & category
  • Content is re-indexed

Administrative Features
To Be Implemented:
• Manual ranking abilities
  • Give a page more weight overall
  • Give a page more weight for certain words
• Feedback
• Kerberos authentication

Administrative Features
To Be Implemented: (continued)
• Display statistics
  • Statistics useful to the administrators, such as most frequent searches, searches with no results, etc.
• Batch adding of pages
• Category Administration

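The two statistics named above (most frequent searches and searches with no results) could be derived from a search log roughly as follows. This is a sketch under assumptions: the log format of (query, result count) pairs, the case-insensitive grouping, and the function name are all hypothetical, since the slides do not describe how searches are logged.

```python
from collections import Counter


def search_statistics(log, top_n=3):
    """Summarize a log given as an iterable of (query, result_count) pairs."""
    # Treat queries that differ only in case or surrounding spaces as the same.
    normalized = [(query.strip().lower(), count) for query, count in log]
    most_frequent = Counter(q for q, _ in normalized).most_common(top_n)
    no_results = sorted({q for q, count in normalized if count == 0})
    return most_frequent, no_results


log = [
    ("Bloomberg", 12),
    ("bloomberg", 12),
    ("annual reports", 40),
    ("zzyzx", 0),
]
most_frequent, no_results = search_statistics(log)
print(most_frequent[0])  # ('bloomberg', 2)
print(no_results)        # ['zzyzx']
```
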
Ranking
• Improved from before, mostly complete
• Formula similar to Lucene default now:
• Our formula:

coord function
• coord(): q is the number of query terms matched in the document; Q is the number of terms in the query
• Only relevant in a search for “any of the terms”

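The description above matches Lucene's coordination factor, which is the fraction q / Q. A minimal Python sketch, assuming terms are compared as exact strings and that duplicate query terms count once (the slides do not say how the team's PHP code handles either):

```python
def coord(query_terms, document_terms):
    """Fraction of distinct query terms that occur in the document (q / Q)."""
    query = set(query_terms)
    if not query:
        return 0.0
    matched = query & set(document_terms)
    return len(matched) / len(query)


# Two of the three query terms appear in the document, so coord is 2/3.
print(coord(["library", "hours", "parking"], ["library", "hours", "today"]))
```

In an "all of the terms" search every returned document matches all Q terms, so the factor is always 1 and cannot change the ordering, which is consistent with the slide's note that it only matters for "any of the terms".
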
Current Progress
Completed:
• Ranking implementation complete
Left to do:
• Admin Panel to modify boosted pages/words
  • Uses boost, but need to finalize how to modify boosting parameter

Boosting Methods
• Two possibilities:
  1. Admin modifies score of page relative to current score.
  2. Specify position a page should appear given a one-term query.

Pros and Cons
• Method 1: Modify relative to current score
  + More careful manipulation of score possible
  + Faster to code, more time to test
  - More difficult to use
• Method 2: Modify rank
  + Easier to use
  - Adjustments only possible on one-word queries

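The two boosting methods can be contrasted with a small sketch. It assumes a boost is a multiplicative factor applied to a page's score, which the slides do not confirm; the function names, the margin parameter, and the midpoint rule are hypothetical, and ties between scores are ignored.

```python
def apply_relative_boost(score, boost):
    """Method 1: scale a page's current score by an admin-chosen factor."""
    return score * boost


def boost_for_position(scores, page, position, margin=1.01):
    """Method 2: factor that moves `page` to a 1-based `position`.

    `scores` maps page -> score for one single-term query. The factor is
    derived from that query's scores only, which is why this method is
    limited to one-word queries.
    """
    others = sorted((s for p, s in scores.items() if p != page), reverse=True)
    if not 1 <= position <= len(others):
        raise ValueError("position must displace an existing page")
    below = others[position - 1]      # page that will end up just below
    if position == 1:
        target = below * margin       # beat the current top score
    else:
        above = others[position - 2]  # page that stays just above
        target = (above + below) / 2  # land between the two neighbours
    return target / scores[page]


scores = {"a": 4.0, "b": 2.0, "c": 1.0}
boost = boost_for_position(scores, "c", 2)
boosted = dict(scores, c=apply_relative_boost(scores["c"], boost))
print(sorted(boosted, key=boosted.get, reverse=True))  # ['a', 'c', 'b']
```

With Method 1 the administrator picks the factor directly and has to judge its effect; with Method 2 the system computes the factor from the desired position.
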
Task List for Final Milestone

Feedback
• Confirmation and error messages will be displayed on the administrative page to improve usability.

Display stats page
• Links to the relevant log pages will be added to the main administration page.

Batch adding
• To facilitate the indexing process, we will add a batch adding feature to the main administration page.

Adjust search results display
• We will ensure that the page description has no cut-off words and that the client is satisfied with the search results interface.

Limit by category
• Search by category will be implemented.

Administrative function to add and remove categories
• Adding and removing categories will be implemented and linked to the administrative page.

Administrative function to weight ranking
• Manual ranking adjustments will be added so that the client is fully satisfied with the search results.

Authentication
• Access to the administration page will be authenticated with Cornell University’s Web Authentication (CUWebAuth).

Unit Testing and Integration Testing
• Every unit that is implemented will be fully unit tested on our own computers and then integrated into the rest of the code for integration testing.

Installation and Refinement
• The installation of the final system will take place well before the next milestone to avoid any delay.
• This time period is reserved for any last-minute minor changes to the system to ensure the client’s satisfaction.

Documentation and Training Slides
• Our final milestone includes detailed documentation of the project, training slides, and an informal training session to help administrators learn to operate the system.

Deployment
• After careful testing and feedback, the search system will go live.

Timeline

Demo…

The End.
• Questions?
• Comments?