Site Explorer Server: an integrated, client-server query system for Web sites
Giancarlo Bongiovanni, Flavio Fontana, Stefano Borghetti
Dept. of Computer Science, University of Rome "La Sapienza"; ENEA's Usability Lab
Summary: • Introduction • Information Retrieval Systems and keyword score • Search engines • Internet now and the future • Java • Site Explorer Server v2.0 • Conclusion and experimental results • Future works
Information search on the Internet
• The Internet is the largest and most widespread network
• 158 million accesses in January '99
• Millions of heterogeneous users
• A forecast of 200 million in 2000
• Billions of information sources provided by the Web
• Exponential growth in the number of Web sites
• 33 million in the United States, 1.3 million in Germany, 371 thousand in Italy
• Growing network access by end users
• Growing Web browser functionality
• Only 11% of users have been using the Internet for more than 3 years
• Growing search engine performance
The problem of searching for information on the Web
Issue: can searching for information on the Internet be a problem for particular types of users?
Today: a better scenario
• User problems related to information search:
• Many users do not know the Web information model
• Users have trouble finding a valid tool able to locate the relevant information
• Users have trouble describing the information they are looking for in correct and concise terms
• Users have trouble using advanced search tools (e.g. Site Explorer Server is harder to use than a browser)
User requirements analysis
• New search and exploration tools
• A new approach to the Web, alternative to the traditional browser
• Implementation of a client/server tool able to perform Web IR, written in Java, tried out and tested at ENEA
Diagram labels: Site Explorer v1.1; IR; tool integrated with the browser; network service
Information Retrieval Systems (IRS): general structure
• Query formulation by the user
• Indexing process
• Result formulation
Diagram labels: User; Query; Result; Similar; Documents; Indexing; Data structure in a pre-defined language
Source: Gerard Salton, Introduction to Modern Information Retrieval, McGraw-Hill, 1983.
Information Retrieval Systems (IRS): query formulation
A query is a list of terms that expresses and summarizes the topic being searched.
• Boolean operators. Boolean systems combine the terms using boolean operators: and, or, andnot. Examples: Information and retrieval; Information or retrieval; Information andnot retrieval
• Extended operators. Extended boolean systems use additional operators: term proximity, term truncation, search within a particular field. Examples: Information adj retrieval; Inform*; Information [in title]
• Ranking. In ranking systems the query is formulated as a natural-language phrase. Example: "Human influence in Information Retrieval systems"
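The boolean operators and the truncation operator listed on this slide can be illustrated with a minimal sketch. It is written in Python purely for illustration (Site Explorer Server itself is implemented in Java); the toy documents and the function names `matching`, `boolean_query` and `truncation_query` are assumptions, not part of the original system.

```python
# Toy collection (hypothetical data): each document is reduced to its set of terms.
DOCS = {
    "d1": "information retrieval systems",
    "d2": "information theory",
    "d3": "retrieval of stored data",
}

def terms(text):
    """Split a text into a set of lowercase terms."""
    return set(text.lower().split())

INDEXED = {doc_id: terms(text) for doc_id, text in DOCS.items()}

def matching(term):
    """Return the ids of the documents that contain the term."""
    return {doc_id for doc_id, ts in INDEXED.items() if term.lower() in ts}

def boolean_query(left, op, right):
    """Evaluate 'left op right', where op is 'and', 'or' or 'andnot'."""
    a, b = matching(left), matching(right)
    if op == "and":
        return a & b      # documents containing both terms
    if op == "or":
        return a | b      # documents containing at least one term
    if op == "andnot":
        return a - b      # documents containing left but not right
    raise ValueError(f"unknown operator: {op}")

def truncation_query(pattern):
    """Evaluate a truncated term such as 'Inform*' as a prefix match."""
    prefix = pattern.lower().rstrip("*")
    return {d for d, ts in INDEXED.items() if any(t.startswith(prefix) for t in ts)}

print(sorted(boolean_query("Information", "and", "retrieval")))
print(sorted(boolean_query("Information", "or", "retrieval")))
print(sorted(boolean_query("Information", "andnot", "retrieval")))
print(sorted(truncation_query("Inform*")))
```

Set intersection, union and difference map directly onto and, or and andnot, which is why the documents are stored as term sets here.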
Information Retrieval Systems (IRS): indexing
Indexing is the process of analysing documents and producing a short representation of their contents.
• The representation is based on a keyword vector. The keywords are chosen manually or extracted automatically.
• Terms vector. Example: "Information Retrieval Data Structure & Algorithms" gives <information, retrieval, data-structure, algorithms>
• Data structures: the structure that holds the document representation. Examples: list, tree, index file, etc.
• Inverted indexing. Example: a file in which every record associates a particular term with the related document records
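As a rough illustration of the term vector and the inverted index described on this slide, the sketch below extracts keywords automatically and maps each term to the documents that contain it. It is a Python sketch under stated assumptions: the stop-word list and the second document title are made up for the example, and, unlike the slide's vector, it splits "data structure" into two terms because it only splits on whitespace.

```python
from collections import defaultdict

# Illustrative stop-word list (an assumption, not taken from the slides).
STOP_WORDS = {"&", "and", "of", "the", "to"}

def term_vector(title):
    """Reduce a title to its list of lowercase keywords."""
    return [w for w in title.lower().split() if w not in STOP_WORDS]

def build_inverted_index(documents):
    """Map every term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, title in documents.items():
        for term in term_vector(title):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

documents = {
    1: "Information Retrieval Data Structure & Algorithms",
    2: "Introduction to Modern Information Retrieval",
}
index = build_inverted_index(documents)

print(term_vector(documents[1]))
print(index["retrieval"])
```

Looking up a query term in `index` returns the related documents directly, without scanning every document, which is the point of an inverted file.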
Information Retrieval Systems (IRS): formulating and presenting the result
In traditional IRS the result is a list of potentially relevant documents.
• Result order: documents ordered by relevance level
• New features:
• Explicit measure of the relevance level (score)
• Dynamic presentation (result manipulation)
• Graphical and direct presentation methods
• Multimedia integration
• Use of windows (different ways to present the results)
Sources: Gerard Salton, Introduction to Modern Information Retrieval, McGraw-Hill, 1983. William B. Frakes, Ricardo Baeza-Yates, Information Retrieval: Data Structures & Algorithms, Prentice Hall, 1992.
Information Retrieval Systems (IRS): score computation
Score computation measures the relevance of specific terms in specific documents.
Key points in score computation:
• Example: a method to weight the relevance of a term in the whole document collection (Sparck Jones, 1972) (Dennis, 1967)
• Example: frequency normalization for a particular document collection (Croft, 1983) (Harman, 1986)
• Computing a term weight for a document: term frequency in the document * term relevance weight in the collection
• Computing the score:
• Boolean systems: use the SOP method
• Ranking systems: use a specific formula.
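The product "term frequency in the document * term relevance weight in the collection" can be made concrete with a small sketch. The slide does not give the exact formula, so this Python example assumes one common choice for the collection weight, the inverse document frequency log(N / n_t); the toy documents and the function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Split a text into lowercase terms."""
    return text.lower().split()

def collection_weights(docs):
    """Assumed collection weight: log(N / number of documents containing the term)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for text in docs.values():
        doc_freq.update(set(tokenize(text)))   # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def score(query, text, idf):
    """Sum over the query terms of (frequency in the document * collection weight)."""
    tf = Counter(tokenize(text))
    return sum(tf[t] * idf.get(t, 0.0) for t in tokenize(query))

docs = {
    "d1": "information retrieval and information filtering",
    "d2": "retrieval of web pages",
    "d3": "java applet for web sites",
}
idf = collection_weights(docs)

# Rank the documents by decreasing score for a two-term query.
ranking = sorted(docs, key=lambda d: score("information retrieval", docs[d], idf),
                 reverse=True)
print(ranking)
```

A term that appears in every document gets weight log(1) = 0 under this choice, so it contributes nothing to the ranking.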
Search engines
Diagram labels: Web interface (query and results); Index DB; SIMILAR; Web pages; Automatic indexing system
New functionality in the most popular search engines:
• Site classification
• Integration of new advanced search services to find information in particular formats (pictures, sounds, MP3, e-mail, etc.)
• Few search engines provide a document score
• Migration from search service to on-line seller guides
Source: Media Matrix - June 1999
Internet: from thirty years ago to today
• 1969: first transmission on ARPANET
• 1978: TCP/IP made official
• 1983: NSFNet
• 1991: World Wide Web
• 1992: ISOC
• 1999: Inet'99, 30 years of Internet
Source: FIND/ITPD, III, January 1999 - NII project, supported by DOIT, MOEA
Internet: towards tomorrow
• Future tracks:
• Research and technologies
• Education
• Public administration
• E-commerce
Java: main features
• Object-oriented
• Multithreaded
• Dynamic
• Portable
• Platform independent
• Rich networking functionality
• Oriented to the implementation of graphical user interfaces
• Oriented to the implementation of client/server systems
Diagram labels: Technologies; Applet; Client; Server; Site Explorer Server v2.0
Site Explorer Server v2.0: goals
To implement a new system:
• able to work directly on the Web
• able to help the user find interesting documents on the Web
• with a high degree of usability
• able to integrate:
• search functions
• an approach alternative to the browser
• management functions
• able to put the user in a position to access heterogeneous Web data in a single, uniform way.
Site Explorer Server v2.0
A client/server system, implemented in Java, that automatically analyses a Web site and returns the site's tree structure, in which the root node represents the site's home page.
• Focused on searching and retrieving information through keyword search
Additional features:
• an easy information-filtering service
• a score computation service
• user management
A network service:
• an accessible (open to everybody), open and multi-platform service
Diagram labels: User; Client; Interface; INTERNET; Website; Site Explorer Server
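The slides do not say how the site tree is derived from the site's links. One plausible approach, sketched here in Python as an assumption rather than as the system's actual method, is a breadth-first walk from the home page in which each page is attached to the first page that links to it. The link map and the names `build_site_tree` and `render` are hypothetical.

```python
from collections import deque

def build_site_tree(home, links):
    """Breadth-first walk of the link graph, rooted at the home page.

    Returns a dict mapping each page to the list of its children in the tree.
    """
    tree = {home: []}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in tree:          # skip pages already placed in the tree
                tree[target] = []
                tree[page].append(target)
                queue.append(target)
    return tree

def render(tree, node, depth=0):
    """Return the tree as indented lines, one page per line."""
    lines = ["  " * depth + node]
    for child in tree[node]:
        lines.extend(render(tree, child, depth + 1))
    return lines

# Hypothetical link map: page -> pages it links to (note the cycle back to the home page).
links = {
    "/index.html": ["/about.html", "/projects.html"],
    "/about.html": ["/index.html", "/staff.html"],
    "/projects.html": ["/staff.html"],
}
tree = build_site_tree("/index.html", links)
print("\n".join(render(tree, "/index.html")))
```

Recording visited pages is what turns a link graph with cycles and shared targets into a tree: every page appears exactly once, under its first discovered parent.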
Site Explorer Server v2.0: external architecture and configuration
Technical features:
• Client/server system
• The server (SES) is a Java application
• The client (SEJA) is a Java applet
• SES and SEJA communicate using a dedicated application-layer protocol (SEP)
Diagram labels: Web site #1, Web site #2, ... Web site #n; Internet; HTTP; SEP; HTTP Server (SES); SEJA applet; Browser (SEJA); SEC; Mac-OS, Windows, Unix; User 1, User 2, User 3, ... User m
Site Explorer Server v2.0: operation and processes
• Query selector process
• HTTP connection process
• Links extraction process
• Contents extraction process
• Keyword analysis process
• Score process
• Result builder
• Result-display process
Diagram labels: USER; Query; Web sites; Next site's page; Client user interface; Result
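The links extraction, contents extraction and keyword analysis processes named on this slide could look roughly like the following for a single page. This is a Python sketch using the standard `html.parser` module, not the system's Java code; the class `PageParser`, the function `analyse_page`, the sample HTML and the `example.org` URL are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class PageParser(HTMLParser):
    """Collect the outgoing links and the text words of one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # absolute URLs found in <a href="...">
        self.words = []   # lowercase words found in text nodes

    def handle_starttag(self, tag, attrs):
        # Links extraction: resolve each href against the page URL.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Contents extraction: keep the words of every text node.
        self.words.extend(data.lower().split())

def analyse_page(base_url, html, keywords):
    """Return the page's links and the number of occurrences of each keyword."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    counts = {k: parser.words.count(k.lower()) for k in keywords}
    return parser.links, counts

html = ('<html><body><h1>Usability Lab</h1><p>Usability research.</p>'
        '<a href="projects.html">Projects</a></body></html>')
links, counts = analyse_page("http://example.org/index.html", html, ["usability"])
print(links)
print(counts)
```

The extracted links would feed the "next site's page" step, while the keyword counts would feed the score process.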
Site Explorer Server v2.0: SES subcomponents
• Main
• Communicator
• Function manager
• User manager
• Retriever
• Site analyser
• Page analyser
Diagram labels: Internet; Connection request (client); Query (client); Results (client)
Features:
• Full-text document analysis
• Link checking using connection requests
• HTML 4 oriented
Site Explorer Server v2.0: the Site Explorer Server score
Three score levels:
• Level 1 score. Based only on the keyword occurrences inside the Web page.
• Level 2 score. Also based on the distribution of the keywords across the whole Web site.
• Level 3 score. Also based on the position of the keyword occurrences inside the Web page structure.
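The slides name the three score levels but give no formulas. The Python sketch below shows one way such levels could build on each other; every formula in it is an assumption made for illustration: the site-level factor log(1 + N / n), the position weights for title, heading and body, and the toy site are all hypothetical.

```python
import math

# Hypothetical weights for where a keyword occurs in the page (not from the slides).
POSITION_WEIGHT = {"title": 3.0, "heading": 2.0, "body": 1.0}

def level1(page, keyword):
    """Level 1: number of keyword occurrences inside the page."""
    return sum(words.count(keyword) for words in page.values())

def site_weight(site, keyword):
    """Assumed site-level factor: keywords found on fewer pages weigh more."""
    containing = sum(1 for page in site.values() if level1(page, keyword) > 0)
    return math.log(1 + len(site) / containing) if containing else 0.0

def level2(site, url, keyword):
    """Level 2: level 1 scaled by the keyword's distribution over the whole site."""
    return level1(site[url], keyword) * site_weight(site, keyword)

def level3(site, url, keyword):
    """Level 3: as level 2, with each occurrence weighted by its position."""
    positional = sum(POSITION_WEIGHT[pos] * words.count(keyword)
                     for pos, words in site[url].items())
    return positional * site_weight(site, keyword)

# Toy site: each page is a dict of position -> list of words.
site = {
    "/index.html": {"title": ["usability", "lab"], "heading": [],
                    "body": ["usability", "research"]},
    "/staff.html": {"title": ["staff"], "heading": [], "body": ["usability"]},
    "/contact.html": {"title": ["contact"], "heading": [], "body": ["address"]},
}
for url in site:
    print(url, level1(site[url], "usability"),
          round(level2(site, url, "usability"), 3),
          round(level3(site, url, "usability"), 3))
```

Each level reuses the previous one and adds a single source of evidence, which mirrors the "also based on" wording of the slide.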
Site Explorer Server v2.0 Site Explorer Java Applet GUI: menu bar; toolbar; displayed result; tree structure area; retrieved object in the Web site; textual area; multimedia area; status bar; status indicator
Site Explorer Server v2.0 Site Explorer Java Applet GUI
Site Explorer Server v2.0 Site exploration Connecting to the server
Site Explorer Server v2.0 Site exploration Active connection indicator
Site Explorer Server v2.0 Site exploration New site analysis request
Site Explorer Server v2.0 Site exploration Using a favorite site analysis request
Site Explorer Server v2.0 Site exploration Using a pre-defined site analysis request
Site Explorer Server v2.0 Site exploration Receiving the result
Site Explorer Server v2.0 Site exploration Score level; relevant page indicator; results navigation
Site Explorer Server v2.0 Site exploration Results browsing
Site Explorer Server v2.0 Site exploration
Site Explorer Server v2.0: the pilot center
The Usability Lab (Ulab) was established in 1992 at the pilot center of the ESPRIT III VENUS project. It carries out research and development on advanced visual interfaces to databases and on networked multimedia information systems.
Development and test machines:
• Intel Pentium II 350 MHz / Windows 98 (Netlab)
• Intel Pentium MMX 166 MHz / Windows 95 (Fontanaulab)
• AMD K6 300 MHz / Windows 98 (Ulab)
• Sun Sparc Station 5 / Unix Solaris 2.5 (Venus)
• Sun Sparc Station 10 / Unix Solaris 2.5 (Dafne)
Software tools:
• JDK v1.1.6, JDK v1.1.7, JDK v1.1.7a, JDK v1.1.7b, JDK v1.1.8
• Edit+, NetBeans
• Java Swing v1.0.3, Java Media Framework v1.1
Site Explorer Server v2.0: conclusion and experimental results
• A robust system
• A good to excellent degree of usability
• A good response time (analysis and result building)
• 50 users selected using the ENEA/VENUS methodology:
• Random users: occasional use of the system.
• Professional users: use of the system related to their work.
• Expert users.
Site Explorer Server v2.0: ENEA applications
• G7 Global-Inventory project
• A collection of project data cards
• Site search engine vs Site Explorer Server
• Plus - Prosoma LinkUp Service
• A collection of multimedia data cards
• Experimental sites: ULAB sites
• Future testing:
• Virtual Lab Site
• FAD
Site Explorer Server v2.0 and other existing systems
• Link exploration:
• LinkBot: link analysis
• Site Explorer: builds a tree for a single site
• SurfMap JavaNavigator: applet for map-based navigation
• Search within a site:
• PersonalSearch: an applet acting as a search engine for a site
• Virgilio: search function within a site
• Exploration and representation of a site:
• HyperSystem Net40: explores a site and gives a tree representation of it that can be navigated
• Map-based navigation with a search function:
• MerzeScope: an applet for navigating a graph, with a search function for a single site
Site Explorer Server v2.0: future works
• A fully modular internal architecture, so that new modules and new functions can be added in the simplest and most dynamic way.
• A user profile system based on the user's interests, continuously updated through a feedback technique.
• A new system agent that analyses Web sites automatically off-line and, using the user's profile information, suggests a set of queries on specific themes.