Document management (aka ‘digital libraries’)
The Greenstone Group: Professor Ian Witten (leader); David Bainbridge, Dave Nichols, S.J. Cunningham, Steve Jones, Te Taka Keegan, Annika Hinze
Our work includes…
• Document management
• Content management
• Metadata management
• Multimedia documents
• Alerting and event notification support
• OCR-ing services
• Document & collection visualization
• User needs analysis
• Text mining
• Automatic metadata extraction
Greenstone software
• ‘Digital library’ construction, use, and maintenance software
• Developed at Waikato (www.greenstone.org)
• Open Source
• Widely used internationally (UNESCO, FAO, Texas A&M Uni, Kyrgyz Republic, …)
Digital library: a collection of digital objects (text, video, audio) along with methods for access and retrieval [user], and for selection, organisation, and maintenance [librarian]
Greenstone software features
Feature areas: Collections, Documents, Access, Importing, Distributing
• “Library” = set of separate collections; “Collection” = set of separate documents
• Multigigabyte collections
• Hierarchical document model
• Multimedia: picture, voice, music, video collections
• Multi-language documents: Unicode throughout
• Multi-language interfaces: French, Chinese, Arabic …
• Web browser or CD-ROM
• Searching: full-text and fielded, ranked or boolean
• Browsing: hierarchical indexes created from metadata
• Metadata: Dublin Core + collection-specific extensions
• Plugins: different document types and metadata specifications
• Classifiers: create browsing indexes (collection editor decides)
• Compression techniques throughout: uses MG
• Distributed collections: coming soon, with CORBA
• Open-source software: free, extensible
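To make the "ranked or boolean" distinction concrete, here is a minimal sketch of both query styles over a toy inverted index. It is an illustration only, not Greenstone's or MG's implementation: the function names (`build_index`, `boolean_and`, `ranked`) and the scoring rule (summed term frequency) are assumptions chosen for brevity.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to a Counter of {doc_id: term frequency}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index


def boolean_and(index, query):
    """Boolean search: sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*postings))


def ranked(index, query):
    """Ranked search: ids ordered by summed term frequency (assumed scoring)."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by document id for a stable order.
    return [d for d, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical three-document collection.
docs = {
    "d1": "digital library software",
    "d2": "library of music and music video",
    "d3": "digital music library",
}
index = build_index(docs)
print(boolean_and(index, "digital library"))
print(ranked(index, "music library"))
```

A boolean query returns only documents matching every term, in no particular order of relevance; a ranked query returns any document matching at least one term, best match first.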
Greenstone supports: hierarchically structured documents
Figure: A book
Greenstone supports: collection design, maintenance
Figure: Designing a collection with the Gatherer
Greenstone supports: a wide (and growing) set of file formats
• DOC
• PDF
• XLS
• LaTeX
• Refer
• MARC
• …
• Highly extensible through ‘plugin’ mechanism
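The general idea of a plugin mechanism can be sketched as a registry that dispatches each incoming file to a format-specific handler. The sketch below is a hedged illustration, not Greenstone's plugin API: the class names (`TextPlugin`, `ReferPlugin`, `PluginRegistry`), the `process` method, and the returned dictionary layout are all hypothetical. The only format detail relied on is that Refer records use lines of the form `%X value`.

```python
import os


class TextPlugin:
    """Hypothetical plugin for plain-text files."""
    extensions = (".txt",)

    def process(self, filename, data):
        return {"source": filename, "text": data.decode("utf-8"), "metadata": {}}


class ReferPlugin:
    """Hypothetical plugin for Refer-style records ('%X value' lines)."""
    extensions = (".refer",)

    def process(self, filename, data):
        metadata = {}
        for line in data.decode("utf-8").splitlines():
            if line.startswith("%") and len(line) > 2:
                # line[1] is the one-letter tag; the value follows a space.
                metadata.setdefault(line[1], []).append(line[3:].strip())
        return {"source": filename, "text": "", "metadata": metadata}


class PluginRegistry:
    """Routes each file to the plugin registered for its extension."""

    def __init__(self):
        self._by_ext = {}

    def register(self, plugin):
        for ext in plugin.extensions:
            self._by_ext[ext.lower()] = plugin

    def import_file(self, filename, data):
        ext = os.path.splitext(filename)[1].lower()
        plugin = self._by_ext.get(ext)
        if plugin is None:
            raise ValueError(f"no plugin registered for {ext!r}")
        return plugin.process(filename, data)


registry = PluginRegistry()
registry.register(TextPlugin())
registry.register(ReferPlugin())
record = registry.import_file("refs.refer", b"%A Example Author\n%T Example Title\n")
print(record["metadata"])
```

Supporting a new format then means writing one more class and registering it; the import loop itself does not change.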
Mobile document access
• Handheld information access
• Browsing methods for varying screen sizes
• Studies on search behaviour (on- and off-line)
• Support for non-text documents (FunkyZoom views of maps, images)
Browsing and exploration: hierarchical phrase index
• What’s in this collection?
• Is it any good?
• What coverage for topic X?
• My query returned too much/little, what now?
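One way to picture a hierarchical phrase index: start from a word, list the longer phrases that contain it with their frequencies, then pick one and expand again. The sketch below is a deliberately naive n-gram version under stated assumptions; it is not the algorithm behind Greenstone's phrase browser, and `phrase_counts` and `expand` are hypothetical names.

```python
from collections import Counter


def phrase_counts(texts, max_len=3):
    """Count every word n-gram of length 1..max_len across the texts."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def expand(counts, phrase):
    """Phrases one word longer that contain `phrase`, most frequent first."""
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = [
        (p, c) for p, c in counts.items()
        if len(p) == n + 1 and (p[:n] == target or p[1:] == target)
    ]
    hits.sort(key=lambda pc: (-pc[1], pc[0]))
    return [(" ".join(p), c) for p, c in hits]


# Hypothetical mini-collection.
texts = [
    "digital library software",
    "digital library collections",
    "public library",
]
counts = phrase_counts(texts)
print(expand(counts, "library"))
print(expand(counts, "digital library"))
```

Drilling down from "library" to "digital library" and then to its three-word extensions gives a quick sense of what a collection covers and offers a way to narrow a query that returned too much.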
Recent and proposed projects
• Making documents mobile: moving between large online collections and a PDA
• Text mining: extracting quality metadata from legacy documents
• User needs analysis: what sort of documents does a given set of users require, and how can the collection be managed?
• Visualization: making it easy to ‘see’ what’s in a collection, and supporting effective browsing
Recent and proposed projects
• Multi-language collections: tailoring a document collection's interface and interaction mechanisms to the language of its users
• Alerting services: bringing potentially useful documents to the user’s attention, without overwhelming them
• Supporting unusual users: collections for the physically disabled, illiterate or semi-literate, children, …
• Audio and image collections: novel browsing and searching mechanisms
Recent and proposed projects
• Storage and searching: developed highly efficient techniques for storing, indexing, and searching text documents; implemented in Greenstone, but portable to other document management software
• Usability analysis: how easy is it to use your current document collection? How can access be improved?
• And a host of wacky and cool things: collaging document collections, music retrieval systems, ‘aerial’ views of documents, …