Creating a National Electronic Thesis and Dissertation Portal in South Africa • Lawrence Webley, Hussein Suleman, Tatenda Chipeperekwa • {chippytdm,lwebley}@gmail.com, hussein@cs.uct.ac.za • University of Cape Town, Department of Computer Science, Digital Libraries Laboratory
Outline • Present • History • Requirements • ETD Environment in SA • Design Principles • System Architecture • Screen Snaps • Future work
Requirements: What is the purpose of a national ETD Portal? • To link South Africa into international efforts • To gather data on university output • To deal with specific local issues • To showcase local accomplishments locally and internationally • To promote local universities • To motivate institutions to have active ETD projects
Requirements: University Concerns • Metadata only • Metadata standards • What to expose – Masters/PhD only? • Who will provide support? • What about small institutions? • How to provide access? – OAI-PMH
Requirements: Additional Requirements • Create reusable, customisable, open source ETD portal management software • Preferably without reinventing the wheel! • Composed entirely of open source components • Can be customised to meet other use cases • Scalability • National archives are constantly growing
Current ETD Environment in South Africa [Diagram. Labels: Institution A @ NRF, Institution B @ NRF, ... (SA NRF); Institution X, Institution Y, ... (SA Universities and Technikons); NRF Central Archive; TD Archive; NRF ETD Portal; NDLTD Union Archive, SCIRUS, ... (International Partners)]
Institutional Collections • ETD collections at approximately 12 institutions • Mostly larger, research-intensive institutions • Various software packages in use • EPrints, DSpace, ETD-db, other • OAI-PMH support in all systems
Hosted Collections • ETD Collections hosted remotely at the NRF • For smaller institutions with few resources and few ETDs • Multiple instances of DSpace • Temporary arrangement • Technical support from NRF – collection management from institutions
Metadata Repository • Our repository software fits in here • Collection of metadata records from all institutions • Any/all metadata formats • Harvested from institutions using OAI-PMH • Provides OAI-PMH and RSS interfaces • No digital objects
ETDPortal • Web interface to collection • Search/Browse/View metadata • Statistics for collections • Latest entries • Administrative interface • For managing source repositories
International Links • NDLTD Union Archive • International Collection • SCIRUS • Science specific search engine
Design Principles • All modern Linux-based software components • Multi-tiered, simple architecture of complex components • Clean separation between components • Scalability • More easily customised (simply replace a component) • Failure resistant • Any metadata • Simplicity (minimal dependencies) • Java/Tomcat/Lucene
System Architecture: Repository & Harvester [Diagram. Labels: Institutions, Harvester, Harvester Web Interface, Database, Summary Info, RSS Feed, OAI-PMH data provider, portals, higher-up repositories]
System Architecture: Harvester • Retrieves metadata from a set of ETD repositories • Via OAI-PMH interfaces • Performs incremental harvests • Performs record validation • Simple validation checks • Performs twice-daily harvests • Configurable via web frontend
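The incremental harvest described above can be sketched as follows. This is a minimal illustration and not the portal's actual harvester code: the class name, method names, and the example base URL are assumptions. It relies only on the OAI-PMH rules that a `ListRecords` request takes `metadataPrefix` and an optional `from` date, and that a follow-up request carries only the `resumptionToken`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the request URLs an incremental OAI-PMH harvest needs.
class HarvestRequest {

    /** First request: ask only for records changed since the previous harvest. */
    static String listRecordsUrl(String baseUrl, String metadataPrefix, String lastHarvest) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("?verb=ListRecords")
                .append("&metadataPrefix=").append(encode(metadataPrefix));
        if (lastHarvest != null) {
            // Omitting "from" requests a full harvest (e.g. the very first run).
            url.append("&from=").append(encode(lastHarvest));
        }
        return url.toString();
    }

    /** Follow-up request: resumptionToken is an exclusive argument in OAI-PMH. */
    static String resumeUrl(String baseUrl, String resumptionToken) {
        return baseUrl + "?verb=ListRecords&resumptionToken=" + encode(resumptionToken);
    }

    // URLEncoder.encode(String, Charset) requires Java 10 or later.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Example base URL is a placeholder, not a real repository.
        System.out.println(listRecordsUrl("http://repository.example.org/oai", "oai_dc", "2011-09-01"));
        System.out.println(resumeUrl("http://repository.example.org/oai", "abc/123"));
    }
}
```

A harvester built this way would store the date of each successful run per source repository and pass it as `lastHarvest` on the next scheduled run.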
System Architecture: Repository • Provides machine access points to the harvested metadata • OAI-PMH interface • Can use any SQL-compliant DB • Our implementation uses MySQL • Additional services provided • RSS feed of latest records • Summary statistics for records from each institution • Designed to fit into a hierarchy of OAI-PMH compliant DLs
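As an illustration of the "RSS feed of latest records" service, the sketch below renders a small RSS 2.0 document from a list of records. It is an assumption-laden example, not the repository's real code: the `Entry` fields, class name, and channel description text are hypothetical, and a production feed would more likely use an XML library than string building.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of rendering the latest harvested records as RSS 2.0.
class LatestRecordsFeed {

    /** Minimal record view; field names are illustrative, not the portal's schema. */
    static final class Entry {
        final String title;
        final String link;
        final String institution;

        Entry(String title, String link, String institution) {
            this.title = title;
            this.link = link;
            this.institution = institution;
        }
    }

    /** Escapes the characters that are unsafe in XML element text. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds the feed; RSS 2.0 requires title, link and description on the channel. */
    static String render(String channelTitle, String channelLink, List<Entry> latest) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<rss version=\"2.0\"><channel>\n");
        xml.append("<title>").append(escape(channelTitle)).append("</title>\n");
        xml.append("<link>").append(escape(channelLink)).append("</link>\n");
        xml.append("<description>Latest harvested records</description>\n");
        for (Entry e : latest) {
            xml.append("<item><title>").append(escape(e.title)).append("</title>")
               .append("<link>").append(escape(e.link)).append("</link>")
               .append("<description>").append(escape(e.institution))
               .append("</description></item>\n");
        }
        xml.append("</channel></rss>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        List<Entry> latest = Arrays.asList(
                new Entry("Soil & Water Use", "http://repository.example.org/record/1", "Institution X"));
        System.out.print(render("Latest ETDs", "http://repository.example.org/", latest));
    }
}
```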
System Architecture: Portal [Diagram. Labels: Repository, Harvester, Harvester Web Admin, Portal Database, Lucene, Portal Web Interface, RSS; functions shown: search, browse, statistics]
System Architecture: Portal • Harvests from Repository into portal DB • Lucene indexes records • Portal provides human interface • Allows keyword searching, browsing, category searches • Also offers links to OAI-PMH and RSS interfaces
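To show what indexing records for keyword search involves, the sketch below implements a tiny in-memory inverted index in plain Java. It deliberately does not use the Lucene API; it is a simplified stand-in for the idea behind it, and the class name, method names, and record identifiers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified stand-in for a search index: maps each term to the records containing it.
class KeywordIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes the text of one metadata record under its identifier. */
    void add(String recordId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(recordId);
        }
    }

    /** Returns the ids of records that contain every query term (AND semantics). */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> ids = postings.getOrDefault(term, Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);   // copy so the index itself is not modified
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        KeywordIndex index = new KeywordIndex();
        index.add("uct:1", "Digital libraries in South Africa");
        index.add("uct:2", "Soil erosion in South Africa");
        System.out.println(index.search("digital africa"));
    }
}
```

A real search library adds much more on top of this structure, such as relevance ranking, stemming, and on-disk storage, which is why the portal delegates indexing to Lucene rather than maintaining its own.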
Future Work • Packaging into Ubuntu repository • Generic browsing categories • Content Management System • Favourites, citation • Social media buttons • Facebook Like, Google Plus • Bug fixes
Links • Live portal @ www.netd.ac.za • Source Code Available @ http://dl.cs.uct.ac.za/projects/etd_portal Questions?