Data Desiccation: Facilitating Long-term Access, Use, and Reuse of ETDs. Daniel Gelaw Alemneh and Mark Edward Phillips. 14th International Symposium on Electronic Theses and Dissertations (ETD-2011), 13-17 Sept. 2011, Cape Town, South Africa.
UNT’s ETDs
- General Background
- Libraries' Role
Background
• The University of North Texas (UNT) began accepting theses and dissertations in electronic format in 1999.
• UNT was one of the early adopters of what was to become the ETD movement in higher education.
• UNT was one of the first three American universities to require ETDs for graduation.
UNT Libraries' Role
• The UNT Libraries play an active role in facilitating access to UNT’s ETDs.
• In 2007 the Digital Projects Unit took on a stewardship role:
  • Develop appropriate metadata.
  • Integrate value-added services into the ETDs.
• In 2010 we started retrospective conversion projects:
  • Digital retro-conversion (in-house project) for pre-1999 theses and dissertations previously available only in paper or microform.
  • Digital retro-conversion for ETDs (1999 to 2009) previously available only in PDF file format.
What makes up UNT’s ETDs?
- UNT ETDs Size
- By Access Level
- By Degree Level
Access Levels of UNT’s ETDs
• 1. Public:
  • These ETDs are open; there are no restrictions on access.
• 2. Restricted:
  • 2.1 UNT-Community:
    • These ETDs are restricted to users associated with UNT.
    • Users located outside the UNT campus are normally required to log in using their EUID.
    • Restricted ETDs after 2007 have a delay (2-5 years), after which they are moved to "Public".
  • 2.2 UNT-Strict:
    • These ETDs are restricted to the UNT community.
    • This restriction is strictly enforced: users are always required to log in using their EUIDs, regardless of their location.
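The access rules above can be sketched as a small decision function. This is only an illustration of the logic on the slide, not UNT's implementation: the `Etd` class, the `public_from` field, and the `on_campus` flag are hypothetical names introduced here.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class AccessLevel(Enum):
    PUBLIC = "public"
    UNT_COMMUNITY = "unt-community"
    UNT_STRICT = "unt-strict"


@dataclass
class Etd:
    level: AccessLevel
    # Hypothetical field: the date on which a delayed ETD moves to "Public".
    public_from: Optional[date] = None


def effective_level(etd: Etd, today: date) -> AccessLevel:
    """Treat a restricted ETD as public once its delay period has ended."""
    if etd.public_from is not None and today >= etd.public_from:
        return AccessLevel.PUBLIC
    return etd.level


def login_required(etd: Etd, on_campus: bool, today: date) -> bool:
    """Return True if the user must log in with an EUID to view the ETD."""
    level = effective_level(etd, today)
    if level is AccessLevel.PUBLIC:
        return False
    if level is AccessLevel.UNT_COMMUNITY:
        # Login is only needed from outside the campus.
        return not on_campus
    # UNT-Strict: login is always required, regardless of location.
    return True
```

For example, `login_required(Etd(AccessLevel.UNT_COMMUNITY), on_campus=True, today=date(2011, 9, 13))` returns `False`, while the same call with `AccessLevel.UNT_STRICT` returns `True`.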
Data Desiccation
- Overview
- Magick Numbering
- Multiple Data Formats
- Submission Information Package (SIP)
Data Desiccation
• In the context of the UNT ETDs, data desiccation first involves converting the deposited PDF into a series of image files that serve as the primary access point to the documents online.
• High-quality JPEG is used as the image format.
• Magick numbering gives each image an eight-digit filename built from two running sequences of numbers.
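A minimal sketch of how such a filename could be built is shown below. The slide states only that the name has eight digits and involves two running sequences; the split into two zero-padded four-digit halves, and the parameter names `sequence` and `page`, are assumptions made for illustration.

```python
def magick_name(sequence: int, page: int, ext: str = "jpg") -> str:
    """Build an eight-digit filename from two running numbers.

    Assumption: each number occupies four zero-padded digits. The source
    does not specify the split or what each sequence counts.
    """
    for value in (sequence, page):
        if not 0 <= value <= 9999:
            raise ValueError("each sequence must fit in four digits")
    return f"{sequence:04d}{page:04d}.{ext}"


# Names for the first three images of a hypothetical document.
names = [magick_name(n, n) for n in range(1, 4)]
```

Under these assumptions, `magick_name(12, 3)` yields `00120003.jpg`.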
Multiple Data Formats
• PDF
  • Originally deposited version
• JPGs and derivatives
  • A series of derivatives converted from the original PDF:
    • jpg: serves as the primary access point to the documents online
    • pro: the proprietary format from the PrimeOCR engine
    • xml: a UNT-specific word bounding box file
    • txt: ASCII text file converted from the pro format
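The derivative set for one page can be modelled as a simple lookup keyed by file extension. The sketch below assumes that all derivatives of a page share the same filename stem and sit in the same directory; that layout and the helper names are hypothetical and not stated on the slide.

```python
from pathlib import Path
from typing import Dict, List

# Extension -> role, as listed on the slide.
DERIVATIVES: Dict[str, str] = {
    "jpg": "page image; primary access point online",
    "pro": "proprietary output of the PrimeOCR engine",
    "xml": "UNT-specific word bounding box file",
    "txt": "ASCII text converted from the pro file",
}


def derivative_paths(page_image: Path) -> Dict[str, Path]:
    """Map each extension to the sibling file expected for this page."""
    return {ext: page_image.with_suffix(f".{ext}") for ext in DERIVATIVES}


def missing_derivatives(page_image: Path) -> List[str]:
    """List the extensions whose files do not exist on disk."""
    paths = derivative_paths(page_image)
    return [ext for ext, path in paths.items() if not path.exists()]
```

A check like `missing_derivatives(Path("etd/00010001.jpg"))` could then flag pages whose OCR or text files were never generated.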
Enhancing UNT’s ETDs Access/Use via Desiccation
- Multiple Formats Access Strategy
- Access by Degree Level
- Access by Country
- Access via Mobile Devices
Multiple Formats Access Strategy
• In addition to the originally deposited PDF format, the data desiccation process provides and facilitates additional methods of access by:
  • exposing the page-level OCR text to an increasing number of search engines
  • allowing page-turning interfaces or other interfaces designed for emerging mobile devices
Multiple Formats Access Strategy (continued)
• Longitudinal data will be collected to see if desiccated ETDs receive more use than the older, single-format PDF versions.
• We are already witnessing an overall increase in access to the ETDs in the UNT Digital Library.
Summary References
Summary
• Given the pressure to read more in less time, today’s users demand access to various formats regardless of temporal and spatial restrictions and the types of devices used.
• Based on the data, users:
  - increasingly use mobile devices
  - come from different countries (with varied bandwidth)
  - view one or a few pages
  - visit just once
• Understanding user communities, their information needs, and their use behavior will help to move content into the users’ space and facilitate access to and use of ETDs.
Summary (continued)
• The successful management of ETDs requires a multifaceted effort across the entire life cycle to ensure that ETDs are managed, preserved, and made accessible in the manner that today’s users expect.
• Over the past year, the UNT Libraries have put great effort into making digital collections more accessible and useful in research processes.
• Data desiccation, or providing multiple format options, facilitates both enhanced and long-term access to the contents of ETDs.
References
• The University of North Texas (UNT) ETD-Progress: http://www.library.unt.edu/digitalprojects/procedures/etd/etd-progress
• UNT Metadata: http://www.library.unt.edu/digitalprojects/metadata
• UNT Theses and Dissertations: http://digital.library.unt.edu/explore/collections/UNTETD/browse/
Questions? Mark.Phillips@unt.edu and/or Daniel.Alemneh@unt.edu