Workshop on XML-Based Library Applications
5. Library Applications (Part One)
Hong Kong University of Science & Technology Library
Outline
Part One
• Using XSLT (New Acquisitions List)
• Metadata Design (Electronic Journals)
• Multi-Script Considerations (Theses and Antique Maps)
Part Two
• XML Name Access Control Repository
New Acquisitions List (1)
• http://library.ust.hk/res/newbooks/
• Design considerations:
  • No need to build a database
  • Static files, one set for each week
  • Web interface provided by a Perl script
  • Weekly static files generated by a Perl script run as a nightly batch job
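The "one set of static files per week" design needs a rule for deciding which week a batch run belongs to. A minimal Python sketch (Python stands in for the Perl actually used); the `newbooks/` root, the ISO-week directory naming such as `2004-W05`, and the two file names are all hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath

def weekly_dir(run_date: date, root: str = "newbooks") -> PurePosixPath:
    """Return the directory for run_date's week (hypothetical scheme: one dir per ISO week).

    Any run that falls in the same ISO week maps to the same directory,
    so a rerun simply regenerates (overwrites) that week's files.
    """
    iso_year, iso_week, _ = run_date.isocalendar()
    return PurePosixPath(root) / f"{iso_year}-W{iso_week:02d}"

def weekly_files(run_date: date) -> list[PurePosixPath]:
    # Hypothetical file set: one HTML page and one RSS feed per week.
    d = weekly_dir(run_date)
    return [d / "index.html", d / "newbooks.rss"]
```

Using the ISO year from `isocalendar()` rather than `run_date.year` keeps a late-December run from being filed under the wrong year's week.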
New Acquisitions List (2)
[Diagram: Weekly List Generation]
• INNOPAC: Create Review List, producing a list of III record numbers
• Send xrecord= requests to INNOPAC to retrieve the metadata as IIIRECORDs
• Transformation by XSLT stylesheets of the IIIRECORDs into HTML pages and RSS files
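The retrieval step amounts to one request per record number in the review list. A minimal Python sketch that only builds the request URLs and contacts no server; the catalogue host is hypothetical, and the record-number format (`b` plus seven digits, with an optional trailing check character) is an assumption.

```python
import re

# Hypothetical WebPAC host; substitute the real catalogue address.
CATALOG_BASE = "http://catalog.example.edu"

# Assumed bib record number format: 'b' + 7 digits, optionally followed by a check character.
RECORD_RE = re.compile(r"^b\d{7}[\dx]?$")

def xrecord_urls(record_numbers, base=CATALOG_BASE):
    """Turn an exported review list (one record number per line) into xrecord= URLs."""
    urls = []
    for line in record_numbers:
        rec = line.strip().lower()
        if not rec:
            continue  # skip blank lines in the exported list
        if not RECORD_RE.match(rec):
            raise ValueError(f"unexpected record number: {line!r}")
        # Assumption: the xrecord= command takes the 8-character form without the check character.
        urls.append(f"{base}/xrecord={rec[:8]}")
    return urls
```

Each URL could then be fetched (for example with `urllib.request.urlopen`) to obtain one IIIRECORD document per title.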
New Acquisitions List (3)
• XSLT transformation of IIIRECORD to New Acquisitions Record
  • Requires several passes of XSLT
  • A locally developed tool converts EACC codes in “braced form” to UTF-8
• Sample IIIRECORD
• Resulting record after XSLT transformation
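The braced-form conversion can be illustrated with a short Python sketch. It assumes that "braced form" means each EACC character written as six hex digits in braces, such as `{213021}`, and it uses a hypothetical one-entry lookup table; a real converter would load the complete EACC-to-Unicode mapping published with the MARC-8 code tables. Unmapped codes are left untouched so they remain visible for review.

```python
import re

# Hypothetical excerpt of an EACC -> Unicode table (EACC 0x213021 assumed to be U+4E00).
EACC_TO_UNICODE = {
    "213021": "\u4e00",
}

BRACED_EACC = re.compile(r"\{([0-9A-Fa-f]{6})\}")

def braced_eacc_to_text(s: str, table=EACC_TO_UNICODE) -> str:
    """Replace every {hhhhhh} EACC code with its Unicode character."""
    def repl(m):
        # Keep unmapped codes in braced form rather than silently dropping them.
        return table.get(m.group(1).upper(), m.group(0))
    return BRACED_EACC.sub(repl, s)

def to_utf8(s: str) -> bytes:
    # After substitution the text is ordinary Unicode; encode it as UTF-8 for output.
    return braced_eacc_to_text(s).encode("utf-8")
```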
New Acquisitions List (4)
Conclusion:
• Using Perl scripts and XSLT stylesheets, a list of XML-formatted bibliographic records extracted from INNOPAC can be transformed into two completely different outputs (views): an HTML web page and an RSS news feed.
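The "one source, two views" idea can be sketched with Python's standard `xml.etree.ElementTree`. This substitutes plain Python functions for the XSLT stylesheets the library used, and the simplified record fields (`title`, `author`, `link`) and example URLs are hypothetical; the point is that both outputs are generated from the same list of records.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified records (the real input was IIIRECORD XML).
RECORDS = [
    {"title": "Sample Title", "author": "Author, A.", "link": "http://example.org/b1234567"},
]

def to_html(records, heading="New Acquisitions"):
    """View 1: an HTML page listing the records."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = heading
    ul = ET.SubElement(body, "ul")
    for r in records:
        li = ET.SubElement(ul, "li")
        a = ET.SubElement(li, "a", href=r["link"])
        a.text = r["title"]
        a.tail = f" / {r['author']}"
    return ET.tostring(html, encoding="unicode", method="html")

def to_rss(records, channel_link="http://example.org/newbooks/"):
    """View 2: an RSS 2.0 feed built from the same records."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "New Acquisitions"
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Weekly list of new titles"
    for r in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = r["title"]
        ET.SubElement(item, "link").text = r["link"]
        ET.SubElement(item, "description").text = r["author"]
    return ET.tostring(rss, encoding="unicode")
```

Adding a third view (say, a plain-text e-mail digest) would mean adding one more function over the same `RECORDS`, which mirrors adding one more stylesheet.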
Electronic Journals Online (1)
• http://library.ust.hk/res/ejournals/
• Design considerations:
  • Requires a database (on Tamino)
  • Metadata schema design
  • Indexing design
  • Weekly updating by a Perl script
  • Decided to use the Perl LibXML module instead of XSLT stylesheets
Electronic Journals Online (2)
[Diagram: INNOPAC → XML-formatted IIIRECORD → Weekly Update (by Perl and LibXML2) → EJ_RECORD → EJ Online]
• Extract elements
• Construct EJ_RECORD
• Load metadata into EJ Online
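The extract-and-construct steps can be sketched in Python, with `xml.etree.ElementTree` standing in for Perl and LibXML2. Both structures here are assumptions for illustration: the input is a simplified IIIRECORD in which each MARC field is a `VARFLD` carrying its tag in `MARCINFO/MARCTAG` and its subfields in `MARCSUBFLD` elements, and the output element names (`TITLE`, `ISSN`, `URL`) are hypothetical, not the actual EJ_RECORD schema.

```python
import xml.etree.ElementTree as ET

def subfields(record, tag, code):
    """Yield the data of subfield `code` in every field with MARC tag `tag`.

    Assumes a simplified IIIRECORD layout: VARFLD/MARCINFO/MARCTAG holds the tag and
    VARFLD/MARCSUBFLD/{SUBFIELDINDICATOR,SUBFIELDDATA} hold the subfields.
    """
    for fld in record.iter("VARFLD"):
        if fld.findtext("MARCINFO/MARCTAG") != tag:
            continue
        for sub in fld.findall("MARCSUBFLD"):
            if sub.findtext("SUBFIELDINDICATOR") == code:
                yield (sub.findtext("SUBFIELDDATA") or "").strip()

def build_ej_record(iiirecord_xml: str) -> ET.Element:
    """Extract selected MARC elements and construct a (hypothetical) EJ_RECORD."""
    rec = ET.fromstring(iiirecord_xml)
    ej = ET.Element("EJ_RECORD")
    for title in subfields(rec, "245", "a"):  # 245 $a: title proper
        ET.SubElement(ej, "TITLE").text = title
    for issn in subfields(rec, "022", "a"):   # 022 $a: ISSN
        ET.SubElement(ej, "ISSN").text = issn
    for url in subfields(rec, "856", "u"):    # 856 $u: link to the electronic resource
        ET.SubElement(ej, "URL").text = url
    return ej
```

Loading into Tamino is not shown; the sketch stops at the constructed element, which `ET.tostring(ej, encoding="unicode")` would serialize.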
Electronic Journals Online (3)
Metadata Design
• Decided not to use Dublin Core (DC):
  • Internal metadata, not for exchange with external systems
  • Programming overhead to incorporate DC
  • DC would need to be extended to mark up MARC tag 856 (Electronic Location and Access), the hypertext link to the electronic resource
Electronic Journals Online (4)
• Decided not to use RDF, for the same reasons as above, although RDF can resolve the tag 856 markup problem that DC has
• Sample abridged EJ_RECORD
Antique Maps and Theses (1)
• Antique Maps: http://library.ust.hk/res/maps/
• HKUST Theses: http://library.ust.hk/cgi/db/thesis.pl
• Design considerations:
  • Both databases are on Tamino
  • Metadata as XML documents
  • Hypertext links to PDF files
Antique Maps and Theses (2)
Multi-script Considerations
• Non-English characters:
  • Diacritics
  • Mathematical symbols and formulas
  • Greek letters
  • CJK
• XML documents are UTF-8 by default (when no encoding is declared)
• Tamino stores XML documents in Unicode
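The UTF-8 default can be checked with Python's standard XML parser: a byte string with no encoding declaration is decoded as UTF-8, so Greek and CJK text survives intact. Numeric character references such as `&#x3B1;` offer a second way to enter these characters using only ASCII. The element names `map` and `title` are hypothetical.

```python
import xml.etree.ElementTree as ET

# No <?xml ... encoding="..."?> declaration, so the parser assumes UTF-8.
raw = "<map><title>中國 α</title></map>".encode("utf-8")

# The same title typed in pure ASCII with numeric character references
# (each reference names the Unicode scalar value directly).
ascii_only = b"<map><title>&#x4E2D;&#x570B; &#x3B1;</title></map>"

def parse_title(xml_bytes: bytes) -> str:
    """Parse XML bytes and return the text of the <title> element."""
    return ET.fromstring(xml_bytes).findtext("title")
```

Both `parse_title(raw)` and `parse_title(ascii_only)` return the same Unicode string.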
Antique Maps and Theses (3)
• Unicode and UTF-8 explained:
  • Developed by the Unicode Consortium (http://www.unicode.org) since 1991
  • A character encoding standard for the written texts of diverse languages
  • The latest version is 4.0, released in 2003
  • Contains 96,382 characters, 82,270 of which are CJK characters (including Hangul)
Antique Maps and Theses (4)
• Diacritics: combining characters are positioned relative to an associated base character (e.g. a + combining dot above gives ȧ)
• UTF-8 transforms a Unicode scalar value into a sequence of 8-bit bytes. Basic Latin (English) letters take one byte; CJK ideographs in the Basic Multilingual Plane take three bytes.
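Python's `unicodedata` module illustrates both points on this slide: a base letter followed by a combining mark, NFC normalization composing that pair into a single precomposed character where one exists, and the differing UTF-8 byte lengths per character.

```python
import unicodedata

base_plus_mark = "a\u0307"  # 'a' + COMBINING DOT ABOVE: two code points
composed = unicodedata.normalize("NFC", base_plus_mark)  # one code point, U+0227

def describe(s: str):
    """List (code point, character name, UTF-8 length in bytes) for each character in s."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c), len(c.encode("utf-8"))) for c in s]
```

The decomposed and precomposed forms display identically but differ byte for byte, so metadata entered one way and searched the other may not match. Normalizing to a single form (NFC is assumed here) before storing or indexing is a common safeguard.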
Antique Maps and Theses (5)
• Examples of UTF-8 transformation:
  • The Latin letter A has the Unicode scalar value U+0041. It is transformed to \x41.
  • The Greek letter α has the Unicode scalar value U+03B1. It is transformed to \xCE\xB1.
  • The Chinese character 中 has the Unicode scalar value U+4E2D. It is transformed to \xE4\xB8\xAD.
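The byte sequences above follow from UTF-8's bit-packing rules. This hand-written Python encoder is for illustration only (the built-in `str.encode("utf-8")` does the same job); it reproduces the three examples and can be checked against the built-in.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 by hand."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:       # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

for ch in "Aα中":
    print(f"U+{ord(ch):04X} -> {utf8_encode(ord(ch)).hex(' ').upper()}")
# U+0041 -> 41
# U+03B1 -> CE B1
# U+4E2D -> E4 B8 AD
```

For 中 (U+4E2D, binary 0100 111000 101101), the top four bits go into the lead byte 1110xxxx and each following six-bit group into a 10xxxxxx continuation byte.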
Antique Maps and Theses (6)
• Demonstration: entering non-Latin characters into the metadata