Digital Libraries. Nick Narcise, April 4th, 2006
What is a Digital Library? Definition from Wikipedia: A digital library is a library in which a significant proportion of the resources are available in machine-readable format (as opposed to print or microform), accessible by means of computers. The digital content may be locally held or accessed remotely via computer networks.
What Do You Do with a Million Books? • Gregory Crane, Tufts University. D-Lib Magazine, March 2006, Volume 12, Number 3, ISSN 1082-9873. http://www.dlib.org/dlib/march06/crane/03crane.html
Main Focus: The ability to extract from the stored record of humanity useful information in an actionable format for any given human being of any culture at any time and in any place.
Reduce the tangle of text mining, analysis, and searching technologies • Converting analog source to text • Translating one language to another • Transforming raw text into data
How is a Library Digitized? The process of digitizing a library began with the catalog, moved to periodical indexes and abstracting services, then to periodicals and large reference works, and finally to book publishing. Some of the largest and most successful digital libraries are Project Gutenberg, ibiblio and the Internet Archive.
Optical Character Recognition. From Wikipedia, the free encyclopedia: Optical character recognition, usually abbreviated to OCR, involves computer software designed to translate images of typewritten text (usually captured by a scanner) into machine-editable text, or to translate pictures of characters into a standard encoding scheme that represents them (such as ASCII or Unicode).
Problems with OCR • The output may contain recognition errors • On its own, the raw text is useless as a knowledge base • Human beings are still much better than machines at reading and interpreting the contents of page images.
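One common way to put a number on "may have errors" is a character error rate: the edit distance between the OCR output and a hand-checked transcription, divided by the length of the transcription. The sketch below is only an illustration; the function names and the misread sample string are assumptions, not something taken from the cited articles.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text, reference):
    """Edit distance normalized by the length of the trusted transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ocr_text, reference) / len(reference)


# Hypothetical OCR output in which the letter "l" was misread as the digit "1".
reference = "Prussian consulate"
ocr_output = "Prussian consu1ate"
rate = character_error_rate(ocr_output, reference)
```

A single misread character in an 18-character phrase already gives a rate above 5 percent, which hints at why uncorrected OCR is hard to search reliably.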
Text, Information, Knowledge and the Evolving Record of Humanity • Gregory Crane and Alison Jones, Tufts University. D-Lib Magazine, March 2006, Volume 12, Number 3, ISSN 1082-9873. http://www.dlib.org/dlib/march06/jones/03jones.html
C. Montgomery Burns: "I'd like to send this letter to the Prussian consulate in Siam by aeromail. Am I too late for the 4:30 autogyro?" Clerk: "Uhhh, I better look in the manual ..." Burns: "The ignorance! ..." Clerk: "This book must be out of date – I don't see 'Prussia,' 'Siam' or 'autogyro.'" From "Mother Simpson," The Simpsons Television Show, Episode 3F06
Digital Reference Materials. Thesaurus of Geographic Names (TGN) • Includes names and other information about places such as cities, counties, and nations, and their associated physical features such as mountains, coasts and rivers. Other information related to history, population, culture, art and architecture is included. • TGN can associate the obsolete name Siam with the nation of Thailand (tgn,1000142), but also with towns named Siam in Iowa (tgn,2035651), Tennessee (tgn,2101519), and Ohio (tgn,2662003). Prussia appears, but only as a general region (tgn,7016786), with no indication of when or if it was a sovereign nation. Alexandria Digital Library (ADL) • Represents a sophisticated framework with which to create such resources: places can be associated with temporal information about their foundation (e.g., Washington, DC, founded on 16 July 1790).
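The ambiguity of a name like Siam can be illustrated with a toy lookup table. The sketch below uses only the TGN identifiers quoted on the slide; the record layout, the field names and the helper functions are assumptions made for illustration and do not reflect the real TGN data model or any TGN API.

```python
# Toy gazetteer holding only the identifiers quoted on the slide.
# Field names ("tgn_id", "place", "kind") are hypothetical.
GAZETTEER = {
    "Siam": [
        {"tgn_id": 1000142, "place": "Thailand", "kind": "nation"},
        {"tgn_id": 2035651, "place": "Siam, Iowa", "kind": "town"},
        {"tgn_id": 2101519, "place": "Siam, Tennessee", "kind": "town"},
        {"tgn_id": 2662003, "place": "Siam, Ohio", "kind": "town"},
    ],
    "Prussia": [
        {"tgn_id": 7016786, "place": "Prussia", "kind": "general region"},
    ],
}


def lookup(name):
    """Return every candidate record for a place name (empty list if unknown)."""
    return GAZETTEER.get(name, [])


def candidates_of_kind(name, kind):
    """Narrow the candidates for a name to one kind of place, e.g. 'nation'."""
    return [record for record in lookup(name) if record["kind"] == kind]


siam_matches = lookup("Siam")                          # four competing readings
siam_as_nation = candidates_of_kind("Siam", "nation")  # only the Thailand record
prussia_as_nation = candidates_of_kind("Prussia", "nation")  # nothing: region only
```

The empty result for Prussia mirrors the slide's point: without temporal information, the table cannot say that Prussia was ever a sovereign nation.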
Consider the sentence “The current price of tea in China is 35 cents per pound."
A digital library would be very useful if it could • plot the prices of various commodities in different markets over time, • plot the lifetimes of individuals, or • extract and classify many events.
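As a minimal sketch of turning such a sentence into plottable data, the pattern below pulls a commodity, market, amount, currency and unit out of sentences shaped like the tea example. The regular expression and the field names are assumptions for illustration; real information extraction has to cope with far more varied phrasing.

```python
import re

# Hypothetical pattern that only covers sentences of the form
# "... price of <commodity> in <Market> is <amount> <currency> per <unit>".
PRICE_PATTERN = re.compile(
    r"price of (?P<commodity>\w+) in (?P<market>[A-Z]\w+) is "
    r"(?P<amount>\d+(?:\.\d+)?) (?P<currency>\w+) per (?P<unit>\w+)"
)


def extract_price_fact(sentence):
    """Return a dict describing the price statement, or None if nothing matches."""
    match = PRICE_PATTERN.search(sentence)
    if match is None:
        return None
    fact = match.groupdict()
    fact["amount"] = float(fact["amount"])  # numeric value so it can be plotted
    return fact


sentence = "The current price of tea in China is 35 cents per pound."
fact = extract_price_fact(sentence)
```

Records like this, collected across many dated documents, are the kind of data a price-over-time plot would need.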
Digital Reference Materials • Carefully transcribed primary sources <l n="22">Forte fuit iuxta tumulus, quo cornea summo</l> • Gazetteers and semi-structured text sources <div2 type="entry"><head>AARONSBURG</head><p>P v., Hains t., Centre co., Pa. It is at the eastern extremity of Penn's valley, near Penn's creek, 32 m. Bellefonte, 89 N.W. Harrisburg. 181 W. It contains a lutheran church, two stores, and 450 inhab • Citation-based authority lists <div1 type="entry" id="abdera"><head>Abdera</head><div2 type="subentry" id="abdera-1"><head>Abdera, city of Thrace</head><div3 type="index"><list type="index"><item><bibl n="Paus. 6.5.4">Paus. 6.5.4</bibl>, <bibl n="Paus. 6.14.12">Paus. 6.14.12</bibl></item><item>a town of Thrace on the Nestus: <bibl n="Hdt. 1.168">Hdt. 1.168</bibl>, <bibl n="Hdt. 6.46">Hdt. 6.46</bibl>, <bibl n="Hdt. 7.109">Hdt. 7.109</bibl>, <bibl n="Hdt. 7.120">Hdt. 7.120</bibl>, <bibl n="Hdt. 7.126">Hdt. 7.126</bibl></item><item>founded at grave of Abderus: <bibl n="Apollod. 2.5.7">Apollod. 2.5.7</bibl></item><item>Xerxes' first halt in his flight: <bibl n="Hdt. 8.120">Hdt. 8.120</bibl></item></list></div3></div2></div1>
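Because a citation-based authority list is already structured markup, a program can read it directly. The sketch below parses an abridged copy of the Abdera entry with Python's standard xml.etree.ElementTree and pairs each note with the citations attached to it. The function name is hypothetical and the entry is shortened for space.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Abdera entry shown on the slide.
ENTRY_XML = """
<div1 type="entry" id="abdera"><head>Abdera</head>
  <div2 type="subentry" id="abdera-1"><head>Abdera, city of Thrace</head>
    <div3 type="index"><list type="index">
      <item><bibl n="Paus. 6.5.4">Paus. 6.5.4</bibl>, <bibl n="Paus. 6.14.12">Paus. 6.14.12</bibl></item>
      <item>a town of Thrace on the Nestus: <bibl n="Hdt. 1.168">Hdt. 1.168</bibl>, <bibl n="Hdt. 6.46">Hdt. 6.46</bibl></item>
      <item>founded at grave of Abderus: <bibl n="Apollod. 2.5.7">Apollod. 2.5.7</bibl></item>
      <item>Xerxes' first halt in his flight: <bibl n="Hdt. 8.120">Hdt. 8.120</bibl></item>
    </list></div3>
  </div2>
</div1>
"""


def citations_by_note(xml_text):
    """Return (note, [citation, ...]) pairs, one pair per <item> in the entry."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for item in root.iter("item"):
        # Text before the first <bibl> is the human-readable note, if any.
        note = (item.text or "").strip().rstrip(":")
        refs = [bibl.get("n") for bibl in item.findall("bibl")]
        results.append((note, refs))
    return results


pairs = citations_by_note(ENTRY_XML)
```

Each `n` attribute is a canonical citation such as "Hdt. 8.120", so the pairs can serve as links from a place name to the passages that mention it.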
Digital Reference Materials • Machine-readable dictionaries <entryFree id="n3709" key="a)krwth/rion" type="main"><orth extent="full" lang="greek">a)krwth/rion</orth>, <gen lang="greek">to/</gen>, (<etym lang="greek">a)/kros</etym>)<sense id="n3709.0" n="A" level="1"><tr>topmost</tr> or <tr>prominent part</tr>, <foreign lang="greek">a). tou= ou)/reos</foreign> mountain <tr>peak</tr>, <bibl n="Perseus:abo:tlg,0016,001:7:217"><author>Hdt.</author><biblScope>7.217</biblScope></bibl> • General encyclopedias
A Research Library Based on the Historical Collections of the Internet Archive • William Y. Arms, Selcuk Aya, Pavel Dmitriev: Computer Science Department, Cornell University • Blazej Kot: Information Science, Cornell University • Ruth Mitchell, Lucia Walle: Cornell Theory Center, Cornell University. D-Lib Magazine, February 2006, Volume 12, Number 2, ISSN 1082-9873. http://www.dlib.org/dlib/february06/arms/02arms.html
Main Idea of Article: Academic researchers have to comb through collections of libraries, museums, and archives to analyze and synthesize the information buried within them.
A Web Library for Social Science Research. The idea is to replace much of the tedious manual effort with computer programs that act as the researchers' agents. The challenge was to organize the materials and provide powerful, intuitive tools that make a huge collection of semi-structured data accessible to researchers, without demanding high levels of computing expertise.