Texts and Digital Objects: What seems to have changed
The web as universal library • Generation I: the ASCII text • Generation II: the XML text • Generation III: the book as object
The web as universal library • Generation I: the ASCII text. A web of text nodes with documents at the nodes. • Generation II: the XML text. A web where the documents retain deep structure, but the web is still the library. • Generation III: the book as object. The library will be imported to the web. Page by page. Library by library. The web is simply a way of accessing the universal library of print objects.
But are we going backwards? Some of the movement looks a trifle retrograde.
Generation I • The primacy of texts: “Nodes can in principle also contain non-text information such as diagrams, pictures, sound, animation etc. The term hypermedia is simply the expansion of the hypertext idea to these other media.” (Tim Berners-Lee, 1989 proposal for a WWW, written at CERN) • Texts: hypertext, HTTP, and ASCII will do
Generation I, circa 1995: a forest of connected texts which, frankly, doesn’t look too great.
Project Gutenberg • Texts are what matter • Accuracy matters • Page numbering doesn’t • Typography doesn’t matter either
But a good deal is lost • Typography may not matter, but good web design does • Typography carries a lot of metadata • Metadata and the formal structure of the text need to be kept • Variety, flexibility, and machine-readability ... XML
Generation II, circa 2000: books repurposed for the web look a lot better than flat ASCII. But there is a big overhead.
Republished for the web • Inevitable duplication • Page numbers don’t matter • Typography can be optimised for web browsers • Structure and added value are preserved • Links and HTTP connections are fine • But this repurposing is a hassle and ultimately confusing
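To make "deep structure" concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The markup (book, chapter, head, p, emph) is a made-up vocabulary chosen for illustration, not a schema named in these slides. The same document can be queried by structure, or flattened to the Generation I plain text, at which point the chapter boundaries and emphasis are gone.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are assumptions for this sketch.
SAMPLE = """<book title="A Sample Book">
  <chapter n="1">
    <head>First Chapter</head>
    <p>Some <emph>emphasised</emph> prose.</p>
  </chapter>
  <chapter n="2">
    <head>Second Chapter</head>
    <p>More prose.</p>
  </chapter>
</book>"""


def chapter_heads(xml_text):
    """Use the structure: return (chapter number, heading) pairs."""
    root = ET.fromstring(xml_text)
    return [(ch.get("n"), ch.findtext("head")) for ch in root.iter("chapter")]


def flat_text(xml_text):
    """Discard the structure: return only the words, as a flat text would."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(root.itertext()).split())


print(chapter_heads(SAMPLE))
print(flat_text(SAMPLE))
```

The flattened string still reads correctly, but nothing in it says where a chapter starts or which word was emphasised.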
So Google has a better idea • Words matter • Pages matter • Books matter • Libraries matter • And they should be searched in the way that all other digital objects and collections can be searched
Generation III, circa 2005: put books on the web just as they are. Books, not texts, are the primary resource for a library.
Keep it simple • Scan every page of every book • OCR every word and symbol • Store every word and symbol in a database • Store an image of every page in the database • Know precisely where every word is on every page
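The steps above can be sketched as a small data model. This is an illustration only, not a description of Google's actual storage: the table and column names, the use of SQLite, and the (x, y, w, h) pixel boxes are all assumptions, and the OCR step itself is taken as already done.

```python
import sqlite3


def build_index(pages):
    """Store page images and word positions.

    pages: iterable of (book_id, page_no, image_path, words), where words is
    a list of (text, x, y, w, h) tuples as an OCR step might produce them.
    """
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE page (
            id INTEGER PRIMARY KEY,
            book_id TEXT, page_no INTEGER, image_path TEXT
        );
        CREATE TABLE word (
            page_id INTEGER REFERENCES page(id),
            text TEXT, x INTEGER, y INTEGER, w INTEGER, h INTEGER
        );
        CREATE INDEX word_text ON word(text);
    """)
    for book_id, page_no, image_path, words in pages:
        cur = db.execute(
            "INSERT INTO page (book_id, page_no, image_path) VALUES (?, ?, ?)",
            (book_id, page_no, image_path),
        )
        page_id = cur.lastrowid
        # Lower-case the words so that lookups are case-insensitive.
        db.executemany(
            "INSERT INTO word VALUES (?, ?, ?, ?, ?, ?)",
            [(page_id, t.lower(), x, y, w, h) for t, x, y, w, h in words],
        )
    db.commit()
    return db


def find_word(db, term):
    """Return (book_id, page_no, x, y, w, h) for every occurrence of term."""
    return db.execute(
        """SELECT p.book_id, p.page_no, w.x, w.y, w.w, w.h
           FROM word w JOIN page p ON p.id = w.page_id
           WHERE w.text = ?
           ORDER BY p.book_id, p.page_no, w.y, w.x""",
        (term.lower(),),
    ).fetchall()
```

Because each word row carries its page and its box, a search hit gives the page and the exact place on it.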
How the Google system works • The browser has a JPEG and some HTML around it • The web page is an image with search terms highlighted • The intelligence is in the database • Search is precise and fast • The Google database would be the universal library
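A rough sketch of "a JPEG and some HTML around it": given the page image and the boxes of the matched words, the server can emit the image with a translucent overlay for each hit. The function name, inline styles, and pixel-box format are assumptions made for this example. The slides do not say how Google actually builds the page.

```python
from html import escape


def render_page(image_url, width, height, boxes):
    """Return HTML for one scanned page with search hits highlighted.

    boxes: list of (x, y, w, h) pixel rectangles, one per matched word.
    """
    overlays = "".join(
        '<div style="position:absolute;background:rgba(255,255,0,0.4);'
        f'left:{x}px;top:{y}px;width:{w}px;height:{h}px"></div>'
        for x, y, w, h in boxes
    )
    return (
        f'<div style="position:relative;width:{width}px;height:{height}px">'
        f'<img src="{escape(image_url, quote=True)}" '
        f'width="{width}" height="{height}" alt="scanned page">'
        f"{overlays}</div>"
    )


print(render_page("a-001.jpg", 600, 900, [(10, 20, 50, 12)]))
```

The browser does no work beyond drawing the image and the boxes. Everything that needs knowledge of the words happens in the database lookup, which is the sense in which the intelligence is in the database.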
Pages really matter • Every print page is a web page • A book is just a collection of web pages • The concept of a ‘union catalogue’ will now have its correlative, a ‘union library collection’ (i.e. what is a duplicate?) • There is no such thing as a Google edition • Are the Google standards of preservation good enough?
Simplicity and Conservatism • Publishers should be flattered • Book designers, editors and typographers should be more than flattered • Authors are still authors • Catalogues and references work with minimal adjustment • Book warehouses become obsolete
So what is lost? • Perhaps publishers and authors lose profits? • The text is lost. The text is readable and searchable... But there is no text. • A searchable text, but not an entire and complete text. A collection of pages (JPEGs). • Certainly none of the deep structure of the XML is retained • Linkages and references are absent
What is gained? • Books: all texts, documents and libraries become fully searchable • Automation of reading and accessibility of rare editions • Incredibly cheap in relation to the enhanced availability • Bibliographies, catalogues and other systems of metadata are preserved
There is much left to do • No fine structure in the pages • Poor navigation within the books • The commercial model has to be invented • It will not all be advertising driven
Exact Editions uses a Google-style platform for magazines. The technology is similar, but the sociology is different.
Similar to Google Book Search • Platform for publishers of magazines • Publishers can add web functionality (links and advertisements) • PDF as input and automated production • Subscription or free access • Full web functionality (statistics and integration with web apps)
Adam Hodgkin adam.hodgkin@exacteditions.com