Computing - The Next 10 Years. Infinite Memory and Bandwidth: Implications for Universal Access to Information. Raj Reddy, Carnegie Mellon University, Pittsburgh, USA. April 6, 2001. Talk presented at Georgia Tech 10th Anniversary Convocation.
Future Technology • Computational power doubles every 18 months (Moore’s Law) • 100-fold improvement every 10 years • Disk Densities double every 12 months • 1000-fold improvement every 10 years • Optical bandwidth doubling every 9 months • 10000-fold improvement every 10 years • Infinite Bandwidth and Memory before Computation • Cost decreasing, density increasing
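The ten-year factors on this slide follow from compounding the stated doubling periods. A minimal sketch of that arithmetic, where the function name `growth_factor` is illustrative and not from the talk:

```python
def growth_factor(doubling_months: float, years: float = 10.0) -> float:
    """Improvement factor after `years` if capacity doubles every `doubling_months`."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


# Doubling periods as quoted on the slide.
for label, months in [("computation", 18), ("disk density", 12), ("optical bandwidth", 9)]:
    print(f"{label}: about {growth_factor(months):,.0f}x in 10 years")
```

The results land close to the slide's rounded figures of 100-fold, 1000-fold and 10000-fold, which is why bandwidth and storage are expected to become abundant before computation does.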
What does the future hold? We can see some glimpses of the future: • Universities without walls • Computers that never fail and self-healing software • Every home with giga PCs connected by gigabit networks • Access to all the published creative works of the world • anytime, anywhere, anyone • Emergence of the World Bank of, not money, but Knowledge • Systems, so-called geriatric robotics, that help the disabled lead normal lives • Systems that give the rest of us superhuman capabilities, like getting a month’s work done in a day
Universal Access to Information Information at your fingertips • Access to all human knowledge: • Anyone • Anywhere • Anytime
All Human Knowledge Recorded Information • Books • Periodicals (journals, newspapers) • Music, opera, dance • Paintings, Sculptures and Monuments • Movies, video • Databases, software Suppose all of this were on the Web
Examples from www.ulib.org • Lecture: Michael Shamos on UL • Books: A Child’s History of England • Art: Greek Art
What is a book? • Collection of static content • Linearly organised • Selected by an author as related • Occupying a single physical location • Physically bound between covers
What is a digital book? • Collection of dynamic multimedia content • Browsable, navigable • Selected by the user as related • No physical existence • Instantly transmittable
What is a Library? • Collection of items • Linearly organized (shelves) • Chosen by budget constraints • Occupying physical space • Cataloged for access
What is a Digital Library? • Collection of digital items • (potentially huge) • Encompassing everything (someday) • Organized arbitrarily • Occupying no physical space • Fully content-searchable
Universal Library Implications • Elimination of time, space, cost constraints • Democratization of information • “Knowledge is power” • Hyperlinks to related information • Preservation and Dissemination of Knowledge • faster and wider • Backup preservation • Preservation of culture
Universal Library Implications • Research • Web of scholarly information, reviews • Teaching • Support for distance education • Academic publishing • Virtual museums • Interactivity
Universal Library Applications • Access to “Born Digital” Information • World produces a billion billion (10^18) bytes of information every year (Lyman and Varian) • 90% is stored digitally • Digital museum • Digital tour guide • What’s in the Taj Mahal?
Universal Library Applications • Research assistant • What did Newton write about color? • What are Moslem views on race? • Teaching resource • “Act out” books in virtual reality • Real-time explanations • Business information • Data mining
We Can Store Everything • 1 book = 500 pp. • 1 MB uncompressed, 300 KB compressed • 10^8 to 3 x 10^8 books = ~10^14 bytes = 100 terabytes • Over 100 million computers on the Internet • At 1 GB each, >100 petabytes now • 1 GB of disk costs ~$3 • 100 terabytes < $300 thousand to $1 million
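The storage and cost estimates on this slide can be checked with a few lines of arithmetic. This is a sketch using only the slide's 2001 figures; the helper names and the decimal unit definitions (1 GB = 10^9 bytes) are assumptions made for illustration:

```python
GB = 10**9
TB = 10**12
PB = 10**15

BOOK_BYTES = 1_000_000   # 1 MB per 500-page book, uncompressed (slide figure)
COST_PER_GB_USD = 3.0    # ~$3 per GB of disk (slide figure, 2001)


def library_bytes(num_books: int, bytes_per_book: int = BOOK_BYTES) -> int:
    """Total bytes needed to hold `num_books` books."""
    return num_books * bytes_per_book


def disk_cost_usd(total_bytes: int, cost_per_gb: float = COST_PER_GB_USD) -> float:
    """Raw disk cost for `total_bytes`, ignoring redundancy and operations."""
    return total_bytes / GB * cost_per_gb


low, high = library_bytes(10**8), library_bytes(3 * 10**8)
print(f"{low / TB:.0f} to {high / TB:.0f} TB")
print(f"${disk_cost_usd(low):,.0f} to ${disk_cost_usd(high):,.0f}")

# 100 million Internet hosts contributing 1 GB each.
print(f"{10**8 * GB / PB:.0f} PB of pooled disk")
```

Under these assumptions the low and high book counts give the slide's range of roughly $300 thousand to $1 million for uncompressed storage.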
Non-textual Material • 1 Movie = 10 GB • 1 petabyte = 100,000 movies • All the movies ever made! • Audio • 1 petabyte = 3000 years of music • All music ever performed or recorded • Paintings and Photos @ 1 MB • 1 petabyte = 1 billion paintings or photos
Non-textual Material • Gore’s Digital Earth • “A multi-resolution, three-dimensional representation of the planet, into which we can embed vast quantities of geo-referenced data.” • Area of Earth ≈ 1/2 peta m^2 • 1000 bytes/m^2 feasible • 2 MB/m^2 not practical yet ⇒ 10^21 bytes = 1 zettabyte • {peta-, exa-, zetta-, yotta-}
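The petabyte and zettabyte figures on the two "Non-textual Material" slides reduce to simple division and multiplication. A sketch of that arithmetic follows; the function names are hypothetical, units are decimal, and the year length of 365.25 days is an assumption:

```python
MB, GB, PB = 10**6, 10**9, 10**15
EARTH_AREA_M2 = 5 * 10**14            # about 1/2 peta square metres (slide figure)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def items_per_petabyte(bytes_per_item: int) -> int:
    """How many items of a given size fit in one petabyte."""
    return PB // bytes_per_item


def implied_audio_bitrate(years_per_petabyte: float) -> float:
    """Bits per second implied by storing that many years of audio in 1 PB."""
    return PB * 8 / (years_per_petabyte * SECONDS_PER_YEAR)


def digital_earth_bytes(bytes_per_m2: int, area_m2: int = EARTH_AREA_M2) -> int:
    """Storage for a Digital Earth at a given data density per square metre."""
    return bytes_per_m2 * area_m2


print(items_per_petabyte(10 * GB), "movies per petabyte")
print(items_per_petabyte(MB), "paintings or photos per petabyte")
print(f"{implied_audio_bitrate(3000) / 1000:.1f} kbit/s implied audio rate")
print(f"{digital_earth_bytes(1000):.1e} bytes at 1000 bytes/m^2")
print(f"{digital_earth_bytes(2 * MB):.1e} bytes at 2 MB/m^2")
```

The audio figure works out to roughly 85 kbit/s, which suggests the slide assumes compressed audio rather than uncompressed CD-quality sound.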
Technological Challenges • Input (scanning, digitizing, OCR) • Data representation • text, notations, images, web pages • Navigation and Search • Multilingual Issues • Output (voice, pictures, virtual reality) • Synthetic Documents
Universal Library Design • Modular • Technology plug-ins (e.g. machine translation) • Distributed • Mirror sites • Multiple interfaces • Human (languages, cultures, literacy) • Machine
Universal Library Design • Speech input/output • Pictorial output • Language support • Translation assistants • Summarization tools • Synthetic documents • Encyclopedia-on-demand
Input Issues • Non-digital media • Conversion, scanning, correction • Triple keyboard, uncorrected OCR • Digital media • Formats, conversions, color representation • ASCII, HTML, SGML, XML, PDF, PS, TEX • JPEG, TIFF, GIF?
Input Issues • Structured matter • Musical notation, Laban • Chemistry • 3D Items • Resource allocation (what’s first?) • Duplication of effort (no registry)
Metadata • Data about an item not part of the item • Bibliographic • Format, medium, encoding, resolution • Provenance • Reliability, integrity • Permissions • Who generates metadata?
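One way to picture the metadata categories listed above is as a record kept alongside each item. The sketch below is purely illustrative: the class name, field names and the use of a SHA-256 checksum for the integrity category are assumptions, not the Universal Library's actual schema.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ItemMetadata:
    # Bibliographic
    title: str
    creator: str
    # Format, medium, encoding, resolution
    media_format: str
    encoding: str
    # Provenance and permissions
    provenance: str
    permissions: str
    # Integrity: checksum of the item's bytes (assumed mechanism)
    sha256: str = ""


def describe(content: bytes, **fields: str) -> ItemMetadata:
    """Build a metadata record and stamp it with a checksum of the content."""
    return ItemMetadata(sha256=hashlib.sha256(content).hexdigest(), **fields)


def is_intact(content: bytes, meta: ItemMetadata) -> bool:
    """True if the content still matches the checksum recorded in its metadata."""
    return hashlib.sha256(content).hexdigest() == meta.sha256


page = b"sample page text"
meta = describe(
    page,
    title="A Child's History of England",
    creator="Charles Dickens",
    media_format="text",
    encoding="ASCII",
    provenance="hypothetical scanning site",
    permissions="public domain",
)
print(is_intact(page, meta), is_intact(page + b"!", meta))
```

Keeping the record separate from the item matches the slide's definition of metadata as data about an item that is not part of the item. It leaves open the slide's question of who generates it.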
Navigation: Making Sense Of The World’s Knowledge • Browsing, finding, searching, flying • Fractal view • Keys are granularity and connectivity • View whole collections or one glyph • Understanding structure of information
Searching Mathematics • MATHEMATICA canonical form: Integrate[Times[Power[E,Times[-1,Power[V1,2]]],Sin[Power[V1,2]]],{V1,0,Infinity}]
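The placeholder `V1` in the canonical form suggests that variable names are normalized so that the same integral matches regardless of the symbol a writer chose. The slide does not say how this is done. The sketch below shows one possible approach, assuming Mathematica's convention that built-in names start with an uppercase letter, so any lowercase-initial symbol is treated as a user variable:

```python
import re

# Assumption: user variables start with a lowercase letter; built-ins
# such as Integrate, Power, Sin, E and Infinity start with uppercase.
_USER_SYMBOL = re.compile(r"\b[a-z][A-Za-z0-9]*\b")


def canonicalize(full_form: str) -> str:
    """Strip whitespace and rename user variables to V1, V2, ... in order of first use."""
    mapping: dict[str, str] = {}

    def rename(match: re.Match) -> str:
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"V{len(mapping) + 1}"
        return mapping[name]

    compact = re.sub(r"\s+", "", full_form)
    return _USER_SYMBOL.sub(rename, compact)


query_x = "Integrate[Times[Power[E,Times[-1,Power[x,2]]],Sin[Power[x,2]]],{x,0,Infinity}]"
query_t = "Integrate[ Times[Power[E, Times[-1, Power[t,2]]], Sin[Power[t,2]]], {t, 0, Infinity}]"
print(canonicalize(query_x) == canonicalize(query_t))
```

Both queries reduce to the same string, so an index keyed on the canonical form can retrieve an integral written with either variable name.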
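Multilingual Issues • Character sets • Representations: “Íîäà ôèçè÷åñêè íàõîäèòñÿ â çäàíèè Èçâåñòèé” (read with the wrong character set) vs. “Нода физически находится в здании Известий” (“The node is physically located in the Izvestia building”) • Multilingual navigation • Translation assistance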
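The two renderings of the Russian sentence on this slide are consistent with the same bytes being read under two different character sets: Windows-1251 (Cyrillic) versus Latin-1. The slide does not name the encodings, so that pairing is an inference. A short sketch that reproduces the garbling and reverses it:

```python
def misread(text: str, actual: str, assumed: str) -> str:
    """Encode text in its actual character set, then decode it with the wrong one."""
    return text.encode(actual).decode(assumed)


def repair(garbled: str, actual: str, assumed: str) -> str:
    """Undo a misread by re-encoding with the wrong set and decoding with the right one."""
    return garbled.encode(assumed).decode(actual)


russian = "Нода физически находится в здании Известий"
garbled = misread(russian, actual="cp1251", assumed="latin-1")
print(garbled)
print(repair(garbled, actual="cp1251", assumed="latin-1"))
```

The repair only works because no bytes were lost along the way. If the garbled text has already been truncated or re-encoded lossily, the original cannot be recovered this way, which is one reason a library must record each item's encoding as metadata.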
Synthetic Documents • Documents derived automatically from retrieved information • Multilingual translation • Abstracts, summaries, glossaries • Encyclopedia-on-demand
Information Reliability • Existence ≠ validity • Universal Library Philosophy • Avoid value judgments • Provide information from which users (and programs) can assess validity • Source, reputation, recency, reviews, consistency
Scaling Problems • Search services (e.g. Altavista) index >10^8 documents • Suppose there were 10^12? • How can a billion users access the same item at once?
Policy Challenges • Use of copyrighted material • Economics (Who pays? Who gets?) • Privacy • Reliability of information • Change in the nature of teaching
Use Of ©Content • Philosophy: must pay for use • Authors, publishers will not suffer • Implied license • Automated permissions • Bulk licensing • Compulsory licensing • Owner CAN’T refuse; user MUST pay
Economics • Flat-fee subscriptions (e.g. HBO) • Metered use (electric company) • Microcharge (Tobias “clickl”) • Free (paid by government) • Automated permissions • Use measured by technology
Operating Model • Single portal for access to all information • Universal Library provides input, access, multilingual, output and synthesis tools • Universal Library will be a model scanning operation • Registry of digitized works
Operating Model • Specialized collections curated by specialists, provided to Universal Library • Foreign collection performed in foreign countries • Universal Library will be mirrored in ~12 sites around the world
Universal Library Status • >13,000 digital volumes • Art • Newspapers • Music, video • Portal to hundreds of other collections Visit http://www.ulib.org
Projects • Navigator • Academic electronic publishing • Electronic Union Catalog • Books out of copyright • Books out of print • Software distribution
Conclusions and Recommendations • Conclusions • Barely 10% of all public information is available on the Internet • Government needs to play a leadership role in developing digital libraries • Significant technical and operational challenges in migrating and maintaining holdings in digital form • Intellectual property rights need to be addressed to facilitate the creation of and access to digital libraries • Recommendations • Support research: metadata, scalability, multiple languages, security, and usability • Create testbeds: million book project • Place all public governmental information online • Preserve IP rights of creators by creating tax incentives for public use of online copyrighted information