Google Books: Present and Future. James Crawford, Engineering Director, Google Books
Overview • Why and how Google scans books • Challenges • The future. Google Confidential and Proprietary.
Google’s mission: to organize the world’s information and make it universally accessible and useful. • Online content: billions of web pages • Offline content: billions of items becoming indexed
Book team's mission: to organize the world’s books and make them universally accessible and useful.
Make accessible – Limited previews from publishers & authors (30,000 publisher partners)
Vital stats • Number of books scanned: fifteen million • Number of pages: five billion • Number of words: over two trillion • Libraries: forty • Publishers: thirty thousand
Google Books in a nutshell
478 languages • Kabardian: 16 • Khasi: 78 • Khoisan: 53 • Khotanese: 21 • Kikuyu: 48 • Kinyarwanda: 77 • Kyrgyz: 702 • Kimbundu: 14 • Konkani: 83 • Komi: 48 • Kongo: 134 • Korean: 35905 • Kosraean: 10 • Kpelle: 6 • Karachay-Balkar: 17 • Karelian: 28 • Kru: 26 • Kurukh: 30 • Kuanyama: 9 • Kumyk: 16 • Kurdish: 220 • Kutenai: 0 • Klingon: 3 • Kalmyk: 26 • Kashubian: 14 • Kara-Kalpak: 102 • Kabyle: 50 • Kachin: 18 • Kalaallisut: 82 • Kamba: 29 • Kannada: 2600 • Karen: 50 • Kashmiri: 289 • Kanuri: 25 • Kawi: 106 • Kazakh: 1871
A diversity of dates • 18?? • [196-?] • 1957/8 • late 14th century • finita quarto nonas Januarias [1490] • mense Septembri: Anno Millesimo q[ui]ngentesimo decimonono • mense iulio, anno M.D.XXXX • התשנ״א (Hebrew year 5751 = Gregorian 1990/1 CE) • ١٣٧٣ (either Islamic year 1373 AH = Gregorian 1953/4 CE or Persian year 1373 AP = Gregorian 1994/5 CE)
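A few of the date styles on this slide can be reduced to a range of Gregorian years with simple pattern matching. The following is only a minimal sketch, not the method Google Books uses: the function name `parse_year_range` and the choice to return an `(earliest, latest)` tuple are assumptions made for illustration. It handles the "18??", "[196-?]", "1957/8", and bracketed-year styles, and deliberately returns `None` for Roman numerals, century phrases, and non-Gregorian calendars, which need dedicated handling.

```python
import re

def parse_year_range(raw):
    """Return (earliest, latest) Gregorian years, or None if unrecognized.

    Hypothetical helper for illustration; covers only a few catalog styles.
    """
    text = raw.strip()

    # "1957/8": a year plus the trailing digit(s) of the following year.
    m = re.search(r"(?<![0-9])([0-9]{4})/([0-9]{1,2})(?![0-9])", text)
    if m:
        start, suffix = int(m.group(1)), m.group(2)
        end = int(m.group(1)[: 4 - len(suffix)] + suffix)
        if end < start:  # e.g. "1999/00" rolls over into the next century
            end += 10 ** len(suffix)
        return (start, end)

    # "18??" or "196-?": leading digits known, trailing digits unknown.
    m = re.search(r"(?<![0-9])([0-9]{2,3})([-?]+)", text)
    if m and len(m.group(1)) + len(m.group(2)) >= 4:
        known = m.group(1)
        missing = 4 - len(known)
        return (int(known + "0" * missing), int(known + "9" * missing))

    # A single explicit year, possibly bracketed: "... Januarias [1490]".
    m = re.search(r"(?<![0-9])([0-9]{4})(?![0-9])", text)
    if m:
        return (int(m.group(1)), int(m.group(1)))

    # Roman numerals, "late 14th century", Hebrew or Arabic-script years, etc.
    return None
```

The patterns use `[0-9]` rather than `\d` on purpose: in Python, `\d` also matches Arabic-Indic digits, so a year such as ١٣٧٣ would otherwise be silently read as Gregorian 1373 even though the slide notes it could mean 1953/4 or 1994/5.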
Works, Expressions, Manifestations, and Items • Library of Congress: title "Lord of the Rings, v.1"; author John Roland Reuel Tolkien; publisher Houghton Mifflin; year 1954 • Books in Print: title "The Fellowship of the Ring"; author J.R.R. Tolkien; publisher Ballantine Books; year 1994
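The two author strings on this slide name the same person but do not match as text. One common way to compare such strings is to reduce each to a surname plus initials. The sketch below is an illustration only, not a description of how Google Books reconciles records: `author_key` is a hypothetical helper, and it assumes names are written "Given Names Surname".

```python
import re

def author_key(name):
    """Reduce 'Given Names Surname' to (surname, initials), lowercased.

    Hypothetical helper; assumes the surname is the last token.
    """
    # Split on whitespace and periods so "J.R.R." yields ["J", "R", "R"].
    parts = [p for p in re.split(r"[\s.]+", name.strip()) if p]
    surname = parts[-1].lower()
    initials = "".join(p[0].lower() for p in parts[:-1])
    return (surname, initials)

loc_author = "John Roland Reuel Tolkien"   # Library of Congress form
bip_author = "J.R.R. Tolkien"              # Books in Print form
same_author = author_key(loc_author) == author_key(bip_author)
```

Both forms reduce to the same key here, so `same_author` is true. The titles, publishers, and years on the slide still differ, which is why a single field is rarely enough to decide whether two records describe the same work.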
Google Editions • Buy Anywhere: Purchase directly on Google Books, devices, retail partner sites, affiliates, and brick-and-mortar stores. • Read Anywhere: Users can read eBooks on desktops, tablets, iPhones, Android phones, and eInk readers. Cloud storage and cloud sync. • More to Read: Target is 400K+ paid books and over 2M free public-domain books.
Google Book Settlement (US only) • If approved, resolves lawsuit brought against Google • Benefits: • Rightsholder control • Snippets => 20% • Library subscriptions • Free terminal in every US public library building • Downloadable books for purchase • Access for the print-disabled • Book Rights Registry: a non-profit organization to find and pay rightsholders • Research corpus
Books as a corpus of human knowledge • Understand one book • Understand all books • Understand relations between books
Linguistic analysis • "Research that performs linguistic analysis over the Research Corpus to understand language, linguistic use, semantics and syntax as they evolve over time and across different genres or other classifications of Books."
Digital Humanities
• Steven Abney and Terry Szymanski, University of Michigan. Automatic Identification and Extraction of Structured Linguistic Passages in Texts.
• Elton Barker, The Open University; Eric C. Kansa, University of California-Berkeley; Leif Isaksen, University of Southampton, United Kingdom. Google Ancient Places (GAP).
• Dan Cohen and Fred Gibbs, George Mason University. Reframing the Victorians.
• Gregory R. Crane, Tufts University. Classics in Google Books.
• Miles Efron, Graduate School of Library and Information Science, University of Illinois. Meeting the Challenge of Language Change in Text Retrieval with Machine Translation Techniques.
• Brian Geiger, University of California-Riverside; Benjamin Pauley, Eastern Connecticut State University. Early Modern Books Metadata in Google Books.
• David Mimno and David Blei, Princeton University. The Open Encyclopedia of Classical Sites.
• Alfonso Moreno, Magdalen College, University of Oxford. Bibliotheca Academica Translationum: link to Google Books.
• Todd Presner, David Shepard, Chris Johanson, James Lee, University of California-Los Angeles. Hypercities Geo-Scribe.
• Amelia del Rosario Sanz-Cabrerizo and José Luis Sierra-Rodríguez, Universidad Complutense de Madrid. Collaborative Annotation of Digitalized Literary Texts.
• Andrew Stauffer, University of Virginia. JUXTA Collation Tool for the Web.
• Timothy R. Tangherlini, University of California-Los Angeles; Peter Leonard, University of Washington. Tools & Techniques for Automated Literary Analysis, Based on the Scandinavian Corpus in Google Books.
Insights into human progress • Old-fashioned trigrams: oxide of lead; may be thus; a heavy fire; a striking proof; miles distant from; terms of peace; presents the appearance; more than mortal; vexation of spirit; zeal and devotion • New-fangled trigrams: lesbian and gay; health care professionals; abuse and neglect; the overall process; shift away from; the power elite; a research project; the poor countries; probability of failure; increased awareness of • Google is preparing trigram data for release for research purposes. Source: Matthew Gray & Yuan K. Shen
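A comparison like the one on this slide can be approximated by counting three-word sequences in older and newer books and ranking them by how much more often they occur in one period than the other. The sketch below is an assumption-laden illustration, not the method used by Gray and Shen: the function names, the `(year, text)` input format, the whitespace tokenizer, and the add-one smoothing are all choices made here for brevity.

```python
from collections import Counter

def trigrams(text):
    """Return all consecutive three-word sequences in the text, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]

def count_by_period(books, split_year):
    """Count trigrams separately for books before and from split_year on.

    `books` is an iterable of (publication_year, text) pairs (assumed format).
    """
    old, new = Counter(), Counter()
    for year, text in books:
        (old if year < split_year else new).update(trigrams(text))
    return old, new

def distinctive(a, b, top=10):
    """Trigrams most over-represented in counter `a` relative to `b`."""
    total_a = sum(a.values()) or 1
    total_b = sum(b.values()) or 1

    def ratio(t):
        # Add-one smoothing so trigrams absent from `b` do not divide by zero.
        return ((a[t] + 1) / total_a) / ((b[t] + 1) / total_b)

    return sorted(a, key=ratio, reverse=True)[:top]

# Toy input; a real corpus would hold millions of scanned books.
books = [
    (1850, "terms of peace were offered"),
    (1990, "a research project began"),
]
old_counts, new_counts = count_by_period(books, split_year=1900)
old_fashioned = distinctive(old_counts, new_counts)
new_fangled = distinctive(new_counts, old_counts)
```

On real data the tokenizer would need to handle punctuation and OCR errors, and rare trigrams would need a minimum-count threshold to keep noise out of the rankings.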
Organize the world's books and make them universally accessible and useful