Making Silent Voices Heard
Stephen Rhind-Tutt, President
Charting Vanishing Voices Workshop, June 29, 2012
Agenda • About Alexander Street • The Challenge • The Nature of Virtual Space • Examples from Alexander Street • Partnerships and Collaboration
About Alexander Street Press
• Founded in 2000 by executives who used to work for Chadwyck-Healey, SilverPlatter, Wolters-Kluwer, Gale and Wilson
• Headquartered just outside Washington DC, USA
• Offices in Stevenage, England; Shanghai, China; Kuala Lumpur, Malaysia; Sydney, Australia; Brazil; New Zealand
• 3,000 customers
• 2,500 licensors
Collaboration More examples
The Challenge
By 2020 the web will have:
• > 5 Bn users (currently 2.3 Bn, 37% of the world)
• > 90% of works published prior to 1923
• Most works published to 2020
• > 4 billion websites (currently 555m, 71% growth p.a.)
• > 1 trillion photographs (Facebook adds 300m daily)
• > 100 million pages of facsimiles of manuscripts
• > 100 million audio files
• > 1 billion video files (YouTube adds 72 hrs every minute)
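The website projection above can be sanity-checked with compound growth. The following is a minimal sketch, assuming the 71% annual growth rate stays constant from 2012 onward (the slide does not claim this); the function names are illustrative and not from the talk.

```python
import math


def project(current: float, annual_growth: float, years: int) -> float:
    """Compound `current` forward by `annual_growth` (0.71 means 71%) for `years` years."""
    return current * (1.0 + annual_growth) ** years


def years_to_reach(current: float, target: float, annual_growth: float) -> float:
    """Years of constant compound growth needed to grow from `current` to `target`."""
    return math.log(target / current) / math.log(1.0 + annual_growth)


websites_2012 = 555e6  # figure from the slide
growth = 0.71          # figure from the slide; assumed constant here

years_needed = years_to_reach(websites_2012, 4e9, growth)
sites_2020 = project(websites_2012, growth, 8)

print(f"{years_needed:.1f} years to pass 4 billion websites")
print(f"{sites_2020 / 1e9:.0f} billion websites by 2020 at a constant rate")
```

Under that assumption the 4 billion mark is passed in under four years, so the slide's 2020 figure is conservative compared with a naive extrapolation.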
Preservation and Access
• Little or no cataloging
• Mostly undigitized
• Decaying film and audio formats
• Increasing opportunities to embellish (HD video, 3-D models, social annotation, etc.)
• More than 6,500 endangered languages
• Countless cultural artifacts, audio, video, texts
• Hidden collections
• (Personal) archives
• Field notes
• Data sets
The nature of virtual space…
“You must consult the laws of nature… You say, ‘What do you want, brick?’ and the brick says to you, ‘I like an arch,’ and you say to brick, ‘Look, I want one too, but arches are expensive…’ Brick says, ‘I like an arch’… Honor the material you use.”
Louis Kahn (1979)
Understanding the medium
• Steel – High cost to create, strong, easy to stamp shapes, medium weight…
• Wood – Low cost to create, moderately strong, needs to be crafted, lightweight…
• Glass – Medium cost to create, weak, easy to craft, transparent
• The Web – ?
Nature of electronic publications
[Diagram: nine boxes, each labeled “Page”]
• Pliable
• Evolving quickly
• Unlimited in size
• Atomic
• Interconnected
• Interdependent
• The link matters more than the object
Understanding the medium
• Programming languages: C++, Perl, VB, etc…
• Assembly code
• Machine code
• Binary: 0111010011010000101101101000101110100010001110101010101010101011111010101010101111101011100100011101
Understanding the medium
• Font standards – PostScript
• Display standards – Super VGA
• Browser standards – IE 7.0
• Document formats – PDF
• Mark-up standards – SGML, XML, HTML
• Communications protocols – TCP/IP, modems
• Plug-in standards – Java
• Image standards – JPG, TIFF, etc.
Understanding the medium
• Twitter – local, custom, news
• Foursquare
• Map standards – Google Maps, Open Map
• iOS, Android
• Devices – Nook, Kindle, iPad
• Phone standards – 3G, 4G, 5G
• Network protocols – 801
• Video standards – H.264, Silverlight, Flash
Evolving quickly
On current trends…
• Processing speed – by 2015, machines will be 4 times more powerful than today’s
• Storage space – by 2015, 20 terabytes of storage (8 Bn pages) will cost under $100
• More than 90% of the developed world will have Web access
• Significant improvements in the developing world
• Phone bandwidth > 1.5 Mb/s
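Two of the figures above imply simple arithmetic that is easy to make explicit. The sketch below assumes decimal terabytes and treats "today" as 2012, the date of the talk; neither assumption is stated on the slide, and the helper names are illustrative.

```python
TB = 10**12  # decimal terabyte; an assumption, the slide does not specify the unit


def bytes_per_page(storage_bytes: int, pages: int) -> float:
    """Average storage per page implied by a storage size and a page count."""
    return storage_bytes / pages


def growth_factor(years: float, doubling_period_years: float) -> float:
    """Overall multiplier after `years` if capacity doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


# 20 TB holding 8 Bn pages implies this many bytes per page.
page_size = bytes_per_page(20 * TB, 8 * 10**9)

# "4 times more powerful" over 2012 to 2015 matches a doubling every 18 months.
speedup = growth_factor(3, 1.5)

print(page_size, speedup)
```

At roughly 2.5 KB per page, the storage figure describes plain text rather than page images, which would need far more space.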
Where we’re headed…
[Diagram labels: Why? · Therefore · Who, What, When, Where?]
After “Data, Information, Knowledge, and Wisdom,” Gene Bellinger, Durval Castro, Anthony Mills. http://www.systems-thinking.org/
Understanding electronic products
Value in the electronic world is about... Performance
“The manner in which or the efficiency with which something reacts or fulfills its intended purpose”
Webster’s Unabridged
What do we need to do?
• Comprehensive – everything on the network
• Everyone on the network
• Local and personal (unique verified identity)
• Ubiquitous access (everywhere, all devices)
• High quality (peer review)
• Workflow integration and analysis (deep links to relevant content and tools)
• Maximize efficiencies (easy ingestion and dissemination)
• Real time currency
Indexing MARC Semantic Controlled vocabularies Ingestion Scanning Uploading Data Crosswalking Producing Filming Recording Licensing Writing Commissioning Outbound Discovery Inbound Discovery Permissions Privacy Permissions Anonymity Shibboleth Community Peer Review Crowdsource Annotation Playlists Quality Bandwidth Encodes # of pixels Sampling API Harvesting Devices Promotion Conferences Adsense E-mail Mailings Tools Transcripts Subtitles Chaptering Translation Usage Stats
Evolution of tasks Process integration Workflow tools & apps Community Building Outbound discovery Inbound discovery Permissions Automated ingestion and tagging Rare and unpublished material Human tagging Republishing public domain Simple, One database Search Warehousing Compiling Directories Printing Typesetting Growing Fading
Evolution of tasks Process integration Workflow tools & apps Community Building Commissioning? Outbound discovery Inbound discovery Permissions Editorial? Automated ingestion and tagging Rare and unpublished material Human tagging Licensing? Republishing public domain Simple, One database Search Quality? Warehousing Compiling Directories Selection? Printing Marketing? Typesetting Growing Fading
Make video searchable…
30 minutes of news = 12 double-spaced pages
• 5 minutes to read in depth
• 2 minutes to scan
Be of the web Music Websites Primary Works Newspapers Monographs Journals
Major Collections Individual Titles Library Branded Interface Embeddable Search Box Federated Search Engines
The strain on keyword search… • Questions • Google: Martin Luther King – 8.3m hits (2005), 32.5m (2012) • Google Scholar: 202k hits, options to restrict: • Article • Legal document • Date range (year published) • Patent or Citation
‘Semantic’ Indexing
[Diagram: levels of indexing (Word, Page, Chapter, Book or Volume, Series, Collection), with arrows labeled “Traditional indexing” and “‘Semantic’ indexing” and the questions Who? What? When? Where?]
Increases in Utility
• Access: “Do you have the book titled…”
• Keyword search: all mentions of ‘Star Wars’
• Fielded search: all mentions of ‘Star Wars’ in texts about Reagan published in 1985
• Semantic search: all mentions of ‘Star Wars’ by Reagan in speeches he delivered in 1985
What is Semantic Indexing?
• Identify and divide texts into content elements (e.g. letter, diary entry…)
• Identify key concepts for these elements (e.g. authors, sources, battles, encounters…)
• Index both elements and associated concepts
• Integrate to form a cohesive whole
• Unique ways of browsing through concepts
• Unique ways to ask questions
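The steps above can be sketched in a few lines of code. This is only an illustration under stated assumptions, not Alexander Street's actual pipeline: content elements are taken to be blank-line-separated entries, concepts are found by naive substring matching against a small made-up vocabulary, and all names and sample text are hypothetical.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: concept type -> known terms.
VOCABULARY = {
    "author": {"Champlain", "Ragueneau"},
    "cultural_group": {"Huron", "Iroquois"},
    "topic": {"trade", "battle"},
}


def split_elements(text):
    """Step 1: divide a text into content elements (here, blank-line-separated entries)."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def tag_concepts(element):
    """Step 2: find vocabulary terms mentioned in an element (naive substring match)."""
    lowered = element.lower()
    return {
        (kind, term)
        for kind, terms in VOCABULARY.items()
        for term in terms
        if term.lower() in lowered
    }


def build_index(text):
    """Steps 3 and 4: index the elements and their concepts together."""
    elements = split_elements(text)
    by_concept = defaultdict(set)  # (kind, term) -> ids of elements mentioning it
    for element_id, element in enumerate(elements):
        for concept in tag_concepts(element):
            by_concept[concept].add(element_id)
    return elements, by_concept


# Invented sample text with two content elements.
sample = (
    "Letter from Champlain on trade with the Huron.\n\n"
    "Diary entry about a battle with the Iroquois."
)
elements, index = build_index(sample)

# Asking a question means intersecting concept postings.
hits = index[("cultural_group", "Huron")] & index[("topic", "trade")]
matching = [elements[i] for i in sorted(hits)]
print(matching)
```

Because each element carries typed concepts rather than bare words, the same index supports both browsing (list every cultural group) and combined questions (trade and Huron).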
Semantic Indexing…
• Document (links to Encounter, Author and Source): Text, Author ID, Encounter ID, Source ID, Date, Subject, Age writing, etc.
• Source: Source, Editor/Translator, Original Language, Publisher, Publication Date, Publication Place, Subject of Work, etc.
• Encounter: Encounter Name, Cultural Groups, Estimated # of people, Start year, Start month, Start day, Location, Expedition, Encounter Type, Fatalities, etc.
• Author: Name, Date of birth, Place of birth, Date of death, Place of death, Nationality, Religion, Sexual Orientation, Occupation, etc.
Semantic Indexing…
“Show me writings by Jesuits, originally written in French, that discuss trade involving the Huron.”
• Document (links to Encounter, Author and Source): Text, Author ID, Encounter ID, Source ID, Date, Subject, Age writing, etc.
• Source: Source, Editor/Translator, Original Language, Publisher, Publication Date, Publication Place, Subject of Work, etc.
• Encounter: Encounter Name, Cultural Groups, Estimated # of people, Start year, month, day, Location, Expedition, Encounter Type, Fatalities, etc.
• Author: Name, Date of birth, Place of birth, Date of death, Place of death, Nationality, Religion, Sexual Orientation, Occupation, etc.
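The sample question on this slide can be answered by following each document's IDs to its linked records. The sketch below is a simplified illustration, not the product's real schema: it keeps only a few of the fields listed, assumes "Jesuit" is stored in the author's Occupation field, and uses invented sample records.

```python
from dataclasses import dataclass


@dataclass
class Author:
    author_id: int
    name: str
    occupation: str  # assumption: "Jesuit" is recorded as the occupation


@dataclass
class Source:
    source_id: int
    title: str
    original_language: str


@dataclass
class Encounter:
    encounter_id: int
    name: str
    cultural_groups: frozenset


@dataclass
class Document:
    text: str
    author_id: int
    encounter_id: int
    source_id: int
    subjects: frozenset


def jesuit_french_huron_trade(documents, authors, sources, encounters):
    """Return documents by Jesuits, from French-language sources, about trade involving the Huron."""
    results = []
    for doc in documents:
        author = authors[doc.author_id]
        source = sources[doc.source_id]
        encounter = encounters[doc.encounter_id]
        if (
            author.occupation == "Jesuit"
            and source.original_language == "French"
            and "trade" in doc.subjects
            and "Huron" in encounter.cultural_groups
        ):
            results.append(doc)
    return results


# Invented sample records, keyed by ID.
authors = {1: Author(1, "Author A", "Jesuit"), 2: Author(2, "Author B", "Trader")}
sources = {1: Source(1, "Source X", "French"), 2: Source(2, "Source Y", "English")}
encounters = {
    1: Encounter(1, "Encounter P", frozenset({"Huron", "French"})),
    2: Encounter(2, "Encounter Q", frozenset({"Iroquois"})),
}
documents = [
    Document("doc one", 1, 1, 1, frozenset({"trade"})),
    Document("doc two", 2, 1, 2, frozenset({"trade"})),
    Document("doc three", 1, 2, 1, frozenset({"religion"})),
]

matches = jesuit_french_huron_trade(documents, authors, sources, encounters)
print([doc.text for doc in matches])
```

A keyword search for "Jesuit French Huron trade" would miss documents that never use those words; here each condition is tested against the field that actually records it.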