1. Eye to the Telescope: Future-Gazing & Current Projects from OCLC Research
2006 Amigos Conference
11 May 2006
Dallas, TX
Eric Childress
OCLC Research
My sincere thanks to Bonnie Juergens and Laura Kimberly for kindly arranging my appearance here, and to all of you for attending such an early event!
2. Outline
The Big Picture
Pattern Recognition
Brand, Data, Technology trends
The Library - next phase
Selected OCLC Research work
3. Pattern recognition
Production anywhere, Global distribution
Make products anywhere, ship them everywhere
Offshore business processes & research centers
Big brands & micro channels
Mega-publishers, -media, -retailers, -search engines
Niche markets exploited via AdWords & affiliate programs
Portable devices, digital content, interactive Net
iPods, now with video; Are iPhones next?
Ringtones, iTunes, Podcasts, Vlogs/Google Video, online gaming, etc.
Self-service, micro-consumption
The “convenience” society – 24x7 stores, ATMs, click-n-buy
Disaggregation – consume by the news story, song, etc.
Intellectual Property issues
Big business not-so-secretly wants all transactions billable
Open Source & Open Content rising (e.g., Apache, Creative Commons)
Outsourcing – business process outsourcing (BPO) expected to grow from $6B USD (2005) to $14B USD (2010) [http://www.thehindubusinessline.com/2006/05/08/stories/2006050802730200.htm]
Accounting services, call centers, pharmacy trials, software development, and research (e.g., Microsoft, IBM) are displaying a significant offshore presence
Big brands –
Wal-Mart commonly accounts for as much as 20% of total sales volume for bestselling books in the U.S.
Amazon & Google especially have leveraged narrow distribution channels by sharing the wealth. Also worthy of note is a very robust used book market built in part by “long-tail” sales from online outlets (BISG estimates the 2004 used book market at $2.2 billion: 111 million books, 8.4% of total consumer spending on books).
Microsoft is now squarely targeting Google – Amazon has recently switched engines to adopt Microsoft (N.B. estimates indicate that 10% of Amazon users clicked through to Google links when Google was Amazon’s partner). Rumors of Microsoft-Yahoo discussions
Portable devices:
Ringtones market has risen from $68M (2003) to a projected $600M (2006)
MP3 devices: 58 million shipped in 2005, projected to reach 116 million in 2007
Video – YouTube (30 million streams per day)
[see http://www.engadget.com/2006/05/04/the-clicker-youtube-and-fair-use-a-match-made-in-heaven/]
IP issues – a complex space gets more complicated
Copyright as originally conceived (i.e. public domain as the default) is greatly diminished – Disney and other corporate content players wield significant political influence. Fair Use and the First Sale Doctrine are inconvenient…
Countering this has been the Creative Commons and similar efforts that allow IP owners to conveniently cede rights for IP reuse with no or minimal, uniform terms and conditions
4. Voices carry
Old media losing to new media
Broadcast radio vs Satellite & Internet radio
Newspapers vs Google News, Craigslist, etc.
Brand & voice through new channels
Blogging by top execs & by staff
Personal branding – “Webcred” is key to one’s fortunes
Individual-driven content rising:
Personal web pages
Blogs (a new one each second!)
Digital images/video (flickr, Picasa, YouTube)
Bookmarks, etc. (e.g., del.icio.us, furl, digg, technorati)
Infotainment increasingly social & peer-to-peer
Community authorship, open content (Wikipedia)
MySpace, Facebook, etc. personal presence services
MySpace has 70M registered users, had 47M transactions in Feb. 2006 & is the second most visited destination on the Web after Yahoo
[http://www.azstarnet.com/business/125984]
5. The blogosphere is doubling every 3 months (a new blog created every second) [http://www.sifry.com/alerts/archives/000432.html]
6. Data rules
Deep indexing:
Amazon’s “Search Inside” and “Statistically Improbable Phrases”
Google, Yahoo, Microsoft underwriting library digitization work
Library space: NetLibrary, Alexander Street, many others indexing content
Custom search feeds: Google Alerts, News topic RSS, etc.
Instant verification:
Many voices, many fact-checkers widely-distributed – Spin doctors beware!
Recommendation systems:
Amazon, Apple iTunes, other retailers – “people like you chose…”
Novel concepts: Pandora – suggests music based on intrinsic patterns of music you like (the “music genome”)
Empowered consumption
My iPod, my tags, my playlists
Reuse, derive, mix content from many sources (e.g. Mashups)
The "world churns out new digital information equivalent to the entire collection of the U.S. Library of Congress every 15 minutes. Such a proliferation of information in digital format, occurring almost 100 times a day, adds up to approximately five exabytes (five quintillion bytes or five billion gigabytes) a year" [http://www.nist.gov/public_affairs/techbeat/tb2006_0330.htm#bytes]
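The arithmetic in the NIST quote can be checked in a few lines. This is only a sanity check under stated assumptions (decimal units, a 365-day year); the per-interval size it derives is implied by the quoted numbers and is not a figure given in the source.

```python
# Sanity check of the quoted NIST figures.
# Assumptions (not from the source): decimal units and a 365-day year.
MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 15              # "every 15 minutes"
BYTES_PER_YEAR = 5 * 10**18        # "five exabytes ... a year"

# "almost 100 times a day"
intervals_per_day = MINUTES_PER_DAY // INTERVAL_MINUTES
intervals_per_year = intervals_per_day * 365

# "five billion gigabytes"
gigabytes_per_year = BYTES_PER_YEAR // 10**9

# Size implied for one 15-minute interval; derived here, not stated in the source.
terabytes_per_interval = BYTES_PER_YEAR / intervals_per_year / 10**12

print("intervals per day:", intervals_per_day)
print("gigabytes per year:", gigabytes_per_year)
print("implied TB per interval:", round(terabytes_per_interval))
```

So "almost 100 times a day" is 96 intervals, and the quoted yearly total implies roughly 143 terabytes per 15-minute interval.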
7. Techscape
Web 2.0:
The Network spans all attached devices (e.g., iPods, phones, etc.)
Software resides on the Net, not the workstation
“Participative Net” – social environment, shared content reused
Everywhere Net
Internet, GPS, cellphone, municipal wireless…
System refactoring
Modularity (micro-services, remixing, multiple sources)
Layering (loosely-coupled systems)
Interoperability (low-friction, high reuse)
Lightweight protocols gaining favor (e.g., SRW/SRU, microformats)
Machine-oriented services (web services)
Web 2.0: Source: http://radar.oreilly.com/archives/2005/10/web_20_compact_definition.html
For more information & interesting graphic: What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software / Tim O’Reilly
http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html?page=1
Mashup defined: “a website or web application that seamlessly combines content from more than one source into an integrated experience.
Content used in mashups is typically sourced from a third party via a public interface or API. Other methods of sourcing content for mashups include Web feeds (e.g. RSS or Atom) and JavaScript includes.” [http://en.wikipedia.org/wiki/Mashup_%28web_application_hybrid%29]
8. Libraries - next phase
Surfacing seamlessly
Point-of-need delivery (e.g., library content in non-library apps such as the Web, course management systems, etc.)
Open WorldCat, RedLightGreen, OAIster, etc.
Open standards, easy integration of data from many sources
Re-thinking, re-engineering
Library 2.0 changes systems & services
Moving towards “Lego”-like modularity in systems & data
User-tasks-oriented designs (e.g., NCSU catalog)
Adding means for users to contribute, shape their own experiences
Supporting Library 2.0 will mean changing organizations & operations
More building space for people-to-people interaction, less for books
Process & operational changes
Example: Choose-acquire-catalog vs Acquire-choose-catalog
9. Cascading commoditization
Open standardization of content/software + exposure
“Any services that can be abstracted to generic network services will be” -- Robin Murray
10. Selected OCLC Research work
Making data work harder
Data mining of WorldCat (e.g., FRBR [Functional Requirements for Bibliographic Records] clustering of related records)
FictionFinder – browse/search all fiction works in WorldCat
Audience Level – assigns an audience indicator value based on data in bib records for a work, or – alternatively – by inferring audience from the type and number of libraries holding a work
xISBN – send OCLC an ISBN, receive all ISBNs for the same work
New views, new uses
DeweyBrowser – Dewey-based visualization of WorldCat, more
Live Search – An AJAX-based search interface that leverages FRBR, advanced relevance, and rank-by-holdings to provide fast results
Terminology Services – Controlled vocabularies searchable in a sidebar
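The xISBN idea listed above (send an ISBN, receive all ISBNs for the same work) can be illustrated with a small in-memory lookup. This is a sketch of the concept only: it does not call or reproduce the OCLC service. The `WORK_SETS` table, the work identifiers, and the function names are hypothetical, and the ISBNs are placeholder values chosen only because they pass the standard ISBN-10 check-digit test.

```python
from typing import Dict, List

# Hypothetical work sets: work id -> ISBNs of its editions (placeholder values).
WORK_SETS: Dict[str, List[str]] = {
    "work-1": ["1111111111", "2222222222", "3333333333"],
    "work-2": ["4444444444"],
}

# Reverse index built once: ISBN -> work id.
ISBN_TO_WORK: Dict[str, str] = {
    isbn: work for work, isbns in WORK_SETS.items() for isbn in isbns
}


def normalize(isbn: str) -> str:
    """Strip hyphens and spaces; upper-case a trailing 'x' check digit."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn: str) -> bool:
    """Standard ISBN-10 check: the weighted digit sum is divisible by 11."""
    if len(isbn) != 10 or not isbn[:9].isdigit():
        return False
    last = isbn[9]
    if not (last.isdigit() or last == "X"):
        return False
    values = [int(c) for c in isbn[:9]] + [10 if last == "X" else int(last)]
    weights = range(10, 0, -1)  # 10, 9, ..., 1
    return sum(v * w for v, w in zip(values, weights)) % 11 == 0


def related_isbns(isbn: str) -> List[str]:
    """Return every ISBN in the same work set, or [] if the ISBN is unknown."""
    key = normalize(isbn)
    if not is_valid_isbn10(key):
        raise ValueError(f"not a valid ISBN-10: {isbn!r}")
    work = ISBN_TO_WORK.get(key)
    return list(WORK_SETS[work]) if work is not None else []


print(related_isbns("2222-2222-22"))
```

The reverse index keeps each lookup to a single dictionary access; a real service would build the work sets from FRBR clustering rather than a hand-written table.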
11. FRBR Group 1 Entities
The FRBR model is made up of three sets of entities.
Group 1 is a 4-level bibliographic model that’s described as you see here. [Read the slide]
OK … what does this really mean?
A work: Shakespeare’s Hamlet is a work … not any particular edition or version or translation … but Hamlet as an intellectual concept.
An expression: this is a work realized in a particular version or translation. The original text of Hamlet is an expression; so is a French translation or a German one.
A manifestation: this is an expression that is issued or published … So the edition of the Andre Gide French translation published in Paris in 1946 is a manifestation.
An item: a copy of a manifestation on the shelves in a library … so the copy of Andre Gide’s translation of Hamlet in the stacks at the Library of Congress with the call number PR2779.H3G5 1946
***
These four entities in Group 1 are the product of intellectual or artistic endeavor. The entities form a hierarchy with work at the top of the model so it may help to see them in a diagram as here.
This diagram begins to show how the Group 1 entities are related to each other.
A single work can be realized in various expressions … for example, a work may be translated into many different languages (a one-to-many relationship).
One or more expressions can be embodied in one or more manifestations (a many-to-many relationship)
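The four Group 1 entities and their relationships can be sketched as a small data model. This is a minimal illustration using the Hamlet example from the notes above; the class and field names are assumptions made for this sketch, not part of the FRBR specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single copy of a manifestation."""
    location: str
    call_number: str


@dataclass
class Manifestation:
    """An expression as issued or published."""
    description: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A work realized in a particular version or translation."""
    description: str
    # The same Manifestation object may be listed under several expressions,
    # which is how this sketch models the many-to-many relationship.
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The intellectual concept, independent of any edition."""
    title: str
    creator: str
    expressions: List[Expression] = field(default_factory=list)


# One work, realized in several expressions (one-to-many).
hamlet = Work(title="Hamlet", creator="Shakespeare")
original = Expression("original text")
french = Expression("French translation by Andre Gide")
hamlet.expressions.extend([original, french])

# The French expression embodied in a manifestation, held as an item.
paris_1946 = Manifestation("published in Paris, 1946")
french.manifestations.append(paris_1946)
paris_1946.items.append(Item("Library of Congress", "PR2779.H3G5 1946"))


def call_numbers(work: Work) -> List[str]:
    """Walk work -> expression -> manifestation -> item."""
    return [
        item.call_number
        for expression in work.expressions
        for manifestation in expression.manifestations
        for item in manifestation.items
    ]


print(call_numbers(hamlet))
```

Walking down from the work reaches every item, which is the hierarchy the diagram shows with work at the top.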
12. OCLC FRBR work set algorithm-based cluster of related WorldCat records
*Similar to Family of works; see: Tillett, Barbara. 2004. What is FRBR?: A Conceptual Model for the Bibliographic Universe.
Available at: http://www.loc.gov/cds/downloads/FRBR.PDF
Incorporating the concepts of the FRBR model in systems:
Superior presentation of search results
Esp. in large files – more intuitive clustering
May help streamline library cataloging
Reduces repeated keying of work-related info
Bibliographic & management intelligence
New insights into works (e.g., OCLC’s 1000 list)
Libraries can operate at workset level (e.g., ILL)
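A work set as an algorithm-based cluster of related records can be sketched by grouping records on a normalized author/title key. The keying rule here is an assumption made for illustration, not OCLC's actual algorithm; the record fields, function names, and sample records are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def work_key(record: Dict[str, str]) -> Tuple[str, str]:
    """Hypothetical clustering key: normalized (author, title)."""
    return (normalize(record["author"]), normalize(record["title"]))


def cluster(records: List[Dict[str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group record ids into work sets that share the same key."""
    work_sets: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for record in records:
        work_sets[work_key(record)].append(record["id"])
    return dict(work_sets)


# Hypothetical sample records with inconsistent punctuation and case.
records = [
    {"id": "rec1", "author": "Shakespeare, William", "title": "Hamlet"},
    {"id": "rec2", "author": "SHAKESPEARE, WILLIAM.", "title": "Hamlet."},
    {"id": "rec3", "author": "Shakespeare, William", "title": "Macbeth"},
]

for key, ids in cluster(records).items():
    print(key, ids)
```

Once records are grouped this way, displays and operations such as ILL can address the work set rather than each record separately.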
13. As of mid 2005?
14. FictionFinder
An OCLC experimental prototype
Supports searching & browsing of fiction materials cataloged in WorldCat
Fiction records — 2.8 million
Unique works — 1.4 million
Total holdings — 130 million
Employs FRBR to:
Build a “work” view & cluster related records
Support the creation of special indexes
OCLC Research team:
Diane Vizine-Goetz (lead)
Roger Thompson
Carol Hickey
Lance Osborne
J.D. Shipengrover
New version:
Available later in 2006
Improved navigation & work-based displays
Interface: http://fictionfinder.oclc.org
Project page: http://www.oclc.org/research/projects/frbr/fictionfinder.htm
What’s in*
Fiction
Drama
Novels
Short stories
Text
Including eBooks
Sound
Audiobooks & cassettes, etc.
What’s out*
Works about fiction, drama, etc.
Movies, films, video
Music
16. Prototype redesign in progress…
22. Questions?
My inspiration for the title of this presentation
23. Further reading
OCLC Reports
http://www.oclc.org/reports
OCLC Research
http://www.oclc.org/research
OCLC-related blogs:
Lorcan Dempsey http://orweblog.oclc.org
Thom Hickey http://outgoing.typepad.com/outgoing
Stu Weibel http://weibel-lines.typepad.com
It’s All Good http://scanblog.blogspot.com