Charles Duncan C.Duncan@intrallect.com Automatic Metadata Generation
JISC Project • March – July 2009 • Gather use cases both to inform uptake of available automatic metadata tools and to inform future tool requirements • Deliverables • Synthesis report on automated metadata generation and its uses at national and international levels • General guidance document on different automated metadata generation approaches for service providers in HE • Priorities for required tools and services with an outline of costs and benefits
Generic View • Applicable to: • The digital library, eLearning, Scholarly Communications, eScience, Curation and Preservation
Importance of USE • Generating metadata is worthless unless there is a clear USE for that metadata • Generation use cases will require matching metadata use examples
Questions to consider • where useful metadata lies • what tools exist to extract metadata • how these tools should be integrated into the deposit process • how the many different formats of resources can be handled
Why use metadata? • Discovery • Search • Refining searches • Exposed information allows human judgement • Recommendation service • Tag clouds • Popularity measures (promote resources and resource owners) • Ability to get additional information (tracks, film details, etc.) • Organising information helps retain knowledge • Stakeholder-specific – benefits for suppliers/consumers • Making links with other people with similar profiles • Auditing – ability to identify gaps, quality management
Where useful metadata lies • The way people organise their resources • Behaviour (playlists) • Personal profiles • Image metadata (embedded and transportable) • PDF, office docs, MP3, video (MPEG, DVD) • Databases (IMDb, albums, Amazon, bar codes, ISBN, etc.) • Identity • Authenticated in a role, attribution: capture of ownership information and affiliation • Controlled vocabularies – mapping
Golddust c-values, user oriented • Image geographic info (EXIF): GPS location and direction (e.g. iPhone/Mac photo manager) • Dynamic metadata – • Use of object, comments, citations, tracking use and, e.g., location in a VLE • Amega report • User tagging – Flickr • Recommendation service • Metadata – resources • Metadata – users
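As an illustration of using embedded image geographic information: EXIF stores latitude and longitude as degrees, minutes and seconds plus a hemisphere reference (N/S/E/W), so a harvesting tool usually converts them to signed decimal degrees before indexing. The sketch below assumes the values have already been read from the image by some EXIF reader; the function name dms_to_decimal and the sample coordinates are hypothetical, chosen only for illustration.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert EXIF-style degrees/minutes/seconds to signed decimal degrees.

    `ref` is the EXIF hemisphere reference: "N", "S", "E" or "W".
    South and West are expressed as negative values.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref.upper() in ("S", "W") else value


# Hypothetical sample values, as they might be read from a photo's GPS tags.
latitude = dms_to_decimal(55, 57, 0, "N")
longitude = dms_to_decimal(3, 11, 24, "W")
print(latitude, longitude)
```

The decimal form is what most mapping and search services expect, which is why the conversion tends to sit at the point where metadata is extracted rather than where it is used.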
What tools exist to extract metadata • iTunes • From input • From databases • Metadata “scrapers” – e.g. Zotero, RefWorks (ProQuest) • OpenURL link resolvers (identifier standards) • iPhoto face recognition • Transcription of audio (e.g. Dolphin) • Text mining – frequency of word use, context of word use (wordle.com, Autonomy) • Google, Amazon, Last.fm, Spotify (can also use negative results – dislikes) • Creating thumbnails, validating file formats (see RepoMMan, JHOVE, DROID) • ROAR harvests and checks file formats in repositories • Output to multiple formats
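The "frequency of word use" approach to text mining can be sketched in a few lines. This is a minimal illustration, not how any of the tools named above work internally: it counts words in a resource's text, drops a small stopword list, and proposes the most frequent terms as candidate keywords. The function name candidate_keywords and the stopword list are assumptions made for the example.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stopword list; real tools use much larger ones.
STOPWORDS = {"the", "and", "for", "from", "that", "this", "with", "are", "was"}


def candidate_keywords(text: str, top_n: int = 5) -> list[str]:
    """Return the most frequent non-stopword terms as candidate keywords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


sample = (
    "Metadata tools extract metadata from repositories. "
    "Repositories store metadata."
)
print(candidate_keywords(sample, top_n=2))
```

Frequency alone ignores the context of word use, so in practice the output would be treated as suggestions for a depositor to confirm rather than as finished metadata.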
How to integrate tools into deposit • Scraping – adding own metadata – converting formats – storing • iTunes ripping a CD – what is the deposit process? (Gracenote) • Size of the community matters – common objects that many people use • Integration tools for AMG, deposit and repositories/archives
Handle different formats • Formats for resources • Formats for metadata
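One common first step in handling many resource formats is to identify a file from its leading bytes rather than trusting its extension. The sketch below is a simplified illustration under stated assumptions: the signature table is a small hand-picked sample, and the function name sniff_format is hypothetical. Production tools rely on far larger signature registries and deeper validation.

```python
# Illustrative signature table: (leading bytes, media type).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),  # also the container for .docx/.xlsx/.pptx
    (b"ID3", "audio/mpeg"),  # MP3 files that begin with an ID3v2 tag
]


def sniff_format(header: bytes) -> str:
    """Guess a media type from the first bytes of a file."""
    for magic, media_type in SIGNATURES:
        if header.startswith(magic):
            return media_type
    # Fall back to the generic binary type when nothing matches.
    return "application/octet-stream"


print(sniff_format(b"%PDF-1.4\n"))
```

Once the format is known, a deposit process can route the resource to a format-specific metadata extractor, which is where the resource formats and metadata formats listed above meet.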
Use case 1 • Overview • Metadata Generation • Metadata Use
Use case 2 • Overview • Metadata Generation • Metadata Use
Use case 3 • Overview • Metadata Generation • Metadata Use