Cleaning Metadata
Graphic by Ryan Schenk
Outline
• Introduction
• Definitions
• Cleaning metadata
  • General Thoughts
  • Process, Tools, and Lessons
• Going forward
Definitions
• Metadata
• Schema
• Metadata Cleaning
Metadata
Summary information about something.
The Something: [photograph]
Summary Information:
Title: Recovering Gear
Photographer: Jan Hahn
Date: 1950
Ship: Atlantis
People: Nat Corwin, Dean Bumpus
Schema
The structure of a set of information.
The Structure:
ID | Subject
The Information:
1 | Nat and Dean aboard Atlantis in 1950
A different schema with the same information:
ID | Date | People | Ships
1 | 1950 | Nat Corwin, Dean Bumpus | Atlantis
Cleaning Metadata
Converting metadata into a more usable form.
Before cleaning:
ID | Caption
1 | Corwin, N. and Dean Bumpus on Atlantis in 1950
Loose schema, different formats, ambiguous, not atomic.
After cleaning:
dc.id | dc.date | dc.subject.person | dc.subject.ship
1 | 1950 | Corwin, Nathaniel; Bumpus, Dean F. | Atlantis (Ketch)
Precise schema, standard formats, specific, atomic.
Cleaning Metadata
• General Thoughts
• Process, Tools, and Lessons
General Thoughts
Cleaning metadata is like... Engineering
Image: Bundesarchiv, Bild 183-1989-0523-016, CC-BY-SA-3.0-de (www.creativecommons.org/licenses/by-sa/3.0/de/deed.en), via Wikimedia Commons
General Thoughts
Cleaning metadata is like... Archaeology
Image: Attribution, Noncommercial, Share Alike, http://www.flickr.com/photos/dunechaser/665480669/
A Process for Cleaning Metadata
1. Atomization
2. Addition
3. Reconciliation
4. Reassembly
AARR!
• Atomization
• Addition
• Reconciliation
• Reassembly
1. Atomization
Breaking down information into basic elements.
Atomization
Lessons for Metadata Designers
Loose schemas and free comment fields are tough to atomize:
ID | Subject
1 | Nat and Dean aboard Atlantis in 1950
Structured schemas don't need to be atomized:
ID | Date | People | Ships
1 | 1950 | Nat Corwin, Dean Bumpus | Atlantis
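As a rough illustration of atomization, the sketch below pulls a date, a list of people, and a list of ships out of a free-text caption like the ones shown above. The function name `atomize`, the `KNOWN_SHIPS` set, and the splitting rules are assumptions made for this example, not a tool named in the slides; real captions would need many more rules.

```python
import re

# Hypothetical list of ship names; a real project would use an authority list.
KNOWN_SHIPS = {"Atlantis"}

def atomize(caption):
    """Split a free-text caption into date, people, and ship elements."""
    # A four-digit year is usually the easiest element to isolate.
    year = re.search(r"\b(1[89]\d\d|20\d\d)\b", caption)
    # Ships are found by matching against the known names.
    ships = sorted(
        s for s in KNOWN_SHIPS if re.search(rf"\b{re.escape(s)}\b", caption)
    )
    # Assumption: everything before "aboard" or "on" is the list of people.
    people_part = re.split(r"\s+(?:aboard|on)\s+", caption, maxsplit=1)[0]
    people = [p.strip() for p in re.split(r"\s+and\s+|;", people_part) if p.strip()]
    return {
        "date": year.group(1) if year else None,
        "people": people,
        "ships": ships,
    }

record = atomize("Nat and Dean aboard Atlantis in 1950")
print(record)
```

Note that the people come out as "Nat" and "Dean": atomization only separates the elements, and turning them into full, standard names is left to the later steps.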
2. Addition
Adding information.
Addition
Lessons for Metadata Designers
Addition is time-consuming and often impossible. Record as much as you can from the start!
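When a second source of information does exist, addition can be as simple as filling in the blanks. The sketch below is one possible approach, assuming records are plain dictionaries; the helper `add_missing` and the `print_notes` source are hypothetical names for this example, and the values are taken from the photograph described earlier.

```python
def add_missing(record, extra):
    """Return a copy of record with empty or absent fields filled from extra."""
    merged = dict(record)
    for field, value in extra.items():
        # Only fill gaps; never overwrite a value that was already recorded.
        if not merged.get(field):
            merged[field] = value
    return merged

caption_record = {"id": 1, "date": "1950", "ship": "Atlantis", "photographer": None}
# Hypothetical second source, for example notes kept with the original print.
print_notes = {"photographer": "Jan Hahn", "title": "Recovering Gear"}

enriched = add_missing(caption_record, print_notes)
print(enriched)
```

The function never overwrites an existing value, which keeps the original record as the preferred source when the two disagree.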
3. Reconciliation
Standardizing information.
Reconciliation
Lessons for Metadata Designers
Free text fields tend to produce irregular information:
Movie: Temple of Doom
Movie: Indiana Jones and the Temple of Doom
Controlled vocabularies and selection widgets will keep your information standardized.
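One way to picture reconciliation is a lookup against a controlled vocabulary. The sketch below assumes a small hand-built `AUTHORITY` table mapping known variants to a preferred form; the table, the function names, and the matching rule (case- and whitespace-insensitive exact match) are all assumptions for illustration. The preferred forms are the ones used in the "after cleaning" example.

```python
# Hypothetical authority table: known variants mapped to a preferred form.
AUTHORITY = {
    "nat corwin": "Corwin, Nathaniel",
    "corwin, n.": "Corwin, Nathaniel",
    "dean bumpus": "Bumpus, Dean F.",
    "atlantis": "Atlantis (Ketch)",
}

def normalize_key(value):
    """Lower-case the value and collapse runs of whitespace."""
    return " ".join(value.lower().split())

def reconcile(value):
    """Return the preferred form, or the value unchanged if it is unknown."""
    return AUTHORITY.get(normalize_key(value), value)

def reconcile_all(values):
    """Reconcile a list and report which values had no match."""
    reconciled = [reconcile(v) for v in values]
    unmatched = [v for v in values if normalize_key(v) not in AUTHORITY]
    return reconciled, unmatched

people, unmatched = reconcile_all(["Corwin, N.", "Dean  Bumpus", "Jan Hahn"])
print(people)
print(unmatched)
```

Reporting the unmatched values separately matters in practice: they are the ones that need a human decision or a new vocabulary entry.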
4. Reassembly
Recombining information in a new form.
Reassembly
Lessons for Metadata Designers
Be consistent. It takes time to reassemble multiple formats.
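To make reassembly concrete, the sketch below recombines atomic, reconciled values into the `dc.*` columns used in the "after cleaning" example and writes them as pipe-delimited text. The input record layout, the function names, and the choice of "; " to join multiple values are assumptions for this example.

```python
import csv
import io

FIELDS = ["dc.id", "dc.date", "dc.subject.person", "dc.subject.ship"]

def reassemble(record):
    """Map an atomized, reconciled record onto the target schema."""
    return {
        "dc.id": str(record["id"]),
        "dc.date": record["date"],
        # Multi-valued fields are joined with one consistent separator.
        "dc.subject.person": "; ".join(record["people"]),
        "dc.subject.ship": "; ".join(record["ships"]),
    }

def to_delimited(rows):
    """Write rows as pipe-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, delimiter="|", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

cleaned = {
    "id": 1,
    "date": "1950",
    "people": ["Corwin, Nathaniel", "Bumpus, Dean F."],
    "ships": ["Atlantis (Ketch)"],
}
output = to_delimited([reassemble(cleaned)])
print(output)
```

Because every record passes through the same `reassemble` function, the output format stays consistent, which is the point of the lesson above.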