Sites Cleanup: The Clone Wars • Kara M. Lewis, Collections Information Program Manager • Patricia L. Nietfeld, Collections Manager • Smithsonian Institution, National Museum of the American Indian
Long, long ago…in 2006… • NMAI migrated all geographical data from two legacy databases into the Sites module in EMu • Much of the data was not standardized • Much of the data was “duplicate” information • Made the decision to migrate “as is” and use the tools in EMu to clean it up • As a result…
The Conflict • ~39,000 Unique combinations • ~90,000 Sites records created • ~337,500 Catalog records affected • At least half were duplicates, or data was in the wrong field • The rest were “variations,” obsolete place names, misspellings, or just plain wrong
“Do or do not. There is no try.” • Conventions: • No abbreviations • no St. for Saint • Names in language of country • Alternate versions in parentheses • Lac Saint-Jean (Lake Saint John) • Use 1st level political subdivision • Ecuador, Manabí Province • Use current names
“Do or do not. There is no try.” • Conventions: • Country – Region? – State • Pará State, North Region • Subdivisions on case by case basis • Leave blank if higher subdivision can’t be determined • Fill it in if known • Most specific info in Provenience
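A convention such as "no abbreviations" can get a mechanical first pass before a person reviews the names. The sketch below is only an illustration: the function name `flag_abbreviations` is hypothetical, and the pattern covers just the "St." case named on the slide. It flags candidates for review and does not rewrite anything, since "St." is not always "Saint".

```python
import re

# Only "St." comes from the conventions slide; extend the pattern as needed.
ABBREVIATION = re.compile(r"\bSt\.")


def flag_abbreviations(names):
    """Return the place names that contain a flagged abbreviation."""
    return [name for name in names if ABBREVIATION.search(name)]


candidates = [
    "St. Louis",
    "Saint Louis",
    "Lac Saint-Jean (Lake Saint John)",
]
flagged = flag_abbreviations(candidates)
print(flagged)
```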
“You must unlearn what you have learned.” • What Pat did not do: • Not a lot of energy spent on US state archaeological site numbers • This was cleanup, not verification
“Control, control, you must learn control!” • Started with a spreadsheet of unique combinations of geographical data • Split into smaller spreadsheets by state or country • Learn about the country
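Splitting the master spreadsheet by state or country could be scripted along these lines. This is a minimal sketch, not the process NMAI used: the column names (`country`, `state`, `site`) and the sample rows are assumptions made up for illustration.

```python
import csv
import io
from collections import defaultdict


def split_by_region(rows, key_fields=("country", "state")):
    """Group spreadsheet rows by the values in key_fields."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple((row.get(field) or "").strip() for field in key_fields)
        groups[key].append(row)
    return dict(groups)


# Hypothetical export; a real one would be read from a CSV file on disk.
sample = io.StringIO(
    "country,state,site\n"
    "Ecuador,Manabí Province,Site A\n"
    "Ecuador,Manabí Province,Site B\n"
    "Brazil,Pará State,Site C\n"
)
groups = split_by_region(csv.DictReader(sample))
for key, members in sorted(groups.items()):
    print(key, len(members))
```

Each group can then be written to its own smaller spreadsheet and assigned to one person.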
“Ready are you? What know you of ready?” • Content Resources: • General: Wikipedia, Statoids.com • International Travel Maps and Books of Vancouver, Canada • Country’s official website • Archaeological websites • Indigenous peoples’ websites • Government agencies • Maplandia.com • Google, JSTOR • MAI publications
“It is the future you see.” • Nomenclature Resources: • US: Geographic Names Information System (GNIS) • Canada: Geographical Names Search Service (GNSS) • Others: GEOnet Names Server (GNS)
The Implementation in EMu • Contractors do not have to be content experts • Create new Sites, rather than “reuse” • Practice first • I do the actual deletions
The Confrontation • Start in the Sites module • Create list view with all fields • Search and group “old” Sites
The Confrontation • Open 2nd window and create “new” Sites. • Find the unique combos of Sites and Provenience in Catalog • Check the “Collection” field
The Confrontation • Start with Objects – usually “one to one” replacement • Sort & highlight those to receive new Site • Replace old IRN with new IRN • Replace not Replace All • Replace the Provenience in those already changed
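The "Replace, not Replace All" step can be pictured with a small sketch. This is not EMu's replace tool; the record layout (`id`, `site_irn`) and the function name `replace_site_irn` are hypothetical, chosen only to show that the new IRN is applied to the highlighted records and nothing else.

```python
def replace_site_irn(records, selected_ids, old_irn, new_irn):
    """Swap old_irn for new_irn, but only on explicitly selected records."""
    changed = []
    for record in records:
        if record["id"] in selected_ids and record["site_irn"] == old_irn:
            record["site_irn"] = new_irn
            changed.append(record["id"])
    return changed


# Hypothetical catalog rows; the IRNs are made up for illustration.
catalog = [
    {"id": 1, "site_irn": 1001},
    {"id": 2, "site_irn": 1001},
    {"id": 3, "site_irn": 1001},
]
changed = replace_site_irn(catalog, selected_ids={1, 2}, old_irn=1001, new_irn=2001)
print(changed)
```

Record 3 keeps the old IRN because it was not selected, which is the point of avoiding a blanket replace.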
The Confrontation • Photo Archives is a different story • Each record created a new Site record = duplicates • Many IRNs to replace per “new” Site record • Instead, use periods to represent wildcards…
The Confrontation • Start with the number of digits that matches the “new” IRN • Replace not Replace All • Then go through Provenience as before
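The period-as-wildcard idea is the familiar regular-expression one: a run of n periods matches any n-character value. The sketch below illustrates that with Python's `re` module. It is an assumption that this mirrors how the pattern behaves in EMu's replace tool; the function name and IRNs are invented for the example.

```python
import re


def replace_irns_by_length(old_irns, new_irn):
    """Replace every IRN that has the same number of characters as new_irn."""
    new_irn = str(new_irn)
    # One period per digit, e.g. "....." for a five-digit IRN.
    # Note that "." matches any character, not only digits.
    pattern = re.compile("." * len(new_irn))
    return [
        new_irn if pattern.fullmatch(str(irn)) else str(irn)
        for irn in old_irns
    ]


result = replace_irns_by_length(["10234", "10987", "987654"], "55501")
print(result)
```

The six-digit value is left alone, so a second pass with a longer pattern would be needed to catch it.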
The Climax • Double check with View>Attachments>Selected Records • When spreadsheet completed, retire the “old” Sites
The Climax • Contractors let me know what is retired • Double check that all are detached • DELETE!
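The "all are detached" check before deletion amounts to confirming that no catalog record still points at a retired Site. A minimal sketch, assuming a hypothetical record layout with a `site_irn` field and made-up IRNs:

```python
def still_attached(retired_irns, catalog_records):
    """Return retired Site IRNs that some catalog record still references."""
    referenced = {record["site_irn"] for record in catalog_records}
    return sorted(set(retired_irns) & referenced)


def safe_to_delete(retired_irns, catalog_records):
    """Return retired Site IRNs with no remaining attachments."""
    blocked = set(still_attached(retired_irns, catalog_records))
    return sorted(irn for irn in set(retired_irns) if irn not in blocked)


# Hypothetical data: Site 1002 is retired but still attached to a record.
catalog = [{"id": 1, "site_irn": 2001}, {"id": 2, "site_irn": 1002}]
retired = [1001, 1002]
print(still_attached(retired, catalog))
print(safe_to_delete(retired, catalog))
```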
Triumph! • New data export to check unique values • Checked with Pat on questions • Final spreadsheet given to contractor
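Checking a fresh export for unique values can be sketched as a count of distinct field combinations. The field names and rows here are hypothetical; the point is only that a combination appearing more than once, or two rows that differ by stray whitespace, stands out immediately.

```python
from collections import Counter


def unique_combinations(rows, fields=("country", "state", "site")):
    """Count how often each combination of field values appears."""
    return Counter(
        tuple((row.get(field) or "").strip() for field in fields)
        for row in rows
    )


# Hypothetical export rows; the second differs only by trailing whitespace.
export = [
    {"country": "Ecuador", "state": "Manabí Province", "site": "Site A"},
    {"country": "Ecuador", "state": "Manabí Province", "site": "Site A "},
    {"country": "Brazil", "state": "Pará State", "site": "Site C"},
]
counts = unique_combinations(export)
duplicates = {combo: n for combo, n in counts.items() if n > 1}
print(len(counts), duplicates)
```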
The Resolution • We now have just under 15,500 Sites Records • We finished in one year • Averaged 2 contractors at a time • Module is now tightly controlled • Data is ready for the web
The End (or is it??) • Sites was just the beginning… • Kara M. Lewis, Collections Information Program Manager lewiskm@si.edu • Patricia L. Nietfeld, Collections Manager nietfeldp@si.edu