Toro 1 EMu on a Diet
Yale campus
Peabody Collections
Counts & Functional Cataloguing Unit
• Anthropology 325,000 Lot
• Botany 350,000 Individual
• Entomology 1,000,000 Lot
• Invertebrate Paleontology 300,000 Lot
• Invertebrate Zoology 300,000 Lot
• Mineralogy 35,000 Individual
• Paleobotany 150,000 Individual
• Scientific Instruments 2,000 Individual
• Vertebrate Paleontology 125,000 Individual
• Vertebrate Zoology 185,000 Lot / Individual
2.7 million database-able units => ~11 million items
Peabody Collections
Functional Units Databased
• Anthropology 325,000 90 %
• Botany 350,000 1 %
• Entomology 1,000,000 3 %
• Invertebrate Paleontology 300,000 60 %
• Invertebrate Zoology 300,000 25 %
• Mineralogy 35,000 85 %
• Paleobotany 150,000 60 %
• Scientific Instruments 2,000 100 %
• Vertebrate Paleontology 125,000 60 %
• Vertebrate Zoology 185,000 95 %
990,000 of 2.7 million => 37 % overall
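As a rough cross-check of the overall figure, the per-collection unit counts and percentages can be combined into a weighted total. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the percentages are the rounded values from the slide, so the result lands near the quoted 990,000 of 2.7 million rather than exactly on it.

```python
# Functional units and rounded percent databased, copied from the slide.
# COLLECTIONS and databased_summary are illustrative names, not part of EMu.
COLLECTIONS = {
    "Anthropology": (325_000, 90),
    "Botany": (350_000, 1),
    "Entomology": (1_000_000, 3),
    "Invertebrate Paleontology": (300_000, 60),
    "Invertebrate Zoology": (300_000, 25),
    "Mineralogy": (35_000, 85),
    "Paleobotany": (150_000, 60),
    "Scientific Instruments": (2_000, 100),
    "Vertebrate Paleontology": (125_000, 60),
    "Vertebrate Zoology": (185_000, 95),
}


def databased_summary(collections):
    """Return (total units, estimated databased units, overall percent)."""
    total_units = sum(units for units, _ in collections.values())
    databased = sum(units * pct / 100 for units, pct in collections.values())
    return total_units, databased, 100 * databased / total_units


total_units, databased, overall_pct = databased_summary(COLLECTIONS)
print(f"{databased:,.0f} of {total_units:,} units databased ({overall_pct:.1f} %)")
```

Because the inputs are rounded to whole percentages, small collections barely move the total; the overall figure is dominated by Entomology, Botany and the two invertebrate collections.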
The four YPM buildings
• Peabody (YPM)
• Environmental Science Center (ESC)
• Geology / Geophysics (KGL)
• 175 Whitney (Anthropology)
VZ Kristof Zyskowski (Vert. Zool. - ESC) Greg Watkins-Colwell (Vert. Zool. - ESC)
HSI Shae Trewin (Scientific Instruments – KGL)
VP Mary Ann Turner (Vert. Paleo. – KGL / YPM)
ANT Maureen DaRos (Anthro. - YPM / 175 Whitney)
[Chart: % Databased vs. Collection Size (in 1000s of items); labelled points: Botany, Entomology, Invertebrate Paleontology, Invertebrate Zoology]
Peabody Collections
Approximate Digital Timeline
• 1991 Systems Office created & staffed
• 1992 Argus collections databasing initiative started
• 1994 Gopher services launched for collections data
• 1997 Gopher mothballed, Web / HTTP services launched
• 1998 Physical move of many collections “begins”
• 2002 Physical move of many collections “ends”
• 2003 Search for Argus successor commences
• 2003 Informatics Office created & staffed
• 2004 KE EMu to succeed Argus, data migration begins
• 2005 Argus data migration ends, go-live in KE EMu
Big events
• EMu migration in '05 (all disciplines went live simultaneously)
• Physical move in '98-'02 (primarily neontological disciplines)
What do you do … … when your EMu is out of shape & sluggish ?
The Peabody Museum Presents What clued us in that we should put our EMu on a diet ?
Area of Server Occupied by Catalogue
• 980 megabytes in Argus
• 10,400 megabytes in EMu
Default EMu “cron” maintenance job schedule
[Grid: days Mo Tu We Th Fr Sa Su against time slots late night, workday, evening; legend: emulutsrebuild, emumaintenance batch, emumaintenance compact]
Three Fabulously Easy Steps !
• 1. The Legacy Data Burnoff
  (best quick loss plan ever!)
• 2. The Darwin Core Binge & Purge
  (eat the big enchilada and still end up thin!)
• 3. The Validation Code SlimDing
  (your Texpress metabolism is your friend!)
1. The Legacy Data Burnoff
Anatomy of the ecatalogue database
File Name                      Function
~/emu/data/ecatalogue/data     the actual data
~/emu/data/ecatalogue/rec      indexing (part)
~/emu/data/ecatalogue/seg      indexing (part)
Catalogue size: 980 MB in Argus, 10,400 MB in EMu
The combined size of these was 10.4 GB: 4 GB in data and 3 GB in each of rec and seg
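A quick way to see how the footprint of a table splits across these three files is to read their sizes from disk. The sketch below is a minimal illustration, not an EMu utility: the function names are hypothetical, and it assumes that data, rec and seg are ordinary files under the table directory named on the slide.

```python
from pathlib import Path

# File names taken from the slide; everything else here is illustrative.
PARTS = ("data", "rec", "seg")


def table_footprint(table_dir):
    """Return {part: size in bytes} for the data, rec and seg files that exist."""
    table_dir = Path(table_dir).expanduser()
    sizes = {}
    for part in PARTS:
        path = table_dir / part
        if path.is_file():
            sizes[part] = path.stat().st_size
    return sizes


def report(table_dir):
    """Print one line per file plus a total, in megabytes."""
    sizes = table_footprint(table_dir)
    for part, size in sizes.items():
        print(f"{part:<6}{size / 1e6:>12,.1f} MB")
    print(f"{'total':<6}{sum(sizes.values()) / 1e6:>12,.1f} MB")


if __name__ == "__main__":
    report("~/emu/data/ecatalogue")
```

Running the same report before and after a cleanup gives a simple measure of how much each file shrank.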
The ecatalogue database was a rate limiter
Typical EMu data directory: 23 files, 2 subdirs
Closer Assessment of Legacy Data
In 2005, we initially adopted many of the existing data element formats from the USNM’s EMu client so that KE could develop the Peabody’s modules rapidly before migration. Legacy Data fields were among them.
sites – round 2: constant data, lengthy prefixes
sites – round 2: data of temporary use in migration