Switching to the fast track: Rapid digitization of the world's largest herbarium
TDWG 2011 - New Orleans
Simon Chagnoux, Henri Michiels
The French Museum
An old institution
• Founded in 1635 (at that time the Royal garden of medicinal plants)
• In 1793, the French Revolution turned the garden into the national Museum
• Now: 15 locations in France, 2000 people
TDWG - Orleans
Renovating the Herbarium An opportunity to digitize the entire collection
The Paris Herbarium
The Renovation Project (1)
• Two main drivers for this project:
  • the herbarium, designed for 6 million specimens, was packed with 10 million sheets and fitted with old storage
  • raising the storage density required reinforcing the floors
The Renovation Project (2)
• The only way to do this was to move the entire collection out and put it back in the renovated premises once the works were finished
• An opportunity for:
  • New sorting, from geographic to phylogenetic (APG III)
  • Reconditioning
  • Digitizing
Renovation Calendar
• 2006 – Start of the project
• 2009 – Start of the works
• 2010 (June) – Start of digitization
• 2011 (Nov) – Opening of the first rearranged spaces to researchers
• 2012 – End of the project
Budget
• Overall project cost: 24.5 million €
  • Building renovation: 12 000 000 €
  • Movers: 900 000 €
  • Attaching specimens: 3 200 000 €
  • Reconditioning, digitization and sorting: 6 700 000 €
  • Supplies: 1 600 000 €
  • Storage: 100 000 €
The renovation cycle
[Diagram labels: Floor by floor renovation; Herbarium; Digitization, Reconditioning, Sorting; Industrial Partner; Warehouse]
Before ...
... And after
Why digitize?
• Because every item has to be handled in the course of the project
• Digitization gives us:
  • a virtual copy of the specimens
  • the possibility to share and study specimens without touching them
• More than an electronic copy of the collection catalog, we’ll have a collaborative tool for managing scientific knowledge inside as well as outside the institution
2D Digitization is cheap
• The cost of digitization is marginal compared to the full project:
  • full specimen processing (moving, sorting, reconditioning, new furniture): $1.5
  • digitization and name processing: $0.1
• Digitization is attractive to funders
A new paradigm
• For 15 years we entered the full information for a subset of specimens:
  • 1 million entries in the database (rich information)
  • One fifth of them (200 000 images) were photographed
• Since summer 2010, we have used a mass approach in which digitization precedes data entry:
  • 2 million records digitized in one year
  • limited information in the database (name and geographic area)
• The scientific information can be added later without handling the specimens themselves
The workflow
Digitizing, reconditioning and sorting
An industrial process (1)
• We chose a contractor with industrial know-how
• A dedicated site had to be set up and equipped by the contractor
• Two teams of 20 workers in two shifts, working from 6am to 9pm
• The process had to align with the schedule of the renovation works, floor by floor
An industrial process (2)
• Planned production rate: 17 000 sheets per day over 24 months (ca. 15 seconds per sheet)
• At this rate, a variation of ± 1 second per specimen has an impact of ± 300 k€ on the project cost
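The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only a reading of the numbers, not the project's own costing: it assumes the ± 300 k€ applies to the whole collection of 10 million sheets (the figure quoted for the herbarium), and derives the hourly cost of processing time that this would imply.

```python
# Figures quoted on the slides
SHEETS_PER_DAY = 17_000         # planned production rate
TOTAL_SHEETS = 10_000_000       # assumption: the whole collection is processed
COST_PER_SECOND_EUR = 300_000   # quoted impact of +/- 1 second per specimen


def working_days(total_sheets: int, sheets_per_day: int) -> float:
    """Number of production days needed at the planned rate."""
    return total_sheets / sheets_per_day


def implied_hourly_cost(total_sheets: int, cost_per_second_eur: float) -> float:
    """Cost of one hour of processing time implied by the quoted figure."""
    extra_hours = total_sheets / 3600  # one extra second on every sheet
    return cost_per_second_eur / extra_hours


days = working_days(TOTAL_SHEETS, SHEETS_PER_DAY)
print(f"{days:.0f} production days, about {days / 24:.1f} per month over 24 months")
print(f"implied cost: {implied_hourly_cost(TOTAL_SHEETS, COST_PER_SECOND_EUR):.0f} EUR per hour")
```

Under that assumption, one second per sheet adds up to roughly 2 800 hours of processing over the whole project, which is why small per-sheet variations matter.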
The Bussy-St-Georges site
Workflow overview
How to alleviate data entry
• We take advantage of the physical ordering of the specimens
• We provide a name list to the contractor (APG III classification)
• The contractor enriches the list with the information generated during the process and returns a table of consolidated information (image number, barcode numbers, classification, …)
1 – Delivery (1)
A carting company transports the specimens to the facility, where they arrive in clearly labeled boxes. Each box receives a tracking barcode.
1 – Delivery (2)
• The Museum provides two files:
  • a “logistics” file
    • number of boxes
    • family name and number
    • genus name and number
    • geographic area
  • a “taxonomy” file
    • list of available taxon names with family, genus, species, authors, ID (taxon number)
1 – Delivery (3)
• This information is ingested by the contractor’s Information System and used throughout the industrial process (labeling, sorting, quality assurance)
2 – Folder processing
For each folder, the operator:
• replaces the jacket (color according to region)
• reads the species name and types its first letters on the computer
• selects the name from a list
• prints a label with barcode and identification information, and sticks it on the folder
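The "type the first letters, then pick from a list" step amounts to a prefix lookup in the taxonomy file. The following is a minimal sketch of one way to do it, not the contractor's software: the function name and the short name list are illustrative assumptions, and a binary search over a sorted list stands in for whatever index the real system uses.

```python
from bisect import bisect_left


def names_with_prefix(sorted_names: list[str], prefix: str) -> list[str]:
    """Return all names starting with `prefix` from an alphabetically sorted list."""
    start = bisect_left(sorted_names, prefix)  # first candidate position
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break  # sorted input: no later name can match
        matches.append(name)
    return matches


# Illustrative excerpt of a taxonomy file (names only)
TAXA = sorted([
    "Abrus aureus",
    "Abrus precatorius",
    "Acacia farnesiana",
    "Acacia nilotica",
])

print(names_with_prefix(TAXA, "Abr"))
```

Keeping the list sorted means each keystroke only narrows a contiguous slice, which keeps the lookup fast even with a very long name list.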
3 – Specimen Digitization (1)
• A Data Matrix code and a barcode are stuck on each sheet
  • Data Matrix: for tracking purposes
  • Barcode: specific to the Muséum, following the international herbarium standard
• The specimens are placed three by three on a tray
3 - Specimen Digitization (2)
• The tray is placed on a conveyor belt
• The sheet is scanned
• The scan is checked (framing and focus)
• At the end of the chain, the barcode is read to check whether all specimens are back in the folder
The Digitization Bench
4 - Reconditioning
• After scanning, each sheet is inserted in a sulfurized paper liner
• The barcode of each specimen is read, allowing the system to check whether all specimens are back in the right folder
• The folders are stored in a “cut box” before sorting
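The "are all specimens back in the right folder" check can be pictured as a comparison between two sets of barcodes. This is a minimal sketch under that assumption; the function name and the barcode values are hypothetical and do not describe the contractor's system.

```python
def check_folder(expected: set[str], scanned: list[str]) -> dict[str, set[str]]:
    """Compare the barcodes read back with those registered for the folder."""
    seen = set(scanned)
    return {
        "missing": expected - seen,     # sheets that did not come back
        "unexpected": seen - expected,  # sheets that belong to another folder
    }


# Hypothetical barcode values, for illustration only
registered = {"P0000001", "P0000002", "P0000003"}
read_back = ["P0000001", "P0000003", "P0000009"]

result = check_folder(registered, read_back)
print("missing:", sorted(result["missing"]))
print("unexpected:", sorted(result["unexpected"]))
```

A folder passes only when both sets are empty, so a single read at the end of the chain catches both lost and misfiled sheets.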
5 - Sorting 1 (by genus)
• This first sort arranges the specimens by family and genus name
• The operator puts the jackets in boxes and places them on shelves according to the family and genus numbers (the shelves are labelled in advance by the contractor)
6 - Sorting 2 (by species)
• The operator takes a box and reads the barcode on each jacket
• The system displays the species name and assigns a number, which is printed on a label
• The label is stuck on the folder, which is then stored on the shelf with the same number
7 – Packing, transport and final storage
• The folders are put in boxes and sent to the Museum
• The contractor stores the folders in the Museum’s herbarium
How to ensure quality in mass digitization?
• 60 000 images produced each week
• 1% of the production checked (ca. 600 images)
• Samples are distributed among botanical staff
• Checking:
  • Focus
  • Data quality
  • Barcode number
  • Barcode location
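The sampling described on this slide (1% of a week's output, shared among staff) can be sketched as a random draw followed by a round-robin split. The function names, the image identifiers and the reviewer names below are assumptions made for illustration; the slides do not say how the sample was actually drawn or assigned.

```python
import random


def draw_sample(image_ids: list[str], rate: float = 0.01, seed=None) -> list[str]:
    """Draw a simple random sample of the week's images, without replacement."""
    rng = random.Random(seed)
    size = round(len(image_ids) * rate)
    return rng.sample(image_ids, size)


def distribute(sample: list[str], reviewers: list[str]) -> dict[str, list[str]]:
    """Share the sampled images among reviewers in round-robin order."""
    batches = {name: [] for name in reviewers}
    for i, image_id in enumerate(sample):
        batches[reviewers[i % len(reviewers)]].append(image_id)
    return batches


# Hypothetical identifiers for one week of production
weekly_images = [f"img_{i:05d}" for i in range(60_000)]
sample = draw_sample(weekly_images, rate=0.01, seed=2011)
batches = distribute(sample, ["reviewer_a", "reviewer_b", "reviewer_c", "reviewer_d"])

print(len(sample), {name: len(ids) for name, ids in batches.items()})
```

With 60 000 images a week, the 1% rate yields the roughly 600 images quoted above; four reviewers would each receive 150.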
Production of images
• The conveyor belt passes the specimens under a bidirectional scanner which produces 11x17” (A3), 300 dpi, 5000 x 3300 pixel images
• TIFF files are saved offline (one production day per 1 TB disk)
• JPEGs are made for online use
Scanning resolution and image size
• One TIFF image is 50 MB
• One JPEG is 5 MB. This compression level was chosen to keep the same level of detail as the TIFF (only the colour is slightly changed)
• This choice is a technical and economic trade-off
• For 10 million images:
  • TIFF represents 500 TB
  • JPEG represents 50 TB
  • Data represents < 100 GB
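These sizes follow from the pixel dimensions quoted for the scanner output. The sketch below redoes the arithmetic, assuming uncompressed 24-bit RGB (3 bytes per pixel) and decimal units (1 MB = 10^6 bytes, 1 TB = 10^6 MB); both are assumptions, since the slides do not state the bit depth or the unit convention.

```python
WIDTH_PX, HEIGHT_PX = 5000, 3300   # image size quoted for the scanner output
BYTES_PER_PIXEL = 3                # assumption: uncompressed 24-bit RGB
N_IMAGES = 10_000_000              # target size of the digitized collection


def uncompressed_mb(width: int, height: int, bytes_per_pixel: int) -> float:
    """Size of one uncompressed image in MB (decimal units)."""
    return width * height * bytes_per_pixel / 1e6


def total_tb(n_images: int, mb_per_image: float) -> float:
    """Total storage in TB for n_images of the given size."""
    return n_images * mb_per_image / 1e6


print(f"one raw image: {uncompressed_mb(WIDTH_PX, HEIGHT_PX, BYTES_PER_PIXEL):.1f} MB")
print(f"TIFF archive:  {total_tb(N_IMAGES, 50):.0f} TB")
print(f"JPEG archive:  {total_tb(N_IMAGES, 5):.0f} TB")
```

The raw size comes out at 49.5 MB, consistent with the 50 MB quoted per TIFF, and the totals match the 500 TB and 50 TB figures.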
Why do we keep TIFF?
• Partners seek lossless data (Reflora, Mellon)
• Standard for physical publishing
• Native scan output, which can be used for any future use or transformation
Handling TIFF data
• We cannot afford “live” storage of 500 TB
• … or even 1 PB with redundancy ($$$)
• … with a lot of energy consumption and heat dissipation for rarely accessed images
• We plan to start using tape storage next year, with HSM software
• For the time being, USB disks are stored in the collection warehouse
Exception for the types
• The types are not part of this industrial process
• They are digitized manually on-premises at 600 dpi (200 MB in compressed TIFF)
• This process was initiated by the Mellon Foundation in 2004
• We now have about 100 000 type images
What we’ve achieved and learned …
… after 12 months of collaboration between scientists and industry (out of a planned 24 months)
Achievements
• 2.1 million specimens processed between June 2010 and August 2011
• Images and data are of good quality
• The new premises comply with today’s standards (space, safety, light, air-conditioning, …)
Fast but ... not fast enough
Reasons for being behind schedule
• The logisticians underestimated the sorting work
• Only two digitization chains are operational instead of three (due to lack of staff)
Software and quality assurance
• More software is needed to ensure traceability and detect failures than for data acquisition
• Fast web publication of images allows a broader audience to perform quality control
• Continuous control is mandatory
People
• Working under constant time pressure for two years is really difficult in an academic context
• The contractor must be treated as a service provider and not just as the team next door (not obvious in an academic context)
Working with a contractor
[Diagram labels: ROI, speed, robustness, quality, exhaustiveness, specificity]
• Culture clash
• Many parameters were not known at the beginning of the project (processes, numbers, ...)
• Quality control is key to making sure that scientific excellence governs the industrial throughput (to be defined upfront)
• Put everything in writing and always refer to the contract
Digitizing other objects
• Digitizing herbarium sheets is “easy”:
  • same dimensions for all objects
  • easy handling and scanning
  • the plant itself is not touched, only the paper
• Digitizing 3D objects is a lot more complex and generally requires handling the specimen itself
Is it over ? Digitization is just a very first step…
Virtual herbarium
• The amount of information available online will lower the number of physical visits to the Herbarium
• … but visitors leave Post-it notes on the sheets
How can we replace this?
• Annotation systems
• “Virtual visit” website
Spot the differences …?
[Specimen label: AFM, FABACEAE, Abrus aureus R. Vig.]