2010 Collaborative NSF Grant
• Collaborative grant between four Pacific Northwest herbaria:
  • University of Washington Herbarium (PI: Dick Olmstead; Co-PIs: David Giblin and Joe Ammirati)
  • Oregon State University Herbarium (Co-PI: Aaron Liston)
  • University of Idaho, Stillinger Herbarium (Co-PI: Dave Tank)
  • Montana State University Herbarium (Co-PI: Matt Lavin)
• Submitted to NSF in July 2009; awarded this spring; funds arrived in June.
• Combined grant funds: $1,340,879
• Grant covers imaging and digitization of herbarium specimens from multiple PNW herbaria.
What we proposed to do
1. Expand the taxonomic breadth of the portal by databasing 200,000 PNW non-vascular plant, fungal, and lichen specimens at OSC and WTU.
2. Image, database, and provide online access to 185,000 vascular plant specimens from the region’s remaining large herbaria (ID, MONT).
3. Image, database, and provide online access to the PNW vascular plant specimens from small herbaria in Idaho, Oregon, and Washington.
4. Create connections to the Portal for other PNW herbaria with existing specimen databases (WS, UBC, MONTU, SOU, SRP, CIC). Also, improve existing connections (WTU, OSC, ALA).
5. Provide Portal data to GBIF, USVH, and other data aggregators for those collections that lack their own data access points.
6. Develop portal-based web applications and expand the PNW Herbaria web site.
Grant Budget
Collaborative grant between four institutions in Washington (WTU), Oregon (OSC), Idaho (ID), and Montana (MONT). Each institution receives a portion of the funds.

WTU Funds ($)
Informatics Specialist: 184,202
Databasing coordinator: 41,357
Imaging/databasing: 162,512
Equipment/Supplies: 17,815
Travel: 6,000
2010 PNW Herbaria meeting: 3,250
Indirect costs: 227,475

OSC Funds ($)
Databasing coordinators: 68,926
Imaging/databasing: 140,000
Equipment/Supplies: 9,969
Travel: 3,500
Indirect costs: 102,746

ID Funds ($)
Databasing coordinator: 39,676
Imaging/databasing: 158,050
Equipment/Supplies: 9,921
Travel: 2,500
Indirect costs: 91,204

MONT Funds ($)
Coordinator & imaging: 44,000
Equipment/Supplies: 5,869
Travel: 500
Indirect costs: 21,407
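As a quick arithmetic check, the per-institution line items can be summed and compared with the combined grant figure of $1,340,879 stated earlier. The sketch below only transcribes the numbers from the budget slides; the variable and function names are illustrative.

```python
# Budget line items in US dollars, transcribed from the budget slides.
BUDGET = {
    "WTU": {
        "Informatics Specialist": 184_202,
        "Databasing coordinator": 41_357,
        "Imaging/databasing": 162_512,
        "Equipment/Supplies": 17_815,
        "Travel": 6_000,
        "2010 PNW Herbaria meeting": 3_250,
        "Indirect costs": 227_475,
    },
    "OSC": {
        "Databasing coordinators": 68_926,
        "Imaging/databasing": 140_000,
        "Equipment/Supplies": 9_969,
        "Travel": 3_500,
        "Indirect costs": 102_746,
    },
    "ID": {
        "Databasing coordinator": 39_676,
        "Imaging/databasing": 158_050,
        "Equipment/Supplies": 9_921,
        "Travel": 2_500,
        "Indirect costs": 91_204,
    },
    "MONT": {
        "Coordinator & imaging": 44_000,
        "Equipment/Supplies": 5_869,
        "Travel": 500,
        "Indirect costs": 21_407,
    },
}

# Combined grant funds as stated on the first slide.
COMBINED_GRANT_FUNDS = 1_340_879


def institution_totals(budget):
    """Return the sum of all line items for each institution."""
    return {inst: sum(items.values()) for inst, items in budget.items()}


if __name__ == "__main__":
    totals = institution_totals(BUDGET)
    for inst, total in totals.items():
        print(f"{inst}: ${total:,}")
    grand_total = sum(totals.values())
    print(f"All institutions: ${grand_total:,}")
    print("Matches combined grant funds:", grand_total == COMBINED_GRANT_FUNDS)
```

The four institutional subtotals (642,611; 325,141; 301,351; 71,776) add up to the combined figure exactly.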
Organization and Administration
Each institution has a high degree of autonomy in how their funds are used.
• WTU (lead institution): Dick Olmstead (PI); Ben Legler (IS, WTU)
• WTU: David Giblin (Co-PI), Joe Ammirati (Co-PI)
• OSC: Aaron Liston (Co-PI)
• ID: Dave Tank (Co-PI)
• MONT: Matt Lavin (Co-PI)
• Each of the four institutions has a Project Facilitator.
• Three institutions have Imaging/Databasing Personnel; one has Imaging Personnel only.
Organization and Administration
My duties as Informatics Specialist:
• Develop, configure, and deploy the specimen imaging equipment.
• Develop the databasing software to capture label data from imaged specimens directly into the Portal server.
• Provide training to the Project Facilitators in each state so they can assume responsibility for imaging and databasing within their state.
• Work with each institution as needed to assist with imaging and databasing challenges.
• Assist herbaria in setting up data access points to connect their data to the PNW Portal, GBIF, USVH, and other data aggregators.
• Connect the collections hosted on the Portal server to GBIF, USVH, and other data aggregators.
• Enhance the PNW Herbaria web site with new features and web applications.
• Manage the PNW Herbaria web site, databases, and web server.
Organization and Administration
Project Facilitator Duties:
• Learn to use the imaging equipment, and how to deal with problems that may arise.
• Transport the imaging equipment between collections.
• At each institution, set up the equipment and train the imaging personnel.
• Oversee specimen databasing, including training and managing data entry personnel.
• Image and data quality control.
Organization and Administration
Imaging & Databasing Personnel:
• Will consist of hourly employees, work-study students, and volunteers.
• Most will be stationed at WTU, OSC, and ID.
• The small herbaria being imaged will recruit their own personnel to do imaging; however, funds from the grant are available to pay these personnel if needed.
The Details of Our “To-Do” List An in-depth look at what we proposed in our grant
1. Non-Vasculars
Database non-vascular plants, fungi, and lichens at OSC and WTU.
• Approximately 100,000 specimens will be databased at each.
• Each will use their existing databases for this component of the project. OSC: Specify 6, WTU: FileMaker.
• Timeline: start now, anticipated completion date of summer 2011.
• TO-DO: Create data access points to allow the Portal to harvest these data.
2. ID and MONT
Image and database the collections at ID and MONT.
• About 120,000 specimens will be imaged and databased from ID.
  • ID has an existing FileMaker database that will be used.
  • We need to integrate the image capture process with this database.
  • Timeline: begin imaging August 2010; can potentially finish summer 2011.
• About 65,000 specimens will be imaged and databased from MONT.
  • Only image capture will occur at MONT. Images will then be transferred to the Portal server.
  • Databasing will be done from the images at WTU, using the same database software developed for the smaller herbaria.
  • Will begin imaging in August 2010.
3. Smaller Herbaria
Image and database smaller herbaria in Idaho, Oregon, and Washington.
• 4-5 smaller herbaria will be selected from each of these states.
• For each herbarium, the entire PNW vascular plant collection will be imaged and databased.
• Images will be transferred to the Portal server for processing and storage.
• Databasing will occur from the images using personnel stationed at WTU, OSC, and ID.
• More about the imaging and databasing process in a moment…
3. Smaller Herbaria
Image and database smaller herbaria in Idaho, Oregon, and Washington.

Washington:
Western WA University: 26,000
Whitman College: 17,000
Central WA University: 25,000
Eastern WA University: 7,000
Pacific Lutheran University: 5,000+
TOTAL: 80,000+

Oregon:
Reed College: 10,000
Portland State University: 11,000
Linfield College: 2,000
Southern Oregon University: 14,000
TOTAL: 37,000

Idaho:
Lewis & Clark State College: 10,000
Northern Idaho College: 10,000
Forest Service Herbaria: 5,000
TOTAL: 25,000

Montana: (none)

GRAND TOTAL: 142,000+
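The specimen counts can be tallied per state with a few lines of Python. This is a sketch only: "5,000+" is treated as 5,000, so the sums are lower bounds, and the 185,000 figure for ID and MONT is taken from the proposal summary. The dictionary and function names are illustrative.

```python
# Estimated PNW vascular plant specimens at each small herbarium.
# "5,000+" is entered as 5,000, so totals are lower bounds.
SMALL_HERBARIA = {
    "Washington": {
        "Western WA University": 26_000,
        "Whitman College": 17_000,
        "Central WA University": 25_000,
        "Eastern WA University": 7_000,
        "Pacific Lutheran University": 5_000,
    },
    "Oregon": {
        "Reed College": 10_000,
        "Portland State University": 11_000,
        "Linfield College": 2_000,
        "Southern Oregon University": 14_000,
    },
    "Idaho": {
        "Lewis & Clark State College": 10_000,
        "Northern Idaho College": 10_000,
        "Forest Service Herbaria": 5_000,
    },
}

# Vascular plant specimens to be imaged at the large herbaria (ID, MONT).
LARGE_HERBARIA_SPECIMENS = 185_000


def state_totals(herbaria):
    """Sum the specimen estimates for each state."""
    return {state: sum(counts.values()) for state, counts in herbaria.items()}


if __name__ == "__main__":
    totals = state_totals(SMALL_HERBARIA)
    for state, total in totals.items():
        print(f"{state}: {total:,}")
    small_total = sum(totals.values())
    print(f"Small herbaria: {small_total:,}")
    print(f"All imaging targets: {small_total + LARGE_HERBARIA_SPECIMENS:,}")
```

The listed Washington counts sum to 80,000, and the three states together give 142,000. Adding the 185,000 specimens at ID and MONT yields 327,000, the image count used in the storage and end-results slides.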
3. Smaller Herbaria
Benefits of digitizing these herbaria:
• Small herbaria house valuable specimens not present in the larger herbaria, but these specimens are often not examined by specialists.
• Limitations in staff, budgets, and computer infrastructure make it difficult for these herbaria to manage a specimen database or provide online access.
• It is often challenging to justify the existence of these collections to university administrators. Can we increase their utility and prominence?
There is a recognition among the general herbarium community that larger herbaria and institutions should assist these smaller herbaria. This is what we’ve proposed to do.
4. Portal Connections
Create connections to the Portal for PNW herbaria with existing specimen databases.
• The following herbaria have already databased (or are databasing) their PNW vascular plant collections. We need to connect their data to the Portal:
  • Washington State University (165,000)
  • University of British Columbia (440,000)
  • Southern Oregon University (14,000?)
  • University of Montana (90,000?)
  • Boise State University (30,000?)
  • Albertson College of Idaho (35,000?)
• The connections with our existing providers can be improved:
  • University of Alaska, Fairbanks (210,000)
  • Oregon State University (194,000)
  • University of Washington (198,000)
4. Portal Connections
Improving Portal connections:
• Currently using DiGIR and Darwin Core data schemas.
• Should switch to TAPIR or IPT, and use current data schemas.
• I will investigate ways to bypass these options to create custom connections allowing faster data transfer and richer data structures.
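To illustrate what sharing data in a Darwin Core schema involves, here is a minimal sketch that renames in-house database columns to Darwin Core terms and writes a flat CSV. The Darwin Core term names (institutionCode, catalogNumber, scientificName, and so on) are real; the in-house column names and the sample record are hypothetical, and this is not the Portal's actual harvesting code.

```python
import csv
import io

# Hypothetical in-house column names mapped to real Darwin Core terms.
FIELD_MAP = {
    "herbarium": "institutionCode",
    "accession": "catalogNumber",
    "taxon": "scientificName",
    "collector": "recordedBy",
    "date": "eventDate",
    "state": "stateProvince",
    "county": "county",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(record):
    """Rename known columns to Darwin Core terms; drop unmapped columns."""
    return {FIELD_MAP[key]: value for key, value in record.items() if key in FIELD_MAP}


def to_csv(records):
    """Serialize records as CSV text with Darwin Core column headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(FIELD_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(to_darwin_core(record))
    return buffer.getvalue()


# Made-up example record, for illustration only.
SAMPLE = {
    "herbarium": "WTU",
    "accession": "12345",
    "taxon": "Pseudotsuga menziesii",
    "collector": "A. Collector",
    "date": "1998-06-14",
    "state": "Washington",
    "county": "King",
    "lat": 47.5,
    "lon": -121.8,
    "internal_notes": "not shared",  # unmapped, so it is dropped
}

if __name__ == "__main__":
    print(to_csv([SAMPLE]))
```

A custom connection of the kind mentioned above could carry richer structures than this flat mapping, but a term-for-term rename is the core of any Darwin Core export.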
5. Provide data to GBIF & Others Provide Portal data to GBIF, USVH, and other data aggregators for those collections that lack their own data access points.
6. Portal Applications
Develop portal-based web apps and expand the Portal web site.
• Public interface:
  • Search interface improvements: browse taxonomy, image viewer, search by polygon, search by shapefile, search by a list of values, return results as a checklist, etc.
  • Specimen-based, synonymized regional checklists for each organismal group.
  • Atlas pages with dot maps for each taxon for the region, including print-quality maps.
  • Specimen-based, dynamically generated, county-level checklists for the region.
  • Dot map of the entire region showing all collection sites, with color coding.
  • A version of the search pages targeted for mobile devices.
  • Static datasets that can be downloaded and copied to mobile devices for field use.
  Any more ideas?
• Back-end & administrative:
  • Improve Portal’s data harvesting processes.
  • Use the Portal to serve data from ALL Pacific Northwest herbaria to GBIF?
  • Add data quality controls such as synonymy checks, flagging records with inconsistencies, and reporting data problems back to the originating herbarium.
  • Create mechanism for automated dispatch of loan requests to participating herbaria.
  • Improve data usage tracking, and mechanisms to report statistics back to herbaria.
  • Create a GIS Web Service providing access to georeferenced specimen data.
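As one example of the proposed search features, "search by polygon" can be sketched with the standard ray-casting point-in-polygon test. This is a minimal illustration assuming georeferenced records with decimal latitude and longitude and a simple (non-self-intersecting) polygon in the same coordinates; the record layout and function names are hypothetical, and a production system would more likely rely on a spatial database index.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    `polygon` is a list of (lon, lat) vertices in order; the last vertex
    is implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def search_by_polygon(records, polygon):
    """Return the georeferenced records that fall inside the polygon."""
    return [
        r for r in records
        if r.get("lat") is not None
        and r.get("lon") is not None
        and point_in_polygon(r["lon"], r["lat"], polygon)
    ]


# Made-up search area and records, for illustration only.
SEARCH_AREA = [(-122.0, 47.0), (-121.0, 47.0), (-121.0, 48.0), (-122.0, 48.0)]
RECORDS = [
    {"id": "A", "lat": 47.5, "lon": -121.5},
    {"id": "B", "lat": 47.5, "lon": -120.0},
    {"id": "C", "lat": None, "lon": None},  # not georeferenced
]

if __name__ == "__main__":
    hits = search_by_polygon(RECORDS, SEARCH_AREA)
    print([r["id"] for r in hits])
```

Records without coordinates are skipped rather than rejected, since many specimen records are not georeferenced.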
Imaging & Databasing
Overview of the imaging & databasing workflow
• Imaging Workstation: Transfer RAW images and metadata on a portable hard drive.
• Portal Server: Dropbox. Image processing scripts (RAW, JPEG, Tiles). Populate database from images & metadata.
• Data Entry: RIA using AJAX. Data entry form displays a blank record for an image. Adjacent image viewer shows label data and annotations.
Imaging and Databasing
Imaging equipment.
• Ortery Lightbox
• Canon EOS 5D Mark II
• 16-32 GB compact flash
• AC adaptor for camera
• 50 mm macro lens
• Custom camera mount
• Custom specimen holder
• 6 inch ruler
• Laptop computer
• USB cable
• Canon EOS Utility
• Image metadata form
Imaging and Databasing Imaging equipment. Custom mount on top of box to hold camera
Imaging and Databasing
Images are stored in several formats:
• Digital Negative (.DNG): This is a publicly documented RAW format developed by Adobe as an alternative to the numerous proprietary RAW formats from each camera manufacturer. 22 MB per image.
• JPEG: Conversion from RAW formats to TIFF or JPEG is a hassle, so we will store high-quality JPEG copies for immediate access. 7 MB per image.
• Tiled images: These are used by the online specimen image viewer. They function in the same way as map tiles in Google Maps. 3.5 MB per image.
Storage requirements for 327,000 specimen images:
(For comparison, an 8-bit TIFF with LZW compression is larger than the DNG, JPEG, and tiles combined.)
(We’ll have 13.5 terabytes available, and can purchase more if needed.)
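The storage requirement follows directly from the per-image sizes above. The sketch below multiplies them out for 327,000 images, assuming decimal units (1 TB = 1,000,000 MB); the names are illustrative, and the per-image sizes are the approximate figures quoted on the slide.

```python
SPECIMEN_IMAGES = 327_000
AVAILABLE_TB = 13.5

# Approximate size of each stored format, in MB per image (from the slide).
MB_PER_IMAGE = {"DNG": 22.0, "JPEG": 7.0, "Tiles": 3.5}

# Assumption: decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def storage_tb(image_count, mb_per_image):
    """Storage needed for one format, in terabytes."""
    return image_count * mb_per_image / MB_PER_TB


def total_storage_tb(image_count, formats):
    """Storage needed for all formats combined, in terabytes."""
    return sum(storage_tb(image_count, mb) for mb in formats.values())


if __name__ == "__main__":
    for name, mb in MB_PER_IMAGE.items():
        print(f"{name}: {storage_tb(SPECIMEN_IMAGES, mb):.2f} TB")
    total = total_storage_tb(SPECIMEN_IMAGES, MB_PER_IMAGE)
    print(f"Total: {total:.2f} TB of {AVAILABLE_TB} TB available")
```

Under these assumptions the three formats need roughly 7.2 TB, 2.3 TB, and 1.1 TB, or about 10.6 TB in total, which fits within the 13.5 TB available.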
Imaging and Databasing
[Figure: image panels labeled Canon, Photoshop, and dcraw]
Imaging and Databasing Specimen image viewer:
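A viewer that works like map tiles typically serves a pyramid of zoom levels, each cut into fixed-size tiles. The sketch below counts the tiles for one image under stated assumptions: 256-pixel square tiles, each level half the resolution of the next, and the full 5616 x 3744 pixel frame of the Canon EOS 5D Mark II. The tile size and halving scheme are assumptions for illustration, not the Portal's documented tiling scheme.

```python
import math


def tile_pyramid(width, height, tile_size=256):
    """List (width, height, columns, rows) per zoom level, smallest first.

    Each level halves the previous one (rounding up) until the whole
    image fits in a single tile.
    """
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    return levels[::-1]


def total_tiles(levels):
    """Total number of tiles across all zoom levels."""
    return sum(cols * rows for _, _, cols, rows in levels)


if __name__ == "__main__":
    levels = tile_pyramid(5616, 3744)
    for zoom, (w, h, cols, rows) in enumerate(levels):
        print(f"zoom {zoom}: {w} x {h} px, {cols} x {rows} tiles")
    print("Total tiles per specimen image:", total_tiles(levels))
```

With these assumptions a single specimen image produces six zoom levels and a few hundred small tiles, so the viewer only ever downloads the handful of tiles visible at the current zoom.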
Imaging and Databasing
A web-accessible data entry interface using specimen images.
• A set of databases hosted by the Portal server. Each herbarium will have its own separate database.
• The data entry interface will be accessed through a web browser (or similar stand-alone client app), with access restricted to authorized personnel.
• Data entry will be performed by personnel at WTU, OSC, and ID.
• Following the grant period, managers and curators at these collections can use these databases as their primary database if they choose, or migrate the data into their own in-house database.
• This design eliminates the need for smaller herbaria to manage their own in-house databases.
Imaging and Databasing Data entry personnel simply click a button to pull up a blank record, database from the image, and repeat. I may add OCR-assist to the data entry interface.
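The "click a button, get a blank record" loop can be sketched as a simple queue of images awaiting data entry. Everything here is hypothetical: the class, method, and field names are invented for illustration, the queue lives in memory rather than in the Portal's databases, and the real system was still to be developed at the time of this talk.

```python
from collections import deque


class DataEntryQueue:
    """In-memory sketch of handing out un-databased images one at a time."""

    def __init__(self, image_ids):
        self._pending = deque(image_ids)  # images with no record yet
        self._claimed = {}                # image_id -> user working on it
        self.records = {}                 # image_id -> saved record

    def next_blank_record(self, user):
        """Claim the next pending image and return a blank record for it."""
        if not self._pending:
            return None
        image_id = self._pending.popleft()
        self._claimed[image_id] = user
        return {"image_id": image_id, "entered_by": user, "label_data": {}}

    def save(self, record):
        """Store a completed record; only the claiming user may save it."""
        image_id = record["image_id"]
        if self._claimed.get(image_id) != record["entered_by"]:
            raise ValueError(f"{image_id} is not claimed by {record['entered_by']}")
        self.records[image_id] = record
        del self._claimed[image_id]

    def remaining(self):
        """Number of images still waiting for data entry."""
        return len(self._pending)


if __name__ == "__main__":
    queue = DataEntryQueue(["img-0001", "img-0002"])
    record = queue.next_blank_record("volunteer1")
    record["label_data"] = {"scientificName": "Pseudotsuga menziesii"}
    queue.save(record)
    print(queue.remaining(), "image(s) left;", len(queue.records), "record(s) saved")
```

Claiming an image when the blank record is issued keeps two people from databasing the same specimen at once, which matters when personnel at several institutions share one queue.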
End Results
What will we have accomplished by the end of the grant?
• Imaged and databased ca. 327,000 vascular plant specimens from at least 14 regional herbaria, with these available online.
• Databased ca. 200,000 non-vascular plant, lichen, and fungal specimens.
• The PNW Herbaria Portal will host nearly 2,000,000 specimens.
• Developed a means of efficiently digitizing specimens from smaller herbaria using staff and resources at the larger herbaria. This can serve as a model for use in other regions.
• Expanded the features of the Portal web site, and created new ways of accessing and using the data.
End Results
What is happening first?
• Database non-vascular plants, lichens, and fungi at WTU and OSC.
• Finish configuring the imaging equipment, train facilitators, and deploy to:
  • Western Washington University
  • University of Idaho
  • University of Montana
  • Reed College (Oregon)
• Develop and deploy the portal-based data entry system by early fall.
Starting in the fall, I will begin connecting additional collections to the Portal, help improve existing connections, and begin developing web site features.