Herbarium Data & Visualization Dynamic Information Visualization Tool Paul White Advisor: Dr. Dennis Groth
Outline • Background • Goal • Methods • Demonstration • Results • Discussion
Informatics and Biological Information • Bioinformatics: a merging of biological databases using information technology in order to combine and leverage the information that lies within them, resulting in new insights about the biological world. • Information technology is useful for developing tools and techniques for storing, handling, and communicating data • Biological data from research is reaching massive proportions: • NCBI (GenBank): genome and protein data • Landsat (NASA’s Mission to Planet Earth): ecosystem data • Biological collections (herbaria): specimen data
Herbarium Defined • Natural History and Biological Sciences • Biological Collections • Botanical collections, such as a herbarium • IUB campus Deam Herbarium • A collection of dried plant specimens • Serves as a research voucher that identifies and names the specimen • Mounted, accessioned specimens that document the flora of Indiana’s geographical region • Can be thought of as a library of type specimens • Systematics and Taxonomy • Classification of living things
How and why do herbaria get used? • IU Herbarium houses the collections of Charles C. Deam on which the Flora of Indiana is based. • Some purposes of herbaria • Verification of type specimens • Support of biological research • Comparative & evolutionary analyses • Population & ecological analyses • Herbaria at interface of domains • Between organisms-ecological data • Within organism-genome data • To support a published flora • Resource for education at every level • K-16 students • Educators • Scientists • Government agencies
What is a flora? • A formal catalog of plants found in a region • Taxonomy • Keys • Controlled Vocabulary and Glossary • Distribution • Space (geospatial) • Time (historical) • Descriptions • Botanical • Ecological
Deam Herbarium Represented as a Digital Library • Need for digital access to the herbarium’s collection of plant specimens for biodiversity research purposes • Digital library services need to support distributed users through the full range of the information lifecycle of collaborative knowledge work • Enhancing information access • Textual data that includes 3587 recognized Indiana vascular plants and their presence in 92 Indiana counties • Based on 140,000 actual plant occurrences consisting of core data (species name, name of collector, county, date) collected from many sources • Original locations are standardized to county name; analysis is provided as to ID reliability and location accuracy • Primary data providers are the Deam Herbarium, Friesner Herbarium, BONAP, and Missouri Botanical Garden (Kay Yatskievych) • Sharing data through well-established data standards • GBIF (Global Biodiversity Information Facility): to design, implement, coordinate, and promote the compilation of the world’s biodiversity data, and to standardize the nomenclature in taxonomy • Darwin Core vocabularies are applied consistently, so that participating databases can be interoperable
Goal • Searchable interface that gives a hierarchical taxonomic overview • A geographical means of visually representing the data
Process • Incoming data from databases: herbarium data and other collaborative source data • Perl script used to create conversions • Self-describing file format: XML • XPath to query the data • Java applet for user interface
Method: Storing the Data • Develop a universal method that provides flexibility • Metadata: a way to develop a common language. New data elements: taxonomic data, location data, specimen data. Attributes for the elements are based on the Darwin Core vocabulary. • XML format adopted as a standard • On-the-fly: Perl scripts convert the biological data into conversion files in a self-describing file format • Interoperability: syntax can vary to suit the research scientist’s needs • The outcome is a series of XML document types that can be used in a modular and extensible manner and in which the results of a distributed query can be returned
Snapshot (Input, Output)
<family>
  <familyname>ACERACEAE</familyname>
  <species>Acer campestre</species>
  <species>Acer negundo</species>
  <species>Acer negundo var. negundo</species>
  <species>Acer negundo var. texanum</species>
  <species>Acer negundo var. violaceum</species>
  <species>Acer nigrum</species>
  <species>Acer platanoides</species>
  <species>Acer rubrum</species>
  <species>Acer rubrum var. drummondii</species>
  <species>Acer rubrum var. rubrum</species>
  <species>Acer rubrum var. trilobum</species>
  <species>Acer saccharum var. saccharum</species>
  <species>Acer saccharum var. schneckii</species>
</family>
<family>
  <familyname>ACORACEAE</familyname>
  <species>Acorus americanus</species>
  <species>Acorus calamus</species>
</family>
<family>
  <familyname>AGAVACEAE</familyname>
  <species>Manfreda virginica</species>
  <species>Yucca filamentosa</species>
</family>
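The project performed this conversion with Perl scripts, which are not reproduced in the slides. As a rough illustration only, the same grouping can be sketched in Java, the language of the applet. The tab-delimited "FAMILY, species" input layout, the class name FloraXmlWriter, and the enclosing <doc> root element (suggested by the XPath expression used later) are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FloraXmlWriter {

    // Escape the characters that may not appear literally in XML text content.
    public static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Group "FAMILY<TAB>species" records by family and emit one <family> element
    // per group. The input layout is an assumption, not the project's real format.
    public static String toXml(List<String> records) {
        Map<String, List<String>> byFamily = new LinkedHashMap<>();
        for (String record : records) {
            String[] fields = record.split("\t", 2);
            if (fields.length < 2) {
                continue; // skip malformed rows instead of aborting the conversion
            }
            byFamily.computeIfAbsent(fields[0].trim(), k -> new ArrayList<>())
                    .add(fields[1].trim());
        }

        StringBuilder xml = new StringBuilder("<doc>\n");
        for (Map.Entry<String, List<String>> entry : byFamily.entrySet()) {
            xml.append("  <family>\n");
            xml.append("    <familyname>").append(escape(entry.getKey()))
               .append("</familyname>\n");
            for (String species : entry.getValue()) {
                xml.append("    <species>").append(escape(species)).append("</species>\n");
            }
            xml.append("  </family>\n");
        }
        return xml.append("</doc>\n").toString();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "ACORACEAE\tAcorus americanus",
                "ACORACEAE\tAcorus calamus",
                "AGAVACEAE\tManfreda virginica",
                "AGAVACEAE\tYucca filamentosa");
        System.out.print(toXml(records));
    }
}
```

A LinkedHashMap keeps the families in the order they first appear, so sorted input yields sorted output without an extra sorting step.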
Method: Handling the Data • Why pre-compute the various views of the static data? • Underlying information representation • Control the user’s default set of outputs • Shortens the wait time for the user when XPath queries are run • Query the data stored in the conversion files with XPath to provide a searchable interface. • XPath can be thought of as a query language like SQL; in this case, it extracts information from an XML document. • XPath is not written in XML syntax; its expressions resemble a path in a directory listing or a URL. • For example, the following XPath expression, in which the Java variable item is concatenated into the query string, selects the countyname elements of every family that contains a species with that name: /doc/family/countyname[../species/name='"+item+"']
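To make the XPath step concrete, here is a minimal sketch using the standard javax.xml.xpath API. The sample document, the class name CountyLookup, and the method name countiesFor are hypothetical; the element layout (countyname and species/name as children of family) is only inferred from the expression above and may not match the project's real files.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CountyLookup {

    // Hypothetical sample data; the layout is inferred from the XPath expression.
    public static final String SAMPLE =
          "<doc>"
        + "<family><familyname>ACORACEAE</familyname>"
        + "<species><name>Acorus calamus</name></species>"
        + "<countyname>Lawrence</countyname><countyname>Monroe</countyname></family>"
        + "<family><familyname>AGAVACEAE</familyname>"
        + "<species><name>Yucca filamentosa</name></species>"
        + "<countyname>Brown</countyname></family>"
        + "</doc>";

    // Return the text of every countyname whose family holds the named species.
    // Note: an item containing an apostrophe would break the quoted literal.
    public static List<String> countiesFor(String xml, String item) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "/doc/family/countyname[../species/name='" + item + "']";
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> counties = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                counties.add(nodes.item(i).getTextContent());
            }
            return counties;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countiesFor(SAMPLE, "Acorus calamus"));
    }
}
```

Evaluating against an InputSource re-parses the XML on every call; an applet that issues many queries would more likely parse the document once and query the resulting DOM.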
Method: Communicating the Data • The principal objective of any visual representation of the data is to convey a message. Most visual tools involve some sort of comparison. • Compare an item to another item • Compare data relationships • Provide an interactive visualization tool that: • Integrates geographical information with specimen data on vascular plants of Indiana • Develop a shape file for the state of Indiana in which each county is represented as an outline, for example: • Lawrence,47,221,417,221,450,219,453,218,456,218,462,175,462,175,417,221,417 • Use XPath in a Java program: write an XPath expression indicating what information is wanted from an XML document and ask the XPath engine to fetch it. • /doc/family/countyname[../species/name='"+item+"'] • Java applet for the creation of the graphical interface • Fault tolerant after startup • Fast to run, slow to start
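The Lawrence entry above suggests one way such a shape file could be read. The sketch below assumes, without confirmation from the slides, that the second field is a county number and that the remaining fields are x,y pixel pairs tracing a closed outline. The class name CountyShape and its methods are hypothetical; the point-in-polygon test is a standard ray-casting check of the kind an applet might use to decide which county lies under the mouse.

```java
public class CountyShape {
    public final String name;
    public final int number;
    public final int[] xs;
    public final int[] ys;

    public CountyShape(String name, int number, int[] xs, int[] ys) {
        this.name = name;
        this.number = number;
        this.xs = xs;
        this.ys = ys;
    }

    // Parse "name,number,x1,y1,x2,y2,...". Treating the second field as a county
    // number and the rest as x,y pairs is an assumption about the file format.
    public static CountyShape parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length < 8 || fields.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "Expected name, number, and at least three x,y pairs: " + line);
        }
        int points = (fields.length - 2) / 2;
        int[] xs = new int[points];
        int[] ys = new int[points];
        for (int i = 0; i < points; i++) {
            xs[i] = Integer.parseInt(fields[2 + 2 * i].trim());
            ys[i] = Integer.parseInt(fields[3 + 2 * i].trim());
        }
        return new CountyShape(fields[0].trim(), Integer.parseInt(fields[1].trim()), xs, ys);
    }

    // Ray casting: count how many outline edges a horizontal ray from (x, y)
    // crosses; an odd count means the point lies inside the outline.
    public boolean contains(int x, int y) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            boolean crosses = (ys[i] > y) != (ys[j] > y);
            if (crosses) {
                double xAtY = xs[i] + (double) (y - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i]);
                if (x < xAtY) {
                    inside = !inside;
                }
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        CountyShape lawrence = parse(
                "Lawrence,47,221,417,221,450,219,453,218,456,218,462,175,462,175,417,221,417");
        System.out.println(lawrence.name + " has " + lawrence.xs.length + " outline points");
        System.out.println("Contains (200,440): " + lawrence.contains(200, 440));
    }
}
```

Combining this with the county names returned by the XPath query would let a drawing routine fill recorded counties in one color and the remaining counties in another.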
Results • Discovery: What do we do with what we have? • New information about relationships we had not seen before • Helps in analyzing patterns in plant distribution and species diversity. Plants found only in Northern or Southern regions. • Protecting genetic range and genetic diversity of species • Collaborative Knowledge Work • Information creation and dissemination • Access and presentation • Collaboration
Discussion • Shows regions or areas that have the same characteristics • Density vs. distribution • Fact sheet • Have a comprehensive database of Indiana plant occurrence data online • Scientists studying distributions must examine many dispersed, specialized data sets
Indiana Botanical Information System • Lobelia cardinalis L. (Cardinal Flower) • Spec. Pl. (1753) 930 • Name is current • Distribution: Native plant of Indiana • Not threatened • Description: Perennial herb to 1.5 m, with milky sap. Ovary inferior with 2 locules. Calyx fused to form hypanthium with 5 sepal lobes. Petals alternate with sepals. Corolla slit to base on back and deeply cleft, with lower lip of 3 petals, and 2 upper petals of equal length. Stamens fused with bristles on lower 2 anthers. • Map legend: Recorded in county / Not known in county • Source: Curator Eric Knox • Direct inquiries and comments to the Deam Herbarium (herbarium@bio.indiana.edu).
Acknowledgements I wish to thank Dr. Sun Kim and Dr. Mehmet Dalkilic for their words of encouragement; my advisor, Dr. Dennis Groth, for taking the time to comment on and provide insight into this project; and curator and professor Eric Knox for his enthusiasm and recent effort to launch the Deam Herbarium into the digital age.