An example of data integration in a distributed network environment Barbara Stein Museum of Vertebrate Zoology
Open-access Distributed Databases • ...not such a new idea • 1993 – FishGopher (gopher server) • 1997 – Neodat II (data warehouse) • 1998 – N.A. Bird Data Network (Z39.50) • 1998 – REMIB (TCP/IP secure sockets) • 2000 – FishNET (Z39.50 & XML) • All of these efforts were taxon-based
MaNIS Goals • Facilitate open access to combined specimen data from a web browser • Enhance the value of specimen collections • Conserve curatorial resources • Use a design paradigm that could be easily adopted by other disciplines
Institutional Considerations • Design of the network must benefit participants as well as the larger user community • Institutions must be able to retain their current database management systems • Institutions must be able to retain control over which of their data are accessible • Institutions must be able to document network use of their collections’ data
Design Considerations • Architecture must be simple, low cost, and require minimal maintenance • No visible long-term support for the network or its participants • Known opposition within the community to centralization of operations • Uncertain availability of in-house technical expertise
DiGIR Distributed Generic Information Retrieval DiGIR is a software application (i.e., a protocol) that specifies how requests and responses issued across a network are formulated. MaNIS was the first functional implementation of DiGIR and became a driving force behind its development.
DiGIR Goals • A network protocol that would serve as a standard among natural history databases... • Avoid multiple incongruous development efforts • Pool resources; achieve economies of scale • Create a support community of experts • Solve scalability problems • Ensure easy adoption by any discipline with similar needs
Design Approach • Use open protocols and standards, such as HTTP and XML • Let user communities define the structure of their data without requiring changes to the networking protocol or presentation software • Make new data provider installations as easy as possible • Develop open source software under the GNU General Public License (GPL)
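The second point above, a community-defined data structure that the transport layer does not need to know about, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names, the `record_to_xml` helper, and the field list are hypothetical and are not the actual DiGIR message format or the Darwin Core schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record, schema_fields, root_tag="record"):
    """Serialize one record using only the fields a community schema lists.

    The function knows nothing about mammals or any other discipline;
    swapping in a different field list requires no code change.
    """
    root = ET.Element(root_tag)
    for field in schema_fields:
        value = record.get(field)
        if value is None:
            continue  # fields missing from the record are simply omitted
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical field list agreed on by one community (illustrative names).
MAMMAL_FIELDS = ["InstitutionCode", "CatalogNumber", "Genus", "Species"]

# Hypothetical in-house record; "InternalNote" is not in the shared schema,
# so it never leaves the institution.
specimen = {
    "InstitutionCode": "MVZ",
    "CatalogNumber": 12345,
    "Genus": "Peromyscus",
    "Species": "maniculatus",
    "InternalNote": "not shared",
}

xml_text = record_to_xml(specimen, MAMMAL_FIELDS)
print(xml_text)
```

Because the field list is data rather than code, another discipline could reuse the same serializer with its own list, which is the property the design approach asks for.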
Standards are Paramount • The mammal community was ready... • MSW (Mammal Species of the World) • Documentation standards for data processing • DwC2 (Darwin Core Version 2) • Georeferencing Guidelines
Steps in Development of MaNIS • Collaborative georeferencing of locality data • Creating the network software • Connecting institutional databases to the network
Distribution of Origin of Mammal Specimens Africa: a large institution (FMNH) holds the majority of specimens, but holdings may be biased or incomplete Oceania: a smaller collection (BPBM) may be crucial to biogeographic investigations Mesoamerica: to neglect any one institution might be a serious omission
Key Features of the MaNIS Network • There is no central repository or server • Institutions retain control over public access to their data without changing their in-house dbms • Software is optimized for query performance • In-house dbms protected from traffic and intrusion • Each data provider automatically maintains summary data (i.e., counts of specimen records), in addition to specimen data from its institutional database
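A network with no central repository can be pictured as a portal that forwards each query to every provider and merges the answers. The Python sketch below is a simplified illustration, not the MaNIS or DiGIR implementation: the `Provider` class is an in-memory stand-in for a provider that would really be reached over the network, the record fields are hypothetical, and using the summary counts to skip providers with no matching records is an assumption about how such counts could be used.

```python
from concurrent.futures import ThreadPoolExecutor


class Provider:
    """In-memory stand-in for one institution's data provider."""

    def __init__(self, code, records):
        self.code = code
        self._records = list(records)
        # Summary data kept alongside the specimen data: counts per genus.
        self.summary = {}
        for rec in self._records:
            genus = rec["Genus"]
            self.summary[genus] = self.summary.get(genus, 0) + 1

    def count(self, genus):
        """Cheap lookup against the summary, without touching the records."""
        return self.summary.get(genus, 0)

    def search(self, genus):
        """Return matching records, tagged with the institution code."""
        return [
            dict(rec, InstitutionCode=self.code)
            for rec in self._records
            if rec["Genus"] == genus
        ]


def portal_search(providers, genus):
    """Fan a query out to all relevant providers and merge the results."""
    relevant = [p for p in providers if p.count(genus) > 0]
    if not relevant:
        return []
    with ThreadPoolExecutor(max_workers=len(relevant)) as pool:
        # pool.map keeps results in the same order as `relevant`.
        batches = list(pool.map(lambda p: p.search(genus), relevant))
    return [rec for batch in batches for rec in batch]


# Hypothetical holdings for three institutions.
providers = [
    Provider("MVZ", [{"Genus": "Peromyscus"}, {"Genus": "Peromyscus"},
                     {"Genus": "Microtus"}]),
    Provider("UWBM", [{"Genus": "Peromyscus"}]),
    Provider("BPBM", [{"Genus": "Rattus"}]),
]

results = portal_search(providers, "Peromyscus")
print(len(results), [r["InstitutionCode"] for r in results])
```

Each provider answers only from its own copy of the data, so no single server holds the combined records; the merged view exists only in the portal's response.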
MaNIS Network Diagram [figure] • Five institutional databases (SQL Server, Sybase, 4D-Mac, Oracle, MS Access), each paired with an Online MaNIS Database and a DiGIR Provider • Three MaNIS DiGIR Web Portals, each with a presentation layer (UMNH-MaNIS, MVZ-MaNIS, UWBM-MaNIS)
Keys to Success • Shared goals • Standards • Collaboration • heightened sense of community • recognition of the enormous value of combined data, i.e., “the whole” • large and small collections are now recognized for their respective contributions to that whole • appreciation of what can be gained has replaced a sense of competition • Trust • all business public • all participants equal
Impact of the Network • Conservation — Aid resource managers and provide new tools for solving the biodiversity crisis • Research — Encourage development of new applications for ecological analysis and synthesis • Education — Make possible educational use of specimen data • Collections management — Increase use of collections while conserving curatorial resources
MaNIS Developers • John Wieczorek (lead MaNIS programmer) • PJ Schwartz (DiGIR portal) • Dave Vieglais (DiGIR provider) • Reed Beaman • Stan Blum • Renato Giovanni Collaborators • Australia National Botanical Garden (ANBG) • Berkeley Digital Library Project (DLP) • Biological Collection Access Service for Europe (BioCASE) • Committee on Data for Science and Technology (CODATA) • Centro de Referência em Informação Ambiental (CRIA) • Global Biodiversity Information Facility (GBIF) • Taxonomic Databases Working Group (TDWG) • University of Kansas Biodiversity Research Center (KUBRC)
Funded MaNIS Participants • Bernice P. Bishop Museum • California Academy of Sciences • Colección Nacional de Mamíferos (Mexico) • Field Museum • Los Angeles County Museum of Natural History • Louisiana State University Museum of Natural Science • Michigan State University Museum • Royal Ontario Museum • Texas Tech University Museum • University of Alaska Museum • University of California Museum of Vertebrate Zoology • University of Kansas Natural History Museum • University of Michigan Museum of Zoology • University of New Mexico Museum of Southwestern Biology • University of Puget Sound James R. Slater Museum • University of Utah Museum of Natural History • University of Washington Burke Museum Non-funded participants • Comisión Nacional para el Conocimiento y Uso de la Biodiversidad (CONABIO) • Sternberg Museum, Fort Hays State University • University of Kansas Natural History Museum, Division of Birds • University of Minnesota Bell Museum
Project Information • MaNIS is an international collaboration among mammal specimen collections (http://elib.cs.berkeley.edu/manis) • DiGIR is a collaborative open source development project on SourceForge (https://sourceforge.net/projects/digir) • Software and documentation are available on the DiGIR web site (http://digir.net)
Distributed Database Networks • Discipline-specific • FishNet • HerpNET • ITIS (The Integrated Taxonomic Info. System) • MaNIS (The Mammal Networked Info. System) • ORNIS (The Ornithological Info. System)
Distributed Database Networks • International • AVH (Australian Virtual Herbarium) • BioCASE (Biological Collection Access for Europe) • CONABIO (Comisión Nacional para el Conocimiento y Uso de la Biodiversidad) • ENHSIN (European Natural History Science Information Network) • GBIF (Global Biodiversity Information Facility) • REMIB (Red Mundial de Información Sobre Biodiversidad)
Georeferencing Reference Wieczorek, J., Q. Guo and R.J. Hijmans. In press. The point-radius method for georeferencing locality descriptions and calculating associated uncertainty. International Journal of Geographical Information Science.
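As the title of this reference indicates, the point-radius method describes a locality as a coordinate plus a radius that expresses the uncertainty of that coordinate. The Python sketch below shows only how such a record might be stored and queried. It does not reproduce the paper's procedure for calculating the radius, and the class name, function names, and coordinates are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


@dataclass(frozen=True)
class PointRadius:
    """A georeferenced locality: a point plus an uncertainty radius."""
    lat: float       # decimal degrees
    lon: float       # decimal degrees
    radius_m: float  # uncertainty radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def may_contain(locality, lat, lon):
    """True if the given point lies within the locality's uncertainty circle."""
    return haversine_m(locality.lat, locality.lon, lat, lon) <= locality.radius_m


# Hypothetical locality with a 5 km uncertainty radius.
locality = PointRadius(lat=37.87, lon=-122.26, radius_m=5000.0)

near = may_contain(locality, 37.88, -122.26)  # about 1.1 km to the north
far = may_contain(locality, 38.87, -122.26)   # about 111 km to the north
print(near, far)
```

Carrying the radius with every record lets a spatial query state explicitly whether a specimen could have come from the area of interest, rather than treating the coordinate as exact.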