[Figure: current and target USA-NPN cyberinfrastructure. Components: the USA-NPN umbrella, the USA-NPN website, Project BudBurst, the Plant Phenology Network, the Plant Phenology Program, and other participating networks & programs, each marked with its data storage, metadata storage, or web content (who are we?; protocols & species; how to contribute; pointers to others' data; "our" data). Targets: education, outreach, and more casual observers; long-term observers of many types.]

Development of Cyberinfrastructure to Support the USA National Phenology Network

Bruce E. Wilson¹, Homer Hruby², Kirsten Meymaris³, Wolfgang Grunberg⁴, Benjamin Crom¹

¹Environmental Sciences Division, Oak Ridge National Laboratory*, Oak Ridge, Tennessee 37831
²University of Wisconsin-Milwaukee, Milwaukee, WI 53201
³University Corporation for Atmospheric Research, P.O. Box 3000, Boulder, CO 80307
⁴Office of Arid Lands Studies, University of Arizona, 1955 E. Sixth St., Tucson, AZ 85719

Portions of this work were sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory (ORNL), managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Portions of this work were funded by the US Geological Survey, the National Science Foundation, the University of Wisconsin-Milwaukee, and the University of Arizona.

Poster #B51A-0065

Abstract: The USA National Phenology Network (USA-NPN) is an emerging and exciting partnership among academic communities, federal agencies, and volunteers.
The USA-NPN consists of four components. These differ in spatial coverage and in the quality and quantity of phenological and related environmental information they provide:
1) locally intensive sites focused on process studies;
2) spatially extensive scientific networks focused on large-scale phenomena;
3) voluntary and educational networks; and
4) remotely sensed products that can be validated against ground measurements and assimilated to extend surface phenological observations to the continental scale.

A critical challenge for the USA-NPN is developing a robust, cost-effective cyberinfrastructure that supports the distributed collection and management of phenological data across these differing components. This poster presents the framework for the Phase I cyberinfrastructure for USA-NPN, designed to support the most critical needs across these tiers. Phase I leverages technologies from several efforts, including the Plant Phenology Network, the National Biological Information Infrastructure Metadata Clearinghouse, Project BudBurst, and the ORNL Distributed Active Archive Center.

Technical Obstacles
Many data systems have a limited ability to handle variations in data structure across research projects, particularly in historical data. Yet the ability to handle such variations is the key to building a common cyberinfrastructure. We seek to overcome these limitations by combining the standard data warehouse and data collection models into a hybrid model.

Typical ecology data systems use one model or the other:
• Data warehouse: single data model; data in database.
• Data collection: multiple data models; data often in file system; system searches metadata.
Goal, a hybrid model:
• Multiple (related) data models; data in database.

Tools for managing subsets of observational data are currently insufficient, particularly for reproducing, sharing, and citing those subsets. The data model (at right) is a first step toward addressing this need.
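The Python sketch below loosely illustrates the data model features this poster lists:
* an observation records either the extent of a phenophase (a start and end date) or a single, discrete phenophase date;
* each observation is tied to the participating network, and to the protocol whose phenophase definition it follows.

All class names, field names, and the `select` helper are hypothetical. They are not the USA-NPN schema, which the poster describes as a MySQL database. `select` returns subsets in a fixed order, which shows one way a data subset could be regenerated later for sharing or citation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class PhenophaseProtocol:
    """A phenophase as defined by one participating network (hypothetical)."""
    network: str      # participating network or program
    phenophase: str   # phenophase name used by that network
    definition: str   # free-text definition; may differ between networks


@dataclass(frozen=True)
class Observation:
    """One phenophase observation at one (hypothetical) station."""
    protocol: PhenophaseProtocol
    station_id: str
    start: date                 # the phenophase date, or first day of the extent
    end: Optional[date] = None  # None means a single (discrete) phenophase date

    def __post_init__(self) -> None:
        if self.end is not None and self.end < self.start:
            raise ValueError("phenophase extent ends before it starts")

    @property
    def is_extent(self) -> bool:
        """True when the record covers a series of observations (an extent)."""
        return self.end is not None

    def duration_days(self) -> int:
        """Length of the phenophase in days, counting both endpoints."""
        last = self.end if self.end is not None else self.start
        return (last - self.start).days + 1


def select(observations: Iterable[Observation],
           network: Optional[str] = None,
           phenophase: Optional[str] = None) -> List[Observation]:
    """Filter by network/phenophase and sort deterministically."""
    chosen = [
        o for o in observations
        if (network is None or o.protocol.network == network)
        and (phenophase is None or o.protocol.phenophase == phenophase)
    ]
    return sorted(chosen, key=lambda o: (o.station_id, o.start))


if __name__ == "__main__":
    # Illustrative values only; station IDs and definitions are made up.
    bloom = PhenophaseProtocol("Plant Phenology Program", "first bloom", "first open flower")
    leaf = PhenophaseProtocol("Project BudBurst", "first leaf", "first unfolded leaf")
    records = [
        Observation(bloom, "WI-001", date(2007, 5, 2), date(2007, 5, 11)),
        Observation(leaf, "CO-007", date(2007, 4, 20)),
    ]
    for o in select(records, network="Plant Phenology Program"):
        print(o.station_id, o.is_extent, o.duration_days())  # WI-001 True 10
```

Keeping the network and protocol on every observation lets differing phenophase definitions coexist in one store without forcing them into a single shared vocabulary.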
Tools
The new USA-NPN web site is being developed iteratively, using Drupal for content management and PHP scripts for database interaction. Key improvements over the Plant Phenology Program site include:
• automated user registration;
• database storage of results;
• configuration information drawn from the database itself.

This database also stores observational results using the simple but flexible schema at right. The USA-NPN will continue to evolve this design using the principles of agile database development. To address the citation need, we expect to use LSIDs (Life Science Identifiers). LSIDs are a general-purpose tool that lets researchers reference datasets and/or specific portions of datasets.

• Data Model: key data model features include:
  • Handling both a series of observations (extent of phenophase) and discrete dates (phenophase date).
  • Handling data from multiple participating networks and programs.
  • Supporting differing protocols and phenophase definitions for different datasets, networks, and programs.

Project Partners & Other Contributors:
• Mark Schwartz, Diane Pfister-Drews, John Mills, Jeremy Streich: University of Wisconsin-Milwaukee (Department of Geography and College of Arts and Sciences)
• Jake Weltzin, Mark Losleben: USA-NPN National Coordinating Office
• Manish Wadhwa: Office of Arid Lands Studies, University of Arizona
• Sandra Henderson, Dennis Ward: University Corporation for Atmospheric Research
• Mike Frame: US Geological Survey
• Giri Palanisamy, Ranjeet Devarokonda, Dave Sill: Oak Ridge National Laboratory

• Data Use Policy (DRAFT) Issues:
  • Freedom of information: we will make USA-NPN data available with as few restrictions as possible.
  • Data Use Policy Agreement: before downloading data, users will need to agree to a data use policy (similar to the Ameriflux policy). This encourages them to contact network researchers about co-authorship and citations. However, to encourage broad dissemination and public understanding of phenology data, anonymous users will be able to view data using GIS tools.
  • USA-NPN data (as opposed to participating-network data) will be available for open use under a Creative Commons license, with a citation requested.
  • Protecting observer privacy: observer identity information will not be released. Observation locations will be rounded to 0.1 degrees but will include elevation information.
  • Networks with higher protection needs (e.g., for threatened and endangered species) will be able to use the USA-NPN software stack while providing their own data store and maintaining a more restrictive data access policy.
  • All developed software will be made available under an open source license.
  • We expect to support multiple authentication authorities, including a local identity, OpenID, and the NCEAS/LTER LDAP authority.

• Current Reality and Future Target:
  • The current cyberinfrastructure can be modeled as shown in the figure: there is no data sharing and only some coordination between the efforts.
  • The target cyberinfrastructure, also shown in the figure, recognizes the differing needs of the component programs.
  • Storage and metadata will be distributed between USA-NPN and the participating programs and networks.

• Technology Stack:
  • Linux, Apache, MySQL, and PHP for tool development.
  • Drupal for content management.
  • Mercury (http://mercury.ornl.gov) for metadata creation, harvesting, indexing, and searching.
  • Server virtualization using VMware Infrastructure (though we are very interested in Xen experience from potential collaborators).
  • Subversion and Trac for revision control and lightweight project management.

• Conclusions and Plans Forward:
  • We expect to have base functionality working in early 2008 through usanpn.org.
  • Additional functionality, particularly for reporting, analysis, and contribution of historical datasets, will evolve over the course of the year.
  • Work is currently limited by a need for base funding, but much has been accomplished with limited resources.
  • USA-NPN needs better coordination and sharing with other related efforts. Investigators with relevant tools and interests are encouraged to contact one of the authors.

For more information: contact Bruce Wilson (wilsonbe@ornl.gov) or Jake Weltzin (jweltzin@usgs.gov).
USA-NPN website: http://www.usanpn.org

Presented at the American Geophysical Union Fall 2007 Meeting, December 10-14, 2007.

*ORNL is managed by the University of Tennessee-Battelle LLC under contract DE-AC05-00OR22725 with the U.S. Department of Energy.
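The draft data use policy says observer identity will not be released and observation locations will be rounded to 0.1 degrees, with elevation kept. For scale, 0.1 degrees of latitude is roughly 11 km. The minimal Python sketch below shows that publishing step under stated assumptions:
* coordinates are stored in decimal degrees;
* exact ties round away from zero (the poster does not specify a rounding rule);
* `Site`, `PublicSite`, `round_coordinate`, and `publish` are hypothetical names, not part of the USA-NPN software.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

# Grid size from the draft policy: 0.1 degrees.
STEP = Decimal("0.1")


@dataclass(frozen=True)
class Site:
    """Internal record, including observer identity (hypothetical fields)."""
    observer: str
    latitude: float    # decimal degrees
    longitude: float   # decimal degrees
    elevation_m: float


@dataclass(frozen=True)
class PublicSite:
    """What a public download would expose: no observer, coarse location."""
    latitude: float
    longitude: float
    elevation_m: float


def round_coordinate(value: float) -> float:
    """Round to the nearest 0.1 degree; exact ties round away from zero.

    str() gives the shortest decimal repr, so 35.95 is rounded as the
    decimal 35.95 rather than as its binary floating-point approximation.
    """
    return float(Decimal(str(value)).quantize(STEP, rounding=ROUND_HALF_UP))


def publish(site: Site) -> PublicSite:
    """Drop observer identity, coarsen location, keep elevation unchanged."""
    return PublicSite(
        latitude=round_coordinate(site.latitude),
        longitude=round_coordinate(site.longitude),
        elevation_m=site.elevation_m,
    )


if __name__ == "__main__":
    # Hypothetical site; coordinates are illustrative only.
    print(publish(Site("observer-42", 43.0389, -87.9065, 188.0)))
    # PublicSite(latitude=43.0, longitude=-87.9, elevation_m=188.0)
```

Rounding through `Decimal` keeps the result predictable at tie values. Python's built-in `round()` on floats can give surprising results there, because of binary representation and round-half-to-even.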