Building A Global Invasive Species Information Network with a TAPIR Protocol Jim Graham, Annie Simpson, Michael Browne, Bob Morris, Tom Stohlgren, Greg Newman, …
Software Lifecycle: Research vs. Production
[Figure: lifecycle phases plotted against time: Investigation, Design, Implementation, Testing, Maintenance]
The Tire Swing
• What marketing suggested
• What management approved
• What was designed
• What was delivered
• What the customer needed
Source: Alan Chapman, http://www.businessballs.com/treeswing.htm
Questions to Answer
• Who is the customer?
  • Invasive species data providers
  • Invasive species data consumers
  • Stakeholders
• What are we selling/giving them?
  • A network to allow the exchange of information on invasive species
• What do we need to do to get them to want to buy/use it?
Technology Adoption Lifecycle
[Figure: adoption curve plotted against time]
Source: Bohlen, Joe M. & George M. Beal (May 1957), "The Diffusion Process", Special Report No. 18, 1: 56-77
Survey & Interview Highlights
• At least 3 languages/frameworks are important
• Time providers would commit: from 1 hour to “as long as it takes”
• Minimal web service expertise
• Various installation scenarios
• DiGIR did not meet all needs
• Complex queries not needed
• Database problems
History
• National Biological Information Infrastructure (NBII)
• Global Invasive Species Information Network (GISIN)
• NISBase: Brian Steves and Shawn Dalton
• GIS standards (WMS)
• Common web services
• Invasive Alien Species Profile Schema (IAS-PS)
Situation
Need:
• Toolkits in 3 languages
• Documentation
• Support
• Registry/Directory
• Portal
• Provider test bed
Have:
• Existing: protocols, schemas/data models, toolkits, portals, registries, databases
• Minimal funding for development
• No funding for support?
Complexity
• Complexity is a multiplier on:
  • Development: more to implement
  • Testing: more to test
  • Support: more to document, train, and upgrade
  • Performance: larger data transfers, longer parsing time
• Simpler means we can get tools of higher quality, with better support, that run faster and cost less
Architecture
[Diagram labels: TCS Database, GBIF Registry, GISIN Data Providers, GISIN Consumers, Other Providers, Other Consumers, Web Services, GISIN Portals, GISIN Registry/Directory, Other Web Sites, Web Browser Communication, End-Users]
Protocol Design
• Approaches:
  • TAPIR-Light
  • Key Value Pair Only
  • Flat data models
• Performance: 1 million records in 14 minutes
• Controlled vocabulary wherever possible
Required Data Models
• BioStatus: Indigenous, Harmful, etc.
• Occurrences: X, Y (DarwinCore)
• ProfileURLs: Language, URL
• ImpactStatus: Human, Agriculture, etc.
• ManagementStatus: Activity, etc.
• DistributionStatus: Growing, Stable, etc.
All have: Scientific Name, Location
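The flat models and controlled vocabularies listed above could be sketched as follows. This is only an illustration under stated assumptions: the class name `StatusRecord`, its field names, and the reading of "Indigenous, Harmful" and "Growing, Stable" as vocabulary terms are hypothetical, and the vocabularies hold only the terms named on the slide (the full lists behind "etc." are not given here).

```python
from dataclasses import dataclass

# Only the terms named on the slide; the real vocabularies are longer ("etc.").
# Treating these words as vocabulary terms is an assumption of this sketch.
VOCABULARIES = {
    "BioStatus": {"Indigenous", "Harmful"},
    "DistributionStatus": {"Growing", "Stable"},
}


@dataclass(frozen=True)
class StatusRecord:
    """A flat record: every model carries a scientific name and a location."""
    model: str
    scientific_name: str
    location: str
    value: str

    def __post_init__(self):
        # Reject models and values that are not in the controlled vocabulary.
        allowed = VOCABULARIES.get(self.model)
        if allowed is None:
            raise ValueError(f"Unknown model: {self.model}")
        if self.value not in allowed:
            raise ValueError(
                f"{self.value!r} is not a {self.model} term; "
                f"expected one of {sorted(allowed)}"
            )


record = StatusRecord("BioStatus", "Tamarix aphylla", "Colorado", "Harmful")
print(record)
```

Validating against a fixed term list at construction time is one way to keep free-text variants out of the exchanged data, which is the point of using a controlled vocabulary wherever possible.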
Implementation Requirements
1. Automatic Installation
  • Installer and DiGIR-like admin pages
2. Adapt toolkit to database, web server, security
3. Port toolkit to another language (Perl, C++)
4. Do it themselves: just the documentation
• Existing toolkits/protocols are too complex and lack the development documentation to do 2 through 4 quickly
Protocol Transaction Diagram

Request:
    http://provider.org/GISIN.php
        ?Op=Inventory
        &Model=Occurrences
        &Count=true
        &Genus=Tamarix
        &Concept=Latitude
        &Concept=Longitude
        &Concept=Date
        &Concept=ScientificName

Response:
    <response>
      <inventory>
        <records>
          <record>
            <Latitude>40</Latitude>
            <Longitude>-105</Longitude>
            <Date>10/12/2000</Date>
            <ScientificName>Tamarix aphylla</ScientificName>
          </record>
          …
        </records>
      </inventory>
    </response>

SQL Query:
    SELECT Latitude,… FROM Locations JOIN Observation… JOIN Organisms… WHERE Genus='Tamarix'

Database tables: Locations, Observations, Organisms
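From the consumer side, the exchange above can be sketched with the Python standard library. This is a minimal illustration, not the GISIN toolkit: the helper names `build_request_url` and `parse_records` are hypothetical, the parameter names are taken from the slide, and the sample response is the slide's example written as well-formed XML with the coordinates in valid ranges (an assumption about the intended values).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_request_url(base_url, model, concepts, **filters):
    """Build a key-value-pair request URL in the style shown on the slide."""
    params = [("Op", "Inventory"), ("Model", model), ("Count", "true")]
    params += list(filters.items())
    # The Concept key is repeated once per requested field.
    params += [("Concept", concept) for concept in concepts]
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return each <record> as a dict of element name -> stripped text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in record}
        for record in root.iter("record")
    ]


# Sample response adapted from the slide (well-formed, one record).
SAMPLE_RESPONSE = """
<response><inventory><records>
  <record>
    <Latitude>40</Latitude>
    <Longitude>-105</Longitude>
    <Date>10/12/2000</Date>
    <ScientificName> Tamarix aphylla </ScientificName>
  </record>
</records></inventory></response>
"""

url = build_request_url(
    "http://provider.org/GISIN.php",
    "Occurrences",
    ["Latitude", "Longitude", "Date", "ScientificName"],
    Genus="Tamarix",
)
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```

Because the request is plain key-value pairs and the response is flat XML, a consumer needs nothing beyond a URL encoder and a generic XML parser.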
Toolkit Design: Data Flow
[Diagram labels: Internet, Web Service, Configuration Files (Provider.xml, Metadata.xml, Capabilities.xml), Query Builder, Admin Web Site, GISIN Protocol, Database Connection, Utilities, Web, Provider, Database, Date]
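The Query Builder step in this data flow turns the request's key-value pairs into SQL. The sketch below shows one way that could work; it is not the toolkit's actual code. The concept-to-column mapping, the single flat `occurrences` table, and the function name `build_query` are all hypothetical, and an in-memory SQLite database stands in for a provider's database.

```python
import sqlite3

# Hypothetical mapping from protocol concepts to database columns.
CONCEPT_COLUMNS = {
    "Latitude": "latitude",
    "Longitude": "longitude",
    "Date": "obs_date",
    "ScientificName": "scientific_name",
    "Genus": "genus",
}
TABLE = "occurrences"  # hypothetical flat table (or view) behind the model


def build_query(concepts, filters):
    """Translate requested concepts and equality filters into parameterized SQL."""
    unknown = [c for c in list(concepts) + list(filters) if c not in CONCEPT_COLUMNS]
    if unknown:
        raise ValueError(f"Unsupported concept(s): {unknown}")
    select = ", ".join(f'{CONCEPT_COLUMNS[c]} AS "{c}"' for c in concepts)
    sql = f"SELECT {select} FROM {TABLE}"
    if filters:
        # Values are bound as parameters, never pasted into the SQL text.
        where = " AND ".join(f"{CONCEPT_COLUMNS[k]} = ?" for k in filters)
        sql += f" WHERE {where}"
    return sql, list(filters.values())


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE occurrences (latitude REAL, longitude REAL, "
    "obs_date TEXT, scientific_name TEXT, genus TEXT)"
)
conn.executemany(
    "INSERT INTO occurrences VALUES (?, ?, ?, ?, ?)",
    [
        (40.0, -105.0, "2000-10-12", "Tamarix aphylla", "Tamarix"),
        (39.5, -104.9, "2001-06-01", "Bromus tectorum", "Bromus"),
    ],
)

sql, args = build_query(["Latitude", "Longitude", "ScientificName"], {"Genus": "Tamarix"})
rows = conn.execute(sql, args).fetchall()
print(sql)
print(rows)
```

Restricting column names to a fixed mapping and binding filter values as parameters keeps request input out of the SQL text, and equality-only filters match the survey finding that complex queries are not needed.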
Performance by Time Per Record
• 1 million records: 14 hours -> 14 minutes
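Expressed per record, the two totals above work out as follows. This is plain arithmetic on the slide's figures; the variable names are illustrative only.

```python
RECORDS = 1_000_000

before_s = 14 * 60 * 60  # 14 hours in seconds
after_s = 14 * 60        # 14 minutes in seconds

# Average time per record in milliseconds, before and after.
before_ms_per_record = before_s * 1000 / RECORDS
after_ms_per_record = after_s * 1000 / RECORDS
speedup = before_s / after_s

print(f"before: {before_ms_per_record:.2f} ms/record")
print(f"after:  {after_ms_per_record:.2f} ms/record")
print(f"speedup: {speedup:.0f}x")
```

That is roughly 50 ms per record before and under 1 ms per record after, a 60-fold improvement.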
Products Mapped to Customer Needs
[Diagram labels: More Sophisticated Users, Complex Queries, RDF, Consumers, GISIN, TAPIR/DarwinCore…, KVP, XML, Providers, Invasive Species Databases, Other Databases]
Adapted from Peter Fox, Debra McGuiness (personal communication)
Next Steps
• Resolve Issues
• Toolkit Development:
  • Complete the design
  • Port to Java and ASP
  • User’s Guide
• Testing:
  • 2-4 more databases connected
  • Automated tests
  • Defect tracking
• Portal
  • Incremental improvements
• Provider Meeting in November
Current Web Site
• GISIN Organization Site: www.GISINetwork.org
• GISIN Directory: www.niiss.org/GISIN
  • Until end of September: www.niiss.org/GISS
  • Browse Directory
  • Search for data: BioStatus, Occurrences, ProfileURLs
• GISIN Technical Site
  • Documentation
  • For providers:
    • Get Toolkit
    • Sample Provider (based on the toolkit)
    • Manual exercising of TAPIR-GISIN web services
    • Automated tests are coming!
Acknowledgements
• Funded by NSF, NBII (USGS), GBIF, TDWG
• Thanks to: Renato de Giovanni, Roger Hyam, Donald Hobern, Markus Döring, Hannu Saarenmaa, Kevin Richards, Peter Fox, Debra McGuiness, Brian Steves, Pam Fuller, John Pickering, Shawn Dalton, Greg Ruiz, and the other members of GISIN
• Review: www.niiss.org/GISIN (or GISS)
• Contact: jim@nrel.colostate.edu