Data Integration: Using TAPIR as an asynchronous caching protocol
Aaron Steele (asteele@berkeley.edu)
Museum of Vertebrate Zoology, University of California at Berkeley
[Diagram: Application and Network; then Application, Cache, and Network]
“Nanos gigantum humeris insidentes.” (“Dwarfs standing on the shoulders of giants.”) - Bernard of Chartres, as echoed by Isaac Newton
How Can Google Help?
• Google Base
• Google Subscribed Links
Google Base
• Submit record metadata by web form, bulk upload, or API
• Google creates an index of your data
• Query data using the Google Base protocol
• Search results link back to your data
• Track usage statistics
• Change or delete metadata
• No storage or transmission limits
• Check the Terms of Service (TOS) for details
BioCASe & TAPIR Adapters to Google Base
[Diagram: Application, Adapter, Google Base (cache)]
Google Subscribed Links
• “Add custom search results to Google”
• You define the query, the result format, and the result link
• Dynamic! Supply XML, TSV, or RSS feeds
• Include images or gadgets (maps, etc.)
• Users subscribe to your links
A Word about Citations
• A link is essentially a citation
• Search results from Google Base and Subscribed Links return pointers (links) to your data, not the actual data
[Diagram: Application and Network; then Application, SQL Cache, TAPIR Protocol, and Network]
Data Harvesting Software
• Java 1.5, Eclipse, dom4j, Hibernate, MySQL
• XML configuration:
  • resource access points, and
  • a global set of filtered concepts to cache, e.g.
    • HigherGeography = Madagascar
    • Class = Aves OR Class = Reptilia
    • CoordinateUncertaintyInMeters != null
• Harvest via TAPIR inventory requests (KVP, key-value pair encoding)
• Paged inventories were handled by an Inventory class that implemented the Iterator interface
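The Inventory-as-Iterator idea can be sketched in Java as follows. This is a minimal sketch, not the original harvester code: `InventoryPageFetcher` and `InventoryPage` are hypothetical stand-ins for issuing one paged TAPIR inventory request (assumed here to be addressed by a start index and a page size) and parsing its response; the HTTP call and XML parsing are deliberately left out.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical: one page of inventory values plus the start index of the
// next page, or null when the provider reports no further pages.
final class InventoryPage {
    final List<String> values;
    final Integer nextStart;

    InventoryPage(List<String> values, Integer nextStart) {
        this.values = values;
        this.nextStart = nextStart;
    }
}

// Hypothetical: issues one paged inventory request and parses the response.
interface InventoryPageFetcher {
    InventoryPage fetch(int start, int limit);
}

// Walks a paged inventory value by value, fetching the next page only
// after the previous one has been consumed.
final class Inventory implements Iterator<String> {
    private final InventoryPageFetcher fetcher;
    private final int limit;
    private Iterator<String> current = Collections.<String>emptyList().iterator();
    private Integer nextStart = Integer.valueOf(0); // null once the last page is fetched

    Inventory(InventoryPageFetcher fetcher, int limit) {
        this.fetcher = fetcher;
        this.limit = limit;
    }

    public boolean hasNext() {
        // Skip empty pages until a value is found or no pages remain.
        while (!current.hasNext() && nextStart != null) {
            InventoryPage page = fetcher.fetch(nextStart.intValue(), limit);
            current = page.values.iterator();
            nextStart = page.nextStart;
        }
        return current.hasNext();
    }

    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    // Inventories are read-only, so removal is unsupported.
    public void remove() {
        throw new UnsupportedOperationException();
    }
}
```

Because the class is an ordinary `Iterator`, harvesting code can drain a multi-page inventory with a plain `while (inventory.hasNext())` loop and never deal with page boundaries.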
[Diagram: Application, SQL DwC Cache, TAPIR Protocol, Update Feeds, and Network]
Data Synchronization Implementation
• Records on the network are added, removed, and changed
• The cache must reflect these changes
• PHP 5 and SQLite application
• Registers resources
• Generates Atom and RSS GUID update feeds
• Compares successive copies of GUID-DLM (GUID, date last modified) inventories:
  • if a new GUID is detected, INSERT the record
  • if a DLM has changed, UPDATE the record
  • if an old GUID is missing, DELETE the record
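The GUID-DLM comparison can be sketched as two passes over a pair of snapshots. The original application was written in PHP 5 with SQLite; this hedged Java sketch is for illustration only and assumes each inventory snapshot has already been loaded into a map from GUID to DLM string. The names `InventoryDiff`, `Change`, and `ChangeType` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// The three kinds of change the cache must apply.
enum ChangeType { INSERT, UPDATE, DELETE }

final class Change {
    final ChangeType type;
    final String guid;

    Change(ChangeType type, String guid) {
        this.type = type;
        this.guid = guid;
    }

    @Override
    public String toString() {
        return type + " " + guid;
    }
}

final class InventoryDiff {
    // Compares two GUID -> DLM snapshots of the same resource's inventory.
    static List<Change> diff(Map<String, String> previous, Map<String, String> current) {
        List<Change> changes = new ArrayList<Change>();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String guid = entry.getKey();
            if (!previous.containsKey(guid)) {
                // New GUID: record INSERT.
                changes.add(new Change(ChangeType.INSERT, guid));
            } else if (!sameDlm(previous.get(guid), entry.getValue())) {
                // Known GUID with a different DLM: record UPDATE.
                changes.add(new Change(ChangeType.UPDATE, guid));
            }
        }
        for (String guid : previous.keySet()) {
            if (!current.containsKey(guid)) {
                // Old GUID missing: record DELETE.
                changes.add(new Change(ChangeType.DELETE, guid));
            }
        }
        return changes;
    }

    // Null-safe comparison, in case a provider omits the DLM.
    private static boolean sameDlm(String a, String b) {
        return a == null ? b == null : a.equals(b);
    }
}
```

Since a snapshot holds only GUIDs and DLMs rather than full records, each comparison stays cheap, and only the GUIDs it reports need to be fetched or removed.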
HerpNET Proof of Concept
• Cached concepts filtered on:
  • Class = Reptilia OR Class = Amphibia
  • CoordinateUncertaintyInMeters != null
• 20 of 80 providers accessible via TAPIR
• About 200,000 cached georeferenced records
• AmphibiaWeb synonymy lookup: each scientific name is expanded through the synonymy server, each synonym is looked up in the cache for coordinates, and the results are mapped with BerkeleyMapper
• Query times reduced from 15 s to 5 ms
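The synonymy-aware query can be sketched as: expand a scientific name to its synonyms, then collect the cached occurrences filed under every one of those names. This is a hypothetical Java sketch, not the HerpNET code: `SynonymyService` stands in for the AmphibiaWeb synonymy server, an in-memory map stands in for the SQL cache, and handing the resulting points to BerkeleyMapper is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the synonymy server.
interface SynonymyService {
    List<String> synonymsOf(String scientificName);
}

// A cached georeferenced occurrence (hypothetical minimal shape).
final class Occurrence {
    final String scientificName;
    final double latitude;
    final double longitude;

    Occurrence(String scientificName, double latitude, double longitude) {
        this.scientificName = scientificName;
        this.latitude = latitude;
        this.longitude = longitude;
    }
}

final class SynonymLookup {
    private final SynonymyService synonymy;
    // Stands in for the SQL cache, keyed by scientific name.
    private final Map<String, List<Occurrence>> cacheByName;

    SynonymLookup(SynonymyService synonymy, Map<String, List<Occurrence>> cacheByName) {
        this.synonymy = synonymy;
        this.cacheByName = cacheByName;
    }

    // Returns cached occurrences filed under the name or any of its synonyms.
    List<Occurrence> lookup(String scientificName) {
        // LinkedHashSet keeps the queried name first and skips duplicates.
        Set<String> names = new LinkedHashSet<String>();
        names.add(scientificName);
        names.addAll(synonymy.synonymsOf(scientificName));

        List<Occurrence> results = new ArrayList<Occurrence>();
        for (String name : names) {
            List<Occurrence> hits = cacheByName.get(name);
            if (hits != null) {
                results.addAll(hits);
            }
        }
        return results;
    }
}
```

Expanding synonyms before querying means a record still filed under an older name is found too; each cache lookup is a local read rather than a network round trip to every provider.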
Future Work
• ReBioMa project
  • Funded by the MacArthur Foundation
  • Dynamic species distribution modeling (SDM) for Madagascar
  • MaxEnt models built from cached records from TAPIR providers, georeferenced by BioGeomancer
  • New models fitted and projected whenever the cache updates