Learn about georeferencing and how it can link specimen data with geographical information to map biodiversity in Southern Africa. Explore the benefits, resources needed, and current practices in georeferencing. Discover new methods and tools for more accurate and efficient georeferencing.
Georeferencing: how can we get southern African primary biodiversity data on the map? Pieter JD Winter SANBIF Georeferencing Training of Trainers Workshop, 4-8 April 2011, Cape Town, South Africa
What is georeferencing? • Linking of (specimen) data with the type of geographical data that defines a record in space. • In biodiversity collections, this geographical data is usually first recorded as a locality text string.
What is georeferencing? • In addition to a text string, grid or coordinate data may have been recorded. • Such data can be used by a computer to map a point. • In contrast, when the grid or coordinate data is absent, retrospective georeferencing is done after the collection event. • Retrospective georeferencing essentially translates the locality text string into a format that is computer mappable.
Occurrence Data Correct identification + Correct georeference data
Why georeferencing? Occurrence data are potentially the most widely or frequently applied data from biological collections, because: • digitized specimen data are put in a geographical context. • they facilitate finding populations in the field. • they facilitate visualization of species ranges. • they facilitate the application of many useful tools, from species dot maps to full GIS analyses. (John Elia, Trainer, Tanzania 2010)
Why georeferencing? Once we can apply data in this way, we can use the derived information for a host of research and decision support purposes: • Conservation Planning. • Environmental management. • Vegetation analysis. • Presentation of taxonomic research output.
MANAGING THE GEOREFERENCING OF PRIMARY BIODIVERSITY DATA (John Elia, Trainer, Tanzania 2010) Managers and curators may want to know: • What are the benefits vs. costs of georeferencing the collection? • How will the georeferenced data be used, and by whom? • What proportion of my collection is already digitized? • What kind of expertise am I going to need? • How hard will this be? • How long will it take? Etc.
RESOURCES NEEDED (John Elia, Trainer, Tanzania 2010) • Each institution will have different resource needs in order to georeference its collections. • The basics include: • Topographic maps (paper & electronic). • Geographically skilled personnel. • Access to a good gazetteer (many are available free via the Internet, either for downloading or via on-line searching). • Suitable computer hardware. • Preferably internet access (as there are many resources on the Internet). • A database and database software (spreadsheets are not recommended). • Think twice before georeferencing specimen by specimen: make use of batching tools!
Case Study: SANBI Herbaria • QDS (quarter degree square) grid georeferences. • State of the art in the country in the 1970s, when relational databases first became popular. • 80% of collections georeferenced to QDS level or higher. • It is estimated that fewer than 5% of records have a resolution finer than QDS. • Hardly any records have datum values. • Some have an indication of possible extent or uncertainty, but mostly recorded as distance intervals.
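To make the QDS idea concrete, here is a minimal sketch of how a coordinate could be turned into a QDS code and how a code could be mapped back to a point. It assumes the commonly used southern African convention (not spelled out in the slides): a one-degree square is named by its whole degrees south and east, split into four half-degree cells lettered A (NW), B (NE), C (SW), D (SE), and each of those is split again in the same way. The function names `qds_code` and `qds_centroid` are hypothetical, and the sketch only handles points south of the equator and east of Greenwich.

```python
def _cell_letter(frac_south, frac_east):
    """Letter of the sub-cell for fractions in [0, 1) measured from the NW corner."""
    row = 1 if frac_south >= 0.5 else 0   # 0 = northern half, 1 = southern half
    col = 1 if frac_east >= 0.5 else 0    # 0 = western half, 1 = eastern half
    return "ABCD"[2 * row + col]


def qds_code(lat, lon):
    """Return a QDS code such as '3318CD' for decimal degrees (assumed convention)."""
    if lat > 0 or lon < 0:
        raise ValueError("sketch only covers southern latitudes and eastern longitudes")
    south = abs(lat)
    deg_s, deg_e = int(south), int(lon)
    frac_s, frac_e = south - deg_s, lon - deg_e
    half = _cell_letter(frac_s, frac_e)
    # Rescale the position inside the half-degree cell back to [0, 1).
    quarter = _cell_letter((frac_s % 0.5) * 2, (frac_e % 0.5) * 2)
    return f"{deg_s:02d}{deg_e:02d}{half}{quarter}"


def qds_centroid(code):
    """Return (lat, lon) of the centre of a QDS cell, e.g. for plotting a dot map."""
    deg_s, deg_e = int(code[:2]), int(code[2:4])
    half, quarter = "ABCD".index(code[4]), "ABCD".index(code[5])
    south = deg_s + 0.5 * (half // 2) + 0.25 * (quarter // 2) + 0.125
    east = deg_e + 0.5 * (half % 2) + 0.25 * (quarter % 2) + 0.125
    return (-south, east)


if __name__ == "__main__":
    code = qds_code(-33.925, 18.424)   # a point in central Cape Town
    print(code, qds_centroid(code))
```

Mapping a QDS code to its centroid places every record in the cell on the same point, so a QDS-only georeference carries an uncertainty of roughly the size of the cell.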
Georeferencing errors • Automation: Infamous ‘global’ edits of early days were sometimes disastrous, e.g. Mitchell’s Pass (EC vs WC), and now have a stigma. • Errors in both plant identification and georeference. • Name error (conservative estimate): 5% of holdings • Georeference error (conservative estimate): 9% • Total estimated occurrence data error = 5% + 9% = 14% • Because this is an average figure, it is not unusual to expect a 0% error for some partial datasets, and a 28% error in others.
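The 14% total is a simple sum, which treats the two error types as never falling on the same record. As a rough check, the sketch below compares that sum with the figure obtained if name errors and georeference errors were statistically independent. Independence is an assumption made here for illustration only; the slides do not state how the two errors overlap. The function names are hypothetical.

```python
def combined_error_additive(p_name, p_georef):
    """Upper bound: no record carries both a name error and a georeference error."""
    return p_name + p_georef


def combined_error_independent(p_name, p_georef):
    """Share of records with at least one error, assuming the errors are independent."""
    return 1 - (1 - p_name) * (1 - p_georef)


if __name__ == "__main__":
    p_name, p_georef = 0.05, 0.09   # conservative estimates quoted on the slide
    print(f"additive:    {combined_error_additive(p_name, p_georef):.4f}")
    print(f"independent: {combined_error_independent(p_name, p_georef):.4f}")
```

With the slide's estimates the independent figure is about 13.55%, so the additive 14% is a slightly pessimistic but reasonable approximation when both error rates are small.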
Current SANBI practice • Much duplication & repetition of effort: • Different individuals looking up the same georeference. • Same individual repeating a lookup for a different taxon, on a different occasion. • Some people keep records (own gazetteers) to avoid their own repetition. • A lot of gazetteer information is not in the public domain.
Current SANBI practice • Methods currently employed are capacity intensive. • Require a fine balance of slog and skill, thus not cost effective to use only semi-skilled, or only highly skilled, personnel. • Resources are not accessible to all, or not systematically employed. • Priority for enhancing accuracy: Threatened Species data (now mostly by Ilva Rogers). • Selective in choice of what to georeference.
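One way to picture how batching removes this duplication is to group records by their locality string, so that each distinct locality is looked up once and the result is shared by every record that uses it. The sketch below is illustrative only: the record layout (`id`, `taxon`, `locality`), the sample records, and the function names are hypothetical, and the normalisation is deliberately crude (case and whitespace only). It does not reproduce how any particular tool works.

```python
from collections import defaultdict


def normalise_locality(text):
    """Collapse case and whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def batch_by_locality(records):
    """Map each normalised locality string to the ids of the records that share it."""
    batches = defaultdict(list)
    for rec in records:
        batches[normalise_locality(rec["locality"])].append(rec["id"])
    return dict(batches)


if __name__ == "__main__":
    # Hypothetical specimen records; only the locality text matters here.
    records = [
        {"id": 1, "taxon": "Taxon A", "locality": "Mitchell's Pass"},
        {"id": 2, "taxon": "Taxon B", "locality": "mitchell's  pass"},
        {"id": 3, "taxon": "Taxon A", "locality": "Table Mountain"},
    ]
    batches = batch_by_locality(records)
    print(f"{len(records)} records, {len(batches)} lookups needed")
    for locality, ids in batches.items():
        print(locality, ids)
```

A shared table keyed on the normalised locality would then play the role of the personal gazetteers mentioned above, with the caveat that identical place names in different regions still need human disambiguation.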
More Hope • TanBIF/GBIF, Tanzania training course an eye opener: • MaNIS georeferencing calculator • GeoLocate • BioGeomancer • Concepts of collaborative georeferencing and batching to avoid the need for 7 millennia to manually georeference the world’s primary biodiversity data.
User demand & plan • High demand for the remaining 95% to be more accurately georeferenced. • Consider allocating the role to a select team only, collaborating across branches or organizations. • Automation and collaborative or distributed georeferencing need investigation to accelerate the process. • Optimize the process of returning remotely improved georeferences to source. • No tool is a panacea, and circumstantial evidence of the sort that humans need to interpret will still be needed for difficult cases, but new methods promise to remove the bulk of slog work, a major resource constraint at present. • Gaps that exist between our georeferencing methods and gazetteer usage can now be effectively plugged with GeoLocate & BioGeomancer. • The more the gazetteers are used and enriched in a region, the better they are for further users.
ACKNOWLEDGMENTS • SABIF, SANBI (BIM), GBIF • Trainer team • Local organizing committee
Error examples • If 75% of taxon records for ‘Mitchell’s Pass’ have been given a wrong georeference (e.g. by ‘global changes’), that dataset obviously has a 75% occurrence error. • Likewise, but maybe less obvious, if all specimens of Berzelia intermedia have been assigned to an incorrect name, B. commutata (incl. det. and data capture error), this gives a 100% occurrence error in the dataset for B. intermedia, unless the ‘correct’ taxon happens to co-occur with the ‘wrong’ one, B. commutata. • Georeferencing of a species’ range based on specimen data requires that the records are correctly named, before outliers can be omitted. • Best practice to include the type locality data when inferring distribution ranges.