Data Quality in Data Exchanges: a Tri-Part Approach in the French Information System on Nature and Landscapes
Rémy Jomier (UMS Patrinat: National Natural History Museum (MNHN), French Agency for Biodiversity (AFB), and National Center for Scientific Research (CNRS)), Nature data standardization manager
Solène Robert (UMS Patrinat: National Natural History Museum (MNHN), French Agency for Biodiversity (AFB), and National Center for Scientific Research (CNRS)), Nature data and geographical data cell coordinator
Why validate taxon data?
• « Whoever is careless with the truth in small matters cannot be trusted with important matters » - A. Einstein
• A datum is small… but it has to be validated (hence, true) in order to be trusted with the important matters: biodiversity, our environment, and related research and political decisions.
• It gets even more important when you know that: « some outside the museum community see the quality of museum data as being generally unacceptable for use in making environmental conservation decisions » - A. Chapman
• Some of you may think: « Hey, I know THAT name! »
What's the « SINP »?
• SINP: the French national Information System on Nature and Landscapes (Système d'Information Nature et Paysages).
• Encompasses all biodiversity and geological data in France. For biodiversity, it includes both taxon occurrences and habitat occurrences.
• Uses well-defined rules, often more constraining than the Darwin Core (DwC).
• The part dealing with taxon occurrences: OccTax.
What is a taxon occurrence?
Observation or non-observation of a taxon, at a given time and place, by one or more observers.
• Examples:
  • On 10 May 2014, Patrick Haffner (MNHN) observed badger traces at the point 8 050, 67 523 (Lambert 93 projection).
  • Indirect observation of a mammal
  • Direct observation of a butterfly
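To make the definition concrete, here is a minimal Python sketch of a taxon occurrence record. It assumes a simplified, hypothetical set of attributes: `TaxonOccurrence`, `ObservationStatus`, `ObservationMethod` and all field names are illustrative, not the actual OccTax attribute names. It encodes the badger example; the scientific name *Meles meles* and the reading of the two coordinates as easting/northing are added for illustration only.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class ObservationStatus(Enum):
    PRESENT = "present"  # the taxon was observed
    ABSENT = "absent"    # the taxon was looked for but not observed


class ObservationMethod(Enum):
    DIRECT = "direct"      # e.g. seeing the butterfly itself
    INDIRECT = "indirect"  # e.g. traces, tracks, droppings


@dataclass
class TaxonOccurrence:
    """Hypothetical, simplified record: NOT the official OccTax attribute list."""
    taxon_name: str
    status: ObservationStatus
    method: ObservationMethod
    date_start: date
    date_end: date
    x: float  # first coordinate, Lambert 93 (as given on the slide)
    y: float  # second coordinate, Lambert 93
    observers: list[str] = field(default_factory=list)


# The badger example: an indirect observation (traces) of a mammal.
badger = TaxonOccurrence(
    taxon_name="Meles meles",
    status=ObservationStatus.PRESENT,
    method=ObservationMethod.INDIRECT,
    date_start=date(2014, 5, 10),
    date_end=date(2014, 5, 10),
    x=8050,
    y=67523,
    observers=["Patrick Haffner (MNHN)"],
)
```

Keeping status (observed or not) as an explicit field matters because a non-observation is also a taxon occurrence under the definition above.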
Prerequisite to data exchange: data conformity and consistency
• Conformity: ensures that a datum can be exchanged
  • Presence of compulsory elements
  • Type of each attribute (text, number, date…)
• Consistency: ensures that there is no blatant error
  • Checking consistency requires comparing elements with one another
  • Example: end date vs. start date
• Both are ensured by a national protocol, common to all
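The two checks can be sketched in Python as follows. This is a minimal illustration, not the national protocol: the compulsory elements in `REQUIRED_FIELDS` and the function names `check_conformity` and `check_consistency` are hypothetical, and the only consistency rule implemented is the end date/start date comparison from the slide.

```python
from datetime import date
from numbers import Real
from typing import Any

# Hypothetical compulsory elements and their expected types; the actual
# national protocol defines its own (longer) list.
REQUIRED_FIELDS: dict[str, type] = {
    "taxon_name": str,
    "date_start": date,
    "date_end": date,
    "x": Real,  # accepts int or float coordinates
    "y": Real,
}


def check_conformity(record: dict[str, Any]) -> list[str]:
    """Can the datum be exchanged? Checks presence and type of compulsory elements."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        value = record.get(name)
        if value is None:
            errors.append(f"missing compulsory element: {name}")
        elif not isinstance(value, expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


def check_consistency(record: dict[str, Any]) -> list[str]:
    """Is there a blatant error? Compares elements with one another."""
    errors = []
    start, end = record.get("date_start"), record.get("date_end")
    if isinstance(start, date) and isinstance(end, date) and end < start:
        errors.append("end date is earlier than start date")
    return errors


record = {
    "taxon_name": "Meles meles",
    "date_start": date(2014, 5, 10),
    "date_end": date(2014, 5, 9),  # deliberate error
    "x": 8050,
    "y": 67523,
}
print(check_conformity(record))   # []
print(check_consistency(record))  # ['end date is earlier than start date']
```

Note that the record above is conformant (every element is present and well-typed) yet inconsistent: conformity alone does not catch the inverted dates, which is why both checks are needed.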
Scientific validation of data: what happens
• Validation processes:
  • Manual: experts
  • Automated: comparison with knowledge bases
  • Combined arms: both, for maximum damage! Er… sorry, validation.
• Validation levels:
  • Producer validation
  • Regional validation
  • National validation
• SCOPE!
• Looks like overkill? Nuh-uh! That's the bare minimum!
Scientific validation of data: the needs within SINP
• 3 levels:
  • Producer level, with a self-evaluation
  • Regional level, coordinated at the regional level
  • National level, coordinated at the national level (at times this can be equivalent to the regional level; taking care not to duplicate efforts is key)
• National validation is done globally, using national expert networks, feedback from users, or knowledge databases
• Validation should NEVER slow down data movement… but validation results should also be exchanged when they exist.
Scope?
• Minimal scope: the taxon/date/location triplet, which is quick and relatively easy to check.
• Enlarged scope: any other information, which is not always easy to check. The process the datum has been submitted to, and which elements have been checked, have to be described.
Scientific validation of data: processes
• Automated process: comparing information with reference databases (presence maps, for example). Very quick, but dependent on existing databases.
  • 1.5 hours for 1.5 million records, covering conformity, consistency, and minimal-scope scientific validation
• Manual process: experts have to intervene and check each and every datum. Time-consuming but very reliable; it can work outside automated bounds and without databases.
• Combined process: combines both, with the automated process flagging the data that experts should check.
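The combined process can be sketched as a triage step: an automated comparison against a reference presence map, with anything it cannot confirm flagged for expert review. In this minimal Python sketch the presence map contents, the grid size `CELL_SIZE` and all names are hypothetical stand-ins for the real reference databases, and only the taxon and location parts of the minimal-scope triplet are compared (the date is left out for brevity).

```python
from dataclasses import dataclass

# Hypothetical reference knowledge base: for each taxon, the grid cells
# where it is already known to occur (a crude "presence map").
PRESENCE_MAP: dict[str, set[tuple[int, int]]] = {
    "Meles meles": {(80, 675), (81, 675)},
}

CELL_SIZE = 100  # hypothetical grid cell size, in the same unit as x and y


@dataclass
class Datum:
    taxon_name: str
    x: float
    y: float


def grid_cell(d: Datum) -> tuple[int, int]:
    """Map a point onto the presence-map grid."""
    return (int(d.x // CELL_SIZE), int(d.y // CELL_SIZE))


def automated_check(d: Datum) -> str:
    """Return 'consistent', 'unknown_taxon' or 'outside_known_range'."""
    known_cells = PRESENCE_MAP.get(d.taxon_name)
    if known_cells is None:
        return "unknown_taxon"  # no reference data: automation cannot decide
    if grid_cell(d) in known_cells:
        return "consistent"
    return "outside_known_range"


def combined_triage(data: list[Datum]) -> tuple[list[Datum], list[Datum]]:
    """Split data into automatically accepted ones and ones flagged for experts."""
    accepted: list[Datum] = []
    for_experts: list[Datum] = []
    for d in data:
        if automated_check(d) == "consistent":
            accepted.append(d)
        else:
            for_experts.append(d)
    return accepted, for_experts


data = [
    Datum("Meles meles", 8050, 67523),  # falls in a known cell
    Datum("Meles meles", 1000, 62000),  # far from any known cell
    Datum("Lutra lutra", 8050, 67523),  # no reference data for this taxon
]
accepted, for_experts = combined_triage(data)
```

Limiting the expert queue to what automation cannot confirm is what lets the combined process keep the speed of automated checks while still relying on experts where the databases fall short.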
Scientific validation of data: results
• Each datum is tagged with a trust level and all relevant information:
  • Level (producer, regional, national)
  • Scope (minimal, enlarged)
  • Validator (this ensures trust)
  • Date (in case of further revision and update)
  • Type of validation (manual, automatic, combined)
• Reasonable fallout: validated data
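The tag attached to each datum can be represented as a small immutable record. This Python sketch mirrors the fields listed on the slide; the class and field names are hypothetical, and `trust_level` is kept as a plain string because the actual trust-level nomenclature is not given here (the value `"certain"` below is a placeholder).

```python
from dataclasses import dataclass, replace
from datetime import date
from enum import Enum


class Level(Enum):
    PRODUCER = "producer"
    REGIONAL = "regional"
    NATIONAL = "national"


class Scope(Enum):
    MINIMAL = "minimal"    # taxon/date/location triplet only
    ENLARGED = "enlarged"  # any other information; the process must be described


class ValidationType(Enum):
    MANUAL = "manual"
    AUTOMATIC = "automatic"
    COMBINED = "combined"


@dataclass(frozen=True)
class ValidationTag:
    trust_level: str                 # placeholder for the actual trust nomenclature
    level: Level
    scope: Scope
    validator: str                   # who validated: this is what makes the tag trusted
    validation_date: date            # supports later revision and update
    validation_type: ValidationType


# Hypothetical example: a regional, minimal-scope, combined validation.
tag = ValidationTag(
    trust_level="certain",
    level=Level.REGIONAL,
    scope=Scope.MINIMAL,
    validator="Hypothetical regional expert",
    validation_date=date(2015, 3, 2),
    validation_type=ValidationType.COMBINED,
)

# A revision yields a new tag with a new date instead of overwriting the old one.
revised = replace(tag, trust_level="probable", validation_date=date(2016, 1, 15))
```

Freezing the dataclass is a design choice for this sketch: since the date exists precisely to track revisions, a revised validation becomes a new tag rather than a silent in-place edit.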
How does all that affect data exchange?
• Each element needs to be attached to the datum
• Aim of a standard: exchanging information, hence the need for concise information (numerical or 1-2 letter codes)
• No need to carry useless information: « not checked » doesn't need any data
• Levels don't require the same information (producer vs. regional/national)
• Checking for duplicate data: interesting
• Data may have been validated on compulsory attributes but rejected on optional ones: by keeping the optional information in a separate place, the validated information can still be exchanged
• Data flow: when do we update, and how? With a modification date on the datum
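Three of these points (short codes, omitting « not checked », and a modification date driving updates) can be sketched together in Python. The one-letter codes, the key names and the functions `encode_validation`, `export_datum` and `needs_update` are all hypothetical, NOT the official SINP nomenclature. The export also reflects the earlier rule that validation must never slow data down: a datum is exported whether or not any validation exists.

```python
from datetime import date
from typing import Optional

# Hypothetical short codes, NOT the official SINP nomenclature.
LEVEL_CODES = {"producer": "P", "regional": "R", "national": "N"}
SCOPE_CODES = {"minimal": "M", "enlarged": "E"}
TYPE_CODES = {"manual": "M", "automatic": "A", "combined": "C"}


def encode_validation(level: str, trust: int, scope: str, vtype: str,
                      validator: str, when: date) -> dict[str, str]:
    """Encode one validation as concise codes for exchange."""
    return {
        "lvl": LEVEL_CODES[level],
        "trust": str(trust),
        "scope": SCOPE_CODES[scope],
        "type": TYPE_CODES[vtype],
        "by": validator,
        "on": when.isoformat(),
    }


def export_datum(datum_id: str, modified: date,
                 validations: list[Optional[dict[str, str]]]) -> dict:
    """Build the exchanged record.

    Levels that were never checked (None) are simply dropped: "not checked"
    needs no data. The datum is exported even with no validation at all.
    """
    record: dict = {"id": datum_id, "modified": modified.isoformat()}
    checked = [v for v in validations if v is not None]
    if checked:
        record["validation"] = checked
    return record


def needs_update(local_modified: date, incoming_modified: date) -> bool:
    """Refresh a local copy only if the incoming datum was modified later."""
    return incoming_modified > local_modified


# Hypothetical datum: producer and national levels not checked, regional done.
regional = encode_validation("regional", 1, "minimal", "combined",
                             "Hypothetical regional validator", date(2015, 3, 2))
exported = export_datum("datum-1", date(2015, 3, 2), [None, regional, None])
```

The trust value `1` is an arbitrary placeholder; only the mechanics (compact codes, absent means unchecked, date-based refresh) are being illustrated.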
Tēnā koutou. Thank you for your attention!
• E-mail: rjomier[at]mnhn.fr