NORMAN Databases. Collection of data for EMPODAT – key issues. Environmental Institute, Koš, Slovakia.
NORMAN Databases: Process of data collection & upload
• Data collection: Excel Data Collection Templates (DCTs) for the matrices water, waste water, sediments, biota, SPM, sewage sludge, soil, air… (links: DCT example, online DCT)
• Contacting a data provider, who can either:
  • Provide data in the DCT template
  • Provide data in their own format, to be transformed into the DCT template by the database team
• Checking data completeness
  • Returning the filled-in template to the data provider for checking
• Preparation of the dataset for upload & upload
  • Technical check, e.g. of database codes, typing mistakes, etc.
  • Upload via DCT also accepts incomplete data, where not all obligatory fields are filled in
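The "checking data completeness" step above can be sketched as a small script that lists, per record, which obligatory fields are empty, so the list can be returned to the data provider. This is only an illustration: the field names in `OBLIGATORY_FIELDS` are hypothetical stand-ins taken from the metadata mentioned in these slides, not the actual DCT column names.

```python
# Hypothetical obligatory fields; the real DCT column names may differ.
OBLIGATORY_FIELDS = ["organisation", "email", "data_source_type"]


def missing_fields(record, required=OBLIGATORY_FIELDS):
    """Return the required fields that are absent or blank in one record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def completeness_report(records):
    """Map row index -> missing fields, only for rows with at least one gap."""
    report = {}
    for index, record in enumerate(records):
        gaps = missing_fields(record)
        if gaps:
            report[index] = gaps
    return report


# Made-up sample rows, for illustration only.
rows = [
    {"organisation": "Institute A", "email": "a@example.org",
     "data_source_type": "monitoring"},
    {"organisation": "Institute B", "email": " ", "data_source_type": ""},
]
print(completeness_report(rows))  # {1: ['email', 'data_source_type']}
```

Because the upload itself tolerates incomplete records, a report like this would be advisory rather than blocking; whether to block is a policy choice for the database team.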
Data collection - Problems/gaps
Quality of provided data – main problem: missing information in the templates
• Data source – obligatory fields not filled in ("easy" metadata such as organisation (data owner), e-mail; type of data source – monitoring, survey, research)
• Analysis – not obligatory, but useful: coordinates, national codes; sometimes station names are given only as a code
• Analytical method – significant gaps: obligatory fields filled in only partly, quality-related information missing
Result:
• Data are classified in the lowest quality category
• Users cannot follow up on the data
Reasons: the data provider does not have the information either / it is too tedious to collect / the analysis was done by third parties, so the information is missing / the information is not public
(link: DCT good example)
Example http://www.normandata.eu/empodat_detail1.php?id=43340
Example http://www.normandata.eu/empodat_detail1.php?id=69000
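To see where a submitted dataset has the gaps described on the Problems/gaps slide (data source, analysis, analytical method), one could count incomplete records per template section. The sketch below assumes a hypothetical grouping of fields into sections; the field names, the sample rows and the grouping itself are illustrative and do not reproduce the real DCT layout.

```python
# Hypothetical grouping of template fields by section (not the real DCT layout).
SECTIONS = {
    "data_source": ["organisation", "email", "data_source_type"],
    "analysis": ["coordinates", "national_code", "station_name"],
    "analytical_method": ["loq", "method_validated"],
}


def is_blank(value):
    """Treat None and whitespace-only values as missing."""
    return value is None or str(value).strip() == ""


def gaps_by_section(records):
    """Count, per section, the records that miss at least one field."""
    counts = {section: 0 for section in SECTIONS}
    for record in records:
        for section, fields in SECTIONS.items():
            if any(is_blank(record.get(name)) for name in fields):
                counts[section] += 1
    return counts


# Made-up sample records, for illustration only.
rows = [
    {"organisation": "Lab A", "email": "a@example.org",
     "data_source_type": "survey", "coordinates": "48.7,18.6",
     "national_code": "SK01", "station_name": "Station 1",
     "loq": "0.01", "method_validated": "yes"},
    {"organisation": "Lab B", "email": "", "data_source_type": "research",
     "coordinates": "", "national_code": "", "station_name": "X17",
     "loq": "", "method_validated": ""},
    {"organisation": "Lab C", "email": "c@example.org",
     "data_source_type": "monitoring", "coordinates": "50.1,14.4",
     "national_code": "CZ09", "station_name": "Station 3",
     "loq": None, "method_validated": "yes"},
]
print(gaps_by_section(rows))
# {'data_source': 1, 'analysis': 1, 'analytical_method': 2}
```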
Data collection - Problems/gaps
A solution needs to be found for:
• Data already uploaded (in planning process)
  • Exchanged datasets of insufficient quality
  • Individual corrections via online forms
• Data in the pipeline
  • Requesting the data owner to provide as much data as possible
• Data provided in the future
  • Discuss the NORMAN database structure with the data provider in advance
  • Do not accept data without the required metadata
Data collection – Data in the pipeline
• MODELKEY data – water & sediment (about 260K data)
• VEOLIA data – water (about 60K data)
• BRGM data – water (about 5K data)
• Missing:
  • Data source/monitoring type
  • Analysis: sampling parameters – geographical/analytical
  • Information about the analytical method (QC/QA information about chemical data)
• IVL data – water & sediment & biota (about 31K data)
• Missing:
  • Analysis: sampling parameters – geographical/analytical
  • Information about the analytical method (QC/QA information about chemical data)
Data collection - Problems/gaps
Data collection – main problems
• Data providers may find the DCTs too complicated
• Preparing a DCT ready for upload is time consuming (it takes several rounds of going back to the data provider for missing information)
• The available information does not match exactly the information required in the DCT
• Datasets are too large
Solutions:
• We offer to convert the provided data from any format (Access or another Excel form) to the DCT
• Clarification of the DCTs with the data provider
• For really large datasets or regularly updated databases, a technical solution needs to be developed for automated data transfer; IT interfaces can be created if necessary
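The offer to convert provider data into the DCT layout can be pictured as a column-renaming step. The following is a minimal sketch under stated assumptions: the provider exports CSV, and `COLUMN_MAP` is a hypothetical mapping from the provider's headers to hypothetical DCT field names. Columns with no mapping are reported rather than silently dropped, so they can be clarified with the data provider.

```python
import csv
import io

# Hypothetical mapping from a provider's headers to DCT field names.
COLUMN_MAP = {
    "Org": "organisation",
    "Contact": "email",
    "Substance": "substance",
    "Conc": "concentration",
}


def to_dct_rows(csv_text, column_map=COLUMN_MAP):
    """Rename mapped columns; return (converted rows, unmapped headers)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    converted = []
    unmapped = set()
    for row in reader:
        out = {}
        for source_name, value in row.items():
            if source_name is None:
                # Extra cells beyond the header row; ignored in this sketch.
                continue
            target = column_map.get(source_name)
            if target is None:
                unmapped.add(source_name)
            else:
                out[target] = (value or "").strip()
        converted.append(out)
    return converted, sorted(unmapped)


# Made-up provider export, for illustration only.
provider_csv = (
    "Org,Contact,Substance,Conc,Remark\n"
    "Lab A,a@example.org,Atrazine,0.05,ok\n"
)
dct_rows, unmapped_columns = to_dct_rows(provider_csv)
print(dct_rows)
print(unmapped_columns)  # ['Remark']
```

For the large or regularly updated datasets mentioned above, the same mapping idea would presumably sit behind an automated interface rather than a one-off script.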
Data collection - Summary
• Development of a process for improving data quality / rules for data acceptance
• Development of a strategy/agreement for the data collection:
  • Who should provide the data
  • In which form the data will be provided
  • When the data should be provided – on an annual basis?
  • Other considerations?