EML Congruency Checker
A tool to assess and report on the quality of EML-based data packages
Software Stack
• ECC
• LTER NIS
• Data Manager Web-service
• Data Manager Library (Java Library)
• (Quality Checks & Reports)
• PostgreSQL
A Brief History
• Data Manager Library (2006): Costa, Jones, Leinfelder, Servilla, and Tao
• LTER All Scientists Meeting (2009): eml-dev, NCEAS Ecoinformatics group, LTER developers, LTER information managers
• Data Manager Web-service (2011): Costa, Earl, Gastil, O’Brien, Ramsey, Servilla, and Stephenson
• EML Congruency Checker (2011): Gastil and O’Brien
Checks must be defined case-by-case
• First: Structural
• Second: Scientific
• Error-free Metadata-data congruence
• Error-free data
• Complete Metadata
• PASTA ready?
ECC v0.1
• Error-free Metadata-data congruence
• Error-free data
IMC Annual Meeting 2011, EIMC
Collected Quality Checks
google_doc_quality_checks
EML Congruency Checker Version 0.1
Checks:
• Data URL is valid
• Display data from the URL
• Database table can be generated
• Data can be loaded into the database table
• Compare number of rows loaded to number specified in metadata
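The last three checks in this list can be illustrated with a minimal sketch. It is not the ECC implementation (that lives in the Java Data Manager Library and uses PostgreSQL). Here `sqlite3` stands in for the database, `check_record_count` and its parameters are hypothetical names, and reporting a mismatch as `error` is an assumption.

```python
import csv
import io
import sqlite3


def check_record_count(csv_text, columns, expected_rows, header_lines=1):
    """Create a table, load CSV rows into it, and compare the row count.

    Hypothetical helper: sqlite3 stands in for the PostgreSQL backend,
    and every column is typed TEXT for simplicity.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # "Database table can be generated": column names are assumed
        # to come from the EML attribute list.
        col_defs = ", ".join(f'"{name}" TEXT' for name in columns)
        conn.execute(f"CREATE TABLE entity ({col_defs})")

        # "Data can be loaded": skip blank lines, then the header lines.
        reader = csv.reader(io.StringIO(csv_text))
        rows = [row for row in reader if row][header_lines:]
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO entity VALUES ({placeholders})", rows)

        # "Compare number of rows loaded to number specified in metadata".
        found = conn.execute("SELECT COUNT(*) FROM entity").fetchone()[0]
    finally:
        conn.close()

    # Assumption: a mismatch is reported with status "error".
    status = "valid" if found == expected_rows else "error"
    return {
        "name": "Number of records check",
        "expected": expected_rows,
        "found": found,
        "status": status,
    }
```

A row whose field count differs from the column list makes the insert raise an exception, which a fuller version would catch and report as a failed data load.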
ECC v0.1: Current Capability
Quality check fields:
• System: KNB (any package), LTER (apply only to LTER)
• Type: Data, Metadata, Congruence
• Status: Valid, Info, Warn, Error
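One way to picture these three fields is as a small record type. The sketch below is illustrative only: the class and method names are hypothetical, and the rule that an LTER package is recognized by a `knb-lter-` prefix on its package ID is an assumption, not something the slides state.

```python
from dataclasses import dataclass
from enum import Enum


class System(Enum):
    KNB = "knb"    # applies to any package
    LTER = "lter"  # applies only to LTER packages


class QualityType(Enum):
    DATA = "data"
    METADATA = "metadata"
    CONGRUENCY = "congruency"


class Status(Enum):
    VALID = "valid"
    INFO = "info"
    WARN = "warn"
    ERROR = "error"


@dataclass
class QualityCheck:
    """Hypothetical in-memory form of one quality check result."""
    name: str
    system: System
    quality_type: QualityType
    status: Status
    expected: str = ""
    found: str = ""

    def applies_to(self, package_id: str) -> bool:
        # KNB checks run on every package.
        if self.system is System.KNB:
            return True
        # Assumption: LTER packages are identified by this ID prefix.
        return package_id.startswith("knb-lter-")
```

Using enums keeps the allowed values for each field in one place, so a misspelled status fails when the check is constructed rather than when the report is read.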
<?xml version="1.0" encoding="UTF-8"?>
<qualityReport>
  <creationDate>2011-08-14T17:22:59</creationDate>
  <packageId>knb-lter-sbc.25.7</packageId>
  <entityReport>
    <entityName>Detritus_Biomass_All_Years.csv</entityName>
    <qualityCheck qualityType="congruency" system="knb" status="valid">
      <name>Online URLs are live</name>
      <description>Check that online URLs return something</description>
      <expected>true</expected>
      <found>true</found>
      <explanation>Succeeded in accessing URL: <![CDATA[http://sbc.lternet.edu/external/Reef/Data/Long_Term_Experiment_Kelp_Removal/Detritus_Biomass_All_Years.csv]]></explanation>
      <suggestion></suggestion>
      <reference></reference>
    </qualityCheck>
    <qualityCheck qualityType="metadata" system="knb" status="valid">
      <name>Create database table</name>
      <description>Status of creating a database table</description>
      <expected>A database table is expected to be generated from the EML attributes.</expected>
      <found>A database table was generated from the attributes description</found>
      <explanation>CREATE TABLE Detritus_Biomass_All_Years_csv(YEAR TIMESTAMP,MONTH TIMESTAMP,DATE TIMESTAMP,SITE TEXT,TRANSECT TEXT,TREATMENT TEXT,QUAD TEXT,SIDE TEXT,SP_CODE TEXT,WET_WT FLOAT,AREA INTEGER,NOTES TEXT,GENUS TEXT,SPECIES TEXT,SIZE TEXT,functional_GROUP TEXT,SURVEY TEXT,KINGDOM TEXT,PHYLUM TEXT,CLASS TEXT,taxon_ORDER TEXT,FAMILY TEXT,GENUS1 TEXT,SPECIES1 TEXT,COMMON_NAME TEXT,Substrate_type TEXT,Mobility TEXT,Growth_morph TEXT);</explanation>
      <suggestion></suggestion>
      <reference></reference>
    </qualityCheck>
    <qualityCheck qualityType="data" system="knb" status="info">
      <name>Display some data</name>
      <description>Display the first row of data</description>
      <expected>One row of data should be displayed</expected>
      <found><![CDATA[2008, 1, 2008-01-30, AQUE, 4, CONTROL, 20, I, CC, 0.33, 20, -99999, Chondracanthus, spp., -99999, ALGAE, DETRITUS, Plantae, Rhodophyta, Rhodophyceae, Gigartinales, Gigartinaceae, Mazzaella, californica, -99999, HARD, SESSILE, SOLITARY]]></found>
      <explanation></explanation>
      <suggestion></suggestion>
      <reference></reference>
    </qualityCheck>
    <qualityCheck qualityType="data" system="knb" status="valid">
      <name>Data load status</name>
      <description>Status of loading the data table into a database</description>
      <expected>No errors expected during data loading or data loading was not attempted for this data entity</expected>
      <found>The data table loaded successfully into a database</found>
      <explanation></explanation>
      <suggestion></suggestion>
      <reference></reference>
    </qualityCheck>
    <qualityCheck qualityType="congruency" system="knb" status="valid">
      <name>Number of records check</name>
      <description>Compare number of records specified in metadata to number of records found in data</description>
      <expected>1962</expected>
      <found>1962</found>
      <explanation>The expected number of records (1962) was found in the data table.</explanation>
      <suggestion></suggestion>
      <reference></reference>
    </qualityCheck>
  </entityReport>
</qualityReport>
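A report in this shape is straightforward to consume programmatically. The sketch below parses an abbreviated copy of the report above with Python's standard `xml.etree.ElementTree` and tallies checks by status. The `summarize` function is a hypothetical helper, and the embedded sample keeps only two of the five checks for brevity.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Abbreviated copy of the quality report shown above (two checks only).
REPORT = """<qualityReport>
  <packageId>knb-lter-sbc.25.7</packageId>
  <entityReport>
    <entityName>Detritus_Biomass_All_Years.csv</entityName>
    <qualityCheck qualityType="congruency" system="knb" status="valid">
      <name>Number of records check</name>
      <expected>1962</expected>
      <found>1962</found>
    </qualityCheck>
    <qualityCheck qualityType="data" system="knb" status="info">
      <name>Display some data</name>
    </qualityCheck>
  </entityReport>
</qualityReport>"""


def summarize(report_xml):
    """Return the package ID and a count of quality checks per status."""
    root = ET.fromstring(report_xml)
    counts = Counter()
    # iter() walks every qualityCheck element in every entityReport.
    for check in root.iter("qualityCheck"):
        counts[check.get("status")] += 1
    return root.findtext("packageId"), dict(counts)


package_id, status_counts = summarize(REPORT)
print(package_id, status_counts)
```

A data manager could use a tally like this to spot packages with any `warn` or `error` checks before reading the full report.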
Future Direction
• Implement full suite of quality checks
• Work from current list (Google spreadsheet)
• Design/specify Metadata Quality Checks with LTER Network Information System developers and Tiger Team
• Improve community customization
• Separate quality check configuration from processing logic where possible
• Engage community through collaborative effort
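Separating quality check configuration from processing logic could look something like the sketch below. It is only an illustration of the idea: the JSON layout, the `id`/`failStatus`/`enabled` keys, the registry, and all function names are hypothetical and not part of the ECC.

```python
import json

# Hypothetical configuration: what to run and how to report it.
# A site could edit this without touching the processing code.
CONFIG = json.loads("""
[
  {"id": "recordCount", "name": "Number of records check",
   "system": "knb", "failStatus": "error", "enabled": true},
  {"id": "urlLive", "name": "Online URLs are live",
   "system": "knb", "failStatus": "error", "enabled": false}
]
""")

# Processing logic: check functions registered under the configured IDs.
REGISTRY = {}


def register(check_id):
    """Decorator that files a check function under its configuration ID."""
    def wrap(fn):
        REGISTRY[check_id] = fn
        return fn
    return wrap


@register("recordCount")
def record_count(ctx):
    # Passes when rows loaded match the count given in the metadata.
    return ctx["rows_loaded"] == ctx["rows_in_metadata"]


def run_checks(config, ctx):
    """Run every enabled check and label failures per the configuration."""
    results = []
    for entry in config:
        if not entry["enabled"]:
            continue
        passed = REGISTRY[entry["id"]](ctx)
        status = "valid" if passed else entry["failStatus"]
        results.append({"name": entry["name"], "status": status})
    return results
```

With this split, changing a check from `error` to `warn`, or switching it off for a community, is a configuration edit rather than a code change.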
Production-oriented workshop
• Criteria for “PASTA-ready”
• Involve pilot sites
• Use PASTA calendar
• Synthesis data project calendar
PASTA = Provenance Aware Synthesis Tracking Architecture