Training Course on EDIT For Users
Outline of the module • Introduction • Using EDIT - integration with other tools • Objects in EDIT for Users • EDIT Graphical User Interface • Future developments
A - Introduction • EDIT is a tool for data validation and data editing/imputation • What is data validation? - An activity aimed at verifying whether the value of a data item comes from the given set of acceptable values. • What is data editing? - The activity aimed at identifying erroneous entries and correcting them if necessary. Example: a response is missing or incorrect.
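The two definitions above can be illustrated with a minimal sketch in plain Python (not EDIT code). Validation tests whether a value belongs to a set of acceptable values. The first step of editing then flags entries that are missing or incorrect. The field name and the acceptable values are hypothetical.

```python
# Sketch only: the acceptable values below are hypothetical, not an EDIT code list.
ACCEPTABLE_UNITS = {"UNIT", "EUR", "PC"}


def is_valid(value, acceptable):
    """Data validation: does the value come from the set of acceptable values?"""
    return value in acceptable


def find_errors(records, field, acceptable):
    """First step of data editing: flag missing or incorrect entries in one field."""
    errors = []
    for row_no, record in enumerate(records, start=1):
        value = record.get(field)
        if value is None or value == "":
            errors.append((row_no, "missing"))
        elif not is_valid(value, acceptable):
            errors.append((row_no, "incorrect"))
    return errors


records = [{"UNIT": "UNIT"}, {"UNIT": ""}, {"UNIT": "XYZ"}]
print(find_errors(records, "UNIT", ACCEPTABLE_UNITS))  # [(2, 'missing'), (3, 'incorrect')]
```

Correcting the flagged entries (imputation) would be a separate step and is not shown here.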
How does EDIT work, in short? • A format contains a description of the data in a dataset; • A dataset is a set of data following a specific format. • Programmer: define a format; define a program containing the rules and file operations to be executed on the dataset(s). • User: upload the dataset(s) from external files; execute the job; get the report containing the errors (if any).
EDIT user types • 'User' - Executes programs on datasets and accesses the reports. • 'Programmer' - Manages the metadata the user needs to execute programs: • Implements 'formats'; • Implements 'validation rules' by means of 'programs'; • Defines other operations on files by means of 'programs'; • Sets up the unattended mode configuration. • 'Administrator' - Manages users and permissions.
'User' type functionalities • ‘Change Password’ • Allows users to change their password; • ‘Dataset Import/Export’ • Allows users to import and export data to and from EDIT as well as monitor any ongoing import/export processes; • ‘Job Execution’ • Allows users to execute programs on imported datasets and view/export the results of the execution.
What can we do by means of a 'program'? • Run validation rules and computations of the following types: A1 - Single column: only one column is involved; A2 - Multiple columns: two or more columns within a single record are involved; B - Vertical: multiple records are involved; C - Hierarchical: multiple datasets are involved. • Perform dataset operations: Copy, Merge, Alter, Aggregate, etc. • Use specialised functions such as outlier detection: Terror, Hidiroglou-Berthelot, σ-Gap; • Accepted formats: SDMX-ML, GESMES, CSV, FLR.
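The four rule types differ only in how much data a single check looks at. Here is a sketch in plain Python with one rule per type. The column names, the "TOTAL" marker and the second dataset are hypothetical, and EDIT's own rule syntax is not shown.

```python
# Illustrative records: "class" is a size class, "TOTAL" marks the total row.
rows = [
    {"country": "LT", "class": "2", "value": 60, "part": 10},
    {"country": "LT", "class": "3", "value": 40, "part": 40},
    {"country": "LT", "class": "TOTAL", "value": 100, "part": 50},
]
register = [{"country": "LT"}, {"country": "FR"}]  # a second, hypothetical dataset


def rule_a1(row):
    """A1, single column: only the 'value' column is involved."""
    return row["value"] >= 0


def rule_a2(row):
    """A2, multiple columns: two columns of the same record are compared."""
    return row["part"] <= row["value"]


def rule_b(records):
    """B, vertical: several records are involved (details must add up to the total)."""
    total = sum(r["value"] for r in records if r["class"] == "TOTAL")
    details = sum(r["value"] for r in records if r["class"] != "TOTAL")
    return total == details


def rule_c(records, other_dataset):
    """C, hierarchical: a second dataset is involved (every country must exist in it)."""
    known = {r["country"] for r in other_dataset}
    return all(r["country"] in known for r in records)


print(all(rule_a1(r) for r in rows), all(rule_a2(r) for r in rows))  # True True
print(rule_b(rows), rule_c(rows, register))  # True True
```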
Accepted data formats
CSV (with or without header) (SBS, CVTS, TOURISM):
9H; 2008; LT; 2; B-N_X_K642; 11930; 16236; ; ; ; ; UNIT; ; ; ; ; ; TT0; ; ; ; ; D08
9H; 2008; LT; 3; B-N_X_K642; 11930; 1001; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
9H; 2008; LT; 4; B-N_X_K642; 11930; 529; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
9H; 2008; LT; 30; B-N_X_K642; 11930; 17766; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
9H; 2008; LT; 2; B-E; 11930; 1138; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
9H; 2008; LT; 3; B-E; 11930; 104; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
9H; 2008; LT; 4; B-E; 11930; 61; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08
FLR example 1:
001E20100121814 00 804.822
001E20100121816 93 5295.54
001E20100121814 99 6166.24
001E20100125290334 581.371
FLR example 2:
2010010011 010252000405595911005909580E 01ZZZZZ 2691.966 2734482.0 0.0
2010010011 010252000405595911004009600E 01ZZZZZ 237.543 341202.0 0.0
GESMES (BOP ITS, BOP FDI), with multi-year 2007, 2008, 2009 observations:
UNA:+.? '
UNB+UNOC:3+FR2+4D0+100929:1637+IREF000243++GESMES/TS'
UNH+MREF000001+GESMES:2:1:E6'
BGM+74'
NAD+Z02+ECB'
NAD+MR+4D0'
NAD+MS+FR2'
IDE+10+EUROSTAT_BOP_01 reporting'
DSI+BOP_FDI_A'
STS+3+7'
DTM+242:201009291637:203'
DTM+Z02:20072009:702'
IDE+5+EUROSTAT_BOP_01'
GIS+AR3'
GIS+1:::-'
ARR++A:FR:N:2:330:N:4A:E:9999:9999:20072009:702:0:A:F+0:A:F+0:A:F'
ARR++A:FR:N:2:330:N:4F:E:9999:9999:20072009:702:0:A:F+0:A:F+0:A:F'
ARR++A:FR:N:2:330:N:7Z:E:9999:9999:20072009:702:0:A:F+0:A:F+0:A:F'
ARR++A:FR:N:2:330:N:A1:E:1100:9999:20072009:702:5824:A:F+5930:A:F+4204:A:F'
ARR++A:FR:N:2:330:N:A1:E:1495:9999:20072009:702:5828:A:F+5932:A:F+4206:A:F'
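The CSV and GESMES samples above can be read with a short sketch in plain Python (not EDIT's importer). The CSV records are semicolon-separated with a blank after each separator. The GESMES message is an EDIFACT-style stream in which the UNA string declares ':' as the component separator, '+' as the data element separator, '?' as the release character and "'" as the segment terminator. The `arr_observations` helper assumes ARR segments are laid out exactly like the sample, with the first observation value being the third-last component of the key element. That layout is an assumption, not a general GESMES rule.

```python
import csv
import io

# The first two CSV records of the sample above (';'-separated, no header).
CSV_SAMPLE = (
    "9H; 2008; LT; 2; B-N_X_K642; 11930; 16236; ; ; ; ; UNIT; ; ; ; ; ; TT0; ; ; ; ; D08\n"
    "9H; 2008; LT; 3; B-N_X_K642; 11930; 1001; ; ; ; ; UNIT; ; ; ; ; ; TT; ; ; ; ; D08\n"
)

# A shortened GESMES message built from segments of the sample above.
GESMES_SAMPLE = (
    "UNA:+.? '"
    "UNB+UNOC:3+FR2+4D0+100929:1637+IREF000243++GESMES/TS'"
    "DSI+BOP_FDI_A'"
    "ARR++A:FR:N:2:330:N:A1:E:1100:9999:20072009:702:5824:A:F+5930:A:F+4204:A:F'"
)


def read_semicolon_csv(text):
    """Split each line on ';' and drop the blank that follows each separator."""
    return list(csv.reader(io.StringIO(text), delimiter=";", skipinitialspace=True))


def split_unescaped(text, sep, release="?"):
    """Split on `sep`, except where it is preceded by the release character.
    Release sequences are kept, so the parts can be split again on another separator."""
    parts, start, i = [], 0, 0
    while i < len(text):
        if text[i] == release:
            i += 2  # skip the release character and the character it protects
            continue
        if text[i] == sep:
            parts.append(text[start:i])
            start = i + 1
        i += 1
    parts.append(text[start:])
    return parts


def split_gesmes(message):
    """Split a message into segments; the UNA service string has a fixed length of 9."""
    segments = []
    if message.startswith("UNA"):
        segments.append(message[:9])
        message = message[9:]
    segments += [s.strip() for s in split_unescaped(message, "'") if s.strip()]
    return segments


def arr_observations(segment):
    """Observation values of an ARR segment laid out like the sample (an assumption)."""
    elements = split_unescaped(segment, "+")  # ['ARR', '', 'A:FR:...:5824:A:F', '5930:A:F', ...]
    first = split_unescaped(elements[2], ":")[-3]  # value followed by two more components
    rest = [split_unescaped(e, ":")[0] for e in elements[3:]]
    return [first] + rest


rows = read_semicolon_csv(CSV_SAMPLE)
print(rows[0][:7])  # ['9H', '2008', 'LT', '2', 'B-N_X_K642', '11930', '16236']
segments = split_gesmes(GESMES_SAMPLE)
print(arr_observations(segments[-1]))  # ['5824', '5930', '4204']
```

The three values returned for the ARR segment line up with the 2007 to 2009 period (20072009) carried in its key.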
Ways of using EDIT • As a web-based application – called by other applications; • Standalone – running on a PC; • Client-server – running in a data centre.
EDIT as a web-based application • Web-based interface: • Unified interface for both the standalone version and the server deployment; • EUROSTAT look & feel; • Light interface, simplified workflows. • An ECAS account is needed.
EDIT running standalone • Downloadable package; • Standalone installation supported on Windows XP and Windows 7; • Simple installation wizard; • Full functionality; • Standard authentication is required.
Client-server mode for EDIT • EDIT runs on a UNIX machine; • The current setup is EDIT installed at Eurostat and other DGs; • All registered domains (= user-specific workspaces) are embedded by default; • ECAS credentials are needed for external users.
EDAMIS integration • EDAMIS allows data files to be transmitted through a single entry point; • EDAMIS can send data to EDIT by placing the files in a configurable location; • EDIT detects the metadata based on the EDAMIS naming convention; • EDIT performs the processing in unattended mode.
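The unattended flow described above works like a watched drop folder: files appear in a configured location, metadata is derived from each file name, and every new file is picked up once. The sketch below shows one polling pass in plain Python. The file-name pattern `<DOMAIN>_<COUNTRY>_<YEAR>.<ext>` is HYPOTHETICAL and is not the real EDAMIS naming convention, which is not given here.

```python
import re
import tempfile
from pathlib import Path

# HYPOTHETICAL file-name pattern <DOMAIN>_<COUNTRY>_<YEAR>.<ext>, used only for
# illustration; it is not the actual EDAMIS naming convention.
NAME_PATTERN = re.compile(
    r"^(?P<domain>[A-Z0-9]+)_(?P<country>[A-Z]{2})_(?P<year>\d{4})\.(?P<ext>csv|flr|ges|xml)$"
)


def detect_metadata(path):
    """Derive dataset metadata from the file name, or None if it does not match."""
    match = NAME_PATTERN.match(path.name)
    return match.groupdict() if match else None


def scan_drop_folder(folder, processed):
    """One polling pass over the drop folder; each file is picked up only once."""
    jobs = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name not in processed:
            processed.add(path.name)
            metadata = detect_metadata(path)
            if metadata is not None:
                jobs.append((path.name, metadata))
    return jobs


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "SBS_LT_2008.csv").write_text("9H; 2008; LT\n")
    (folder / "notes.txt").write_text("not a data file\n")
    seen = set()
    first_pass = scan_drop_folder(folder, seen)
    second_pass = scan_drop_folder(folder, seen)

print(first_pass)   # [('SBS_LT_2008.csv', {'domain': 'SBS', 'country': 'LT', 'year': '2008', 'ext': 'csv'})]
print(second_pass)  # []
```

A real deployment would run such a pass on a schedule and hand each detected file to a job. Remembering processed names prevents a file from being validated twice.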
SDMX integration • The Statistical Data and Metadata Exchange (SDMX) initiative is sponsored by seven institutions (the BIS, the ECB, Eurostat, the IMF, the OECD, the UN and the World Bank); • SDMX standardises the way statistical data and metadata are exchanged; • EDIT can import SDMX-ML datasets.
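An SDMX-ML import is mostly a matter of turning series and observations into flat rows. Here is a sketch in plain Python using the standard `xml.etree.ElementTree` module. The input is a heavily simplified SDMX-ML 2.1 generic data message with no Header. The dimension ids and values are illustrative, and EDIT's actual import logic may differ.

```python
import xml.etree.ElementTree as ET

GENERIC = "{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic}"

# A heavily simplified SDMX-ML 2.1 generic data message (no Header); values are illustrative.
SAMPLE = """<message:GenericData
    xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:generic="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic">
  <message:DataSet>
    <generic:Series>
      <generic:SeriesKey>
        <generic:Value id="FREQ" value="A"/>
        <generic:Value id="GEO" value="LT"/>
      </generic:SeriesKey>
      <generic:Obs>
        <generic:ObsDimension value="2007"/>
        <generic:ObsValue value="100"/>
      </generic:Obs>
      <generic:Obs>
        <generic:ObsDimension value="2008"/>
        <generic:ObsValue value="110"/>
      </generic:Obs>
    </generic:Series>
  </message:DataSet>
</message:GenericData>"""


def generic_to_rows(xml_text):
    """Flatten each observation into one row: series key values + period + value."""
    root = ET.fromstring(xml_text)
    rows = []
    for series in root.iter(GENERIC + "Series"):
        key = {v.get("id"): v.get("value") for v in series.find(GENERIC + "SeriesKey")}
        for obs in series.findall(GENERIC + "Obs"):
            row = dict(key)
            row["TIME_PERIOD"] = obs.find(GENERIC + "ObsDimension").get("value")
            row["OBS_VALUE"] = obs.find(GENERIC + "ObsValue").get("value")
            rows.append(row)
    return rows


for row in generic_to_rows(SAMPLE):
    print(row)
```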
C - Objects in EDIT for Users • Dataset instantiations, lookups; • Programs, jobs
1 - Dataset instantiations • Dataset instance (dataset) – a collection of data rows following the structure of a format; • A two-dimensional table composed of rows and columns: • Columns correspond to the fields defined in the format; • Records – no limit on size or number.
Example: 'Format' vs. 'Dataset instantiation' [figure: side-by-side 'Format' and 'Dataset instantiation' panels]
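The format/instance relationship can be sketched in plain Python. A format is modelled here as an ordered list of typed fields. A dataset instance is a table whose columns are exactly those fields. The format name, field names and types are hypothetical, and the sample values are taken from the CSV rows shown earlier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Field:
    name: str
    cast: Callable[[str], object]  # converts the raw text into the field's type


# Hypothetical format: a description of the data (field names and types).
SBS_FORMAT = [Field("COUNTRY", str), Field("YEAR", int), Field("NACE", str), Field("VALUE", float)]


def instantiate(format_fields, raw_rows):
    """Build a dataset instance: a table whose columns are the fields of the format."""
    dataset = []
    for line_no, raw in enumerate(raw_rows, start=1):
        if len(raw) != len(format_fields):
            raise ValueError(f"row {line_no}: expected {len(format_fields)} columns, got {len(raw)}")
        dataset.append({f.name: f.cast(v) for f, v in zip(format_fields, raw)})
    return dataset


dataset = instantiate(SBS_FORMAT, [["LT", "2008", "B-E", "1138"], ["LT", "2008", "B-E", "104"]])
print(dataset[0])  # {'COUNTRY': 'LT', 'YEAR': 2008, 'NACE': 'B-E', 'VALUE': 1138.0}
```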
Lookup tables – code lists • Lookup – an auxiliary dataset containing a list of values used for validating codes; • Code lists – lookup tables usually refer to code lists; • Several code lists can be used inside the same program, as many as the given datasets need (e.g. 'Country', NACE, NUTS); • Several versions of the same code list can be used within the same program, if needed.
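A lookup check reduces to a set-membership test against a chosen code list and version. The sketch below keys each lookup table by (code list, version), so one program can use several lists and several versions of the same list. The version labels and codes are a tiny hypothetical subset, not the official lists.

```python
# Hypothetical lookup tables keyed by (code list, version); the codes are a tiny
# illustrative subset, not the official lists.
LOOKUPS = {
    ("COUNTRY", "v1"): {"FR", "LT"},
    ("COUNTRY", "v2"): {"FR", "HR", "LT"},
    ("NACE", "v1"): {"B-E", "B-N_X_K642"},
}


def invalid_rows(dataset, field, code_list, version):
    """Row numbers whose value in `field` is not in the given code list version."""
    allowed = LOOKUPS[(code_list, version)]
    return [n for n, row in enumerate(dataset, start=1) if row[field] not in allowed]


dataset = [
    {"COUNTRY": "LT", "NACE": "B-E"},
    {"COUNTRY": "HR", "NACE": "B-E"},
    {"COUNTRY": "LT", "NACE": "X99"},
]

# One program can use several code lists, and several versions of the same list.
print(invalid_rows(dataset, "COUNTRY", "COUNTRY", "v1"))  # [2]
print(invalid_rows(dataset, "COUNTRY", "COUNTRY", "v2"))  # []
print(invalid_rows(dataset, "NACE", "NACE", "v1"))        # [3]
```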
2 - Programs, jobs • Program – a set of operations to be performed on a specified dataset definition (format); • No specific dataset is associated with a program; only formats (dataset definitions) are specified; • Job – the association between a 'Program' and concrete 'Dataset Instances'; • Possible types of rules/checks: single column, multiple columns, vertical and hierarchical.
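The split between programs and jobs is sketched below with plain Python dataclasses. A program names only the format it expects. A job binds the program to one concrete dataset instance, so the same program can be reused across instances. All class, program and rule names are hypothetical, and only single-record rules are modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Program:
    name: str
    format_name: str                          # a program refers to a format, never to a dataset
    rules: Dict[str, Callable[[Row], bool]]   # rule name -> check on one record


@dataclass
class DatasetInstance:
    name: str
    format_name: str
    rows: List[Row]


@dataclass
class Job:
    program: Program
    dataset: DatasetInstance  # the job binds the program to concrete data

    def run(self):
        if self.dataset.format_name != self.program.format_name:
            raise ValueError("the dataset does not follow the program's format")
        failures = []
        for row_no, row in enumerate(self.dataset.rows, start=1):
            for rule_name, check in self.program.rules.items():
                if not check(row):
                    failures.append((rule_name, row_no))
        return failures


program = Program("SBS_CHECKS", "SBS_FORMAT", {"NON_NEGATIVE": lambda r: r["VALUE"] >= 0})
lt = DatasetInstance("SBS_LT_2008", "SBS_FORMAT", [{"VALUE": 10}, {"VALUE": -3}])
fr = DatasetInstance("SBS_FR_2008", "SBS_FORMAT", [{"VALUE": 5}])

# The same program is reused by two jobs, one per dataset instance.
print(Job(program, lt).run())  # [('NON_NEGATIVE', 2)]
print(Job(program, fr).run())  # []
```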
Validation report • It contains: • Job results – information about the job; • Error statistics – summary of the errors; • Error report – detailed list of errors.
Error statistics • The error statistics are displayed as a table with the following columns: • Rule name: the name of the program rule that failed; • No. of failures: the number of individual rows in which the error occurred during job execution; • Rule message: the rule's error message as defined in the program.
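The error statistics table is an aggregation of the detailed failures. The sketch below, in plain Python, builds the three columns listed above from (rule name, row number) pairs and counts each row at most once per rule. The input layout and the rule names and messages are hypothetical.

```python
from collections import Counter


def error_statistics(failures, messages):
    """Summarise (rule name, row number) failures into (rule, no. of failures, message).
    Each row is counted once per rule, even if it was reported twice."""
    counts = Counter(rule for rule, _row in set(failures))
    return [(rule, counts[rule], messages[rule]) for rule in sorted(counts)]


failures = [("NON_NEGATIVE", 2), ("NON_NEGATIVE", 5), ("PART_LE_TOTAL", 5), ("NON_NEGATIVE", 2)]
messages = {"NON_NEGATIVE": "Value must not be negative", "PART_LE_TOTAL": "Part exceeds total"}

for rule, count, message in error_statistics(failures, messages):
    print(f"{rule:<15}{count:>4}  {message}")
```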
EDIT Home page • Menu options • User profile information • Here the password can be changed
Defining dataset: import dataset (screen part I) • Go to Dataset > Import dataset • Select a file on your hard drive • Select a file type (CSV / GESMES / FLR / SDMX) • Reuse saved parameters • Starting line • Save properties for further use
Defining dataset: import dataset (screen part II) • Select a format • Reuse a saved configuration • Select the columns to import; use the arrows to add/remove fields • Provide a name for the new dataset • Save the configuration for further use • Click to import
Defining dataset: import dataset (unsuccessful import) • Status is FAILED • Click to download the import report in text format
Defining dataset: import dataset (successful import with warnings) • Status is COMPLETED • In the report, two records were skipped (lines 2 and 5) • Click to download the import report in text format
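The import outcomes shown on these screens can be mimicked by a small plain-Python sketch. It reads from a starting line, keeps only the selected columns, skips records with the wrong number of fields, and reports them by line number. Treating an import with no usable record as FAILED is an assumption, as is the report wording. EDIT's actual status rules are not documented here.

```python
import csv
import io


def import_dataset(text, n_fields, columns, starting_line=1, delimiter=";"):
    """Import the records from `starting_line` on, keeping only the selected `columns`
    (0-based positions). Records with the wrong number of fields are skipped and reported.
    Assumption: the import counts as FAILED when no record could be imported."""
    rows, skipped = [], []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, skipinitialspace=True)
    for line_no, record in enumerate(reader, start=1):  # one record per line (no quoted newlines)
        if line_no < starting_line:
            continue
        if len(record) != n_fields:
            skipped.append(line_no)
            continue
        rows.append([record[i] for i in columns])
    status = "COMPLETED" if rows else "FAILED"
    report = [f"Status is {status}"]
    report += [f"Line {n} skipped: expected {n_fields} fields" for n in skipped]
    return rows, status, report


# Illustrative input in which lines 2 and 5 are malformed.
text = "LT; 2008; 11930\nLT; 2008\nLT; 2009; 12000\nLT; 2010; 12100\nLT\n"
rows, status, report = import_dataset(text, n_fields=3, columns=[0, 2])
print(status)  # COMPLETED
print("\n".join(report))
```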
Defining dataset: import dataset (successful import) • Status is COMPLETED • After importing, EDIT redirects you to the search dataset screen • Click to look at the imported content • Delete a selected dataset
Defining dataset: import dataset • Click to hide fields • Select the fields to be hidden in the display • Hidden fields: EDIT hides the selected fields
Defining dataset: import dataset • Unfold the Basic filtering options • Select a field in the dataset (e.g. WEIGHT) • Select a logical operator • Enter a value • The corresponding records are filtered
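A basic filter is a single condition of the form field, operator, value. Here is a sketch in plain Python that uses the standard `operator` module. The operator symbols are an assumed set and may not match what EDIT offers. The WEIGHT values are illustrative.

```python
import operator

# Assumed set of logical operators; the operators EDIT actually offers may differ.
OPERATORS = {
    "=": operator.eq, "<>": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def basic_filter(rows, field, op, value):
    """Keep the records whose `field` satisfies `<field> <op> <value>`."""
    compare = OPERATORS[op]
    return [row for row in rows if compare(row[field], value)]


rows = [{"ID": 1, "WEIGHT": 0.5}, {"ID": 2, "WEIGHT": 2.0}, {"ID": 3, "WEIGHT": 3.5}]
print([r["ID"] for r in basic_filter(rows, "WEIGHT", ">", 1.0)])  # [2, 3]
```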
Defining dataset: import dataset • Unfold the Advanced filtering options • Create an expression aided by the lists of fields, operators and functions • Click to apply the search criteria • The corresponding records are filtered
Defining dataset: import dataset • Customize your view • Export in CSV format
Defining dataset: search dataset • Search criteria • List of already imported datasets • View details of the dataset with filtering options • Export the dataset in CSV format • Archive the dataset • Restore an archived dataset • Delete the dataset
Defining dataset: import/export dataset • Import/export history search • Search criteria • List of import/export history • View details of the dataset with filtering options • Delete the dataset
Defining jobs: create a job • Menu option • Search criteria • List of existing programs to be executed • Click to create a job for this program
Defining jobs: create a job • Enter a name and a description • Choose the dataset to validate (if several) • Execute the job
Defining jobs: create a job • Validation is RUNNING • During the validation process, only cancellation is possible • When the validation is finished, the date is displayed
Defining jobs: create a job • Validation is COMPLETED • When the validation is finished, the date is displayed • Click to view the results • Copy the job • Delete the job
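The job screens describe a small state machine. While a job is RUNNING only cancellation is allowed. Once it is COMPLETED, the finish date is set and the results can be viewed, and the job can be copied or deleted. The plain-Python sketch below encodes those rules. The CANCELLED status name and the actions allowed after cancelling are assumptions.

```python
from datetime import datetime
from enum import Enum


class Status(Enum):
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    CANCELLED = "CANCELLED"  # assumed status name for a cancelled job


# Actions allowed per status: while a job is running, only cancellation is possible.
ALLOWED = {
    Status.RUNNING: {"cancel"},
    Status.COMPLETED: {"view_results", "copy", "delete"},
    Status.CANCELLED: {"copy", "delete"},  # assumption
}


class JobState:
    def __init__(self, name):
        self.name = name
        self.status = Status.RUNNING
        self.finished_at = None  # the date displayed once validation is finished

    def _require(self, action):
        if action not in ALLOWED[self.status]:
            raise RuntimeError(f"'{action}' is not possible while the job is {self.status.value}")

    def finish(self):
        if self.status is not Status.RUNNING:
            raise RuntimeError("only a running job can finish")
        self.status = Status.COMPLETED
        self.finished_at = datetime.now()

    def cancel(self):
        self._require("cancel")
        self.status = Status.CANCELLED
        self.finished_at = datetime.now()

    def view_results(self):
        self._require("view_results")
        return f"results of {self.name}"


job = JobState("SBS_LT_2008")
try:
    job.view_results()
except RuntimeError as exc:
    print(exc)  # 'view_results' is not possible while the job is RUNNING
job.finish()
print(job.status.value, job.view_results())  # COMPLETED results of SBS_LT_2008
```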
VIEW RESULTS OF A JOB (Defining jobs: create a job) • Click to view the error table
VIEW ERROR TABLE OF A JOB (Defining jobs: create a job) • Filtering by error fields • Unfold Basic filtering • Unfold Advanced filtering • Error message number • Export the error table (CSV)
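Filtering the error table by an error field and exporting it to CSV can be sketched with the standard `csv` module. The column names `rule`, `row`, `field` and `message` are assumptions about the error table, and the sample errors are illustrative.

```python
import csv
import io

# Assumed error-table columns; the real EDIT error table may have different ones.
COLUMNS = ["rule", "row", "field", "message"]


def export_error_table(errors, field=None):
    """Return the error table as CSV text, optionally filtered on one error field."""
    selected = [e for e in errors if field is None or e["field"] == field]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(selected)
    return buffer.getvalue()


errors = [
    {"rule": "WEIGHT_POSITIVE", "row": 4, "field": "WEIGHT", "message": "Weight must be positive"},
    {"rule": "VALID_COUNTRY", "row": 7, "field": "COUNTRY", "message": "Unknown country code"},
]
print(export_error_table(errors, field="WEIGHT"), end="")
# rule,row,field,message
# WEIGHT_POSITIVE,4,WEIGHT,Weight must be positive
```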