Tools used for testing and long-term preservation
Terje Pettersen-Dahl, adviser, Department of Electronic Archives (Elark), National Archives of Norway
Bern, 10.4.2003
System types
[Diagram: the system types registry-based ERMs, specialized case handling systems and information systems, shown against Arkadukt, Noark, ADDMML, ArkN3 and Arkade]
Overview
[Diagram: the original system supplies data files and a structural description; Arkadukt produces an ADDMML-file; Arkade performs analysis, checks and controls and produces a new ADDMML-file and new data files for long-term preservation and access]
Choice of method
• Migration.
• Preserve only extracts of data from the databases.
• Store the extracts in a software- and hardware-independent format.
• In addition to the extracts, we need technical metadata describing them.
Metadata
• The metadata has to be standardized.
• The National Archivist has established a national metadata standard called ADDMML (Archives Data Description and Manipulation Mark-up Language).
• ADDMML is an XML DTD.
Structure in ADDMML
The structure is hierarchical. A single extract is called a dataset. A dataset can contain one or more files. A file may contain one or more tables (record-types). Each table contains fields, and fields may contain codes.
Hierarchy: Dataset > File > Record-type > Field > Code
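To make the hierarchy concrete, here is a minimal Python sketch that models it as nested dataclasses. The class and attribute names (`Dataset`, `File`, `RecordType`, `Field`, `Code`) and the sample values are hypothetical and chosen for illustration; they are not the element names of the ADDMML DTD, which is expressed in XML.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Code:
    value: str    # the stored code, e.g. "1"
    meaning: str  # what the code stands for


@dataclass
class Field:
    name: str
    codes: List[Code] = field(default_factory=list)  # empty if the field is not coded


@dataclass
class RecordType:
    name: str
    fields: List[Field] = field(default_factory=list)


@dataclass
class File:
    name: str
    record_types: List[RecordType] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    files: List[File] = field(default_factory=list)


# Hypothetical dataset with one file, one record-type and one coded field.
ds = Dataset("extract-2003", [
    File("persons.dat", [
        RecordType("person", [
            Field("birth_number"),
            Field("sex", [Code("1", "male"), Code("2", "female")]),
        ]),
    ]),
])
```

Each level simply holds a list of the level below it, mirroring Dataset > File > Record-type > Field > Code.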
Arkadukt
• Arkadukt produces ADDMML-files that are always 100% syntactically correct.
• This is a prerequisite for Arkade.
• The user does not need to know anything about ADDMML!
• The metadata itself is registered as plain text.
• Registration is simple.
• Registration follows the structure of ADDMML.
Arkade
• Arkade has the following functions:
  • Conversions
  • Analysis
  • Checks and controls
  • Special functions
• Additionally, some functions can be initiated from Arkade:
  • Creation of a SAS-dataset
  • Random quality testing of records
Conversion
• Arkade can convert data in several ways, for example:
  • Convert from one character set to another.
  • Convert from one file format to another.
  • Change the record delimiter.
  • Unpack packed fields.
  • Split repeating groups or record-types into separate files.
  • Convert from one field format to another.
• All conversions are initiated by processes in the ADDMML-file.
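Three of these conversions can be sketched in a few lines of Python. This is an illustration only (Arkade itself is written in SAS), the function names are hypothetical, and it assumes that "packed fields" means IBM packed decimal (COMP-3), where each byte holds two decimal digits and the last half-byte holds the sign.

```python
from decimal import Decimal


def convert_charset(data: bytes, source: str = "latin-1", target: str = "utf-8") -> bytes:
    """Re-encode bytes from one character set to another (Python codec names)."""
    return data.decode(source).encode(target)


def change_record_delimiter(data: bytes, old: bytes = b"\r\n", new: bytes = b"\n") -> bytes:
    """Replace every occurrence of the old record delimiter with the new one."""
    return data.replace(old, new)


def unpack_packed_decimal(data: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3) field into a Decimal.

    Every half-byte is a digit except the last one, which is the sign:
    0xB or 0xD means negative; 0xA, 0xC, 0xE or 0xF means positive.
    `scale` is the number of implied decimal places.
    """
    if not data:
        raise ValueError("empty packed field")
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError(f"invalid packed-decimal field: {data.hex()}")
    value = int("".join(map(str, digits)))
    if sign in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)
```

In a real conversion the field's position, length and scale would come from the metadata; here they are passed in directly.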
Analysis
Arkade analyses data at several levels:
• File level
  • Count the total number of records in the file.
  • Count the total number of characters in the file.
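A minimal Python sketch of the two file-level counts, assuming a text file with a single record delimiter (newline by default) and counting delimiters as characters. The function name is hypothetical and this does not reproduce Arkade's SAS implementation.

```python
def analyse_file(text: str, delimiter: str = "\n") -> dict:
    """Count records and characters in the contents of a delimited file."""
    records = text.split(delimiter)
    if records and records[-1] == "":
        records.pop()  # a trailing delimiter does not start a new record
    return {"records": len(records), "characters": len(text)}


# Usage (the file name is hypothetical); newline="" keeps "\r\n" intact,
# so the character count matches the file exactly:
# with open("persons.dat", encoding="latin-1", newline="") as f:
#     print(analyse_file(f.read()))
```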
Analysis (cont.)
• Record-type level
  • Find the minimum and maximum length of records of this type in the file.
  • Find the number of fields in records of this type, or the minimum and maximum number if it varies.
  • Count the number of records of this type in the file.
  • Produce sorted frequency lists of the values of each field in the record-type throughout the file.
  • Produce a cross-reference table for two specified fields from the same record-type.
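The record-type analyses can be sketched in Python as follows, assuming the records of one record-type have already been separated out and that fields are delimited by a semicolon (fixed-width records would be sliced by position instead). The function names are hypothetical.

```python
from collections import Counter


def analyse_record_type(records: list[str], sep: str = ";") -> dict:
    """Record-type level analysis of a non-empty list of delimited records."""
    rows = [r.split(sep) for r in records]
    lengths = [len(r) for r in records]
    field_counts = [len(row) for row in rows]
    # One sorted (value, count) list per field position.
    frequencies = [
        sorted(Counter(row[i] for row in rows if i < len(row)).items())
        for i in range(max(field_counts))
    ]
    return {
        "records": len(records),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "min_fields": min(field_counts),
        "max_fields": max(field_counts),
        "frequencies": frequencies,
    }


def cross_reference(records: list[str], i: int, j: int, sep: str = ";") -> Counter:
    """Count each combination of values in field positions i and j."""
    rows = [r.split(sep) for r in records]
    return Counter((row[i], row[j]) for row in rows)
```

Keeping the field count per record makes it easy to report both a fixed number of fields and, when it varies, the minimum and maximum.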
Analysis (cont.)
• Field level
  • Count the number of empty (NULL) and non-empty values in the field throughout the file.
  • Find the length and record number of the shortest and longest data value (excluding padding) in the field.
  • Find the minimum and maximum value (including record number) in the field.
  • Produce a sorted frequency list of the values of this field throughout the file.
All analyses are initiated by processes in the ADDMML-file.
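A minimal Python sketch of the field-level analysis, assuming padding means surrounding whitespace and that an empty string after stripping counts as NULL. The function name and the `convert` parameter are hypothetical; `convert` lets numeric fields be compared as numbers rather than as text.

```python
def analyse_field(values: list[str], convert=str) -> dict:
    """Field-level analysis; values[k] is the raw value in record number k + 1."""
    stripped = [v.strip() for v in values]  # remove whitespace padding
    present = [(n, v) for n, v in enumerate(stripped, start=1) if v]
    result = {"null": len(values) - len(present), "non_empty": len(present)}
    if present:
        shortest = min(present, key=lambda nv: len(nv[1]))
        longest = max(present, key=lambda nv: len(nv[1]))
        # Stored as (length, record number).
        result["shortest"] = (len(shortest[1]), shortest[0])
        result["longest"] = (len(longest[1]), longest[0])
        # Stored as (record number, value).
        result["min_value"] = min(present, key=lambda nv: convert(nv[1]))
        result["max_value"] = max(present, key=lambda nv: convert(nv[1]))
    return result
```

With the default `convert=str`, "100" sorts before "7"; passing `convert=int` gives the numeric minimum and maximum instead.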
Checks and controls
As with analysis, checks are done at several levels.
• File level
  • Check whether the given record length is correct.
  • Check whether the given number of record-types is correct.
  • Check whether the given number of records is correct.
  • Check whether the given number of characters is correct.
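The file-level checks amount to comparing declared values with counted ones. The Python sketch below assumes the declared values have already been read from the metadata into a dictionary with hypothetical keys, and that a record's type is identified by its first character (also hypothetical; pass another `type_of` for other layouts).

```python
def check_file(text, declared, delimiter="\n", type_of=lambda rec: rec[:1]):
    """Compare declared file-level values with what is actually in the file.

    `declared` may hold "record_length", "record_types", "records" and
    "characters". Returns a list of error messages (empty if all is well).
    """
    records = text.split(delimiter)
    if records and records[-1] == "":
        records.pop()  # trailing delimiter
    actual = {
        "records": len(records),
        "characters": len(text),
        "record_types": len({type_of(rec) for rec in records}),
    }
    errors = [
        f"{key}: declared {declared[key]}, found {value}"
        for key, value in actual.items()
        if key in declared and declared[key] != value
    ]
    if "record_length" in declared:
        wrong = [n for n, rec in enumerate(records, 1) if len(rec) != declared["record_length"]]
        if wrong:
            errors.append(f"record_length: declared {declared['record_length']}, differs in records {wrong}")
    return errors
```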
Checks and controls (cont.)
• Record-type level
  • Check whether the primary key is unique and contains no empty (NULL) values.
  • Check whether the secondary key is unique and never occurs with an empty (NULL) value.
  • Check whether each foreign key is either empty or exists in the referenced file, and whether the given type of relation is correct.
  • Check whether the given record length is correct.
  • Check whether the given minimum record length is correct.
  • Check whether the given maximum record length is correct.
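The key checks can be sketched in Python as below, assuming the key values have already been extracted per record and that NULL appears as `None` or an empty string. The function names are hypothetical, the same uniqueness check could serve for a secondary key, and checking the declared relation type is not covered here.

```python
from collections import Counter


def is_empty(value) -> bool:
    return value is None or value == ""


def check_primary_key(keys: list) -> list[str]:
    """The key must be unique and never empty; keys[k] belongs to record k + 1."""
    errors = []
    empty = [n for n, k in enumerate(keys, 1) if is_empty(k)]
    if empty:
        errors.append(f"empty key in records {empty}")
    counts = Counter(k for k in keys if not is_empty(k))
    duplicates = sorted(k for k, c in counts.items() if c > 1)
    if duplicates:
        errors.append(f"duplicate keys {duplicates}")
    return errors


def check_foreign_key(foreign_keys: list, referenced_keys: list) -> list[int]:
    """Return record numbers whose non-empty foreign key is missing from the referenced file."""
    referenced = set(referenced_keys)
    return [n for n, k in enumerate(foreign_keys, 1) if not is_empty(k) and k not in referenced]
```

Putting the referenced keys in a set keeps each foreign-key lookup constant-time, which matters for large files.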
Checks and controls (cont.)
• Record-type level (cont.)
  • Check whether the given number of fields is correct.
  • Check whether the given number of records of this type is correct.
• Field level
  • Check whether the given field length is correct.
  • Check whether the given minimum field length is correct.
  • Check whether the given maximum field length is correct.
  • Check whether the given data type and field format are correct.
  • Check whether the field always has a value (no NULL).
Checks and controls (cont.)
• Field level (cont.)
  • Check for uniqueness.
  • Check values against a specified code set.
All checks are initiated by processes in the ADDMML-file.
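The field-level checks can be combined in one Python function. This is a sketch under the assumption that data type and field format are expressed as a regular expression, which is a choice made here for illustration and not how ADDMML declares them; the function and parameter names are hypothetical.

```python
import re


def check_field(values, *, length=None, min_length=None, max_length=None,
                pattern=None, not_null=False, unique=False, codes=None):
    """Return (record number, problem) pairs for one field; values[k] is record k + 1."""
    problems = []
    seen = set()
    for n, v in enumerate(values, 1):
        if v == "":
            if not_null:
                problems.append((n, "empty value"))
            continue  # the remaining checks apply only to non-empty values
        if length is not None and len(v) != length:
            problems.append((n, "wrong length"))
        if min_length is not None and len(v) < min_length:
            problems.append((n, "too short"))
        if max_length is not None and len(v) > max_length:
            problems.append((n, "too long"))
        if pattern is not None and not re.fullmatch(pattern, v):
            problems.append((n, "wrong format"))
        if codes is not None and v not in codes:
            problems.append((n, "unknown code"))
        if unique:
            if v in seen:
                problems.append((n, "duplicate"))
            seen.add(v)
    return problems
```

For example, a date field could be checked with `pattern=r"\d{4}-\d{2}-\d{2}"`, and a coded field with `codes={"1", "2"}`.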
Special functions
Additionally, Arkade has a few special functions:
• Verification of the check digits in birth numbers.
• Verification of the check digits in account numbers.
• Adding key fields to record-types that lack them (the key values are implied by the records' positions relative to each other).
All special functions are initiated by processes in the ADDMML-file.
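The two check-digit controls can be sketched in Python, assuming the widely documented Norwegian modulus-11 rules: the 11-digit birth number carries two check digits, and the 11-digit account number carries one as its last digit. This is not Arkade's code, the function names are hypothetical, and the date part of the birth number is not validated.

```python
import re

BIRTH_WEIGHTS_1 = (3, 7, 6, 1, 8, 9, 4, 5, 2)
BIRTH_WEIGHTS_2 = (5, 4, 3, 2, 7, 6, 5, 4, 3, 2)
ACCOUNT_WEIGHTS = (5, 4, 3, 2, 7, 6, 5, 4, 3, 2)


def mod11_check_digit(digits, weights):
    """Modulus-11 check digit, or None when no valid digit exists (result 10)."""
    k = 11 - sum(d * w for d, w in zip(digits, weights)) % 11
    if k == 11:
        return 0
    return None if k == 10 else k


def valid_birth_number(number: str) -> bool:
    """11 digits: date of birth (6), individual number (3), two check digits."""
    if not re.fullmatch(r"[0-9]{11}", number):
        return False
    d = [int(c) for c in number]
    k1 = mod11_check_digit(d[:9], BIRTH_WEIGHTS_1)
    k2 = mod11_check_digit(d[:10], BIRTH_WEIGHTS_2)  # includes the first check digit
    return k1 == d[9] and k2 == d[10]


def valid_account_number(number: str) -> bool:
    """11 digits, often written 1234.56.78903; the last digit is the check digit."""
    digits = number.replace(".", "").replace(" ", "")
    if not re.fullmatch(r"[0-9]{11}", digits):
        return False
    d = [int(c) for c in digits]
    return mod11_check_digit(d[:10], ACCOUNT_WEIGHTS) == d[10]
```

Both controls share one modulus-11 helper and differ only in their weights and in which digits they cover.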
SAS-dataset
• Arkade can generate an internal dataset. As Arkade is written in SAS, this internal dataset is a SAS-dataset.
• The SAS-dataset can then be used to:
  • Sort tables
  • Make an extract
  • Produce statistics
  • Form the basis for a public version.
• Generation of the SAS-dataset is initiated from the screen.
Random tests
• Arkade can run random tests on the extracts, for example:
  • Look at the first 100 records only (the number can vary and is chosen by the user).
  • Look at every 25th record (again, the number is chosen by the user).
  • Test only the ADDMML-file, without processing the extracts.
• Random tests are initiated from the screen.
• Random tests are mainly used to check syntax and conformity in the data files.
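The two sampling rules can be sketched in Python with `itertools.islice`. The function names are hypothetical, and the defaults (100 and 25) follow the examples above.

```python
from itertools import islice


def first_records(records, n=100):
    """Return only the first n records."""
    return list(islice(records, n))


def every_nth_record(records, step=25):
    """Return records number step, 2*step, 3*step, ... (1-based numbering)."""
    return list(islice(records, step - 1, None, step))
```

Because `islice` works lazily on any iterable, the sample can be drawn from an open file object, line by line, without reading the whole file.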
Conditions for Arkade
• Arkade depends on a correct ADDMML-file.
• To run Arkade there must be data files, and the references to them must be correct.
• Most logical dependencies must also be correct.
ArkN3
• Imports data in the format described in the Noark-3 manual.
• Tests whether the described format is followed.
• Presents cases and registry records.
• Makes it possible to search at different levels.
• Analyses the imported data.
International view
• Dublin Core
• ISAD(G)
• EAD
• ADDMML
ISO 15489 and MoReq versus Noark
• These new standards are in close harmony with Norwegian theory and Norwegian requirements.
• But Noark is not a general records management standard.
• Noark = a detailed application standard, initially for registry systems.
ISO 15489 and MoReq versus Noark (cont.)
• Registry and case handling workflow is integrated in Noark:
  1) Registry handling control: follow-up and “sign-off” functions connected to case management (MoReq’s workflow functions relate to capture, retention and availability/distribution)
  2) Process management: implements the general specification in MoReq, but is closely related to registry handling and case handling in Noark
  3) Board handling (described in great detail, but only an option in Noark)
ISO 15489 and MoReq versus Noark (cont.)
• MoReq elements that are given less consideration in Noark:
  • “freezing” of metadata
  • audit trails
  • “robust” metadata capture
• It is necessary to map Noark to MoReq’s requirements.
• It is important for us to have a standard that is related to MoReq:
  • Market considerations (Norwegian suppliers' export opportunities to EU countries, and vice versa)
General RM-standard
• In addition to Noark, there is a need for a general Norwegian RM standard based on MoReq
  • for systems without registry functions that generate and manage records
  • e.g. a category for files that is more general and liberal than the category “case” in Noark is needed
• A general standard is also necessary to avoid discriminating against EU suppliers who offer MoReq-based solutions in Norway.
RM-standard: possible Norwegian model
[Diagram: layered model with levels of requirements “Must”, “Should” and “May”]
• Specific RM (process management): board handling
• Case handling & RM workflow:
  • case handling information in registry; registry- and Noark-based process management*)
  • other case handling and workflow; process management not based on registry or Noark
• Basic RM (document and metadata capture and other MoReq-specified functions); basic workflow
*) Noark also requires defined levels of functionality in Basic RM.