250 likes | 259 Views
Demonstration of mapping techniques and automatic data transformation from various formats (databases, text files, spreadsheet files) to XML files compatible with the CIDOC Conceptual Reference Model.
E N D
Mapping Techniques: Demonstration of automatic data transformation Maria Theodoridou ICS-FORTH April 2004
The Problem Convert data of various formats into a common XML formatcompatible with the CIDOC Conceptual Reference Model • Input data: • Databases • Text files • Spreadsheet files • Output files: • XML files, compatible with the CIDOC Conceptual Reference Model
Data Junction 7.5 An integration/migration tool designed to convert structured data from one format to another. Also designed to clean and restructure the data to fit the new format. Market Leader in its Field Map Designer: transfers data among a wide variety of data file types. To transfer data, you design and run a conversion. Each conversion contains all the information needed to convert data from an existing data file or table to a new Target data file or table. Conversions are done in three Steps: • Define options for the source file • Define options for the target file • Define appropriate mapping from the source to the target file using all the functions, expressions, actions and events the tool offers
Step 1 (Source) • Define Source Type • ASCII, Access, Excel, SQL, Oracle etc. • Define Source File • Each text source filewill be separated into fields and records • Field is a category of things, which have a common characteristic • Things of the same field are listed in separate records below the field • We use field and record delimiters
Step 1 (Source) • We can sort our data descending or ascending depending on one or more fields • By using the “Source Filters” we can keep only the records we want • We can have the field names by using a header in the source file • The parsing is done serially and every record is listed below the other • From the Data Browser we see all the fields, records and the data
Step 2 (Target) • Define Target Type • ASCII, Access, Excel, Lotus, SQL, Oracle etc. • Define Target File • We can use an existing record layout from: • A previous conversion • An XML file, an XML DTD, an XML Schema
General about a conversion • Every conversion uses the Event Handlers, the Actions, the Expressions and the Functions • Every Event, from a predefined set, contains one or more actions • Every action has either a predefined set of parameters or uses an expression • Every expression can use one or more of the predefined or the user defined functions • We can also declare global variables, which can be used by every expression within a conversion.
Events • Predefined set of Events • They occur at a specific moment with a predefined order in the conversion cycle • Events are separated into Source Generic Events, Target Events, Conversion Events, and Record Type Events • Examples of Events: • BeforeConversion,AfterConversion,AfterEveryRecord,OnDataChanged,BeforeMap,AfterMap,BeforePutRecord,OnDiscard
Actions Actions have their own format and parameters: • Execute: executes an expression we specify • Clear: Clears the contents of a target record memory buffer • Map: maps a specific target record type • Put: writes the results of a target record type to a specific position in the target file • ClearMapPut: combines the functionality of Clear, Map, and Put • Abort: causes the conversion to abort
An Example • We Map and then Put the target record types “CRM_Entity”, “is_identified_by” and “has_current_owner” • The global variable “id” helps us map the appropriate fields to the “is_identified_by” target record type
Expressions • They offer conditional mapping and data modification • Each expression is written for an “Execute” Actionor is a Target Field Expression • We use the same syntax in every expression • Every expression is written in the expression builder window
A target field expression: If (IsNull(Fields("nr"))) Then "NULL" Else "OID:LUPA."&Fields("nr") End If An expression example
Functions Set of predefined functions separated in categories: • Conversion: typecast a string or a variable • Date/Time: convert strings into several formats of date and time • Math: return the sign, absolute, logarithm etc. • Parse: parse a string into sub strings by using delimiters
Functions • Text: replace, compare, count, trim strings • File: read, copy, append and rename a file or get the name of a directory or make a new directory • Flow Control: For … Next, Go To, If … Then … Else, Return, Select Case … End Select, While … Wend. • We can also have user-defined functions, save them as a Visual Basic Module File (.bas)and call them within the conversions we want
Step 3 (Map) • Here we define the Events, Actions, Expressions and Functions we use • We define the order and type of the fields of every target record type • A target record type may contain a lot of other record types • When a field is selected to be a record, then it automatically links to a record type, which has the same name with the field
Lupa transformation Lupa – Stone data base of 7000 archaeological records, City of Vienna, Austria Microsoft Access to XML conversion
Lupa transformation Target XML file
Field name CIDOC CRM correspondence Lupa transformation Map Nr CRM_Entity->P47F_is_identified_by->E42_Object_Identifier Bilder CRM_Entity->P129B_is_subject_of->E38_Image Titel CRM_Entity->P102F_has_title->E35 Title Objekt CRM_Entity ->P2F_has_type-> E55_Type Fundort CRM_Entity -> P12B_was_present_at-> E5_Event Bezirk CRM_Entity->P12B_was_present_at->P7F_took_place_at->E53_Place Verwahrort CRM_Entity->P55F_has_current_location->E53_Place Literatur CRM_Entity->P70B_is_documented_in->E31_Document Inschrift CRM_Entity->P65F_shows_visual_item->P142F_has_clear_text Inschrift_Abfrage CRM_Entity->P65F_shows_visual_item->P141F_has_transcription
CIL transformation CIL –Corpus Inscriptionum Latinarum 25,150 roman inscriptions text to XML conversion
CIL transformation Source CIL 13, 00029* = AE 1994, 1194. Abelloni / deo / (A)elia(?) Mar/[6] / v(otum) s(olvit) CIL 13, 00118* (AE 1989, 0519). D(eo?) M(ercurio?) / pro sal(ute) Imp(eratoris) M(arci) Aur(eli) Antonini / Fel(icis) Aug(usti) Brit(annici) p(ontificis) m(aximi) tr(ibunicia) p(otestate) XV imp(eratoris) / II co(n)s(ulis) II des(ignati) III p(atris) p(atriae) C(aio) Iul(io) / Drutedo et Balorice Taur(i) f(ilio) / ex v(oto)
CIL transformation Source ContentExtractor is a report reader designed to mine important data out of unstructured report formats such as invoices, bank statements, emails, html docs, etc., and turn that information into a database We used Content Extractor to convert unstructured text to Microsoft Excel file format
CIL transformation Target XML file
Field name CIDOC CRM correspondence CIL transformation Map Field 1 CRM_Entity->E34 Inscription->P65F_shows_characters Field 1 CRM_Entity->E34 Inscription->P141F_has_transcription Field 2 CRM_Entity->P70B_is_documented_in->E31_Document Field 2 CRM_Entity->P70B_is_documented_in->E31_Document