510 likes | 629 Views
Raw Data engine. Perl – a Generic Cure For a Perennial Problem. The Task. Get from a customer data files in his format + specs Create from these files data understood by ALEPH system. General Description. Input. Output. Manipulation. Input - Overview. Input. Record. Output.
E N D
Raw Data engine Perl – a Generic Cure For a Perennial Problem
The Task Get from a customer data files in his format + specs Create from these files data understood by ALEPH system Perl – a Generic Cure For a Perennial Problem
General Description Input Output Manipulation Perl – a Generic Cure For a Perennial Problem
Input - Overview Input Record Output Manipulation Record Record Record Record Record Perl – a Generic Cure For a Perennial Problem
Input Structure Input field,field, field, field, field, field Output Manipulation Record Record Record Record Record Perl – a Generic Cure For a Perennial Problem
Input Variety Input field,field, field, field, field, field field;field;field;field;field;field f1 f2 f3 f4 f5 Perl – a Generic Cure For a Perennial Problem
Input - Multiline Input Rec name value name value name value …………….. Rec name value name value name value …………….. Perl – a Generic Cure For a Perennial Problem
Make Input Uniform All input into a hash: %input_hash; $input_hash{f1} = f1_val; $input_hash{f2} = f2_val; Perl – a Generic Cure For a Perennial Problem
Repeatable Fields 32137001535410 I0010.. 32137001535410 I0020.. 005 I0050.. 1992-07-29 I0060.. TB55 I0250.. 0004688683 I0300.. PERP I0330.. PERP I0360.. WCC3 I0390.. $***0.00 I0250.. 0004688684 I0300.. PERP I0330.. PERP I0360.. WCC3 I0390.. $***5.00 Perl – a Generic Cure For a Perennial Problem
It’s a Perl Hash Some of %input hash entries become pointers to lists: x \%input_hash 0 HASH(0x1a7e8f0) 'I0010' => 32137001535410 'I0020' => 005 'I0050' => '1992-07-29' 'I0060' => 'TB55' 'I0250' => ARRAY(0x1c3e990) 0 0004688683 1 0004688684 'I0300' => ARRAY(0x1c3e96c) 0 'PERP' 1 'PERP' 'I0330' => ARRAY(0x1d40328) 0 'PERP' 1 'PERP' 'I0360' => ARRAY(0x1d4031c) 0 'WCC3' 1 'WCC3' 'I0390' => ARRAY(0x1d40388) 0 '$***0.00' 1 '$***5.00' Perl – a Generic Cure For a Perennial Problem
Some Code… ## no value yet – simple scalar unless (defined $input_hash{$current_name}) { $input_hash{$current_name} = $value;}## already value for key else {## already list – just push if (ref $input_hash{$current_name}) {push @{$input_hash{$current_name}}, $value; }## only scalar – make list with existing + new else {my @lst = ($input_hash{$current_name}, $value); $input_hash{$current_name}= \@lst; } Perl – a Generic Cure For a Perennial Problem
Output Let’s look at one typical output: • Each record is one simple string • Each field has a fixed width (Typical for file to be imported into DB) Perl – a Generic Cure For a Perennial Problem
Output – Needed Structures To build a correct output record we need: • The values – value per field • Their correct order • Their correct format (I.e. witdh per field) Perl – a Generic Cure For a Perennial Problem
Output Info Source How do we know what the output record looks like? • From DB ‘describe’ of relevant table • From Data Definition file (COBOL copy, C h file etc) • From some special purpose definition file Perl – a Generic Cure For a Perennial Problem
Output From COBOL Copy 01 Z71. 02 Z71-REC-KEY. 03 Z71-SEQUENCE-TYPE PICTURE X(1). 03 Z71-DOC-NUMBER PICTURE 9(9). 03 Z71-COPY-ORDER-SEQUENCE PICTURE 9(5). 03 Z71-REC-SEQUENCE PICTURE 9(15). 02 Z71-USER-NAME PICTURE X(10). 02 Z71-OPEN-DATE PICTURE 9(8). 02 Z71-OPEN-HOUR PICTURE 9(4). 02 Z71-OPEN-SECOND PICTURE 9(4). 02 Z71-ACTION-DATE PICTURE 9(8). 02 Z71-TYPE PICTURE X(2). 02 Z71-ALPHA PICTURE X(1). 02 Z71-TEXT PICTURE X(200). 02 Z71-DATA PICTURE X(2000). 02 Z71-TRIGGERED PICTURE X(1). Perl – a Generic Cure For a Perennial Problem
Field Order @z71_cols_copy = ( 'z71_sequence_type', 'z71_doc_number', 'z71_copy_order_sequence', 'z71_rec_sequence', 'z71_user_name', 'z71_open_date', 'z71_open_hour', 'z71_open_second', 'z71_action_date', 'z71_type', 'z71_alpha', 'z71_text', 'z71_data', 'z71_triggered' ); Perl – a Generic Cure For a Perennial Problem
Output format $z71_format_copy = '%-1s%09s%05s%015s %-10s%08s%04s%04s%08s%-2s%-1s%-200s%-2000s%-1s'; A regular Perl format for (s)print based on the COPY info • Each character string (X(n)) is left justified • Each number is right justified ans zero padded. Perl – a Generic Cure For a Perennial Problem
Writing Correct Output Now suppose we have built %zhash with values corresponding to the Z71 copy. The following code writes out a record foreachmy $f (@z71_cols_copy) {push @out_vals, $$zhash{$f};} ##sprintf receives the format and a list $out_str = sprintf($z71_format_copy , @out_vals);print $out_str . "\n"; Perl – a Generic Cure For a Perennial Problem
Making It Generic $current_cols = $ztable . “_cols_copy”; *cols = *$current_cols; $current_format = $ztable . “_format_copy”; *format = *current_format; foreachmy $f (@cols) {push @out_vals, $$zhash{$f};} ##sprintf receives the format and a list $out_str = sprintf($format , @out_vals);print $out_str . "\n"; Perl – a Generic Cure For a Perennial Problem
Output Sum up To Create • Read all relevant Copy files • For each znn table create znn_cols_copy, znn_format_copy • Use Data::Dumper to store in zvars_copy.txt Perl – a Generic Cure For a Perennial Problem
Output Sum up To use: • do “zvars_copy.txt” • establish current $ztable • use glob * to get correct cols and format • write output Perl – a Generic Cure For a Perennial Problem
Input – Simplifying Variety Define to which type belongs current input: • Fields separated by delimiter – specify delimiter, name fields • Fields with fixed widths – specify names and widths • Record multiline – specify new record, field format • Others (cvs, Excel etc.) Perl – a Generic Cure For a Perennial Problem
File Type Definitions auth_kul_ext type fixed auth_kul_ext names d1:0-1;lnk:2-19;d2:20-40;val:41-70 ser_copy_data type fixed ser_copy_data names CNUM:8;LOCATION:6; ser_copy_data names COPY_NUM:5;SUPPLIER:12; item_data type delimited:; item_data names num;price;date cash_data type tag_multiline:rec=14;sep=2;tag=5;sep=3;val=* acq_order_data type tag_multiline:delim=> acq_order_data new_rec **STORE Perl – a Generic Cure For a Perennial Problem
Input Implementation • one module-object per input type • Each has the new constructor • Each has a process_infile function for actual input reading • A module ChooseInputFormat.pm selects the right object and return a pointer to it. • new just does some preparations based on the type • process_infile reads the actual file • for each input record – builds %input_hash and calls build_output() received as argument Perl – a Generic Cure For a Perennial Problem
Reading Delimited • names: n1;n2;n3;… @lst = split /;/, $names; ##map name to its position $ind = 0; foreach (@lst) { $lst_names{$_} = $ind; $ind++; } @vals = split /$delim/, $_; foreachmy $name (keys %lst_names) { ## take value of name from correct position my $val = $vals[$lst_names{$name}]; $input_hash{$name} = $val;} Perl – a Generic Cure For a Perennial Problem
Reading Fixed my @lst = split /\s*;\s*/, $names;my $field_num = 0; $line_pat = "";foreachmy $lst (@lst) {my ($name, $width) = split /:/, $lst; ## position per name $lst_name{$name} = $field_num++; ## unpack for reading $line_pat .= 'A' . $width . ' '; } my %input_hash = (); my @lst = unpack ($line_pat, $_);foreachmy $name (keys %lst_name) { $input_hash{$name} = $lst[$lst_name{$name}]; } Perl – a Generic Cure For a Perennial Problem
From In To Out Manipulation Out File In File input_hash zhash ??? Perl – a Generic Cure For a Perennial Problem
General Look Z30-DOC-NUMBER sequence[,1,9] Z30-ITEM-SEQUENCE sequence[docno,10,6] Z30-SUB-LIBRARY PERM_LOC[1:4][sub_lib_map] Z30-MATERIAL MATERIAL[material_map] Z30-ITEM-STATUS "09" Z30-OPEN-DATE INVENTORY_DATE[format_date] Z30-CALL-NO CALL_NO Z30-PRICE PRICE[decimal] Z30-NOTE-OPAC NOTE1 & “;” & NOTE2 Z30-NOTE-CIRCULATION NOTES[substring_on_word:200:1] Z30-NOTE-INTERNAL NOTES[substring_on_word:200:2] Perl – a Generic Cure For a Perennial Problem
Building zhash – Syntax 1 Z30-ITEM-STATUS "09" Build Z30-ITEM-STATUS from literal “09” Z30-CALL-NO CALL_NO Build Z30-CALL-NO from input field CALL_NO as is Z30-SUB-LIBRARY PERM_LOC[1:4][sub_lib_map] Build Z30-SUB-LIBRARY from input field PERM_LOC, take first 4 characters, then map result using sub_lib_map Z30-PRICE PRICE[decimal] Build Z30-PRICE from input PRICE, extract just the decimal portion Perl – a Generic Cure For a Perennial Problem
Building zhash – Syntax 2 Z30-NOTE-OPAC NOTE1 & “;” & NOTE2 To build Z30-NOTE-OPAC concatenate input NOTE1, then “;”, then input NOTE2 Z30-DOC-NUMBER sequence[,1,9] Build a sequence with delta 1, width 9, leading zeroes Z30-ITEM-SEQUENCE sequence[docno,10,6] Build a sequence with delta 10, width 6, leading zeroes, restart the sequence for each input docno with differen values. 3 input records with docno=500: 000010,000020,000030 2 input records with docno=732: 000010,000020 Perl – a Generic Cure For a Perennial Problem
Implementation - 1 Implementing the functions is done in 2 levels: • Group type • Using modifiers Types: • Just a literal • An input field – no modifiers • An input field with modifiers(s) • Concatenation (Each element may include modifiers) • Various sequences Perl – a Generic Cure For a Perennial Problem
Implementation - 2 For each output field we build an info hash, that includes: • the pointer to the function to call • the target • the source • other info – needed per specific function All hashes are pushed into a list Perl – a Generic Cure For a Perennial Problem
The Functional Backbone $f = “ZbuildOutFuncs::'$func”; $handle_funcs[$func_ind]{func} = \&{$f}; $handle_funcs[$func_ind]{target} = $target; $handle_funcs[$func_ind]{source} = $source; $handle_funcs[$func_ind]{…} = …; Perl – a Generic Cure For a Perennial Problem
Actually Building zhash foreachmy $handle (@handle_funcs) { my $res = $handle->{'func'}($handle, \%zhash, $input_hash); } Each function has something like: $target = $handle->{target}; ## What to build $source = $handle->{source}; ## How to build my $res; ………. evaluates… ## Build ………. $$zhash{$target} = $res; ## Prepare for output return $res; Perl – a Generic Cure For a Perennial Problem
Getting Values sub get_gen_field_val { my ($my_input_hash, $column) = @_; my ($org_val, $field, $modifiers); if (defined $$input_hash{$column}) { $org_val = $$input_hash{$column}; return $org_val; } if ($column =~ /^_\w+$/) { $org_val = $$zhash{$column}; return $org_val; } ($org_val, $field, $modifiers) = analyze_complex($column); $org_val = get_modified_val($org_val, $modifiers); return $org_val; } Perl – a Generic Cure For a Perennial Problem
Modifying a Value sub get_modified_val { my ($org_val, $modifiers) = @_; my $non_square = '([^\[\]]+)'; my @mods = $modifiers =~ /$non_square/go; foreach my $mod (@mods) { next if $mod =~ /^\s*$/; my ($type, $action) = $mod =~ /^(\w+):(.*)/; if ($type) { $org_val = $mod_functions{$type}->($org_val, $action); } else { exit_on_err(); } } return $org_val; } Perl – a Generic Cure For a Perennial Problem
Simple Modifier - uppercase sub mod_uppercase {my ($val, $action) = @_;returnuc $val;} What if $val is list ?? Perl – a Generic Cure For a Perennial Problem
Handling Lists sub handle_list {my ($val, $action, $func) = @_;my @val_lst;foreachmy $v (@$val) {my $res = $func->($v, $action);push @val_lst, $res; }return \@val_lst;} Perl – a Generic Cure For a Perennial Problem
Modifier with List sub mod_uppercase {my ($val, $action) = @_;return handle_list($val, $action, \&mod_uppercase) ifref $val; returnuc $val;} Perl – a Generic Cure For a Perennial Problem
regex Definition regex_check my_reg /\d{4}\D\d{4}/ Code sub build_regex_check { my ($name, $regex) = $_ =~ /(\w+)\s+(.*)/; my $code_src = "sub {my \$val=shift;my \$res=\$val=~$regex;return \$res}"; my $code = eval $code_src; } Perl – a Generic Cure For a Perennial Problem
Running Index We need several running indexes. Maybe the easiest way to do it: sub build_running_index{ my $cnt = 0; return sub { return ++$cnt; } } Each instance: • starts from 0 • increments by 1 Perl – a Generic Cure For a Perennial Problem
Using it $cnt1 = build_running_index();$cnt2 = build_running_index();$res1 = &$cnt1();$res1 = &$cnt1();$res2 = &$cnt2();$res1 = &$cnt1();print"c1: $res1 c2: $res2\n";sub build_running_index{ my $cnt = 0;returnsub {return ++$cnt; }}output: c1: 3 c2: 1 Perl – a Generic Cure For a Perennial Problem
More General • Different start • Different delta sub build_own_index_gen{ my ($cnt, $delta) = @_;returnsub {return $cnt += $delta; }} Perl – a Generic Cure For a Perennial Problem
Usage $cnt1 = build_own_index_gen(0, 3);$cnt2 = build_own_index_gen(5,10);$res1 = &$cnt1();$res2 = &$cnt2();$res1 = &$cnt1();$res1 = &$cnt1();print"c1: $res1 c2: $res2\n";sub build_own_index_gen{ my ($cnt, $delta) = @_;returnsub {return $cnt += $delta; }} output: c1: 9 c2: 15 Perl – a Generic Cure For a Perennial Problem
Timer - 1 We need several timers We also need a total timer sub get_time_diff { my $tls = time(); return sub { my $start = shift; my $tle = time(); my $tlt = $tle - $tls; $tls = $tle unless $start; return $tlt; } } Perl – a Generic Cure For a Perennial Problem
Timer Timer for differences sub get_time_diff { my $tls = time(); return sub { my $tle = time(); my $tlt = $tle - $tls; $tls = $tle; return $tlt; } } Perl – a Generic Cure For a Perennial Problem
Timer - Usage $t1 = get_time_diff(); $t2 = get_time_diff(); sleep(1); $r = $t1->(); $r1 = $t2->(); print “r: $r; r1: $r1\n"; sleep(1); $r = $t1->(); $r1 = $t2->(); print “r: $r1; r1: $r1\n"; Results: r: 1; r1: 1r: 1; r1: 1 Perl – a Generic Cure For a Perennial Problem
Timer 2 Timer with total sub get_time_diff { my $tls = time(); return sub { my $start = shift; my $tle = time(); my $tlt = $tle - $tls; $tls = $tle unless $start; return $tlt; } } Perl – a Generic Cure For a Perennial Problem
Timer 2 - Usage $t1 = get_time_diff(); $t2 = get_time_diff(); sleep(1); $r = $t1->(); $r1 = $t2->(1); print “r: $r; r1: $r1\n"; sleep(1); $r = $t1->(); $r1 = $t2->(1); print “r: $r1; r1: $r1\n"; Results: r: 1; r1: 1r: 1; r1: 2 Perl – a Generic Cure For a Perennial Problem
Timer We need several timers We also need a total timer sub get_time_diff { my $tls = time(); return sub { my $start = shift; my $tle = time(); my $tlt = $tle - $tls; $tls = $tle unless $start; return $tlt; } } Perl – a Generic Cure For a Perennial Problem