1.21k likes | 1.79k Views
Informatica Master Data Management (MDM). Topic 4: Load Process. Objectives. Following are the objectives of this topic: Configure Trust Configure Validation Rules Configure Relationships Configure Lookups Describe the Load Process. Trust. Dynamic Cell-level Survivorship
E N D
Objectives • Following are the objectives of this topic: • Configure Trust • Configure Validation Rules • Configure Relationships • Configure Lookups • Describe the Load Process
Trust • Dynamic Cell-level Survivorship • Base Object property • A mechanism for measuring the confidence factor associated with each cell based on its source system, change history, and other business rules • Defined at a column level for each contributing source system • Ensures that the most reliable data at the cell level is consolidated based on data characteristics
Trust • When two base object records merge: • MRM calculates the trust for each trusted column in the two base object records being merged • Cell with the highest values survive in the final merged record 2 Base Object records to merge: Calculate Trust: 62 56 71 37 Winners Survive:
Trust • When an update comes in from a source: • MRM calculates the trust on the incoming data and compares it to the trust of the data in the base object • Updates are only applied to the base object for cells that have higher trust on the incoming data Base object prior to update: 71 56 Data from staging table: 50 75 Base Object cells only updated where new data has higher trust weighting:
Trust • Trust is an option property for a base object column • If trust is switched off for a column, then the most recently updated value from any source is the survived value in the base object • Only switch on trust for a column if: • Two or more source systems contribute to the column • The sources are not deemed to be equally reliable providers of values to the column
Trust Trust Demo
Validation Rules • Defines a condition under which a data value is not valid • Base Object property • If the validation condition is met, then trust weighting is downgraded • Trust after validation downgrade is TRUST – (TRUST * downgrade_pct/100) • Reserve Minimum Trust can be set to avoid having trust scores below the minimum trust value x := TRUST – (TRUST * downgrade_pct/100) if x < MINIMUM_TRUST then x := MINIMUM_TRUST endif; • Validation check can be done on any column in a base object and Downgrade can be applied to any other columns in the base object
Validation Rules • Some examples of validation rules: • Downgrade trust on Last Name if length(last_name)<3 and last_name <> ‘NG’ • Downgrade trust on Middle Name if middle_name is null • Downgrade trust on Address Line 1, City, State, Zip, and Valid Address Ind if valid_address_ind = ‘False’
Validation Rules Validation Rules Demo
Relationships • Relationships are association between base objects via a matching column • Property of the Base Object • Types of relationships: • One to Many Relationship • Many to Many Relationship
Relationship • One-to-many Relationship • One table (the child) contains a foreign key column, which matches a unique key column of another table (the parent) • One-to-many relationships are always defined from the child table in the relationship (i.e. the referencing table rather than the referenced table).
Relationship • Many-to-many Relationship • A base object acts as an intersection table between another two base objects • The intersection table has a one-to-many relationship with the other two base objects
Lookups • Automatic Lookups • MRM automatically handles lookups loading/updating the primary key of a Base Object Staging Table for Customer data from CRM System (C_STG_CRM_CUST): Customer Base Object Customer Cross-Reference
Lookups • User Defined Lookups • For user-defined relationships, the corresponding lookups has to be manually configured • Lookups can be based on XREF table or an Unique Key in Base Object Staging Table for Address data from CRM System (C_STG_CRM_ADDR): Customer Base Object Customer Cross-Reference
Lookups • Shadow Foreign Key • The foreign key value stored on the cross-reference (X-ref) is the same as the value stored on the base object • This facilitates certain MRM internal processes on parent merge • However, it makes it difficult to tie child X-ref’s back to their original parent X-ref • Shadow foreign key is an additional column added to the X-ref for every foreign key defined on the base object • Contains the source system’s original foreign key value • Name of shadow foreign key column is S_FKColumnName, for example • Foreign key column name = Customer_ROWID • Shadow foreign key column name = S_Customer_ROWID
Lookups Shadow Foreign Key on XREF Staging Table for Address data from CRM System (C_STG_CRM_ADDR): Customer Base Object Customer Cross-Reference Address Base Object Address Cross-Reference
Load Process • Load process is a two-step process: • Apply Updates • Apply Inserts RegisterLOADjob STRIP_ON_LOAD_IND = 0 Process Updates STRIP_ON_LOAD_IND = 1 Tokenize STRIP_ON_LOAD_IND= 0 End LOAD job Process Inserts STRIP_ON_LOAD_IND = 1 Tokenize
Load Process • Updates • Load job applies updates for existing records whose LAST_UPDATE_DATE (Staging table) > SRC_LUD (XREF table) • The update process always updates the XREF table record • The update process may update the Base Object depending on trust: • For columns not flagged for trust, update happens if incoming data has new LUD • For columns flagged for trust, load job compares trust weightings of staging table data to trust weightings of existing data in base object to determine what can be updated • If history flag is switched on for the Base Object, then the update process writes to the history tables of Base Object and XREF
Load Process • Inserts • Load job applies inserts for records that do not exist in the XREF table • ROWID_OBJECT values are generated for the new records • New records are inserted into base object and XREF with CONSOLIDATION_IND = 4 • If history flag is switched on for the Base Object, then the insert process writes to the history tables of Base Object and XREF
Load Process • Rejects • Referential Integrity is maintained among base objects in the consolidated data model • Rejects will occur in the load process if any records violate the RI constraint • Parent records do not exist • Child records are loaded before the parent records • Lookup has been defined incorrectly • Rejected records are inserted in the reject table of Staging table staging_table_name_REJ
Objectives • Following are the objectives of this topic: • Match & Merge Overview • Match Rules Configuration • Exact Match/Search Strategy • Fuzzy Match/Search Strategy • Match Server Architecture
Match & Search Strategy Match Process
Match & Merge Overview • Challenges with identifying duplicate records • Misspellings, typing, and transcription errors • Nicknames • Synonyms • Abbreviations • Foreign and Anglicized words • Prefix and suffix abbreviations • Concatenation or splitting of words • Noise words and punctuation • Casing and character set variations
Match & Merge Overview • To merge or link records, MRM needs to know which records are likely duplicates of each other • Match rules tell MRM how to identify likely duplicates • Match rules also tell MRM if two matching records are similar enough to automatically merge/link, or if they should be reviewed by a data steward
Match & Merge Overview • Data Consolidation Options • Merging (merge-style base objects) physically combines the matched records in the base object. Makes the most-current best version of the truth (BVT) available • Linking (link-style base objects) quickly determines the BVT without physically combining the records. Provides much faster overall throughput
Match/Search Strategy • Exact • Does not allow for any variations in the data in the match columns • Very simple match process, therefore fast Fuzzy • Allows for variations in spelling, formats, word order, nicknames, synonyms, etc. • More complex match process, therefore slower
Match/Search Strategy High level process flow for the match process RegisterMATCHjob Fuzzy Fuzzy or Exact? Generate Keys Search for Match Candidates Exact Compare records to match against rest of records in base object Compare records to match against match candidates Populate match table with matched ids End MATCH job
Match Path • Match Path • A Match Path represents the base object which will provide data for matching purpose • Traverse the hierarchy between records across multiple base objects or within a single base object • Foreign Key Relationships between tables are used to traverse the relationships • Parent-to-child or child-to-parent relationships can be specified Match Path - Check for Missing Children • By default, MDM does an inner join between the base objects defined in the Match Path • The join therefore excludes rows that don’t have corresponding rows in the joined tables • To include those records, switch on “Check for Missing Children” – MDM will then do an outer join instead of an inner join
Match Path Match Path – Inter Table
Match Path Match Path – Intra Table
Match Column • A match column contains an identifying characteristic of the base object to be consolidated • Each base object can have multiple match columns • Examples: Full Name Generation Address Phone Email • Provider column(s) is the base object columns that provide the data for the match column: • Can be a single column or a concatenation of columns • Must be a VARCHAR / CHAR column to concatenate • Date column is also supported for matching
Match Column • Each match column is based on one or more columns • From base object • Or from X-ref (in some cases) • Or from child base object (in some cases) Customer: Would get false positive matches if matching just on Name Include Address attributes in the Match to reduce false positives Address:
Exact Match/Search Strategy • Steps for defining Exact Match Rules • Select Match/Search Strategy = Exact • Define Match Path • Define Match Columns • Create at least one Match Rule Set • Create Match Rules for Match Rule Set(s)
Exact Match/Search Strategy • Match Columns • A match column contains an identifying characteristic of the base object record to be consolidated • Exact Match Columns: • Does not make allowance for any variations in data content • Records will match if they have identical values in the match columns used in match rules
Exact Match/Search Strategy • Match Rule & Match Rule Set • Match Rules are grouped into Match Rule Sets • Can define multiple rule sets • Only one match rule set can be active at any point in time • Match rule defines the combination of columns that constitute a match Match Rule - Auto property • Match rules are flagged either for auto merge/auto link or for manual merge/link • Matches resulting from auto merge/auto link rules will result in the records being automatically merged/linked by the system when the auto merge/auto link batch job runs • Matches resulting from manual merge/link rules will be queued for review by a data steward
Exact Match/Search Strategy • Match Rule – Null Matching • By default, NULL is not regarded as being the same as NULL • NULL Matches NULL: Use this flag to specify the match columns in a match rule that should be regarded as matches even if the 2 values being compared are both NULL • NULL Matches non-NULL: Use this flag to specify the match columns in a match rule that should be regarded as matches when one of the values being compared is NULL and the other is not Data Example: In the above example the effects of Null Matching on the Generation column are shown
Exact Match/Search Strategy • Match Rule – Non-Equal Matching • Specifies that 2 records are a match if they do not have the same values in the non-equal match column • Reverses whatever would/would NOT have matched without Non-equal match • If using non-equal match, then MUST switch on Validate Matches property in Base Object Advanced Properties Data Example: • If non-equal match is used on the CRM_FLAG column to prevent 2 records from the CRM system from matching each other, then – • NULL=Y is a match • NULL=NULL is a match • Y=Y is not a match
Exact Match/Search Strategy • Match Rule – Segment Matching • Allows a match rule to be limited to a specific subset of data • Different match rules can use different segment values Data Example: • Use a Segment Match value of ‘ORG’ on Customer Type match column to create a match rule that only applies to Organizations. • Use a Segment Match value of ‘IND’ on Customer Type match column to create a match rule that only applies to Individuals.
Exact Match/Search Strategy Match Rule
Fuzzy Match/Search Strategy • Steps for defining Fuzzy Match Rules • Select Match/Search Strategy = Fuzzy • Choose a Population • Define Match Path • Define Match Key • Define Match Columns • Create at least one Match Rule Set & choose Search Level • Create Match Rules for Match Rule Set(s)
Fuzzy Match/Search Strategy • Population • Population is intended to addresses the name distribution problem • Common family names in each population skew the data and query performance e.g. Smith, Williams in English-speaking populations • Each population also has a large number of uncommon names that tend to have the most error and variability • Match needs to account for both of these situations in the way that the keys are built, to give optimal search performance for both • Defines how to identify matches within a particular population and language • Defines how to build keys and perform searches on name and address • Supports a specific set of match purposes
Fuzzy Match/Search Strategy Population
Fuzzy Match/Search Strategy • Match Key • Match key is used to search for match candidates • It is a fixed-length, compressed, and encoded value • Built from a combination of the words and numbers in a name or address • For one name or address, multiple SSA match keys are generated • Match Key Properties: • Key Type • Key Width • Path Component • Match Column Contents