Oracle Enterprise Data Quality: The Phrase Profiler
The Phrase Profiler
• Used alongside the Parse processor to profile:
  • Names, addresses, product descriptions, etc.
• Provides a quick way to build classification lists, e.g.:
  • Titles: Mr, Mrs, Ms, Miss, Dr.
  • Countries: UK, USA, France, Germany.
  • Product Categories: Dairy, Frozen, Bread, Meat, Fruit & Veg.
• Assesses data to understand which parsing rules to apply.
Common Words and Phrases
• Example: names and addresses. The results show:
  • Identified words and phrases.
  • Number of occurrences.
  • Locations of words and phrases.
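As a rough illustration of what counting common words and phrases involves, the Python sketch below tallies every run of one or two consecutive words across a few values. It is not how Enterprise Data Quality implements the Phrase Profiler; the function name `profile_phrases`, the `max_words` parameter, and the sample addresses are all hypothetical, and the sketch covers occurrence counts only, not locations.

```python
from collections import Counter


def profile_phrases(values, max_words=2):
    """Count every run of 1..max_words consecutive words across the values."""
    counts = Counter()
    for value in values:
        words = value.split()
        for size in range(1, max_words + 1):
            # Slide a window of `size` words along the value.
            for start in range(len(words) - size + 1):
                counts[" ".join(words[start:start + size])] += 1
    return counts


# Hypothetical sample data, not taken from the slides.
addresses = ["12 Victoria Centre", "4 Victoria Road", "Victoria Centre"]
counts = profile_phrases(addresses)

# List the phrases from most to least frequent.
for phrase, occurrences in counts.most_common():
    print(f"{phrase}: {occurrences}")
```

Counting phrases as well as single words is what makes a term such as "Victoria Centre" visible as a unit instead of only as two separate words.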
Identify Misplaced Data
• ‘Mr’ is stored in the wrong attribute:
• On investigating...
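The kind of check that reveals a misplaced title can be sketched in Python as follows. This is a minimal illustration under assumed names: the `TITLES` set reuses the titles listed earlier, while the record layout, the attribute names, and the function `find_misplaced_titles` are hypothetical and do not come from Enterprise Data Quality.

```python
# Titles from the classification list example; lower-cased for matching.
TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def find_misplaced_titles(records, attribute):
    """Return the indexes of records whose `attribute` contains a title."""
    hits = []
    for index, record in enumerate(records):
        words = record.get(attribute, "").split()
        # Ignore case and a trailing full stop, so "Dr." matches "dr".
        if any(word.strip(".").lower() in TITLES for word in words):
            hits.append(index)
    return hits


# Hypothetical records: the second and third hold a title in first_name.
records = [
    {"title": "Mr", "first_name": "John", "last_name": "Smith"},
    {"title": "", "first_name": "Mr", "last_name": "Jones"},
    {"title": "", "first_name": "Dr. Anna", "last_name": "Lee"},
]
misplaced = find_misplaced_titles(records, "first_name")
print(misplaced)
```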
Identify and Manage Ambiguities
• ‘Victoria’ might be classified as a given name.
• ‘Victoria Centre’ might be classified as a valid building.
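One common way to manage this kind of ambiguity is to prefer the longest matching phrase, so that "Victoria Centre" is tried as a building before "Victoria" is tried as a given name. The Python sketch below shows that idea only; the slides do not say how the Parse processor resolves ambiguities, and the lists `GIVEN_NAMES` and `BUILDINGS` and the function `classify` are hypothetical.

```python
# Hypothetical classification lists, lower-cased for matching.
GIVEN_NAMES = {"victoria"}
BUILDINGS = {"victoria centre"}


def classify(text):
    """Return (phrase, label) pairs, preferring two-word matches."""
    words = text.lower().split()
    result = []
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if len(words) - i >= 2 and pair in BUILDINGS:
            # The longer phrase wins, so its first word is not
            # classified separately as a given name.
            result.append((pair, "building"))
            i += 2
        elif words[i] in GIVEN_NAMES:
            result.append((words[i], "given name"))
            i += 1
        else:
            result.append((words[i], "unclassified"))
            i += 1
    return result


print(classify("Victoria Centre"))
print(classify("Victoria Smith"))
```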
What is Reference Data? (Recap)
• Tables of data stored within Enterprise Data Quality.
• Can be used to store lists of any data used in a project.
  • E.g. patterns, valid data, invalid data, characters.
• Often used to check and improve working data.
• Optional lookup column.
Capture Reference Data
• Create or add to lists of terms in reference data.
• You can add to your lists iteratively.
• Then use within the Parse processor.
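The iterative part of this workflow can be pictured with a short Python sketch: profile the data, review the frequent words that are not yet in a list, add the ones that belong, and profile again. This is an illustration only, not Enterprise Data Quality functionality; `suggest_terms`, the `min_count` threshold, and the sample names are hypothetical.

```python
from collections import Counter


def suggest_terms(values, known_terms, min_count=2):
    """Return frequent words that are not yet in the list of known terms."""
    counts = Counter(
        word.lower() for value in values for word in value.split()
    )
    return sorted(
        word for word, n in counts.items()
        if n >= min_count and word not in known_terms
    )


# Hypothetical starting list and sample data.
titles = {"mr", "mrs"}
names = ["Mr John Smith", "Dr Anna Lee", "Dr John Jones", "Mrs Mary Smith"]

# First pass: 'dr' appears among the candidates for review.
candidates = suggest_terms(names, titles)
print(candidates)

# A reviewer decides that 'dr' is a title and adds it to the list.
titles.add("dr")

# Second pass: 'dr' is now known, so it is no longer suggested.
remaining = suggest_terms(names, titles)
print(remaining)
```

The candidates still need human review: "john" and "smith" are frequent too, but they belong in different lists from the titles.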
Lab Overview
• Lab 1: Profiling Textual Data:
  • Create a Project.
  • Create a Data Store.
  • Create a Snapshot.
  • Create a Process.
  • Use the Phrase Profiler.
  • Adjust Phrase Profiler Options.