Finding Information A337/A523
What are some of the possible problems with finding information?
What are some of the possible problems with finding information? • Information often lacks STRUCTURE • ASSOCIATION between the identifying information (i.e., labels) and the actual information is not always obvious • CONSISTENCY is not always present. E.g., the same phone number may be written as 317-274-0185, (317)274-0185, or 3172740185 • May later need to MANIPULATE data (filtering, sorting, etc.)
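As a minimal sketch of the CONSISTENCY problem, the three phone formats above can be reduced to one canonical form before searching or sorting. The function name `normalize_phone` and the choice of `NNN-NNN-NNNN` as the target format are assumptions made for illustration, not part of the slides.

```python
import re

def normalize_phone(raw: str) -> str:
    """Reduce a 10-digit US phone number to the form NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", raw)  # drop everything that is not a digit
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}: {raw!r}")
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

# The three inconsistent spellings from the slide
variants = ["317-274-0185", "(317)274-0185", "3172740185"]

# After normalization all three collapse to a single value
normalized = {normalize_phone(v) for v in variants}
```

Once every record uses the same format, an exact-match search or a sort treats the three entries as the same number.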
Typical “Office” Applications • Word Processing • Spreadsheet • Database Management System (DBMS)
Spreadsheets and DBMSes • Columns (labels) • Rows (“instance” or record) • Intersection (value)
Spreadsheets • Tables in MS Excel
DBMSes • Tables in MS Access • A table is one of many objects in a database • Easier to associate tables than in a spreadsheet (e.g., with VLOOKUP) • Tables have several unique properties we’ll discuss later
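To illustrate how a DBMS associates tables, here is a small sketch using Python's built-in `sqlite3` module in place of MS Access. The table names (`customers`, `orders`), the columns, and the sample rows are hypothetical; the point is only that a shared key column lets one query combine two tables, which in a spreadsheet would take a lookup formula per cell.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Two hypothetical tables linked by customer_id
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        total       REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 15.5);
""")

# One JOIN associates every order with its customer's name
rows = conn.execute("""
    SELECT c.name, o.order_id, o.total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
```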
ERP Systems • A centralized database eliminates the need to associate data located on separate systems
Data Quality: What is Dirty Data? • It happens when the UPC on a package doesn't match the item. • Causes? • Vendor: unique product code and cost • Retailer: unique product code and price
Data Quality: What is Dirty Data? Potential Problems? • Inventory reorder • Profit per unit, net profit • Customer satisfaction • Repeat business • Angry bloggers • Solution: same code for vendor and retailer • Data Integrity: Wal-Mart's Dirty Secret
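A rough sketch of how such mismatches might be detected: compare the vendor's and the retailer's catalogs, keyed by product code, and report codes that appear on only one side or that point to different items. The catalogs, codes, and the function name `find_dirty_codes` are invented for illustration.

```python
# Hypothetical catalogs: product code -> item description
vendor = {"0001": "Cereal 12oz", "0002": "Soap bar", "0003": "Batteries AA"}
retailer = {"0001": "Cereal 12oz", "0002": "Shampoo", "0004": "Batteries AA"}

def find_dirty_codes(vendor, retailer):
    """Return (codes only the vendor has, codes only the retailer has,
    shared codes whose item descriptions disagree)."""
    only_vendor = sorted(vendor.keys() - retailer.keys())
    only_retailer = sorted(retailer.keys() - vendor.keys())
    mismatched = sorted(
        code for code in vendor.keys() & retailer.keys()
        if vendor[code] != retailer[code]
    )
    return only_vendor, only_retailer, mismatched

only_vendor, only_retailer, mismatched = find_dirty_codes(vendor, retailer)
```

In this made-up data, the batteries carry a different code on each side and code 0002 names two different items, which are the kinds of discrepancies that a shared code would prevent.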
Extract, Transform, Load (ETL) From Computerworld QuickStudy
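The three ETL stages can be sketched in a few lines. This is an illustrative toy, not the process described in the Computerworld piece: the CSV text, the `contacts` table, the names in the data, and the function names are all assumptions, and SQLite stands in for whatever target database is actually used.

```python
import csv
import io
import re
import sqlite3

# Hypothetical source data with inconsistent phone formats
RAW_CSV = """name,phone
Ada,317-274-0185
Grace,(317)274-0185
Linus,3172740185
"""

def extract(text):
    """Extract: read raw rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim names and keep only the digits of each phone number."""
    return [(r["name"].strip(), re.sub(r"\D", "", r["phone"])) for r in rows]

def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS contacts (name TEXT, phone TEXT)")
    conn.executemany("INSERT INTO contacts VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# After the transform step, all rows share one phone format
phones = [p for (p,) in conn.execute("SELECT DISTINCT phone FROM contacts")]
```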