240 likes | 255 Views
Data quality. Stefano Grazioli. Last SQL homework due Fixed demo issues. Added a note to the homework text Easy Meter. Critical Thinking. What is Data Quality?. The degree to which data is suitable for a business purpose Accuracy, precision.
E N D
Data quality Stefano Grazioli
Last SQL homework due Fixed demo issues. Added a note to the homework text Easy Meter Critical Thinking
What is Data Quality? • The degree to which data is suitable for a business purpose • Accuracy, precision
The quality of the data stored in organizational databases is often poor • 10-25% of the records have inaccuracies or missing elements • Data frequently misinterpreted • Known data loss and theft • Most databases implement inconsistent definitions Source: T. Redman, Data Driven, 2008
Why is Data Bad? No one gets up in the morning and says“I’m going to make lots of errors today” - Cathy Bessant Source: T. Redman, Data Driven, 2008
Approaches to Data Quality Find and Fix Prevent at the source Do nothing
Homework Business Scenario:Google’s Daily Cagr
You are a financial analyst at a fintech firm Many of our customers invest for short amounts of time on Google. They sell their shares within a few weeks…. I wonder: do they make any money out of it? I am on it….. While you are at it… clean the data, first Consider it done.
You get a file with~1000customers whorecently boughtand sold GOOG. Three steps (and two homework) Clean data: phones, dates Compute Daily Cagr = [(final price/initial price)1/days ]-1 Report the Average Daily Cagracross all customers. Daily Cagr for Google
From:#2345348565 To:(234)-534-8565 Cleaning Phone Numbers
UML Activity Diagram - Daily Compound Average Growth of a Security (part I) When the user presses a button, a file selection windows pops out. The user selects a file. The file is shown starting at cell “A1”. The start button becomes invisible. Three more buttons appear: “Clean phone numbers”, “Format Dates”, and “Compute Daily CAGR”. A Next homework Next homework [Compute] [Format Dates] [Clean ph.no] Select the next phone no. Count its digits [Exactly 10 digits] Format as(xxx)-xxx-xxxx & clear highlight if any Highlight the cell in red A [No More Ph.No]
WINIT What Is NewIn Technology?
used in data quality and beyond Text manipulation
DimmyString AsString=“This is a sample string" DimmyString2 AsString = "s" DimmyChar AsChar = "s"c Strings and Characters
DimmyString AsString = "#2344-234-33-3" Dimtemp AsString = "" ForEach x AsCharIn myString If IsNumeric(x) Then temp = temp + x EndIf Next Testing Numbers
DimmyStrAsString = "This is a sample string" myStr= myStr.Insert(4, "xyz") myStr= myStr.Remove(4, 3) 'starting where,howmany myStr= myStr.Replace(" is", " was") Inserting and Removing
Composing text Dim sAsString = “4344562456” Dimtemp AsString temp = "Ph. " + s.Substring(0, 3) + " / "+s.Substring(3, 3) + " " + s.Substring(6, 4) ' The above is an example that produces this: ‘ Ph. 434 / 456 2456 ‘ or - same thing - temp = String.Format("Ph. {0} / {1} {2}", s.Substring(0, 3), s.Substring(3, 3), s.Substring(6, 4))
myLenght = myString.Length myNewString = myString.Trim() myNewString = myString.TrimEnd() myNewString = myString.TrimStart() myNewString = myString.PadLeft(50) myNewString = myString.PadRight(20) Trimming and Padding Total length of the result
' store the address of the current active sheet (the ‘target’) DimmyActiveS AsExcel.Worksheet = Application.ActiveSheet ' select a file DimmyFile AsString= Application.GetOpenFilename() 'get the data in a new temporary workbook Application.Workbooks.OpenText(myFile, , , Excel.XlTextParsingType.xlDelimited, , , , , True) ' store the address of the temporary workbook DimmyActiveWB As Excel.Workbook = Application.ActiveWorkbook ' copy the content from the temporary to the ‘target’ sheet myActiveS.Range("A1:J1000").Value = Application.ActiveSheet.Range("A1:J1000").Value ‘ close the temp workbook myActiveWB.Close() Reading a File into EXCEL
Dim lastRow As Integer lastRow = Cells(Rows.Count,1). End(Excel.XlDirection.xlUp). Row Finding the last non-empty row
Suggestions • Give yourself plenty of time