240 likes | 260 Views
In this homework, you will clean data of Google investors to compute the Average Daily Cagr. Learn about data quality, UML diagrams, text manipulation techniques, and Excel functions. Focus on accuracy and precision to improve data suitability for analysis.
E N D
Data quality Stefano Grazioli
Last SQL homework due Fixed demo issues. Added a note to the homework text Easy Meter Critical Thinking
What is Data Quality? • The degree to which data is suitable for a business purpose • Accuracy, precision
The quality of the data stored in organizational databases is often poor • 10-25% of the records have inaccuracies or missing elements • Data frequently misinterpreted • Known data loss and theft • Most databases implement inconsistent definitions Source: T. Redman, Data Driven, 2008
Why is Data Bad? No one gets up in the morning and says“I’m going to make lots of errors today” - Cathy Bessant Source: T. Redman, Data Driven, 2008
Approaches to Data Quality Find and Fix Prevent at the source Do nothing
Homework Business Scenario:Google’s Daily Cagr
You are a financial analyst at a fintech firm Many of our customers invest for short amounts of time on Google. They sell their shares within a few weeks…. I wonder: do they make any money out of it? I am on it….. While you are at it… clean the data, first Consider it done.
You get a file with~1000customers whorecently boughtand sold GOOG. Three steps (and two homework) Clean data: phones, dates Compute Daily Cagr = [(final price/initial price)1/days ]-1 Report the Average Daily Cagracross all customers. Daily Cagr for Google
From:#2345348565 To:(234)-534-8565 Cleaning Phone Numbers
UML Activity Diagram - Daily Compound Average Growth of a Security (part I) When the user presses a button, a file selection windows pops out. The user selects a file. The file is shown starting at cell “A1”. The start button becomes invisible. Three more buttons appear: “Clean phone numbers”, “Format Dates”, and “Compute Daily CAGR”. A Next homework Next homework [Compute] [Format Dates] [Clean ph.no] Select the next phone no. Count its digits [Exactly 10 digits] Format as(xxx)-xxx-xxxx & clear highlight if any Highlight the cell in red A [No More Ph.No]
WINIT What Is NewIn Technology?
used in data quality and beyond Text manipulation
DimmyString AsString=“This is a sample string" DimmyString2 AsString = "s" DimmyChar AsChar = "s"c Strings and Characters
DimmyString AsString = "#2344-234-33-3" Dimtemp AsString = "" ForEach x AsCharIn myString If IsNumeric(x) Then temp = temp + x EndIf Next Testing Numbers
DimmyStrAsString = "This is a sample string" myStr= myStr.Insert(4, "xyz") myStr= myStr.Remove(4, 3) 'starting where,howmany myStr= myStr.Replace(" is", " was") Inserting and Removing
Composing text Dim sAsString = “4344562456” Dimtemp AsString temp = "Ph. " + s.Substring(0, 3) + " / "+s.Substring(3, 3) + " " + s.Substring(6, 4) ' The above is an example that produces this: ‘ Ph. 434 / 456 2456 ‘ or - same thing - temp = String.Format("Ph. {0} / {1} {2}", s.Substring(0, 3), s.Substring(3, 3), s.Substring(6, 4))
myLenght = myString.Length myNewString = myString.Trim() myNewString = myString.TrimEnd() myNewString = myString.TrimStart() myNewString = myString.PadLeft(50) myNewString = myString.PadRight(20) Trimming and Padding Total length of the result
' store the address of the current active sheet (the ‘target’) DimmyActiveS AsExcel.Worksheet = Application.ActiveSheet ' select a file DimmyFile AsString= Application.GetOpenFilename() 'get the data in a new temporary workbook Application.Workbooks.OpenText(myFile, , , Excel.XlTextParsingType.xlDelimited, , , , , True) ' store the address of the temporary workbook DimmyActiveWB As Excel.Workbook = Application.ActiveWorkbook ' copy the content from the temporary to the ‘target’ sheet myActiveS.Range("A1:J1000").Value = Application.ActiveSheet.Range("A1:J1000").Value ‘ close the temp workbook myActiveWB.Close() Reading a File into EXCEL
Dim lastRow As Integer lastRow = Cells(Rows.Count,1). End(Excel.XlDirection.xlUp). Row Finding the last non-empty row
Suggestions • Give yourself plenty of time