Data Preparation for Analytics Using SAS by Gerhard Svolba, Ph.D. Reviewed by Madera Ebby, Ph.D.
What is the purpose of this book? • It introduces the reader to data preparation • It explains why data preparation is not merely important but a prerequisite for data analysis • It covers the path from the data preparation process to data analytics
The Analysis Path: From raw data to results that can be implemented • Good results • Clever modeling • Adequate preparation • Data availability
Four Dimensions for Analytic Data Preparation • Business and Process Knowledge • Analytical Knowledge • Efficient SAS Coding • Documentation and Maintenance
Business question: How did students who met the provincial standard in grade 3 perform in grade 6? • Answering it generates many further questions • It requires working with people in other departments, such as IT, to carry out the analytic process
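As a rough illustration of the data preparation this question implies, the sketch below links each student's grade 3 and grade 6 results and cross-tabulates whether the standard was met in each grade. The data set names (work.grade3, work.grade6), the variables student_id and met_std, and the toy values are all hypothetical; the review does not describe the actual data layout.

```sas
/* Hypothetical toy data: one row per student; met_std = 1 if the
   student met the provincial standard, 0 otherwise. */
data work.grade3;
   input student_id met_std;
   datalines;
101 1
102 0
103 1
104 1
;
run;

data work.grade6;
   input student_id met_std;
   datalines;
101 1
102 1
103 0
105 1
;
run;

/* MERGE with a BY statement requires both inputs sorted by the key. */
proc sort data=work.grade3; by student_id; run;
proc sort data=work.grade6; by student_id; run;

/* Link grade 3 and grade 6 results; keep only students found in both. */
data work.cohort;
   merge work.grade3(in=in3 rename=(met_std=met_std_g3))
         work.grade6(in=in6 rename=(met_std=met_std_g6));
   by student_id;
   if in3 and in6;
run;

/* Rows = grade 3 status; row percentages show the grade 6 outcome
   for students who did and did not meet the grade 3 standard. */
proc freq data=work.cohort;
   tables met_std_g3*met_std_g6 / nocol nopercent;
run;
```

Students who appear in only one grade (for example, those who changed boards) are dropped by the IF statement; whether that is appropriate is itself one of the further questions the business question raises.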
Why is this author qualified or not qualified to address this topic? • He is an experienced SAS user, as exemplified by the many macros in the book • He addresses issues by presenting examples from different backgrounds
What are the strengths or weaknesses of this book? • The book is clearly written and easy to read • It provides many examples of code, input, and output
Would you recommend this book? If so, who would you recommend it to and for what purpose? • Those who prepare data marts for statistical, data mining, or time series analyses • Those in IT and data warehousing who provide the data used to create data marts • Both new and experienced SAS users who perform data analyses using data marts • Those who prepare data in relational databases with SQL
Does the book achieve its purpose? Absolutely! It enables the reader to: • Understand the business environment in which data preparation occurs • Extract and structure data • Create derived variables from different tables • Program SAS efficiently
What is the best tip or technique addressed in this book? • I learned many new techniques from this book, for example: • Examining the mean math scores by board_mident
Continued…
    proc means data=datalib.boards noprint nway;
       class board_mident;
       var Math_score;
       /* one row per board; AUTONAME builds names such as Math_score_Mean */
       output out=datalib.aggr_static(drop=_type_ _freq_)
              mean= sum= n= std= min= max= / autoname;
    run;
Continued… • To run the analysis by board_mident, we use a CLASS statement. A BY statement could also be used, but the data would first have to be sorted by board_mident • NWAY suppresses the grand total and all other subtotals, so the output data set contains only the rows for the 5 boards, which are the analysis subjects • NOPRINT suppresses the printed results (the output listing, not the log); we only need the output data set, and with many analysis subjects the printed output could run to thousands of descriptive measures • In the OUTPUT statement we specify the statistics to be calculated. The AUTONAME option names the new variables in the form VARIABLENAME_STATISTIC (e.g., Math_score_Mean) • If we want different statistics for different input variables, we can specify them individually on the OUTPUT statement, e.g., SUM(variable)=sum_variable • In the OUTPUT statement we drop the _TYPE_ and _FREQ_ variables; alternatively, we could keep _FREQ_ and omit N from the statistics list, bearing in mind that _FREQ_ counts all observations in each board while N counts only the nonmissing Math_score values • Chapter 18, Multiple Interval-Scaled Observations per Subject, page 183.
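Two of the variations described in these notes can be sketched as follows: replacing CLASS with BY (which needs a prior sort), and requesting different statistics for different variables while keeping _FREQ_ instead of requesting N. This assumes the datalib library and the datalib.boards data set from the example exist; Reading_score is a hypothetical second analysis variable, and the output data set names are illustrative only.

```sas
/* Variation 1: BY-group processing instead of CLASS.
   PROC MEANS with BY needs input sorted by the BY variable. */
proc sort data=datalib.boards out=work.boards_sorted;
   by board_mident;
run;

proc means data=work.boards_sorted noprint;
   by board_mident;            /* one output row per board; NWAY not needed */
   var Math_score;
   output out=work.aggr_by(drop=_type_ _freq_)
          mean= sum= n= std= min= max= / autoname;
run;

/* Variation 2: different statistics for different variables.
   Reading_score is a hypothetical second analysis variable.
   Only _TYPE_ is dropped, so _FREQ_ is kept and N is not requested. */
proc means data=datalib.boards noprint nway;
   class board_mident;
   var Math_score Reading_score;
   output out=work.aggr_custom(drop=_type_)
          mean(Math_score)=math_mean
          std(Math_score)=math_std
          max(Reading_score)=reading_max;
run;
```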
Are there other books (or sources of information) available with similar content? • Yes, but they tend to present the information in bits and pieces, e.g.: • Resources on the internet • The Little SAS Book by Delwiche and Slaughter • If so, how does this book compare? • It offers a comprehensive, well-illustrated presentation of the material