Taming Statistics with Limited Domain Operators. Stephen Mansour, PhD, University of Scranton and The Carlisle Group. Dyalog ’14 Conference, Eastbourne, UK.
Why another Statistical Package?
• Many statistical software packages are already available: Minitab, R, Excel, SPSS.
• Excel has about 87 statistical functions; 6 of them involve the t distribution alone: T.DIST, T.INV, T.DIST.RT, T.INV.2T, T.DIST.2T, T.TEST.
• R has four related functions for each of 20 distributions, for a total of 80 distribution functions.
What does APL have that other statistical packages don’t? Defined operators!
• How can we exploit operators to reduce the explosive number of statistical functions?
• Let’s look at an example . . .
Planning Next Year’s Conference User Meeting
• Typical attendance is about 100 delegates with a standard deviation of 20.
• Assume next year’s conference centre can support up to 130 delegates.
• What are the chances that next year’s attendance will exceed capacity?
Let’s implement this in Excel:
      =1-NORM.DIST(130,100,20,TRUE)
Now let’s use R-Connect in APL:
      +#.∆r.x 'pnorm(⍵,⍵,⍵,⍵)' 130 100 20 0
Wouldn’t it be nice to enter:
      100 20 normal probability > 130
      100 20 (normal probability >) 130
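As a cross-check on the Excel and R calls above, the same tail probability can be computed with Python's standard library. This is only an illustrative sketch, assuming attendance is normally distributed with mean 100 and standard deviation 20; the variable names are hypothetical and not part of the talk.

```python
from statistics import NormalDist

# Assumption: attendance ~ Normal(mean=100, sd=20), as stated on the slide.
attendance = NormalDist(mu=100, sigma=20)

# P(X > 130) is the complement of the cumulative probability at 130.
p_exceed = 1 - attendance.cdf(130)

print(round(p_exceed, 4))  # 0.0668
```

Under these assumptions there is roughly a 6.7% chance that attendance exceeds the capacity of 130.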
APL Syntax showing data, functions, operators
      normal probability < 1.64
      100 20 normal probability between 110 130
      5 0.5 binomial probability = 2
      7 tDist criticalValue < 0.05
      5 chiSquare randomVariable 13
      mean confidenceInterval X
      (SEX='F') proportion hypothesis ≥ 0.5
      GROUPA mean hypothesis = GROUPB
      variance theoretical binomial 5 0.2
Statistics deals primarily with three types of functions:
• Summary Functions
  • Descriptive Statistics
• Probability Distributions
  • Theoretical Models
• Relations
Summary Functions
• Summary functions are of the form:
• They produce a single value from a vector.
• Structurally they are equivalent to g/ where g is a scalar function and the right argument is a simple numeric vector.
• A statistic is a summary function of a sample; a parameter is a summary function of a population.
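The "g/" structure can be sketched in Python with functools.reduce, which folds a two-argument scalar function over a vector. The helper name summary and the sample values are assumptions made for illustration. APL reduction works right to left, whereas reduce works left to right; for the associative functions used here the result is the same.

```python
from functools import reduce
import operator

def summary(g):
    """Build a vector -> scalar function from a binary scalar function g,
    loosely analogous to APL's g/ (hypothetical helper)."""
    return lambda xs: reduce(g, xs)

total = summary(operator.add)   # like +/
largest = summary(max)          # like ⌈/
smallest = summary(min)         # like ⌊/

sample = [2, 0, 3, 4, 3, 1, 0, 2, 0, 4]

# The mean is not a plain reduction, but it has the same shape:
# a numeric vector goes in, a single value comes out.
sample_mean = total(sample) / len(sample)

print(total(sample), largest(sample), smallest(sample), sample_mean)
```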
Examples of Summary Functions
• Measures of central tendency: mean, median, mode
• Measures of spread: variance, standard deviation, range, IQR
• Measures of position: min, max, quartiles, percentiles
• Measures of shape: skewness, kurtosis
Probability Distributions
• Probability distributions are functions defined in a natural way when they are called without an operator:
  • Discrete: probability mass function
  • Continuous: density function
• Left argument is the parameter list.
• Right argument can be any value taken on by the distribution.
• Probability distributions are scalar with respect to the right argument.
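The calling convention described above (parameter list on the left, value on the right, applied item by item to the right argument) might look like the following in Python. The function names mirror the slide's vocabulary, but the signatures are assumptions for illustration, not the workspace's actual definitions.

```python
from math import comb
from statistics import NormalDist

def binomial(params, x):
    """Discrete case: probability mass function. params = (n, p)."""
    n, p = params
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def normal(params, x):
    """Continuous case: density function. params = (mean, sd)."""
    mu, sigma = params
    return NormalDist(mu, sigma).pdf(x)

def each(dist, params, xs):
    """Apply a distribution to every item of the right argument,
    imitating scalar behaviour with respect to that argument."""
    return [dist(params, x) for x in xs]

print(binomial((5, 0.5), 2))               # 0.3125
print(each(binomial, (5, 0.5), range(6)))  # one probability per outcome
print(normal((0, 1), 0))                   # density at the mean
```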
Relational Functions
• Relational functions are dyadic functions whose range is {0,1}.
• 1 = relation is satisfied, 0 otherwise.
• Examples: < ≤ = ≥ > ≠ ∊
      between←{¯1=×/×⍺∘.-⍵}
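The between definition above multiplies the signs of the differences between the value and each bound; the product is ¯1 only when the value lies strictly between them. A rough scalar Python rendering of the same idea follows; sign and less_than are hypothetical helper names.

```python
def sign(v):
    """Signum: -1, 0, or 1 (the role of monadic × in APL)."""
    return (v > 0) - (v < 0)

def between(x, bounds):
    """Return 1 if x lies strictly between the two bounds, else 0."""
    lo, hi = bounds
    # One difference is positive and the other negative exactly when
    # x is strictly inside the interval, so the sign product is -1.
    return int(sign(x - lo) * sign(x - hi) == -1)

def less_than(a, b):
    """An ordinary relation expressed as a {0,1}-valued function."""
    return int(a < b)

print(between(120, (110, 130)))  # 1
print(between(110, (110, 130)))  # 0, endpoints are excluded
print(less_than(1, 2))           # 1
```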
Limited-Domain Operators
• By limiting the domain of an operator to one of the previously-defined functional classifications, we can create an operator to perform statistical analysis.
• For a dyadic operator, each operand can be limited to a particular (but not necessarily the same) functional classification.
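One way to picture a limited-domain dyadic operator is as a higher-order function that accepts only a distribution as one operand and only a relation as the other, and returns a derived function. The sketch below is an assumption-laden Python analogy, not the talk's implementation: the registries DISTRIBUTIONS and RELATIONS, the string encoding of relations, and the use of TypeError in place of an APL DOMAIN ERROR are all invented for illustration.

```python
from statistics import NormalDist

def normal(params):
    """Distribution operand: parameter list -> cumulative distribution function."""
    mu, sigma = params
    return NormalDist(mu, sigma).cdf

# Hypothetical registries that define each operand's permitted domain.
DISTRIBUTIONS = {normal}
RELATIONS = {"<", ">", "between"}

def probability(dist, relation):
    """Dyadic 'operator': distribution on the left, relation on the right."""
    if dist not in DISTRIBUTIONS:
        raise TypeError("left operand must be a probability distribution")
    if relation not in RELATIONS:
        raise TypeError("right operand must be a relation")

    def derived(params, x):
        cdf = dist(params)
        if relation == "<":
            return cdf(x)
        if relation == ">":
            return 1 - cdf(x)
        lo, hi = x                 # "between" takes a pair of bounds
        return cdf(hi) - cdf(lo)

    return derived

below = probability(normal, "<")
inside = probability(normal, "between")

print(round(below((0, 1), 1.64), 4))
print(round(inside((100, 20), (110, 130)), 4))
```

A single operator paired with a handful of relations covers cases that would otherwise need a separate named function for each tail or interval.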
This is about design and syntax, not implementation
• Most functions and operators can easily be written in APL.
• Internals are not important to the user.
• The R interface can be used if necessary for statistical distributions.
• Correct nomenclature and ease of use are critical.
Data Representation
A sample can be represented by raw data, a frequency distribution, or sample statistics. The following items are interchangeable as arguments to the limited domain operators above:
• Raw data: Vector
• Frequency Distribution: Matrix
• Summary Statistics: PropertySpace
Examples of Data Representation
      D
2 0 3 4 3 1 0 2 0 4
Matrix: Frequency Distribution
      ⎕←FT←frequency D
0 3
1 1
2 2
3 2
4 2
      mean D
1.9
      variance D
2.5444
Namespace: Sample Statistics
      PS←⎕NS ''
      PS.count←10
      PS.mean←1.9
      PS.variance←2.544
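To see why the three representations are interchangeable, the following Python sketch recomputes the count, mean, and sample variance from the frequency table alone and compares them with the values obtained from the raw vector. The helper stats_from_frequency and the dictionary standing in for the namespace are assumptions for illustration.

```python
from collections import Counter
from statistics import mean, variance

D = [2, 0, 3, 4, 3, 1, 0, 2, 0, 4]

# Frequency distribution as (value, count) pairs, like the two-column matrix FT.
FT = sorted(Counter(D).items())

def stats_from_frequency(table):
    """Recover count, mean, and sample variance from a frequency table."""
    n = sum(f for _, f in table)
    m = sum(x * f for x, f in table) / n
    v = sum(f * (x - m) ** 2 for x, f in table) / (n - 1)
    return {"count": n, "mean": m, "variance": v}

# A plain dict plays the role of the namespace of sample statistics.
PS = stats_from_frequency(FT)

print(FT)
print(mean(D), round(variance(D), 4))
print(PS["count"], round(PS["mean"], 4), round(PS["variance"], 4))
```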
Implementation
• )LOAD TamingStatistics
  • All APL version
• )LOAD TamingStatisticsR
  • Third party – Must install R (Free)
Conclusion
• There are many statistical packages out there; some, like R, can be used with APL.
• Operator syntax is unique to APL.
• R can be called directly from APL using RCONNECT, but APL operator syntax is easier to understand.