Taming Statistics with Limited Domain Operators

Taming Statistics with Limited Domain Operators. Stephen Mansour, PhD, University of Scranton and The Carlisle Group. Dyalog ’14 Conference, Eastbourne, UK. Why another Statistical Package? Many statistical software packages out there: Minitab, R, Excel, SPSS


Presentation Transcript


  1. Taming Statistics with Limited Domain Operators • Stephen Mansour, PhD, University of Scranton and The Carlisle Group • Dyalog ’14 Conference, Eastbourne, UK

  2. Why another Statistical Package? • There are many statistical software packages out there: Minitab, R, Excel, SPSS • Excel has about 87 statistical functions. Six of them involve the t distribution alone: T.DIST, T.INV, T.DIST.RT, T.INV.2T, T.DIST.2T, T.TEST • R has four related functions for each of 20 distributions, giving a total of 80 distribution functions alone

  3. What does APL have that other statistical packages don’t? Defined Operators! • How can we exploit operators to reduce the explosive number of statistical functions? • Let’s look at an example . . .

  4. Planning Next Year’s Conference User Meeting • Typical attendance is about 100 delegates with a standard deviation of 20. • Assume next year’s conference centre can support up to 130 delegates. • What are the chances that next year’s attendance will exceed capacity?

  5. Let’s implement this in Excel:
      =1-NORM.DIST(130,100,20,TRUE)
     Now let’s use R-Connect in APL:
      +#.∆r.x 'pnorm(⍵,⍵,⍵,⍵)' 130 100 20 0
     Wouldn’t it be nice to enter:
      100 20 normal probability > 130
      100 20 (normal probability >) 130
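As a cross-check of the Excel and R calls above, here is a minimal Python sketch using only the standard library. It assumes, as the slides do, that attendance is normally distributed with mean 100 and standard deviation 20; the variable names are illustrative and not part of the presentation.

```python
from statistics import NormalDist

# Attendance model from the slide: mean 100 delegates, standard deviation 20.
attendance = NormalDist(mu=100, sigma=20)

# P(attendance > 130) is the upper tail, i.e. 1 minus the cumulative probability.
p_exceed = 1 - attendance.cdf(130)

print(round(p_exceed, 4))  # 0.0668
```

So, under this model, there is roughly a 6.7% chance that attendance exceeds capacity.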

  6. APL syntax showing data, functions, and operators:
      normal probability < 1.64
      100 20 normal probability between 110 130
      5 0.5 binomial probability = 2
      7 tDist criticalValue < 0.05
      5 chiSquare randomVariable 13
      mean confidenceInterval X
      (SEX='F') proportion hypothesis ≥ 0.5
      GROUPA mean hypothesis = GROUPB
      variance theoretical binomial 5 0.2

  7. Statistics deals primarily with three types of functions: • Summary Functions (Descriptive Statistics) • Probability Distributions (Theoretical Models) • Relations

  8. Summary Functions • Summary functions are of the form: [formula not preserved in the transcript] • They produce a single value from a vector. • Structurally they are equivalent to g/ where g is a scalar function and the right argument is a simple numeric vector. • A statistic is a summary function of a sample; a parameter is a summary function of a population.
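The "g/" structure can be illustrated outside APL with a fold. The sketch below is in Python rather than APL and is only an analogy: `summary` is a hypothetical helper name, and Python's `reduce` folds left to right whereas APL reduction works right to left, which makes no difference for the associative functions used here.

```python
from functools import reduce
import operator

def summary(g):
    """Build a summary function that folds the binary function g over a vector."""
    return lambda xs: reduce(g, xs)

total = summary(operator.add)  # analogous to +/
largest = summary(max)         # analogous to ⌈/
smallest = summary(min)        # analogous to ⌊/

data = [2, 0, 3, 4, 3, 1, 0, 2, 0, 4]
mean = total(data) / len(data)                # single value from a vector
value_range = largest(data) - smallest(data)  # another single-valued summary
```

Each summary collapses the whole vector to one number, which is what separates this class from scalar functions applied item by item.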

  9. Examples of Summary Functions • Measures of central tendency: mean, median, mode • Measures of spread: variance, standard deviation, range, IQR • Measures of position: min, max, quartiles, percentiles • Measures of shape: skewness, kurtosis

  10. Probability Distributions • Probability Distributions are functions defined in a natural way when they are called without an operator: • Discrete: probability mass function • Continuous: density function • Left argument is parameter list • Right argument can be any value taken on by the distribution. • Probability Distributions are scalar with respect to the right argument.
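A rough Python sketch of this calling convention follows: the parameter list comes first (the "left argument") and the value second (the "right argument"). The function names mirror the slides, but the bodies are assumptions based on the textbook definitions, not the workspace's code, and `each` is a hypothetical helper standing in for APL's scalar extension over the right argument.

```python
from math import comb
from statistics import NormalDist

def binomial(params, x):
    """Discrete case: probability mass function; params is (n, p)."""
    n, p = params
    return comb(n, x) * p**x * (1 - p)**(n - x)

def normal(params, x):
    """Continuous case: density function; params is (mean, standard deviation)."""
    mu, sigma = params
    return NormalDist(mu, sigma).pdf(x)

def each(distribution, params, xs):
    """Apply a distribution to every value in xs, mimicking scalar extension."""
    return [distribution(params, x) for x in xs]

pmf_values = each(binomial, (5, 0.5), range(6))  # mass at 0, 1, ..., 5
density_at_mean = normal((0, 1), 0)              # peak of the standard normal curve
```

Called without an operator, a discrete distribution returns a probability and a continuous one returns a density, so only the former can be read directly as a chance.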

  11. Probability Distributions (Discrete)

  12. Probability Distributions (Continuous)

  13. Relational Functions • Relational functions are dyadic functions whose range is {0,1} • 1 = relation is satisfied, 0 otherwise. • Examples: < ≤ = ≥ > ≠ ∊ and between, defined as between←{¯1=×/×⍺∘.-⍵}
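The `between` definition can be read as follows: subtract each bound from the value, take the signs, and multiply them; the product is ¯1 exactly when the value lies strictly between the bounds. Below is a scalar-only Python sketch of the same idea. It does not reproduce the outer product over vectors, and the `sign` helper is an assumption added for illustration.

```python
def sign(v):
    """Signum: -1, 0, or 1, like monadic × in APL."""
    return (v > 0) - (v < 0)

def between(x, bounds):
    """Return 1 if x lies strictly between the two bounds, else 0."""
    lo, hi = bounds
    # The signs of (x - lo) and (x - hi) differ only when lo < x < hi.
    return int(sign(x - lo) * sign(x - hi) == -1)

inside = between(120, (110, 130))       # 1
at_endpoint = between(110, (110, 130))  # 0: the sign product is 0, not -1
outside = between(140, (110, 130))      # 0
```

Read this way, the relation is strict: a value equal to either bound gives 0.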

  14. Limited-Domain Operators • By limiting the domain of an operator to one of the previously-defined functional classifications, we can create an operator to perform statistical analysis. • For a dyadic operator, each operand can be limited to a particular (but not necessarily the same) functional classification.
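To make the idea concrete, here is a small Python sketch of a dyadic "operator" as a higher-order function whose operands are checked against functional classifications. Everything in it is hypothetical: the registries, the error messages, and the restriction to a binomial-style support are illustrative assumptions, not the design of the TamingStatistics workspace.

```python
import operator
from math import comb

def binomial(params, x):
    """Binomial probability mass function; params is (n, p)."""
    n, p = params
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical registries defining the permitted domain of each operand.
DISTRIBUTIONS = {binomial}
RELATIONS = {operator.lt, operator.le, operator.eq,
             operator.ge, operator.gt, operator.ne}

def probability(distribution, relation):
    """Derive a function giving P(X relation value) for a discrete distribution."""
    if distribution not in DISTRIBUTIONS:
        raise TypeError("left operand must be a probability distribution")
    if relation not in RELATIONS:
        raise TypeError("right operand must be a relation")

    def derived(params, value):
        n, _ = params  # assumes a binomial-style support of 0..n
        return sum(distribution(params, k)
                   for k in range(n + 1) if relation(k, value))
    return derived

p_equal = probability(binomial, operator.eq)((5, 0.5), 2)    # P(X = 2)
p_at_most = probability(binomial, operator.le)((5, 0.5), 2)  # P(X <= 2)
```

One operator combined with six relations covers what would otherwise need a separate named function for each tail and each distribution, which is the reduction in function count the presentation argues for.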

  15. Limited Domain Operators

  16. This is about design and syntax, not implementation • Most functions and operators can easily be written in APL. • Internals are not important to the user. • The R interface can be used if necessary for statistical distributions. • Correct nomenclature and ease of use are critical.

  17. Data Representation A sample can be represented by raw data, a frequency distribution, or sample statistics. The following items are interchangeable as arguments to the limited domain operators above: • Raw data: Vector • Frequency Distribution: Matrix • Summary Statistics: PropertySpace

  18. Examples of Data Representation
      Vector: Raw Data
            D
      2 0 3 4 3 1 0 2 0 4
      Matrix: Frequency Distribution
            ⎕←FT←frequency D
      0 3
      1 1
      2 2
      3 2
      4 2
            mean D
      1.9
            variance D
      2.5444
      Namespace: Sample Statistics
            PS←⎕NS ''
            PS.count←10
            PS.mean←1.9
            PS.variance←2.544
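The three representations can be checked against each other with a short Python sketch. It uses the raw data from the slide; the `SimpleNamespace` object is only a stand-in for an APL namespace, and all variable names are illustrative.

```python
from collections import Counter
from statistics import mean, variance
from types import SimpleNamespace

# Raw data: a plain vector of observations.
D = [2, 0, 3, 4, 3, 1, 0, 2, 0, 4]

# Frequency distribution: (value, count) rows, sorted by value.
FT = sorted(Counter(D).items())

# Summary statistics: a namespace-like object holding count, mean, and variance.
PS = SimpleNamespace(count=len(D), mean=mean(D), variance=variance(D))

# The mean can be recovered from the frequency table alone.
mean_from_FT = sum(v * c for v, c in FT) / sum(c for _, c in FT)

print(FT)                     # [(0, 3), (1, 1), (2, 2), (3, 2), (4, 2)]
print(PS.mean)                # 1.9
print(round(PS.variance, 4))  # 2.5444
```

Because the table and the summary each carry enough information to recover the mean, an operator can accept any of the three forms as its argument.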

  19. Implementation • )LOAD TamingStatistics: all-APL version • )LOAD TamingStatisticsR: uses third-party software; R (free) must be installed

  20. Conclusion • There are many statistical packages out there; some, like R, can be used with APL. • Operator syntax is unique to APL. • R can be called directly from APL using RCONNECT, but APL operator syntax is easier to understand.
