Literate programming with SAS - and other languages
Søren Højsgaard
Faculty of Agricultural Sciences, Aarhus University, Denmark
SASforum, May 2009, Copenhagen
Take-home message
• Literate programming: Combining text, code and results in one document
• Supports text formats:
  • LaTeX / OpenOffice (OpenDocument Text)
• In combination with the "engines"
  • SAS, R, S-plus, Maple, Stata, …
• Ensures reproducibility of analysis
• Great help in "recalling what I did 2 months ago"
• StatWeave does all this – and is free…
• This talk: Focus on StatWeave with OpenOffice and SAS/R …
Overview – Combining code, documentation and results
• Source document: writing; SAS statements; more writing; R statements; even more writing; more SAS statements; more writing…
• Final document: writing; SAS statements; SAS output; SAS graphics; more writing; R statements; R output; even more writing; SAS statements; SAS output; more writing…
What is literate programming?
• Knuth (1979) coined the term literate programming:
  • Create software as works of literature:
  • Embed source code into descriptive text (rather than the opposite, which is common practice)
  • Software should follow flow of thoughts and logic
  • Should be designed to be readable by humans (and not only by compilers / programs).
• Very useful idea in statistics…
Why literate programming?
• Reproducible statistical analysis
  • Research, consulting
  • Document exactly what has been done
  • Possible to re-run if data change
• Manuals, course notes etc.
  • Shown output guaranteed to be result of shown code
Some systems for literate programming
• Comments inside code
• WEB (Knuth 1979) and friends
• Sweave (Leisch 2002)
  • R code in LaTeX documents
• odfWeave (Kuhn and Coulter 2007)
  • R code in OpenOffice documents
• SASweave (Lenth and Højsgaard 2007)
  • SAS / R code in LaTeX documents
• StatWeave
  • SAS / R / Maple / S-plus / Stata … code in LaTeX and OpenOffice documents
StatWeave
• StatWeave created by Russ Lenth, University of Iowa, USA
• Available: http://www.cs.uiowa.edu/~rlenth/StatWeave/
• StatWeave is still under development, but is becoming mature and stable.
• StatWeave design goals
  • Support many languages
    • R, S-plus, SAS, Stata, Maple, …
  • Support different word processing systems, currently
    • LaTeX
    • OpenDocument Text (ODT) www.openoffice.org
  • Portability: Usable on all platforms (written in Java)
  • Extensible:
    • Add other languages
Under the hood of StatWeave
• Source file is regular text document but with code chunks added (with special tags)
• Two basic operations
  • Weaving: Process source file into single document with code listings, output listings, graphs…
  • Tangling: Extract code from source file to run later
• Weaving is useful for reproducible statistical analysis
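The two operations can be illustrated with a toy sketch in Python. It is not StatWeave's implementation: the `<<engine>>` ... `<<end>>` markers, the function names and the `fake_run` stand-in are all assumptions made for this example, and no real engine is called.

```python
import re

# Hypothetical chunk markers for this sketch only; StatWeave's real tags differ.
CHUNK = re.compile(r"<<(\w+)>>\n(.*?)\n<<end>>", re.DOTALL)

def tangle(source):
    """Extract the code chunks, in document order, as (engine, code) pairs."""
    return [(m.group(1), m.group(2)) for m in CHUNK.finditer(source)]

def weave(source, run):
    """Replace each chunk with its code listing followed by its output.

    `run(engine, code)` is a caller-supplied stand-in for a real engine.
    """
    def render(m):
        engine, code = m.group(1), m.group(2)
        return f"[{engine} code]\n{code}\n[{engine} output]\n{run(engine, code)}"
    return CHUNK.sub(render, source)

source = "Some writing.\n<<R>>\n1 + 1\n<<end>>\nMore writing."

def fake_run(engine, code):
    # Pretend engine: returns a fixed string instead of calling R.
    return "2"

woven = weave(source, fake_run)
```

Here `tangle(source)` returns only the code, while `woven` keeps the surrounding text and places the listing and the (pretend) output where the chunk stood.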
Running StatWeave
• Command-line interface:
  statweave SAS-HelloWorld-swv.odt
  statweave --tangle SAS-HelloWorld-swv.odt
  statweave --keepall SAS-HelloWorld-swv.odt
• Graphical User Interface:
• Generally, source xxx-swv.odt becomes output xxx.odt
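The naming rule for the output file can be written out as a small Python helper. This is only an illustration of the rule as stated on the slide; `woven_name` is a hypothetical name, and StatWeave itself may handle other cases differently.

```python
from pathlib import Path

def woven_name(source):
    """Map xxx-swv.odt to xxx.odt, following the naming rule stated above."""
    path = Path(source)
    if not path.stem.endswith("-swv"):
        raise ValueError(f"expected a name ending in -swv: {source}")
    # Drop the "-swv" marker from the stem and keep the original extension.
    return str(path.with_name(path.stem[: -len("-swv")] + path.suffix))

output = woven_name("SAS-HelloWorld-swv.odt")
```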
Chicken weight data
• Set global options (for SAS code)
• Inline evaluation of expressions
… chicken weight data
• Output can be saved for later use
• - and display
Code reuse and argument substitution
• Save code chunks for later execution
• Pass arguments to code chunks
• Simplest case: Not unlike a macro…
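The macro-like idea of saving a chunk and filling in arguments later can be sketched in Python with `string.Template`. The `$placeholder` notation, the `saved_chunks` store and the `recall` function are assumptions of this sketch, not StatWeave's syntax; the SAS text is only a string here and is never executed.

```python
from string import Template

# Hypothetical store of saved chunks; names and $placeholders are illustrative.
saved_chunks = {
    "means": Template("proc means data=$data;\n  var $var;\nrun;"),
}

def recall(name, **args):
    """Return the saved chunk with its placeholders filled in."""
    return saved_chunks[name].substitute(**args)

code = recall("means", data="chicken", var="weight")
```

A missing argument raises `KeyError`, so a chunk with an unfilled placeholder fails early instead of being passed on to an engine.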
…code reuse and argument substitution
• Customize display and output (tables) by reusable code chunk
Multi-language example: SAS, R and DOS together
• Can use different engines in the same source file
  • Use SAS when appropriate; use R when appropriate; use Maple when appropriate…
• Weaving:
  • SAS/R/XX chunks assembled into separate code files.
  • Code files are processed in order of first appearance in the source file
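The assembly step described above can be sketched in Python as follows. This is a simplified model, not StatWeave's code: `assemble` and the sample chunks are made up for illustration.

```python
def assemble(chunks):
    """Group (engine, code) chunks into one code file per engine.

    Dicts keep insertion order, so the engines come out in order of first
    appearance in the source file.
    """
    files = {}
    for engine, code in chunks:
        files.setdefault(engine, []).append(code)
    return {engine: "\n".join(parts) for engine, parts in files.items()}

chunks = [
    ("SAS", "data a; x = 1; run;"),
    ("R", "x <- 1"),
    ("SAS", "proc print data=a; run;"),
]
files = assemble(chunks)
```

In this model both SAS chunks end up in one file that is processed before the R file, even though the second SAS chunk comes after the R chunk in the source. That ordering is why dependencies between engines need care.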
…Multi-language example: SAS, R and DOS together
• Synchronization issue: SAS chunk depends on data from R chunk which depends on data from SAS chunk….
• Solution: The restart option will restart the engines
Code chunks are processed as a whole
• Code chunks are processed as a "unit", so in general one cannot split a call to proc xxxx over several chunks:
• Thus the following is illegal
Odds and ends – Maple • Differentiate y= sin(x) xxx • Output is ugly, but it reads:
Odds and ends – calling the shell
• Want to list all StatWeave / OpenOffice source files: *-swv.odt
Summary
• Reproducible statistical analyses
• Integrate text, code and results in one document
  • Several text formats
  • Several languages
• This talk (and the examples) are available at http://genetics.agrsci.dk/~sorenh/misc/
• All credit is due to Russ Lenth, the creator of StatWeave. Thanks!!!!