
CCPR Computing Services More Efficient Programming July 13, 2006





  1. CCPR Computing Services More Efficient Programming July 13, 2006

  2. Outline • Thinking through a programming task • Ways of efficiently documenting and organizing your project • Naming variables, programs, files • Commenting code • Including file header • Implementing directory structure • Programming constructs • Raw data -> finished product: are your results replicable?

  3. Before you start coding… • Think • Clearly define the problem in writing • Write down the solution/algorithm in English • Modularity • Create test (if reasonable) • Translate one section to code • Test the section thoroughly • Translate/Test next section, etc.

  4. Documentation - File Header • Each do-file/program/file you create should include: • Your name • Project name • Project location • Date • Software Version • Purpose of program • Inputs, Outputs • Special Notes
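The header fields listed above could be laid out as a comment block at the top of a do-file. This is only a sketch: the field labels follow the slide, the parenthesized values are placeholders to fill in, and the `version 9` line is an assumption about which Stata release is in use.

```stata
*********************************************************************
* Author:           (your name)
* Project:          (project name)
* Project location: (top-level project folder)
* Date:             (date written or last revised)
* Software version: Stata 9 (assumed; change to the release you run)
* Purpose:          (one or two sentences on what this do-file does)
* Inputs:           (files read)
* Outputs:          (files written)
* Special notes:    (anything the next reader needs to know)
*********************************************************************
* Ask Stata to interpret the commands below as the stated release did
version 9
```

Placing the block after `log using` means the header is also copied into the log file.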

  5. Naming Files, Variables, and Functions • Use language standard (if it exists) • Be aware of language-specific rules • Max length, underscore, case, reserved words • Differentiating log files: • Programs MergeHH.sas, MergeHH.do • Log files MergeHHsas.log, MergeHHsta.log • Meaningful variable names: • LogWt vs. var1 • AgeLt30 vs. x • Procedure that cleans missing values of Age: • fixMissingAge • Matrix multiplication X transpose times X • matXX

  6. Commenting Code • Good code is self-commenting • Naming conventions, structure/formatting, header should explain 95% • Comments should explain • Purpose of code, not every detail • Tricks used • Reasons for unusual coding • Comments do not • fix sloppy code • translate syntax • If it takes longer to read the comment than to read the code, don’t add a comment!

  7. Commenting Code - Stata example
     Compare formatting, comments, variable names, and function names.

     SAMPLE 1
       program def function1
           foreach v of varlist _all {
               local x = lower("`v'")
               if `"`v'"' != `"`x'"' {
                   rename `v' `=lower("`v'")'
               }
           }
       end

     SAMPLE 2
       *Convert names in dataset to lowercase.
       program def lowerVarNames
           foreach v of varlist _all {
               local LowName = lower("`v'")
               if `"`v'"' != `"`LowName'"' {
                   rename `v' `=lower("`v'")'
               }
           }
       end

  8. Directory Structure • A project consists of many different types of files • Use folders to separate files in a logical way • Be consistent across projects if possible • ATTIC folder for older versions

  9. Stata example: using directory structure
       ** Paths:
       global parentpath "C:\Documents and Settings\piersol\Summer06\prog\progtips"
       global pgmsloc "$parentpath\pgms"
       global logsloc "$parentpath\logs"
       global cleandataloc "$parentpath\data\clean"
       global rawdataloc "$parentpath\data\raw"

       capture log close
       log using "$logsloc\test200607", text replace
       *********************************************************************
       *INSERT FILE HEADER HERE...then it’s included in log file.
       *********************************************************************
       macro list
       webuse union, clear
       save "$rawdataloc\union.dta", replace
       *keep idcode year age grade
       save "$cleandataloc\unionLJP.dta", replace
       log close

  10. Programming Constructs • Tools to simplify and clarify your coding • Available in virtually all languages • Constructs • Loops - for, foreach, do, while • If/elseif/else - if, then, else, case • continue • exit
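The later slides show foreach and forvalues but not while. As a minimal sketch (the counter name `i` and the message text are illustrative only), a while loop in Stata repeats its body for as long as the condition holds, and the counter has to be updated by hand:

```stata
* Count from 1 to 3; the loop ends once i exceeds 3
local i = 1
while `i' <= 3 {
    display "Iteration `i'"
    * Without this update the condition would stay true forever
    local i = `i' + 1
}
```

When the number of iterations is known in advance, forvalues or foreach is usually the simpler choice, since there is no counter to forget.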

  11. Loop Example 1
      • Problem: Given 4 indicator variables (south, union, black, not_smsa) and 2 discrete variables (age, grade), generate 8 new indicator variables:
          • south_age21 = south and age > 21
          • south_gr12 = south and grade > 12
          • Similarly for union, black, not_smsa
      • Solution without loop: 8 lines of code similar to:
          generate newvar = (south==1 & age>21 & age<.)
          generate newvar = (south==1 & grade>12 & grade<.)
      • Solution with loop:
          foreach j in south union black not_smsa {
              gen `j'_age21 = (age>21 & age<. & `j'==1)
              gen `j'_gr12 = (grade>12 & grade<. & `j'==1)
          }

  12. Loop Example 1, cont.
        *CHECK GENERATED VARIABLES AGAINST ORIGINAL VARIABLES
        foreach j in south union black not_smsa {
            qui count if `j'==1 & age>21 & age<.
            local origCount = r(N)
            qui count if `j'_age21==1
            if `origCount' ~= `r(N)' {
                display "Counts do not match for `j'_age21!"
            }
            else display "Counts match for `j'_age21."

            qui count if `j'==1 & grade>12 & grade<.
            local origCount = r(N)
            qui count if `j'_gr12==1
            if `origCount' ~= `r(N)' {
                display "Counts do not match for `j'_gr12!"
            }
            else display "Counts match for `j'_gr12."
        }
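An alternative to comparing counts is Stata's `assert` command, which stops the do-file with an error if the expression is false for any observation. The sketch below assumes the `union` example dataset loaded with `webuse` (as in the directory-structure slide) and rebuilds the indicators first so that it runs on its own; it is one possible check, not necessarily the one the presenter had in mind.

```stata
* Load the example data (requires an internet connection)
webuse union, clear

foreach j in south union black not_smsa {
    generate `j'_age21 = (age>21 & age<. & `j'==1)
    generate `j'_gr12  = (grade>12 & grade<. & `j'==1)

    * Observation-by-observation check: execution halts here with an
    * error if any record disagrees with the definition
    assert `j'_age21 == (`j'==1 & age>21 & age<.)
    assert `j'_gr12  == (`j'==1 & grade>12 & grade<.)
}
```

Unlike the count comparison, this checks every record rather than a total, and a failure cannot scroll past unnoticed in the log.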

  13. Loop Example 2
      • Given indicator variables white, black, other, and continuous variable educyrs, create interaction variables
      • Solution using loop:
          local allraces "white black other"
          foreach race of varlist `allraces' {
              generate `race'_educ=`race'*educyrs
          }

  14. Loop Example 3
      • Problem:
          • Dataset contains variables over multiple years (1970-1990)
          • Need to perform a number of commands separately for 1970, 1975, 1980, 1985.
      • Solution without loop:
          bysort year: command1 if year==70 | year==75 | year==80 | year==85
          bysort year: command2 if year==70 | year==75 | year==80 | year==85
      • Solution with loop:
          foreach year in 70 75 80 85 {
              di as result "***Regression for year = `year':"
              regress ln_wage grade tenure ttl_exp if year==`year'
              di as result "***Summarize for year = `year':"
              summarize ln_wage if year==`year'
          }
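Because the years in this example are evenly spaced, the same loop can also be written with forvalues and a step size. This sketch assumes the `nlswork` example dataset, which contains the variables named in the slide; the slide itself does not say which dataset was used.

```stata
* Load example data containing ln_wage, grade, tenure, ttl_exp, year
* (assumed dataset; requires an internet connection)
webuse nlswork, clear

* 70(5)85 expands to 70 75 80 85
forvalues year = 70(5)85 {
    display as result "***Regression for year = `year':"
    regress ln_wage grade tenure ttl_exp if year==`year'
}
```

foreach with an explicit list remains the better fit when the values are irregular.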

  15. Loop Example 4 – pulling from 2 lists
      • From Stata FAQ website
      Code:
          local agrp "cat dog cow pig"
          local bgrp "meow woof moo oinkoink"
          local n : word count `agrp'
          forvalues i = 1/`n' {
              local a : word `i' of `agrp'
              local b : word `i' of `bgrp'
              di "`a' says `b'"
          }
      Resulting output:
          cat says meow
          dog says woof
          cow says moo
          pig says oinkoink

  16. Constructs - If/then/else
      • Execute section of code if condition is true:
          if condition then
              {execute this code if condition true}
          end
      • Execute one of two sections of code:
          if condition then
              {execute this code if condition true}
          else
              {execute this code if condition false}
          end

  17. If/Else Example
      • Problem: need to execute commands on an operating system, but only if the OS is Unix; the commands will fail if the OS is anything else
      • Solution:
          if "`c(os)'"~="Unix" {
              di as err "Sorry; this section requires Unix OS."
          }
          else {
              ** continue with unix commands…
          }

  18. Constructs - Elseif/case
      • Elseif - Execute one of many sections of code:
          if condition1 then
              {execute this code if condition1 true}
          elseif condition2 then
              {execute this code if condition2 true}
          else
              {execute this code if condition1, condition2 are all false}
          end
      • Case - same idea, different name:
          case condition1 then
              {execute this code if condition1 true}
          case condition2 then
              {execute this code if condition2 true}
          etc.

  19. Elseif Example
      • Problem: Continue example from if…else, but execute different section of code for Unix, Windows, and Mac
      • Solution:
          if "`c(os)'"=="Unix" {
              di "This is a Unix environment"
          }
          else if "`c(os)'" == "Windows" {
              di "This is a Windows environment"
          }
          else if "`c(os)'" =="MacOSX" {
              di "This is a MacOS environment."
          }
          else {
              di as err "`c(os)' not recognized."
          }

  20. Stata - If command vs. if qualifier
      • The if command was designed to be used with a single expression
      • Example:
          • Given variable x with 5 observations: 1, 1, 2, 1, 3
          • Compare the following three pieces of Stata code:
              if x==2 {
                  replace x=99
              }

              if x==1 {
                  replace x=99
              }

              replace x=99 if x==2

  21. Stata- If command vs. if qualifier
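The difference can be seen by running the three pieces of code on the five-observation example. The sketch below is not taken from the slides: it builds the data with `input` and runs the qualifier version second, so that each step starts from data the previous step has not already overwritten.

```stata
* Build the example data: x = 1, 1, 2, 1, 3
clear
input x
1
1
2
1
3
end

* if command: the condition is evaluated once, using the first
* observation. x[1] is 1, so x==2 is false and nothing is replaced.
if x==2 {
    replace x = 99
}
list

* if qualifier: the condition is checked for every observation,
* so only the third observation becomes 99.
replace x = 99 if x==2
list

* if command again: x[1]==1 is true, so replace runs with no
* restriction and every observation becomes 99.
if x==1 {
    replace x = 99
}
list
```

In short, the if command decides whether a block of commands runs at all, while the if qualifier selects which observations a single command applies to.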

  22. Constructs -- Continue
      Example from Stata online help
      • Continue is used to exit the current iteration of a loop and continue with the next iteration
      • The following two loops produce the same result:
          forvalues x = 1/10 {
              if mod(`x',2)==1 {
                  display "`x' is odd"
                  continue
              }
              display "`x' is even"
          }

          forvalues x = 1/10 {
              if mod(`x',2)==1 {
                  display "`x' is odd"
              }
              else {
                  display "`x' is even"
              }
          }

  23. Constructs – Exit • Stop execution of program • Examples: • Do-file contains a number of data checks followed by analysis commands. If data checks reveal something unacceptable, you can exit out of do-file before running analysis. • Program requires user input. If user enters “bad” information, need to quit program. • Debugging. If particular error occurs then break. • Check denominator prior to dividing. If equals zero, exit.
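The first use case, data checks that guard the analysis, might look like the following sketch. The `auto` dataset, the particular check on `price`, and the return code 459 are illustrative assumptions rather than anything specified in the slides.

```stata
* Example data shipped with Stata
sysuse auto, clear

* Data check: price should be present and positive for every car
quietly count if missing(price) | price <= 0
if r(N) > 0 {
    display as error "Found " r(N) " bad price values; stopping before analysis."
    * Leave the do-file with a nonzero return code
    exit 459
}

* Reached only when the check above passes
regress price mpg weight
```

Exiting with a nonzero code means any do-file that calls this one also stops, rather than continuing with data that failed the check.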

  24. Raw data to finished product: Raw data -> Analysis data -> Runs/results -> Finished product

  25. Raw Data -> Analysis Data • Always have two distinct data files- the raw data and analysis data • A program should completely re-create analysis data from raw data • NO interactive changes!! Final changes must go in a program!!

  26. Raw Data -> Analysis Data • Document all of the following: • Outliers? • Errors? • Missing data? • Changes to the data? • Remember to check- • Consistency across variables • Duplicates • Individual records, not just summary stats • “Smell tests”
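Several of these checks map onto standard Stata commands. The sketch below assumes the `union` example dataset and treats `idcode` and `year` as the record identifier; both are illustrative choices, and the right checks depend on the data at hand.

```stata
* Load the example data (requires an internet connection)
webuse union, clear

* Missing data: how many observations lack age or grade?
count if missing(age)
count if missing(grade)

* Duplicates: report repeated person-year records
duplicates report idcode year

* Smell test: do the ranges look plausible?
summarize age grade

* Individual records, not just summary statistics
list idcode year age grade in 1/10
```

Keeping these commands in the do-file that builds the analysis data means the answers are recorded in the log each time the data are re-created.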

  27. Analysis Data -> Results • All results should be produced by a program • Program should use analysis data (not raw) • Have a “translation” of raw variable names -> analysis variable names -> publication variable names

  28. Analysis Data -> Results • Document- • How were variances estimated? Why? • What algorithms were used and why? Were results robust? • What starting values were used? Was convergence sensitive? • Did you perform diagnostics? Include in programs/documentation.

  29. Log files • Your log file should tell a story to the reader. • As you print results to the log file, include words explaining the results • Include not only what your code is doing, but your reasoning and thought process • Don’t output everything to the log file; use quietly and noisily in a meaningful way.
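One way to combine these points is to suppress routine output with a `quietly` block and use `noisily display` for the single line that matters, with words around the number. The dataset and regression here are assumptions chosen only for illustration.

```stata
* Example data shipped with Stata
sysuse auto, clear

quietly {
    * The full regression table is suppressed
    regress price mpg weight

    * Only this explanatory line reaches the log
    noisily display "Change in price per extra mpg, holding weight fixed: " %9.2f _b[mpg]
}
```

The log then contains one readable sentence instead of a full regression table the reader has to interpret.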

  30. Project Clean-up • Create a zip file that contains everything necessary for complete replication • Use a readme.txt file to describe zip contents • Delete/archive unused or old files • Include any referenced files in zip • When you have a final zip archive containing everything: • Open it in its own directory and run the script • Check that all the results match

  31. Questions/Feedback
