280 likes | 404 Views
Teaching Astronomical Data Analysis in Python. Prof. Joseph Harrington Department of Physics University of Central Florida Orlando, Florida. Outline. Teaching how to do research My class NumPy doc project My (better) class Examples of early assignments
E N D
Teaching Astronomical Data Analysis in Python Prof. Joseph Harrington Department of Physics University of Central Florida Orlando, Florida
Outline Teaching how to do research My class NumPy doc project My (better) class Examples of early assignments Future thoughts on teaching research students
Grad Aps and Career Prep • What do we need to do to prepare students for a research career? Want them to: • Be clear on research as a career • Show career direction • Field area • Experiment or theory? • Demonstrate research capacity, creativity, initiative • “Nice grades, but can she work?” YES! • Know enough to make good school choice • Contact future grad advisors or employers • The best way to prepare to do something is to do it!
Undergrad Research “Sequence” Y1-2: General tasks, 1-2 projects Y3: Specific task leads Y3-4: Talks at conferences, paper co-authorship Y4: Top students lead a paper (advisor's holy grail)
Preparing Students For Research • Need some basic skills to start: • computer programming • Data handling • Statistics & Error analysis • Modeling • Data presentation • Need to start before most classes taken! • One-on-one training by faculty is time-consuming • Many research supervisors not current on computational techniques • Need an intro course!
Joe's Pet Theory of Learning • Expert = knowledge base, not talent 10 years' study • What is accumulated? What's needed/important: • Survival, food, protection (big, now) • Create envisioned future (big, later) • Complete task (small, now) • Repeated (must be important) • Related to existing knowledge (more are better) • Educator's job: • Create sparse but firm knowledge base spanning field • All new knowledge encounterd connects, retained
Why do Classes Work? • Equate class success to survival • Learn best, but very risky • Know material needed for career/dream/success • Know material needed for test/project (task) • They don't! It's what they do, not what you say.
Class Design • Intro to programming (the right way!) • Intro to stats, error analysis, modeling (Bevington) • Monte Carlo coding exercises teach programming • Intro to astronomical data (Howell) • Data formation – photon emission, sky “bg”, PSF • Detectors, measurements, file formats • Photons to electrons, detector systematics • Photometry coding homework sequence • Project in photometry, real dataset, 100% own code • Intro to spectroscopy • Prerequisite: multivariate calculus, Astro 101 • Challenge: Get programming in early for HW!
Graduate Students • Also need this as a grad class • Many come to planetary sciences from other fields • Many not prepared in data analysis or detectors • Most have programmed some • Most have done error analysis (few understand it) • Few understand computational methods • BUT, we don't have enough students for 2 classes • Grad class shares undergrad lecture • Additional readings (papers, Numerical Recipes) • Grads do harder project in spectroscopy
V 1.0: Cornell, IDL • Excellent students, 4-6 per year • Worked like dogs, up to 15 hours/week HW • All who didn't drop in first week did very well • Top students at summer schools • Immediate, high-level work on, e.g., Mars Rovers • IDL well documented, widely used in astronomy • Books available • Expensive! • Quirky, encourages bad programming • Lacking many modern features • Unable to cut-n-paste a loop to the command line
Cornell->UCF, IDL->NumPy • I got a job! • New PhD program in Planetary Sciences at UCF • 51,000 undergrads, poorer, many FTIC, awful HSs • IDL expensive, encourages bad programming • NumPy • Open: you can look at the source (educational) • Free – students can use it forever, anywhere • Enforces good programming! • Code shorter than pseudocode (implicit loops) • NOT a data language, a real language doing data! • Designed for wrapping, and almost everything is • Can be used in cloud computing, clusters, etc.
V 1.99: UCF 2007, NumPy Students mixed, some Cornell-level, 6 students Removed 30% of class assignments Switched to NumPy NumPy docs not good, few books, poor reference Much larger codebase than IDL, but not integrated Students worked like dogs, 15 hours/week HW No undergrad finished the project (all grads did) Rescoped class in final weeks Graded on basis of 2008 plan All students well prepared for research (!)
Lessons • The better students were similar! All were capable. • Given Guide to NumPy, web site, mailing lists, me, sources, astro tutorial, encouraged to work together • BUT: • Often spent 2 hours to find and learn a simple routine • Students (and I) put heroic effort into class • All bombed, even experienced programmers • Function docs not even poor; mostly nonexistent • Grads and I learned NumPy by induction from IDL • Undergrads lost • What to do for 2008? • Must either have docs or go back to IDL (Nooooooo...)
If I can't find a reindeer, I'll make one instead.-T. Grinch, 1957 • Except I'm a pre-tenure prof • Research very active, time very constrained • Saw what happened to Travis O., expected same • Resources: • Community • Some surplus funding • Experience and skill as a communicator • About 8 months
Requirements - General Reference pages and manual Tutorial user guide Quick “Getting Started” guide Topic-specific user guides Examples that work “Live” collection of worked examples (cookbook) Tools to view and search docs interactively Variety of access methods (help(), web, PDF, etc.) Mechanisms for community input and management NumPy, then SciPy, then others
Requirements - Reference • Reference page per function/class (“docstrings”) • Thorough, complete, professional text • Accessible one level below expected user • Examples, crossrefs, math, images (limited) • Pages for general topics • Glossary • Slicing, broadcasting, indexing • Reference manual • Organized by topic areas • Index
General Timeline • Infrastructure • Doc wiki (and SVN link, stats, reviews, etc.) • Templates and standards • Indexing, math, image mechanisms in ReST • Produce text, PDF, HTML automatically • NumPy • Reference (functions/objects, topic pages) • Getting Started Guide • Tutorial • Reading tools beyond help(), browsers, PDF readers • SciPy (yikes)
Milestones NumPy is big – page triage, UCF course funcs first! 2008 Fall semester: NumPyI 50%+ to first draft 2009 Summer end: NumPyI 100% to first draft Early 2010: NumPyI Proofed 2010: SciPy...
Organization Leerless Feeder: jh – organization, plans, standards and processes, funding, whip, person who will die if this fails... Reliable and available expert (on UCF contract): Stéfan van der Walt – interface with SVN, PDF and HTML, standards and processes, wiki, chief writer, the one person I could dump on mercilessly... Emmanuelle Guillart – wiki, wiki hosting Pauli Virtanen – wiki, writing Perry Greenfield – numpy.doc pages About 40 grad/PhD-level volunteer doc writers!
How to pay? Shirts! • Write 1000 words or do equivalent work • Wiki • Reviewing • Proofing • Now or later (we'll mail) • Offer good until we withdraw it • Don't be an “expert” • Tell us the size you want • Not the list of all possible sizes you could wear!
Results – Summer 2008 Quality is high! Lots of examples, and they work. PDF doc has 309 printed pages Doc wiki is your best source for docstrings! docs.scipy.org
V 2.0: UCF, NumPy Now have docs! Simpler undergrad project Grads have old undergrad project Again mix of levels, 6 students Worked up to 9 hours/week on HW, most less All who tried, finished All did decent to excellent projects Major epiphany occurred in class! Students excited to learn, positive attitude Students well prepared to do research
Observations • NumPy (w/ docs) learned much faster than IDL • Students who already program • Do best in course • Work least in course • Hardest to teach good programming habits to • HUGE grading load: 15-60 min/student/week • How to scale to larger class? • Only fully grade some work • Peer evaluation (as HW assignment?) • No matter what happened, ultimate goal met • Doing is learning
What's Next? • Course requires calculus • Success predictor is programming • 1980s/90s students knew much more than today's • Freshmen need to do research, can't yet take course • Why not teach programming for science in year 1? • Basic interactive command line • Plot, visualize equations or data • Run routines on data • Basics of programming • Tell all their profs to assign vis. and calc probs • Students will build their knowledge in all classes