Strategies for solving scientific problems using computers
Outline • Motivation • A standard framework • Which tool to use? • Critical considerations • Aftermath
Motivation • Most (all?) problems in modern geoscience benefit strongly from computer methods • A good hypothesis warrants a clear analytical approach • Make large problems more tractable • Avoid a posteriori rationalizations as much as possible • Encourage predictions rather than diagnoses • Our scientific thought process must be defensible, and so should our methodology
Overarching framework for a scientific problem • Review existing research • Formulate a hypothesis • Collect new data • Process this data • Interpretation • Evaluate and present your hypothesis • Where will most of your time/energy be spent?
A sub-framework for computer-based problems • Load data within chosen work environment • Format this data • Process this data • Visualize results • Scientific interpretation
What relevant tools exist? • Many options are free or free-ish • Overlapping functionality • Many are user-extendable
Finding the best tool for the job • Are you already familiar with it? • Can it already do what you need it to do, or is it conceivable that it could do so after some effort? • Is it easy (enough) to learn? • Is it intuitive? • Is it fast enough? • Does it support the command line, a GUI or both? • Can you understand what it’s doing, or is it a black box? • Does it have sufficient mathematical functionality? • Does it have sufficient mapping functionality? • Can it easily generate reproducible output? • Is it popular within your field? • Can its output be shared easily? • Is it affordable and accessible?
Which tool to use for geoscience? • MATLAB, Python, GMT and ArcGIS are the best current options
A sub-framework for computer-based problems • Load data within chosen work environment • Format this data • Process this data • Visualize results • Scientific interpretation
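The sub-framework above can be sketched as one small function per stage. This is a minimal illustration in Python (one of the tools named in this deck), not a prescribed layout: the function names and the toy data are assumptions made for the example.

```python
# Hypothetical stage functions; the names and toy data are illustrative only.

def load_data():
    """Load raw values into the work environment (hard-coded here)."""
    return ["1.5", "2.5", "4.0"]

def format_data(raw):
    """Convert raw text fields into numbers."""
    return [float(value) for value in raw]

def process_data(values):
    """Reduce the formatted values to the quantity of interest (a mean)."""
    return sum(values) / len(values)

def visualize(result):
    """Stand-in for plotting: build a one-line text summary."""
    return f"mean = {result:.2f}"

def run_pipeline():
    """Run the stages in order; interpretation is left to the scientist."""
    raw = load_data()
    values = format_data(raw)
    result = process_data(values)
    return visualize(result)

if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each stage in its own function means any one of them can be rerun or replaced without touching the others.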
A directory structure for computer-based problems
research/code/
  current_project/
    data (raw)
    mat (formatted)
    your code
    fig (useful, not pretty)
    old (no need to delete)
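A layout like this can be created the same way for every project. The sketch below uses Python's standard library; the helper name `make_project` is hypothetical, and the sub-directory names are taken from the slide.

```python
from pathlib import Path

# Sub-directory names follow the slide; code files sit beside them.
SUBDIRS = ("data", "mat", "fig", "old")

def make_project(root, project_name):
    """Create <root>/research/code/<project_name>/ with the sub-directories."""
    project = Path(root) / "research" / "code" / project_name
    for name in SUBDIRS:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project / name).mkdir(parents=True, exist_ok=True)
    return project
```

For example, `make_project(Path.home(), "current_project")` would build the tree under the home directory.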
Incidentally, a similar manuscript structure
research/manuscript/
  current_paper/
    draft (versioned)
    fig (pretty)
    master document
    revised (basically inevitable)
    final (proofs, published)
Loading data • Load all necessary data first • This step can be, but rarely is, a deal-breaker • If someone or something generated it, you can almost certainly read it • A question that will keep coming up: How often will you need to do this? • The answer is almost always: Much more often than you think • A valuable habit: Spend the time to record how the data were loaded (i.e., in a script, not just ad hoc at the command line) and where they came from • Save the MATLAB/etc.-formatted data before processing
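A recorded loading step might look like the following sketch. It is written in Python rather than MATLAB, and it assumes a raw CSV file with a header row and purely numeric columns; the function name `load_and_save` is hypothetical, and a JSON file stands in for the formatted (MATLAB .mat) copy.

```python
import csv
import json
from pathlib import Path

def load_and_save(raw_path, formatted_path):
    """Read a raw CSV file, convert its columns to numbers, and save the
    formatted copy together with a note on where the data came from."""
    raw_path = Path(raw_path)
    with raw_path.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError(f"no data rows in {raw_path}")
    # One list of floats per column name (assumes every field is numeric).
    columns = {name: [float(row[name]) for row in rows] for name in rows[0]}
    # Record the source alongside the data so it is never separated from it.
    record = {"source": raw_path.name, "columns": columns}
    Path(formatted_path).write_text(json.dumps(record, indent=2))
    return record
```

Because the loading lives in a function, repeating it next month is a one-line call rather than an attempt to remember what was typed at the command line.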
How often will you need to do this? • Only once, I swear: use the command line and save, or import the data using the GUI and save • Every time I want to do this analysis: write it down and comment • Often and with lots of data: time to consider how to make it faster • So often that other people will have to do it for me: consider writing a GUI, which enforces standardization
Format data • Data structures to use in descending order of preference: • scalar • vector • matrix • structure/object • cell
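The same preference order can be illustrated with rough Python analogues. The mapping is an assumption made for this sketch (MATLAB's types do not correspond one-to-one), and all names and values are invented for the example.

```python
from dataclasses import dataclass

elevation_m = 1250.0                 # scalar: a single number
depths_m = [0.0, 10.0, 20.0]         # vector: one-dimensional sequence
grid = [[1.0, 2.0], [3.0, 4.0]]      # matrix: rows of equal length

@dataclass
class Station:                       # structure/object: named fields
    name: str
    latitude_deg: float
    depths_m: list

station = Station("site A", 64.8, depths_m)

# Closest analogue of a cell: a container of mixed types and sizes.
mixed = ["site A", 3, [1.0, 2.0]]
```

The further down the list, the more flexible the container and the more bookkeeping it takes to know what each element holds.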
Numeric vs. logical vs. string • Several different data types to consider • numeric (MATLAB defaults to double precision signed) • string • logical (true/false)
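A short Python sketch of the three types, with invented values. Like MATLAB's default numeric type, Python's `float` is double precision.

```python
temperature_c = [4.5, 3.5, -1.0]   # numeric (double-precision floats)
station_name = "site A"            # string

# Logical: one true/false flag per measurement.
is_below_freezing = [value < 0.0 for value in temperature_c]

# Logical flags select data without a separate list of indices.
frozen = [v for v, flag in zip(temperature_c, is_below_freezing) if flag]
```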
Most (all?) data are imperfect • NaN: Not a Number
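NaN behaves the same way in most numerical environments: it propagates through arithmetic and is not equal to itself. A minimal Python sketch with invented depths:

```python
import math

depths_m = [10.0, math.nan, 30.0]   # one missing measurement

# NaN propagates through arithmetic, so a plain sum is also NaN.
naive_total = sum(depths_m)

# NaN is not equal to itself, so test with math.isnan rather than ==.
valid = [d for d in depths_m if not math.isnan(d)]
mean_depth_m = sum(valid) / len(valid)
```

Marking missing values as NaN, rather than with a placeholder such as -999, keeps them from silently entering an average.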
Poor variable names • data • index • constant • var • test • temp • i, j • any name identical to or confusingly similar to an existing function name • names that differ only by case (do not abuse case sensitivity) • names that are not descriptive: you will forget what “A” means
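A small before-and-after in Python; the quantities and names are invented for illustration.

```python
# Hard to read six months later: what are A and data, and in what units?
#     A = [d * 0.3048 for d in data]
# Worse still: a name such as "sum" or "max" would hide a built-in function.

# Descriptive names carry both the meaning and the units.
FEET_TO_METERS = 0.3048
well_depths_ft = [100.0, 250.0]
well_depths_m = [depth_ft * FEET_TO_METERS for depth_ft in well_depths_ft]
```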
Processing data • Document what, not how, even if you think it’s just for you, because you are your own worst enemy • Re-use shamelessly, but avoid copy/paste • Is this a function or a script? Will you re-use it often? • The other kind of MATLAB cell (code sections delimited by %%)
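As a sketch of re-use without copy/paste, the processing step below is written once as a function and called for each record. The function `detrend` and the sample values are hypothetical; the docstring says what the function does and leaves the how to the code.

```python
def detrend(values):
    """Remove the least-squares straight line from evenly spaced samples."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope_num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    slope_den = sum((i - x_mean) ** 2 for i in range(n))
    slope = slope_num / slope_den
    return [v - (y_mean + slope * (i - x_mean)) for i, v in enumerate(values)]

# Re-use by calling the function, not by pasting its body.
record_a = detrend([1.0, 2.0, 3.0, 4.0])
record_b = detrend([2.0, 2.0, 5.0, 5.0])
```

A fix made inside `detrend` reaches every record at once, which pasted copies cannot guarantee.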
Visualizing data • Physically separate visualization code • Visualize as you’re writing, but not as you’re running • Again, use cells
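One way to keep visualization physically separate is to put it in its own function behind a switch. The sketch below is in Python and assumes matplotlib is installed if the switch is turned on; the names `process`, `plot_result` and `MAKE_FIGURES`, and the output path, are invented for the example.

```python
def process(values):
    """Processing only: no plotting calls in here."""
    lowest = min(values)
    return [v - lowest for v in values]

def plot_result(result, path):
    """Visualization only. Assumes matplotlib is installed."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without opening a window
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.plot(result)
    fig.savefig(path)
    plt.close(fig)

MAKE_FIGURES = False  # switch on while writing, off for routine runs

result = process([3.0, 5.0, 4.0])
if MAKE_FIGURES:
    plot_result(result, "result.png")
```

With the switch off, the processing runs at full speed and produces the same numbers.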
MATLAB is trying to help you (similar to Word) • Code could be better • Code is wrong
Whitespace and indentation • Choose a style and stick with it • [Code examples labeled “No” and “Better”; images not preserved]
Order of operations • Forget it exists and use parentheses instead • [Code examples labeled “Never” and “Better”; images not preserved]
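A short Python illustration of why explicit parentheses help; the values are arbitrary.

```python
a, b, c = 8.0, 4.0, 2.0

# Relies on precedence: evaluated left to right as (a / b) * c.
implicit = a / b * c

# Parentheses make the intent unmistakable either way.
ratio_then_scale = (a / b) * c
divide_by_product = a / (b * c)

# The power operator binds tighter than the leading minus sign.
negated_square = -2 ** 2
square_of_negative = (-2) ** 2
```

Here `implicit` and `ratio_then_scale` are both 4.0, `divide_by_product` is 1.0, `negated_square` is -4 and `square_of_negative` is 4.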
Aftermath • In the long term, do not keep failed/commented code in a working function/script • Getting complicated and/or popular? Consider a versioning scheme or repository, e.g., GitHub or RunMyCode