Deep Thoughts after < 15 minutes of reflection

Michael Franklin, Jim Frew

The Big Question: Why are we here?
Two Primary Answers:
• There is a sea change in the way that science will be done.
  • Processing on a worldwide, ever-increasing collection of experimental data
  • From primary collecting to “data mining”
• We need to reason about, assemble, manage, and avoid re-doing loosely coupled, long-running, worldwide computations.

Some Major Issues from Day 1
• (Re)creation vs. validation/explanation
• Data vs. Process
• Talmud vs. (Kosher) Sausage Factory
• The “S” Word…

The “Bogus”(?) Distinction
• Derivation
• Provenance
• Lineage
• Annotation
• Pedigree
• (Re)creation vs. validation
• Is the DPLAP Executable?

Subject, Object, Verb?
• Which should be our focus?
  • Edges
  • Boxes
  • Both of the above
• What flows along the edges?
  • Data
  • Control
  • Events
  • All of the above
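One way to make the edges-versus-boxes question concrete is a small sketch in which boxes are named processing steps and every edge is tagged with what flows along it. All names here (`Flow`, `Edge`, `Graph`, `upstream`, and the example box names) are hypothetical and chosen only for illustration; this is a minimal sketch under those assumptions, not a proposed model.

```python
from dataclasses import dataclass, field
from enum import Enum


class Flow(Enum):
    """What travels along an edge (hypothetical categories taken from the slide)."""
    DATA = "data"
    CONTROL = "control"
    EVENT = "event"


@dataclass(frozen=True)
class Edge:
    source: str  # box that produces the flow
    target: str  # box that consumes it
    flow: Flow   # kind of thing flowing along this edge


@dataclass
class Graph:
    edges: list[Edge] = field(default_factory=list)

    def connect(self, source: str, target: str, flow: Flow) -> None:
        self.edges.append(Edge(source, target, flow))

    def upstream(self, box: str, flow: Flow = Flow.DATA) -> set[str]:
        """Return every box that `box` depends on through edges of one flow type."""
        seen: set[str] = set()
        stack = [box]
        while stack:
            current = stack.pop()
            for e in self.edges:
                if e.target == current and e.flow is flow and e.source not in seen:
                    seen.add(e.source)
                    stack.append(e.source)
        return seen


# Illustrative pipeline: the box names are made up for this example.
g = Graph()
g.connect("raw_readings", "calibrate", Flow.DATA)
g.connect("calibrate", "data_product", Flow.DATA)
g.connect("scheduler", "calibrate", Flow.CONTROL)
```

Asking for `g.upstream("data_product")` walks only data edges, while `g.upstream("calibrate", Flow.CONTROL)` walks only control edges, which shows how the answer to "what flows along the edges?" changes what a lineage query returns.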

Talmud vs. (Kosher) Sausage Factory
• We seem to have two very different (perhaps irreconcilable?) modes of operation.
  • e.g., Bioinformatics Databases: continual accretion of knowledge.
  • e.g., Physics Experiments: composable DAGs/graphs/pipelines leading towards “data products”
• Differences:
  • Scale: Time, Resources, …
  • Human involvement vs. automation
• How desirable or necessary is each?

The Great Thing About Standards is…
Core + Extensibility Framework:
• Keys for Objects (i.e., Identifiers)
• Granularity
• Equivalence and Similarity
  • Are these application/scientist-dependent?
• Common Terminology/Data Integration
• Versioning and Schema Evolution
• Need to Choose:
  • Declarative vs. Procedural
  • Strongly Typed?

What else?
