How is Biology as a Science Possible?
Edward R. Dougherty
Department of Electrical and Computer Engineering, Texas A&M University
Computational Biology Division, Translational Genomics Research Institute
Department of Bioinformatics and Computational Biology, M. D. Anderson Cancer Center
gsp.tamu.edu
Kant’s Question
Immanuel Kant (Prolegomena to Any Future Metaphysics, 1783): “How is metaphysics as a science possible?” Clearly metaphysical talk is possible. The fundamental issue is twofold: What is science? What constraints must be put on metaphysical thinking to make it a science?
Our Question
How is biology as a science possible? Clearly, talk about biology (living organisms) is possible. The fundamental issue is twofold: What is science? What constraints must be put on biological thinking to make it a science?
Everyday Versus Scientific Thinking
• Everyday categories of thought: informal in their meaning and criteria for truthfulness; subjective; naïve belief in the intelligibility of the “real world.”
• Scientific categories of thought: formal in their meaning and criteria for truthfulness; intersubjective; do not assume the intelligibility of the “real world.”
Newton on Science
Isaac Newton (Principia): “Our purpose is only to trace out the quantity and properties of this force from the phenomena, and to apply what we discover in some simple cases as principles, by which, in a mathematical way, we may estimate the effects thereof in more involved cases… We said, in a mathematical way, to avoid all questions about the nature or quality of this force.”
Constitution of Scientific Knowledge
James Jeans (Mysterious Universe): “The final truth about phenomena resides in the mathematical description of it; so long as there is no imperfection in this, our knowledge is complete. We go beyond the mathematical formula at our own risk; we may find a [nonmathematical] model or picture which helps us to understand it, but we have no right to expect this.”
The Sacrifice of Intelligibility
Morris Kline: “The insurgent seventeenth century found a qualitative world whose study was aided by mathematical abstractions. It bequeathed a mathematical, quantitative world that subsumed under its mathematical laws the concreteness of the physical world… What science has done, then, is to sacrifice physical intelligibility for the sake of mathematical description and mathematical prediction.”
Prediction Grounds Scientific Knowledge
Richard Feynman: “It is whether or not the theory gives predictions that agree with experiment. It is not a question of whether a theory is philosophically delightful, or easy to understand, or perfectly reasonable from the point of view of common sense.”
Radical Empiricism Denies Knowledge
Hans Reichenbach (Rise of Scientific Philosophy): “A mere report of relations observed in the past cannot be called knowledge. If knowledge is to reveal objective relations of physical objects, it must include reliable predictions. A radical empiricism, therefore, denies the possibility of knowledge.”
A collection of measurements, together with statements about the measurements, is not scientific knowledge.
Scientific Biological Knowledge is Possible
Scientific biological knowledge is constituted in mathematics, and its truthfulness is assessed via experimental predictions. Mathematics:
• Provides a formal structure for relations.
• Provides a formal structure for truthfulness: quantitative conclusions.
• Provides a framework for ground truth: statistical analysis of experiments.
• Is intersubjective.
• Does not assume the intelligibility of the “real world.”
Epistemology of Classification
• Giving scientific meaning to a classifier and its reported error requires taking several factors into account:
• Classification rule
• Feature selection
• Error estimation rule
• Feature-label distribution
• Epistemologically, we must have a classifier model to have meaning.
Everyday Classification
• Some algorithm is proposed.
• The algorithm separates some data set.
• We are not told the distribution from which the data come.
• An estimation rule is used to estimate the “error.”
• We are given no reason why the estimate should be good.
• In fact, often we expect that the estimate is not good.
• The estimate is small and the algorithm is claimed to be “validated.”
• We are given no justification for the claim.
• We are given no conditions under which it is valid.
Classifier Model
• A classifier model is a pair M = (ψ, ε_ψ) composed of a classifier ψ: R^d → {0, 1} and an error ε_ψ ∈ [0, 1].
• M is valid for the feature-label distribution F_{X,Y} to the extent that ε_ψ ≈ ε_{ψ,F}, the true error of ψ on F_{X,Y}.
• Validity is measured by |ε_ψ − ε_{ψ,F}|.
• Problem: If we know ε_{ψ,F}, then the obvious thing to do is use the model M = (ψ, ε_{ψ,F}).
• Dougherty, E. R., and U. Braga-Neto, “Epistemology of Computational Biology: Mathematical Models and Experimental Prediction as the Basis of Their Validity,” Journal of Biological Systems, 14 (1), 65-90, 2006.
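As a minimal illustration (not from the talk), the following Python sketch builds a classifier model M = (ψ, ε_ψ) against an assumed two-class Gaussian feature-label distribution and measures its validity |ε_ψ − ε_{ψ,F}| by Monte Carlo. The distribution, the fixed classifier, and the reported error value are all illustrative assumptions.

```python
# Minimal sketch of the classifier-model validity measure |eps_psi - eps_{psi,F}|,
# assuming a synthetic two-class Gaussian feature-label distribution (not from the talk).
import numpy as np

rng = np.random.default_rng(0)

def sample_fxy(n):
    """Draw n points from a toy feature-label distribution F_{X,Y}:
    class 0 ~ N(0, I), class 1 ~ N((1,1), I), equal priors, d = 2."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=y[:, None] * 1.0, scale=1.0, size=(n, 2))
    return x, y

def psi(x):
    """A fixed classifier psi: R^2 -> {0,1} (threshold the sum of the two features)."""
    return (x.sum(axis=1) > 1.0).astype(int)

# True error eps_{psi,F}, approximated by Monte Carlo on a large test sample.
x_big, y_big = sample_fxy(200_000)
true_error = np.mean(psi(x_big) != y_big)

# A reported error estimate eps_psi (an arbitrary claimed value for illustration).
reported_error = 0.20

# Validity of the classifier model M = (psi, eps_psi) is |eps_psi - eps_{psi,F}|.
print(f"true error ~ {true_error:.3f}")
print(f"validity gap |eps_psi - eps_psi,F| ~ {abs(reported_error - true_error):.3f}")
```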
Rule Model
• In practice, a classifier model is built from a rule model L = (Ψ, Ξ) and a finite sample S:
• S is a sample from the feature-label distribution.
• Ψ is a classification rule yielding ψ = Ψ(S).
• Ξ is an error estimation rule yielding ε_ψ = Ξ(S).
• Ξ could use all of S, or S could be split and Ξ uses test data.
• Validity is based on the deviation distribution, that is, the distribution of Ξ(S) − ε_{Ψ(S),F}.
• For instance, E_S[|Ξ(S) − ε_{Ψ(S),F}|²], the MSE (whose square root is the RMS).
• The MSE decomposes into squared bias plus deviation variance.
• Validity of L is assessed via E_S[|Ξ(S) − ε_{Ψ(S),F}|].
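A hedged simulation sketch of the deviation distribution Ξ(S) − ε_{Ψ(S),F} and its bias/variance/RMS summary, using a nearest-class-mean rule with resubstitution as an illustrative stand-in for Ψ and Ξ; the Gaussian model, sample size, and repetition counts are assumptions, not the talk's setting.

```python
# Simulate the deviation distribution Xi(S) - eps_{Psi(S),F} for a simple rule model:
# Psi = nearest-class-mean classifier, Xi = resubstitution (illustrative choices).
import numpy as np

rng = np.random.default_rng(1)
d, n, n_reps, n_test = 2, 20, 2000, 50_000

def sample(n):
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=y[:, None] * 1.0, scale=1.0, size=(n, d))
    return x, y

x_test, y_test = sample(n_test)          # large sample stands in for the true distribution F

deviations = []
for _ in range(n_reps):
    x, y = sample(n)                     # training sample S
    m0, m1 = x[y == 0].mean(axis=0), x[y == 1].mean(axis=0)
    def psi(z):                          # Psi(S): nearest class mean
        return (np.linalg.norm(z - m1, axis=1) <
                np.linalg.norm(z - m0, axis=1)).astype(int)
    resub = np.mean(psi(x) != y)                      # Xi(S): resubstitution estimate
    true_err = np.mean(psi(x_test) != y_test)         # eps_{Psi(S),F} (Monte Carlo)
    deviations.append(resub - true_err)

dev = np.array(deviations)
mse = np.mean(dev**2)
print(f"bias {dev.mean():+.4f}  deviation variance {dev.var():.4f}")
print(f"MSE  {mse:.4f} = bias^2 + variance = {dev.mean()**2 + dev.var():.4f}")
print(f"RMS  {np.sqrt(mse):.4f}")
```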
Error Estimation Using Training Data
• Cross-validation: error estimated by iteratively leaving out points, testing on the deleted points, and averaging.
• Approximately unbiased: Expectation[CV estimate − true error] ≈ 0.
• But this is not determinative if the variance is large, which it is for small samples.
• Bootstrap: select n points with replacement, design on the selected points, estimate error on the remaining points, iterate and average.
• .632 bootstrap estimate = 0.632 × (bootstrap estimate) + 0.368 × (resubstitution estimate).
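The sketch below implements leave-one-out cross-validation and a .632 bootstrap estimate for a simple nearest-class-mean classifier; the classifier, the synthetic data model, and the bootstrap repetition count are illustrative assumptions, not the talk's experimental setup.

```python
# Leave-one-out cross-validation and .632 bootstrap error estimation for a
# nearest-class-mean classifier (an illustrative stand-in for the rules discussed).
import numpy as np

rng = np.random.default_rng(2)

def train(x, y):
    m0, m1 = x[y == 0].mean(axis=0), x[y == 1].mean(axis=0)
    return lambda z: (np.linalg.norm(z - m1, axis=1) <
                      np.linalg.norm(z - m0, axis=1)).astype(int)

def loo_cv(x, y):
    """Leave one point out, design on the rest, test on the deleted point, average."""
    errs = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        f = train(x[keep], y[keep])
        errs.append(f(x[i:i+1])[0] != y[i])
    return np.mean(errs)

def b632(x, y, n_boot=200):
    """.632 bootstrap: 0.632 * out-of-bag bootstrap error + 0.368 * resubstitution."""
    n = len(y)
    resub = np.mean(train(x, y)(x) != y)
    oob_errs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # select n points with replacement
        oob = np.setdiff1d(np.arange(n), idx)     # points left out of the bootstrap sample
        if len(oob) == 0 or len(set(y[idx])) < 2:
            continue                              # skip degenerate resamples
        f = train(x[idx], y[idx])
        oob_errs.append(np.mean(f(x[oob]) != y[oob]))
    return 0.632 * np.mean(oob_errs) + 0.368 * resub

# toy data from a two-class Gaussian model
y = rng.integers(0, 2, size=30)
x = rng.normal(loc=y[:, None] * 1.0, scale=1.0, size=(30, 2))
print(f"LOO CV estimate: {loo_cv(x, y):.3f}   .632 bootstrap estimate: {b632(x, y):.3f}")
```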
Lack of Regression for Error Estimation
Deviation variance: Var[ε̂ − ε] = σ²_est + σ²_tru − 2ρ·σ_est·σ_tru, where ρ is the correlation between the estimated and true errors.
LDA, n = 50, 295-microarray breast cancer data set.
Dougherty, E. R., Sima, C., Hua, J., Hanczar, B., and U. M. Braga-Neto, “Performance of Error Estimators for Classification,” Current Bioinformatics, 5, 53-67, 2010.
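A small numeric illustration (with assumed standard deviations) of how the correlation ρ between the estimated and true errors drives the deviation variance in the formula above: weak or negative correlation makes the deviation spread nearly as large as, or larger than, the two standard deviations combined.

```python
# Numeric illustration (assumed values) of the deviation-variance formula:
#   Var[est - tru] = sigma_est^2 + sigma_tru^2 - 2 * rho * sigma_est * sigma_tru
sigma_est, sigma_tru = 0.06, 0.05
for rho in (0.8, 0.0, -0.3):
    var_dev = sigma_est**2 + sigma_tru**2 - 2 * rho * sigma_est * sigma_tru
    print(f"rho = {rho:+.1f}  ->  deviation std = {var_dev**0.5:.3f}")
```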
Negative Correlation – Analytic
Multinomial distribution with exact (analytic) correlation; Zipf model. Plot: correlation between actual and estimated errors versus sample size (leave-one-out – dashed; resubstitution – solid).
Braga-Neto, U. M., and E. R. Dougherty, “Exact Correlation between Actual and Estimated Errors in Discrete Classification,” Pattern Recognition Letters, 31, 407-413, 2010.
No Excuses
Error standard deviations for LDA in a 1D Gaussian model, n = 20, as a function of |μ₂ − μ₁| (equivalently, the Bayes error). True error – solid black; resubstitution – red dots; leave-one-out – blue dashes.
Ned Glick (1978): “I shall try to convince you that one should not use this modification of the counting estimator.”
Glick, N., “Additive Estimators for Probabilities of Correct Classification,” Pattern Recognition, 10, 211-222, 1978.
Analytic Results – Not Ad Hoc
The unbiased bootstrap weight is derived analytically; 2D Gaussian model with equal class covariance matrices.
Vu, T., Braga-Neto, U. M., Sima, C., and E. R. Dougherty, “Unbiased Bootstrap Error Estimation for Linear Discriminant Analysis,” in revision, 2011.
Sola Fides
Since virtually nothing is known about the error estimators used in thousands of papers, one can only conclude that scientific epistemology is being replaced by Faith Alone. Scientific categories are being replaced by everyday categories in a rush to make science egalitarian, so that anyone can discuss Nature in everyday terms. Why endure the rigors of mathematics, statistics, and experimental design, when faith alone will do?
Is Knowledge Possible?
• The scientific meaning of a classifier and its error estimate relates to the properties of the error estimator.
• Choice 1: Estimate the population density – difficult.
• Choice 2: Distribution-free error bounds – useless.
• Answer: model-based analysis.
• Knowledge is possible with proper epistemology.
Accuracy of Error Estimation
• Reporting an error estimate without a characterization of its accuracy is meaningless.
• Absent a characterization of accuracy, an error estimation rule is simply a computation without meaning.
• One way to proceed is to provide a bound on the RMS.
• Model-free bounds are much too loose for small samples – and rarely exist.
• Model-based bounds are necessary.
Distribution-Free Bound for LOO
Consider the multinomial distribution with b cells, the histogram classification rule, and leave-one-out error estimation. A bound exists, but it is useless: for n = 100, RMS ≤ 0.435.
Model-Based Bounds
• A model class is assumed.
• A specific model is determined by parameter values.
• For instance, a Gaussian model depends on mean vectors and covariance matrices.
• An RMS bound for sample size n is given by the maximum RMS over all parameter values – from it, find the sample size required for an acceptable RMS.
• The more we assume known about the model class, the smaller the RMS bound.
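A hedged sketch of the model-based procedure: approximate the RMS by Monte Carlo for each parameter value in an assumed 1D Gaussian model (a midpoint-of-means classifier with resubstitution, standing in for the talk's LDA/LOO analysis), take the worst case over a parameter grid, and inspect how it shrinks with n. One would then pick the smallest n whose worst-case RMS meets a target.

```python
# Model-based RMS "bound" by simulation: 1D Gaussian model with unit variance,
# midpoint-of-means classifier, resubstitution estimation (illustrative choices).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def rms(n, delta, reps=400):
    """Monte Carlo RMS of (resubstitution - true error) for class means 0 and delta."""
    devs = []
    for _ in range(reps):
        y = rng.integers(0, 2, size=n)
        x = rng.normal(loc=y * delta, scale=1.0)
        if y.sum() in (0, n):
            continue                               # skip degenerate samples
        m0, m1 = x[y == 0].mean(), x[y == 1].mean()
        t = 0.5 * (m0 + m1)                        # decision threshold
        pred = (x > t).astype(int) if m1 > m0 else (x <= t).astype(int)
        resub = np.mean(pred != y)
        # exact true error of this threshold under the assumed Gaussian model
        if m1 > m0:
            true_err = 0.5 * (1 - norm.cdf(t) + norm.cdf(t - delta))
        else:
            true_err = 0.5 * (norm.cdf(t) + 1 - norm.cdf(t - delta))
        devs.append(resub - true_err)
    return np.sqrt(np.mean(np.square(devs)))

deltas = np.linspace(0.5, 3.0, 6)                  # grid of model parameters
for n in (20, 40, 60):
    worst = max(rms(n, d) for d in deltas)
    print(f"n = {n:3d}  worst-case RMS over the model class ~ {worst:.3f}")
```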
Gaussian Distribution Bounds for LOO
For a Gaussian model with known covariance and LDA, the RMS is known exactly in terms of the model parameters. RMS as a function of the Bayes error (or |μ₁ − μ₀|), for sample sizes n = 20, 40, 60.
Zollanvari, A., Braga-Neto, U. M., and E. R. Dougherty, “Analytic Study of Performance of Error Estimators for Linear Discriminant Analysis,” IEEE Transactions on Signal Processing, 2011.
Bayesian MMSE Error Estimator
• Since one must make assumptions to use an error estimator with a small sample, why not use the MMSE error estimator?
• Assume a prior distribution π(θ) on the parameters θ of the feature-label distribution.
• Given θ and a sample S_n, obtain the true error ε_n(θ, S_n).
• Posterior distribution: π*(θ) = π(θ | S_n).
• Bayesian estimator: ε̂_bay(S_n) = E_θ[ε_n(θ, S_n) | S_n] = E_{π*}[ε_n].
• Dalton, L., and E. R. Dougherty, “Bayesian Minimum Mean-Square Error Estimation for Classification Error – Parts I and II,” IEEE Transactions on Signal Processing, 59(1), 115-144, 2011.
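A minimal sketch of the Bayesian MMSE idea in a two-bin discrete model: one binary feature, fixed classifier ψ(x) = x, known class prior 0.5, and Beta priors on the bin probabilities. This is an illustrative special case with a closed-form posterior mean, not the general construction of Dalton and Dougherty; all parameter values are assumptions.

```python
# Bayesian MMSE error estimate in a two-bin discrete model: feature X in {0,1},
# classifier psi(x) = x, known class prior P(Y=1) = 0.5, independent Beta(a, b)
# priors on p_j = P(X = 1 | Y = j). Illustrative special case only.
import numpy as np

rng = np.random.default_rng(4)

# ground-truth parameters theta (unknown to the estimator)
p0_true, p1_true = 0.2, 0.8
n = 20
y = rng.integers(0, 2, size=n)
x = rng.binomial(1, np.where(y == 1, p1_true, p0_true))

# sufficient statistics per class
n0, k0 = np.sum(y == 0), np.sum(x[y == 0])
n1, k1 = np.sum(y == 1), np.sum(x[y == 1])

# Beta(1, 1) priors -> Beta posteriors; posterior means of p0 and p1
a, b = 1.0, 1.0
e_p0 = (a + k0) / (a + b + n0)
e_p1 = (a + k1) / (a + b + n1)

# For psi(x) = x the true error is eps(theta) = 0.5*p0 + 0.5*(1 - p1), so the
# Bayesian MMSE estimate is its posterior expectation.
eps_bayes = 0.5 * e_p0 + 0.5 * (1 - e_p1)
eps_true = 0.5 * p0_true + 0.5 * (1 - p1_true)
eps_resub = np.mean(x != y)              # resubstitution, shown for comparison
print(f"Bayesian MMSE estimate {eps_bayes:.3f}  true {eps_true:.3f}  resub {eps_resub:.3f}")
```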
Properties of Bayesian Error Estimator
• Minimizes E_{θ,S}[|ε_n(θ, S_n) − g(S_n)|²] over all measurable g.
• Optimal on average over θ, not necessarily for any particular θ.
• Unbiased when averaged over all θ.
• Closed form for discrete classification.
• Generalized beta priors and arbitrary bin size – Part I.
• Closed form for linear classification in the Gaussian model.
• For general covariance matrices, π*(Σ) possesses an inverse-Wishart distribution – Part II.
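Continuing the illustrative two-bin model above, a quick Monte Carlo check that the Bayesian estimate is unbiased when averaged over θ drawn from the prior and over samples; the prior, sample size, and repetition count are assumptions.

```python
# Check prior-averaged unbiasedness of the Bayesian MMSE estimate in the
# illustrative two-bin model (psi(x) = x, known class prior 0.5, Beta priors).
import numpy as np

rng = np.random.default_rng(5)
a, b, n, reps = 1.0, 1.0, 20, 20_000
bias = []
for _ in range(reps):
    p0, p1 = rng.beta(a, b), rng.beta(a, b)          # theta drawn from the prior
    y = rng.integers(0, 2, size=n)
    x = rng.binomial(1, np.where(y == 1, p1, p0))
    n0, k0 = np.sum(y == 0), np.sum(x[y == 0])
    n1, k1 = np.sum(y == 1), np.sum(x[y == 1])
    eps_bayes = 0.5 * (a + k0) / (a + b + n0) + 0.5 * (1 - (a + k1) / (a + b + n1))
    eps_true = 0.5 * p0 + 0.5 * (1 - p1)
    bias.append(eps_bayes - eps_true)
print(f"bias averaged over the prior: {np.mean(bias):+.4f}   (close to zero)")
```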
Knowledge Discovery
Knowing the constitution of scientific knowledge and how to validate it leaves open the question of how to discover knowledge. Obviously, we need to observe Nature – but in what manner?
Experimentation Versus Groping
Francis Bacon (Novum Organum, 1620): “[Accidental experience is] a mere groping, as of men in the dark, that feel all round them for the chance of finding their way…. But the true method of experience, on the contrary, first lights the candle, and then by means of the candle shows the way; commencing as it does with experience duly ordered and digested, not bungling or erratic.”
Experimental Design: The Path of Progress
Immanuel Kant (Critique of Pure Reason, 1787): “Reason must approach Nature… [as] a judge who compels witnesses to reply to those questions which he himself thinks fit to propose. To this single idea must the revolution be ascribed, by which, after groping in the dark for so many centuries, natural science was at length conducted into the path of certain progress.”
• Scientific experiment is not mere observation; it is methodological observation.
• A model represents the knowledge that the scientist brings to the table.
An Experiment is a Question
Hans Reichenbach (Rise of Scientific Philosophy): “An experiment is a question addressed to Nature…. As long as we depend on the observation of occurrences not involving our assistance, the observable happenings are usually the product of so many factors that we cannot determine the contribution of each individual factor to the total result.”
Foolish Questions Yield Foolish Answers
Arturo Rosenblueth and Norbert Wiener: “An experiment is a question. A precise answer is seldom obtained if the question is not precise; indeed, foolish answers – i.e., inconsistent, discrepant or irrelevant experimental results – are usually indicative of a foolish question.”
What Can We Expect from Feature Selection?
• Top: regression of the selected feature-set error on the best feature-set error.
• Bottom: regression of the best feature-set error on the selected feature-set error.
• Sima, C., and E. R. Dougherty, “What Should One Expect from Feature Selection in Small-Sample Settings,” Bioinformatics, 22 (19), 2430-2436, 2006.
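A loose illustration (with an assumed synthetic model and single-feature selection by resubstitution, far simpler than the cited study) of the same small-sample phenomenon: the feature selected from training data typically has a larger true error than the truly best feature.

```python
# Compare the true error of the feature selected from training data against the
# true error of the genuinely best feature, in an assumed 10-feature model where
# only two features carry signal.
import numpy as np

rng = np.random.default_rng(6)
d, n, n_test, reps = 10, 30, 20_000, 200
shift = np.zeros(d); shift[:2] = 0.8                 # only features 0 and 1 carry signal

y_t = rng.integers(0, 2, size=n_test)                # large test sample stands in for F
x_t = rng.normal(loc=np.outer(y_t, shift), scale=1.0)

gaps = []
for _ in range(reps):
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=np.outer(y, shift), scale=1.0)
    est_err, true_err = np.zeros(d), np.zeros(d)
    for j in range(d):                               # one-feature midpoint classifiers
        m0, m1 = x[y == 0, j].mean(), x[y == 1, j].mean()
        t = 0.5 * (m0 + m1)
        flip = m1 <= m0
        est_err[j] = np.mean(((x[:, j] <= t) if flip else (x[:, j] > t)) != y.astype(bool))
        true_err[j] = np.mean(((x_t[:, j] <= t) if flip else (x_t[:, j] > t)) != y_t.astype(bool))
    gaps.append(true_err[np.argmin(est_err)] - true_err.min())   # selected vs best feature

print(f"mean excess true error of the selected feature: {np.mean(gaps):.3f}")
```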
Data Mining
• Data mining is a return to pre-Galilean groping, albeit at a much faster groping rate than was then possible.
• It suffers from three debilitating properties:
• It does not ask precise questions.
• It lacks experimental design – it does not light a candle.
• It lacks a characterization of prediction in the context of a feature-label (or other) distribution.
• Sometimes it is “justified” by large-sample theory, typically absent a rigorous analysis of the problem at hand.
Shooting Sparrows With Cannons
Ronald A. Fisher (1925): “Little experience is sufficient to show that the traditional [large sample] machinery of statistical processes is wholly unsuited to the needs of practical research. Not only does it take a cannon to shoot a sparrow, but it misses the sparrow! … Only by systematically tackling small sample problems on their merits does it seem possible to apply accurate tests to practical data.”
Non-systems Biological Science: Oxymoron
Conrad Waddington (How Animals Develop, 1935): “To say that an animal is an organism means in fact two things: firstly, that it is a system made up of separate parts, and secondly, that in order to describe fully how any one part works one has to refer either to the whole system or to the other parts.”
The Creative Principle
Albert Einstein: “Experience, of course, remains the sole criterion for the serviceability of mathematical constructions for physics, but the truly creative principle resides in mathematics.”
Ipso facto, one’s scientific creativity is limited by one’s mathematical knowledge. Waddington’s characterization implies that the creation of significant biological knowledge absent knowledge of stochastic systems theory is a vain hope.
Closing Comment
Albert Einstein: “Science without epistemology is – insofar as it is thinkable at all – primitive and muddled.”