Generalization from empirical studies • Tore Dybå: Session introduction (~20 min.) • Erik Arisholm: Generalizing results through a series of replicated experiments on software maintainability (~20 min.) • Jeff Carver: Methods and tools for supporting generalization (~20 min.) • Mini-group discussions (~10 min.) • Plenary discussion (~20 min.) ISERN Meeting, Noosa Heads, Queensland, Australia 14–15 November, 2005
Generalization from Empirical Studies in SE: Session Introduction Tore Dybå SINTEF ICT tore.dyba@sintef.no
(Some of) the problem • Empirical SE research often generalizes about software organizations as if they were all alike, or refrains from generalizing at all, as if they were all unique: • In the first case, it is never really clear that findings about organizations actually sampled apply to organizations not sampled. • With respect to the second, is there really any point in studying software organizations if one does not believe that common denominators exist among relatively large classes of organizations? We must become more concerned about the conditions under which our research findings are valid if our work is to be applied more widely.
Generalization is closely related to construct validity and external validity Construct validity: • the degree to which inferences are warranted from the observed persons, settings, and cause and effect operations included in a study to the constructs that these instances might represent.* External validity: • the validity of inferences about whether the causal relationship holds over variations in persons, settings, treatment variables, and measurement variables.* *W.R. Shadish, T.D. Cook, and D.T. Campbell (2002) Experimental and Quasi-Experimental Designs for Generalized Causal Inference, Houghton Mifflin Company.
Statistical, sampling-based generalization • The statistician’s traditional two-step ideal of • the random selection of units for enhancing generalization; and • the random assignment of those units to different treatments for promoting causal inference; • is often advocated as the gold standard for empirical studies. However, this model is of limited utility for generalized causal inference in empirical SE because • it assumes that random selection and its goals do not conflict with random assignment and its goals; • it is rarely relevant for making generalizations about systems, tasks, settings, treatments, and outcome variables; • ethical, political, logistical, and economic constraints often limit random selection to less meaningful populations.
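The two-step ideal on this slide can be sketched in a few lines. This is a minimal illustration, not part of the original talk: the function name `two_step_design`, the `org-N` unit labels, and the treatment names are all hypothetical, and the sketch assumes the full population can be enumerated, which is exactly the assumption the slide argues rarely holds in SE.

```python
import random


def two_step_design(population, sample_size, treatments, seed=None):
    """Randomly select units, then randomly assign them to treatments.

    Hypothetical helper for illustration only. It assumes `population`
    is a complete list of the units we want to generalize to.
    """
    rng = random.Random(seed)

    # Step 1: random selection of units (supports generalization).
    sample = rng.sample(population, sample_size)

    # Step 2: random assignment to treatments (supports causal inference).
    # Shuffle the sample, then deal units out round-robin so group sizes
    # differ by at most one.
    rng.shuffle(sample)
    groups = {name: [] for name in treatments}
    for index, unit in enumerate(sample):
        groups[treatments[index % len(treatments)]].append(unit)
    return groups


# Hypothetical population of 100 software organizations.
population = [f"org-{i}" for i in range(100)]
groups = two_step_design(population, 20, ["control", "treatment"], seed=1)
```

Note that the sketch only randomizes over units. It says nothing about how tasks, settings, treatments, or outcome measures were chosen, which is where the slide locates the main limitation.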
The “painful” problem of induction • Hume’s truism: • In past experience, all tests have confirmed Theory 1. • Therefore, the next test will confirm Theory 1 or all tests will confirm Theory 1. “… induction or generalization is never fully justified logically. Whereas the problems of internal validity are solvable within the limits of the logic of probability of statistics, the problems of external validity are not logically solvable in any neat, conclusive way. Generalization always turns out to involve extrapolation into a realm not represented in one’s sample. Such extrapolation is made by assuming one knows the relevant laws.”* *D.T. Campbell and J.C. Stanley (1963) Experimental and Quasi-Experimental Designs for Research, Houghton Mifflin Company, p. 17.
Yin’s conception of generalization* [Figure: two levels of inference. Labels at the Level-2 inference (analytical) tier: theory, rival theory, experimental findings, population characteristics, case study findings. Labels at the Level-1 inference (statistical) tier: sample, subjects.] *R.K. Yin (2003) Case Study Research: Design and Methods, Third Edition, Sage Publications.
Lee and Baskerville’s framework* (2×2: generalizing from empirical or theoretical statements, to empirical or theoretical statements) • EE (from empirical statements to empirical statements): generalizing from data to description • ET (from empirical statements to theoretical statements): generalizing from description to theory • TE (from theoretical statements to empirical statements): generalizing from theory to description • TT (from theoretical statements to theoretical statements): generalizing from concepts to theory *A.S. Lee and R.L. Baskerville (2003) Generalizing Generalizability in Information Systems Research, Information Systems Research, 14(3):221-243.
Shadish, Cook, and Campbell*: Five principles of generalized causal inference • Surface similarity: judging the apparent similarities between what was studied and the targets of generalization. • Ruling out irrelevancy: identifying those attributes of persons, settings, treatments, and outcome measures that are irrelevant because they do not change a generalization. • Making discriminations: making discriminations that limit generalization (e.g., from the lab to the field). • Interpolation and extrapolation: interpolating to unsampled values within the range of the sampled persons, settings, treatments, and outcomes, and extrapolating beyond the sampled range. • Causal explanation: developing and testing explanatory theories about the target of generalization. *W.R. Shadish, T.D. Cook, and D.T. Campbell (2002) Experimental and Quasi-Experimental Designs for Generalized Causal Inference, Houghton Mifflin Company.
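The interpolation/extrapolation distinction in the fourth principle can be made concrete for a single numeric attribute. The sketch below is an illustration under stated assumptions, not something from the talk: `classify_generalization` is a hypothetical helper, and team size is an invented example attribute. Real studies vary along many attributes at once, most of them not numeric.

```python
def classify_generalization(sampled_values, target):
    """Classify a target value relative to the values actually studied.

    Hypothetical helper: handles one numeric attribute only.
    """
    if target in sampled_values:
        return "sampled"  # the target value was studied directly
    low, high = min(sampled_values), max(sampled_values)
    if low <= target <= high:
        return "interpolation"  # unsampled, but inside the sampled range
    return "extrapolation"  # outside the sampled range


# Invented example: team sizes covered by a set of studies.
sampled_team_sizes = [3, 5, 8, 12]
mid_sized_team = classify_generalization(sampled_team_sizes, 6)
large_team = classify_generalization(sampled_team_sizes, 40)
```

Here a team of 6 falls inside the studied range (interpolation), while a team of 40 lies beyond it (extrapolation), where a claim leans more heavily on the other four principles.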
Summary • Formal sampling-based methods are of limited use for generalizing from empirical SE studies. • specifically so for tasks, settings, treatments, and outcome measures • Additionally, there’s a dilemma between scientific validity (complying with Hume’s truism) and practical impact (applying a theory in a new organizational setting). • Although we should advocate the two-step model of random sampling followed by random assignment when it is feasible, we cannot advocate it as the model for generalized causal inference in SE. • So, SE researchers must use other concepts and methods to explore generalization from empirical SE studies. • In fact, most SE researchers routinely make such generalizations without using formal sampling theory. • In the rest of this session we will attempt to make explicit the concepts and methods used in such work. • We turn to examples of such alternative methods now …
Mini-group and plenary discussions • Form mini-groups with three persons – without leaving your chairs (first three, next three, etc.) • Discuss the following two questions in the mini-groups for ~10 minutes: • How do you generalize the results from YOUR studies? • How can you improve the validity of these generalizations? • Plenary discussion based on viewpoints from the groups