Session 1: Overview of Quantitative Research Methods in Innovation Studies
13.15-15.00, December 10, 2012, for Survey of Quantitative Research, NORSI
Taehyun Jung (taehyun.jung@circle.lu.se), CIRCLE, Lund University
Contents • Motivation – data analytic trends • Qualitative v. Quantitative Research • Empirical research design • Validity & Reliability • Structure and Elements of Empirical Research • Research process • Example • Research Question • Data
Qualitative v. Quantitative research
Qualitative research • aims at understanding; it answers primarily "how?" questions • interpretive approach to data: studies "things" within their context and considers the subjective meanings that people bring to their situation • case studies • Cf. "the method does not imply any particular form of data collection - which can be qualitative or quantitative" (Yin 1993)
Quantitative research • aims at (causal) explanation; it answers primarily "why?" questions • statistical, quantitative research methods and analysis • social surveys and experiments
Complementary - not contradictory • different kinds of research questions and objects of research • different perspectives on the same research objects/questions (methodological triangulation)
The quantitative method
Based on the idea that social phenomena can be quantified, measured, and expressed numerically. The information about a social phenomenon is expressed in numeric terms that can be analyzed by statistical methods. The observations can be directly numeric information or can be classified into numeric variables.
Quantitative research
Strengths... • Enables the research and description of social structures and processes that are not directly observable. • Well suited for quantitative description, comparisons between groups, areas, etc. • Description of change. • Analysis and explanation of (causal) dependencies between social phenomena.
...and Weaknesses • Simplifies and "compresses" the complex reality: an abstract and constrained perspective. • Only applicable to measurable (quantifiable) phenomena. • Presumes relatively extensive knowledge of the subject matter in order to be able to ask the "correct" questions. • Difficult to study processes or "dynamic" phenomena: produces a static view of reality. • Description of actors' perspectives, intentions, and meanings is difficult.
Description and explanation
What is going on (descriptive research)? • E.g., social and innovation indicators • to describe the invention rate in a country, to examine trends over time, or to compare the rates in different countries • Good description provokes the "why" questions of explanatory research.
Why is it going on (explanatory research)? • focuses on "why" questions • why is the invention rate as high as it is, why are some types of invention increasing, why is the rate higher in some countries than in others? • Answering the "why" questions involves developing causal explanations. Causal explanations argue that phenomenon Y (e.g. income level) is affected by factor X (e.g. gender).
Most research includes both description and explanation.
Source: De Vaus, D. (2001). Research design in social research: SAGE Publications Ltd.
Three types of causal relationships Source: De Vaus, D. (2001). Research design in social research: SAGE Publications Ltd.
Correlation, prediction, and causation
Correlation and causation: • There is a correlation between the number of fire engines at a fire and the amount of damage caused by the fire (the more fire engines, the more damage). • Is it therefore reasonable to conclude that the number of fire engines causes the amount of damage? • Clearly the number of fire engines and the amount of damage will both be due to some third factor - such as the seriousness of the fire (see the simulation sketch below).
Prediction and causation: • Knowing the type of school attended improves our capacity to predict academic achievement. • But this does not mean that school type affects academic achievement. Predicting performance on the basis of school type does not tell us why private school students do better. • Good prediction does not depend on causal relationships. Nor does the ability to predict accurately demonstrate anything about causality.
Source: De Vaus, D. (2001). Research design in social research: SAGE Publications Ltd.
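The fire-engine example is easy to reproduce numerically. Below is a minimal numpy sketch (all variable names and numbers are invented for illustration): a "seriousness" confounder drives both the number of engines and the damage, producing a strong correlation with no causal link between the two, which disappears once the confounder is conditioned on.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000

# A third factor - the seriousness of the fire - drives both variables.
seriousness = rng.uniform(0, 10, size=n)

# Neither variable depends on the other, only on seriousness (plus noise).
fire_engines = 1 + 0.8 * seriousness + rng.normal(0, 1, size=n)
damage = 5 + 3.0 * seriousness + rng.normal(0, 3, size=n)

# Raw correlation is strong, even though there is no causal link.
print(np.corrcoef(fire_engines, damage)[0, 1])   # ~0.9

# Residualizing both variables on the confounder makes the
# association essentially disappear.
r1 = fire_engines - np.poly1d(np.polyfit(seriousness, fire_engines, 1))(seriousness)
r2 = damage - np.poly1d(np.polyfit(seriousness, damage, 1))(seriousness)
print(np.corrcoef(r1, r2)[0, 1])                 # ~0.0
```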
We have to infer cause
While we can observe correlation, we cannot observe cause. We have to infer cause. • These inferences are "necessarily fallible . . . [they] are only indirectly linked to observables" (Cook and Campbell, 1979: 10). • Because our inferences are fallible, we must minimize the chances of incorrectly saying that a relationship is causal when in fact it is not. One of the fundamental purposes of research design in explanatory research is to avoid invalid inferences.
Adopting a sceptical approach to explanations • scientific knowledge must always be provisional (Popper) • rather than seeking evidence that is consistent with our theory, we should seek evidence that provides a compelling test of the theory • strategies for doing this: eliminating rival explanations of the evidence; deliberately seeking evidence that could disprove the theory
Source: De Vaus, D. (2001). Research design in social research: SAGE Publications Ltd.
Think of the alternative hypotheses and avoid the logical fallacy of affirming the consequent
If A then B. B is true. Therefore A is true. (invalid)
If A [or C, or D, or E, or F, or . . .] then B. We observe B. Therefore A [or C, or D, or E, or F, or . . .] is true.
"There always may be an unthought-of explanation" • The more alternative explanations that have been eliminated, and the more we have tried to disprove our theory, the more confidence we will have in it - but we should avoid thinking that it is proven.
Source: De Vaus, D. (2001). Research design in social research: SAGE Publications Ltd.
Research design
The logical structure of the research (and the evidence it requires). "The function of a research design is to ensure that the evidence obtained enables us to answer the initial question as unambiguously as possible." (David de Vaus, Research Design in Social Research, 2001) • Given this research question (or theory), what type of evidence is needed to answer the question (or test the theory) in a convincing way?
Empirical support for practically any hypothesis can usually be obtained by manipulating data. Good research design prevents this kind of manipulative use of data by taking into account possible alternative explanations and enabling comparisons and judgments between them.
Validity and Reliability
Validity: are conclusions true? • the degree to which you are truly measuring what you intend to measure • Does the instrument measure what it is meant to measure? • An instrument can be reliable, but not valid. Example: measuring anxiety with the temperature readings on a thermometer. • If an instrument is valid, it must also be reliable.
Reliability ("repeatability" or "consistency"): can findings be repeated? • If the design of a research study is reliable, then its findings should be repeatable, replicable, generalizable. • Can the study be replicated? Will the research yield stable, consistent results when applied repeatedly? • Example: "How many books have you borrowed this year?" A study in which this is an important question might be unreliable - subjects likely will not recall the exact number and will guess different numbers at different times (see the simulation sketch below).
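The book-borrowing example can be made concrete with a test-retest simulation. This is a minimal sketch with invented numbers: the same respondents are "measured" twice, once with a precise instrument and once with large recall error, and reliability is read off as the correlation between the two measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
true_books = rng.poisson(20, size=n)   # true number of books borrowed

# A reliable instrument: small, stable measurement error.
t1_reliable = true_books + rng.normal(0, 1, size=n)
t2_reliable = true_books + rng.normal(0, 1, size=n)

# An unreliable instrument: respondents guess, with large recall error.
t1_guess = true_books + rng.normal(0, 10, size=n)
t2_guess = true_books + rng.normal(0, 10, size=n)

# Test-retest reliability = correlation between repeated measurements.
print(np.corrcoef(t1_reliable, t2_reliable)[0, 1])   # high, ~0.95
print(np.corrcoef(t1_guess, t2_guess)[0, 1])         # much lower, ~0.2
```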
Validity and Reliability
The center of the target: the concept that you are trying to measure. Shots (dots) = observations.
Source: http://www.socialresearchmethods.net/kb/relandval.php
Campbell's Validity Typology
As developed by Campbell (1957), Campbell & Stanley (1963), and Cook & Campbell (1979), with very minor changes in Shadish, Cook & Campbell (2002): • Internal Validity • Statistical Conclusion Validity • Construct Validity • External Validity
Each of the validity types has prototypical threats to validity - common reasons why we are often wrong about each of the four inferences.
Internal validity
Did the treatment affect the outcome (Campbellian)? • whether observed covariation between A (the presumed treatment) and B (the presumed outcome) reflects a causal relationship from A to B, as those variables were manipulated or measured
Approximate truth about inferences regarding cause-effect or causal relationships • identify causal relationships and rule out other explanations for relationships • not relevant in most observational or descriptive studies • the central focus for studies that assess the effects of social programs or interventions
The goal is to be sure that the conclusions drawn from experimental results accurately reflect what went on in the experiment itself • whether observed changes can be attributed to your program or intervention (i.e., the cause) and not to other possible causes ("alternative explanations" or "confounding factors" for the outcome)
Threats to Internal validity: factors that could plausibly have caused an observed relationship even if the treatment had never taken place (Campbell 91)
Selection threat • groups exposed to treatments non-randomly may differ in ways that mimic what the treatment might achieve • participant characteristics confounded with treatment conditions because of the use of intact or self-selected participants - or, more generally, whenever predictor variables represent measured characteristics as opposed to independently manipulated treatments
History threat • treatment groups may differ over time because an event happened to the units assigned to one treatment but not the other • events, in addition to an assigned condition, to which participants are exposed between repeated measurements and that could influence performance
Maturation threat • treatment groups may grow apart over time because they spontaneously mature at different rates • observed changes as a result of ongoing, naturally occurring processes rather than condition effects
Threats to Internal validity (cont'd)
Instrumentation threat • changes in the measuring instrument or in how the construct is measured over time • E.g., changed definitions of 'innovation'
Attrition (mortality) threat • differential drop-out across conditions at one or more time points that may be responsible for differences • E.g., innovative performance of new firms in year 1 and year 5
Regression threat • "regression artifact" or "regression to the mean": a statistical phenomenon • if a variable is extreme on its first measurement, it will tend to be closer to the average on a second measurement - and, a fact that may superficially seem paradoxical, if it is extreme on a second measurement, it will tend to have been closer to the average on the first measurement (simulated in the sketch below)
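Regression to the mean follows from nothing more than noisy measurement, which a minimal simulation makes visible (invented numbers; no real data): units selected for extreme first scores fall back toward the average on remeasurement with no intervention at all.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

true_score = rng.normal(100, 10, size=n)         # stable underlying level
measure_1 = true_score + rng.normal(0, 10, n)    # first noisy measurement
measure_2 = true_score + rng.normal(0, 10, n)    # second noisy measurement

# Units selected for being extreme on the first measurement...
extreme = measure_1 > 125

# ...are, on average, much closer to the mean the second time,
# with no treatment or change of any kind in between.
print(measure_1[extreme].mean())   # ~130
print(measure_2[extreme].mean())   # noticeably lower, ~115
```

A single-group pre/post design on such a selected group would misread that drop as a treatment effect.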
Ruling out threats to internal validity
Best ruled out through random assignment, or a control group design • In this scenario you have two groups: one receives your program and the other does not. In fact, the only difference between the groups should be the program. If that is true, then the control group experiences all the same history and maturation threats, has the same testing and instrumentation issues, and has similar rates of mortality and regression to the mean. In other words, a good control group is one of the most effective ways to rule out the single-group threats to internal validity. Of course, when you add a control group, you no longer have a single-group design (see the sketch below). • Cf. Jaffe's matching design
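A minimal sketch of why random assignment works (toy numbers, invented for illustration): a maturation effect hits everyone, but because it hits the randomly assigned treated and control groups alike, the simple difference in group means still recovers the true program effect.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000

# Randomly assign half of the units to the program (treatment).
treated = rng.permutation(np.repeat([True, False], n // 2))

baseline = rng.normal(50, 10, size=n)
maturation = rng.normal(5, 2, size=n)   # affects everyone over time
true_effect = 3.0                       # effect of the program

outcome = baseline + maturation + true_effect * treated

# History/maturation hit both groups alike, so the simple
# difference in means recovers the program effect (~3).
print(outcome[treated].mean() - outcome[~treated].mean())
```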
Construct Validity
Given that there is a valid causal relationship, is the interpretation of the constructs involved in that relationship correct? The degree to which inferences can legitimately be made from the operationalizations in your study to the theoretical constructs on which those operationalizations were based • how accurately our talk matches what we actually did • generalizing from your program or measures to the concept of your program or measures • an assessment of how well you translated your ideas or theories into actual programs or measures • E.g., innovation (concept) measured by patents (measure)
Threats to construct validity
Inadequate preoperational explication of constructs • you did not do a good enough job of defining (operationally) what you mean by the construct • failure to adequately explicate a construct may lead to incorrect inferences about the relationship between the operation and the construct
Mono-operation bias • pertains to the independent variable, cause, program, or treatment in your study • if you only use a single version of a program in a single place at a single point in time, you may not be capturing the full breadth of the concept of the program
Mono-method bias • when all operationalizations use the same method (e.g., self-report), that method is part of the construct actually studied
The "Social" Threats to Construct Validity
Hypothesis guessing • participants are likely to base their behavior on what they guess about the study, not just on your treatment
Evaluation apprehension • many people are anxious about being evaluated • for example, women taking a math test may not perform to their full potential because of concerns regarding women's stereotyped difficulties with math; in this situation, evaluation apprehension is called stereotype threat
Experimenter expectancies • sometimes the researcher can communicate what the desired outcome of a study might be (and participants' desire to "look good" leads them to react that way) • for instance, the researcher might look pleased when participants give a desired answer; if this is what causes the response, it would be wrong to label the response a treatment effect
Statistical Conclusion Validity
"Was the original statistical inference correct?" The validity of inferences about the correlation (covariation) between treatment and outcome. • The power of the analysis concerns the sensitivity, or ability, to detect a relationship. • Did the investigators arrive at the correct conclusion regarding whether a relationship between the variables exists, or about the extent of the relationship? • Not concerned with the causal relationship between variables, but with whether there is any relationship, causal or not.
Closely tied to internal validity • SCV asks whether the two variables are correlated; IV asks whether that correlation is due to causation.
Type I error • concluding that a relationship exists between two variables when in fact there is no relationship.
Type II error • concluding that there is no relationship when one exists.
Threats to Statistical Conclusion Validity
• Low statistical power (very common) • Violated assumptions of statistical tests (especially problems of nesting - students nested in classes) • Unreliability of measures • Restriction of range • Unreliability of treatment implementation • Extraneous variance in the experimental setting • Heterogeneity of units • Inaccurate effect size estimation
(A small power simulation follows.)
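Low power is the most common threat and is easy to see by simulation. Below is a minimal sketch (effect size and sample sizes invented for illustration) using scipy's two-sample t-test: with a real but modest effect, small samples usually fail to reject the null (Type II errors), while larger samples detect the effect most of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def rejection_rate(n, effect=0.3, sims=2_000, alpha=0.05):
    """Share of simulated two-sample t-tests that detect a real effect."""
    hits = 0
    for _ in range(sims):
        a = rng.normal(0.0, 1.0, size=n)
        b = rng.normal(effect, 1.0, size=n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / sims

# With a true effect of 0.3 SD, a small sample usually misses it
# (a Type II error); a larger sample detects it most of the time.
print(rejection_rate(n=20))    # low power, roughly 0.15
print(rejection_rate(n=200))   # high power, roughly 0.85
```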
Threats to statistical conclusion validity and their remedies
External Validity
"Can the finding be generalized across populations, settings, or time?" The validity of inferences about whether the cause-effect relationship holds over variation in persons, settings, treatment variables, and measurement variables. The generalization and applicability of your research to similar problems/settings.
"Validity is subjective rather than objective" (Cronbach 1982) • validity is a property of a conclusion, as judged by a critical audience • validity is assimilated to credibility
The logic of the research process Source: De Vaus, D. (2001). Research design in social research: SAGE Publications Ltd.
Research process
• Articulate the research problem • Select the research design (proper empirical setting, measures, analytic methods) • Collect data • Analyze data • Infer the results
Structure of a typical empirical paper
Cohen, W. M., Nelson, R. R., & Walsh, J. P. (2002). Links and Impacts: The Influence of Public Research on Industrial R&D. Management Science, 48(1), 1-23. (1,233 citations counted on Dec. 6, 2012, Google Scholar)
Research questions • "to characterize the extent and nature of the contribution of public research to industrial R&D" • 1. "how public research tends to be used in industrial R&D labs" • 2. "the overall importance of public research, as well as that of specific fields of basic and applied research and engineering" • 3. "the importance of the different pathways through which public research may impact industrial R&D, including publications, informal interactions, consulting, and the hiring of university graduates" • 4. "what roles different kinds of firms (e.g., large versus small and start-ups versus established firms) play in bridging public research and industrial R&D."
Cohen, Nelson, & Walsh (2002)
Data • a survey of R&D managers administered in 1994 • the population sampled is all R&D units located in the U.S. conducting R&D in manufacturing industries as part of a manufacturing firm • the sample was randomly drawn from the eligible labs listed in Bowker's Directory of American Research and Technology (1994) or belonging to firms listed in Standard and Poor's COMPUSTAT, stratified by three-digit SIC industry • "We sampled 3,240 labs, and received 1,478 responses, yielding an unadjusted response rate of 46% and an adjusted response rate of 54%" (see the arithmetic sketch below) • "For the analysis in this paper, we restricted our sample to firms whose focus industry was in the manufacturing sector and were not foreign owned, yielding a sample of 1,267 cases" • sample characteristics
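As a quick arithmetic check on the reported rates (a sketch only; the slide does not say exactly how eligibility was adjusted, so the back-solved figure is an inference):

```python
sampled = 3_240
responses = 1_478

# Unadjusted response rate: responses over all sampled labs.
print(round(responses / sampled * 100))   # 46 (%)

# The adjusted rate (54%) removes ineligible labs from the denominator;
# solving backwards suggests roughly 1,478 / 0.54 ≈ 2,737 eligible labs.
print(round(responses / 0.54))
```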
Sample characteristics Cohen, Nelson, & Walsh (2002)
Cohen, Nelson, & Walsh (2002)
Analysis - RQ1 • Public research outscores consultants/contract R&D as a source of knowledge for both suggesting new R&D projects (p < 0.0001) and contributing to project completion (n.s.)… Although rivals constitute a more important source of project ideas than public research institutions (41% versus 32%, p < 0.0001), public research institutions are markedly more important than rivals as a source of knowledge contributing to project completion - 36% for public research versus 12% for competitors (p < 0.0001) - suggesting that the impact of public research on firms' R&D is at least comparable to that of rivals' R&D.
Articulate research problem
• In the form of research question(s) and/or hypotheses • Determines the appropriate type of research design • Formalizes the research topic into an operational guide for the study, connecting the conceptual framework to the methods • Focused and testable • E.g., "what is the best …" • Also clarifies the specific type of data to be collected
Data
Secondary data • financial data, indicators, patent documents, Thomson Web of Science, etc.
Self-report measures • survey & questionnaire • Advantages: sample large populations (cheap on materials & effort); efficiently ask a lot of questions • Disadvantages: self-report is fallible; response biases are unavoidable • Interviews
Survey & questionnaire: Sampling
Selecting respondents from the population of concern • specify the population • sampling framework • sampling bias
(Simple) random sampling • randomly selected individuals; each individual in the population has the same probability of being in the sample; all possible samples of size n have the same chance of being drawn
Systematic selection • Stratified sampling • Convenience sampling • Voluntary response sampling
Snowball sampling • especially useful when you do not know the population well • E.g., "name three experts in nanotechnology"
(A sketch of simple random and stratified sampling follows.)
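The two workhorse designs are easy to contrast in code. A minimal pandas sketch (toy population, invented industry shares): a simple random sample gives every lab the same inclusion probability, while a stratified sample draws a fixed fraction within each industry so the strata proportions are preserved exactly rather than only in expectation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)

# A toy population of R&D labs, stratified by industry.
population = pd.DataFrame({
    "lab_id": range(1_000),
    "industry": rng.choice(["chemicals", "machinery", "electronics"],
                           size=1_000, p=[0.5, 0.3, 0.2]),
})

# Simple random sample: every lab has the same inclusion probability.
srs = population.sample(n=100, random_state=5)

# Stratified sample: 10% drawn within each industry.
stratified = (population.groupby("industry", group_keys=False)
                        .sample(frac=0.10, random_state=5))

print(srs["industry"].value_counts(normalize=True))
print(stratified["industry"].value_counts(normalize=True))
```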
Bad sampling
Convenience sampling: just ask whoever is around. • Example: "man on the street" survey (cheap, convenient, often quite opinionated or emotional → now very popular with TV "journalism") • Which men, and on which street? • Ask about gun control or legalizing marijuana "on the street" in Berkeley, CA and in some small town in Idaho and you would probably get totally different answers. • Even within an area, answers would probably differ if you did the survey outside a high school or a country-western bar. • Bias: opinions limited to individuals present.
Voluntary response sampling: individuals choose to be involved. • These samples are very susceptible to bias because different people are motivated to respond or not. They are often called "public opinion polls" and are not considered valid or scientific. • Bias: the sample design systematically favors a particular outcome.
General Survey Biases
Sampling bias • are respondents representative of the population of interest? How were they selected? • do all persons in the population have an equal chance of being selected?
Non-response & self-selection bias • people who feel they have something to hide, or who don't like their privacy being invaded, probably won't answer - yet they are part of the population.
Response bias • Social desirability: a fancy term for lying when you think you should not tell the truth, as when your family doctor asks "How much do you drink?" or a survey of female students asks "How many men do you date per week?" • Recency effects: people also simply forget and often give erroneous answers to questions about the past.
Wording (or framing) effects • questions worded like "Do you agree that it is awful that…" prompt you to give a particular response.
Inference
The techniques of inferential statistics allow us to draw inferences or conclusions about a population from a sample. • Your estimate of the population is only as good as your sampling design - work hard to eliminate biases. • Your sample is only an estimate - if you randomly sampled again, you would probably get a somewhat different result (simulated in the sketch below). • The bigger the sample, the better.
[Diagram: drawing a sample from a population and inferring back to it]
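Both points - that repeated samples give different estimates, and that bigger samples are better - fall out of a minimal simulation (invented population; the lognormal shape is just to make the population deliberately skewed):

```python
import numpy as np

rng = np.random.default_rng(9)
population = rng.lognormal(mean=3, sigma=1, size=100_000)  # skewed population
print(population.mean())   # the "true" value we are trying to infer

def sample_means(n, draws=1_000):
    """Means of repeated random samples of size n."""
    return np.array([rng.choice(population, size=n).mean()
                     for _ in range(draws)])

# Each sample gives a somewhat different estimate; larger samples
# scatter much less around the population mean.
print(sample_means(25).std())     # wide spread
print(sample_means(2_500).std())  # ~10x narrower (sqrt(100) = 10)
```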
Types of variables
Interval/quantitative/scale • something that can be counted or measured for each individual on a scale of equal units; can be added, subtracted, averaged, etc., across individuals in the population • Example: how tall you are, your age, your blood cholesterol level, the number of credit cards you own
Categorical • something that falls into one of several categories; what can be counted is the count or proportion of individuals in each category • Nominal: no inherent order • Ordinal: ordered, but the differences cannot be measured in meaningful units • Dichotomous or dummy: only two values (e.g., yes or no, male and female, promoted and not promoted) • Example: your blood type (A, B, AB, O), your hair color, your ethnicity, whether you paid income tax last tax year or not
(A small coding sketch follows.)
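In practice these types determine how a variable is coded for analysis. A minimal pandas sketch (toy data, invented column names): nominal variables enter models as dummy (0/1) columns, while ordinal variables keep their order but not meaningful distances.

```python
import pandas as pd

df = pd.DataFrame({
    "height_cm": [162, 178, 171],          # interval/quantitative
    "blood_type": ["A", "O", "AB"],        # nominal: no inherent order
    "education": ["low", "high", "mid"],   # ordinal: ordered categories
    "paid_tax": ["yes", "no", "yes"],      # dichotomous/dummy
})

# Nominal variables enter statistical models as dummy (0/1) columns.
print(pd.get_dummies(df["blood_type"], prefix="blood"))

# Ordinal variables keep their order but not meaningful distances.
order = pd.CategoricalDtype(["low", "mid", "high"], ordered=True)
print(df["education"].astype(order).cat.codes)   # 0, 2, 1
```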
Next session
Session 2 • Correlation • Statistical Inference and Hypothesis Testing • t-Test • Confidence Interval • Chi-square Statistic
Session 3 • Simple Regression Model