1. I/O Psychology Research Methods Today we're going to talk about the science of I/O psychology.
Before we start talking about the theories, knowledge, and principles we've developed, it's important that you understand where this knowledge comes from. Like all other areas of science (e.g., life sciences, natural, technical, social), we rely on the scientific research process.
2. What is Science? Science: Approach that involves the understanding, prediction, and control of some phenomenon of interest.
Scientific Knowledge is
Logical and Concerned with Understanding
Empirical
Communicable and Precise
Probabilistic (Disprove, NOT Prove)
Objective / Disinterestedness
HOW DO WE KNOW SOMETHING?
Personal experience (how do you know sun is yellow?)
Authority (how do you know there will be three exams in this class?)
Intuition (how do you know when a close relative needs you?)
Logical: derive hypotheses from general principles/theories, test them through research, and make inferences using inductive logic. Concerned with general principles – not why Fred performed poorly yesterday, but what causes daily shifts in performance
Empirical: use data to explore our hypotheses or theories; relies on observation of events (what workers want most – I can assume I know, or I can survey)
Communicable: published and disseminated in detail so that others can assess the findings and replicate them (makes measurement important – we want to know how much better one employee is than another)
Probabilistic: science does not set out to prove theories/hypotheses – it sets out to disprove them. So we never say "prove," only "fail to disprove" – which frustrates managers. We want to eliminate alternative hypotheses (if I find that older workers perform lower, can I say I've "proved" that workers get worse with age?)
Objective: different scientists can arrive at the same conclusions using same methods, uninfluenced by biases or prejudices
Like other scientists, I/O psychologists conduct research based on theories and hypotheses. They gather data, publish those data, and design their research to eliminate alternative explanations for the research results.
3. Goals of Science Ex: We want to study absenteeism in an organization
Description: What is the current state of affairs?
Prediction: What will happen in the future?
Explanation: What is the cause of the phenomena we’re interested in?
Description: What is the current state of affairs?
We analyze organizations, jobs, and people to describe them according to certain characteristics.
What is the current absence rate?
How much money each year does the organization lose because of absenteeism?
Prediction: What will happen in the future?
What information about the past or present can be used to reliably and accurately forecast the future?
Who is most likely to be absent?
Explanation
Hardest goal, but most interesting
This is essentially theory building. Having theories allows us to understand how people will behave in general so we can develop jobs, orgs, rules, etc. based on certain principles without having to do trial and error each time.
Allows us to predict how things will work in a novel circumstance.
Why are employees absent?
4. What is “research”? Systematic study of phenomena according to scientific principles.
A set of procedures used to obtain empirical and verifiable information from which we then make informed, educated conclusions.
How do we achieve these goals?
Through research which is:
Systematic study of phenomena according to scientific principles
-so research is the tool we use to achieve the 3 main goals of science
Set of procedures undertaken to answer a question
-procedures used to obtain verifiable, quantifiable information about a phenomenon
How do we achieve these goals?
Through research which is:
Systematic study of phenomena according to scientific principles
-so research is the tool we use to achieve the 3 main goals of science
Set of procedures undertaken to answer a question
-procedures used to obtain verifiable, quantifiable information about a phenomenon
5. The Empirical Research Process 1. Statement of the Problem
2. Design of the Research Study
3. Measurement of Variables
4. Analysis of Data
5. Interpretation/Conclusions
The research process is undertaken through a series of steps.
What question or problem needs to be answered?
How do you design a study to answer that question?
How do you measure variables and collect the necessary data?
How do you apply statistical procedures to analyze the data? How do you make sense out of all the data collected?
How do you draw conclusions from analyzing the data?
6. Step 1: Statement of the Problem Theory: statement that explains the relationship among phenomena; gives us a framework within which to conduct research.
“There is nothing quite so practical as a good theory.” Kurt Lewin
Two Approaches:
Inductive – theory building; use data to derive theory.
Deductive – theory testing; start with theory and collect data to test that theory.
What question needs to be answered?
Questions that initiate research don’t arise out of thin air. Can come from personal experience, intuition, or they can stem from some FORMAL THEORY.
The value of theory in science is that it integrates and summarizes large amounts of information and provides a framework for research.
Theory is difficult in psychology because psychological principles are not natural laws that apply in every situation, or even most situations (there is no equivalent to Newton's laws of motion); it is particularly difficult in I/O, where we deal with the dynamic nature of organizations and the people in them
Theory:
Some researchers advocate theory 100%; Lewin is saying that research should be guided by theory
Others think theory gets us away from the actual issue
Many ideas come from real-life or intuition, especially in I/O
Can use theory to guide and explain, but not as a sole source of ideas or an end solution
Two ways to derive theory: the inductive method starts with data and culminates in theory; the deductive method starts with theory and collects data to test that theory.
7. Step 1: Statement of the Problem Hypothesis
A testable statement about the status of a variable or the relationship among multiple variables
Must be falsifiable!
Theories are often complex and not necessarily testable in their original form.
Hypothesis: prediction about relationship among variables of interest.
8. Step 1: Statement of the Problem Types of variables
Independent Variables (IV): variables that are manipulated by the researcher.
Dependent Variables (DV): the outcomes of interest.
Predictors and Criterion
Confounding variables: uncontrolled extraneous variables that permit alternative explanations for the results of a study.
Variable is a symbol that can assume a range of numerical values.
IV: controlled by experimenter
Those things we think could be influencing the outcome (the DV)
In experiments, these are manipulated or controlled by the researcher
In differential research, they are not manipulated, but we have reason to believe they are the “cause” and we allow them to vary naturally
We assess the effect of the IVs on the DV
E.g., suppose we are interested in studying the effect of night versus early shift work on productivity and turnover. IV = type of shift work; DV = productivity/turnover
DV: measured by experimenter - variable of interest
The outcome of interest
The thing we’re trying to predict or explain
The value it takes “DEPENDS” on the IV.
NOTE: Variables not inherently IV or DV; function of the research design and what you do with the variables
Predictor: score used to predict; used to forecast another variable
Criterion: scores trying to predict; the variable we want to predict
NOTE: IV and DV used in causal experimentation
Predictor and criterion used to explain a relationship; the status of individuals on one variable as a function of their status on the other (predictor) variable
Confounding Variables –
variables that affect our dependent variable or criterion but are not part of what we are studying
sometimes we know what they are and can measure and control for them, and sometimes we don't
EXAMPLE: the Hawthorne studies examined the effect of light levels on performance; the confounding variable was the presence of the researchers. One way to test this is to examine researcher presence as an IV, independent of changing the light
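To make the IV/DV logic concrete, here is a minimal Python sketch of the shift-work example (a hypothetical simulation, not lecture data; the group sizes, means, and productivity scale are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# IV: type of shift, manipulated by the researcher (two conditions).
# DV: productivity, measured for every participant.
n_per_group = 50
day_shift = rng.normal(loc=52.0, scale=8.0, size=n_per_group)    # hypothetical scores
night_shift = rng.normal(loc=47.0, scale=8.0, size=n_per_group)  # hypothetical scores

# With random assignment, a difference in mean productivity is attributed
# to the IV rather than to confounding participant characteristics.
print(f"day-shift mean:   {day_shift.mean():.1f}")
print(f"night-shift mean: {night_shift.mean():.1f}")
print(f"difference (effect of IV): {day_shift.mean() - night_shift.mean():.1f}")
```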
9. Moderator Variable Special type of IV that influences the relationship between 2 other variables
X → Y (moderated by M)
Example
Gender & Hiring rate
M = Type of job
The relationship between gender and hiring rate may change depending on the type of job individuals are applying for.
Moderator Variables
A variable that interacts with the IV to affect the DV
The effect of IV on DV depends on the level of the moderator variable
EXAMPLE:
Suppose I find that there is no relationship between gender and hiring rate; women are just as likely to get hired as men
Now suppose I also examine the type of jobs they are applying for as a moderator
When women apply for stereotypically female jobs they have a higher hiring rate, but when apply for stereotypically masculine jobs they have a much lower hiring rate
The opposite is true for men
Important because while initial results show no relationship, there are important implications for women
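A minimal sketch of this hiring example in Python; the hiring rates are hypothetical numbers chosen to show the moderation pattern, not real data:

```python
import numpy as np

# Hypothetical hiring rates (proportion hired), invented for illustration.
rates = {
    ("women", "female-typed job"): 0.60, ("women", "male-typed job"): 0.20,
    ("men",   "female-typed job"): 0.20, ("men",   "male-typed job"): 0.60,
}
jobs = ("female-typed job", "male-typed job")

# Collapsing over the moderator hides the effect: both genders average 0.40.
for gender in ("women", "men"):
    overall = np.mean([rates[(gender, job)] for job in jobs])
    print(f"{gender}: overall hiring rate = {overall:.2f}")

# Within each level of the moderator (job type), the gender-hiring
# relationship reverses -- which is exactly what moderation means.
for job in jobs:
    print(f"{job}: women = {rates[('women', job)]:.2f}, men = {rates[('men', job)]:.2f}")
```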
10. Mediator Variable Special type of IV that accounts for the relation between the IV and the DV.
Mediation implies a causal relation in which an IV causes a mediator which causes a DV.
IV → MED → DV
Example:
IV = negative feedback
MED = negative thoughts
DV = willingness to participate
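A minimal simulation of the feedback example (coefficients and sample size invented). It shows the statistical signature of mediation: the IV-DV correlation largely vanishes once the mediator is partialled out:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated causal chain: feedback (IV) -> negative thoughts (MED) -> willingness (DV)
feedback = rng.normal(size=n)                    # IV
thoughts = 0.7 * feedback + rng.normal(size=n)   # MED depends on the IV
willing = -0.7 * thoughts + rng.normal(size=n)   # DV depends only on the MED

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

r_iv_dv, r_iv_med, r_med_dv = r(feedback, willing), r(feedback, thoughts), r(thoughts, willing)

# Partial correlation of IV and DV controlling for the mediator.
partial = (r_iv_dv - r_iv_med * r_med_dv) / np.sqrt((1 - r_iv_med**2) * (1 - r_med_dv**2))
print(f"r(IV, DV) = {r_iv_dv:.2f}; r(IV, DV | MED) = {partial:.2f}  # near zero -> mediation")
```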
11. Moderator vs. Mediator A moderator variable is one that influences the strength of a relationship between two other variables.
A mediator variable is one that explains the relationship between the two other variables.
12. Example You are an I/O psychologist working for an insurance company. You want to assess which of two training methods is most effective for training new secretaries. You give one group of secretaries on-the-job training and a booklet to study at home. You give the second group of secretaries on-the-job training and have them watch a 30-minute video. A. What is the IV (training method)
B. What would be a logical DV (job performance, number of errors, etc.)
C. What variables might you want to control
D. Why might the field experiment involve less control than if it were a lab experiment?
E. Can you think of any other conditions you may want to test?
13. Step 2: Research Design A research design is the structure or architecture for the study.
A plan for how to treat variables that can influence results so as to rule out alternative interpretations.
Primary Research Methods:
Experimental (Laboratory vs. Field Research)
Quasi-Experimental
Non-Experimental (Observational, Survey)
Primary Research Methods: generate new information on a particular question
Experiment: 2 (or more) equivalent groups of P’s are treated exactly the same in all ways except the IV. Differences in measurements of the DV can be attributed to differences in the IV. (machines)
Random assignment: equivalent groups are necessary in an experiment. This can be achieved through RA: each P is equally likely to be assigned to each condition. RA ensures that P characteristics that may affect the DV are distributed evenly across groups. It does not guarantee equivalent groups; however, it ensures that differences between groups occur due to chance and are not systematic.
participants are randomly selected and randomly assigned - difficult in real world - hard to get organizations to do this
Laboratory vs. Field: as of 2001, 33% of studies published in I/O were lab studies and 67% were field studies.
Quasi-Experiment: P’s assigned to conditions but NO random assignment– much less control than in a true experiment (shift work ex)
Non-experiment: no control, no conditions
Observational design: The researcher observes employee behavior and systematically records what is observed.
Questionnaire (survey design): research strategy in which Ps are asked to complete a questionnaire/survey.
Self-report
Relatively fast and easy
Effective for:
Sensitive subjects, Large populations, Anonymity
Correlational: measure variables as they naturally occur and examine relations; good for getting initial data that might test later; methods are very common in I/O
Obtrusive vs. Unobtrusive
14. Step 2: Research Design Secondary Research Methods
Meta-analysis: statistical method for combining/analyzing the results from many studies to draw a general conclusion about relationships among variables (p.61).
Qualitative Research Methods
Rely on observation, interview, case study, and analysis of diaries to produce narrative descriptions of events or processes.
SECONDARY: uses existing data from previous research; the most common is meta-analysis, used to integrate the findings of previous research; useful for topics that have received a lot of research; greatly increases N; requires a lot of subjective decisions
Historically, IO psychologists used quantitative methods for measuring important variables/behavior. These methods rely heavily on tests, rating scales to yield results in numbers.
QUALITATIVE:
Researcher takes an active part in becoming involved with the subjects and interacting with them in-depth
Use observations, interviews, case studies, diaries, ESM
still involves statistics (critical incidents); data is just gathered in a different way
more in-depth
Less common in I/O, but becoming more popular – I/O is very quantitative and hence skeptical of this method since numbers and stats conform to traditional view of science.
Qualitative and quantitative not mutually exclusive
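A bare-bones sketch of the core meta-analytic idea – averaging correlations across studies weighted by sample size – with invented study results. Full procedures (e.g., Hunter and Schmidt's in I/O) also correct for artifacts such as unreliability and range restriction:

```python
import numpy as np

# Hypothetical primary studies: (observed correlation r, sample size N).
studies = [(0.25, 120), (0.31, 80), (0.18, 300), (0.40, 45)]

rs = np.array([r for r, n in studies])
ns = np.array([n for r, n in studies])

# Sample-size-weighted mean correlation: one general conclusion from many studies.
r_bar = np.sum(ns * rs) / np.sum(ns)
print(f"combined N = {ns.sum()}, weighted mean r = {r_bar:.3f}")
```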
15. Evaluating Research Design Internal validity (Control)
Does X cause Y?
Lab studies eliminate distracting variables through experimental control.
Using statistical techniques to control for the influence of certain variables is statistical control.
External validity (Generalizability)
Does the relation of X and Y hold in other settings and with other participants and stimuli?
Internal Validity: the degree to which the results of my research are due to the variables being studied, as opposed to some other explanation; a goal of research is to rule out as many alternative hypotheses as possible
External Validity and Generalizability are the same thing; Generalizability is very important because without it, the findings can not be used in any other context (setting or sample) (EX: study stereotypes with Psych subject pool at Purdue and want to generalize to individuals age 18-21 – what is the problem with this?)
Can't have external validity without internal validity; internal validity is a necessary condition
16. Threats to Internal Validity History
Instrumentation
Selection
Maturation
Mortality/Attrition
Testing
Experimenter Bias
Awareness of Being a Subject
**There are an infinite number of threats – these are some of the most common
History: what’s happening at the time of the research that might influence the relation of X to Y
Instrumentation: does the relation of X to Y reflect changes in the instrument or measure?
Selection: does the relation of X to Y reflect biases in sampling of participants? Pervasive in quasi-exps.
Maturation: does the relation of X to Y reflect natural growth or development?
Regression to the Mean: SI Curse; does X cause decrease in Y, or is this just a function of previous high performance that automatically decreased
Mortality: does the relation of X to Y reflect differences in drop out rates?
Testing: does the relation of X to Y reflect the use of a pretest? practice effect?
Experimenter Bias: does the relation reflect differences in experimenter behavior as a result of expectations of the outcome (often based on hypotheses)
Awareness of being a subject: does relation of X to Y reflect subject awareness
17. Step 3: Measurement Goal: Quantify the IV and DV
Psychological Measurement – the process of quantifying variables (called constructs)
“The process of assigning numerical values to represent individual differences, that is, variations among individuals on the attribute of interest”
A “Measure” …
Any mechanism, procedure, tool, etc, that purports to translate attribute differences into numerical values
Now we have a question, have identified the IV & DV and we have selected the most appropriate research design for our study, now what?
Now we need to figure out how to turn the IVs and DVs into numbers so that we can conduct the appropriate statistical analyses to answer our question.
Measurement reflects the process by which we attempt to assess magnitude differences in the attribute of interest, and represent those differences as numbers
E.g., a yard-stick assesses differences in length
18. Step 3: Measurement Two classes of measured variables:
Categorical (or Qualitative)
Differ in type but not amount
Continuous (or Quantitative)
Differ in amount
Categorical
-not inherently numerical
-ex: gender, employment status, job type
Continuous
- objective – naturally numerical – money, time, distance
- subjective – have “amounts” of but don’t naturally come in numbers
-when you think of 2 people, you might say one has more than the other
ex – happiness, friendliness, satisfaction
After we have collected the data we must make sense of it.
19. Step 4: Data Analysis Statistics are what we use to summarize relationships among variables and to estimate the odds that they reflect more than mere chance
Descriptive Statistics: Summarize, organize, and describe a sample of data.
Inferential Statistics: Used to make inferences from sample data to a larger sample or population.
Distributions
Descriptive Statistics: describe the sample we collected
Inferential Statistics: infer whether differences found in sample are likely to occur in population
Distributions (distributions of frequency):
Many naturally-occurring things are normally distributed (height, weight, IQ) – call this the “Bell Curve”
Many statistical principles are based on this assumption
Many things are not normal
Number of thefts among employees tends to be positively skewed
Supervisor ratings tend to be negatively skewed
20. Descriptive Statistics Measures of Central Tendency
Mean, Median, Mode
Measures of Variability
Range, Variance, SD
Descriptive statistics used to describe frequency distributions.
Measures of Central Tendency: stats that indicate where the center of a distribution is located.
Mean – mathematical average of scores in a distribution (sum of Xs / sample size).
Median – middle score of a distribution – 50% above, 50% below
Mode – most frequently occurring number (least used)
Measures of Variability – spread or dispersion of values in a distribution (the extent to which scores in a distribution vary)
Range – full range of dispersion – highest minus the lowest
Variance – average squared distance of scores in a distribution from the mean – we have to use squared deviations because raw deviations always sum to 0
SD – square root of variance – puts back in original units
(68% of scores fall within +/- 1 SD in a normal distribution; if the distribution is normal, SDs give us an idea of where a particular score falls)
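A minimal Python illustration of these statistics using the standard-library statistics module. The absence counts are invented, and deliberately positively skewed (like the theft counts above), so the mean is pulled above the median:

```python
import statistics as st

# Hypothetical yearly absence counts for ten employees (invented data).
absences = [0, 1, 1, 2, 2, 2, 3, 4, 6, 12]

# Central tendency
print("mean:  ", st.mean(absences))    # arithmetic average
print("median:", st.median(absences))  # middle score: 50% above, 50% below
print("mode:  ", st.mode(absences))    # most frequently occurring value

# Variability
print("range: ", max(absences) - min(absences))     # highest minus lowest
print("var:   ", round(st.pvariance(absences), 2))  # mean squared deviation
print("SD:    ", round(st.pstdev(absences), 2))     # back in original units
```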
21. Differences in Variance
22. Inferential Statistics Compares a hypothesis to an alternative
Statistical Significance: The likelihood that the observed difference would be obtained if the null hypothesis were true
Statistical Power: Likelihood of finding a statistically significant difference when a true difference exists
You don't need to know how these are calculated, but they are what is typically used to test hypotheses, and they are presented after the descriptive stats in a research article.
The smaller the sample size, the lower the power to detect a difference; a power analysis can be done before conducting a study to determine the necessary sample size.
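A minimal simulation sketch of the sample-size/power link, using scipy's independent-samples t-test. The true difference (0.5 SD), alpha, and sample sizes are illustrative choices, not values from the lecture:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def estimated_power(n_per_group, true_diff, alpha=0.05, n_sims=2000):
    """Fraction of simulated studies reaching p < alpha when a true difference exists."""
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(true_diff, 1.0, n_per_group)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / n_sims

# Smaller samples -> lower power to detect the same true difference.
for n in (10, 30, 100):
    print(f"n per group = {n:3d}: power = {estimated_power(n, true_diff=0.5):.2f}")
```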
23. Correlation
Used to assess the relationship between 2 variables
Represented by the correlation coefficient “r”
r can take on values from –1 to +1
Size denotes the magnitude of the relationship
0 means no relationship
Correlation – the degree of relationship between two variables
Many types but we’re going to concentrate on one – correlation
Sign – denotes positive or negative relationship
Positive relationship: as one variable goes up, the other variable goes up
Negative relationship: as one variable goes up, the other variable goes down
Size – denotes magnitude of relationship
Independent of sign: -.80 is same strength as +.80
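A minimal sketch of computing r in Python; the overtime/fatigue variables and scores are invented for illustration:

```python
import numpy as np

# Hypothetical scores for eight employees: overtime hours and a 1-10 fatigue rating.
overtime = np.array([0, 2, 3, 5, 6, 8, 9, 12])
fatigue = np.array([2, 3, 3, 5, 6, 6, 8, 9])

# Pearson r: the sign gives the direction, the absolute size the magnitude.
r = np.corrcoef(overtime, fatigue)[0, 1]
print(f"r = {r:.2f}")  # positive: as overtime goes up, fatigue goes up
```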
24. Correlation and Regression Correlation
Scatterplot
Regression Line
Linear vs. Non-Linear
Multiple Correlations
Correlation and Causation
Correlation – the degree of relationship between two variables
Use a scatterplot to plot the relationship between two variables
Regression Line – the line that best fits the data; can be used to predict other scores
Very hard to “eyeball” the relationship between two variables based on a scatter plot; easier with the regression line
Correlation Coefficient – strength of the relationship
Positive or negative, from –1 to +1
The higher the correlation, the more accurately we can predict one score from another
A = .93, B = .71, D = -.51, E = -.04
Linear vs. Non-Linear
A zero correlation indicates that there is no linear relationship, but there may be a relationship not detected by correlation
Take the example of stress and performance; curvilinear relationship, but correlation would be 0
Many relationships in I/O are linear, but some aren’t
Multiple Correlations
The relationship between multiple predictors and a single outcome
Selection – use of multiple correlations; rarely use a single predictor
Correlation and Causation
Correlation does not allow for causality; job satisfaction and performance – which way does it go?
Height and weight are correlated but don't cause each other
25. Prediction of the DV with one IV Correlations allow us to make predictions
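A minimal sketch of predicting a DV from one IV: fit a least-squares regression line to invented selection-test/performance data, then predict the criterion for a new score (all numbers hypothetical):

```python
import numpy as np

# Hypothetical data: selection-test score (predictor) and later job
# performance rating (criterion) for six employees.
test = np.array([55.0, 60.0, 65.0, 70.0, 80.0, 90.0])
perf = np.array([3.1, 3.4, 3.2, 3.9, 4.2, 4.6])

# Least-squares regression line: perf = b0 + b1 * test.
b1, b0 = np.polyfit(test, perf, deg=1)
print(f"perf = {b0:.2f} + {b1:.3f} * test")

# Use the line to forecast the criterion for a new applicant's test score.
new_score = 75.0
print(f"predicted performance at test = {new_score:.0f}: {b0 + b1 * new_score:.2f}")
```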
26. Interpretation: Evaluating Measures
How do you determine the usefulness of the information gathered from our measures?
The Answer:
Reliability Evidence
Validity Evidence
Consistency
Accuracy
27. Interpretation: Evaluating Measures Reliability: Consistency or stability of a measure.
A measure should yield a similar score each time it is given
We can get a reliable measure by reducing errors of measurement: any factor that affects obtained scores but is not related to the thing we want to measure.
Errors of measurement
Random factors, practice effects, etc.
Evaluating Measures
Reliability: degree to which a measure is free of random error and consistent in the numbers assigned to objects or events; thermometer that says 30 every time it is a given temperature - nothing to do with being “right”
Person factors: mood, anxiety, hunger
Environmental factors: temperature, testing conditions, time of day
Practice or memory effects
- if we give the same intelligence measure ten times, people may get better scores, not because they are more intelligent but because they have had practice with the test – this reflects error
- if we gave the same test twice, people may remember the items
- if we are not trying to measure practice or memory, changes in the scores reflect errors of measurement
28. Evaluating Measures: Reliability Test-Retest (Index of Stability)
Method: Give the same test on two occasions and correlate sets of scores (coefficient of stability)
Error: Anything that differentially influences scores across time for the same test
Issue: How long should the time interval be?
Limitations:
Not good for tests that are supposed to assess change
Not good for tests of things that change quickly (e.g., mood)
Difficult and expensive to retest
Memory/practice effects are likely
Reliability Indexes: correlation between two readings
Test-Retest: relationship between scores at two points in time
those who score high should score high again (i.e., the rank order should stay the same)
Equivalent Forms: requires making two equivalent tests
very hard to make two genuinely equivalent tests
often very useful for test security reasons
Internal consistency: extent to which items measure same thing
assessed by looking at the relationship among all the items
if the responses to items are highly correlated, the test is said to be measuring one thing and to be internally consistent
Inter-Rater:
many assessment exercises use raters to assess performance on the exercise; these assessments are not always objective and can lead to discrepancies; EX: American Idol
use inter-rater reliability to measure agreement among raters; if raters do not agree, we conclude that the behavior was not reliably observed
29. Evaluating Measures: Reliability Equivalent Forms (Index of Equivalence)
Method: Give two versions of a test and correlate scores (coefficient of equivalence)
Reflects the extent to which the two different versions are measuring the same concept in the same way
Issue: are tests really parallel?; length of interval?
Limitations:
Difficult and expensive
Testing time
Unique estimate for each interval
30. Evaluating Measures: Reliability Internal Consistency Reliability
Method: take a single test and look at how well the items on the test relate to each other
Split-half: similar to alternate forms (e.g., odd vs. even items)
Cronbach’s Alpha: mathematically equivalent to the average of all possible split-half estimates
Limitations
Only use for multiple item tests
Some “tests” are not designed to be homogeneous
Doesn’t assess stability over time
Class of methods
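A minimal sketch of Cronbach's alpha computed straight from its definition; the 6-person by 4-item response matrix is invented for illustration:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores),
    where items is a respondents-by-items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point responses: 6 people x 4 items of one scale.
responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
])
print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")  # high: items hang together
```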
31. Evaluating Measures: Reliability Inter-Rater Reliability
Method: two different raters rate the same target and the ratings are correlated
Correlation reflects the proportion of consistency among the ratings
Issue: reliability doesn’t imply accuracy
Limitations
Need informed, trained raters
Ratings are not a good way to measure many attributes
32. Interpretation: Evaluating Measures Validity:
The accurateness of inferences made based on data.
Whether a measure accurately and completely represents what was intended to be measured.
Validity is not a property of the test
It is a property of the inferences we make from the test scores
VALIDITY: the extent to which the instrument measures what you are trying to measure; a thermometer that says 30 every time it is 60 is not valid (it is, however, reliable); validity is not a function of the measure itself but of what we use it for
While validity is essentially one thing, we use several indexes to measure it; ideally, should consider all of these
Validity Indexes:
Criterion-Related Validity: how well a measure predicts some criterion/outcome
Probably the most practically significant of the three forms of validity
the one we use most often; easiest to defend
try to show that people who score high on the predictor also score high on the criterion
Use correlation to determine the relationship between variables
use validity coefficient (correlation coefficient) to assess strength of relationship; square r is % variance accounted for
Content Validity: how well a test measures the full range of domain; can’t measure this
Sometimes expressed as a proportion (test covers 80% of the identified content area)
sometimes use SMEs who are experts in the content area to assess content validity
Often use content validity for knowledge tests (police knowledge necessary for entry officers)
Construct Validity:
Construct: theoretical concepts we use to explain behavior; constructs are not tangible but we use concrete tests to assess them (leadership, motivation, extraversion)
Construct Validity: assesses how well a measure reflects its underlying construct
Assessing Construct Validity
Convergent Validity: compare measure to other measures of the same construct; should be strongly related, although not so high that this test doesn’t measure anything new
Divergent Validity: compare measure to measures of unrelated constructs; should not be related (a test of physical strength should not be related to test of leadership)
Reliability is inherent in a test, that is a test is either reliable or not; Validity, however, depends on the use
The length of your foot would not be a valid predictor of intelligence; it would be a valid predictor of shoe size
Reliability is a necessary but not sufficient condition for Validity
33. Evaluating Measures: Validity Criterion-Related
Predictive
Concurrent
Content-Related
Construct-Related
Reliability is a necessary but not sufficient condition for validity
34. Content Validity The extent to which a predictor provides a representative sample of the thing we’re measuring
Example: First Exam
Content: history, research methods, criterion theory, job analysis, measurement in selection
Evidence
SME evaluation
My test should be designed to assess your understanding and application of these concepts, so I should pick test questions that tap these different areas
These areas are basically defining the construct I’m interested in assessing: stuff from the 1st part of the course
If I only asked you questions about the history of I/O Psychology, my test would not be very content valid because I’m not sampling each of the different areas of my construct
If I gave you this test and got your scores back, I really would not be able to make inferences about how well you know all these different areas because I only asked you about the history of I/O
To make this test content valid, I would have to have questions that tapped into all of the areas
How do we assess content validity?
Evaluations of content validity are made by SMEs – people considered experts on the construct of interest
Strong link to job analysis
Employers develop tests that should assess the KSAOs of a job
Need to assess how much the content of these tests is related to the actual job
Need to show that the test is tapping into those KSAOs that are actually important to the job
A related validity is face validity
35. Criterion-Related Validity The extent to which a predictor relates to a criterion
Evidence
Correlation (called the validity coefficient)
A good validity coefficient is around .3 to .4
Concurrent Validity
Predictive Validity
Concurrent – validity coefficient based on a study that measured the predictor and criterion at the same time (give a test that is supposed to predict performance to people already on the job and measure their performance for the same period)
adv: speed, cheap
disadv: restriction of range, so we underestimate validity; ex: NBA players and height as a predictor
Predictive – validity coefficient based on a study that measured the predictor first and later looked at the criterion (give the test to applicants and make selection decisions without using the information from the test; after they have been on the job for a while, collect performance information)
more powerful, but more costly & harder to do
36. Construct Validity The extent to which a test is an accurate representation of the construct it is trying to measure
Construct validity results from the slow accumulation of evidence (multiple methods)
Evidence:
Content validity and criterion-related validity can provide support for construct validity
Convergent validity
Divergent (discriminant) validity
Convergent Validity – when a test correlates highly with other tests of similar constructs
Divergent Validity – when test does not correlate with tests of unrelated constructs
Bushman & Wells (1998)
- ex of predictive criterion-related validity & convergent and divergent construct validity
The idea they had was to try to predict the number of aggressive penalties kids got over a season based on their scores from a measure of trait aggressiveness
- sample – 91 boys in a hockey league in Iowa
- IV – score on the physical aggression subscale of the aggression questionnaire, completed before the beginning of the season
- DVs – number of minutes penalized for aggressive penalties – roughing, tripping, slashing
- number of minutes penalized for non-aggressive penalties – delay of game, no mouth guard
- analysis
- r = .33 b/t trait aggressiveness & number of aggressive penalty minutes
- r = .04 b/t trait aggressiveness & number of non-aggressive penalty minutes
37. Step 5: Conclusions From Research You are making inferences!
What if your inferences seem "wrong"?
Theory is wrong?
Information (data) is bad?
Bad measurement?
Bad research design?
Bad sample?
Analysis was wrong?
Research is a Cumulative Process – we rarely reach any kind of conclusion from a single study
Dissemination happens through conferences and journals
Every study has boundary conditions for generalizing- book talks about these
Representativeness of subjects (college students)
Fit between subjects and task (relevant to I/O because students have less job experience)
Research method
A lot of research is serendipity – chance occurrences
38. Step 5: Conclusions From Research Cumulative Process
Dissemination
Conference presentations & journal publications
Boundary conditions
Generalizability
Causation
Serendipity
39. Research Ethics Informed consent
Welfare of subjects
Conflicting obligations to the organization and to the participants
Informed consent
Benefits must outweigh any potential harm