1. I/O Psychology Research Methods Today we're going to talk about the science of I/O psychology.
Before we start talking about the theories, knowledge, and principles we've developed, it's important that you understand where this knowledge comes from. Like all other areas of science (e.g., life sciences, natural, technical, social), we rely on the scientific research process.
2. What is Science? Science: Approach that involves the understanding, prediction, and control of some phenomenon of interest.
Scientific Knowledge is
Logical and Concerned with Understanding
Empirical
Communicable and Precise
Probabilistic (Disprove, NOT Prove)
Objective / Disinterestedness
HOW DO WE KNOW SOMETHING?
Personal experience (how do you know sun is yellow?)
Authority (how do you know there will be three exams in this class?)
Intuition (how do you know when a close relative needs you?)
Logical: derive hypotheses from general principles/theories, test them through research, and make inferences using inductive logic. Concerned with general principles – not why Fred performed poorly yesterday, but what causes daily shifts in performance
Empirical: use data to explore our hypotheses or theories; relies on observation of events (what workers want most – I can assume I know, or I can survey)
Communicable: published and disseminated in detail so that others can assess the findings and replicate them (makes measurement important – we want to know how much better one employee is than another)
Probabilistic: science does not set out to prove theories/hypotheses – it sets out to disprove them. So we never say "prove," only "fail to disprove" – which frustrates managers. We want to eliminate alternative hypotheses (if I find that older workers perform lower, can I say I've "proved" that workers get worse with age?)
Objective: different scientists can arrive at the same conclusions using same methods, uninfluenced by biases or prejudices
Like other scientists, I/O psychologists conduct research based on theories and hypotheses. They gather data, publish those data, and design their research to eliminate alternative explanations for the research results.
3. Goals of Science Ex: We want to study absenteeism in an organization
Description: What is the current state of affairs?
Prediction: What will happen in the future?
Explanation: What is the cause of the phenomena we’re interested in?
Description: What is the current state of affairs?
We analyze organizations, jobs, and people to describe them according to certain characteristics.
What is the current absence rate?
How much money each year does the organization lose because of absenteeism?
Prediction: What will happen in the future?
What information about the past or present can be used to reliably and accurately forecast the future?
Who is most likely to be absent?
Explanation
Hardest goal, but most interesting
This is essentially theory building. Having theories allows us to understand how people will behave in general so we can develop jobs, orgs, rules, etc. based on certain principles without having to do trial and error each time.
Allows us to predict how things will work in a novel circumstance.
Why are employees absent?
4. What is “research”? Systematic study of phenomena according to scientific principles.
A set of procedures used to obtain empirical and verifiable information from which we then make informed, educated conclusions.
How do we achieve these goals?
Through research which is:
Systematic study of phenomena according to scientific principles
-so research is the tool we use to achieve the 3 main goals of science
Set of procedures undertaken to answer a question
-procedures used to obtain verifiable, quantifiable information about a phenomenon
How do we achieve these goals?
Through research which is:
Systematic study of phenomena according to scientific principles
-so research is the tool we use to achieve the 3 main goals of science
Set of procedures undertaken to answer a question
-procedures used to obtain verifiable, quantifiable information about a phenomenon
5. The Empirical Research Process 1. Statement of the Problem
2. Design of the Research Study
3. Measurement of Variables
4. Analysis of Data
5. Interpretation/Conclusions
The research process is undertaken through a series of steps.
What question or problem needs to be answered?
How do you design a study to answer that question?
How do you measure variables and collect the necessary data?
How do you apply statistical procedures to analyze the data? How do you make sense out of all the data collected?
How do you draw conclusions from analyzing the data?
6. Step 1: Statement of the Problem Theory: statement that explains the relationship among phenomena; gives us a framework within which to conduct research.
“There is nothing quite so practical as a good theory.” Kurt Lewin
Two Approaches:
Inductive – theory building; use data to derive theory.
Deductive – theory testing; start with theory and collect data to test that theory.
What question needs to be answered?
Questions that initiate research don’t arise out of thin air. Can come from personal experience, intuition, or they can stem from some FORMAL THEORY.
The value of theory in science is that it integrates and summarizes large amounts of information and provides a framework for research.
Theory is difficult in psychology because psychological principles are not natural laws that apply in every situation, or even most situations (there is no equivalent to Newton's laws of motion); it is particularly difficult in I/O, where we deal with the dynamic nature of organizations and the people in them
Theory:
Some researchers advocate theory 100%; Lewin is saying that research should be guided by theory
Others think theory gets us away from the actual issue
Many ideas come from real-life or intuition, especially in I/O
Can use theory to guide and explain, but not as a sole source of ideas or an end solution
Two ways to derive theory: the inductive method starts with data and culminates in theory; the deductive method starts with theory and collects data to test that theory.
7. Step 1: Statement of the Problem Hypothesis
A testable statement about the status of a variable or the relationship among multiple variables
Must be falsifiable!
Theories are often complex and not necessarily testable in their original form.
Hypothesis: prediction about relationship among variables of interest.
8. Step 1: Statement of the Problem Types of variables
Independent Variables (IV): variables that are manipulated by the researcher.
Dependent Variables (DV): the outcomes of interest.
Predictors and Criterion
Confounding variables: uncontrolled extraneous variables that permit alternative explanations for the results of a study.
Variable is a symbol that can assume a range of numerical values.
IV: controlled by experimenter
Those things we think could be influencing the outcome (the DV)
In experiments, these are manipulated or controlled by the researcher
In differential research, they are not manipulated, but we have reason to believe they are the “cause” and we allow them to vary naturally
We assess the effect of the IVs on the DV
E.g., suppose we are interested in studying the effect of night versus early shift work on productivity and turnover. IV = type of shift work; DV = productivity/turnover
DV: measured by experimenter - variable of interest
The outcome of interest
The thing we’re trying to predict or explain
The value it takes “DEPENDS” on the IV.
NOTE: Variables not inherently IV or DV; function of the research design and what you do with the variables
Predictor: score used to predict; used to forecast another variable
Criterion: scores trying to predict; the variable we want to predict
NOTE: IV and DV used in causal experimentation
Predictor and criterion used to explain a relationship; the status of individuals on one variable as a function of their status on the other (predictor) variable
Confounding Variables –
variables that affect our dependent variable or criterion but are not part of what we are studying
sometimes we know what they are and can measure and control for them, and sometimes we don't
EXAMPLE: the Hawthorne studies examined the effect of light levels on performance; the confounding variable was the presence of the researchers. One way to test this is to examine researcher presence as an IV, independent of changing the light
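To make the IV/DV logic concrete, here is a minimal Python sketch of the shift-work example (a hypothetical simulation, not lecture data; the group sizes, means, and productivity scale are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# IV: type of shift, manipulated by the researcher (two conditions).
# DV: productivity, measured for every participant.
n_per_group = 50
day_shift = rng.normal(loc=52.0, scale=8.0, size=n_per_group)    # hypothetical scores
night_shift = rng.normal(loc=47.0, scale=8.0, size=n_per_group)  # hypothetical scores

# With random assignment, a difference in mean productivity is attributed
# to the IV rather than to confounding participant characteristics.
print(f"day-shift mean:   {day_shift.mean():.1f}")
print(f"night-shift mean: {night_shift.mean():.1f}")
print(f"difference (effect of IV): {day_shift.mean() - night_shift.mean():.1f}")
```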
9. Moderator Variable Special type of IV that influences the relationship between 2 other variables
X → Y (moderated by M)
Example
Gender & Hiring rate
M = Type of job
The relationship between gender and hiring rate may change depending on the type of job individuals are applying for.
Moderator Variables
A variable that interacts with the IV to affect the DV
The effect of IV on DV depends on the level of the moderator variable
EXAMPLE:
Suppose I find that there is no relationship between gender and hiring rate; women are just as likely to get hired as men
Now suppose I also examine the type of jobs they are applying for as a moderator
When women apply for stereotypically female jobs they have a higher hiring rate, but when apply for stereotypically masculine jobs they have a much lower hiring rate
The opposite is true for men
Important because while initial results show no relationship, there are important implications for women
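A minimal sketch of this hiring example in Python; the hiring rates are hypothetical numbers chosen to show the moderation pattern, not real data:

```python
import numpy as np

# Hypothetical hiring rates (proportion hired), invented for illustration.
rates = {
    ("women", "female-typed job"): 0.60, ("women", "male-typed job"): 0.20,
    ("men",   "female-typed job"): 0.20, ("men",   "male-typed job"): 0.60,
}
jobs = ("female-typed job", "male-typed job")

# Collapsing over the moderator hides the effect: both genders average 0.40.
for gender in ("women", "men"):
    overall = np.mean([rates[(gender, job)] for job in jobs])
    print(f"{gender}: overall hiring rate = {overall:.2f}")

# Within each level of the moderator (job type), the gender-hiring
# relationship reverses -- which is exactly what moderation means.
for job in jobs:
    print(f"{job}: women = {rates[('women', job)]:.2f}, men = {rates[('men', job)]:.2f}")
```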
10. Mediator Variable Special type of IV that accounts for the relation between the IV and the DV.
Mediation implies a causal relation in which an IV causes a mediator which causes a DV.
IV → MED → DV
Example:
IV = negative feedback
MED = negative thoughts
DV = willingness to participate
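A minimal simulation of the feedback example (coefficients and sample size invented). It shows the statistical signature of mediation: the IV-DV correlation largely vanishes once the mediator is partialled out:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated causal chain: feedback (IV) -> negative thoughts (MED) -> willingness (DV)
feedback = rng.normal(size=n)                    # IV
thoughts = 0.7 * feedback + rng.normal(size=n)   # MED depends on the IV
willing = -0.7 * thoughts + rng.normal(size=n)   # DV depends only on the MED

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

r_iv_dv, r_iv_med, r_med_dv = r(feedback, willing), r(feedback, thoughts), r(thoughts, willing)

# Partial correlation of IV and DV controlling for the mediator.
partial = (r_iv_dv - r_iv_med * r_med_dv) / np.sqrt((1 - r_iv_med**2) * (1 - r_med_dv**2))
print(f"r(IV, DV) = {r_iv_dv:.2f}; r(IV, DV | MED) = {partial:.2f}  # near zero -> mediation")
```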
11. Moderator vs. Mediator A moderator variable is one that influences the strength of a relationship between two other variables.
A mediator variable is one that explains the relationship between the two other variables.
12. Example You are an I/O psychologist working for an insurance company. You want to assess which of two training methods is most effective for training new secretaries. You give one group of secretaries on-the-job training and a booklet to study at home. You give the second group of secretaries on-the-job training and have them watch a 30-minute video. A. What is the IV (training method)
B. What would be a logical DV (job performance, number of errors, etc.)
C. What variables might you want to control
D. Why might the field experiment involve less control than if it were a lab experiment?
E. Can you think of any other conditions you may want to test?
13. Step 2: Research Design A research design is the structure or architecture for the study.
A plan for how to treat variables that can influence results so as to rule out alternative interpretations.
Primary Research Methods:
Experimental (Laboratory vs. Field Research)
Quasi-Experimental
Non-Experimental (Observational, Survey)
Primary Research Methods: generate new information on a particular question
Experiment: 2 (or more) equivalent groups of P’s are treated exactly the same in all ways except the IV. Differences in measurements of the DV can be attributed to differences in the IV. (machines)
Random assignment: equivalent groups are necessary in an experiment. This can be achieved through RA: each P is equally likely to be assigned to each condition. RA ensures that P characteristics that may affect the DV are distributed evenly across groups. It does not guarantee equivalent groups; however, it ensures that differences between groups occur due to chance and are not systematic.
participants are randomly selected and randomly assigned - difficult in real world - hard to get organizations to do this
Laboratory vs. Field: as of 2001, 33% of studies published in I/O were lab studies and 67% were field studies.
Quasi-Experiment: P’s assigned to conditions but NO random assignment– much less control than in a true experiment (shift work ex)
Non-experiment: no control, no conditions
Observational design: The researcher observes employee behavior and systematically records what is observed.
Questionnaire (survey design): research strategy in which Ps are asked to complete a questionnaire/survey.
Self-report
Relatively fast and easy
Effective for:
Sensitive subjects, Large populations, Anonymity
Correlational: measure variables as they naturally occur and examine relations; good for getting initial data that might test later; methods are very common in I/O
Obtrusive vs. Unobtrusive
14. Step 2: Research Design Secondary Research Methods
Meta-analysis: statistical method for combining/analyzing the results from many studies to draw a general conclusion about relationships among variables (p.61).
Qualitative Research Methods
Rely on observation, interview, case study, and analysis of diaries to produce narrative descriptions of events or processes.
SECONDARY: uses existing data from previous research; the most common is meta-analysis, used to integrate the findings of previous research; useful for topics that have received a lot of research; greatly increases N; requires a lot of subjective decisions
Historically, IO psychologists used quantitative methods for measuring important variables/behavior. These methods rely heavily on tests, rating scales to yield results in numbers.
QUALITATIVE:
Researcher takes an active part in becoming involved with the subjects and interacting with them in-depth
Use observations, interviews, case studies, diaries, ESM
still involves statistics (critical incidents); data is just gathered in a different way
more in-depth
Less common in I/O, but becoming more popular – I/O is very quantitative and hence skeptical of this method since numbers and stats conform to traditional view of science.
Qualitative and quantitative not mutually exclusive
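A bare-bones sketch of the core meta-analytic idea – averaging correlations across studies weighted by sample size – with invented study results. Full procedures (e.g., Hunter and Schmidt's in I/O) also correct for artifacts such as unreliability and range restriction:

```python
import numpy as np

# Hypothetical primary studies: (observed correlation r, sample size N).
studies = [(0.25, 120), (0.31, 80), (0.18, 300), (0.40, 45)]

rs = np.array([r for r, n in studies])
ns = np.array([n for r, n in studies])

# Sample-size-weighted mean correlation: one general conclusion from many studies.
r_bar = np.sum(ns * rs) / np.sum(ns)
print(f"combined N = {ns.sum()}, weighted mean r = {r_bar:.3f}")
```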
15. Evaluating Research Design Internal validity (Control)
Does X cause Y?
Lab studies eliminate distracting variables through experimental control.
Using statistical techniques to control for the influence of certain variables is statistical control.
External validity (Generalizability)
Does the relation of X and Y hold in other settings and with other participants and stimuli?
Internal Validity: the degree to which the results of my research are due to the variables being studied, as opposed to some other explanation; a goal of research is to rule out as many alternative hypotheses as possible
External Validity and Generalizability are the same thing; Generalizability is very important because without it, the findings can not be used in any other context (setting or sample) (EX: study stereotypes with Psych subject pool at Purdue and want to generalize to individuals age 18-21 – what is the problem with this?)
Can't have external validity without internal validity; internal validity is a necessary condition
16. Threats to Internal Validity History
Instrumentation
Selection
Maturation
Mortality/Attrition
Testing
Experimenter Bias
Awareness of Being a Subject
**There are an infinite number of threats – these are some of the most common
History: what’s happening at the time of the research that might influence the relation of X to Y
Instrumentation: does the relation of X to Y reflect changes in the instrument or measure?
Selection: does the relation of X to Y reflect biases in sampling of participants? Pervasive in quasi-exps.
Maturation: does the relation of X to Y reflect natural growth or development?
Regression to the Mean: SI Curse; does X cause decrease in Y, or is this just a function of previous high performance that automatically decreased
Mortality: does the relation of X to Y reflect differences in drop out rates?
Testing: does the relation of X to Y reflect the use of a pretest? practice effect?
Experimenter Bias: does the relation reflect differences in experimenter behavior as a result of expectations of the outcome (often based on hypotheses)
Awareness of being a subject: does relation of X to Y reflect subject awareness
17. Step 3: Measurement Goal: Quantify the IV and DV
Psychological Measurement – the process of quantifying variables (called constructs)
“The process of assigning numerical values to represent individual differences, that is, variations among individuals on the attribute of interest”
A “Measure” …
Any mechanism, procedure, tool, etc, that purports to translate attribute differences into numerical values
Now we have a question, have identified the IV & DV and we have selected the most appropriate research design for our study, now what?
Now we need to figure out how to turn the IVs and DVs into numbers so that we can conduct the appropriate statistical analyses to answer our question.
Measurement reflects the process by which we attempt to assess magnitude differences in the attribute of interest, and represent those differences as numbers
E.g., a yard-stick assesses differences in length
18. Step 3: Measurement Two classes of measured variables:
Categorical (or Qualitative)
Differ in type but not amount
Continuous (or Quantitative)
Differ in amount
Categorical
-not inherently numerical
-ex: gender, employment status, job type
Continuous
- objective – naturally numerical – money, time, distance
- subjective – have “amounts” of but don’t naturally come in numbers
-when you think of 2 people, you might say one has more than the other
ex – happiness, friendliness, satisfaction
After we have collected the data we must make sense of it.
19. Step 4: Data Analysis Statistics are what we use to summarize relationships among variables and to estimate the odds that they reflect more than mere chance
Descriptive Statistics: Summarize, organize, and describe a sample of data.
Inferential Statistics: Used to make inferences from sample data to a larger sample or population.
Distributions
Descriptive Statistics: describe the sample we collected
Inferential Statistics: infer whether differences found in sample are likely to occur in population
Distributions (distributions of frequency):
Many naturally-occurring things are normally distributed (height, weight, IQ) – call this the “Bell Curve”
Many statistical principles are based on this assumption
Many things are not normal
Number of thefts among employees tends to be positively skewed
Supervisor ratings tend to be negatively skewed
20. Descriptive Statistics Measures of Central Tendency
Mean, Median, Mode
Measures of Variability
Range, Variance, SD
Descriptive statistics used to describe frequency distributions.
Measures of Central Tendency: stats that indicate where the center of a distribution is located.
Mean – mathematical average of scores in a distribution (sum of Xs / sample size).
Median – middle score of a distribution – 50% above, 50% below
Mode – most frequently occurring number (least used)
Measures of Variability – spread or dispersion of values in a distribution (the extent to which scores in a distribution vary)
Range – full range of dispersion – highest minus the lowest
Variance – average squared distance of scores in a distribution from the mean – we have to use squared deviations because raw deviations always sum to 0
SD – square root of variance – puts back in original units
(68% of scores fall within +/- 1 SD in a normal distribution; if the distribution is normal, SDs give us an idea of where a particular score falls)
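A minimal Python illustration of these statistics using the standard-library statistics module. The absence counts are invented, and deliberately positively skewed (like the theft counts above), so the mean is pulled above the median:

```python
import statistics as st

# Hypothetical yearly absence counts for ten employees (invented data).
absences = [0, 1, 1, 2, 2, 2, 3, 4, 6, 12]

# Central tendency
print("mean:  ", st.mean(absences))    # arithmetic average
print("median:", st.median(absences))  # middle score: 50% above, 50% below
print("mode:  ", st.mode(absences))    # most frequently occurring value

# Variability
print("range: ", max(absences) - min(absences))     # highest minus lowest
print("var:   ", round(st.pvariance(absences), 2))  # mean squared deviation
print("SD:    ", round(st.pstdev(absences), 2))     # back in original units
```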
21. Differences in Variance
22. Inferential Statistics Compares a hypothesis to an alternative
Statistical Significance: The likelihood that the observed difference would be obtained if the null hypothesis were true
Statistical Power: Likelihood of finding a statistically significant difference when a true difference exists
You don't need to know how these are calculated, but they are what is typically used to test hypotheses, and they are presented after the descriptive stats in a research article.
The smaller the sample size, the lower the power to detect a difference; a power analysis can be done before conducting a study to determine the necessary sample size.
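A minimal simulation sketch of the sample-size/power link, using scipy's independent-samples t-test. The true difference (0.5 SD), alpha, and sample sizes are illustrative choices, not values from the lecture:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def estimated_power(n_per_group, true_diff, alpha=0.05, n_sims=2000):
    """Fraction of simulated studies reaching p < alpha when a true difference exists."""
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(true_diff, 1.0, n_per_group)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / n_sims

# Smaller samples -> lower power to detect the same true difference.
for n in (10, 30, 100):
    print(f"n per group = {n:3d}: power = {estimated_power(n, true_diff=0.5):.2f}")
```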
23. Correlation
Used to assess the relationship between 2 variables
Represented by the correlation coefficient “r”
r can take on values from –1 to +1
Size denotes the magnitude of the relationship
0 means no relationship
Correlation – the degree of relationship between two variables
Many types but we’re going to concentrate on one – correlation
Sign – denotes positive or negative relationship
Positive relationship: as one variable goes up, the other variable goes up
Negative relationship: as one variable goes up, the other variable goes down
Size – denotes magnitude of relationship
Independent of sign: -.80 is same strength as +.80
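A minimal sketch of computing r in Python; the overtime/fatigue variables and scores are invented for illustration:

```python
import numpy as np

# Hypothetical scores for eight employees: overtime hours and a 1-10 fatigue rating.
overtime = np.array([0, 2, 3, 5, 6, 8, 9, 12])
fatigue = np.array([2, 3, 3, 5, 6, 6, 8, 9])

# Pearson r: the sign gives the direction, the absolute size the magnitude.
r = np.corrcoef(overtime, fatigue)[0, 1]
print(f"r = {r:.2f}")  # positive: as overtime goes up, fatigue goes up
```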
24. Correlation and Regression Correlation
Scatterplot
Regression Line
Linear vs. Non-Linear
Multiple Correlations
Correlation and Causation
Correlation – the degree of relationship between two variables
Use a scatterplot to plot the relationship between two variables
Regression Line – the line that best fits the data; can be used to predict other scores
Very hard to “eyeball” the relationship between two variables based on a scatter plot; easier with the regression line
Correlation Coefficient – strength of the relationship
Positive or negative, from –1 to +1
The higher the correlation, the more accurately we can predict one score from another
A = .93, B = .71, D = -.51, E = -.04
Linear vs. Non-Linear
A zero correlation indicates that there is no linear relationship, but there may be a relationship not detected by correlation
Take the example of stress and performance; curvilinear relationship, but correlation would be 0
Many relationships in I/O are linear, but some aren’t
Multiple Correlations
The relationship between multiple predictors and a single outcome
Selection – use of multiple correlations; rarely use a single predictor
Correlation and Causation
Correlation does not allow for causality; job satisfaction and performance – which way does it go?
Height and weight are correlated but don't cause each other
25. Prediction of the DV with one IV Correlations allow us to make predictions
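A minimal sketch of predicting a DV from one IV: fit a least-squares regression line to invented selection-test/performance data, then predict the criterion for a new score (all numbers hypothetical):

```python
import numpy as np

# Hypothetical data: selection-test score (predictor) and later job
# performance rating (criterion) for six employees.
test = np.array([55.0, 60.0, 65.0, 70.0, 80.0, 90.0])
perf = np.array([3.1, 3.4, 3.2, 3.9, 4.2, 4.6])

# Least-squares regression line: perf = b0 + b1 * test.
b1, b0 = np.polyfit(test, perf, deg=1)
print(f"perf = {b0:.2f} + {b1:.3f} * test")

# Use the line to forecast the criterion for a new applicant's test score.
new_score = 75.0
print(f"predicted performance at test = {new_score:.0f}: {b0 + b1 * new_score:.2f}")
```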
26. Interpretation: Evaluating Measures
How do you determine the usefulness of the information gathered from our measures?
The Answer:
Reliability Evidence
Validity Evidence
Consistency
Accuracy
27. Interpretation: Evaluating Measures Reliability: Consistency or stability of a measure.
A measure should yield a similar score each time it is given
We can get a reliable measure by reducing errors of measurement: any factor that affects obtained scores but is not related to the thing we want to measure.
Errors of measurement
Random factors, practice effects, etc.
Evaluating Measures
Reliability: degree to which a measure is free of random error and consistent in the numbers assigned to objects or events; thermometer that says 30 every time it is a given temperature - nothing to do with being “right”
Person factors: mood, anxiety, hunger
Environmental factors: temperature, testing conditions, time of day
Practice or memory effects
- if we give the same intelligence measure ten times, people may get better scores, not because they are more intelligent but because they have had practice with the test – this reflects error
- if we gave the same test twice, people may remember the items
- if we are not trying to measure practice or memory, changes in the scores reflect errors of measurement
28. Evaluating Measures: Reliability Test-Retest (Index of Stability)
Method: Give the same test on two occasions and correlate sets of scores (coefficient of stability)
Error: Anything that differentially influences scores across time for the same test
Issue: How long should the time interval be?
Limitations:
Not good for tests that are supposed to assess change
Not good for tests of things that change quickly (e.g., mood)
Difficult and expensive to retest
Memory/practice effects are likely
Reliability Indexes: correlation between two readings
Test-Retest: relationship between scores at two points in time
those who score high should score high again (i.e., the rank order should stay the same)
Equivalent Forms: requires making two equivalent tests
very hard to make two genuinely equivalent tests
often very useful for test security reasons
Internal consistency: extent to which items measure same thing
assessed by looking at the relationship among all the items
if the responses to items are highly correlated, the test is said to be measuring one thing and to be internally consistent
Inter-Rater:
many assessment exercises use raters to assess performance on the exercise; these assessments are not always objective and can lead to discrepancies; EX: American Idol
use inter-rater reliability to measure agreement among raters; if raters do not agree, we conclude that the behavior was not reliably observed
29. Evaluating Measures: Reliability Equivalent Forms (Index of Equivalence)
Method: Give two versions of a test and correlate scores (coefficient of equivalence)
Reflects the extent to which the two different versions are measuring the same concept in the same way
Issue: are tests really parallel?; length of interval?
Limitations:
Difficult and expensive
Testing time
Unique estimate for each interval
30. Evaluating Measures: Reliability Internal Consistency Reliability
Method: take a single test and look at how well the items on the test relate to each other
Split-half: similar to alternate forms (e.g., odd vs. even items)
Cronbach’s Alpha: mathematically equivalent to the average of all possible split-half estimates
Limitations
Only use for multiple item tests
Some “tests” are not designed to be homogeneous
Doesn’t assess stability over time
Class of methods
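A minimal sketch of Cronbach's alpha computed straight from its definition; the 6-person by 4-item response matrix is invented for illustration:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores),
    where items is a respondents-by-items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point responses: 6 people x 4 items of one scale.
responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
])
print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")  # high: items hang together
```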
31. Evaluating Measures: Reliability Inter-Rater Reliability
Method: two different raters rate the same target and the ratings are correlated
Correlation reflects the proportion of consistency among the ratings
Issue: reliability doesn’t imply accuracy
Limitations
Need informed, trained raters
Ratings are not a good way to measure many attributes
32. Interpretation: Evaluating Measures Validity:
The accurateness of inferences made based on data.
Whether a measure accurately and completely represents what was intended to be measured.
Validity is not a property of the test
It is a property of the inferences we make from the test scores
VALIDITY: the extent to which the instrument measures what you are trying to measure; a thermometer that says 30 every time it is 60 is not valid (it is, however, reliable); validity is not a function of the measure itself but of what we use it for
While validity is essentially one thing, we use several indexes to measure it; ideally, should consider all of these
Validity Indexes:
Criterion-Related Validity: how well a measure predicts some criterion/outcome
Probably the most practically significant of the three forms of validity
the one we use most often; easiest to defend
try to show that people who score high on the predictor also score high on the criterion
Use correlation to determine the relationship between variables
use validity coefficient (correlation coefficient) to assess strength of relationship; square r is % variance accounted for
Content Validity: how well a test measures the full range of domain; can’t measure this
Sometimes expressed as a proportion (test covers 80% of the identified content area)
sometimes use SMEs who are experts in the content area to assess content validity
Often use content validity for knowledge tests (police knowledge necessary for entry officers)
Construct Validity:
Construct: theoretical concepts we use to explain behavior; constructs are not tangible but we use concrete tests to assess them (leadership, motivation, extraversion)
Construct Validity: assesses how well a measure reflects its underlying construct
Assessing Construct Validity
Convergent Validity: compare measure to other measures of the same construct; should be strongly related, although not so high that this test doesn’t measure anything new
Divergent Validity: compare measure to measures of unrelated constructs; should not be related (a test of physical strength should not be related to test of leadership)
Reliability is inherent in a test, that is a test is either reliable or not; Validity, however, depends on the use
The length of your foot would not be a valid predictor of intelligence; it would be a valid predictor of shoe size
Reliability is a necessary but not sufficient condition for Validity
33. Evaluating Measures: Validity Criterion-Related
Predictive
Concurrent
Content-Related
Construct-Related
Reliability is a necessary but not sufficient condition for validity
34. Content Validity The extent to which a predictor provides a representative sample of the thing we’re measuring
Example: First Exam
Content: history, research methods, criterion theory, job analysis, measurement in selection
Evidence
SME evaluation
My test should be designed to assess your understanding and application of these concepts, so I should pick test questions that tap these different areas
These areas are basically defining the construct I’m interested in assessing: stuff from the 1st part of the course
If I only asked you questions about the history of I/O Psychology, my test would not be very content valid because I’m not sampling each of the different areas of my construct
If I gave you this test and got your scores back, I really would not be able to make inferences about how well you know all these different areas because I only asked you about the history of I/O
To make this test content valid, I would have to have questions that tapped into all of the areas
How do we assess content validity?
Evaluations of content validity are made by SMEs – people considered experts on the construct of interest
Strong link to job analysis
Employers develop tests that should assess the KSAOs of a job
Need to assess how much the content of these tests is related to the actual job
Need to show that the test is tapping into those KSAOs that are actually important to the job
A related validity is face validity
35. Criterion-Related Validity The extent to which a predictor relates to a criterion
Evidence
Correlation (called the validity coefficient)
A good validity coefficient is around .3 to .4
Concurrent Validity
Predictive Validity
Concurrent – validity coefficient based on a study that measured the predictor and criterion at the same time (give a test that is supposed to predict performance to people already on the job and measure their performance for the same period)
adv: speed, cheap
disadv: restriction of range, so we underestimate validity; ex: NBA players and height as a predictor
Predictive – validity coefficient based on a study that measured the predictor first and later looked at the criterion (give the test to applicants and make selection decisions without using the information from the test; after they have been on the job for a while, collect performance information)
more powerful, but more costly & harder to do
36. Construct Validity The extent to which a test is an accurate representation of the construct it is trying to measure
Construct validity results from the slow accumulation of evidence (multiple methods)
Evidence:
Content validity and criterion-related validity can provide support for construct validity
Convergent validity
Divergent (discriminant) validity
Convergent Validity – when a test correlates highly with other tests of similar constructs
Divergent Validity – when test does not correlate with tests of unrelated constructs
Bushman & Wells (1998)
- ex of predictive criterion-related validity & convergent and divergent construct validity
The idea they had was to try to predict the number of aggressive penalties kids got over a season based on their scores from a measure of trait aggressiveness
- sample – 91 boys in a hockey league in Iowa
- IV – score on the physical aggression subscale of the aggression questionnaire, completed before the beginning of the season
- DVs – number of minutes penalized for aggressive penalties – roughing, tripping, slashing
- number of minutes penalized for non-aggressive penalties – delay of game, no mouth guard
- analysis
- r = .33 b/t trait aggressiveness & number of aggressive penalty minutes
- r = .04 b/t trait aggressiveness & number of non-aggressive penalty minutes
37. Step 5: Conclusions From Research You are making inferences!
What if your inferences seem "wrong"?
Theory is wrong?
Information (data) is bad?
Bad measurement?
Bad research design?
Bad sample?
Analysis was wrong?
Research is a Cumulative Process – we rarely reach any kind of conclusion from a single study
Dissemination happens through conferences and journals
Every study has boundary conditions for generalizing- book talks about these
Representativeness of subjects (college students)
Fit between subjects and task (relevant to I/O because students have less job experience)
Research method
A lot of research is serendipity – chance occurrences
38. Step 5: Conclusions From Research Cumulative Process
Dissemination
Conference presentations & journal publications
Boundary conditions
Generalizability
Causation
Serendipity
39. Research Ethics Informed consent
Welfare of subjects
Conflicting obligations to the organization and to the participants
Informed consent
Benefits must outweigh any potential harm