Learn about standardized scales, their purpose, reliability, validity, and practical considerations. Understand how to evaluate and select scales for measuring constructs. Discover the process of administering, scoring, and interpreting standardized scales.
Standardization • Use of identical procedures to collect, score, interpret, and report results of a measure • Assures that differences over time or among different people are due to the variable being measured and not to different measurement procedures
What are Standardized Scales? • Set of uniform procedures to collect, score, interpret, and report numerical results • Usually have norms and empirical evidence of reliability and validity • Typically include multiple items aggregated into one or more composite scores • Frequently used to measure constructs
Construct • Complex concept (e.g., intelligence, well-being, depression) • Inferred or derived from a set of interrelated attributes (e.g., behaviors, experiences, subjective states, attitudes) of people, objects, or events • Typically embedded in a theory • Oftentimes not directly observable but measured using multiple indicators
Evaluating and Selecting Standardized Scales • Purpose • Reference populations and normative groups • Reliability • Validity • Practical considerations
Purpose • Identify whether or not a client has a significant problem • Measure and monitor your client’s outcomes to determine if your client is making satisfactory progress
Reference Population Population of people for which a measure is intended and from which a normative group is sampled and norms are created
Normative Group • Representative sample of a reference population, used to estimate norms for that population and, more generally, used to develop and test standardized measures • Also known as a “standardization group” or “standardization sample”
Reliability • Internal consistency reliability (coefficient alpha) (most important) • Interrater reliability (sometimes) • Test-retest reliability
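Coefficient alpha can be computed directly from item-level responses. The sketch below is a minimal illustration and not a procedure taken from these slides: it assumes complete data (no missing responses), items already scored in the same direction, and the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name `coefficient_alpha` is hypothetical.

```python
from statistics import variance


def coefficient_alpha(responses):
    """Estimate coefficient (Cronbach's) alpha.

    responses: list of rows, one row per respondent, each row a list of
    item scores. Assumes no missing data and that reverse-worded items
    have already been rescored.
    """
    k = len(responses[0])  # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_columns = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Three respondents answering two items (made-up numbers).
alpha = coefficient_alpha([[1, 2], [2, 1], [3, 3]])
```

Values closer to 1 indicate that the items vary together; what counts as acceptable depends on how the scale will be used.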
Validity • Face • Content • Criterion • Construct • Sensitivity to change especially important
Practical Considerations • Time • Effort • Training • Cost • Availability • Acceptability (e.g., clients, practitioners, etc.)
Decisions, Decisions… • Who • Where • When • How often to collect outcome data
Who • Client • Practitioner • Relevant others • Independent evaluators
Where and When • Private, quiet, physically comfortable location • Complete at about the same time and under the same conditions on a regular basis
How Often • Regular, frequent, pre-designated intervals • Often enough to detect significant changes in the problem, but not so often that it becomes problematic • In general about once per week
Engage and Prepare Clients • Be certain the client understands and accepts the value and purpose of monitoring progress • Discuss confidentiality • Present measures with confidence • Don’t ask for info the client can’t provide
Engage and Prepare Clients (cont’d) • Be sure the client is prepared • Be careful how you respond to information • Use the information that is collected
Administering, Scoring, and Interpreting Standardized Scales • Score, scoring formula, composite score • Unidimensional and multidimensional scales • Cut scores • Reverse-worded items • Reliable change, reliable improvement, reliable deterioration • Clinically significant improvement • Expected treatment response
Score • Generic term for a number derived from a measure that represents the quantity or amount of an attribute or observation (e.g., number of times a behavior is observed, value obtained from a standardized scale) • Interpret in context of all available quantitative and qualitative information
Scoring Procedure by which data from a measure are used to produce a score (e.g., number of times a behavior occurs or value on a standardized scale) or category (e.g., diagnostic category)
Scoring Formula A mathematical rule by which data from a measure are used to produce a score (e.g., sum or average of responses to items on a multi-item standardized scale)
Composite Score Score that combines results from two or more related items or other measures using a specified formula (e.g., percentage of items answered correctly on a statistics test)
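The scoring formulas named above (sum, average, percentage correct) are simple enough to sketch in a few lines. The function names and sample data below are hypothetical; a published scale's manual defines the formula that actually applies.

```python
def sum_score(item_responses):
    """Composite score as the sum of item responses."""
    return sum(item_responses)


def mean_score(item_responses):
    """Composite score as the average of item responses."""
    return sum(item_responses) / len(item_responses)


def percent_correct(answers, answer_key):
    """Composite score as the percentage of items answered correctly."""
    if len(answers) != len(answer_key):
        raise ValueError("answers and answer_key must be the same length")
    n_correct = sum(a == k for a, k in zip(answers, answer_key))
    return 100.0 * n_correct / len(answer_key)


# Made-up responses to a four-item scale.
responses = [3, 2, 4, 1]
total = sum_score(responses)
average = mean_score(responses)
```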
Unidimensional Scale Scale that measures a single attribute or construct (e.g., depression). (Contrast with multidimensional scale.)
Multidimensional Scale Scale that measures two or more distinct but related attributes or constructs; the measures of the different attributes or constructs are referred to as “subscales”
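Scoring a multidimensional scale usually means producing one score per subscale. The sketch below assumes a hypothetical five-item scale whose first three items form one subscale and whose last two form another; the subscale names and item assignments are invented for illustration.

```python
# Hypothetical layout: 0-based item positions belonging to each subscale.
SUBSCALES = {"mood": [0, 1, 2], "sleep": [3, 4]}


def score_subscales(responses, subscales=SUBSCALES):
    """Return a dict with one summed score per subscale."""
    return {
        name: sum(responses[i] for i in item_positions)
        for name, item_positions in subscales.items()
    }


# One respondent's answers to the five items (made-up numbers).
scores = score_subscales([2, 3, 1, 0, 2])
```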
Cut Scores • Specific predetermined numerical values along a continuum of scores • Used to separate people into categories with distinct substantive interpretations (e.g., clinically depressed or not) • Used to make decisions (e.g., provide treatment for depression or not) • Only as good as the normative sample(s) from which they are derived • Interpret in context of all available quantitative and qualitative information
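Applying cut scores amounts to looking up which band a score falls into. The sketch below uses invented cut scores (6 and 11) and invented labels purely for illustration; real cut scores come from the scale's documentation and its normative sample.

```python
from bisect import bisect_right

# Hypothetical cut scores: a score at or above a cut falls in the higher band.
CUT_SCORES = [6, 11]
LABELS = ["low", "moderate", "high"]


def categorize(score, cut_scores=CUT_SCORES, labels=LABELS):
    """Map a score to the category defined by ascending cut scores."""
    if len(labels) != len(cut_scores) + 1:
        raise ValueError("need exactly one more label than cut scores")
    # bisect_right counts how many cut scores are <= score.
    return labels[bisect_right(cut_scores, score)]


category = categorize(8)
```

The category is a starting point for a decision, to be read alongside the other quantitative and qualitative information about the client.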
Reverse-Worded Item Item for which smaller numbers indicate a higher score on the measured variable because the item is worded to mean the opposite of the measured variable
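Before a composite score is computed, reverse-worded items are usually rescored so that every item points in the same direction. A minimal sketch, assuming a hypothetical 1-to-5 response format and a hypothetical set of reverse-worded item positions:

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a response so that low becomes high (e.g., 2 becomes 4 on 1-5)."""
    if not scale_min <= response <= scale_max:
        raise ValueError("response is outside the response format")
    return scale_min + scale_max - response


def rescore_items(responses, reversed_positions, scale_min=1, scale_max=5):
    """Return responses with the reverse-worded items flipped."""
    return [
        reverse_score(r, scale_min, scale_max) if i in reversed_positions else r
        for i, r in enumerate(responses)
    ]


# Items at positions 1 and 3 (0-based) are assumed to be reverse-worded.
rescored = rescore_items([4, 2, 5, 1], reversed_positions={1, 3})
```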
Reliable Change Change in a score from one time to another that is more than expected just from random measurement error
Reliable Improvement Improvement in a score from one time to another that is more than expected just from random measurement error
Reliable Deterioration Deterioration in a score from one time to another that is more than expected just from random measurement error
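The slides do not say how "more than expected just from random measurement error" is calculated. One widely used approach is the Jacobson and Truax reliable change index (RCI), which divides the change score by the standard error of the difference; it is used here as an assumption, not as the method of these slides. The standard deviation, reliability, and scores in the example are invented, and the 1.96 criterion corresponds to a 95% confidence level.

```python
import math


def reliable_change_index(pre, post, sd, reliability):
    """RCI = (post - pre) / (sqrt(2) * SEM), with SEM = sd * sqrt(1 - r)."""
    if sd <= 0 or not 0 <= reliability < 1:
        raise ValueError("need sd > 0 and 0 <= reliability < 1")
    sem = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0) * sem            # standard error of the difference
    return (post - pre) / s_diff


def classify_change(pre, post, sd, reliability,
                    higher_is_worse=True, criterion=1.96):
    """Label a change as reliable improvement, deterioration, or neither."""
    rci = reliable_change_index(pre, post, sd, reliability)
    if abs(rci) <= criterion:
        return "no reliable change"
    got_worse = (rci > 0) == higher_is_worse
    return "reliable deterioration" if got_worse else "reliable improvement"


# Hypothetical scale: normative SD of 5, reliability of 0.90.
result = classify_change(pre=15, post=8, sd=5, reliability=0.90)
```

The `higher_is_worse` flag matters because a drop in score is an improvement on a depression scale but a deterioration on a well-being scale.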
Clinically Significant Improvement Change that occurs when: • The client’s measured functioning on a standardized scale is in the dysfunctional range before intervention (e.g., greater than 5 on the QIDS-SR) • It is in the functional range after intervention (e.g., 5 or below on the QIDS-SR) • The change is reliable
Clinically Significant Improvement (cont’d) • Interpret in context of all available quantitative and qualitative information • Does not guarantee a meaningful change in a client’s real-world functioning or quality of life • Only as good as the normative sample(s) from which it is derived • Does not speak to the question of whether it was your intervention or something else that caused the change
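The three conditions for clinically significant improvement can be checked together. The sketch below uses the cut score of 5 from the QIDS-SR example and assumes that higher scores mean worse functioning. The reliable-change test uses the Jacobson and Truax index, which is an assumption about method, and the standard deviation and reliability values are invented for illustration.

```python
import math


def is_clinically_significant_improvement(pre, post, cut_score, sd,
                                          reliability, criterion=1.96):
    """True when all three conditions hold (higher scores = worse):

    1. pre-intervention score is in the dysfunctional range (> cut_score)
    2. post-intervention score is in the functional range (<= cut_score)
    3. the decrease exceeds what measurement error alone would explain
    """
    if sd <= 0 or not 0 <= reliability < 1:
        raise ValueError("need sd > 0 and 0 <= reliability < 1")
    s_diff = math.sqrt(2.0) * sd * math.sqrt(1.0 - reliability)
    reliably_lower = (post - pre) / s_diff < -criterion
    return pre > cut_score and post <= cut_score and reliably_lower


# Cut score of 5 as in the QIDS-SR example; sd and reliability are invented.
improved = is_clinically_significant_improvement(
    pre=12, post=4, cut_score=5, sd=5, reliability=0.90
)
```

A client who moves from 6 to 5 crosses the cut score, but the change is too small to be reliable, so the function returns False in that case.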
Expected Treatment Response • Session-by-session progress is determined in comparison to normative data from ongoing responses to treatment of thousands of clients • Feedback used in real time to monitor client progress and modify services as needed to reduce treatment failures and increase overall effectiveness
Global Rating Single rating based on a rater’s integration of information about numerous factors (e.g., global rating of change, improvement, or social functioning)
Single-Item Global Standardized Scales • Global Assessment of Functioning (GAF) • Children’s Global Assessment Schedule (CGAS) • Social and Occupational Functioning Assessment Scale (SOFAS) • Global Assessment of Relational Functioning (GARF)
Potential Advantages of Standardized Scales • Pretested for reliability and validity • Structured, so information less likely to be missed • Can be used to compare individual functioning to normative group functioning • Can be efficient and simple to use
Cautions in the Use of Standardized Scales • May not measure concept suggested by scale name • Different measures of the same concept may not be equivalent • Sometimes limited information about reliability and validity • Concepts as measured may not be completely relevant to individual clients
Resources • Compendiums of measures • See Appendix B • Web measurement resources • See Appendix B