PCEPS: Person Centred and Experiential Psychotherapy Scale. Field trial results and next steps. Robert Elliott, Beth Freire and Graham Westwell. PCE 2012, Antwerp, 9 July 2012
Hold theory lightly “Person-centred theory gives us a sense of direction in what we do as counsellors, but no theory is ever adequate to explain the complexity of human relationships. In a counselling context, theory should be held lightly. It is always inadequate in that it reduces complexity to a series of simple statements.” Tony Merry (2003, p. 174)
Overview of the PCEPS The PCEPS operationalizes widely held competences for humanistic psychotherapy and counselling, and represents an extended effort to create a dialogue between the classical person-centred and experiential "tribes" within the humanistic approaches. It was developed to support randomised clinical trials of person-centred-experiential psychotherapy/counselling, but it could also be used as an outcome measure in training studies. It has many potential uses in professional training, ranging from initial counselling skill practice to professional accreditation and continuing professional development.
Why develop the PCEPS? There is an increasing need to develop the evidence base of person-centred and experiential therapies. Assessment of ‘treatment integrity’ is an essential component of psychotherapy trials: • Therapist adherence to the therapy manual • Therapy being performed competently “Competence presupposes adherence, but adherence does not necessarily imply competence” (Waltz, Addis, Koerner, & Jacobson, 1993).
PCEPS subscales • Two differentiated subscales: • Person-centred processes: • 10 items focused on the therapist’s person-centred relationship attitudes • Experiential processes: • 5 items focused on process facilitation of emotional exploration and differentiation
PCEPS design features • A behaviourally anchored rating scale. • A six-point anchor scale is the common structure throughout the instrument: • 1 is always total absence of the quality/skill • 4 is always adequate presence of the quality/skill • 6 is always excellent presence of the quality/skill • High degree of specificity and differentiation within the instrument. • Highly descriptive, giving examples of poor practice and best practice. • The differentiated subscales accommodate the two theoretical modalities and allow comparisons between them.
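For readers who think in code, the common anchor structure can be sketched as a simple mapping. Levels 1, 4 and 6 follow the slide; the wordings for levels 2, 3 and 5 are assumed placeholders (the instrument itself supplies behaviourally specific descriptions for every item).

```python
# Hypothetical sketch of the PCEPS six-point anchor structure.
PCEPS_ANCHORS = {
    1: "total absence of the quality/skill",
    2: "minimal presence",        # assumed intermediate wording
    3: "some presence",           # assumed intermediate wording
    4: "adequate presence of the quality/skill",
    5: "good presence",           # assumed intermediate wording
    6: "excellent presence of the quality/skill",
}
```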
PCEPS: Person Centred and Experiential Psychotherapy Scale • Inter-rater reliability study • Generalizability study • Convergent validity study
How consistent are ratings and raters on the PCEPS? Inter-rater Reliability and Item Structure Elizabeth Freire, Robert Elliott & Graham Westwell
Aim • To assess the reliability of the Person Centred and Experiential Psychotherapy Scale (PCEPS)
Material • 60 therapy sessions in total, sampled from the archive of the Strathclyde Counselling and Psychotherapy Research Clinic: • 20 clients • 10 therapists (2 clients per therapist) • 3 sessions per client (first, middle, and last third of therapy) • 2 segments from each session (first and second half of the session)
Study Design • 10 therapists sampled across two protocols: • 6 Social Anxiety: 3 EFT therapists, 3 PCA therapists • 4 Practice-Based: 4 PCA therapists • 2 clients per therapist sampled (20 total) • 3 sessions per client sampled (60 total) • 2 segments per session sampled (120 total)
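The sampling frame implied by this design can be sketched as a nested product; the identifiers and counts below simply restate the slide, not the study's actual materials.

```python
import itertools

# Sketch of the sampling frame: 10 therapists x 2 clients x
# 3 sessions x 2 segments = 120 rated segments.
N_THERAPISTS = 10
CLIENTS_PER_THERAPIST = 2
SESSIONS_PER_CLIENT = 3    # first, middle, and last third of therapy
SEGMENTS_PER_SESSION = 2   # first and second half of the session

units = list(itertools.product(
    range(N_THERAPISTS),
    range(CLIENTS_PER_THERAPIST),
    range(SESSIONS_PER_CLIENT),
    range(SEGMENTS_PER_SESSION),
))
assert len(units) == 120
```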
Therapists • 3 EFT therapists • 7 PCT therapists: • 2 experienced therapists • 5 counsellors in training
Clients • 10 Social Anxiety protocol (with experienced therapists) • 10 Practice-based protocol (with therapists in training) • Clients were sampled if they met the following criteria: • Had signed consent forms for material to be used • Had completed sufficient sessions • Audio recordings of the relevant sessions were present • Relational data (necessary for the convergent validity study) were present
Length of segments • Half of the segments = 10 min long • The other half = 15 min long
Raters • 6 raters • 3 qualified and experienced person-centred therapists • 3 counselling trainees in their first year of training • 2 teams of 3 raters each • Group A (2 qualified therapists, 1 trainee) • Group B (1 qualified therapist, 2 trainees)
Rater training • Initial 12-hour training on the use of the PCEPS • Fortnightly 2-hour monitoring meetings provided supervision and feedback on ratings
Procedure • Each rater rated 60 audio-recorded segments (one from each of the 60 sessions). • The two groups of raters listened to different segments from the same session. • Raters were not informed which audio-recordings were of which type of therapy. • Raters knew some of the therapists being rated, including two of the investigators.
Inter-rater reliabilities • Mean inter-rater reliabilities (Cronbach’s alpha) for individual items varied from .68 to .86 • Average inter-rater reliability across the 15 items was .78 • Inter-rater reliability of the 15 items when averaged together was .87
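These figures can be computed with a standard Cronbach's alpha, treating raters as the columns of the score matrix; a minimal sketch, with simulated data standing in for the actual ratings:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha. Rows are rated segments; columns are raters
    (for inter-rater reliability) or items (for internal consistency)."""
    k = scores.shape[1]
    col_vars = scores.var(axis=0, ddof=1)       # variance per rater/item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1 - col_vars.sum() / total_var)

# Demo on simulated ratings: 60 segments x 6 raters.
rng = np.random.default_rng(0)
true_quality = rng.normal(3.7, 0.8, size=(60, 1))
ratings = true_quality + rng.normal(0, 0.4, size=(60, 6))
print(f"alpha = {cronbach_alpha(ratings):.2f}")
```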
Inter-item reliability • Inter-item reliability (Cronbach’s alpha) for the total scale (item scores averaged across raters) was .98 • High degree of internal consistency for the instrument. • Is this alpha too high? (see the generalizability study results that follow)
Factor analysis • Exploratory factor analyses revealed: • a 12-item ‘facilitative relationship’ factor that cut across both Person-Centred and Experiential subscales (alpha = .98) • (Items: (PC1) Client Frame of Reference/Track; (PC2) Core Meaning; (PC3) Client Flow; (PC4) Warmth; (PC7) Accepting Presence; (PC8) Genuineness; (PC9) Psychological Holding; (E1) Collaboration; (E2) Experiential Specificity; (E3) Emotion Focus; (E4) Client Self-Development; (E5) Emotion Regulation Sensitivity) • a 3-item ‘non-facilitative directiveness’ factor (alpha = .89) • (Items: (PC5) Clarity of Language; (PC6) Content Directiveness; (PC10) Dominant or Overpowering Presence)
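A hedged sketch of this kind of exploratory factor analysis, using scikit-learn's FactorAnalysis with varimax rotation as a stand-in (the slide does not state the extraction or rotation method actually used); the data is simulated so that a two-factor structure exists to be found.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulated stand-in: 60 segments x 15 items driven by two latent
# dimensions, loosely mirroring 'facilitative relationship' (12 items)
# and 'non-facilitative directiveness' (3 items).
rng = np.random.default_rng(0)
facilitative = rng.normal(0, 1, size=(60, 1))
directive = rng.normal(0, 1, size=(60, 1))
items = np.hstack([
    facilitative + rng.normal(0, 0.5, size=(60, 12)),
    directive + rng.normal(0, 0.5, size=(60, 3)),
])

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(items)
for i, (f1, f2) in enumerate(fa.components_.T, start=1):
    print(f"item {i:2d}: {f1:+.2f}  {f2:+.2f}")  # loadings per factor
```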
What Affects Ratings on the PCEPS? A Generalisability Theory Analysis Robert Elliott, Graham Westwell & Beth Freire University of Strathclyde
Aims • Carry out a Generalisability Theory/components of variance study of the PCEPS in order to inform decisions about how best to sample psychotherapy/counselling sessions for adherence/competence evaluations.
The PCEPS study - Method • Two teams of three raters • Used PCEPS to independently rate therapy sessions. • Carefully selected from Strathclyde Therapy Research Centre archive. • Complex Generalisability study of method factors that might affect ratings.
Generalisability Study Design: 12 facets of observation: • 1. Items • 2. Person-centred vs experiential subscales • 3. Raters within teams • 4. Rating teams • 5. More vs less therapeutically experienced raters • 6. Early vs late segments within sessions • 7. 10 vs 15 min segments • 8. Early vs middle vs late sessions within clients • 9. Therapists • 10. Clients within therapists • 11. Student vs professional level therapists • 12. Person-centred vs emotion-focused professional level therapists.
1. Items Facet • See reliability study (Beth) • Overall inter-item reliability (scores averaged X 6 raters): • Alpha = .98 • Implication: The PCEPS has over-sampled the item facet and needs to reduce the number of items within the scale. • Our recommendation: Reduce the PCEPS from 15 items to a 10-item short form (7 PC, 3 E items)
2. Subscale Facet: Person-Centred vs. Experiential - 1 • See reliability study (Beth) • Overall inter-subscale reliability • Session-level scores averaged X 6 raters (n =60) • Alpha = .93 (r = .92) • Univariate test for differences (within-participants): • PC subscale: m = 3.81 (sd = .73), greater than: • E subscale: m = 3.40 (sd = 1.03) • t = 6.92 (p < .001) • d = .45 (medium effect size)
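The within-participants comparison reported above can be reproduced along the following lines; the arrays are simulated stand-ins for the 60 session-level subscale scores, and the paired-scores form of Cohen's d shown is one common variant.

```python
import numpy as np
from scipy import stats

# Simulated stand-ins for the session-level subscale scores (n = 60).
rng = np.random.default_rng(1)
pc = rng.normal(3.81, 0.73, size=60)
exp = pc - rng.normal(0.41, 0.50, size=60)   # E scores lower on average

t, p = stats.ttest_rel(pc, exp)              # within-participants t-test
diff = pc - exp
d = diff.mean() / diff.std(ddof=1)           # Cohen's d for paired scores
print(f"t = {t:.2f}, p = {p:.4f}, d = {d:.2f}")
```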
2. Subscale Facet: Person-Centred vs. Experiential - 2 • Implication: Person-Centred and Experiential subscales are highly overlapping • But: PC scores are higher than Experiential scores • This reflects domain sampling/content validity • Not an empirically-based distinction • Recommendation: Generate a main score from a 10-item single index: • Person-Centred Experiential subscale: Person-centred (4 items) + Experiential (3 items) • Directiveness subscale (3 items)
3. Rater Facet (within Teams) • See reliability study (Beth) • Overall inter-rater reliability • Scores averaged X 15 items • Session level alpha (6 raters X 2 segments; n = 60): .91 • Segment level alpha (3 raters; n = 60 X 2 segments): .88 (Team A = Team B) • Smaller for individual items (mean alpha = .78) • Implications: • Average across items to increase inter-rater reliability • Three raters is ideal (as a high standard of reliability is needed for such high-stakes testing) • The Spearman-Brown predicted values for fewer raters are: • 2 raters: alpha = .83; 1 rater: alpha = .71 • This is a risky strategy given the high-stakes testing.
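The predicted values for fewer raters quoted above follow from the Spearman-Brown prophecy formula; a minimal sketch:

```python
def spearman_brown(alpha: float, k_ratio: float) -> float:
    """Predicted reliability when the number of raters is scaled by
    k_ratio (Spearman-Brown prophecy formula)."""
    return (k_ratio * alpha) / (1 + (k_ratio - 1) * alpha)

# Starting from the 3-rater segment-level alpha of .88:
print(round(spearman_brown(0.88, 2 / 3), 2))  # 2 raters -> 0.83
print(round(spearman_brown(0.88, 1 / 3), 2))  # 1 rater  -> 0.71
```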
4. Rating Team Facet • Compared rating teams between segments • Scores averaged X 15 items and X 3 raters/team • Note: differences between teams are confounded with differences between segments • Overall inter-team reliability (session level analyses, n = 60) • Alpha = .85 (correlation = .77) • Note: no difference in inter-rater reliabilities X teams: • Team A = Team B (Alpha = .88) • Univariate test for differences (dependent t-test): • Team A: m = 3.55 (sd = 1.0), which is less than: • Team B: m = 3.80 (sd = .73) • t = -2.97 (p < .01); d = -.29 (small effect)
4. Rating Team Facet - 2 • Conclusion: excellent consistency across rater teams, in spite of their having rated different segments within sessions • Also: identical levels of reliability • But: one team gave higher ratings than the other. • This is problematic for a scale on which absolute levels are important. • Possible sources: rater team culture; confounded factors • Recommendation: Different rating teams produce comparable scores in relative terms but perhaps not in absolute terms • Need to explore sources of difference in rated level • Confounded factors (e.g., rater experience level, segment differences) => see following slides
5. Rater Therapeutic Experience Facet: More vs Less - 1 • Overall inter-group reliability: • 3 more experienced raters vs 3 less experienced raters • Scores averaged X 15 items • Session level rater group alpha = .92 (correlation = .85) • Inter-rater reliability (X segments/rater teams) • Experienced raters: alpha = .81 • Inexperienced raters: alpha = .84 • Univariate test for differences (dependent t-test): • Inexperienced raters: m = 3.70 (sd = .86) • Experienced raters: m = 3.64 (sd = .83) • t = 1.0 (NS); d = .07 (< small effect)
5. Rater Therapeutic Experience Facet: More vs Less - 2 • Implications: • Across rater therapeutic experience levels, PCEPS ratings are highly consistent, with comparable levels and reliability • Contra Carkhuff: PCE competence raters don’t need to be highly clinically experienced • Recommendation: Can use either inexperienced or experienced raters
6. Early vs Late segments within sessions - 1 • Overall inter-segment reliability: • Early vs late segments • Scores averaged X 15 items • Session level rater group alpha = .82 (correlation = .71) • Univariate test for differences (dependent t-test): • Early segments: m = 3.64 (sd = .96) • Late segments: m = 3.70 (sd = .80) • t = -.73 (NS); d = -.07 (< small effect)
6. Early vs Late segments within sessions - 2 • Conclusion: • Across segments, PCEPS ratings are highly consistent, with comparable levels • Recommendation: Can use either early or late segments • Try: a segment ending 5 min before the end of the session • Cf. Herrmann & Imke, in press
7. 10 vs 15 min segments within sessions - 1 • Overall inter-segment reliability: (15 items, 6 raters, 2 segments/session) • Short segments: alpha = .90 • Long segments: alpha = .92 • Univariate test for differences (dependent t-test): • Short segments: m = 3.63 (sd = .83) • Long segments: m = 3.71 (sd = .82) • t = -.37 (NS); d = -.10 (small effect)
7. 10 vs 15 min segments within sessions - 2 • Conclusion: • Across segment lengths, PCEPS ratings show comparable levels of reliability and mean scores • Recommendation: Can use either 10 or 15 min segments • Try: a 15-20 min ‘working segment’ ending 5 min before the end of the session
8. Early vs Middle vs Late sessions within clients - 1 • One-way ANOVA for mean differences in PCEPS ratings: • Early sessions (n = 21): m = 3.69 (sd = .78) • Middle sessions (n = 20): m = 3.65 (sd = .86) • Late sessions (n = 19): m = 3.68 (sd = .85) • F = .01 (NS); eta-squared = .00 (zero % variance accounted for) • Overall inter-segment reliability: (15 items, 6 raters, 2 segments/session) • Early sessions: alpha = .88 • Middle sessions: alpha = .92 • End sessions: alpha = .92
8. Early vs Middle vs Late sessions within clients - 2 • Conclusion: • Across sessions, PCEPS ratings show comparable levels of reliability and mean scores • Recommendation: Don’t need to sample from throughout therapy • Try: Two sessions (e.g., early and middle)
9. Therapists; 10. Clients within therapists • One-way ANOVA for mean differences in PCEPS ratings across: • A: Clients X therapists together: eta-squared = .92 (F = 23.6; p < .001) • B: Therapists (n = 10): eta-squared = .88 (F = 40.7; p < .001) • C = A - B: Clients w/in therapists (n = 20): eta-squared = .04 • Conclusion: • Therapist differences overwhelm all other effects, including differences between clients • Supports construct validity of the measure • Recommendation: For the PCEPS, it’s enough to sample one client per therapist.
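The eta-squared figures here are between-group sums of squares over the total sum of squares; a sketch of the decomposition follows, where the data frame and column names are hypothetical stand-ins for the study's ratings file.

```python
import pandas as pd

def eta_squared(scores: pd.Series, groups: pd.Series) -> float:
    """Between-group sum of squares over total sum of squares."""
    grand = scores.mean()
    ss_total = ((scores - grand) ** 2).sum()
    ss_between = sum(
        len(g) * (g.mean() - grand) ** 2
        for _, g in scores.groupby(groups)
    )
    return ss_between / ss_total

# Hypothetical data frame with one row per rated session and
# 'pceps', 'therapist', and 'client' columns:
# eta_tc = eta_squared(df["pceps"], df["client"])     # A: clients x therapists
# eta_t  = eta_squared(df["pceps"], df["therapist"])  # B: therapists alone
# clients_within = eta_tc - eta_t                     # C = A - B (~ .04)
```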
11. Student vs Professional level therapists - 1 • Overall inter-rater reliability (15 items, 6 raters) • Student therapists: alpha = .61 • Professional therapists: alpha = .85 • Univariate test for differences (independent t-test): • Student therapists: m = 3.08 (sd = .33), less than: • Professional therapists: m = 4.27 (sd = .72) • t = 8.27 (p < .001); d = 2.12 (extremely large effect)
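This independent-samples comparison can be sketched as follows; the simulated arrays reproduce the slide's means and standard deviations rather than the actual data, and the pooled-SD form of Cohen's d is one common variant.

```python
import numpy as np
from scipy import stats

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Cohen's d with a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

# Simulated stand-ins matching the slide's means and SDs:
rng = np.random.default_rng(2)
students = rng.normal(3.08, 0.33, size=30)
pros = rng.normal(4.27, 0.72, size=30)

t, p = stats.ttest_ind(pros, students)
print(f"t = {t:.2f}, p = {p:.4f}, d = {cohens_d(pros, students):.2f}")
```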
11. Student vsProfessional level therapists - 2 • Implications: • Poor reliability for student therapists • Even with 6 raters • Is this an order/practice effect?: • Student therapists were rated earlier in the PCEPS study • Recognisability/rater bias? • Recommendations: • Use raters who don’t know therapists • May need more raters for rating inexperienced therapists
12. Person-centred vs Emotion-focused professional level therapists • Univariate test for differences (independent t-test): • PC therapists (n = 12 sessions): m = 4.38 (sd = .55) • EFT therapists (n = 18 sessions): m = 4.20 (sd = .81) • t = .66 (NS); d = .27 (small effect) • Implications: • Little if any difference between PC and EFT therapists • Recommendations: • PCEPS is useful for both PCT and EFT • Differences between PCT and EFT may be exaggerated
Beyond ideology, or: Back to the Process Itself • Is it worth continuing to argue at an ideological level over nondirectivity and process guiding? • Like psychology in general, we have been neglecting the study of concrete behaviour in favour of the ease of self-report data: • Both quantitative questionnaires & qualitative interviews • The PCEPS study illustrates the value of following the example of early Carl Rogers and colleagues • We need to return to the study of the therapy process.
How well do PCEPS raters agree with client and therapist perceptions of the relationship? A convergent validity study Graham Westwell, Robert Elliott and Beth Freire
Aim • The aim of this study is to assess the convergent validity of the PCEPS by measuring how it correlates with related client self-report instruments. “Validity is a more difficult concept to understand and to assess than reliability. The classical definition of validity is ‘whether the measure measures what it is supposed to measure.’” Barker, Pistrang and Elliott (2002, p. 65)
Method • Audio segments rated using the PCEPS were specifically chosen from the Research Centre archive to correspond with available ‘relational assessment’ data: • Working Alliance Inventory (Short Revised version) • Therapeutic Relationship Scale (Client) • Therapeutic Relationship Scale (Therapist) • Revised Session Reactions Scale • After inter-rater reliability analysis, averaged rater scores were correlated with the ‘relational assessment’ data using Pearson’s r.
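The correlational step is a straightforward set of Pearson correlations; a minimal sketch, with simulated stand-ins for the PCEPS and self-report scores (the names and values are illustrative only):

```python
import numpy as np
from scipy import stats

# Simulated stand-ins: averaged PCEPS ratings per session, plus
# matching self-report scores (names and values are illustrative).
rng = np.random.default_rng(3)
pceps = rng.normal(3.7, 0.8, size=60)
measures = {
    "WAI-SR": 0.5 * pceps + rng.normal(0, 0.5, size=60),
    "TRS-C":  0.4 * pceps + rng.normal(0, 0.6, size=60),
}

for name, scores in measures.items():
    r, p = stats.pearsonr(pceps, scores)
    print(f"{name}: r = {r:.2f} (p = {p:.3f})")
```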
Working Alliance Inventory – Short Revised version (WAI-SR) • The WAI-SR is a 12-item, self-report, short version of the 36-item Working Alliance Inventory (Horvath & Greenberg, 1989) • This 12-item version is also based on Bordin’s (1979) model of the working alliance. • It consists of three 4-item sub-scales, based on a negotiated, collaborative relationship: • The quality of the interpersonal bond between client and therapist • Agreement between client and therapist on the goals of therapy • Agreement between client and therapist that the tasks of therapy will address the problems the client brings
Therapeutic Relationship Scale (TRS) • The Therapeutic Relationship Scale aims to measure the client’s perception of the quality of the therapeutic relationship (Sanders, Freire and Elliott, 2007) from a specifically person-centred perspective. • The TRS is a 27-item scale with 6 domains: • Empathy (5 items), Positive regard (3 items), Acceptance (3 items), Genuineness (4 items), Collaboration (3 items) and Deference (9 items). • The TRS has a 4-point rating scale: ‘Not at all’, ‘A little’, ‘Quite a lot’ and ‘A great deal’, scored 0-3: • 13 of the items are reversed, so ‘Not at all’ = 3 and ‘A great deal’ = 0 • There are two versions (with corresponding items): • Therapeutic Relationship Scale, Client (TRS-C) • Therapeutic Relationship Scale, Therapist (TRS-T)
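Reverse-keying is the one non-obvious scoring step; a minimal sketch follows, in which the reversed item positions are made up for illustration (the slide says only that 13 of the 27 items are reversed).

```python
import numpy as np

# Hypothetical reverse-keyed item positions (illustrative only).
REVERSED = {2, 5, 9}

def score_trs(responses: np.ndarray) -> float:
    """responses: 1-D array of raw 0-3 codes, one per item.
    Reversed items are re-coded as 3 minus the raw response."""
    scored = responses.astype(float).copy()
    for i in REVERSED:
        scored[i] = 3 - scored[i]
    return scored.mean()   # or sum, depending on scoring convention

print(score_trs(np.array([2, 1, 3, 0, 2, 3, 1, 0, 2, 1, 3, 2])))
```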