
The New Complex Trial Protocol for Deception Detection with P300: Mock Crime Scenario and Enhancements
J. Peter Rosenfeld, John Meixner, Michael Winograd, Elena Labkovsky, Alex Sokolovsky, Xiaoxing Hu, Alex Haynes
Northwestern University

Karis, Fabiani, & Donchin (1984) demonstrated two important phenomena: 1) A list of words was learned and then a subset of these was presented among a larger set of other, novel words in a later test. It was found that the recalled, previously exposed, familiar words evoked larger P300s than novel words. 2) Some of the old words were initially presented in an unusual (oddball) font size, which made them more memorable.

OLD 3-STIMULUS, P300-BASED CIT (GKT)
• PROBE (guilty knowledge item): $5000. Press non-target button.
• IRRELEVANT (other amount): $200. Press non-target button.
• TARGET (other amount): $3000. Press target button.

The idea in this test is that P > I, showing recognition even if denied.

Previous P300 CIT protocols used separate Probe (P), Irrelevant (I), and Target (T) trials.
• 80% to 95% correct detection rates... but...
• Rosenfeld et al. (2004) and Mertens, Allen et al. (2008): these methods are vulnerable to countermeasures (CMs).
• (A CM is an attempt to defeat the test.)

The old P300 protocol leads to 2 tasks for each stimulus:
1. implicit probe recognition vs.
2. explicit Target/Non-Target discrimination
Possible result: mutual interference. More task demand leads to a reduced Probe P300 that is not as big as it could be. This is why CMs hurt the old test.

How to do CMs:
• When you see a specific irrelevant, SECRETLY make some response, mental or physical.
• After all, if you can make a special response to the TARGET on instruction from the operator, you can secretly instruct yourself to do the same thing to other irrelevants.
• The irrelevant becomes a secret target that evokes a big P300. If P = I, no diagnosis.

Results from Rosenfeld et al. (2004): Farwell-Donchin paradigm. Diagnoses of guilty. (BAD and BC-AD are 2 analysis methods.)

Method                               Innocent Group   Guilty Group   CM Group
Amplitude Difference (BAD), p=.1     1/11 (9%)        9/11 (82%)     2/11 (18%)
Cross-Correlation (BC-AD), p=.1      0/11 (0%)        6/11 (54%)     6/11 (54%)

Results (hit rates) from Rosenfeld et al. (2004): Rosenfeld paradigm

Week         BAD*           BC-AD*
1: no CM     12/13 (.92)    9/13 (.69)
2: CM        6/12 (.50)     3/12 (.25)
3: no CM     7/12 (.58)     3/12 (.25)

*Note: BC-AD and BAD are 2 kinds of analytic bootstrap procedures.

NEW COMPLEX TRIAL PROTOCOL (CTP)
First study to follow: it was based on detection of autobiographical information (birth dates).

New Complex Trial Protocol (CTP), designed to resist CMs
• 2 stimuli per trial, separated by about 1 s: S1 (either P or I), then S2 (either T or NT).
• There is no conflicting discrimination task when P is presented; there is only a simple "I saw it!" response to S1. The P300 to the probe is therefore expected to be as large as possible due to P's salience, which should lead to good detection: 90-100% in Rosenfeld et al. (2008) with autobiographical information.
• It is also CM resistant. (The delayed T/NT task still holds attention.)
• The RT of the "I saw it" response to S1 indexes CM use.
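The two-stimulus trial structure described above can be sketched in a few lines of Python. This is only an illustration, not the lab's stimulus software: the function name `make_ctp_trials`, the equal probability of each S1 item, and the `p_target` value are all assumptions, since the slides do not state the stimulus probabilities.

```python
import random

def make_ctp_trials(probe, irrelevants, n_trials, p_target=0.5, seed=None):
    """Return a list of (S1 item, S1 type, S2 type) tuples for one block.

    Assumptions (not from the slides): every S1 item is equally likely,
    and S2 is a target with probability p_target.
    """
    rng = random.Random(seed)
    s1_items = [probe] + list(irrelevants)
    trials = []
    for _ in range(n_trials):
        s1 = rng.choice(s1_items)                 # S1: probe or an irrelevant
        s1_type = "P" if s1 == probe else "I"     # subject answers "I saw it" either way
        s2_type = "T" if rng.random() < p_target else "NT"  # delayed T/NT decision
        trials.append((s1, s1_type, s2_type))
    return trials

# Shortened item list for illustration (one probe, two irrelevants).
trials = make_ctp_trials("ring", ["necklace", "watch"], n_trials=10, seed=1)
for s1, s1_type, s2_type in trials:
    print(f"S1={s1} ({s1_type}) -> S2={s2_type}")
```

The point of the design is visible in the tuple: the probe/irrelevant distinction lives only in S1, and the explicit discrimination is deferred to S2.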

First Study: Within-subject correct detections of guilty subjects based on bootstrap comparison of probe P300 against the average of all irrelevant P300s over 3 weeks.

Week              Hit Rate
Week 1 (no CM)    11/12* (92%)
Week 2 (CM)       10/11* (91%)
Week 3 (no CM)    11/12* (92%)

Results with innocent (control) group:

         Confidence = .9        Confidence = .95
Test     FPs    Hits   A'       FPs    Hits   A'
Iall     .08    .92    .95      0      .92    .98
Imax     0      .92    .98      0      .92    .98
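The A' values reported with the hit and false-positive (FP) rates above can be reproduced with Grier's (1971) nonparametric formula. The slides do not say which A' formula was used, so treating it as Grier's is an assumption, although the tabulated values are consistent with it. The function name `a_prime` is hypothetical.

```python
def a_prime(hit_rate, fp_rate):
    """Grier's (1971) nonparametric A' from a hit rate and a false-positive rate.

    Assumption: this is the A' statistic meant in the slides.
    """
    h, f = hit_rate, fp_rate
    if h == f:
        return 0.5                                   # chance-level discrimination
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))

# 11/12 hits with 1/12 false positives, and 11/12 hits with no false positives
print(round(a_prime(11 / 12, 1 / 12), 3))
print(round(a_prime(11 / 12, 0.0), 3))
```

With these inputs the function gives roughly .95 and .98, in line with the Iall row of the table.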

How does this CTP do in detecting incidental mock crime details? The old, 3-stimulus protocol does not do so well with such incidental information.
• Subjects were divided into three groups (n=12): Simple Guilty (SG), Countermeasure (CM), and Innocent Control (IC).
• All subjects first participated in a baseline reaction time (RT) test in which they chose a playing card and then completed the CTP using cards as stimuli.
• SG and CM subjects then committed a mock crime: they stole a ring out of an envelope in a professor's mailbox. Subjects were never told what the item would be, to ensure any knowledge would be incidentally acquired through the commission of the mock crime.
• All subjects were then tested for knowledge of the item that was stolen. There were 1 P (the ring) and 6 I (necklace, watch, etc.).
• CM subjects executed covert assigned responses to irrelevant stimuli in an attempt to evoke P300s to these stimuli and so beat the Probe vs. Irrelevant P300 comparison.

A CTP Trial

Results: Grand Averages: SG, CM, IC, all P

Guilty Diagnoses

Condition   Detections   Percentage
SG          10/12        83
CM          12/12        100
IC          1/12         8

RTs to S1 (P or I)

Conclusions
• As with autobiographical information, the CTP was found to be highly sensitive at detecting incidentally acquired concealed knowledge in a mock-crime scenario.
• Detection rates using the CTP compare favorably to similar polygraph CITs. The main advantage of the CTP over the old P300 or polygraph CIT is its resistance to CM use. The traditional covert-response CMs used to defeat past P300 CITs were found to be ineffective against the CTP, and actually led to larger Probe-Irrelevant amplitude differences and higher detection rates.
• CM use was also easily identified by a large increase in RT between the baseline and experimental blocks.

New study with autobiographical information (birth dates again), 2 mental CMs to 4 irrelevants.
• So now we have a 5-button box for the left hand. The subject is instructed to press, at random*, one of the 5 buttons as the "I saw it" response to S1 on each trial, with no repeats. T and NT (S2) stimuli and responses are as before.
• We also hoped that this would make CMs harder to do. It didn't, but we caught the CM users anyway.
* We have also done other studies with non-random, explicitly assigned responses.

Design:
• Autobiographical information (birth dates): one P and 4 I (other, non-meaningful dates).
• 3 groups as before: SG, CM, IC.
• NEW: mental CMs to only 2 of the 4 irrelevants. Silently saying your first name to yourself is CM1; silently saying your last name is CM2. These are assigned prior to the run.
• Why 2 irrelevants? Meixner & Rosenfeld (2010) showed that countering all irrelevants but not the probe gives the probe extra, special significance. We did a study with only irrelevants (5), one of which was not countered; it had a big P300. So doing CMs to all irrelevants is not a good strategy from the perp's perspective.
• Why mental CMs? They should be faster and a bigger challenge for our CTP.
• Only one block per group (no baseline).

Results: Grand Averages (Pz, 2 uV/division)

Detection rates:

Group   BT/Iall .9      BT/Imax .9
SG      13/13 (100%)    13/13 (100%)
IC      1/13 (7.6%)     1/13 (7.6%)
CM      12/12 (100%)    10/12 (83%)*

*These are screened via RT, which still nicely represents CM use within a block.

RTs (to “I saw it”) in this study clearly index use of CMs:

New ERP: "P900", the CM potential: largest at Fz, Cz (P = black, Iall = red, 2 uV/division)

New study: Effects of various numbers of CMs, 1-5, with 5 total stimuli
Elena Labkovsky & Peter Rosenfeld

A Mock Terrorism Study
John Meixner & Peter Rosenfeld
How do you catch bad guys before crimes are committed, and before you know what was done, where, and when?

A Mock Terrorism Application of the P300-based Concealed Information Test Department of Psychology, Northwestern University, Evanston, IL 60208-2700

Table 1. Individual bootstrap detection rates. Numbers indicate the average number of iterations (across all three blocks) of the bootstrap process in which probe was greater than Iall or Imax. Blind Imax numbers indicate the average number of iterations in which the largest single item (probe or irrelevant) was greater than the second largest single item. Mean values for each column are displayed in bold above detection rates.
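The bootstrap counts described in this caption can be illustrated with a simplified sketch. The published procedure resamples single-trial ERP waveforms, averages them, and measures P300 on the averages; here each trial is reduced to a single amplitude number, which is an assumption made only to keep the example short. The function names, the 1000-iteration default, and the amplitude values are hypothetical. For Iall, the irrelevant list would pool the trials of all irrelevants; for Imax, it would hold only the irrelevant with the largest P300.

```python
import random

def bootstrap_fraction(probe_amps, irrelevant_amps, n_iter=1000, seed=None):
    """Fraction of bootstrap iterations in which the resampled probe mean
    exceeds the resampled irrelevant mean (simplified, amplitude-based)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_iter):
        # Resample trials with replacement, keeping the original trial counts.
        p = rng.choices(probe_amps, k=len(probe_amps))
        i = rng.choices(irrelevant_amps, k=len(irrelevant_amps))
        if sum(p) / len(p) > sum(i) / len(i):
            wins += 1
    return wins / n_iter

def diagnose_guilty(probe_amps, irrelevant_amps, criterion=0.9, **kwargs):
    """Guilty diagnosis if probe > irrelevant in at least `criterion`
    of the iterations (.9 is the confidence level used in the slides)."""
    return bootstrap_fraction(probe_amps, irrelevant_amps, **kwargs) >= criterion

# Made-up single-trial amplitudes in microvolts, for illustration only.
probe = [8.0, 9.5, 10.0, 11.0, 12.5]
irrelevant = [1.0, 2.5, 3.0, 4.0, 5.0]
print(bootstrap_fraction(probe, irrelevant, seed=0))
print(diagnose_guilty(probe, irrelevant, seed=0))
```

Because every made-up probe value exceeds every irrelevant value, each iteration favors the probe; real data overlap, which is why the fraction of iterations, rather than a single comparison, is the decision variable.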

So…………. • CTP is a promising, powerful paradigm, against any number of CMs, mental and/or physical and RT reliably indicates CM use. The new “P900” might also. • jp-rosenfeld@northwestern.edu

So far, all CMs have been done separately from, and before, the "I saw it" response.
• CMs that are separated or split away from the "I saw it" response are called "splitting CMs."
• What happens if subjects are instructed to do the CM and the "I saw it" response at the same time? They lump these acts together. These are called "lumping CMs."

Here's what happens: P300 still detects (83%) P vs. Iall (b), but RT no longer indicates CMs!

Note that this means you can no longer screen out irrelevant comparison waves associated with large RTs.
• Xiaoxing Hu to the rescue! (with Dan Hegeman and Elizabeth Landry).
• He simply increased the irrelevants from 4 to 8, which should increase demand and RT...

Here are RT results with 8 irrelevants; the 2-, 4-, and 6-CM lumping groups are combined here.

RTs sorted by lumping CM groups.

Tabulated data: the more CMs you do, the harder the task, and the more likely that RT will expose even lumping CM use.

P300 still catches CM users.

We were actually able to do some screening with 6-CM subjects, which improved the hit rate to 77% and A' to .91.

Remember, Allen Hu gave the CMs to Ss in advance and let them rehearse. • And his subjects were geniuses, like you all…

So we are now working with 10 irrelevant items and 3, 5, or 7 CMs.

BUT...
• ...it is obvious that having to form, on the spot, and hold in your head 6 CMs for 6 of 8 irrelevants, as must happen in the field, is probably too hard for most bad guys to do.
