Analysing Eye-Tracking Data Hayward Godwin University of Southampton
Outline Part 1 • Eye-tracking measures – an overview • Data Viewer reports • The Organise-Analyse-Visualise approach in R Part 2 • Try it yourself!
Eye-Tracking Measures An Overview (for a detailed review, see Rayner, 2009)
“Global” versus “Local” measures • Global measures are computed at the overall (or global) level of a trial and ignore what was being fixated at any point in time • e.g., mean fixation duration for a trial • Local measures are computed for each object or stimulus in a trial, paying attention to what was being fixated at any point in time • e.g., mean fixation duration for target words in a reading study • Many measures can be computed at both a global and a local level
Mean Fixation Duration (global) (Mean duration of fixations) “Search for a blue square target” Mean Fixation Duration = (130+125+110+90+150+190)/6 = 132.5ms [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms]
Mean Fixation Duration (local) (Mean duration of fixations on a specific object type) “Search for a blue square target” Mean Fixation Duration for target = (110+190)/2 = 150ms [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms]
Number of Fixations (global) (Mean number of fixations) “Search for a blue square target” Number of fixations = 6 [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms]
Number of Fixations (local) (Mean number of fixations on a specific object type) “Search for a blue square target” Number of fixations for target = 2 [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms]
Total Gaze Duration (global) (Sum of fixation durations) “Search for a blue square target” Total gaze duration = 130+125+110+90+150+190 = 795ms [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms]
Total Gaze Duration (local) (Sum of fixation durations on a specific object type) “Search for a blue square target” Total gaze duration for target = 110+190 = 300ms [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms]
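The global and local versions of the three measures above can be sketched in a few lines of base R. This is a minimal illustration only: the durations come from the slides, but the order of the fixations and the on_target vector are assumptions based on the formulas shown (the 110ms and 190ms fixations being the ones on the target).

```r
# Fixation durations (ms), in the order used in the slide's formula.
durations <- c(130, 125, 110, 90, 150, 190)

# Assumption: the 110 ms and 190 ms fixations are the two on the target,
# as the "local" slides state; everything else is treated as non-target.
on_target <- c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)

# Global measures: use every fixation in the trial.
global <- c(mean_duration = mean(durations),
            n_fixations   = length(durations),
            total_gaze    = sum(durations))

# Local measures: use only the fixations on the target.
target <- c(mean_duration = mean(durations[on_target]),
            n_fixations   = sum(on_target),
            total_gaze    = sum(durations[on_target]))

print(global)
print(target)
```

The only difference between the global and local versions is the logical filter applied before summarising, which is why many measures can be computed at both levels.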
First-pass Gaze Duration (Sum of fixation durations on the first visit or pass of an object) “Search for a blue square target” First-pass gaze duration for target = 110ms (the second fixation of 190ms duration occurs on the second pass so is excluded) [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms]
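One way to pick out the first pass is to split the fixation sequence into runs of consecutive fixations on the same object and keep only the first run on the target. The sketch below does this with base R's rle(). The object labels are hypothetical: the slides only tell us that the 110ms and 190ms fixations were on the target, so which of the other fixations fell on distractors or on blank space is assumed here.

```r
# Fixation durations (ms) in temporal order, from the slides.
durations <- c(130, 125, 110, 90, 150, 190)

# Hypothetical labels for what each fixation landed on (only the two
# "target" entries are supported by the slides).
fixated <- c("blank", "distractorA", "target", "distractorB", "blank", "target")

# Collapse the sequence into runs (visits) of consecutive fixations
# on the same object.
runs <- rle(fixated)

# Locate the first run on the target and the fixations it spans.
first_run <- which(runs$values == "target")[1]
run_end   <- cumsum(runs$lengths)[first_run]
run_start <- run_end - runs$lengths[first_run] + 1

# First-pass gaze duration: sum of durations within that first run only.
first_pass <- sum(durations[run_start:run_end])

# Number of separate visits to the target.
n_visits <- sum(runs$values == "target")

print(first_pass)
print(n_visits)
```

Because the second target fixation belongs to a separate run, it is left out of first_pass, matching the exclusion described on the slide.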
Single Fixation Duration (Mean of fixation durations when an object is only ever fixated once) “Search for a blue square target” • This is one of the cleanest measures in eye-tracking: when an object is fixated only once, that one fixation charts the time taken to fully process the object • Here, only two objects are ever fixated once (these are highlighted in the figure) • Since the target object is fixated twice, this trial would be excluded from the single fixation duration calculations [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms]
Proportion of objects fixated (global) (Proportion of objects directly fixated) “Search for a blue square target” Proportion fixated = 3 / 5 = 0.6 [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms]
Proportion of objects fixated (local) (Proportion of objects directly fixated, broken down by object type) “Search for a blue square target” Proportion of distractors fixated = 2/4 = 0.5 • Proportion of targets fixated = 1/1 = 1 [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms]
Saccade onset latency (Time from display onset to start of first saccade) “Search for a blue square target” If display onset occurs at time 0, then this is 130ms [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms]
Mean number of visits (Mean number of times each object is visited) “Search for a blue square target” • Count up the number of times each object is visited and then divide by the number of objects that were visited • Do NOT include zero values for unvisited objects • (1 + 2 + 1) / 3 = 4 / 3 ≈ 1.33 [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms]
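Saccade Amplitude (Mean amplitude of saccades) “Search for a blue square target” Mean length of all saccades = (1.2 + 1.4 + 2.2 + 0.2 + 3.4) / 5 = 1.68 [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms and saccade lengths of 1.2, 1.4, 2.2, 0.2 and 3.4]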
Verification Time (Time between first fixating the target and the button press) “Search for a blue square target” • Find when the button press occurred • If we find that it occurred 150ms into the second fixation (of 190ms) on the target, then verification time = 110 + 90 + 150 + 150 = 500ms (the final fixation contributes only the 150ms before the press, not its full 190ms) • A better way to do this is to find the time at which the first fixation on the target starts and subtract this value from the RT [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms]
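Both ways of computing verification time can be sketched as follows. The fixation durations and the 150ms button-press offset come from the slide; the two timestamps used for the second approach (first_target_fix_start and rt) are hypothetical values chosen purely for illustration.

```r
# Approach 1: add up fixation durations from the first target fixation
# to the button press.
# Fixations from the first target fixation onwards (ms), from the slide.
durations_from_target <- c(110, 90, 150, 190)

# The press occurred 150 ms into the final (190 ms) fixation, so only
# that part of the final fixation counts.
press_offset_in_last <- 150
verification_sum <- sum(head(durations_from_target, -1)) + press_offset_in_last

# Approach 2: subtract the start time of the first target fixation from
# the RT. Both values are hypothetical, measured from display onset.
first_target_fix_start <- 300
rt <- 830
verification_rt <- rt - first_target_fix_start

print(verification_sum)
print(verification_rt)
```

The two results differ in this illustration because summing fixation durations leaves out the time spent in saccades between fixations, which is one reason to prefer the subtraction approach.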
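Scanpath Ratio (Sum of saccade lengths to target divided by shortest distance to target) “Search for a blue square target” Scanpath ratio = (1.2 + 1.4 + 2.2 + 0.2 + 3.4) / 5.2 = 8.4 / 5.2 ≈ 1.62 [Figure: search display with fixation durations of 125, 130, 110, 90, 190 and 150ms, saccade lengths of 1.2, 1.4, 2.2, 0.2 and 3.4, and a shortest distance to the target of 5.2]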
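The two saccade-based measures use the same five saccade lengths, so they can be sketched together. The values are those shown on the slides (the unit is not stated there, so none is assumed here).

```r
# Saccade lengths and shortest distance to the target, from the slides.
saccade_lengths   <- c(1.2, 1.4, 2.2, 0.2, 3.4)
shortest_distance <- 5.2

# Saccade amplitude: mean length of all saccades in the trial.
mean_amplitude <- mean(saccade_lengths)

# Scanpath ratio: total distance travelled divided by the shortest
# possible distance to the target.
scanpath_ratio <- sum(saccade_lengths) / shortest_distance

print(mean_amplitude)
print(round(scanpath_ratio, 2))
```

A ratio of 1 would mean the eyes travelled no further than the shortest distance to the target; larger values indicate a less direct path.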
Notes on Measures • There are many, many measures that can be run • Just because you can run these, it doesn’t mean that you should • Focus on running only the measures that address your research questions and avoid doing or reporting additional ones for the sake of it (i.e., avoid fishing!)
Fixation Report • One row of data for every fixation in your study (per trial, per participant) • You will typically need to use the fixation report if you are running visual search/scene perception studies • Use fixation reports to filter out fixations that coincide with other events, such as display changes, button-press responses, etc • This can be done by filtering using the Interest Period (as you’ll see in the tutorials) but often you’ll end up removing some fixations you still want • Fixation reports can also be used to re-compute the size of interest areas and capture fixations that fell just outside of interest areas
Fixation Report – Important Columns • RECORDING_SESSION_LABEL: The recording session ID • TRIAL_INDEX: Trial number • CURRENT_FIX_INDEX: The fixation ID for the current fixation • CURRENT_FIX_DURATION: The duration of the current fixation • CURRENT_FIX_BUTTON_PRESS_X: The time during the current fixation that a button was pressed • CURRENT_FIX_INTEREST_AREA_LABEL: The interest area label of the current fixation (“.” if the eyes are not on an IA) • CURRENT_FIX_NEAREST_INTEREST_AREA_LABEL: The nearest IA to the eyes • CURRENT_FIX_NEAREST_INTEREST_AREA_DISTANCE: The distance to the CENTRE of the nearest IA • Can also get NEXT_ and PREVIOUS_ versions of all measures
Interest Area Report • One row of data for every interest area in your study (per trial, per participant) • Reading researchers typically use this type of report • They typically change the interest period to be set to the time period of the trial itself, enabling the filtering out of any unnecessary fixations
Interest Area Report – Important Columns • RECORDING_SESSION_LABEL: The recording session ID • TRIAL_INDEX: Trial number • IA_DWELL_TIME: Total time spent on the IA (sum of all fixations on the IA) • IA_FIRST_FIXATION_DURATION: Often referred to as First Fix Duration in reading research. The duration of the first fixation on the interest area (only on first pass; if the target region is skipped this will have no value) • IA_FIRST_RUN_DWELL_TIME: Often referred to as Gaze Duration in reading research. A sum of all fixations on the IA for the first pass. You also use this column for calculating Single Fixation Duration, but remove all occurrences where the IA region was fixated more than once • IA_ID/IA_LABEL: The ID number and label for the interest area • IA_REGRESSION_IN: Returns 0 or 1 • IA_REGRESSION_IN_COUNT: Returns the number of regressions in • IA_REGRESSION_OUT: Returns 0 or 1 • IA_REGRESSION_OUT_COUNT: Returns the number of regressions out • IA_REGRESSION_PATH_DURATION: Often referred to as Go Past Time in reading research. Sum of all fixations that occur before passing to the right of the target interest area (to a greater numbered IA_ID) • IA_SKIP: Returns 0 or 1
Message Report • One row of data for every message that occurred during the study (per trial, per participant) • If you want an accurate view of when things happened during your study, the message report is the one to use • This is particularly important for gaze-contingent studies where display changes occur • You can technically get most of the messages that occur from the fixation report. However, some messages do get missed from the fixation report
Message Report – Important Columns • RECORDING_SESSION_LABEL: The recording session ID • TRIAL_INDEX: Trial number • CURRENT_MSG_LABEL: message text details • CURRENT_MSG_TEXT: message text details • CURRENT_MSG_TIME: the time the message occurred
Sample Report • One row of data for every sample recorded by the eye-tracker during the study (per trial, per participant) • If you have your Eyelink running at 1000Hz, that gives you 1,000 rows of data per second of recording • Sample reports typically are tens of millions of rows in size • You’ll only need to use a sample report if you have certain highly customised setups (e.g., moving displays) or want to get an idea of millisecond-by-millisecond pupil size (as is the case in pupillometry)
Data • In the past, data could easily be organised in Excel, analysed in SPSS and visualised in SPSS/Excel/SigmaPlot • With the size and complexity of eye-tracking studies, this is no longer really possible • We can now do all three steps in R, making the transition between them easier: • Organise: data.table • Analyse: ezANOVA (from the ez package) • Visualise: ggplot (from the ggplot2 package)
Organising your Scripts for Reproducible Results • However you do things, it’s best to have a consistent approach to organising your R scripts • I have two types of script: • ORGANISE__XYZ.R scripts that organise the data • ANALYSE__XYZ.R scripts that analyse and visualise the data • However you set up your own R scripts, find an approach and stick to it • This then makes it easier to copy and paste existing scripts, and being consistent means you can go back to old stuff and understand it more easily
Organise: the data.table package • Why use data.table? • It does things very quickly • It extends (builds upon) data.frame objects, meaning that everything you can do to a data.frame object, you can do to a data.table • I'm now going to go through some examples of what it can do and how to use it • I’ll be giving out the example code later, so no need to type or run through it now
Create a data.frame • Create a normal data.frame • It will look something like the table shown on the slide • It lists different trials for a bunch of participants and gives you their RT (Reaction Time) in ms
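A data.frame of this kind might be built as follows. The column names (ppt, trial, trialType, RT) are the ones used in later slides, but every value in the table is made up for illustration, as is the choice of two participants with four trials each.

```r
# Hypothetical example data: 2 participants x 4 trials.
# All values are invented for illustration.
df <- data.frame(
  ppt       = rep(c("p1", "p2"), each = 4),               # participant id
  trial     = rep(1:4, times = 2),                        # trial number
  trialType = rep(c("present", "absent"), times = 4),     # type of trial
  RT        = c(520, 610, 480, 700, 560, 640, 500, 720)   # reaction time (ms)
)

print(df)
```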
Add Keys • For large data sets you will want to set keys • When data are keyed, they can be processed faster • A key can be set on one or more columns in your data.table • When a column is part of the key, data.table can group the data by that column more rapidly • In our example, let's set participant id (ppt) and trialType as keys, using the setkey command, so we can group the data by these values more rapidly
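Setting the keys described above might look like this. The sketch assumes the data.table package is installed; the data values are hypothetical, and only the column names ppt and trialType and the setkey command come from the slides.

```r
library(data.table)

# Hypothetical example data (values invented for illustration).
DT <- data.table(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("present", "absent"), times = 4),
  RT        = c(520, 610, 480, 700, 560, 640, 500, 720)
)

# Set ppt and trialType as keys. setkey() modifies DT in place and
# sorts the rows by the key columns.
setkey(DT, ppt, trialType)

# Check which columns are currently keyed.
print(key(DT))
print(DT)
```

Note that setkey() reorders the rows, so the table will no longer be in its original trial order afterwards.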
Basic Syntax • A data.table query takes the form DT[WHERE, SELECT, GROUPBY] • {WHERE} allows you to select only certain rows. In other words, the command you run focuses only on the rows WHERE certain conditions are met • {SELECT} is where you tell data.table what columns or values you want back. In other words, you SELECT certain values • {GROUPBY} allows you to group the output data in different ways. This is a bit like pivot tables in Excel.
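As a minimal sketch of the three parts working together, the query below uses all of WHERE, SELECT and GROUPBY in a single call. It assumes the data.table package is installed; the data values and the output column name meanRT are hypothetical.

```r
library(data.table)

# Hypothetical example data (values invented for illustration).
DT <- data.table(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("present", "absent"), times = 4),
  RT        = c(520, 610, 480, 700, 560, 640, 500, 720)
)

# DT[ WHERE,      SELECT,                  GROUPBY        ]
result <- DT[trial <= 2, .(meanRT = mean(RT)), by = trialType]

print(result)
```

Any of the three parts can be left empty: leaving WHERE blank uses all rows, and omitting GROUPBY returns a single summary for the whole table.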
Getting means • How about the mean RT overall? • Gives us: • In other words we are SELECTing the mean of the RT column
Getting means • Overall RT isn’t that interesting. Let’s GROUP BY trialType: • Gives us: • In other words we are SELECTing the mean of the RT column but GROUPING BY the trialType column
Getting means • Now let's group by participant and trialType: • Gives us: • In other words we are SELECTing the mean of the RT column but GROUPING BY the trialType and ppt columns
Getting means • But what if we want to only obtain the means for trials 3 and 4? How do we do that? We use WHERE! (Reminder: “==” means “is equal to”) • Gives us: • In other words we are SELECTing the mean of the RT column but GROUPING BY the trialType and ppt columns but only including values WHERE trial is 3 or 4
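The four "Getting means" queries might be written as below. This is a sketch under stated assumptions rather than the original slide code: the data values are hypothetical, and the result names (overall, by_type, by_ppt_type, late_trials, meanRT) are introduced here for illustration. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical example data (values invented for illustration).
DT <- data.table(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("present", "absent"), times = 4),
  RT        = c(520, 610, 480, 700, 560, 640, 500, 720)
)

# 1. SELECT the mean of RT over the whole table.
overall <- DT[, mean(RT)]

# 2. SELECT the mean of RT, GROUPING BY trialType.
by_type <- DT[, .(meanRT = mean(RT)), by = trialType]

# 3. SELECT the mean of RT, GROUPING BY ppt and trialType.
by_ppt_type <- DT[, .(meanRT = mean(RT)), by = .(ppt, trialType)]

# 4. As in 3, but only WHERE trial is 3 or 4.
late_trials <- DT[trial == 3 | trial == 4,
                  .(meanRT = mean(RT)),
                  by = .(ppt, trialType)]

print(overall)
print(by_type)
print(by_ppt_type)
print(late_trials)
```

The condition in the fourth query could equally be written as trial %in% c(3, 4), which scales better when more trial numbers are involved.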
Adding Columns • data.table also offers more convenient syntax for adding columns • If you run: • You add a newColumn column with a value of 1. You can combine this with WHERE and GROUP BY commands. If you run: • You get:
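Adding columns uses data.table's := operator. The sketch below first adds the newColumn column mentioned above, then shows one possible combination with WHERE and GROUP BY. The second command is an illustration, not the original slide code, and the column name lateMeanRT and all data values are hypothetical. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical example data (values invented for illustration).
DT <- data.table(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("present", "absent"), times = 4),
  RT        = c(520, 610, 480, 700, 560, 640, 500, 720)
)

# Add a column called newColumn with a value of 1 on every row.
# := modifies DT in place, so no reassignment is needed.
DT[, newColumn := 1]

# Combine with WHERE and GROUP BY: for trials 3 and 4 only, store each
# participant's mean RT. Rows that fail the WHERE condition get NA.
DT[trial > 2, lateMeanRT := mean(RT), by = ppt]

print(DT)
```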
Joins and Merges • Suppose we forgot to include information relating to which condition each participant was in. How do we get that in there? • We can use a join! • A join in data science is a special type of operation that combines two datasets • To do this, create a new data.table, listing the participant id and the condition and follow the steps in the next slide • Joins (or merges) hunt down identical column names and then join the data from one table with that from another
Performing the Join • Create new data.table containing condition information and set the keys • To perform a join, it’s one simple command
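A keyed join along these lines might look as follows. The table names DT, cDT and joinedDT follow the slides; the condition labels and data values are hypothetical, and DT[cDT] is one plausible form of the "one simple command" rather than a confirmed copy of the original code. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical trial data, keyed by participant id.
DT <- data.table(
  ppt       = rep(c("p1", "p2"), each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("present", "absent"), times = 4),
  RT        = c(520, 610, 480, 700, 560, 640, 500, 720)
)
setkey(DT, ppt)

# New data.table listing each participant's condition (labels are
# hypothetical), keyed by the same column.
cDT <- data.table(
  ppt       = c("p1", "p2"),
  condition = c("control", "experimental")
)
setkey(cDT, ppt)

# Keyed join: for each row of cDT, look up the matching rows of DT.
joinedDT <- DT[cDT]

print(joinedDT)
```

An equivalent result can be obtained with merge(DT, cDT, by = "ppt"), which names the join column explicitly instead of relying on the keys.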
We then have our joined-up data [Figure: three tables: the original data (DT), the condition information (cDT) and the joined result (joinedDT)]
Other Types of Join • We’ve just done our first join! • Note that we’ve just joined one column with one other column, but there is no theoretical limit to how many columns you can join by at once • There are many types of join, which you may want to use (e.g., left, right, natural, outer, full, Cartesian product, etc.) • The main point is making sure that the column names match in the tables you are trying to join, or else things will go horribly wrong
Analysing Data Worked Example
Worked Example: Mean Fixation Durations (global) • Let’s begin by taking data from a fixation report • We’ll analyse it, compute mean fixation durations (global), run an ANOVA, and then plot a graph • The data and scripts required are on the website but let’s walk through it together first
Computing Mean Fixation Durations (global) Example from a fixation report • First we compute the by-trial, by-participant means: • This gives us the mean fixation duration for each participant and each trial • Then we take the mean of these to get means by participant:
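The two-step aggregation described above can be sketched with data.table as follows. The table below is a hypothetical stand-in for a fixation report: the column names match those used in these slides (TRIAL_TYPE is assumed to be a trial variable included in the report), but all values, and the names fix, by_trial, by_ppt and meanFixDur, are invented for illustration. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical fixation-report extract: 2 participants x 4 trials x
# 2 fixations per trial. All durations are invented.
fix <- data.table(
  RECORDING_SESSION_LABEL = rep(c("p1", "p2"), each = 8),
  TRIAL_INDEX             = rep(rep(1:4, each = 2), times = 2),
  TRIAL_TYPE              = rep(rep(c("present", "absent"), each = 2), times = 4),
  CURRENT_FIX_DURATION    = c(100, 200, 150, 250, 200, 300, 250, 350,
                              120, 180, 210, 230, 160, 200, 240, 280)
)

# Step 1: by-trial, by-participant means.
by_trial <- fix[, .(meanFixDur = mean(CURRENT_FIX_DURATION)),
                by = .(RECORDING_SESSION_LABEL, TRIAL_TYPE, TRIAL_INDEX)]

# Step 2: mean of the trial means, per participant and trial type.
by_ppt <- by_trial[, .(meanFixDur = mean(meanFixDur)),
                   by = .(RECORDING_SESSION_LABEL, TRIAL_TYPE)]

print(by_trial)
print(by_ppt)
```

Averaging the trial means, rather than pooling all fixations, gives each trial equal weight regardless of how many fixations it contains.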
Computing Mean Fixation Durations (global) Example from a fixation report • This is what we now have: • Each participant (RECORDING_SESSION_LABEL) grouped by TRIAL_TYPE with a DV (mean fixation duration) • What next?