Interface Variations

References
• T. Wandmacher, J.Y. Antoine, F. Poirier, and J.P. Departe. 2008. Sibylle, an assistive communication system adapting to the context and its user. ACM Transactions on Accessible Computing (TACCESS), 1(1):6:1–30.
• B. Blankertz, M. Krauledat, G. Dornhege, J. Williamson, R. Murray-Smith, and K.-R. Müller. 2007. A note on brain actuated spelling with the Berlin Brain-Computer Interface. In C. Stephanidis, editor, Universal Access in HCI, Part II, HCII 2007, volume 4555 of LNCS, pages 759–768. Springer, Berlin Heidelberg.
• S. Harada, J.A. Landay, J. Malkin, X. Li, and J.A. Bilmes. 2008. The Vocal Joystick: Evaluation of voice-based cursor control techniques for assistive technology. Disability and Rehabilitation: Assistive Technology, 3(1–2):22–34.
Main topics of papers
• Main problem: traditional computer input methods are not necessarily effective for, or even usable by, users of AAC devices
• Wandmacher (2008): a highly configurable text-input system; language models; language-specific difficulties; preview of a clinical study
• Blankertz (2007): brain-computer interface (BCI); asymmetric bandwidth; a spelling system usable with extremely low bandwidth and minimal cognitive overhead
• Harada (2008): a replacement for the mouse (not the keyboard); continuous control; assessment of the efficiency of this input mode and comparison to other input devices; novice performance; comparison to other voice-based cursor control methods
Wandmacher (2008)
• Seems to be a well-established alternative input system (in use since 2001)
• 4 standard components:
  • Input device
  • Virtual keyboards
  • Text editor
    • Can actually be used with any Windows application
  • Text-to-speech system
• Designed for a single-switch interface
• The new version incorporates a sophisticated word-prediction model
User Interface
• Input depends on the user's motor control
  • Can be any single-switch device, combined with a finger guide or a grid keypad
  • Optional 3-way (short, long, very long) durational contrast
• Has multiple customizable keyboards, accessible with a jump switch:
  • Letters (can be dynamically reorganized)
  • Numbers and punctuation (miscellaneous)
  • Predicted words
  • Navigation
• Single-item scanning if the user can't control a mouse
• Configurability:
  • Fonts, font size, colors
  • Layout of keypads
  • Scanning mode, timing
  • Dynamic keyboard reorganization
Single-item scanning
• Chosen over row/column scanning because it is less tiring
• A timing line provides visual feedback indicating how much time remains until the next cell is selected
• Dynamic keyboard reorganization makes this efficient
  • When tested on 50k words, the needed letter takes 3 shifts on average in this mode, versus 9 shifts in row/column mode
• Keyboard reordering is based on a 5-gram letter model including spaces (a sketch of the idea follows below)
  • Trained for French, English and German
  • 4-gram backoff
• Keyboard reordering doesn't bother users in single-scan mode because they are only looking at one item at a time
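A minimal Python sketch of the dynamic-reordering idea described above, assuming a simple character n-gram counter with backoff; the toy corpus, alphabet and function names are illustrative and not Sibylle's actual implementation:

    # Illustrative only: rank a scanning keyboard so the most likely next letters come first.
    from collections import Counter, defaultdict

    ALPHABET = list("abcdefghijklmnopqrstuvwxyz ")

    def train_letter_models(corpus, max_order=5):
        """Next-letter counters for every context length 0..max_order-1 (i.e. a 5-gram model)."""
        models = [defaultdict(Counter) for _ in range(max_order)]
        for i, nxt in enumerate(corpus):
            for k in range(max_order):               # k = number of context letters
                if i >= k:
                    models[k][corpus[i - k:i]][nxt] += 1
        return models

    def reorder_keyboard(models, context):
        """Back off from the longest matching context to shorter ones, then sort the letters."""
        for k in range(len(models) - 1, -1, -1):
            counts = models[k].get(context[-k:] if k else "")
            if counts:
                return sorted(ALPHABET, key=lambda ch: -counts[ch])
        return ALPHABET                              # nothing seen yet: fixed fallback order

    models = train_letter_models("the cat sat on the mat and the rat ran ")
    print(reorder_keyboard(models, "th")[:5])        # letters scanned first after typing "th"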
SibyWord
• Word prediction system
• 4-gram language model trained on newspaper corpora in French, English and German (44M, 49M and 37M words respectively)
  • Predictor performance may decrease by up to 30% when used with a different genre (Trnka & McCoy 2007)
• Built with the SRI toolkit, with a vocabulary of ~140k words
  • Smoothed with modified Kneser-Ney discounting; model size reduced with Stolcke pruning
• The prediction list is filtered based on the letters entered so far (see the sketch below)
  • With 'Most children' entered, the predicted words are: in, and, are, who, with
  • When 'l' is then entered, the list becomes: learning, like, live, learn, love
• Optional filtering that removes predicted words that were shown but not chosen from later lists
  • Whether to enable it depends on the cognitive ability of the user
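A toy Python sketch of the prefix-filtering behaviour described above; the probabilities and the toy_lm structure are invented for illustration (Sibylle's real model is a pruned 4-gram built with the SRI toolkit), and the already_shown argument mirrors the optional "filter seen words" behaviour:

    # Illustrative only: filter a ranked prediction list by the letters typed so far.
    def predict(lm, context, prefix="", already_shown=(), k=5):
        """Return the k most probable words following `context` that start with `prefix`."""
        ranked = sorted(lm.get(context, {}).items(), key=lambda kv: -kv[1])
        return [w for w, _ in ranked
                if w.startswith(prefix) and w not in already_shown][:k]

    toy_lm = {("most", "children"): {"in": .10, "and": .09, "are": .08, "who": .07,
                                     "with": .06, "learning": .04, "like": .03,
                                     "live": .03, "learn": .02, "love": .02}}
    print(predict(toy_lm, ("most", "children")))       # -> in, and, are, who, with
    print(predict(toy_lm, ("most", "children"), "l"))  # -> learning, like, live, learn, love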
Language Model Adaptation
• Users talk about different things in different ways
• User adaptation
  • 4-gram user language model
  • Linear interpolation of the user model with the base language model (see the sketch below)
  • Dynamically adapted interpolation parameters
• Long-term adaptation
  • If this is enabled, either all text that the user has entered is used, or the user grants permission after each session
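A minimal sketch of the interpolation step, assuming the two models are exposed as probability functions; the fixed lambda_u below is illustrative (Sibylle adapts this weight dynamically):

    # Illustrative only: combine the user model with the base 4-gram model.
    def interpolated_prob(word, context, p_base, p_user, lambda_u=0.2):
        """P(w | h) = lambda_u * P_user(w | h) + (1 - lambda_u) * P_base(w | h)"""
        return lambda_u * p_user(word, context) + (1 - lambda_u) * p_base(word, context)

    # Toy usage with constant stand-in models:
    p = interpolated_prob("sibylle", ("i", "use"),
                          p_base=lambda w, h: 0.0001,   # rare in newspaper text
                          p_user=lambda w, h: 0.02)     # frequent in this user's own text
    print(p)                                            # -> 0.00408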
Semantic Adaptation
• Cache models (recently used words are more likely) have a slight but consistent effect (sketch below)
• Topic detection is extremely difficult
  • Topic-labeled corpora are not available in many languages, so this is not a particularly promising approach at the moment
• Latent semantic analysis (LSA)
  • Bag-of-words model
  • Assumes that the text is semantically cohesive
  • Predicts related words, but not always ones that are syntactically appropriate
• Trigger model
  • Having seen "foul", words such as "referee", "penalty" and "odor" become more likely than other words
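A small sketch of the cache idea: recently used words get a boost that is interpolated with the n-gram probability. The window size and weight are illustrative values, not the paper's:

    # Illustrative only: a recency cache interpolated with the n-gram probability.
    from collections import deque

    class CacheModel:
        def __init__(self, size=200, weight=0.05):
            self.recent = deque(maxlen=size)    # sliding window of recently entered words
            self.weight = weight

        def observe(self, word):
            self.recent.append(word)

        def prob(self, word):
            return self.recent.count(word) / max(len(self.recent), 1)

        def rescore(self, word, p_ngram):
            return (1 - self.weight) * p_ngram + self.weight * self.prob(word)

    cache = CacheModel()
    for w in "the referee gave a penalty to the home team".split():
        cache.observe(w)
    print(cache.rescore("penalty", p_ngram=0.001))   # boosted because it was recently used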
Language-Specific Difficulties
• German compounding is productive, leading to an effectively unbounded number of compound words; most compounds are out of vocabulary (OOV)
• Predicting the parts of compound words (head and modifier) with a statistical model has been ineffective (Baroni et al. 2002)
• Sibylle predicts words as usual
  • To compound, the user enters backspace after selecting a word
  • A joint (linking) morpheme can then be entered, followed by another word, to form the compound
  • Ex: hund.e.nase (dog.JOINT.nose) 'dog nose'
• A similar approach could work well for other languages with long words and rich morphology
Evaluation
• Keystroke savings rate: KSR = (1 - keys needed with prediction / keys needed without) × 100 (worked example below)
• A longer prediction list increases KSR but also increases cognitive load
  • Little benefit to extending the list beyond 3-7 words, so a 5-word list was chosen
  • Cognitive load seems reasonable, and 5 words should maximize the benefit
• Without filtering out seen words, KSR = 56.9% within genre in French
• Automated assessment of each component
  • Tested with 4 genres in each language: news, literature, transcribed speech, email
• Dynamic user model
  • Expected improvement of 2% after 2k words entered, 5-6% after 20k
• Semantic analysis
  • LSA works better than the cache model
  • Small but consistent gains across genres
  • Also serves as a thesaurus for users, providing an unquantifiable benefit
• Compound word treatment
  • Doesn't always help; the evaluation applied this strategy only where it did
  • Small, stable gains
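The KSR formula from the first bullet as a tiny worked example; the keystroke counts below are invented to show the arithmetic, not figures from the paper:

    # Illustrative only: keystroke savings rate.
    def ksr(keys_with_prediction, keys_without_prediction):
        """KSR = (1 - keys_with / keys_without) * 100"""
        return (1 - keys_with_prediction / keys_without_prediction) * 100

    print(ksr(431, 1000))   # -> 56.9, i.e. a 56.9% keystroke saving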
Evaluation (cont'd)
• The gains are nearly additive
  • All strategies can be used at the same time for maximum gain
  • No combination of strategies appears to hurt performance
• With all strategies combined, KSR improves least for the genres most different from the training data
  • Speech and email are the most different
  • KSR still improves by 1.6-9.6 percentage points above the baseline, reaching 51.8% (German literature) to 59.4% (French news)
Field Study
• Sibylle has been used for 7 years at a rehabilitation center in Kerpape (France)
  • If a patient needs an AAC system, they work with the staff to configure one
• Anecdotal evidence so far, but plans for a rigorous study
• Sibylle has been used with >20 patients, of whom only 2 have had problems with it
  • These 2 have visual impairments
• Users were able to add functionality as they mastered the simpler features
  • In contrast to expert-type devices, which have a steep learning curve
• Decrease in orthographic and grammatical errors
• The predicted-word list may impose too much cognitive overhead
  • Considering in-line word prediction instead
  • Or including predicted words in the letter keypad, which would lower keystrokes
  • Many users have cognitive impairments in addition to motor ones
Blankertz (2007)
• A brain-computer interface (BCI) may be the only option for people with severe impairments
• BCIs can be invasive or non-invasive
• Need to extract as much information as possible from the limited signal
• A language model allows the information provided by the user to be used more effectively
• Describes Hex-o-Spell
Previous Approaches
• Binary search: 0.5 char/min
• Other 'assembly line' type approaches: 1.6-2.3 c/m
• Flashing stimuli on the screen, then checking the evoked brain potential: 6 c/m, up to 15 c/m
• Difficulties:
  • Bandwidth imbalance
    • Lots of information in the display, only a tiny amount coming from the brain
  • Error prone
    • Low SNR
  • Difficulties with timing
    • Stimulus-to-signal delay >= 750 ms, with large variability
  • Some people simply can't use BCIs, for unknown reasons
The Berlin BCI
• EEG-based; uses machine learning to adapt to different users
• Detects signals from the user imagining moving a particular hand or foot
• Information rate: 6-40 bits/min
• Uses Hex-o-Spell, which incorporates a language model
Hex-o-Spell
• Originally designed for mobile devices with an accelerometer
• (video)
• Imagined right-hand movement rotates an arrow clockwise
• Imagined right-foot movement extends the arrow, selecting the hexagon it points at
• The letters in the selected hexagon then replace the letter groups, and the process repeats
• Turning/growing speed can be altered
• Interaction is chunked, so the user can pause
  • Advantage over systems like Dasher
  • If you added a pause button to Dasher…
• Uses a partial predictive match language model
  • Combined trigram and 'k-gram'
  • Only used to determine the arrangement of letters within a hexagon
    • Rearranging earlier (i.e. the hexagons themselves) would cause cognitive overload
  • Getting this arrangement right means the user rotates, extends, extends (otherwise: rotate, extend, rotate, extend); see the sketch below
• Could be extended to use a T9-like interface
  • Would likely be faster, but it's not clear whether BCI users would want this
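A Python sketch of why the within-hexagon arrangement matters: it counts the rotate/extend commands needed to pick a letter in the two-stage selection. The symbol grouping and probabilities are invented; only the "LM orders letters within the selected hexagon" idea is taken from the slide above:

    # Illustrative only: counting rotate/extend commands in two-stage Hex-o-Spell.
    # Groups are fixed; the language model only orders letters *within* a hexagon.
    import string

    GROUPS = [list(string.ascii_lowercase[i:i + 5]) for i in range(0, 25, 5)]
    GROUPS.append(["z", " ", ".", ",", "<"])      # 6th hexagon: remaining symbols + backspace

    def commands_to_select(target, probs):
        g = next(i for i, grp in enumerate(GROUPS) if target in grp)
        stage1 = g + 1                            # g rotations + one extend to pick the hexagon
        within = sorted(GROUPS[g], key=lambda ch: -probs.get(ch, 0.0))
        stage2 = within.index(target) + 1         # rotations to the letter + one extend
        return stage1 + stage2

    # If the model ranks "e" highest after the current history, no second-stage rotation
    # is needed: rotate(s) to the hexagon, extend, extend.
    probs = {"e": 0.9, "a": 0.5, "b": 0.1, "c": 0.1, "d": 0.1}
    print(commands_to_select("e", probs))         # -> 2 (target already sits in hexagon 0)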
Evaluation
• Demoed at a fair
  • Not a controlled experiment; really a real-world evaluation
  • The environment was noisy and potentially stressful
    • Low humidity dried out the EEG electrode gel
    • Psychological pressure to perform well
• Subjects had used the system only once before, for around 7 hours (they also played 'brain-pong' during that time)
• Not clear who the subjects were, but given that this is a true BCI, it shouldn't matter
• Subject A: 2.3-5 c/m; Subject B: 4.6-7 c/m
  • A wide range with only 2 subjects
Harada (2008)
• People who can't use their hands can't use mice easily
  • Therefore they can't provide continuous input to a computer for drawing, etc.
• Imperfect alternatives
  • Keyboard: slower, less pleasant to use (e.g. for drawing)
    • If you can't use a mouse, it is unlikely that you could use a keyboard very well
  • Eye/head trackers or mouth joysticks
    • Require special hardware, which is expensive and of limited effectiveness
• The Vocal Joystick (VJ) attempts to address some of these
  • Low cost: uses a microphone
  • Continuous signal with real-time processing
• 4 questions:
  • Can this way of controlling the cursor be modeled by Fitts' law?
  • How does expert control of the VJ compare to the mouse?
  • What is novice performance like?
  • How does it compare to other voice-based controllers?
• Can be used with any Windows/Linux application
• Has 4- and 8-direction modes (novice and expert)
• Transforms audio frames into cursor movement based on pitch, power and vowel quality; /k/ to click (see the control-loop sketch below)
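A rough Python sketch of the per-frame control loop implied above: vowel quality picks a direction and loudness scales speed. The 4-direction vowel-to-angle table, the gain and the numbers are illustrative assumptions, not the paper's actual mapping or signal processing:

    # Illustrative only: map one acoustic frame (vowel class + loudness) to cursor motion.
    import math

    VOWEL_ANGLE_4DIR = {"i": 90, "a": 270, "u": 180, "ae": 0}   # up, down, left, right (degrees)

    def frame_to_velocity(vowel, loudness, gain=40.0):
        """Return a (dx, dy) pixel update for one frame; louder voicing moves faster."""
        angle = math.radians(VOWEL_ANGLE_4DIR[vowel])
        speed = gain * loudness
        return speed * math.cos(angle), -speed * math.sin(angle)   # screen y grows downward

    x, y = 400.0, 300.0
    for vowel, loud in [("i", 0.3), ("i", 0.6), ("ae", 0.5)]:      # a short voiced gesture
        dx, dy = frame_to_velocity(vowel, loud)
        x, y = x + dx, y + dy
    print(round(x), round(y))                                      # -> 420 264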
Mouse Alternatives
• Mouth joystick
  • Click with sip/puff
  • Fatigue and keeping the joystick in the mouth are problems
• Eye tracker
  • Only need to move the eyes
  • May click unintentionally
  • Can be difficult to pause
• Head tracker with reflective dot
  • Dwell/puff/sip to click
• All of the above require expensive hardware
• Single switch
  • Doesn't provide continuous control
  • Cheap
Other Voice Devices
• Igarashi et al.
  • Commands followed by a vocal-joystick-like interface
  • Ex. map scrolling
• Sporka et al.
  • Humming and whistling, using pitch and pitch gestures to control the cursor
  • Not well tested
  • Only 4 directions
  • Gestures are discrete, so control is as well
• Migratory cursor
  • Grid with vocal commands for precision
• SUITEKeys
  • Constant-velocity cursor with 'start' and 'stop' commands
• Dai et al.
  • Recursive grid
  • Good for placement, but doesn't provide continuous control
• Dragon
  • Commands for cursor direction, speed
  • Jerky
• Voice mouse
  • Vowel to start and stop; auto-acceleration; 4 sec to start moving; 2V click
Fitts' Law
• Models the human motor system as a channel with finite bandwidth used to transmit information in performing a movement task of a given index of difficulty (ID)
• Movement time: MT = a + b · ID, where a and b are regression coefficients
• ID = log2(2 × distance to target / target width + 1)
• Allows:
  • Prediction of performance in future tasks
  • (Indirect) comparison to other devices
• Experiment
  • 4 unimpaired users had to hit a rectangular target with the cursor and click
• Results (see the fitting sketch below)
  • ID significantly predicted task time and explained much of the variance
  • Mouse: 5.48 bps; VJ: 1.65 bps
  • VJ is less efficient than the mouse or head/eye trackers
  • About as efficient as a joystick
  • VJ still has advantages (e.g. cost), and there is room for improvement
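A small sketch of this kind of analysis: fit MT = a + b · ID by least squares over a set of trials and report throughput as 1/b in bits/s (one common convention). The trial data are invented for illustration; the paper's measured values are 5.48 bps for the mouse and 1.65 bps for the VJ:

    # Illustrative only: least-squares fit of MT = a + b*ID with ID = log2(2D/W + 1).
    import math

    trials = [  # (distance to target in px, target width in px, movement time in s)
        (200, 40, 1.8), (400, 40, 2.2), (400, 20, 2.6), (800, 20, 3.1), (800, 10, 3.5),
    ]
    ids = [math.log2(2 * d / w + 1) for d, w, _ in trials]
    mts = [t for _, _, t in trials]

    n = len(trials)
    mean_id, mean_mt = sum(ids) / n, sum(mts) / n
    b = (sum((i - mean_id) * (t - mean_mt) for i, t in zip(ids, mts))
         / sum((i - mean_id) ** 2 for i in ids))
    a = mean_mt - b * mean_id
    print(f"a = {a:.2f} s, b = {b:.2f} s/bit, throughput = {1 / b:.2f} bits/s")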
Comparison to Other Voice Devices
• Only compared to Mouse Grid (MG) and Dragon
• Both MG and VJ are faster than Dragon in the Fitts' task
• People preferred MG, but the preference difference between MG and VJ was not statistically significant
• People noted that Dragon's commands were easier to remember, but it was more frustrating to use
• Possibility of combining MG and VJ for improved discrete control while retaining continuous control
• Unknown learning curve, but the authors note that novice users were able to learn to use VJ about as well as MG within a short time