simon listens Non profit organization for research and training www.simon-listens.org
Development of simon listens 2005: Identifying the problem 2006/07: Conception and basic programming of the open source software simon by Grasch Peter and a group of students of the HTBLA 2007: Foundation of "simon listens" 2008/09: Programming of the first stable prototype, financed by the BMfVIT
Research projects 2010: year of professionalism FFG – BENEFIT simon – verbal control of ICT applications for elderly people ASTROMOBILE EU - PIC 987058033
simon – scientific network HTBLA – Higher Technical School Kaindorf Graz University of Technology – Signal Processing & Speech Communication Laboratory – Prof. Kubin Graz University of Technology – Institute for Software Technology (Robocup) – Prof. Wotawa University of Graz – Austrian German Research Center – Prof. Muhr Installation via workshops Individual commissions to the simon listens association or to the company Cyber-Byte EDV Services
Simon expertise • Reasons for the fascination of simon listens • Open Source character • Synthesis of AI technologies • Definition of concrete use cases and promoter of research themes • Professional conceptual work • Interdisciplinarity • Preference for pedagogical and social solutions
Simon listens products: Simon • Open source speech recognition system • Based on Julius, HTK • KDE4 / C++ • “Use case packages” for download: vocabulary, grammar, commands, training texts • Acoustic model: base models (static, adapted, user generated)
Simon listens products: Simon • Who benefits from simon? • Sensorimotor disabled elderly persons • Physically disabled people of every age • Quadriplegic people after an accident • Minimum requirement • Conscious articulation of words or phoneme constellations • Conscious determination of numbers from 0 to 9 • No computer literacy necessary
Simon listens products: SSC SSC: simon sample collector. ssc is a tool for large-scale sample acquisition. Using ssc, multiple teams can gather training data from potential end users or professional speakers and collect it on the central sscd server.
Simon listens products: SAM SAM: Simon Acoustic Modeller SAM is a tool to create and test acoustic models. It can compile new speech models, use models created by simon and produce models that can be used by simon later on.
Use case: Basic Autonomy Speech control of a media center It is possible to listen to music, watch a slide show, TV or videos, or listen to the radio using just a few freely selectable words such as “right”, “left”, “up”, “down”, “ok”, “stop”, etc.
Use case: Basic Autonomy Speech control of the firefox browser Daily reading of newspapers and surfing the internet is easy and uncomplicated. The number Plug-In allows you to click links just by entering numbers.
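The number-based link selection described on this slide can be illustrated with a minimal sketch. It assumes that the links on a page are numbered in page order and that the recognizer delivers English digit words. All function names and the word list are hypothetical and are not taken from simon or its Firefox plug-in.

```python
# Hypothetical sketch; none of these names come from simon itself.
DIGIT_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def number_links(links):
    """Assign the numbers 1..n to the links in page order."""
    return {index: url for index, url in enumerate(links, start=1)}


def spoken_digits_to_number(words):
    """Turn recognized digit words such as ["one", "two"] into 12."""
    if not words:
        raise ValueError("no digits recognized")
    number = 0
    for word in words:
        if word not in DIGIT_WORDS:
            raise ValueError(f"not a digit word: {word!r}")
        number = number * 10 + DIGIT_WORDS[word]
    return number


def select_link(links, words):
    """Return the URL chosen by the spoken digits, or None if out of range."""
    return number_links(links).get(spoken_digits_to_number(words))
```

Composing larger numbers from single digits keeps the required vocabulary at ten words, which fits the minimum requirement stated earlier that users can say the numbers from 0 to 9.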
Use case: Basic Autonomy Speech control of email clients You can write to predetermined e-mail addresses using numbers, and you can ask basic questions using expandable text modules.
Use case: Basic Autonomy Speech control of Skype It is easy to establish connections to relatives and friends using Skype or other Voice-over-IP solutions.
Use case: Basic Autonomy Desktop Grid Move the mouse easily and quickly by voice and perform simple clicks, double clicks and similar actions. Calculator With the voice controlled calculator you can carry out the arithmetic operations of your daily routine and print either the result or the operation. Keyboard The voice controlled keyboard makes it easy to enter code words, TAN numbers, etc.
Research Projects • FFG – BENEFIT simon – verbal control of ICT applications for elderly people • http://www.youtube.com/watch?v=35tyZntA9j4
Analysis of dysfunctions Movement disorders are dominant Visual and auditory disorders are intermediate Speech ability remains longest
Current Projects ibi – I’m informed Development of dialogues: integration of moving and speaking avatars (persons, comics!)
Current Projects App112 – Security Connections using Keyword spotting
Planned Projects • Voice control via dialogues for clinical rooms (beds, TV, light, etc.) • Voice control via dialogues for home automation • Smartphone apps for Android, Windows Mobile and iOS • Specific Austrian speech model for elderly people • Voice control via dialogues of set-top boxes and television sets
ASTROMOBILE: simon tasks • To fulfil our task within the ASTROMOBILE project, we had to work on sub-tasks with differing scientific requirements, not only on technical issues: • Programming • Development of scenarios • Development of dialogues • Speech modelling • Signal processing
Programming the D-Bus Interface • The current draft identifies seven dedicated components: • Navigator: Provides high-level navigation including obstacle avoidance and path planning • Locator: Locates the robot and the person using the sensor network • Sensors: Integration of Boolean sensors (bed sensor, smoke sensor, etc.) • Speech Recognition: Command and control system utilizing simon • Text-To-Speech: Synthesizes a given text in German, Italian and English • AstroLogic: Logic layer
Scenarios: User - Robot • General offers when the robot stands in front of the user after being called: • Weather information • News based on RSS feeds, read aloud via speech synthesis • Multimedia offers like: • Photos • Music • Videos • Communication offers like Skype calls, phone calls, SMS, mail • Organization offers: scheduler • Calculator • Keyboard
Scenarios: User - Robot • Control functions in the natural environment, ordered by the user, with configured feedback from the robot, which records a 10-second video and presents it to the user when it comes back, for example: • Control of the water in the bathroom • Control of the doors in the environment • Control of the cooker • Control of the gas and other critical functions • Request functions: With the help of the simon touch platform the user should be able to initiate requests like • Request of new medicine • Request of food • Request of acute help • Request of general help by the caregiver • Request of caregiver transport to the doctor or other events • Pre-established SMS service with the list plug-in
Scenarios: Robot - User • Reminder functions with request of help are prepared for situations such as • Alarm in the morning • Reminding of the hygiene and dressing in the morning • Reminding of the hygiene and facing in the evening • Reminding of taking the prescribed drugs • Reminding of periodic drinking • Reminding of eating in the morning • Reminding of eating at noon • Reminding of eating in the evening • Reminding of coffee time • Reminding of periodic Skype calls
Scenarios: Robot - User • Simple reminder functions without request of help are prepared for situations such as: • Reminding of events (based on the calendar) • Reminding of birthdays • Reminding of appointments like • Meeting with friends • Consultation with doctors • Visit of events • Personal appointments in the calendar • Dialogue actions (Skype and mailing; simple reaction: yes or no!) • Incoming Skype calls with the possibility to accept or refuse the call • Incoming mails with the possibility to allow or refuse that the robot reads the message • Incoming appointment requests with the possibility to allow or refuse the appointment
Scenarios: Caregiver – Robot - User • Control functions: • Caregivers have access to the information of the sensors in the environment • Caregivers can administrate the dialogues, appointments and reminder functions for the user on the calendar • Caregivers can activate the robot to transmit a visual impression of the user in case of emergency • Communication functions • Caregivers can call the user using the Skype dialogue • Caregivers can send an appointment request to meet with the user using the robot and the calendar • Caregivers can send information by e-mail using the mail reading dialogue
Simon Touch Architecture Like the rest of the developed solution, Simon touch uses C++, Qt4 and the KDE libraries. In particular, we are using the Akonadi PIM service, the Nepomuk / Strigi search, the KLocale framework for localization and the Phonon multimedia system.
Simon listens – Simon touch Simon touch – voice controlled touchscreen interface • Main screen • Information center with • Slideshow, Music, Video, News with speech output • Optional functions • Touchscreen keyboard • Touchscreen calculator • Touchscreen calendar
Simon listens – Simon touch • Communication center with • Skype, Phone, SMS, Mail • Control center with video recording and playback • Control of water, doors, cooker, gas • User can activate the control function • Caregiver can activate the control function from outside and take a look using the integrated video stream
Simon listens – Simon touch • Request center with direct phone calls or mail order • Shopping system, transport and support calls
The dialogue system of simon Dialogues in the Astromobile project
The dialogue system of simon The dialogue system of simon was implemented as a command plugin. It is essentially a finite-state machine. States Internally, every state consists of the current dialogue text. Every state can have several texts to give the dialogue a natural flow. Dialogue texts can use bound values and templates (see below).
The dialogue system of simon Avatar A state can be linked with an avatar (e.g. the face of a nurse, an icon, etc.)
The dialogue system of simon Options When an option is triggered (e.g. by a speech command), a state can change into another state or commands can be executed. Options have a trigger, a name and an optional icon, and they can be activated automatically a set time after the state is entered.
The dialogue system of simon Bound values Variables in the dialogue system are represented as bound values. For example, the name of the user could be represented as $name$. At run time, each variable is resolved using the list of configured bound values. There are four types of bound values: Static: binds the variable to a fixed text, e.g. the name of a patient. QtScript: the variable takes the result of the given QtScript (ECMAScript, also known as “JavaScript”), evaluated at run time. Output options A dialogue can be shown graphically on the screen or spoken through the loudspeaker by the integrated speech synthesis system (TTS).
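The substitution of bound values such as $name$ can be sketched as follows. This is an illustration under assumptions, not simon's implementation: the function name `render` and the dictionary layout are hypothetical, and a plain Python callable stands in for a script-backed value, where simon itself evaluates QtScript.

```python
import re

# Matches placeholders written as $name$.
PLACEHOLDER = re.compile(r"\$(\w+)\$")


def render(template, bound_values):
    """Replace each $name$ placeholder when the text is about to be shown.

    bound_values maps a name either to a fixed string (a "static" value) or
    to a zero-argument callable evaluated at run time. The callable stands in
    for a script-backed value; it is plain Python here, not QtScript.
    Unknown placeholders are left untouched.
    """

    def substitute(match):
        name = match.group(1)
        if name not in bound_values:
            return match.group(0)
        value = bound_values[name]
        return str(value() if callable(value) else value)

    return PLACEHOLDER.sub(substitute, template)
```

For example, `render("Hello $name$", {"name": "Anna"})` yields `"Hello Anna"`. Because callables are evaluated on every call, a time-dependent value is recomputed each time the dialogue text is displayed.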
The dialogue system of simon • Implementation in simon • The dialogue states can be derived from the schematic diagram above. This results in: • Three states (reminder, already taken, not taken) • One time-driven trigger (which starts the dialogue at a specific time) • Two speech transitions (“Yes”, “No”) • One time-driven transition (renewed reminder after a time lapse)
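The reminder dialogue listed above (three states, one time-driven trigger, two speech transitions, one time-driven transition) can be sketched as a small state machine. This is a hedged illustration, not simon's plugin code. The class names, the dialogue texts, the 600-second default and the choice to attach the timed transition to the "not taken" state are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class State:
    text: str
    # Spoken trigger -> name of the next state.
    options: dict = field(default_factory=dict)
    # Optional timed transition: (seconds after entering, next state).
    timeout: Optional[Tuple[float, str]] = None


class Dialogue:
    def __init__(self, states, initial):
        self.states = states
        self.initial = initial
        self.current = None  # None means the dialogue is not running.
        self.elapsed = 0.0

    def start(self):
        """Called by the time-driven trigger, e.g. a scheduler."""
        self._enter(self.initial)

    def hear(self, word):
        """Follow a speech transition if the word is an option here."""
        if self.current is None:
            return
        target = self.states[self.current].options.get(word)
        if target is not None:
            self._enter(target)

    def tick(self, seconds):
        """Advance time and follow the timed transition once it is due."""
        if self.current is None:
            return
        self.elapsed += seconds
        timeout = self.states[self.current].timeout
        if timeout is not None and self.elapsed >= timeout[0]:
            self._enter(timeout[1])

    def _enter(self, name):
        self.current = name
        self.elapsed = 0.0


def build_reminder_dialogue(retry_after=600):
    """Assumed wiring: 'No' leads to a state that re-asks after a delay."""
    states = {
        "reminder": State("Have you taken your medication?",
                          {"Yes": "taken", "No": "not_taken"}),
        "taken": State("Good."),
        "not_taken": State("Please take it now. I will ask again later.",
                           timeout=(retry_after, "reminder")),
    }
    return Dialogue(states, "reminder")
```

A scheduler would call `start()` at the configured time, the recognizer would feed recognized words into `hear()`, and a periodic timer would call `tick()`. Words that are not options of the current state are ignored, so unrelated speech does not move the dialogue.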
The dialogue system of simon • Schedule-based appointments or dialogue actions
Simon listens – simon speech modelling Speech models in English, German and Italian • English: Adaptation of the open source speech model of Voxforge • German: Adaptation of a self-produced speech model of elderly people • Italian: • Recording speech data of 46 persons from the region of Pontedera • Modelling of a specific Italian speech model for elderly people
Simon listens – signal processing Current solution Calling Astro with a Nokia N9 MeeGo mobile phone from everywhere in the natural environment (a MeeGo client was developed within this project) Controlling Astro with a mounted gooseneck microphone in front of the robot
Simon listens – signal processing Better solution – theoretical conception Calling Astro via installed microphones from everywhere in the natural environment Using different zones of communication like • Call zone • Communication zone • Comfort zone Controlling Astro with a mounted gooseneck microphone in front of the robot
Simon listens – signal processing Natural speech communication at a distance between humans and machines A very good solution would need: • A combination of speech recognition, tools of artificial intelligence and speech synthesis • A combination of sound direction identification, localization of the user, voice identification, face detection, etc. • Very good sound segmentation • Different zones of communication like call, communication and comfort zone with special speech models and intelligent microphone management The development of this solution would require a very large multidisciplinary project
Synthesis of AI approaches Multisensory natural speech communication between humans and robots in Ambient Assisted Living scenarios would need a high level of interdisciplinarity and