Combadge: A Voice Messaging Device for the Masses. Berkeley UNIDO Conference, Information & Communications Technology (ICT) Workshop, April 23, 2005. James L. Frankel, Mitsubishi Electric Research Laboratories, Cambridge, Massachusetts.
Combadge: A speech-enabled communications device • Functionality: Two-way voice messaging with simple spoken commands and a one-button interface. • Platform: Basis for new handheld research • Goal: Bring state-of-the-art wireless communication and services to the less-wealthy in the world with a simple, low-cost device. • Advantages: Offers new services, yet is unimposing and non-intrusive, with low device and low ongoing infrastructure costs. • Contact: frankel@merl.com
Asynchronous Operation (1 of 2) • Users decide when to listen and respond • Messages are sent to and from the device when it is connected • Device can be very small • Has no display • Requires only one button • Need not reach from mouth to ear • In the future, it will be feasible to package it in a watch • Voice interface makes Combadge usable by illiterate users • Can use better compression • No need for real-time compression • Can fully utilize available spectrum (packet switched)
Asynchronous Operation (2 of 2) • Graceful degradation of service during network overload • Users less aware of dead spots in network • Functional without any connectivity • Messages are cached in the Combadge • All functions that don’t require communication are usable • Reduces peak power demand, allowing much longer battery life • Speech recognition, compression and radio not used simultaneously • Can operate radio less frequently (it's like voice IM, not a phone) • Can use Internet for cheap global connectivity (like e-mail or IP telephony) • Makes group messaging easy
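The store-and-forward behavior described on the two slides above (messages cached on the device, sent only when a connection exists) can be sketched as a small outbox. This is a minimal illustration, not the Combadge implementation: the `VoiceMessage` and `Outbox` names, their fields, and the `send` callback are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass
class VoiceMessage:
    recipient: str   # spoken-name tag of the addressee (hypothetical field)
    audio: bytes     # compressed audio payload


class Outbox:
    """Hypothetical store-and-forward queue; not the actual Combadge code."""

    def __init__(self, send: Callable[[VoiceMessage], bool]) -> None:
        self._send = send                         # returns True on successful delivery
        self._pending: Deque[VoiceMessage] = deque()

    def record(self, message: VoiceMessage) -> None:
        # Recording never needs the network: the message is cached locally.
        self._pending.append(message)

    def flush(self) -> int:
        """Try to deliver queued messages in order; stop at the first failure."""
        delivered = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break                             # link unavailable: keep the rest cached
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self) -> int:
        return len(self._pending)
```

Calling `flush()` while the link is down delivers nothing and loses nothing. Calling it again once a connection is available drains the queue in recording order, which is why the user need not be aware of dead spots.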
Simple • Single button, push-to-talk: no keypad, no display • Reduced manufacturing cost and reduced power use • Simple interface using speech, e.g.: • “New message for Peter” • “Play New”, “Reply” • Talk immediately: no waiting for a dial tone, for someone to answer, or for a menu • After adding another Combadge to the phonebook, there are no phone numbers to memorize • Everyone is identified by spoken name (or nickname) • For children, restrictions applied on adding new Combadges • Optionally, no messages from people you don’t know
Customer Base • Appeal to new users: • The less-privileged and less-educated in the world (including developing countries) • Designed for illiterate users • Lower-cost device • Lower-cost service • The cost-conscious, such as youth (ages 8-14) and the elderly • Those irritated or intimidated by cell phones • Use cellular networks, but create a low-bandwidth, low-cost service • Use 802.11a/b/g for campus or village/town/city connectivity • Can use DakNet-like network for transport
Interaction with Services and Other Devices • Open-ended opportunity to create new services, providing simple spoken interfaces to the entire digital universe • “Weather for Boston” • “Market price for rice” • “Calendar: Am I free Friday afternoon?” • “Traffic on the Mass. Pike” • Voice control of devices • “House: Turn garage lights on” • “HVAC: Set living room temperature to 20 degrees Celsius” • Integration with e-mail, telephones, voice mail, etc.
Hardware (Introduction) • Hardware component is code-named “Dilithium” • Back side of main board
Hardware (Introduction) • Front side of main board
Hardware (Daughterboard) • Daughterboard
Hardware (Case Components) • Some Case Components
Hardware (In Case) • Dilithium in Case
Hardware (1 of 4) • Processor is Intel StrongARM running at 206 MHz • Moving to Intel XScale at 400 to 624 MHz and faster • Memory • SDRAM: 64 Mbytes; Flash: 64 Mbytes • Integrated GSM/GPRS Modem for Wide-area Networking • On-board SIM Socket • Optional Daughterboard Provides One or Two Compact Flash (CF) Slots • 802.11b Local Area Networking • Many Other CF Peripherals (Ethernet, CF Memory Cards, Additional I/O Ports, CF Disk Drives) • Two On-board SiSonic Silicon-MEMS Microphones • On-microphone preamp • Can perform active noise cancellation
Hardware (2 of 4) • Flexible CODEC sampling rates • 11.025, 22.05, 44.1 (CD), 8 (telephony), 16, 32, and 48 kHz • LEDs • Two banks of blue LEDs under the translucent side buttons • Two bi-color LEDs on front • One LED for bi-directional communication using LEDComm • Two-axis Accelerometer • Gesture detection • Vibrator (for silent new message indication) • JTAG Connection • USB Port • Serial Port with on-board RS232 drivers • Two Stereo 2.5mm Phone Jacks for Audio In and Audio Out
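To give a feel for what the CODEC sampling rates above mean for storage and transfer, the following sketch computes the size of an uncompressed recording at each listed rate. The slides do not state the sample width or channel count, so 16-bit mono linear PCM is an assumption here, and `pcm_bytes` is a hypothetical helper.

```python
# Sampling rates listed on the slide, in Hz.
SAMPLE_RATES_HZ = [8_000, 11_025, 16_000, 22_050, 32_000, 44_100, 48_000]

BYTES_PER_SAMPLE = 2   # assumption: 16-bit linear PCM
CHANNELS = 1           # assumption: mono voice recording


def pcm_bytes(sample_rate_hz: int, seconds: float) -> int:
    """Size in bytes of an uncompressed PCM recording of the given length."""
    return int(sample_rate_hz * seconds * BYTES_PER_SAMPLE * CHANNELS)


if __name__ == "__main__":
    for rate in SAMPLE_RATES_HZ:
        size_kb = pcm_bytes(rate, 10) / 1000
        print(f"{rate / 1000:g} kHz: {size_kb:g} kB per 10 s")
```

Under these assumptions a 10-second message occupies 160,000 bytes at the 8 kHz telephony rate and 960,000 bytes at 48 kHz, before any compression is applied.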
Hardware (3 of 4) • Pushbuttons • Left and Right Push-to-Talk • Power On • Reset (Accessible through hole) • Real-time Clock • Dense component packing; Small overall size • Heavy use of BGA components • Processor, Four memory chips, and CPLD • Design of case • SolidWorks • SLA Master (Stereolithography) • Limited-run Rubber Molds
Hardware (4 of 4) • Hardware Revisions • Rev. 1 • Fabricated one device • This device has had a fruitful life • Still functional today • Rev. 2 • Fabricated five devices • These are the devices in the demo • Rev. 3 • Power management hardware added • Real-time clock added • Ground planes to attenuate audio noise added • Fabricated twenty-five devices to date • XScale Revision (StrongARM has been discontinued)
Software (1 of 5) • Initialization • JTAG Programming Utility • Initializes Flash memory using JTAG interface to StrongARM • Boot Loader • First Program running on StrongARM • Initializes memory and I/O devices • Provides debugging tools • Loads Operating System • Linux Operating System • We ported Linux 2.4.19 to Dilithium • Started with the Compaq “Familiar” Linux port
Software (2 of 5) • Linux Porting Issues • Our New Dilithium Architecture • New Flash memory chips • Custom Device Drivers • Accelerometer, buttons, LEDs • Combadge Voice-Messaging Application • Initial development on iPAQ PDA running Linux • Developed in Python, C, C++, and Shell Scripts • Voice Recognition • Two Recognizers (Using SDX from SpeechWorks/ScanSoft): • One for speaker-independent tokens • One for speaker-dependent name tags, such as the name given to a phonebook entry
Software (3 of 5) • Grammar used for Combadge commands • Play new messages; Play again; Play next; Play previous • New message for <name> • Reply • Create contact • Phonebook • Status all; Status ID; Status connection; Status messages; … • Profile normal; Profile meeting; Profile silent • Volume 1; Volume 9; Volume off; … • Delete contact <name>; Delete all contacts • Shutdown; Restart; Configure MERL; Configure adhoc; Configure GPRS; … • Version; Utility ping
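The command grammar above could be mapped onto a small text-level parser, as sketched below. This is only an illustration of the grammar's shape: the real device uses the SDX recognizers, and `parse_command`, `FIXED_COMMANDS`, and the other names here are hypothetical. Entries the slide elides with "…" are deliberately left out, and the assumption that volume levels run from 1 to 9 is inferred from the two examples shown.

```python
from typing import Optional, Tuple

# Fixed phrases taken from the slide; entries elided there are left out.
FIXED_COMMANDS = {
    "play new messages", "play again", "play next", "play previous",
    "reply", "create contact", "phonebook",
    "status all", "status id", "status connection", "status messages",
    "profile normal", "profile meeting", "profile silent",
    "volume off", "delete all contacts",
    "shutdown", "restart",
    "configure merl", "configure adhoc", "configure gprs",
    "version", "utility ping",
}

# Commands that end with a speaker-dependent <name> tag.
NAME_COMMANDS = ("new message for", "delete contact")

# Assumption: the slide shows "Volume 1" and "Volume 9"; 1..9 is inferred.
VOLUME_LEVELS = {str(n) for n in range(1, 10)}


def parse_command(utterance: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (command, argument), or None if the phrase is not in the grammar."""
    text = " ".join(utterance.lower().split())   # normalize case and spacing
    if text in FIXED_COMMANDS:
        return text, None
    for prefix in NAME_COMMANDS:
        if text.startswith(prefix + " "):
            return prefix, text[len(prefix) + 1:]
    words = text.split()
    if len(words) == 2 and words[0] == "volume" and words[1] in VOLUME_LEVELS:
        return "volume", words[1]
    return None
```

For example, `parse_command("New message for Peter")` yields the command `"new message for"` with the argument `"peter"`, while a phrase outside the grammar yields `None`. In the actual system the name argument would be matched acoustically by the speaker-dependent recognizer rather than as text.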
Software (4 of 5) • Combadge application complexities • Heavily multi-threaded • Barge-in capability • Extensive logging • Graceful handling of exceptional events • Power down components when not used • Amplifier • GSM/GPRS modem • 802.11b interface • More work is needed to put the Combadge to sleep when the device is inactive, to extend battery life • Audio messages are now PCM files; will transition to WAV files • Gateway from voicemail system at MERL to Combadge
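One way to power down components when they are not in use, in a heavily multi-threaded application, is reference counting: a component is switched on when its first user arrives and off when its last user leaves. The sketch below illustrates that idea only; `PowerManager` is a hypothetical class, and the `log` list stands in for whatever hardware control the real drivers provide.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator, List


class PowerManager:
    """Hypothetical reference-counted power switch for on-board components."""

    def __init__(self) -> None:
        self._lock = threading.Lock()      # several threads may share a component
        self._users: Dict[str, int] = {}
        self.log: List[str] = []           # stand-in for real hardware control

    def is_on(self, component: str) -> bool:
        with self._lock:
            return self._users.get(component, 0) > 0

    @contextmanager
    def using(self, component: str) -> Iterator[None]:
        with self._lock:
            count = self._users.get(component, 0)
            if count == 0:
                self.log.append(f"power on {component}")
            self._users[component] = count + 1
        try:
            yield
        finally:
            with self._lock:
                self._users[component] -= 1
                if self._users[component] == 0:
                    self.log.append(f"power off {component}")
```

A playback thread might wrap its work in `with power.using("amplifier"):` so that the amplifier is only powered for the duration of the message, even if another thread is using it at the same time.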
Software (5 of 5) • Voice messages are delivered using SMTP and IMAP • A custom “cbd” protocol is used to communicate from the Combadge to a “cbd” server • The “cbd” server actually sends messages via SMTP and gets messages via IMAP • SMTP is also used directly by the Combadge to verify valid phonebook entry addresses (using VRFY) • The Combadge application manages three categories of messages • Recorded to be sent, but not yet sent to server • Received from server, but not yet heard • Received from server and already heard • The Combadge maintains a cache of messages in its own memory • Combadge is fully functional without any connection to a network
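The three message categories and the VRFY check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual application or the "cbd" protocol: `State`, `MessageCache`, and `address_is_valid` are hypothetical names, and dropping a message from the cache once it has been sent is an assumption, since the slide does not say what happens to sent messages. The VRFY helper uses Python's standard `smtplib`.

```python
import smtplib
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List


class State(Enum):
    OUTGOING_UNSENT = auto()    # recorded, not yet sent to the server
    RECEIVED_UNHEARD = auto()   # received from the server, not yet heard
    RECEIVED_HEARD = auto()     # received from the server and already heard


@dataclass
class MessageCache:
    """Hypothetical on-device cache tracking each message's category."""
    _messages: Dict[int, State] = field(default_factory=dict)
    _next_id: int = 0

    def _add(self, state: State) -> int:
        msg_id = self._next_id
        self._next_id += 1
        self._messages[msg_id] = state
        return msg_id

    def record(self) -> int:
        return self._add(State.OUTGOING_UNSENT)

    def receive(self) -> int:
        return self._add(State.RECEIVED_UNHEARD)

    def mark_sent(self, msg_id: int) -> None:
        # Assumption: a message leaves the cache once the server accepts it.
        if self._messages.get(msg_id) is not State.OUTGOING_UNSENT:
            raise ValueError("only unsent outgoing messages can be sent")
        del self._messages[msg_id]

    def play(self, msg_id: int) -> None:
        if self._messages.get(msg_id) not in (State.RECEIVED_UNHEARD,
                                              State.RECEIVED_HEARD):
            raise ValueError("only received messages can be played")
        self._messages[msg_id] = State.RECEIVED_HEARD

    def in_state(self, state: State) -> List[int]:
        return [i for i, s in self._messages.items() if s is state]


def address_is_valid(address: str, smtp_host: str) -> bool:
    """Ask an SMTP server to VRFY an address; the host is supplied by the caller."""
    with smtplib.SMTP(smtp_host) as smtp:
        code, _ = smtp.verify(address)
    return code in (250, 251)
```

Because every state transition happens in the local cache, recording and playback keep working with no network at all. Only `mark_sent`, `receive`, and the VRFY check depend on a connection. Note that many SMTP servers disable VRFY and answer with code 252, which this helper treats as unverified.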
Deployment Connections • U. C. Berkeley • Eric Brewer • Divya Ramachandran, Graduate Student • Voice recognition for Tamil • Integration with Berkeley’s network transport for intermittent connectivity and long-distance 802.11b • Deployment in Tamil Nadu in India • Media Lab at MIT • SMART Group – EKG information transmission in ER or disaster situation • Mike Best – Potential developing world deployments • World Bank
Server Environment • Server runs Linux with dhcpd, sendmail, imap (invoked by xinetd), and cbd (the Combadge server daemon)
Research Directions (1 of 3) • User studies in developing world deployments • User studies in deployments in urban/suburban settings in the United States • Investigate mesh networking • Combadge as an infrastructure-less voice messaging consumer appliance (like a walkie-talkie/FRS/GMRS) • Forward messages through other Combadges toward the destination • Attention needed to patterns of physical location of Combadge over time (i.e., usual weekday daytime location, usual weekend daytime location, usual nighttime location) • Utilize connection to Internet when present
Research Directions (2 of 3) • Develop services for Combadge users • Traffic reporting • Weather information • Schedule/appointments • Stock quotes • Continue to Integrate with other Communication Paradigms • Telephone • Speech synthesis • E-mail • Pagers
Research Directions (3 of 3) • Develop as an audio home appliance remote control • Audio and video systems • Security system • HVAC • Audio interface to use as an MP3 player • Utilize Dilithium platform for other MERL projects • Microphone and audio processing server
Credits • Early work • Barry Perlman • David Anderson • Current work • Daniel Bromberg