Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship Clifford Nass

Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship Clifford Nass Stanford University

Speaking is Fundamental • Fundamental means of human communication • Everyone speaks • IQs as low as 50 • Brains as small as 400 grams • Humans are built for words • Learn new word every two hours for 11 years

Listening to Speech is Fundamental • Womb: Mother’s voice differentiation • One day old: Differentiate speech vs. other sounds • Responses • Brain hemispheres • Four day olds: Differentiate native language vs. other languages • Adults: • Phoneme differentiation at 40-50 phonemes per second • Cope with cocktail parties

Listening Beyond Speech is Fundamental • Humans are acutely aware of para-linguistic cues • Gender • Personality • Accent • Emotion • Identity

Humans are Wired for Speech • Special parts of the brain devoted to • Speech recognition • Speech production • Para-linguistic processing • Voice recognition and discrimination

Therefore … Voice interface should be the most Enjoyable, Efficient, & Memorable method for providing and acquiring information

Are They? No! Why Not? • Machines are different from humans • Technology is insufficient • But are these good reasons?

It’s Easy to Create Rich Interactions

Critical Insights • Voice = Human • Technology Voice = Human Voice • Human-Technology Interaction = Human-Human Interaction

Where’s the Leverage? • Social sciences can give us • What’s important • What’s unimportant • Understanding • Methods • Unanswered questions

Examples of the Power of Social Science

Male or Female Voice? • Is gender important? • Can technology have gender?

The Case of BMW

Brains are Built to Detect Voice Gender • First human category • Infants at six months • Self-identification by 2-3 years old • Within seconds for adults • Multiple ways to recognize gender in voice • Pitch • Pitch range • Variety of other spectral characteristics

Once Person Identifies Gender by Voice • Guides every interaction • Same-gender favoritism • Trust • Comfort • Gender stereotyping

Gender and Products • Gender should match product • More appropriate • More credible • Mutual influence of voice and product gender • Female voices feminize products (and conversely) • Female products feminize voices (and conversely) • “Match principle”

Research Context • “Gender” of voice (synthetic) • Gender of user • “Gender” of product • E-Commerce website

Examples of Advertisements • “Female” voice; female product • “Male” voice; female product • “Male” voice; male product

Appropriateness of the Voice

Voice/Product Gender Influences • Female voices feminize products; male voices masculinize products • Strongest for opposite-gender products • Female products feminize voices; male products masculinize voices • Strong preference when voice matches product

Results for User Gender • People trust voices that match themselves • Females conform more with “female” voices • Males conform more with “male” voices • People like voices that match themselves • Females like the “female” voice more • Males like the “male” voice more

Other Results • Participants denied stereotyping technology • Participants denied harboring stereotypes!

People stereotype voices by gender • Voice “gender” should match content “gender” • Product descriptions • Teaching • Praise • Jokes

Gender is Marked by Word Choice • Female speech • More “I,” “you,” “she,” “her,” “their,” “myself” • Less “the,” “that,” “these,” “one,” “two,” “some more” • More compliments • More apologies • More relationships between things • Less description of particular things • “They” for living things only • Voices should speak consistently with their “gender”

Selecting Voices • Voices manifest many traits • Gender • Personality • Age • Ethnicity • Voice traits should match content traits • Content • Language style • Appearance (e.g., accent and race) • Context • Voice traits should match user traits

If Only One Voice • Consider stereotypes • Masculine vs. feminine (same voice) • Boost high frequencies (feminine) • Boost low frequencies (masculine)
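
The frequency boost on this slide can be approximated with a very simple two-band "tilt" filter. The sketch below is only an illustration under stated assumptions: the function names, the 1000 Hz split point, and the 1.5x gain are hypothetical choices, not values from the talk. It also ignores pitch and the other spectral cues that listeners use to judge voice gender.

```python
import math


def tilt_voice(samples, sample_rate, split_hz=1000.0, low_gain=1.0, high_gain=1.0):
    """Split a mono signal into low and high bands and re-mix them with gains.

    split_hz and the gains are illustrative assumptions, not values from the talk.
    """
    # Coefficient of a one-pole low-pass filter with its corner near split_hz.
    alpha = 1.0 - math.exp(-2.0 * math.pi * split_hz / sample_rate)
    low = 0.0
    out = []
    for x in samples:
        low += alpha * (x - low)   # low band: smoothed copy of the input
        high = x - low             # high band: whatever the low band leaves out
        out.append(low_gain * low + high_gain * high)
    return out


def boost_high(samples, sample_rate, gain=1.5):
    """Boost high frequencies (the slide's 'feminine' direction)."""
    return tilt_voice(samples, sample_rate, high_gain=gain)


def boost_low(samples, sample_rate, gain=1.5):
    """Boost low frequencies (the slide's 'masculine' direction)."""
    return tilt_voice(samples, sample_rate, low_gain=gain)
```

With both gains left at 1.0 the two bands sum back to the original signal, so the same recorded voice can be nudged in either direction from one source file.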

Emotions

Emotion and Voice • Voice is the first indicator of emotion • Voice emotion has many markers • Pitch • Value • Range • Change rate • Amplitude • Value • Range • Change rate • Words per minute
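
The markers listed on this slide (value, range, and change rate for both pitch and amplitude, plus words per minute) can be summarized from frame-level measurements. The sketch below assumes a pitch or amplitude track is already available as a list of numbers, one per fixed-length frame. The names `TrackStats`, `track_stats`, and `words_per_minute` are hypothetical, and the definitions chosen here (mean, max minus min, mean absolute change per second) are one plausible reading of the slide, not the speaker's stated method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TrackStats:
    value: float        # average level over the utterance
    range: float        # highest frame minus lowest frame
    change_rate: float  # mean absolute change per second


def track_stats(track, frame_seconds):
    """Summarize a pitch (Hz) or amplitude track sampled every frame_seconds."""
    if len(track) < 2:
        raise ValueError("need at least two frames to measure change")
    steps = [abs(b - a) for a, b in zip(track, track[1:])]
    return TrackStats(
        value=mean(track),
        range=max(track) - min(track),
        change_rate=mean(steps) / frame_seconds,
    )


def words_per_minute(word_count, duration_seconds):
    """Speaking rate for an utterance of known length."""
    return 60.0 * word_count / duration_seconds
```

Calling `track_stats` once on the pitch track and once on the amplitude track, then adding `words_per_minute`, yields the seven numbers the slide enumerates.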

Emotion is always relevant • User has initial emotion • Interactions create emotions • Voice is particularly powerful • Frustration is particularly powerful

Emotion and Technology • Could technology-based voices exhibit emotion? • Could technology-based voice emotion influence people?

Research Context • Create upset or happy drivers • Have them “drive” for 25 minutes • Female voice gives information and makes suggestions • Upbeat • Subdued

Number of Accidents

Results • People speak to car much more when emotion is consistent • People like car much more when emotion is consistent

Implications • User emotion is a critical part of any interaction • Emotion must match content • Perception of voice • Trust • Intelligence • User • Performance • Comfort • Enjoyment

One Voice Emotion: Select for Goal • Overall liking • Slightly happy voice • Attention-getting • Anger • Sadness • Trust and vulnerability • Sadness (mild)

If You Can’t Manipulate Voice Emotion • Manipulate content • Manipulate music

Using the First Person: Should IT say “I”?

Should Voice Interfaces say “I”? • When should a voice interface say “I”? • Does synthetic vs. recorded speech affect the answer to the previous question?

The Importance of “I” • “I” is the most basic claim to humanity • “I think, therefore I am” • “I, Robot” • Dobby and monsters don’t say “I” • “I” is the marker of responsibility • “I made a mistake” vs. “Mistakes were made”

Research Context • Auction site • Telephone interface with speech recognition • Recorded bidding behavior • Online questionnaire

Average Bidding Price

Results • When “I”+Recorded or “No I”+Synthetic • System is higher quality • Users were much more relaxed • “No I” is more objective • “I” is more “present”
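
The pairing reported on this slide (first person with recorded speech, impersonal wording with synthetic speech) can be written as a small prompt-selection rule. This is a minimal sketch: the function name `choose_wording`, the `"recorded"` and `"synthetic"` labels, and the sample prompts are hypothetical, and the rule covers only this one finding.

```python
def choose_wording(voice_kind, first_person, impersonal):
    """Pick the prompt wording that matches the kind of voice.

    voice_kind is "recorded" (human speech) or "synthetic" (text-to-speech);
    both labels are assumptions made for this sketch.
    """
    if voice_kind == "recorded":
        return first_person   # recorded speech pairs with "I"
    if voice_kind == "synthetic":
        return impersonal     # synthetic speech pairs with no "I"
    raise ValueError(f"unknown voice kind: {voice_kind!r}")


# Hypothetical auction prompts in both styles.
prompt = choose_wording(
    "synthetic",
    first_person="I will now read the current bid.",
    impersonal="The current bid will now be read.",
)
```

Keeping both wordings next to each other in the prompt inventory makes it easy to switch styles if the voice technology changes.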

Results • “I” is right for embodiments • Robots • Characters • Autonomous intelligence (“KITT”) • “I” is wrong when voice is second fiddle to technology • Traditional car • Heavily-branded products

Design • Text-to-Speech is a machine voice • Recorded speech is a human voice • Design questions are • Not philosophical questions • Not judgment questions • Experimentally verifiable

Mistakes are Tough to Talk About

Who is Responsible for Errors? • Recognition is not perfect • When system fails, who should be assigned responsibility? • System • User • No one

Responding to Errors • Modesty • Likable • Unintelligent (people believe modesty!) • Criticism • Isn’t really constructive • Unpleasant • Intelligent • Scapegoating • Effective • Safe

System Responses to Errors • System blame (most common) • No blame • User blame

Research Context • Amazon-by-phone • Numerous planned interaction errors

Book Buying
