Say What? Respect Your OS

The most engaging human-computer interaction in the movie Her is the embodied, almost immersive gameplay, punctuated by the hilarious interactions with the foul-mouthed cartoon kid with the giant head.  But the most pervasive input type by far in the movie is conversational; almost the entire movie is a conversation between the main character (the amazing Joaquin Phoenix) and the voice of Samantha, the ever-learning, ever-evolving operating system.  The two talk and fall in and out of love, and at no time does the interaction require manual intervention (not even, apparently, when they both–man and OS–orgasm following “phone” sex).

Hilarious, potty-mouthed kid

Theodore and Samantha’s interaction is conversational, which makes sense since Samantha is a new-generation artificial intelligence OS.  But when Theodore is composing letters for work, he has to be much more instructional.  Theodore writes intimate, heartfelt, “handwritten” letters for clients through the company “Beautiful Handwritten Letters,” and he writes by dictation.  In this not-too-unimaginable future of Dragon-like technology, Theodore’s letters emerge fully formed as virtually perfect poetry, transcribed by the computer.  Theodore then says “Save” and “Print” and the finished letter appears.

These two examples of voice interaction–conversational and instructional–provide interesting challenges for design.  Consider the mental model a user needs to use most contemporary voice systems: users must speak briefly, uttering one of a small number of statements, with no embellishments.  That makes sense, since that is where the technology is today.  Even Siri, who (which?) can recognize a lot of variation in language, has a limited knowledge base and no understanding of subtleties or nuance.  What’s more, Siri is designed for people who speak standard American English, and is hilariously mystified by most accents.

Phoenix, in Her.

Beyond ethnic variation in language, consider another model that limits contemporary voice input.  The brevity of most voice recognition systems requires a highly directive type of speech: “Call Raymond.”  This type of speech is typically male.  It may be found most frequently in male managers in corporate America–especially male executives of (as they say) “a certain age.”  Most women are likely to append a “please” to most requests, or to phrase the instruction as a question (“Could you please call Raymond for me?”).  Not that these women can’t learn the abbreviated, instructional language required of contemporary voice recognition–of course they can.  But the model for the system doesn’t accommodate the wide variation in language patterns.

So, when Theodore was ordering his computer around at work, I really wanted him to be a little more polite.  His brusqueness mimics the stereotypical male executive/female secretary dynamic of (not very many) years ago, and it made me uncomfortable.  The contrast with the way he talked with Samantha, who (which?) had little apparent work to do beyond being Theodore’s companion, was stark.  Theodore worked very hard to charm Samantha, making her (it?) laugh and taking her out to see the world.  Really?  You can be charming to the new, sexy Scarlett Johansson-voiced OS, but bossy (and borderline rude) to the “woman” at the office?  Come on: say “please.”
