Despite being one of the first voice assistants for the masses, Siri has struggled. That’s the message from an in-depth look at the development and future of Apple’s voice assistant published this week in The Information.
While the story focuses squarely on Apple, it holds lessons for every company looking to enter the voice market, and every company should be thinking about how to get into that market. More than 35 million Americans are using smart speakers, diverting time away from smartphones, desktops, and televisions.
In the battle for market share in the voice assistant wars, the winner will be the brand that builds the most trust with consumers. Building trust over voice requires careful thought, diverse data sets for training conversational AI, and, above all else, accuracy. It can’t be rushed or launched for the sake of launching, as we’ve seen in the dismal response to the launch of Apple’s HomePod and the continued challenges with Siri.
Voice assistants and the skills built for them must follow a clear road map to build trust, because many consumers will simply remove a skill after a bad interaction with it. In fact, 70% of smart speaker users have experienced problems or frustrations, and 1 in 4 people do not believe makers are considering their needs when developing smart speakers, so there is clearly room to grow. Here’s how you build trust:
“To err is human...” and we can expect no more from technology designed and built by us. Users will forgive many technical limitations and errors if the system responds in a way that helps them understand what happened and what to do next. The way any digital technology handles fail states is critical to users’ perception of the experience, and this is especially true with the intimacy of voice. By asking clarifying questions or varying the error response (e.g., “Forgive me, I don’t understand,” or “Can you say that again?”), designers can reduce users’ frustration and improve the overall interaction. Ultimately, however, no matter how carefully scripts are drafted and tested, things can, and will, go wrong.
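One way to picture varied error responses is the minimal Python sketch below. It is an illustration only, not any voice platform's API. The two reprompts are the examples quoted above. The help message, the `MAX_REPROMPTS` threshold, and the names `ErrorState` and `next_error_response` are assumptions made up for this sketch.

```python
from dataclasses import dataclass

# The two reprompts are the article's own examples.
REPROMPTS = [
    "Forgive me, I don't understand.",
    "Can you say that again?",
]

# Hypothetical fallback that tells the user what to do next.
HELP_MESSAGE = "I'm still having trouble. You can say 'help' to hear what I can do."

# Assumed threshold: after this many misses in a row, offer help instead.
MAX_REPROMPTS = 2


@dataclass
class ErrorState:
    """Tracks how many times in a row the assistant failed to understand."""
    consecutive_failures: int = 0


def next_error_response(state: ErrorState) -> str:
    """Return a different reprompt on each miss, then fall back to help."""
    state.consecutive_failures += 1
    if state.consecutive_failures > MAX_REPROMPTS:
        return HELP_MESSAGE
    # Rotate through the reprompts so the same line is not repeated back to back.
    index = (state.consecutive_failures - 1) % len(REPROMPTS)
    return REPROMPTS[index]


def reset(state: ErrorState) -> None:
    """Call after a successfully understood request."""
    state.consecutive_failures = 0
```

Rotating the wording keeps the assistant from sounding stuck. Switching to guidance after repeated misses gives the user a way forward instead of another apology.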
It helps to think of voice applications a little like a host would: welcome people, make them feel comfortable, and get them where they’re trying to go as elegantly as possible. In designing tech experiences, making users feel comfortable should include many of the same practices that go into web and software design: consistency, clarity, and comprehensibility. Users should know where they are and what they can do. They should have as much information as they need and only as much information as they need, and they should have it when they need it.
In the fall, we released a report based on a survey of 1,000 smart speaker owners. Nearly a quarter of respondents said they had a hard time remembering the right commands for their voice assistant. Good web design traditionally favors recognition over recall, but with voice interactions this gets flipped. Since there are no visual stimuli to drive recognition, the user is forced to recall important aspects of the interaction. To make it easier on users, keep commands as basic as possible so they mimic natural conversational patterns.
It’s important for smart speaker applications to let users know their command has been understood. We call this “confirmation.” Confirmations can take many forms. One is an explicit confirmation such as, “I heard you say [X]. Is that correct?” For interactions where a mistake would be significant, it is critical to make sure the user’s command was understood: placing an order that bills the user, or calling someone from their contacts list, has to be right. Sometimes the confirmation can be a simple acknowledgement like “okay.” It can even be a beep or a light; consider how much R2-D2 and BB-8 conveyed without words. Developers should consider which confirmation best suits the interaction in question. Our overall guidance is to use confirmations sparingly, to keep the conversation moving, but always at critical moments when a misunderstanding could result in a negative interaction for the user.
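The guidance above can be sketched as a small Python routine that picks a confirmation style by how costly a mistake would be. Treat it as a rough illustration under stated assumptions. The intent names (`place_order`, `call_contact`, `lights_on`, `volume_up`) and the `<beep>` placeholder are hypothetical, and real platforms expose their own ways of playing tones or asking follow-up questions.

```python
from enum import Enum


class Confirmation(Enum):
    EXPLICIT = "explicit"        # "I heard you say [X]. Is that correct?"
    ACKNOWLEDGE = "acknowledge"  # a simple "okay"
    TONE = "tone"                # a beep or a light, no words


# Hypothetical intent names, used only for this illustration.
HIGH_STAKES_INTENTS = {"place_order", "call_contact"}
AMBIENT_INTENTS = {"lights_on", "volume_up"}


def choose_confirmation(intent: str) -> Confirmation:
    """Reserve explicit confirmation for requests where a mistake is costly."""
    if intent in HIGH_STAKES_INTENTS:
        return Confirmation.EXPLICIT
    if intent in AMBIENT_INTENTS:
        return Confirmation.TONE
    return Confirmation.ACKNOWLEDGE


def render_confirmation(intent: str, heard: str) -> str:
    """Build the confirmation the assistant would speak or play."""
    style = choose_confirmation(intent)
    if style is Confirmation.EXPLICIT:
        return f"I heard you say {heard}. Is that correct?"
    if style is Confirmation.ACKNOWLEDGE:
        return "Okay."
    return "<beep>"  # stand-in for a non-verbal cue
```

For example, `render_confirmation("place_order", "order more coffee")` asks the user to confirm before anything is billed, while a volume change only gets a tone so the conversation keeps moving.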
“Voice” here means something different than it does in writing. Users look to voice interactions to be more entertaining than screen-based interfaces, perhaps to fill in for the lack of visual stimuli, perhaps because an audible voice is so personal that many anthropomorphize the voice-powered assistant. That is why so many of our survey participants talked about the voice interface’s “personality.” Our research showed clearly that users respond to the personality of voice interfaces, and your voice application will have a personality whether you intend it or not. Experiment with humor, but use caution. No one minds a little sass when asking about surf conditions, but dire news alerts or financial information might not be the time to joke around.
Despite the internal and external challenges facing Siri, voice assistants will be a growing factor in how many Americans interact with technology over the coming years. The same way companies thought about digital transformation a decade ago, we now need to think about voice. As more and more developers work on applications for voice, they will not be successful without the foundation of any good relationship: trust.
Originally published on LinkedIn: To Win in Voice, Build Trust