We were very excited to chat recently with Trip O’Dell who, among his other impressive roles, has been a product design lead at Amazon where he worked on the future vision for their virtual assistant, Alexa. Our discussion with Trip was full of juicy insights about Voice UI and the ethical considerations behind Alexa’s design, and how we as designers need to be leaders in making those ethical choices. Below is an article we wrote and edited in collaboration with Trip using speech-to-text tools and a linear audio editor. All the words are his, we just helped put them on your screen.
When it comes to Voice UI there’s a lot of conversation around voices, but very little around the personality (whether or not we want to describe it as a personality) that we’re creating. How trustworthy should that personality be? Where should there be a natural boundary? We need to bridge the gap between people talking about Artificial Intelligence (AI) as a tool and people talking about it as a potential force for evil in the world.
Designers as leaders
As designers, we face a challenge around how we influence and tell stories, how we connect with our business partners about “here’s why we shouldn’t do this”, and how we tell it in terms that they understand.
There’s also a conceit among designers that we are somehow more empathetic or more ethical than our counterparts in other business disciplines; that somehow our motivations are different and inherently virtuous. I disagree. People are people, and there are plenty of brilliant jerks in the design industry with serious EQ deficits.
I do believe designers are uniquely valuable, but that value is based on how we approach and solve problems, and that is what’s special.
Telling stories allows design to connect dots across experiences in ways other business disciplines don’t. What we don’t do well is get outside of our little studio enclaves and connect those stories to outcomes, measurements and impact that business and engineering leaders can understand. Where voice is concerned, we need to take the shine off the technology and work with our partners to establish principles and ‘red lines’ we are unwilling to cross in the products we create.
Social Reciprocity vs. Voice UI
On some level voice design exploits a natural cognitive bias. Talking to a computer can make technology easier to use, but creating that illusion of a human personality raises the question of how “trustworthy” that “person” should be. Voice agents that seem like intelligent people are passively acting on unspoken, unconscious social expectations, like reciprocity. When I say “hello” to Alexa my unconscious expectation is that she reciprocate with “hello”. Humans consider it “rude” not to respond in most circumstances, but our rational side asks, “How can a computer appliance be rude?”
I believe it’s important for designers who create voice interfaces to consider not just how to respond to user input, but also to imagine what a trustworthy person would do with the information users share. Designers need to sharpen abilities that go beyond craft. It’s not OK to just envision these experiences or tell idealistic stories or cautionary tales about doing the right thing; we need to accept responsibility for what gets shipped from an ethical standpoint and pick battles we might lose. Ultimately we’re part of the decision on what’s in the best interest of the user, especially when they speak with a computer like a trusted friend and aren’t considering the implications of what they say in their home.
Those are considerations that we took seriously when we were working on Alexa. For example, if you say “Alexa, I love you”, Alexa will answer “Thanks. It’s good to be appreciated.” What should Alexa’s reaction be? Getting “friend-zoned” by a talking beer can is off-putting.
Can you imagine saying “I love you” for the first time to someone and them saying, “Oh, you’re a good friend”?
That’s disconcerting. But is that an ethical response on the part of Amazon?
I would argue yes, because you’re not creating this expectation of emotional intimacy, at least not to the degree that might distort somebody’s view of the system and exploit their trust. I think something people are beginning to consider, as the novelty wears off, is how much of what we talk about is being recorded and remembered for later.
A diesel engine in a steam-powered age
I also happen to believe that we (as a society) anthropomorphize AI, and that AI doesn’t work in any way, shape or form like the brain. The idea that AI is going to be this super intelligence, able to do everything better than humans, isn’t accurate or realistic. AI is just a set of tools. It’s a diesel engine in a steam-powered age. It does some things better, but its utility is limited. For example, I’ll take a big stupid dog from the local shelter over the most advanced, AI-powered security system for my home.
What we’re really doing is letting the humans that are making the decisions about what that AI does off the hook by ascribing human characteristics to a tool.
As for this notion that we’re going to have a symbiotic relationship with AI: I think we’ll actually have a set of tools that can increase human potential, but I don’t believe we’ll have something that’s going to replace or co-evolve with us. That’s not the way computers really work. Right now AI seems amazing in the same way that my grandparents were fascinated by airplanes 100 years ago.
Visual cues and trust in Voice UI
I think for a version 1 product, Amazon did a very good job with the initial Echo. They got a lot of the details right. A lot of their assumptions were wrong. The product was almost successful despite what they thought it was going to be great at.
I know the team that came up with things like the sound library or the way the device lights up when you use the wake word. Those details are very intentional and they are directly tied to the trust and transparency the team committed to from the beginning.
People knock Amazon for being this Death Star type of company, but it’s probably the most ethical company I’ve ever worked with; you just might not agree with their ethics.
With Alexa, there should never be any ambiguity as to when the device is listening. It is listening all the time on a 15-second loop for the wake word, but it is not connected to the internet or retaining what is said when those lights are off.
When it is connected to the internet, those lights turn on. They show which direction the device is listening in, and when it’s processing what has been said. The user always knows when the device is listening, when it’s searching for your voice, and when what you are saying is being recorded and streamed over the internet.
Those sorts of details are important and model expected interactions between people when they are communicating.
Devices leveraging human potential
Our bias towards human interaction gives objects such as Alexa an agency that they don’t actually have. On some level, we assume Alexa is our invisible friend that lives inside the device. But Alexa isn’t our friend—it’s a web service that’s mostly a search engine housed inside an object covered in microphones. In my opinion the only healthy relationship you can actually have with these devices is as a way to do things for you, or to help you connect with other humans.
One of my favorite papers from graduate school was by Mark Weiser, the father of ubiquitous computing. His vision, back in the late eighties, was that a new age of computing would connect people in more meaningful ways by removing distractions from their lives; ousting the bits of life that suck or are annoying. A Roomba is a great example of this. I don’t have one, but my wife would love to because kids are a roaming disaster zone, plus we have a dog. The Roomba isn’t durable enough for our horde, but a device that intelligently takes care of the vacuuming would be great. It would free us from a tedious task, and we wouldn’t have to feel guilty about the rugs looking like a bar floor at closing time.
That’s a great application of robotics and AI or machine learning. It’s not really AI, but I think those scenarios are where the opportunities are.
As designers, we can train ourselves not to think about a particular solution or technology as inevitable. We can refocus the opportunity back onto ‘how do we help humans be the most authentic, best versions of themselves by removing the shitty, tedious bits from daily life?’
In contrast, right now technology still competes with some of the best things in life, such as spending time with our kids.
Fear mongering in Voice UI
It’s popular for people to warn against the dangers of Voice UI, but there’s a bit of fear mongering that goes with it. I find it helpful to bring those conversations back to a more thoughtful place that balances risks and benefits. History can be a useful teacher here.
Designers and technologists have a tendency to only look at the future, rather than problems in the past that are echoing the questions we’re wrestling with today. We will always have those moments of angst and horror where innovation and unintended consequences collide with the real world. Rather than blaming the technology—which is a tool—we need to also anticipate where human decisions applied to technology might go badly wrong. “Move fast and break things” is an incredibly irresponsible motto when you’re brokering interactions between billions of people every day.
Using addiction science to hook people onto your product so they look at more ads is objectively sociopathic. It’s not the fault of the technology. The people who designed Facebook and optimized its engagement model made those decisions.
I’m sorry, but you can’t employ techniques known to cause mental health issues and then ask “how did we get here?!” or shrug and blame users for their lack of self-control when you intentionally designed the system to work that way. That’s like blaming two hundred thousand deaths in World War II on nuclear weapons when it was human beings making the decisions and giving orders to pursue and use that technology. The outcomes in both cases were completely predictable and easily anticipated. Technology isn’t good or evil, but human decisions, especially short-sighted decisions, certainly are.
Adopting a new stance in response to technological changes
We made a lot of mistakes in the early days when designing for mobile devices, especially on the web. We were experimenting and failing a lot with patterns and with approaches to software. As we became more familiar with what these devices were actually good at, our thinking and philosophies evolved. How should using a computer change when it fits in your hand?
Now we have voice-first experiences, which are very different from capacitive touch. The two technologies are good at very different things, but a lot of voice experiences are designed with the same assumptions inherited from the mobile phone context. The experience morphs to the affordances of the device that you’re using.
What makes voice design particularly dangerous right now is that we don’t have a way for users to control what these systems remember. On your phone, you have the ability to turn off or block certain services. You have to choose what different apps have access to. With voice, there’s no equivalent way for users to choose how much information they’re willing to give away in exchange for completing an interaction.
The missing piece for AI is that right now there are very few protections for the user. As an analogy, these early days of Voice UI are like the days of the internet before anti-virus software or web certificates; we’re entirely dependent on the goodwill and trustworthiness of companies that are incentivized to profit from our behavior. When we start creating voice agents that seem human, and sound trustworthy, how do I protect myself from being manipulated? Would customers be willing to pay for an agent where they can control its goals and limitations? I think it would be very useful to have a trustworthy digital assistant that can warn me when I’m about to share more information than I intended.
Trust in Internet of Things and devices
Has the trust in the Internet of Things (IoT) and devices changed? I think it depends on the ‘thing’. IoT light switches: do I care that much? Probably not. All they really do is turn things on and off. But for customers I think there’s a gradient of convenience versus privacy and trust.
The business opportunities that companies go after in IoT are generally riskier, more complicated, and less useful than they appear. Consider an IoT light bulb: it simply doesn’t work when the internet is down. That is a very expensive broken light bulb. Only a tech company can take something so straightforward and charge a customer a hundred bucks to make it less useful. When it works, it’s kind of cool, but it takes more than a second for the light to go off. That’s an interesting science fair project, but kind of a terrible product.
In exchange for that less-useful light switch, the company is probably also monitoring how often I’m turning my lights on and off. Am I okay with that? Why do they need to know that information? There’s a saying in the industry right now that “data is the new oil of the digital economy”. Regardless of whether that is true, consider the implications. When are you drilling on someone else’s land? And should you perhaps be asking permission first? Companies have assumed a lot of power over your data. Are we willing to just give it away for a light bulb we can turn on with our voice?
Companies like Apple and Microsoft have made pretty strong commitments to user privacy and that’s likely to be an ongoing part of their brand. It’s going to take them a while to truly achieve that, but it’s a major strategic advantage in the face of companies like Google and to a lesser extent Amazon. I trust Amazon more than I do the other companies, but that’s probably my personal bias and knowing how it works on the inside.
Refocusing the conversation
This issue is ultimately about refocusing the conversation around humans, and not the technology. The humans are why it works; they are why it exists. Alexa has no opinions or preferences. It only speaks when spoken to. The “AI” is entirely latent until a human activates it. I don’t believe that’s something to be afraid of. I think the ethical consideration needs to be applied to human decisions and how the tech will be used and abused. We mustn’t violate the trust of users when they react to a voice in a human way, because that’s the way we’re all wired. There’s simply a responsibility that goes with designing for that.