AI-powered products are more prevalent than ever before: automatically organizing your photos, driving a car without any human input, and helping you craft a perfect email. These three examples all use AI but the product experience and constraints for each are vastly different. As more and more products use AI, it’s important to ensure the product experience and constraints align with the AI's application.
For example, consider the automatic organization of photos. Here, accuracy doesn’t need to be perfect. If you are searching for “dogs,” it's OK to return lots of dog-like photos, since it is relatively easy to scroll and find the exact photo you're looking for. This is called a “high recall” approach.
In this post, we'll cover three key product design concepts to consider when developing an AI product: precision vs. recall, fallback, and control.
To illustrate the concept of precision vs. recall, let’s use a common example: image classification. Say you have a cutting-edge state-of-the-art AI model that classifies images as pizzas or not pizzas. With this model, you have a set of test data to help you evaluate how good your model is. You show it 50 images that are pizzas and 50 that are not pizzas. You then run the model on these images and look at the results.
Recall asks: Of all the images that actually are pizzas, how many did the model classify as a pizza? Say your model classified 40 images as pizzas, and 30 of those were actually pizzas. Recall would be 30/50 = 60%.
Precision takes all of the images you classified as a pizza and asks: How many of them were actually a pizza? Since only 30 of the 40 images classified as pizzas were actually pizzas, precision would be 30/40 = 75%.
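The arithmetic can be checked with a short sketch. It uses the standard definitions (precision is true positives divided by everything the model labeled positive; recall is true positives divided by everything that is actually positive) and assumes the model labels 40 of the 100 test images as pizza, 30 of them correctly. The function name is illustrative, not part of any library.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute (precision, recall) from raw counts."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    return precision, recall


# Assumed test set: 50 pizza images and 50 non-pizza images.
actual_pizzas = 50
labeled_as_pizza = 40   # images the model called "pizza"
correctly_labeled = 30  # of those, images that really are pizza

tp = correctly_labeled
fp = labeled_as_pizza - correctly_labeled  # non-pizzas the model called pizza
fn = actual_pizzas - correctly_labeled     # pizzas the model missed

precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.0%}, recall={recall:.0%}")
```

Keeping the two denominators separate makes the difference visible: precision divides by what the model said, recall divides by what is actually true.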
High precision means serving only relevant answers at critical points. High recall means not missing correct answers. The needs of your AI application will shape the priorities of your model.
High recall suits applications where the user picks a result from a set of candidates. The cost of missing the correct answer is greater than the cost of showing a few incorrect ones. Google photo search, for example, serves many results, one of which is the correct answer, and the user can select the result she is looking for. User success in this scenario is finding the right photo.
Tesla’s self-driving cars, on the other hand, are examples of high precision design. These cars need to avoid accidents despite not having access to a large database of sample crash data. Considering the size and weight of a car, a car's onboard systems must take extremely severe actions, and do so with significant foresight, in order to avoid an accident. For this reason, the system must precisely predict that an accident is going to take place.
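One common way to shift a model toward precision or toward recall is to move the confidence threshold above which a prediction counts as positive. The sketch below is illustrative only: the scores and labels are made up, and `precision_recall_at` is a hypothetical helper, not how any of the products mentioned here actually work.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when scores >= threshold are treated as positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    # Convention chosen here: with no positive predictions, report precision 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up model confidence scores and ground-truth labels (1 = relevant).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 1, 0, 1, 0]

# Low threshold: show many results so nothing relevant is missed (high recall).
low_precision, low_recall = precision_recall_at(0.25, scores, labels)

# High threshold: act only when very sure (high precision).
high_precision, high_recall = precision_recall_at(0.85, scores, labels)

print(f"threshold 0.25: precision={low_precision:.2f}, recall={low_recall:.2f}")
print(f"threshold 0.85: precision={high_precision:.2f}, recall={high_recall:.2f}")
```

With this toy data, the low threshold finds every relevant item but lets in some wrong ones, while the high threshold makes no wrong calls but misses most of the relevant items.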
Spoiler alert: AI systems aren’t perfect. If your product doesn’t account for this, “you’re gonna have a bad time.” It’s important to consider the alternative outcomes for when your AI predictions fail.
Let’s look at two examples: Gmail’s Smart Compose and Amazon Echo.
Smart Compose is a feature in Gmail that helps complete phrases as you are typing. If there is a bad prediction, say you are typing “Good morning” and after “Good” Smart Compose suggests “evening”, it’s not a big deal. You can just continue typing “morning” and it won't negatively impact the experience.
Amazon Echo, on the other hand, is an audio-based system, so the fallback is more difficult. Echo has to understand what you are saying. If you speak to it and say “Play the latest Drake album” and Alexa is unable to understand you, there isn’t an alternative. The voice fallback is much harder to deliver and very frustrating when it goes wrong. For this reason, the best fallback is often silence. For automated phone systems, another example of an audio-based system, the fallback is being transferred to an operator.
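A fallback can be sketched as a confidence gate: act on the prediction only when the model is confident enough, and otherwise return whatever fallback the product has chosen. Everything below is a hypothetical illustration (the `Prediction` type, the `choose_response` function, the 0.8 threshold, and the action names are all assumptions), not a description of how Echo or any phone system is built.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    action: str        # what the model thinks the user wants
    confidence: float  # model's confidence score in [0, 1]


def choose_response(prediction: Prediction,
                    threshold: float = 0.8,
                    fallback: Optional[str] = None) -> Optional[str]:
    """Return the predicted action if confident, else the fallback.

    A fallback of None stands for staying silent.
    """
    if prediction.confidence >= threshold:
        return prediction.action
    return fallback


# Confident prediction: act on it.
confident = choose_response(Prediction("play_album", 0.93))

# Unsure, no fallback configured: stay silent.
silent = choose_response(Prediction("play_album", 0.41))

# Unsure, phone-system style fallback: hand off to a human.
handoff = choose_response(Prediction("route_call", 0.41),
                          fallback="transfer_to_operator")

print(confident, silent, handoff)
```

Making the fallback an explicit parameter forces the product decision (silence, a clarifying question, a human handoff) to be made up front rather than discovered when a prediction fails.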
It's common to say AI is “like a black box”: a model returns a prediction, but does the user understand why? This is where the concept of control comes into play. Offering users some insight and input into the system is a great way to let humans participate in the intelligence.
A good example of a user experience with control is Netflix recommendations. Netflix helps users understand why they are getting results. By presenting users with a list of previously watched titles and letting users rate titles, Netflix offers users a feedback loop to influence and improve their product experience. Moreover, the product experience gets better the more you engage with ratings.
Counter to this example are ride sharing services like Uber and Lyft. When it comes to shared rides, these services do not allow users to influence their rides. This leads to a rather painful rider experience: many people know the feeling of passing their house to pick up another passenger, only to be dropped off at home 5 minutes later.
One potential feedback loop ride sharing services could introduce is to allow users to section off streets the rider wants to avoid on their daily commute. Another option is to more stringently stay on the previewed route that users see when they initially book their ride. If ride sharing apps let users select rides based on a route preview, users could make more informed decisions on the rides they take (instead of sitting through poor experiences and asking for refunds).
So how do we apply these principles at Cresta? At Cresta, our live coaching AI provides sales and support agents with recommended responses and behavioral coaching in real time, during customer conversations. For us, it’s critical that our users trust the insights our AI generates. For AI guidance, relevance increases trust, and trust increases usage.
Here are some examples of these principles being applied in our product:
In summary, as more and more products utilize AI, it’s important to keep three key user considerations in mind: precision vs. recall, fallback, and control.
Not properly considering these can lead to frustrating user experiences, and in some cases (such as self-driving cars) the stakes could be really high.
At Cresta, we are using these principles to inform a great user experience and help agents become experts on day one. If you are interested in defining cutting-edge AI + HCI paradigms, please explore our careers page.
Thanks to Jessica Zhao, Amy Lee, Alex Roe, and Osman Javed for edits and reviews.