The future of human-machine interaction will be voice-based. But how will we get there?
In 1955, a group of scientists submitted a request to fund a ten-person summer research project at Dartmouth College in New Hampshire, USA. As part of their paperwork, they hypothesised that ‘every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.’
This proposal is often cited as the birth of Artificial Intelligence. Within it, the ability of computers to understand language is the first such feature of intelligence mentioned. Even earlier, in 1950, Alan Turing devised a test for intelligent conversation. His test calls for a human judge to evaluate the dialogue between a machine and a human and to attempt to distinguish between the two. If the human judge fails to identify the machine, the machine has passed this ‘Turing test’, with the ability to understand language serving as the litmus test for intelligence.
What perhaps none of these scientists could have predicted is just how incredibly hard the problem of understanding natural language would end up being. To date, no computer program has passed the Turing test. Those that get close employ a set of elaborate tricks that say as much about their creators’ ability to manipulate the human psyche as the program’s ability to understand the context and purpose of a conversation. It’s not surprising that one of the earliest conversational programs, Eliza, was built to mimic the approach of a Rogerian psychotherapist. Here is some sample dialogue from Eliza:
Eliza: Don’t you ever say hello?
Eliza: Tell me more.
Eliza: You are being a bit negative.
Eliza: Why did you repeat yourself?
As you can see, Eliza is employing a number of answering techniques to give the illusion of an ongoing conversation, yet there is no real understanding of context. The program quickly gets trapped in the uncanny valley – that unsettling state when we can tell something is wrong, even if we can’t quite put our finger on what it is exactly.
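The answering techniques Eliza relies on can be illustrated with a few lines of code. What follows is a minimal sketch in the spirit of Eliza, not Weizenbaum's original program: a handful of keyword rules, pronoun ‘reflection’ so the echo sounds natural, and stock fallbacks when nothing matches. All rule patterns and canned responses here are invented for illustration.

```python
import random
import re

# Swap first- and second-person words so reflected fragments read naturally.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your",
    "am": "are", "you": "I", "your": "my",
}

# Keyword rules: a regex plus a response template that reuses the
# user's own words. Checked in order; first match wins.
RULES = [
    (r"i feel (.*)", "Why do you feel {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"(.*)\bno\b(.*)", "You are being a bit negative."),
]

# Stock phrases used when no rule matches - the illusion of listening.
FALLBACKS = ["Tell me more.", "Why do you say that?"]

def reflect(fragment):
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(utterance):
    for pattern, template in RULES:
        match = re.match(pattern, utterance.lower())
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return random.choice(FALLBACKS)

print(respond("I feel trapped"))   # Why do you feel trapped?
print(respond("No, not really"))   # You are being a bit negative.
```

Note that nothing here models context: each utterance is handled in isolation, which is exactly why such programs soon drift into the uncanny valley.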
Despite the challenges presented by the task of conversing with users, the opportunities are immense. While we might not yet be able to create machines that pass the Turing test, we are still able to do a lot of very useful things, with the opportunities boiling down to ‘location, location, location’.
Companies and organisations need to employ conversational AI because it places them where users currently are, where devices are heading and on a road towards a future where user interfaces will adapt to the needs of humans, not the other way round.
It’s where the users are
The first – and arguably most important – location that conversational AI gives us access to is wherever users currently are. The real story of digital interactions over the past five years is not about social media, it’s about messaging.
It’s estimated that there are 2.9 billion active users across WhatsApp and Facebook Messenger, compared to 2.4 billion monthly active users on the Facebook social network itself. Across the top eight messaging apps, there are 5.8 billion active accounts, representing an estimated 2.5 billion individual users. No wonder, then, that Facebook went out of its way to secure a foothold in the messaging world. After its 2014 purchase of WhatsApp for $19bn, it announced in 2019 that it would be building a single messaging layer connecting all its apps, from Messenger to WhatsApp and Instagram. In this way, even if people use different apps to compose their messages, they will be interacting through the same space.
These numbers indicate that businesses need to treat messaging apps as a new space to move into, and indeed they are doing so in droves. More than 20 million businesses respond to messages on Facebook Messenger, while WhatsApp offers a dedicated business app and API. And while users still value human interaction, and prefer it to be available for more complex situations, over half will choose a chatbot when it saves them time.
Businesses are striving to scale their interactions across the messaging space with the correct mix of automated interactions powered by conversational AI and human support for more complex situations. Those that succeed will have an advantage over those that are simply not present in the channel or, if present, are not able to respond to users’ requests in a helpful or timely manner.
It’s where the devices are
Smart appliances are showing up everywhere – in our homes, our cars, in hotel rooms and in businesses. They are spreading at an incredible rate – almost 25 per cent of the US population has a smart speaker in their home. Add to that doorbells, fridges, cookers, TVs and all the other devices that power our lives, and it is safe (although slightly disconcerting) to say that a large number of us are always within listening distance of an internet-connected smart device.
So how do we interact with these devices? Currently, this involves a confusing series of smartphone apps, alongside some voice interaction through a ‘hub’ device such as an Alexa-enabled speaker. Ultimately, the simplest way to interact hands-free with these devices in the car, while we are preparing dinner or in countless industrial applications, is to simply speak to them all.
Conversational AI is the pathway to our seamless interaction with the technology layer that will pervade every aspect of our lives.
It’s where the future is
Since the widespread introduction of the graphical interface with the Macintosh in 1984 and up until at least 2019, the predominant computer interaction paradigm has been to point at something and then click. Pinch, zoom and swipe have all upgraded this basic experience by making it richer and smoother but they haven’t radically changed it. It took us 35 years to get from underpowered processors on greyscale screens to blazingly fast machines and millions of colours. Nevertheless, we are all still pointing and clicking.
It is time to turn the tables on computers and the way we interact with them. Until now, we have all learned the ‘magic incantation’ – the sequence of clicks that achieves our goal. But instead, why can’t we tell computers what we want and have them do it? While this has always been the vision, with conversational AI, interface designers now have the tools to make it a reality. Those companies that crack conversational AI for wider human-computer interaction will lead the next 30 years of operating systems.
A framework for success
As we said at the start, conversational AI is an incredibly hard problem to solve. The advances made so far, however, have been nothing short of staggering. One of the first voice recognition devices was Shoebox, an IBM device introduced at the 1962 Seattle World’s Fair that could recognise 16 spoken words. Currently, all major platforms are reporting recognition error rates below 5 per cent, which is more than enough to call voice recognition a viable technology.
Of course, conversational AI is much more than just converting speech to words. In many ways, the real challenge comes after that. The device needs to understand the context of the conversation both at a global level (the user’s ultimate goal) and within different stages of the conversation (the tasks to be achieved in each step of a process). This is where the current challenges lie. Advances have been rapid and impressive, but people still report frustration with chatbots and intelligent voice assistants because they are ‘just stupid’ or they ‘don’t understand what I am asking’.
At my company, GreenShoot Labs, we believe that the correct approach to capitalise on the potential of conversational AI is to view the technology as an augmentation layer that works together with humans and other types of interfaces. The key is to provide a better solution to user problems through better interactions by adding conversations wherever they can save time and improve the overall experience. However, we must be ready to fall back to other solutions where a conversation is unlikely to achieve the goal.
We developed a conversational application method that analyses the problem space to identify where conversations can add value and what type of conversations those should be. In brief, we use the method to address the following characteristics:
Audience: Who is the conversational application being designed for? How does it integrate with other activities and tasks they perform?
Platforms: On what platforms are the conversations taking place? Are we dealing with a single platform or do we need to adapt conversations so they can be effective across multiple platforms?
Capabilities: What should the conversational application be able to do? What types of information does it need in order to be able to hold the required dialogues and any related actions?
Interaction style: What sort of interaction style is most appropriate for different capabilities? Where could we benefit from open-ended natural language exchanges? Where instead should we structure conversations and delimit the domain with fixed options through user interface elements such as buttons?
Adaptability to context: Should the context in which conversations take place (location, user, past history, etc) influence the conversations?
Systems integration: With what backend systems do we need to integrate in order to enable the chatbot to interact with its human users? Will we need integration with an identity service, a CRM, a support system, etc?
Improvements: What systems should we put in place to support the improvement of the chatbot? To what extent can we take advantage of learning techniques for self-improvement, and what instead should depend on curated or semi-automated training?
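One way to picture the method is as a structured brief that answers each of the questions above before any conversation is built. The sketch below is purely illustrative – the field names mirror the characteristics listed here, not any real GreenShoot Labs artefact, and the example values are invented.

```python
from dataclasses import dataclass, field

@dataclass
class ConversationalAppSpec:
    """A hypothetical brief capturing the method's characteristics."""
    audience: str                   # who the application is for
    platforms: list                 # where conversations take place
    capabilities: list              # what the application should do
    interaction_style: str          # 'open-ended', 'structured' or 'mixed'
    context_signals: list = field(default_factory=list)   # adaptability inputs
    integrations: list = field(default_factory=list)      # backend systems
    improvement_plan: str = "curated training"            # learning strategy

spec = ConversationalAppSpec(
    audience="existing customers checking order status",
    platforms=["Facebook Messenger", "web chat"],
    capabilities=["order lookup", "delivery updates", "handover to a human"],
    interaction_style="mixed",
    context_signals=["location", "past purchase history"],
    integrations=["CRM", "support ticketing"],
)
print(spec.interaction_style)   # mixed
```

Writing the answers down in one place like this makes the trade-offs visible early – for example, a ‘mixed’ interaction style immediately implies both natural language understanding and button-driven UI elements.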
We then support application development through a technological platform, called OpenDialog, that directly addresses the issue of context. It does this by relying on work in artificial intelligence around an abstraction called Electronic Institutions, an EI being defined as ‘an organisational structure for coordinating the activities of multiple interacting agents.’
The simplest way to understand how OpenDialog models conversations is to think of it as writing a theatre play. You, as the scriptwriter, define the overall arc of the story (the global context), set up the scenes (local context) and also the actors (the bot and the users).
As the story evolves, your scene participants move from one scene to the next until the play eventually reaches its end. OpenDialog adds an automation layer on top of that to help deal with all the possible ways one of the actors (hopefully just the user!) might improvise and stray from the script. Because the overall context is understood, we can keep driving the conversation towards a useful outcome.
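The theatre-play idea can be made concrete with a toy model: a conversation is a set of scenes, each with its own prompt and expected intents, and an off-script intent simply keeps the user in the current scene rather than derailing the whole play. To be clear, the classes and scene names below are hypothetical illustrations of the concept, not the actual OpenDialog API.

```python
class Scene:
    """One scene of the 'play': a prompt plus the intents it expects."""
    def __init__(self, name, prompt, handlers):
        self.name = name
        self.prompt = prompt        # what the bot says on entering the scene
        self.handlers = handlers    # mapping: intent -> next scene name

class Conversation:
    """The global arc: tracks which scene we are in and moves between them."""
    def __init__(self, scenes, start):
        self.scenes = scenes
        self.current = scenes[start]

    def handle(self, intent):
        next_name = self.current.handlers.get(intent)
        if next_name is None:
            # The user improvised: stay in the scene and re-prompt,
            # so the global context is never lost.
            return f"Sorry, let's get back on track. {self.current.prompt}"
        self.current = self.scenes[next_name]
        return self.current.prompt

scenes = {
    "greet": Scene("greet", "Hi! Would you like to book a table?",
                   {"yes": "book", "no": "goodbye"}),
    "book": Scene("book", "Great, for how many people?",
                  {"party_size": "goodbye"}),
    "goodbye": Scene("goodbye", "Thanks, see you soon!", {}),
}

convo = Conversation(scenes, "greet")
print(convo.handle("yes"))         # Great, for how many people?
print(convo.handle("weather"))     # off-script: re-prompts within the scene
print(convo.handle("party_size"))  # Thanks, see you soon!
```

The point of the sketch is the recovery behaviour: because the model knows which scene the user is in, an unexpected intent produces a local nudge back on script instead of a dead end.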
Conversational AI can already be incredibly useful. It can allow businesses to scale interactions with customers, provide a better experience and reduce costs. However, as with any nascent technology, it has to be approached carefully and realistically with the appropriate mindset and tooling.
We are just starting to explore what will be possible. While the problems we are currently solving might seem simple compared to the richness of spoken language, progress means that intelligent assistants are weaving their way into our everyday habits and are shaping the next generation. My two kids are ten years apart. My daughter grew up with touch devices, so assumed from a very young age that any screen should be a touchscreen. She would touch our TV to try and change channels. My two year old son sees us talk into a TV remote or ask Siri to start a timer, so assumes that every device is something you can talk to.
We are just a few steps away from entering the point of no return, where conversational AI is no longer a novelty. The path will be a challenging one. We need to deal with privacy and security concerns to ensure that the technology is not misused. As with all great innovations, from printing onwards, the genie is out of the bottle. It is our responsibility to make sure it now grants us the wish of the sort of technological world we want.
For regular insights from D/SRUPTION’s expert guest contributors, sign up to our newsletter.