The Wonders of Visual Search

Discovering more about the world we see

A guest post by Omaid Hiwaizi and Sam Ashken at Blippar

Technology has shaped human development since the dawn of time, from cave drawings that allowed stories to be recorded, to combustion engines bringing people closer. The World Economic Forum’s Professor Klaus Schwab summed it up this way: “The first Industrial Revolution used steam power to mechanise production. The second used electric power to create mass production. The third used electronics and information technology to automate production. Now a fourth Industrial Revolution is building on the Third. It is characterised by a fusion of technologies that is blurring the lines between the physical, digital and biological spheres.”

We’re at the start of an age in which AI – artificial intelligence – will enable machines to help us enhance how we communicate, think, feel and see. And since vision accounts for more than 80% of how we engage with the world, computer vision will become critical. It’s the branch of AI that’s teaching computers to see the way we do, yet it’s proving to be a tough nut to crack. Sight uses as much as 50% of our brain function, but much of how it works is still not understood. By comparison, chess uses just 5% of our brain capacity!

Reverse engineering sight has been worked on since 1966, when Professor Seymour Papert, a founding father of modern AI, issued a project brief to solve it over the summer break, which he thought would be plenty of time. Yet it took until the 1990s for the breakthrough of face detection and recognition that is now commonplace in cameras and smartphones. And because those algorithms can detect only human faces, recognising other objects has long demanded a new approach.

The quest for recognition

Several competitions have been launched to measure progress, with ImageNet the most prominent: it provides millions of tagged images against which teams from companies and universities can test their technologies. The format of an annual competition has added a fun element, but it has also shown that, over the years, improvements have come in incremental steps rather than the huge leaps forward everyone was hoping for. It’s clear that while it’s easy for a human to recognise a banana, a bunch of bananas and a drawing of a banana as the same thing, it’s far, far harder to teach an AI to do so.

Yet in 2012, the introduction of deep learning on GPUs – graphics processing units – proved to be a game changer. GPUs had been developed to handle the huge mathematical demands of real-time 3D gaming, but their massively parallel processing power was soon recognised as a way to dramatically accelerate deep learning, increasing the rate at which AIs could be taught.
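
For readers curious what this looks like in practice, here is a minimal sketch of GPU-accelerated image recognition with an off-the-shelf deep network. It assumes the open-source PyTorch and torchvision libraries; the ResNet-50 model and the image file name are illustrative placeholders rather than a description of any particular production system.

    # Minimal sketch: classify one image with a network pre-trained on ImageNet,
    # using a GPU when one is available. Model and file name are placeholders.
    import torch
    from torchvision import models, transforms
    from PIL import Image

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Network pre-trained on the ImageNet dataset mentioned above.
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).to(device).eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    image = preprocess(Image.open("banana.jpg").convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        probabilities = torch.softmax(model(image), dim=1)
    print(probabilities.argmax().item())  # index of the most likely ImageNet class

The same code runs unchanged on a CPU; the GPU simply makes both training and inference dramatically faster, which is what unlocked the 2012 breakthrough.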

A tipping point was reached in 2015, when the best systems finally beat human-level recognition at the ImageNet competition. Since then, we’ve been in an arms race, with multiple companies working on everything from production-line inspection systems to consumer applications such as driverless cars and visual search.

Visual search vs text search

Question: how is text-based searching helpful? There’s no doubt that search on smartphones gives people great access to information, particularly given the powerful machine learning algorithms that match internet content to keywords. The text entered gives a very clear signal of the searcher’s intent, which also creates a clear question for relevant advertisers to address.

But words cannot always describe the thing we’re curious about. If you see an unfamiliar flower, breed of dog or unusual street food, what words do you choose in order to learn more? Visual search lets someone simply point a smartphone at the thing itself to get an answer. As with text-based search, the intent is clear, so any advertisers with brands, logos or related objects in the image can again follow up with a targeted message.
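
As an illustration of how this can work under the hood, here is a minimal sketch of one common approach to visual search: each image is converted into an embedding vector by a pretrained network, and the catalogue item whose vector lies closest to the query is returned. The libraries, model and file names are assumptions for the sake of example, not a description of Blippar’s actual pipeline.

    # Minimal sketch: embedding-based visual search with a pretrained backbone.
    # Catalogue contents and file names are hypothetical examples.
    import torch
    from torchvision import models, transforms
    from PIL import Image

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()   # drop the classifier; keep the 2048-d feature vector
    backbone = backbone.to(device).eval()

    preprocess = transforms.Compose([
        transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])

    def embed(path):
        """Map an image file to a unit-length embedding vector."""
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
        with torch.no_grad():
            v = backbone(x).squeeze(0)
        return v / v.norm()

    # Hypothetical catalogue of product photos; a real index would hold millions of items.
    catalogue = {"red_dress.jpg": embed("red_dress.jpg"), "blue_scarf.jpg": embed("blue_scarf.jpg")}

    query = embed("street_photo.jpg")   # the shopper's snapshot
    best = max(catalogue, key=lambda name: torch.dot(query, catalogue[name]).item())
    print("Closest match:", best)

In practice the catalogue embeddings are computed once and stored in a fast nearest-neighbour index, so the phone’s query can be answered in a fraction of a second.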

Drawing comparisons between visual and text search only hints at the scale of this opportunity. Consumers inhabit a world of overwhelmingly abundant content – social media, video, gaming, VR... There’s so much competing for consumers’ attention that it can be hard for brands to be noticed. We believe visual search will be crucial to brands’ ability to fight for that attention.

At root, branding is about highlighting or reinforcing some benefit of a product so that consumers are more likely to remember and buy it. Audience fragmentation across mass and social media channels, as well as the high cost of advertising, has made any fresh approach highly attractive, and the perfect moment to deliver a message is when the product is in the customer’s hand. Visual search will make it frictionless for brands to nudge customers towards a point of purchase by making the in-store moment of decision a richer, more valuable one.

Where is this taking us? Visual search will enable quick-service restaurants (QSRs) to turn their outdoor advertising into actual incentives to go in-store... right now. Once shoppers are inside, the brand will be able to serve personalised incentives to increase basket size. For cosmetics brands, visual search will show customers what looks a specific mascara can create, just by pointing their phone at the product.

For food and drinks brands, visual search will deliver nutritional information that goes beyond what’s on the label. It will provide recipe inspiration, offer complementary products and, ultimately, be the route into queue-less, mobile-enabled payment.

For car brands, visual search will turn the whole world into a showroom. Car buyers will get performance information about models simply by pointing their smartphones at them in the street. Then, using the phone’s GPS, local dealerships can try to engage each prospective buyer with a deal.

Visual search will change shopper journeys higher up the funnel too, in the inspiration phase. When a garment catches the eye of a fashionista, they will be able to find where it comes from and what other pieces of clothing are like it. In the same vein, if a colour takes their fancy – it could be a flower, a painted wall or a sunset – they will be able to search for shoes or clothing in that exact shade.

These use cases are only the start. We can’t anticipate all the ways that brands will be able to use visual search, but what’s certain is that this new way of linking the digital and physical worlds will offer a wide range of opportunities.

Creating knowledge parity

We’ve discussed how visual search will change the way brands engage with consumers. However, perhaps the biggest area in which it will transform our lives is ‘knowledge parity’ – tackling the uneven global distribution of expertise and information.

A seventh of the world’s population can’t read or write, and two sevenths have only basic literacy. Even the ‘educated’ four sevenths between them speak only 12% of the world’s languages. The potential applications of visual search and computer vision are therefore broad – from helping illiterate farmers increase their crop yields to distributing medical expertise more widely.

Humans are a visual species, yet we’ve grown up with the idea that digital is predominantly a ‘text first’ medium. In the future, the current era in which we access information primarily through the keyboard will seem so anachronistic as to be almost unimaginable.

Omaid Hiwaizi and Sam Ashken are, respectively, global head of brand experience and strategic planner at the AR and computer vision company Blippar.