Visual Search: The Big Picture

A popular myth claims that humans process images 60,000 times faster than text. The figure has since been refuted, but it still points to a basic truth: images are often easier to digest than text.

We’re surrounded by imagery. It’s pervasive in every facet of our lives. It’s only natural to want to learn as much as we can about the world around us from the most readily available and easily discernible stimuli. Visual search – a means of connecting real-life imagery with online information to help us better understand the physical world – lends itself to this inherent need.

Imagery as a Search Mechanism

The beginnings of visual search stretch all the way back to 2008 with the introduction of reverse image search engines. What set these apart was their ability to conduct searches via image recognition, using an image, rather than keywords and metadata, to locate similar imagery on the internet. Google would introduce reverse image search functionality based on mathematical modeling in 2011, but not before rolling out a true precursor to visual search two years earlier with Google Goggles.

Going Beyond Image Recognition

Google Goggles was an image recognition app – officially killed a few months ago – that was able to recognize imagery and landmarks via pictures taken on mobile devices. Once the image was recognized, the app could then bring up relevant information about it. An incredibly novel idea at the time, and one that would pave the way for visual search’s evolution.

Despite the ingenuity of Goggles, better options like CamFind would soon become available, signaling its eventual demise. Unquestionably, though, these tools were ushering in a new era, one where visual search technology extends beyond basic image recognition by weaving into our daily lives.

Accessing Your Point of View

Google Glass, although considered a quasi-failure, is another trailblazing milestone in visual search. It introduced many to the idea of tech that can see what you see. To accomplish this, it integrated with a visual search app developed by the people behind the CamFind app. The app allowed Google Glass to actually take in the world around the user. The possibilities, if fully realized, could have been limitless. Wearable, intelligent technology that can recognize, interpret and categorize what’s in your field of vision – that is next-level stuff.

In reality, Google Glass came just a few generations (product, not human) early. Like Apple’s Newton, which gave way to the PDA and eventually the iPhone, Google Glass could be viewed in a positive light for its influence on future technology. A nostalgic misstep that was ahead of its time, but behind in adoption. One that would eventually pave the way for bigger, better things.

Visual Search and the Physical World

For the longest time, Google and other search engines had been relegated to staring directly at us. Waiting for a query. A command. Something to answer or fetch. The idea of letting a search engine or machine extend beyond the limits of the screen and into the physical world without waiting for human interaction was just a pipe dream.

The version of visual search we know today – rooted in tools like Google Goggles, Glass and CamFind – will eventually let machines search beyond what we explicitly ask of them. Soon, artificial intelligence and machine learning will bring us information before we’re even aware we need it.

Current Competitive Landscape

Fast forward to the more recent technological advances in visual search. The focus remains on Google, but additional high-powered entrants like Bing, ASOS, eBay, Amazon and Pinterest have jumped into the mix.

All of these players have a major stake in the game. They’re all vying to create the technology that turns the entire world into one giant retail shelf. From a commerce perspective, that’s the strongest allure of visual search. It puts a for-sale sign on almost everything you can point your phone at. It’s a tool that can entice the purchase of those in-the-moment, impulse-heavy items that just as easily could have gone unpurchased.

That’s the weight visual search carries for the tech giants. It’s the retail equivalent of a Craigslist missed connection, except on a global scale and nowhere near as creepy. It allows users to more readily act on the impulse to buy, as opposed to letting the opportunity pass by.

Where Each Player Stands

When it comes to offerings, it’s still a point-at-something-and-see-what-pops-up approach, the promise being that the tech will provide results that are as similar as possible to what you’re trying to find.

ASOS – an online fashion and cosmetic retailer – is extremely product-focused. Its Style Match visual search tool functions similarly to the aforementioned reverse image search engines by matching static images to its image catalog. It can match the pattern, color and type of clothing depicted in the image and recommend matching items from there. Earlier this year ASOS boasted 85,000 items, noting that upwards of 5,000 items were being added on a weekly basis. Style Match does have its limitations; its matching capabilities only extend as far as the ASOS product catalog does.
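For the technically curious, catalog matching of this kind can be sketched as a nearest-neighbor lookup. This is only an illustration, not how ASOS actually built Style Match: the three-number feature vectors (pattern, color, garment type), the catalog entries and the function names below are all hypothetical, and real systems learn far richer features from the images themselves.

```python
import math

# Hypothetical catalog: each item is reduced to a tiny hand-made feature
# vector of (pattern, color, garment type). These values are invented for
# illustration; a real system would derive features from the product photos.
CATALOG = {
    "striped blue shirt": [0.9, 0.1, 0.8],
    "plain red dress": [0.1, 0.9, 0.2],
    "striped navy tee": [0.8, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def match_catalog(query, catalog, top_k=2):
    """Rank catalog items by similarity to the query and keep the best top_k."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical features extracted from a shopper's photo of a striped top.
    photo_features = [0.85, 0.15, 0.85]
    for name, score in match_catalog(photo_features, CATALOG):
        print(f"{name}: {score:.3f}")
```

Note that the results can only ever come from the catalog itself, which mirrors the limitation described above: the tool can't recommend what the retailer doesn't stock.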

Pinterest and eBay have created similar functionality, with Pinterest integrating its visual tech, known as Lens, into Target’s retail offering last year. eBay’s Find It On eBay feature allows users to upload a photo or click the Find It On eBay button while visiting other sites. Bing has been investing heavily in its visual capabilities over the last few years, recently announcing an updated version that lets you snap a photo to get information or additional links. Pinterest, eBay and Bing’s visual search tools extend beyond their own platforms, but none has the capacity to recognize real-time imagery.

Google Lens Stands Alone

A quote from Google’s blog, reproduced below, sums up the main differentiator of its visual search offering: it’s a window into your physical world. With Google Lens, users can point their device at an image, object, text, barcode and more, and the tech will serve up information on the item in question in real time. By employing deep, machine-based learning, Google is able to translate the physical world into an online environment. Meanwhile, artificial intelligence continues to improve the system as its exposure to content increases.

While the technology itself is exciting, its applications for everyday usage give it true value. It’s the type of offering that can become an ingrained, trusted tool for the average consumer rather quickly. The potential benefits extend well beyond the shopping cart, which is a noteworthy win-win for both the users and developers of the tech.

“Lens now works in real time. It’s able to proactively surface information instantly—and anchor it to the things you see. Now you’ll be able to browse the world around you, just by pointing your camera.”

So is Visual Search the Next Big Thing?

There have been a lot of “next big thing” contenders in the search industry over the past couple of years. There was the goal of providing answers instead of results, the inclusion of audience-based buying data, a heavier focus on user-level context, and the rise of voice search (which we covered a few months ago). At one point or another, each of these has been hailed as the next evolution of search.

But the truth is, instead of overthrowing the current process, they’ve been incorporated into what we now know as the evolving search landscape. They’ve formed an amalgam of approaches and tactics that have helped significantly bolster search and the methodology behind it.

Visual search will be no different. It will serve as an added, highly interactive layer that will fold into the current search ecosystem. It won’t signal the end of keywords, as many have predicted, as there are still too many elements visual and voice search can’t account for yet. Visual and voice, however, will continue to evolve into more prominent roles in the search fold.

The Omniscient Watcher

Visual search is just beginning to bring aspects of its environment into the equation. The eventual goal will likely be devices that are always interpreting their surroundings. Gone will be the days when they’re waiting for us to engage them. This requires a device that can share the user’s point of view and constantly interpret the same visual stimuli and cues, all while formulating answers to potential questions before they even arise.

Imagine a person walking down a crowded street when a new car drives by and catches their eye. Before they could recognize the make, the device would be compiling all pertinent information and tailoring what it presents – possibly via paid ad – to the user. The information would be comprehensive: gas mileage, safety ratings, seating capacity, average local purchase price and more.

It’s the idea of visual search as an omniscient watcher. One that’s waiting in the wings to provide answers to questions prompted by external stimuli. To some, this may be the very definition of invasive tech. But to others, it’s something that can change how we view the world on a daily basis. It can interact, educate and inspire awe all at the same time.
