A Picture is Worth a Thousand Words: Image Search Technology

LTU Technologies
By Ned Smith
Historically, Internet search has been largely a text-centric process: Plug some words into the search box that you hope describe what you’re looking for, hit enter and wait to see what the lottery in the cloud coughs up. Sometimes what is retrieved is spot on. Other times, it’s off the mark but still somewhat useful. But more often than we’d like, what turns up is bubkes.
This, mind you, is when you have some idea of how to describe with words what you’re looking for. If words fail you, though, and all that comes to mind are terms like “thingamajig” or “doohickey,” things get much dicier. And, if you want to find something that looks like something else, you’re really out of luck.
We’ve learned to live with this gnarly version of search, more or less, but in our hearts we know there’s got to be a better way. What if we could search visually? Not by word tags that describe what we think we see in an image but by the actual visual content of that image? Why can’t a computer be more like us? That day has come.
It’s not as if this notion has been ignored by the giants of search such as Google, Microsoft and Yahoo. But to date their stabs at visual search, such as Google Image Search, have relied largely on how websites describe images in tags, which tends to produce random results. Google recently presented a scientific paper on a new technology to build a landmark recognition engine that could potentially yield significant improvement in the way Google processes images, but that technology is a long way from seeing the light of day as an actual product.
“Google and Microsoft have been hiring the best researchers in the field for about five or six years now,” says Alexandre Winter, CEO and co-founder of LTU Technologies, an image search and recognition company based in Paris and New York. They have made significant progress, he says, but they have yet to develop a technology that delivers accurate results. “Disruptive innovations don’t always come from market leaders,” he adds. “Keywords attached by people aren’t always accurate and categories they’re assigned to aren’t perfect either. Searching by images offers another way.”
LTU, which originated in a research lab in France in 2000, began its commercial life working with the French police to help them track down child pornographers. “They had a problem very similar to someone searching in a catalogue,” Winter says. “They had a huge database of images that had been seized.” Because child pornographers take great pains to mask their identities and addresses, he adds, the only way the police could discover clues was to find common points in the images, such as similar furniture or backgrounds.
“Our technology examines the pixel content of images, the different shapes, the structure, the texture, the colors, the arrangements,” Winter says. “We encode that into a bit of binary code that we call the image DNA. That image DNA is sort of a mid-level description of the image. We use that data to compare images and classify them and track them. We can actually compare image DNA pretty easily.”
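To make the idea of a compact, comparable binary code concrete, here is a minimal sketch in Python. It is not LTU's algorithm, which the company does not detail here. It assumes a grayscale image stored as a list of rows of brightness values, and the names `signature` and `hamming`, the 4-by-4 grid and the brightness test are all illustrative choices. Each grid cell contributes one bit, and two codes are compared by counting the bits that differ.

```python
def signature(pixels, grid=4):
    """Toy binary signature (an assumption, not LTU's image DNA):
    one bit per grid cell, set when the cell is brighter than the
    average of all cells."""
    height, width = len(pixels), len(pixels[0])
    cell_h, cell_w = height // grid, width // grid
    means = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [pixels[y][x]
                    for y in range(gy * cell_h, (gy + 1) * cell_h)
                    for x in range(gx * cell_w, (gx + 1) * cell_w)]
            means.append(sum(cell) / len(cell))
    overall = sum(means) / len(means)
    bits = 0
    for m in means:
        bits = (bits << 1) | (1 if m > overall else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


# Three hypothetical 8x8 test images.
split_lr = [[200 if x >= 4 else 50 for x in range(8)] for y in range(8)]
brighter = [[v + 10 for v in row] for row in split_lr]   # same scene, lighter
split_tb = [[200 if y >= 4 else 50 for x in range(8)] for y in range(8)]

print(hamming(signature(split_lr), signature(brighter)))  # 0
print(hamming(signature(split_lr), signature(split_tb)))  # 8
```

The lightened copy yields the same code as the original, while the differently arranged image differs in half of its 16 bits. Comparing two codes is a single XOR and a bit count, which is one plausible reason such comparisons can be done "pretty easily."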
Mid-level analysis, Winter believes, is the key to producing tangible results in real time. Comparing images at the pixel level, a low-level view analogous to working with the 1s and 0s of machine language, would yield descriptions too granular to be useful. And comparing them at a high level, analogous to working with semantic tags, would produce results too abstracted to be meaningful.
“If you take two images and compare the pixels,” Winter says, “you won’t get much information on how similar the two images are. We do a little bit of what your eye and your brain do. If you take two identical images and smooth one, on a pixel level they’re entirely different. But if you project them on a screen, your eyes and brain will see the similarity.”
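Winter's smoothing example can be reproduced with a small Python sketch. Everything in it is an assumption made for illustration: the 8-by-8 test pattern, the 3-by-3 box blur standing in for "smoothing," and the use of four block averages as a stand-in for a mid-level description. It does not represent LTU's method.

```python
def box_blur(img):
    """3x3 box blur; pixels at the border reuse their nearest neighbor."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            row.append(total / 9)
        out.append(row)
    return out


def identical_fraction(a, b):
    """Share of pixel positions whose values match exactly."""
    pairs = [(p, q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(p == q for p, q in pairs) / len(pairs)


def block_means(img, grid=2):
    """Average brightness of each cell in a grid x grid layout."""
    h, w = len(img), len(img[0])
    ch, cw = h // grid, w // grid
    return [sum(img[y][x]
                for y in range(gy * ch, (gy + 1) * ch)
                for x in range(gx * cw, (gx + 1) * cw)) / (ch * cw)
            for gy in range(grid) for gx in range(grid)]


def descriptor_distance(a, b):
    """Mean absolute difference between the two images' block averages."""
    da, db = block_means(a), block_means(b)
    return sum(abs(p - q) for p, q in zip(da, db)) / len(da)


# Dark left half, bright right half, with a fine checkerboard texture on top.
original = [[(200 if x >= 4 else 50) + 20 * ((x + y) % 2) for x in range(8)]
            for y in range(8)]
smoothed = box_blur(original)
# Same texture, but split top/bottom instead of left/right.
unrelated = [[(200 if y >= 4 else 50) + 20 * ((x + y) % 2) for x in range(8)]
             for y in range(8)]

print(identical_fraction(original, smoothed))    # no pixel survives unchanged
print(descriptor_distance(original, smoothed))   # small
print(descriptor_distance(original, unrelated))  # much larger
```

In this toy case the blur alters every single pixel, so an exact pixel comparison reports no overlap at all, yet the coarse block averages of the smoothed copy stay far closer to the original than those of the differently arranged image.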
Though LTU still generates half of its revenues from work with government agencies here and abroad, including the U.S. Department of Homeland Security and the Federal Bureau of Investigation, the company is expanding into the realms of ecommerce, consumer applications and mobile marketing. Their business model, Winter says, is “to provide technology to people who want to build applications that have image recognition.”
Eyealike, headquartered in Bellevue, Wash., took a different path into the world of visual-based search. “We got started in the digital similarities and recognition area in the dating world and avatar world,” says John Hafen, CEO of the privately held company, which develops solutions for facial recognition, image detection and video copyright surveillance. Today, social networking sites and retail applications are the largest users of his company’s technologies. Like LTU, Eyealike isn’t customer-facing. “We are the man behind the curtain,” Hafen says. “We sell the software equivalent of a circuit board.”
What makes visual search so difficult? “The big difference between visual search and text search is that it’s impossible lexicographically to sort visual information the same way you can for text-based stuff,” Hafen says. “It’s a huge difference, because you can’t have a sequentially ordered index. Image processing is very resource-intensive.”
Hafen likens their process to gold-panning, a giant screening process from coarse to fine based on trial and error that eliminates as much unessential information as possible. “There’s a lot of psychology baked into it,” he says. “A good analogy might be MP3 compression. Based on psychology, there’s certain information in music and audio you can’t hear anyway, so we can just throw it out. That’s why they call it psychoacoustic compression. It’s not a direct comparison, but it’s kind of the same sort of thing. We try to study the way the brain works and how the science of subjectivity works.”
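The gold-panning idea, screening from coarse to fine so that expensive comparisons run only on the few candidates that survive the cheap ones, can be sketched in Python as follows. This is an illustration under stated assumptions, not Eyealike's pipeline: the three descriptors (`overall`, `bands`, `full`), the single threshold of 20 and the 16-value toy images are all invented for the example.

```python
def mean(values):
    return sum(values) / len(values)


def overall(img):
    """Coarsest view: one number, the average brightness."""
    return [mean(img)]


def bands(img, parts=4):
    """Finer view: average brightness of each of four consecutive bands."""
    size = len(img) // parts
    return [mean(img[i * size:(i + 1) * size]) for i in range(parts)]


def full(img):
    """Finest view: every value."""
    return list(img)


def distance(a, b):
    return mean([abs(x - y) for x, y in zip(a, b)])


def screen(query, candidates, stages, threshold=20):
    """Apply descriptors from cheapest to costliest, discarding any
    candidate whose descriptor is too far from the query's."""
    survivors = dict(candidates)
    examined = []
    for describe in stages:
        examined.append(len(survivors))
        target = describe(query)
        survivors = {name: img for name, img in survivors.items()
                     if distance(target, describe(img)) <= threshold}
    return sorted(survivors), examined


# Hypothetical 4x4 images stored row by row as flat lists.
query = [50, 50, 200, 200] * 4
candidates = {
    "near": [55, 55, 205, 205] * 4,     # same layout, slightly brighter
    "dark": [20] * 16,                  # wrong overall brightness
    "bands": [50] * 8 + [200] * 8,      # right brightness, wrong layout
    "flat": [125] * 16,                 # passes both coarse tests
}

matches, examined = screen(query, candidates, [overall, bands, full])
print(matches)    # ['near']
print(examined)   # [4, 3, 2]
```

Each stage looks at fewer candidates than the one before it, so the most detailed comparison touches only two of the four images.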
Hafen believes a yet-to-be-tapped market for visual recognition technology is advertising on social network sites that use visual search to match ads to user-generated content. “If you think about the potential value, it’s enormous,” he says. “Think of the additional value if you could target people, not based on any tags or anything they’ve put in about their photos or videos, but just with us mining the data in those photos. That’s pretty valuable.”
Like LTU and Eyealike, Toronto-based Idée tries to identify those characteristics that make an image unique. Idée develops advanced image identification and visual search software. “For us to make an image or video searchable, we have to index it with our image recognition technology,” says Leila Boujnane, Idée’s CEO and co-founder. “We take what makes that image unique and distill that into a unique digital fingerprint. The fingerprint is like a keyword, but more precise,” she says. “It has a lot more information than a keyword.”
Idée’s technology is used by a number of media organizations such as Agence France-Presse and The Associated Press to track and monitor usage of their images, but the company now is developing retail applications and has launched a public beta version of a reverse image search engine, TinEye. “Finding an exact copy of something is fairly straightforward,” she says. “Finding a copy that has been modified is more complicated. Your fingerprints have to be more robust.” The robustness of image fingerprints received its baptism by fire when the AP used Idée’s technology to track down the original image source for artist Shepard Fairey’s iconic “Hope” collage portrait of Barack Obama that now hangs in the National Portrait Gallery in Washington. “They used our image recognition technology to identify the actual image that had been used,” Boujnane says. That identification played a key role in the copyright violation suit that the AP brought against the artist, which is still in litigation.
Like some autistic people, computers have problems registering emotion and nuance. This softer side of search is one of the next challenges facing visual search. “An algorithm will never be able to look at an image and say, ‘Here is a fingerprint that points to the fact that the two individuals in the image are smiling or unhappy,’” Boujnane says. “The biggest challenge with a fingerprint is that it doesn’t have human descriptors.”
Those human descriptors are what enable us to categorize images. In visual search today, LTU’s Winter says, “What we’re doing is matching an object with a reference database. You have to have the object in the database. The next big challenge is to be able to do a bit of softer recognition, which is recognizing objects of the same family just like you can do. Maybe you’ve never seen the latest model of Toyota, but you can tell it’s a car. What’s hard is to train a system to recognize an object it hasn’t seen before. That’s the final Holy Grail.”
The key to that, says Idée’s Boujnane, is to design algorithms that more effectively mimic the background processing that goes on in our brain when we look at images. “Algorithms that actually understand what they’re looking at,” she says. “Not the subject matter, because there is no such thing as a technology that understands it’s looking at a bird, but really understand what it’s looking at as far as the actual features.”
Ned Smith is a New York-based writer who reports on business and technology. He can be contacted at nedsmith@gmail.com.

