Hulu Hops Onto Caption Search Technology

Image courtesy of Hulu
By Dave Fidlin
With the Internet maturing and online video becoming the norm - not a novelty item - enhancements are being created for streaming content. Caption searching, one of those enhancements, was added recently to the popular site Hulu. But a smattering of smaller companies had already been offering a similar service before Hulu added the feature to its site in late December.
Hulu Labs, the online video portal’s development arm, is running the feature in beta while engineers continue to work on perfecting the technology.
In a statement, Eugene Wei, Hulu’s vice president of product, says the feature will enable users to search the captions of thousands of videos across hundreds of shows. For now, Wei says users can access the feature through Hulu Labs’ website. Eventually, a feature tab will be included on all Hulu shows that have captions.
Wei points out how caption search might benefit a viewer of an online video. He says he was recently watching an episode of “House” that featured a joke involving main character Dr. Foreman and Pittsburgh Steelers coach Mike Tomlin. Wei says he was in a quandary when he wanted to watch the episode a second time.
“I couldn’t remember which episode it was in, let alone which moment within the episode,” Wei says in the statement. “With the new caption search, I just type in ‘Mike Tomlin,’ and voila.”
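The lookup Wei describes can be pictured as a text search over timestamped caption lines. The following is only a minimal sketch under that assumption, not Hulu's implementation; the `Caption` record, the `search_captions` function and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    episode: str   # hypothetical episode label
    start: float   # seconds from the start of the video
    text: str      # caption line as displayed


def search_captions(captions, query):
    """Return (episode, start) pairs for caption lines containing the query."""
    needle = query.casefold()
    return [(c.episode, c.start) for c in captions if needle in c.text.casefold()]


# Made-up sample data for illustration only.
captions = [
    Caption("Episode A", 12.0, "Good morning, everyone."),
    Caption("Episode B", 845.5, "You sound like Mike Tomlin at halftime."),
    Caption("Episode B", 851.0, "Is that a compliment?"),
]

hits = search_captions(captions, "mike tomlin")
```

Each hit pairs an episode with a playback offset, which is what would let a player jump straight to the moment in question. A production system would presumably use a proper search index rather than a linear scan.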
Jason Blackwell, practice director with ABI Research, says features like Hulu’s caption search will be an asset to users as online content grows exponentially each year.
“It’s going to be important for all service providers,” Blackwell says. “I think (Hulu’s caption search) is an important step in what will become the future.”
Blackwell says ABI will soon begin to study features like caption search and examine specifically how they can benefit not only online video but also other specialized services, such as pay TV and other cable and satellite services with a user interface.
While Hulu’s caption search is breaking new ground for the company, it is not a pioneer in the technology. Numerous smaller companies have offered similar services, some for as long as five years.
One of those companies, Realtime Transcription Inc., offers a service called Transendia. Tanya Ward English, technology director for Realtime Transcription, says Transendia offers a video search based on the full text of a transcript, in addition to glossary tags for non-spoken information within a video.

Image courtesy of RealTime Transcription
Ward English says Transendia caters to a higher-end market. Many of the company’s clients do not want videos hosted on such public sites as YouTube. Instead, she says, those clients opt to use a customized player with full search and playing options from Transendia.
“One of the most useful and unique things about our technology, I think, is something upon which we have a patent pending,” Ward English says. “That’s the ability to pin-point search video or audio files directly from a text search engine like Google or Bing.”
Ward English, a self-described advocate for people with hearing loss, says caption search technology is especially beneficial for online users who have such a condition.
Blinkx, another smaller, specialized company, was launched in 2005 and went public two years later. Suranga Chandratillake, CEO and founder of Blinkx, says his company offers an advanced video search engine feature that gives online video users an opportunity to not only search for captions within a program, but to look for titles and episode names of particular programs.
“We can extract a lot of information from what’s going on inside a video,” Chandratillake says. “Our video search engine doesn’t just work on our own site. It can also get results from other sites like YouTube, or any other site with video out there.”
Chandratillake says Blinkx has been a popular service with advertisers, and revenue has doubled in the past three years.
More companies are sure to join Hulu, Realtime Transcription, Blinkx and others as the quest continues to make caption search technology an integral part of online video.
UK Launches Semantic Data Site: Will the Rest of the World Follow?

Image courtesy of http://www.opte.org/maps/tests/
By Linda Broughton
Sir Tim Berners-Lee, credited with conceptualizing if not inventing the World Wide Web, is not finished yet.
Berners-Lee and Professor Nigel Shadbolt, both appointed as Information Advisors to the UK Government, are coordinating Data.gov.uk. Data.gov.uk plans to provide the general public with a single access point to all the United Kingdom’s national, regional and local statistics, surveys and studies - all the information in the UK collected under the umbrella of the national government.
In a passionate speech at TED, Berners-Lee explains that no data is an island - it’s the relationships between different data that make the data valuable, insightful and useful. And these relationships are not often visible to the human eye - if anything, our own preconceptions and individual personalities often interpret rather than understand the relationships between different data, closing us off to real social trends and concerns.
Berners-Lee wants to use the Web to bypass the human attempts to dress up data for personal or political purposes. Instead, he wants the Data.gov.uk platform to expose the real significance buried within the data-to-data relationships through encouraging the digital mapping of raw, unadulterated social, political and economic data. Berners-Lee calls this ‘linked data,’ a precursor to the semantic Web that he believes will one day explain the meaning behind how we collect, calculate, evaluate and use the data that we post online.
Not all of these data relationships will be obviously significant - imagine an application that maps the relationship between data discussing the annual amount of Cadbury chocolate sold in Yorkshire and the corresponding annual number of failed marriages within the same geographical location and time period. However, the project is expected to help identify unexpected and insightful data relationships, insights that would normally take several decades and hundreds or thousands of brilliant social scientists, statisticians, psychologists, focus groups and public policy experts to simply suspect.
The automatic data relationship mapping will allow the UK government and the UK public to discover connections, trends and causal relationships that will inform public policy for decades to come. If implemented properly, Data.gov.uk could come very close to comprehensively mapping and explaining the past and real-time behavior of the public - and thus allow platform users to accurately plan the future of a society.
Of course, the key to the concept is the raw data. Currently, there are already third-party applications that map roads and potholes in the UK, provide statistics and information about the location of doctors throughout the UK, and give up-to-date information about local schools. This is useful data to aggregate but it is not yet generating anything that a few quick, targeted searches online can’t. The next push will encourage interested developers to create applications linking the different data to generate data relationship maps that give researchers at think tanks and academic institutions something to ponder and investigate.
Supporters of the open data movement are urging private businesses to follow the UK government’s example and release their raw data to the public. If the private sector keeps its data too close and too secret, the sector risks losing potential profits that would arise from information generated by an external comparison and review of its aggregated data. Moreover, the private sector makes up an important part of modern society. Accurately identifying, explaining and impacting public trends requires more than the government’s analysis of the population’s behavior; it requires an accurate understanding of the public’s practices in commerce and industry.
Data.gov.uk may soon be the way for the UK government (and anyone else interested) to keep several fingers not only on the pulse of modern UK society, but on its stomach, windpipe, eyes, mouth, ears, etc. The platform and its third-party applications may soon provide an in-depth and automatic monitor of modern British, Scottish and Northern Irish daily public life. Do the industries want to jump in now, developing their own applications and supplying their own data to complete the public picture, or will there always be a yawning gap in the data buried in the private sector’s own digital databanks?
A Mind of Its Own: Search Engine Technology Ever Pervasive

By James Zipadelli
Americans performed more than 15 billion searches in January, which is up 3 percent from the month before, the audience measurement service comScore says. The latest search engine rankings show that Google is still king when it comes to search engines. “Google Sites accounted for 9.9 billion searches, followed by Yahoo! Sites (2.6 billion), Microsoft Sites (1.7 billion), Ask Network (574 million) and AOL LLC (375 million),” the release says.
Although Google spokesperson Nate Tyler declined to comment on Google’s numbers, he did say that Google Suggest Technology is an effective way to help users search for what they are looking for.
“As you type into the search box on Google Web Search, Google Suggest offers searches similar to the one you’re typing. Start to type [ new york ] — even just [ new y ] — and you’ll be able to pick searches for New York City, New York Times, and New York University (to name just a few). Type some more, and you may see a link straight to the site Google thinks you’re looking for — all from the search box,” Google’s Help Forum says. (Ask.com and Microsoft were not available for comment at press time.)
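The behavior described in the Help Forum text, offering completions for whatever has been typed so far, can be illustrated with simple prefix matching over a sorted list of known queries. This is only a sketch of that general idea; the `suggest` function and the sample queries are hypothetical, and the source says nothing about how Google Suggest actually selects or ranks its suggestions.

```python
from bisect import bisect_left


def suggest(prefix, queries, limit=3):
    """Return up to `limit` known queries that start with `prefix`."""
    ordered = sorted(q.casefold() for q in queries)
    needle = prefix.casefold()
    # Binary search finds the first entry that is not smaller than the prefix.
    start = bisect_left(ordered, needle)
    matches = []
    for candidate in ordered[start:]:
        if not candidate.startswith(needle):
            break  # sorted order means no later entry can match either
        matches.append(candidate)
        if len(matches) == limit:
            break
    return matches


# Made-up query list for illustration only.
known_queries = ["New York Times", "new york city", "New York University", "news", "netflix"]
results = suggest("new y", known_queries)
```

Here the suggestions come back in alphabetical order; a real service would more plausibly order them by popularity or some other relevance signal.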
Kevin McFall, co-founder of the vertical search engine RushmoreDrive.com, says the level of difficulty “is pretty high” for anyone trying to gain a share in the search engine market because established search engines spend large amounts of money on marketing and advertising. However, he says there are ways a new search engine can differentiate itself from its competition. RushmoreDrive.com was a sister site of Ask.com and was shut down in June 2009 due to the recession.
“One must position the value of one’s search in such a way as to change existing behaviors and habits of those who already use Google, Yahoo, AOL or Bing by offering them a reason to change and then delivering a rich enough experience to warrant their frequent return,” McFall says. “One must also realize that instead of taking on the major search players head on, one must find a way to backdoor them to get a slice of the market share instead of trying to compete directly.”
McFall says he was able to do this with RushmoreDrive.com by marketing the website as both a discovery engine and a search engine. “We achieved the ability to deliver a richer and more relevant set of results through our unique index and page ranking algorithm, along with a distinguished universal results page, which delivered text, image, video and blog results all in one page,” McFall says.
He also suggested search engines that have a social component would be more successful long-term.
There are also specialized websites that find search engine technology useful. For example, Healthline Networks uses search engine technology to help customers with health and drug information.
Healthline Networks CEO West Shell says, “We’ve found out that consumer search can be complicated when it comes to health. Consumers and doctors speak different languages, and often consumers don’t know what to look for when they start.”
Shell says the technology Healthline Networks uses is based on “semantic taxonomy,” or classification, of health information. He also says the technology is always being updated to ensure customers have the latest information available and that the company partners with health carriers like Aetna.
Rich Kahn, CEO of the search engine eZanga.com, says his search engine is being redesigned and should be finished by late 2010.
The redesign allows eZanga.com to “significantly increase the number of sources we pull information from, improve our relevancy algorithm so that our results will be more accurate to the queries performed by our users [and] designing new technologies, that are not used by any other search engine at present, that will improve how we display our results to users in a way that will be more useful to our users,” Kahn says.
Google Steps Up White Space Chase

Image courtesy of Google, Inc.
By John Greaves
Google recently made a significant move in the battle over white space by proposing to the FCC that it serve as one of several administrators of a TV bandwidth’s geo-location database. This is in response to FCC filings requiring such a database to minimize signal interference between white space devices and broadcast signals.
This is a key component of the fight to make unused white space available for bandwidth expansion, as signal interference is one of the key arguments made against such a move by white space opponents like the National Association of Broadcasters.
According to the proposal, Google would build and oversee the database as well as provide data repository, registration and determination of available channels/query process.
Having Google, a major industry player, be one of the administrators raises the question of whether letting Google keep track of what channels are available for bandwidth as well as the location and identification of all white space devices might not run counter to an open and competitive marketplace.
However, Rick Rotondo, CMO for Spectrum Bridge, which deployed the world’s first TV white spaces network and also applied to be a database manager, says Google’s proposal does not surprise him.
“It’s a competitive environment,” Rotondo says. “But I think Google is actually trying to make the playing field more level.”
In a Jan. 8 blog on the Spectrum Bridge site, Rotondo had previously explained that the FCC is not granting unlimited authority in authorizing white space database managers.
“Specifically, in the case of TV white spaces database managers, the FCC is authorizing companies to represent themselves as being able to meet the minimum requirements the FCC has set out in its previous ‘Report and Order,’ as well as some new requirements spelled out in the recent Public Notice,” the blog states.
Rotondo’s comments echo Google’s official position on this issue. “I see myself here, in D.C., as playing kind of a defensive role, in terms of maintaining open platforms where they exist today, and more offensively, trying to be constructive in terms of creating new platforms and particularly ensuring that those new platforms also remain open to innovation and consumer choice,” says Richard Whitt, Google’s Washington Telecom and Media Counsel, in a January 2009 interview for CircleID.com.
That view has not changed. A Google spokesperson who asked not to be identified tells DMB, “We strongly supported the FCC’s decision to adopt an open, unlicensed model for white spaces, similar to what exists for Wi-Fi. Much as open, unlicensed access to the Wi-Fi spectrum has led to its widespread use, open, unlicensed access is also crucial to fulfilling the potential of white spaces.”
Whitt further supports the argument. “From the Google perspective, the C-Block, to us, was a successful story. We came into it with the hopes of triggering the openness conditions, by making the bid that would enable that to happen, which we did.”
Some may wonder if Google’s move signals the impending doom of 3G and 4G technology. However, such fears are unfounded, according to Digital Society analyst George Ou, a former network engineer who built and designed wired network, wireless network, Internet, storage, security and server infrastructure for various Fortune 100 companies.
“The idea that you’re going to take unmanned white space and replace 3G and 4G is wrong. The guys that like white space always talk about propagation, and how Wi-Fi doesn’t have enough propagation,” Ou says. “What they don’t understand is propagation sounds like a good thing but it’s a bad thing. The more you can propagate the less you can use. The lower the power, the more the signal dies as it gets further away, the more you can use the same spectrum channels. That’s why Wi-Fi has such pathetically low signal strength, so you can reuse the same signals.”
Google tends to take the opposite view. A Google spokesperson says, “This spectrum is extremely valuable and has the potential to transform the way we connect to the Internet. As Larry Page has put it, ‘white spaces’ are like ‘Wi-Fi on steroids’ - wireless Internet with much faster speeds, stronger signals and more affordable costs.”
As far as the database is concerned, Spectrum Bridge’s Rotondo points out that the vague nature of the FCC’s ruling has left the door wide open for interpretation. “There is nothing saying you have to tell a device the best channel, you just have to give a list of available channels,” he says.
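The kind of query Rotondo describes, where a device reports its location and gets back a list of available channels rather than a single recommendation, can be sketched roughly as follows. Everything here is an assumption made for illustration: the circular protection zones, the `available_channels` function and the sample entries are invented and do not reflect the FCC's actual rules or any administrator's database.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def available_channels(lat, lon, all_channels, protected):
    """List channels with no protected broadcaster covering the given location."""
    blocked = {
        channel
        for channel, s_lat, s_lon, radius_km in protected
        if distance_km(lat, lon, s_lat, s_lon) <= radius_km
    }
    return [ch for ch in all_channels if ch not in blocked]


# Invented entries: (channel, latitude, longitude, protected radius in km).
protected = [
    (21, 33.75, -84.39, 100.0),
    (23, 40.71, -74.01, 100.0),
]
channels = available_channels(33.92, -84.84, [21, 22, 23, 24], protected)
```

The function returns every open channel and leaves the choice to the device, which mirrors the point that nothing requires the database to name the best one.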
Google says it has yet to hear from the FCC regarding its proposal, which is understandable due to the enormity of creating a national broadband plan and presenting it to Congress in time for the new March 17 deadline.
John Greaves is a writer living in Dallas, Ga. His work has appeared in newspapers, magazines and websites.
Competing With Google is Like Dancing With Elephants

Google image courtesy of Google, Inc.
By Ned Smith
Google. It’s the master of all it surveys, an 800-pound elephant that’s not afraid to flex its muscles. “No company inspires more awe, or fear,” wrote Ken Auletta in Googled: The End of the World as We Know It. No one knows that better than the companies large and small that have had the temerity to challenge Mountain View’s dominion.
The landscape is littered with names of wannabes who became also-rans when they attempted to compete with Google in the area of its core competency, Internet search. Names like Alta Vista, Lycos, InfoSeek, Excite and Inktomi. They may still be around, but their glory has been eclipsed.
Search was just the beginning for the Google octopus. Google’s interests are universal and the boundaries of the realm of products and services it offers expand daily; if it’s got a pulse and can be digitized, Google’s there. Gmail, Google Docs, Picasa, YouTube, Google Maps, Google Voice, Chrome, Android, Nexus One, etc., etc. You get the picture.
Is the past prologue for companies audacious enough to compete with Google’s far-flung empire, or is peaceful coexistence a possibility? Can you dance with an elephant without getting your toes stepped on?
Sridhar Vembu knows all about competing on the periphery of the Google empire. He is the CEO and founder of Zoho, a company that offers a comprehensive suite of Web-based apps ranging from word processing to spreadsheets that encroaches on the turf of Google Docs. Yet Zoho maintains cooperative, albeit competitive, relations with Google and, in fact, has integrated Google Docs with Zoho CRM, Zoho Mail, Zoho Docs and Zoho Projects.
“It works both ways,” Vembu says. “The elephant helps pave the way through the jungle, as long as we avoid getting crushed. And as you will notice, we dance with the elephant well.”
That kind of cordial entente may be possible when you’re not targeting the heart of the empire. But what happens when you’re aiming squarely at the secret sauce? Search, for example. Do you still want Google on your dance card?
That’s an issue faced by Jodange, a recent entrant into the world of sentiment analysis, a subset of the search universe that is gaining traction. In the not-so-distant past, there was a quasi-paranoid fear in the search community of being gobbled up by Google. “I think a lot of people get really nervous about interacting with Google, much in the same way they got nervous about interacting with Microsoft,” says Larry Levy, Jodange’s co-founder, chairman and CEO. The fear, he says, was that the larger player would just copy what you had because they had more resources and clout.
That was then. “The thought today,” Levy says, “revolves around the fact that there are all these partnerships that these companies need to engage in so they can push various capabilities out there with some degree of timeliness. If they see something they like they almost always deal with small companies as an external R&D resource.” The “not invented here” barrier to adoption is largely a thing of the past.
But what is in store for the future? Is Google too large to fail or can it be toppled by a more nimble organization that’s forming in a garage somewhere? “It’s been done before,” Levy says, citing Google’s successful assault on Yahoo, the previous 800-pound elephant in the search world. “The question is, can it be done again? It’s the young guys that think they can rule the world. Some of them will. The vast majority won’t.”
Facing off with Google is always a complicated calculus. Being acquired by the giant is one option, but not one that’s universally attractive. “Our preference and intent is to stay independent, competing on value, depth and integration,” Zoho’s Vembu says. “Cloud computing will see consolidation. That is inevitable in any industry. Our own focus is to navigate these waves with a focus on solid engineering, outstanding product execution and a strong focus on serving our customers well. We believe those are the traits that will help us stay vibrant and healthy long term.”
And there will be casualties, companies that flare like sunspots in the Google universe and then quickly fade into oblivion. “There are many companies that have a built-to-flip model, with an extreme short-term focus,” Vembu says. “Some of them get lucky and hit the jackpot, while others peter out. We believe a strong R&D and engineering is essential to survive long term. All too [often], vendors focus too heavily on marketing, at the expense of R&D. We believe that business model has a short shelf life.”
“It’s an ecosystem,” Levy says. “With an ecosystem, you get the people who are producing and the people who are consuming. Small companies just need to be these producers and get themselves to the point where they’re either big enough to break out or get consumed by one of the big boys.”
If you look at Google’s acquisition pattern, it’s easy to see how good, small ideas can be more powerful when folded in with larger ideas. “Let’s talk about Wolfram Alpha,” Levy says, referring to the “computational knowledge engine” launched last year by Stephen Wolfram, a British physicist, software developer, mathematician and computer programmer. “Do you think they have the potential to be successful on their own? I question that. It could be quite compelling, but it’s not compelling on its own.”
The problem is when companies think they can build a business, but in actuality they are part of another, larger solution, Levy says. And he’s clear-eyed about Jodange’s role in a firmament where the Google sun is the main source of light. “There are pieces of Jodange that fit into that category. We’d probably be more powerful as part of a broader offering than to try to do it on our own. This is a struggle small companies have continuously as they’re trying to define who they are,” Levy says.
Ned Smith is a New York-based writer who reports on business and technology. He can be contacted at nedsmith@gmail.com.
Is Real-Time Search Working?

Image courtesy of OneRiot
By John Greaves
Real-time search lets the user seize timely information from a quick-flowing stream of data. According to analysts Matt Booher and Ilona Vijnik, “It’s a step toward giving users the ability to access highly relevant and fresh information, delivered in real time from across the Web.” From the proliferation of real-time search engines like Wowd and Collecta to Google’s recent entrance into real-time search, it seems companies are leaping at the chance to tell us everything immediately.
Booher and Vijnik, both of Bridge Worldwide, a Cincinnati-based digital marketing agency, say the main question to ask when judging whether real-time search is working is how the information it provides furthers a specific objective.
This includes users who tracked the development of the Kanye West-Taylor Swift story on Collecta and those who shared YouTube videos through OneRiot after learning of Michael Jackson’s death, as well as those who followed weightier issues.
“What’s happening in Yemen right now you can’t find through traditional search,” says Gerry Campbell, CEO of Collecta.
So is real-time search succeeding? According to Tobias Peggs, GM of OneRiot, “Our definition of success for real-time search is ‘Is it adding user value?’ and from a business perspective can that user value be monetized?”
One monetization model involves having clients pay for real-time services. That’s the model used by Jobvite, a real-time search recruitment tool, which boasts it can give job recruiters a real-time picture of prospective candidates based on their total Web presence. Jamie Glenn, chief products officer at Jobvite, says real-time search is crucial to the company’s success. “Recruiters can use our tech to go across social Web to find people who fit a job to see if they’re interested. Once they find that person on open Web or social network they can see that person’s profiles on Facebook, Twitter and LinkedIn.”
Adam Hyder, CTO of Jobvite, says using real-time search enables recruiters to begin to build relationships with candidates. “We had a recruiter who saw a candidate’s wish list on Amazon and he sent a book from the wish list as a gift to that candidate,” Hyder says.
However, not just vertical search engines want to monetize real-time search. According to OneRiot’s Peggs, monetization makes relevancy crucial to making real-time searches more than a novelty item. “If you’re delivering a fire hose of information with no relevance attached to it, it’s kind of hard to figure out what advertising should go with that,” Peggs says.
Analysts Booher and Vijnik say current hot searches include whale wars, Richard Blumenthal and Nexus One for Verizon while hot topics include Microsoft Office 2010 and Nexus One Google phone. This is important data for marketers who use it to tailor ad campaigns to popular searches.
Sites like OneRiot are actively trying to monetize those searches and willingly collaborate with marketers to do so. “If you’re a developer you can grab our API and use our real-time search results from it, we provide tech support, we in addition offer real-time ads, if you’re a developer who’s proved your users like real-time ads we can do a 50/50 revenue share,” Peggs says.
Monetization requires consumer loyalty. Mark Drummond, CEO of Wowd, says providing relevant information brings users back to your site. “People want to be engaged with material events as they occur. A lot of search engines say we’ll vomit at you all the tweets with these words in it. If I want tweets, I’ll go to Twittersearch.”
Unfiltered searches not only detract from relevance but also attract spammers, according to David Evans, CTO of deeplocal.com, which produces the newspaper search engine NewspaperNinja.com. “The problem here is mostly dataset. If we’re seeing a hard time detecting spam in email, we’ll see a really hard problem for Bayesian detection algorithms at about 20 words to sample from. Especially when we’re abbreviating everything and jargon is changing at an alarming rate,” Evans says.
Add to that tweets from disgruntled customers, according to Dave Conklin, president of Internet Marketing for ProspectMX. “As human beings we don’t tweet when something is working. The second something goes bad, you tweet about it and you get a skewed point of reference in real time,” Conklin says.
Some sites fight spam by ranking links based upon user feedback. This is the approach used by Jobvite and Wowd. OneRiot’s approach indexes results based upon investigating links to see what’s behind them. Various websites have documented Google’s real-time vulnerability. Google acknowledges the danger from spam but a Google spokesperson says, “Google search aims to show users the most relevant results for a given query. We apply the same high standards to real-time that we do to the rest of the web. There are always unscrupulous people who will try to game our ranking systems, but as always we are uniquely equipped to suppress spam content.”
Spam is also an issue for Internet marketers who post multiple times to keep up with random tweets. They have to be careful to avoid looking like spam. Bob Bentz, who monitors placement for Advanced Telecom’s mobile marketing product www.84444.com, says unstable rankings are behind multiple postings in Google, Bing and Yahoo on many of his company’s most important keywords. “Google rankings are changing sometimes several times per day! Yahoo is not changing daily and Bing rarely changes, although more often than Yahoo does,” Bentz says. “What it says to me is that we need to continue to send new content out there several times per day to keep our rankings active on the first page.” Bentz hopes filtering systems such as Wowd’s panel rank can differentiate between robotic postings and original content.
“I think sometimes we’re concentrating on SEO and we don’t see the forest for the trees. If your stuff is seen in blog posts and articles and someone sees it and wants to come buy your product that’s the goal, not ranking,” Bentz says.
Profitability is what the future of real-time search is about, according to OneRiot’s Peggs. “2009 was about proving that real-time search adds value. 2010 is about monetization.”
John Greaves is a freelance writer based just north of Atlanta, GA. His work has appeared in newspapers and on the Web.
