Search Takes a Sentimental Journey
“Entities are things like persons, locations and companies,” says Guillaume Mazières, vice president of sales and marketing for Temis. “Tokenization is what you need to understand the language.” Temis, headquartered in Paris, was founded by a team of IBM text mining researchers and is a nine-year veteran in the text analytics game. In a very general way, this is the text mining heavy lifting that has to take place before text analytics or sentiment analysis can kick in. Entity extraction is the core offering in Temis’ Luxid line of solutions for discovering information and extracting knowledge from unstructured data. “In Text Mining 360, our base product, we have 40 types of entities,” says Mazières. For Temis, he says, sentiment analysis is “how to automatically read a text and rate it in terms of positive, negative or neutral.” But value, he says, comes from taking it beyond simply questions of polarity. “What the customer is asking for is one step beyond, which is linking the thinking to a specific topic,” Mazières says. “Polarity is a good start, but you need to link it to a specific topic.”
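To make the idea of linking polarity to a topic concrete, here is a minimal Python sketch. It is not Temis’ method or the Luxid API; the word lists, the topic list and the function names are all hypothetical, and a real system would rely on full entity extraction and much larger taxonomies. Each sentence gets a crude polarity score, and that score is credited only to the topics the sentence mentions.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"great", "excellent", "love", "fast"}
NEGATIVE = {"poor", "terrible", "hate", "slow"}
TOPICS = {"battery", "screen", "service"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def topic_polarity(text):
    """Return {topic: score}, crediting each sentence's polarity
    only to the topics mentioned in that sentence."""
    scores = {}
    for sentence in re.split(r"[.!?]+", text):
        tokens = tokenize(sentence)
        # Crude polarity: positive word count minus negative word count.
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        for topic in TOPICS.intersection(tokens):
            scores[topic] = scores.get(topic, 0) + score
    return scores


print(topic_polarity("The battery is great. The service was terrible and slow."))
```

A single document-level score for that review would blur the two opinions together; scoring per topic keeps the praise for the battery separate from the complaint about the service.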
Just as in cooking or carpentry, it is the prep work that determines the quality of the outcome. In the case of text analytics and sentiment analysis, this means selecting the right data sources and loading the right taxonomies (definitions, classifications, lists of acronyms, etc.) into the software. “You need to understand the data to fine-tune the software,” Mazières says. “It’s not pure magic. You need to have someone define the right sources, to define the right topics and provide some value-added intellectual analysis on top of the software analysis. Figuring out what is positive and what is negative is where it gets interesting. That’s where the semantics come into play. Disambiguation comes from determining the context; it’s all about parsing.” The tab for a minimum Temis solution to do that would start at $150,000.
The oldest hand in the field of computational linguistics is Teragram, a division of SAS formed in 1997 in Cambridge, Mass. The company’s tagline, “Practical Solutions to Monstrous Amounts of Information,” frames the issue in terms that echo the way Jodange’s Levy defines the information overload problem.
Teragram’s Sentiment Analysis Manager (SAM), launched in the third quarter of 2009, is the newest entrant in the field. “Teragram’s SAM has a hybrid approach that essentially looks at using a statistical and linguistic approach to sentiment analysis,” says Manya Mayes, SAS’s chief text-mining strategist. “We’re not aware of anyone else in the industry who’s doing that.”
Mayes agrees with Mazières that it’s the upfront work that can make or break sentiment analysis. “You have to use the right sentiment analysis model,” she says. “For example, ‘long hair’ would probably not register as a positive if you were looking at a military population, where your hair is expected to be quite short.” All sentiment analysis solutions need to be particularized for context to be successful, she says.
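Mayes’ “long hair” example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how SAM works: the phrase weights, the override table and the function names are invented for the example. The point is that the same phrase scores differently once a domain-specific lexicon is layered over a general one.

```python
# Hypothetical phrase weights: positive > 0, negative < 0, neutral = 0.
GENERAL = {"long hair": 0, "short hair": 0, "friendly": 1, "rude": -1}

# Hypothetical overrides for a military population.
MILITARY_OVERRIDES = {"long hair": -1, "short hair": 1}


def build_lexicon(base, overrides):
    """Return a copy of the base lexicon with domain overrides applied."""
    lexicon = dict(base)
    lexicon.update(overrides)
    return lexicon


def score(text, lexicon):
    """Sum the weights of every lexicon phrase found in the text."""
    text = text.lower()
    return sum(weight for phrase, weight in lexicon.items() if phrase in text)


military = build_lexicon(GENERAL, MILITARY_OVERRIDES)
sentence = "The recruit showed up with long hair."
print(score(sentence, GENERAL), score(sentence, military))
```

The general lexicon treats the sentence as neutral, while the military lexicon flags it as negative, which is the kind of particularizing for context Mayes describes.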
By combining rules-based and linguistic methodologies, Teragram believes it has created a customizable solution that offers the best of both worlds. “A rules approach has lots of limitations,” says Gaurav Verma, SAS’s global products manager. “Like any rules-based methodology, it requires governance. Most importantly, rules are really very good for recurring events. By taking a hybrid approach with SAM we’re looking at rules-based but we’re also looking at more linguistics-based techniques, so we’re able to refine those rules. The rules-based approach means you’re defining things you know. The statistical linguistic approach allows for discovery of information you might not necessarily have thought to look for. We can’t expect domain experts within companies to think of every single rule that would be necessary for building the sentiment analysis model.”
Though SAS’s marketing team is positioning SAM initially as a social media analysis tool, that’s more a reflection of customer demand than SAM’s analytical potential. “Right now there’s a lot of push toward social media,” Mayes says, “but the tool can be equally useful with other sources.” The starting price for SAM is $160,000.
The future? Is search convergence on its way? “Sentiment analysis is no short-term hot trend,” Richard MacManus writes in ReadWriteWeb. “It will eventually become a key feature of search engines, which will integrate the aggregate sentiment of the crowd into search results.”
That’s as good as it’s going to get until we reach the transparent information nirvana of the semantic web as envisioned by Tim Berners-Lee.

