Why should I care?
The contemporary business world is a place where huge success and failure sit side by side. In traditional market research, businesses spend huge amounts analyzing customers’ opinions through continuous surveys and consultants. Nowadays, social media gives businesses a far richer source. Most existing and potential customers generate a treasure trove of data through Twitter, Facebook, LinkedIn and so on. Sentiment analysis is a powerful tool for mining the gold beneath the social media landslide. The goal of sentiment analysis is to identify the opinions expressed in a text.
It seems easy, right?
- I’m happy to watch a new movie
- I hate war
- I’m happy and excited to be going to watch a new movie
- I hate war and the violence it makes
- Ice cream shops are doing great even when the weather is bad
When SA turns hard
Sometimes it’s not possible to identify the opinion just by analyzing the polarity of individual words. Language usage and sarcasm are two of the reasons why sentiment analysis turns hard. Mixed sentiments in a single text are tough to analyze, and sarcastic statements are difficult to classify as positive or negative. As human beings, we recognize sarcasm and understand what it really means. A neural network or any other machine learning framework that produces a simple classifier based only on word-level sentiment would fail miserably on such text.
Another point to consider is local dialects. If you feed dialect text to a typical neural network, it will invariably fail to understand what the speaker is trying to say, because some dialect words carry no meaning for the model and it’s tough to train on anything and everything. A pre-trained model tested on such data would therefore fail completely. This is one of the reasons why it’s important to understand the local culture, and why some companies are setting up local data centers where local sentiment is captured.
Natural language has a lot of ambiguity. From the above examples, it’s clear that words make sense contextually in natural language: humans can comprehend and distinguish these meanings easily, but machines can’t. This makes Natural Language Processing one of the most difficult and interesting tasks in AI.
Using Natural Language Processing
- Spell check and grammar check
- Predictive text
- Auto summarization
- Machine translation
- Sentiment analysis
Some common approaches to sentiment analysis
Machine learning and Natural Language Processing offer various methods for sentiment analysis. Some of the most effective approaches we have today rely on a human in the loop: learning from user feedback. Combining machine-driven classification with human-in-the-loop feedback yields higher accuracy than purely machine-learning-based systems.
Tools & Libraries
Python’s scientific computing libraries, such as SciPy and NumPy, have strong support from the academic world. Python is very well established in this field and is often chosen for its expressiveness and ease of use.
We too have tools…
Much of the data that machine learning algorithms need for NLP tasks such as sentiment analysis and spam filtering comes from the web. Ruby has a web framework that is quite popular, and applications built on it generate massive amounts of data. While Ruby doesn’t have the same vast academic network that Python or R has, it does have tools, with the added benefit of being easy to learn and comprehend.
Sentimental gem
https://github.com/7compass/sentimental
The Sentimental gem was introduced for simple sentiment analysis with Ruby. It implements a lexicon-based approach to extracting sentiment, where the overall sentiment orientation of a text is the sum of the sentiment orientations of its words (tokens). The overall sentiment of a sentence is output as positive, negative or neutral. The gem uses a dictionary of pre-tagged lexicon entries. The input text is split into tokens by the tokenizer, and each token is matched against the pre-tagged entries in the dictionary. To classify sentiments we can set a threshold value: scores greater than the threshold are considered positive, and scores less than it are considered negative. The default threshold is 0.0. If a sentence has a score of 0, it is deemed “neutral”.
    gem install sentimental
Consider the following example
    require "sentimental"

    # Create an analyzer and load the gem's default dictionary
    analyzer = Sentimental.new
    analyzer.load_defaults

    sentiment = analyzer.sentiment 'Be the reason someone smiles today'
    score = analyzer.score 'Be the reason someone smiles today'
    puts sentiment, score
It outputs
    positive
    0.225
It works well for a simple sentence. But consider another example with mixed polarity.
    require "sentimental"

    analyzer = Sentimental.new
    analyzer.load_defaults

    sentiment = analyzer.sentiment 'Icecream shops are doing good even at bad weather'
    score = analyzer.score 'Icecream shops are doing good even at bad weather'
    puts sentiment, score
We get the output as
    negative
    -0.4194
We expect a positive result here, but the analysis failed. The overall score is the sum of the scores of the individual opinion words. In the gem’s lexical dictionary, good is assigned a score of 0.6394, bad is assigned -0.5588, and the token weather is assigned -0.5. Hence the overall sentiment score is 0.6394 - 0.5588 - 0.5 = -0.4194. The gem was found to work well for simple sentences, but failed to give accurate results for sentences with mixed polarity.
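To make the summation concrete, here is a minimal sketch of lexicon-based scoring in plain Ruby. It does not use the Sentimental gem and is not its implementation: the LEXICON hash, the tokenize, score and sentiment helpers, and the simple regex tokenizer are all assumptions made for illustration. Only the three word scores come from the example above.

```ruby
# Hypothetical mini-lexicon. Only these three scores are quoted in the
# article; a real dictionary would hold thousands of pre-tagged entries.
LEXICON = {
  'good'    => 0.6394,
  'bad'     => -0.5588,
  'weather' => -0.5
}.freeze

# Assumed default threshold, as described for the gem.
THRESHOLD = 0.0

# Naive tokenizer (an assumption): lowercase the text and keep runs of letters.
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Sum the score of every token; words missing from the lexicon count as 0.
def score(text)
  tokenize(text).sum { |token| LEXICON.fetch(token, 0.0) }
end

# Compare the total against the threshold to pick a label.
def sentiment(text)
  value = score(text)
  return :positive if value > THRESHOLD
  return :negative if value < THRESHOLD
  :neutral
end

text = 'Icecream shops are doing good even at bad weather'
puts sentiment(text)       # negative
puts score(text).round(4)  # -0.4194
```

Because every token contributes independently, the two negative words outweigh the single positive one, even though a human reads the sentence as good news for ice cream shops.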
Sentimentalizer gem
https://github.com/malavbhavsar/sentimentalizer
The Sentimentalizer gem implements sentiment analysis in Ruby with machine learning. It basically trains a model for use in the application. Machine-learning-based analysis is attracting more interest from researchers because of its adaptability and accuracy. It overcomes the performance degradation that limits the lexical approach, and works well even when the dictionary size grows rapidly.
    gem install sentimentalizer
We need to train the engine in order to use it.
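    require "sentimentalizer"

    class Analyzer
      def initialize
        # Train the engine once, when the analyzer is created
        Sentimentalizer.setup
      end

      def analyze(text)
        # Passing true gives the JSON-style result shown in the output
        Sentimentalizer.analyze(text, true)
      end
    end

    puts Analyzer.new.analyze('I love Ruby')
This outputs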
    Training analyser with +ve sentiment
    +ve sentiment training complete
    Training analyser with -ve sentiment
    -ve sentiment training complete
    {"text":"I love Ruby","probability":0.8568115588520226,"sentiment":":)"}
The overall sentiment is positive, which is indicated by ":)". But this method faces challenges: designing the classifier, the availability of training data, and the correct interpretation of a new phrase that is not in the training dataset.
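To illustrate what "training a model" can mean, and why unseen phrases are a problem, here is a minimal sketch of a Naive Bayes text classifier in plain Ruby. Naive Bayes is chosen only as one common technique for this kind of task. The article does not state which classifier Sentimentalizer uses, so the TinyBayes class, its methods and the four training sentences are all hypothetical.

```ruby
# Illustrative multinomial Naive Bayes classifier (not the gem's code).
class TinyBayes
  def initialize
    @word_counts = Hash.new { |hash, label| hash[label] = Hash.new(0) } # label => word => count
    @doc_counts  = Hash.new(0)                                          # label => training texts seen
    @vocabulary  = {}                                                   # every word seen in training
  end

  # Naive tokenizer (an assumption): lowercase the text and keep runs of letters.
  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end

  # Record one labelled training text.
  def train(label, text)
    @doc_counts[label] += 1
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @vocabulary[word] = true
    end
  end

  # Returns label => probability, normalised so the values sum to 1.
  def probabilities(text)
    total_docs = @doc_counts.values.sum.to_f
    log_scores = @doc_counts.keys.map do |label|
      total_words = @word_counts[label].values.sum
      log_prob = Math.log(@doc_counts[label] / total_docs) # class prior
      tokenize(text).each do |word|
        # Add-one (Laplace) smoothing keeps unseen words from zeroing the product
        log_prob += Math.log((@word_counts[label][word] + 1.0) / (total_words + @vocabulary.size))
      end
      [label, log_prob]
    end.to_h
    max = log_scores.values.max
    weights = log_scores.transform_values { |value| Math.exp(value - max) }
    total = weights.values.sum
    weights.transform_values { |weight| weight / total }
  end

  # Pick the label with the highest probability.
  def classify(text)
    probabilities(text).max_by { |_label, probability| probability }.first
  end
end

classifier = TinyBayes.new
classifier.train(:positive, 'I love this movie')
classifier.train(:positive, 'I am happy and excited')
classifier.train(:negative, 'I hate war')
classifier.train(:negative, 'I hate the violence')

puts classifier.classify('I love Ruby')           # positive
puts classifier.probabilities('quantum flux').inspect
```

The last line shows the unseen-phrase problem: none of its words appear in the training data, so the probabilities come only from smoothing and the amount of text seen per class, not from the meaning of the phrase.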