Sentiment Analysis: A Competent Tool in Data Mining
Author: Sanket Kulkarni, May 1, 2015 – Posted in: Big Data

As more and more devices gain access to the web, the amount of data produced has increased enormously. By one widely quoted estimate, 90% of all the data produced so far was generated in the last two years, which shows how the internet revolution is producing vast amounts of data that can do wonders if used effectively. People nowadays communicate and participate on many social websites, blogs, forums and so on. This offers a great opportunity to analyze the data and to apply theories, algorithms and technologies that search for and extract relevant data from the huge quantities available on various websites, and then mine it for opinions. Data analysis is growing rapidly as a field, and sentiment analysis is an important part of it. Sentiment analysis is basically the task of determining the attitude, judgment, evaluation, emotional state or intended emotional communication of a speaker or writer with the use of natural language processing, text analysis, computational linguistics and various algorithms.

The World Wide Web is growing at an alarming rate, not only in size but also in the types of services and content provided. Users are participating more actively and generating vast amounts of new data. In this era of automated systems and digital information, every field of life is evolving rapidly, and huge amounts of data are being produced in science, engineering, medicine, marketing, finance and other fields. Automated analysis and classification of this data are needed to help make enterprise-level decisions.

There are three main classification levels in sentiment analysis:

1. Document-level classification
2. Aspect-level classification
3. Sentence-level classification

Document-level classification aims to classify an opinion document as expressing a positive or negative opinion or sentiment. It treats the whole document as the basic information unit. Sentence-level classification aims to classify the sentiment expressed in each sentence. However, there's not much difference between document level and sentence level, because sentences are just short documents. Aspect-level classification aims to classify sentiment with respect to specific aspects of an entity, such as the battery or the screen of a phone.
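To make the difference between document level and sentence level concrete, here is a minimal sketch in Python. It is only an illustration: the tiny word lists, the function names and the sample review are all made up for this example, and counting lexicon words is a stand-in for whatever classifier a real system would use.

```python
import re

# Hypothetical toy lexicon (an assumption for this sketch); real systems
# rely on much larger lexica or on trained classifiers.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def score(text):
    """Number of positive lexicon words minus number of negative ones."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text):
    """Map the lexicon score of a piece of text to a sentiment label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


def document_level(document):
    """Treat the whole document as one unit and give it a single label."""
    return label(document)


def sentence_level(document):
    """Split the document into sentences and label each one separately."""
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [(s.strip(), label(s)) for s in parts if s.strip()]


review = "The screen is great. The battery is terrible. I love the camera."

print(document_level(review))
for sentence, sentiment in sentence_level(review):
    print(sentiment, "-", sentence)
```

The same `label` function is used in both cases, which mirrors the point that a sentence can be treated as a short document. The only thing that changes is the unit of text being labeled.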

Point-wise mutual information (PMI) is a basic example of a statistical measure used in sentiment analysis.

The mutual information measure provides a formal way to model the mutual information between the features and the classes. This measure is derived from information theory. The point-wise mutual information (PMI) $M_i(w)$ between the word $w$ and the class $i$ is defined on the basis of the level of co-occurrence between the class $i$ and the word $w$. Let $P_i$ be the overall fraction of documents that belong to class $i$, $F(w)$ the overall fraction of documents that contain the word $w$, and $p_i(w)$ the fraction of documents containing $w$ that belong to class $i$. The expected co-occurrence of class $i$ and word $w$, on the basis of mutual independence, is given by $P_i \cdot F(w)$, and the true co-occurrence is given by $F(w) \cdot p_i(w)$. The mutual information is defined in terms of the ratio between these two values and is given by the following equation:

$$M_i(w) = \log\left(\frac{F(w) \cdot p_i(w)}{F(w) \cdot P_i}\right) = \log\left(\frac{p_i(w)}{P_i}\right)$$
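The PMI formula can be tried out on a toy corpus with a few lines of Python. This is a sketch under stated assumptions: the function name `pmi_scores`, the four labeled documents and the use of document-level fractions for the probabilities are choices made for this example. It uses the simplified form, the log of the ratio between the class share among documents containing the word and the overall class share.

```python
import math
from collections import Counter


def pmi_scores(documents):
    """Compute PMI for every (word, class) pair that occurs together.

    `documents` is a list of (words, class_label) pairs. Pairs that never
    co-occur are left out, because their PMI would be log(0).
    """
    n = len(documents)
    class_count = Counter(label for _, label in documents)
    word_count = Counter()   # number of documents containing each word
    joint_count = Counter()  # number of class-i documents containing the word
    for words, label in documents:
        for w in set(words):
            word_count[w] += 1
            joint_count[(w, label)] += 1

    scores = {}
    for (w, label), joint in joint_count.items():
        prior = class_count[label] / n        # overall fraction of class i
        conditional = joint / word_count[w]   # class i share among docs with w
        scores[(w, label)] = math.log(conditional / prior)
    return scores


# Hypothetical four-document corpus, invented for illustration only.
docs = [
    ({"great", "phone"}, "pos"),
    ({"great", "camera"}, "pos"),
    ({"poor", "battery"}, "neg"),
    ({"poor", "phone"}, "neg"),
]

scores = pmi_scores(docs)
print(round(scores[("great", "pos")], 3))  # log(1.0 / 0.5) = log 2, about 0.693
print(scores[("phone", "pos")])            # log(0.5 / 0.5) = 0.0
```

In this corpus "great" appears only in positive documents, so its PMI with the positive class is above zero, while "phone" is spread evenly over both classes and gets a PMI of exactly zero.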

The word $w$ is positively correlated to the class $i$ when $M_i(w)$ is greater than 0, and negatively correlated to the class $i$ when $M_i(w)$ is less than 0. PMI is used in many applications. One example is a contextual entropy model developed to expand a set of seed words generated from a small corpus of stock market news articles. The contextual entropy model measures the similarity between two words by comparing their contextual distributions using an entropy measure, which allows words similar to the seed words to be discovered. Once the seed words have been expanded, the resulting words are used to classify the sentiment of new articles.
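The idea of comparing contextual distributions can be sketched as follows. The text does not say which entropy measure the contextual entropy model uses, so this example substitutes Jensen-Shannon divergence as a stand-in. The window size, the threshold, the function names and the three headline-style sentences are likewise assumptions made for illustration, not details of the original model.

```python
import math
from collections import Counter


def context_distribution(target, sentences, window=2):
    """Relative frequencies of the words found within `window` tokens of `target`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == target:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(tokens[j] for j in range(lo, hi) if j != i)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a):
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def expand_seeds(seeds, candidates, sentences, threshold=0.1):
    """Keep each candidate whose context is close enough to that of some seed word."""
    expanded = set()
    for cand in candidates:
        d_cand = context_distribution(cand, sentences)
        for seed in seeds:
            d_seed = context_distribution(seed, sentences)
            if d_cand and d_seed and js_divergence(d_cand, d_seed) <= threshold:
                expanded.add(cand)
                break
    return expanded


# Hypothetical mini-corpus in the style of stock market headlines.
sentences = [
    "shares surge after strong earnings".split(),
    "shares soar after strong earnings".split(),
    "shares plunge after weak earnings".split(),
]

print(expand_seeds({"surge"}, ["soar", "plunge"], sentences))
```

Here "soar" shares exactly the same context as the seed word "surge" and is added, while "plunge" differs in one context word and falls outside the threshold. On real data, words with opposite sentiment often appear in very similar contexts, so the threshold and the amount of context both need careful tuning.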

Sentiment mining research is of utmost importance not only for commercial establishments but also for the common man. With the World Wide Web offering such a variety of ideas and opinions, it is very important to be aware of malicious opinions as well. This also poses a new challenge for researchers: building lexica, corpora and dictionary resources for other languages.
