Text Mining has wide industry applications. We cover this topic in our R Program and even do live text mining sessions on data on Facebook and Twitter. It is one of the most exciting modules of the program and a big favourite amongst students and professionals.
Our colleagues at Prompt Cloud recently shared an interesting study with us. They analyzed more than 400 thousand reviews of unlocked mobile phones sold on Amazon.com to uncover insights about reviews, ratings, price and the relationships among them.
Mobile phones have revolutionized the way we purchase products online, putting all the information at our fingertips. As access to information becomes easier, more and more consumers will seek product information from other consumers in addition to the information provided by the seller. Reviews and ratings submitted by consumers are examples of this type of information, and they have already become an integral part of the customer’s buying-decision process. The review and ratings platforms provided by eCommerce players create a transparent system that lets consumers make informed decisions and feel confident about them.
Amazon.com is a treasure trove of product reviews, and its review system is accessible across all channels, presenting reviews in an easy-to-use format. The reviewer submits a rating on a scale of 1 to 5 and describes their own experience with the product. The final product rating is the mean of all the submitted ratings. Other users can mark a review as helpful or not, which adds credibility to the review and the reviewer. In this study, we analysed more than 400 thousand reviews of unlocked mobile phones sold on Amazon.com to find insights with respect to reviews, ratings, price and their relationships.
We extracted the following information from the ‘unlocked phone’ category of Amazon.com:
- Product Title
- Review text
- Number of people who found the review helpful
In total, more than 400,000 reviews were extracted, covering close to 4,400 unlocked mobile phones.
This statistical analysis had the following goals:
- Perform exploratory analysis of ratings and reviews
- Find the relationship between price and the number of reviews
- Find the relationship between helpfulness of review and length of review
- Find the relationship between review length and product price
- Find the relationship between review length and product rating
- Find the relationship between product price and product rating
- Word cloud of most-used words
- Sentiment analysis
1. EXPLORATORY ANALYSIS
First, let’s look at the distribution of ratings among the reviews. Most reviewers gave a 4-star or 3-star rating, and relatively few gave a 1-star rating. The mean of all the ratings is 3.62.
Now let’s consider the distribution of review length. Most reviews contain fewer than 300 characters. The mean length is 230 characters, which suggests that most people write short reviews of one to two sentences.
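The two summaries above (rating distribution and review length) can be sketched in a few lines of Python. This is only an illustration: the `reviews` sample and the field names `rating` and `text` are made up here, not taken from the study's dataset.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample rows; the real study covers 400,000+ reviews.
reviews = [
    {"rating": 5, "text": "Great phone, battery lasts all day."},
    {"rating": 4, "text": "Good value for the price."},
    {"rating": 1, "text": "Stopped charging after a week, returned it."},
    {"rating": 5, "text": "Excellent camera."},
]

def rating_distribution(rows):
    """Count how many reviews fall under each star rating (1 to 5)."""
    counts = Counter(row["rating"] for row in rows)
    return {stars: counts.get(stars, 0) for stars in range(1, 6)}

def summarize(rows):
    """Return the mean rating and the mean review length in characters."""
    return {
        "mean_rating": mean(row["rating"] for row in rows),
        "mean_length": mean(len(row["text"]) for row in rows),
    }

print(rating_distribution(reviews))
print(summarize(reviews))
```

Review length is measured in characters here, matching the way the study reports it.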
2. RELATIONSHIP BETWEEN PRICE AND NUMBER OF REVIEWS
Let’s now explore the correlation between product price and number of reviews. This will help us answer questions like: do expensive products receive more reviews?
The scatter plot says not necessarily. Statistically, the correlation is negligible (r = 0.013), so there is no meaningful linear relationship between a product’s price and the number of reviews it gets.
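The r values quoted in this study read as Pearson correlation coefficients, although the write-up does not name the coefficient. A minimal sketch of how such a value can be computed follows; the `prices` and `review_counts` lists are invented for illustration and are not the study's data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical per-product values
prices = [99.0, 149.0, 199.0, 349.0, 599.0]
review_counts = [120, 45, 300, 80, 150]
print(round(pearson_r(prices, review_counts), 3))
```

A value near 0, like the 0.013 reported above, indicates no linear association; it does not rule out a non-linear one.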
3. RELATIONSHIP BETWEEN HELPFULNESS OF REVIEW AND LENGTH OF REVIEW
Here we plot the average length of reviews against the average number of helpfulness votes to see whether more people find longer reviews helpful. There is a moderate positive correlation (r = 0.30) between the two, as supported by the trend line in the plot.
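A trend line like the one mentioned above is commonly an ordinary least-squares fit. The study does not say how its line was fitted, so the sketch below is an assumption, and the sample numbers are hypothetical.

```python
def fit_trend_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical values: average review length (characters) vs. average helpful votes
avg_lengths = [80, 150, 230, 400, 650]
avg_votes = [0.5, 1.1, 1.0, 2.3, 2.9]
slope, intercept = fit_trend_line(avg_lengths, avg_votes)
print(f"votes = {slope:.4f} * length + {intercept:.3f}")
```

A positive slope corresponds to the positive correlation reported above: longer reviews tend to collect more helpfulness votes.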
4. RELATIONSHIP BETWEEN REVIEW LENGTH AND PRICE
Now we’ll explore the relationship between the average length of reviews and phone price. The plot shows that review length does not increase with price. The correlation is very close to zero, and it remains weak even after removing the outliers (r = 0.01).
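The study does not say how outliers were identified. One common choice, shown here purely as an assumption, is the interquartile-range (IQR) rule applied to price. The `pairs` sample is hypothetical.

```python
from statistics import quantiles

def drop_price_outliers(pairs, k=1.5):
    """Keep (price, avg_review_length) pairs whose price lies within
    k * IQR of the first and third quartiles (the usual IQR rule)."""
    prices = [price for price, _ in pairs]
    q1, _, q3 = quantiles(prices, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(price, length) for price, length in pairs if low <= price <= high]

# Hypothetical data with one extreme price
pairs = [(100, 210), (120, 250), (130, 190), (150, 240), (160, 220), (5000, 230)]
cleaned = drop_price_outliers(pairs)
print(cleaned)
```

The correlation can then be recomputed on the cleaned pairs and compared with the value from the full data.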
5. RELATIONSHIP BETWEEN REVIEW LENGTH AND RATING
The plot of average review length against rating helps us find out whether products with detailed reviews attract better ratings. Here we can see that there is no correlation between the two.
6. RELATIONSHIP BETWEEN PRODUCT PRICE AND RATING
Now we’ll find out whether costlier products have better ratings. The plot shows some correlation (r = 0.26) between rating and price. When consumers pay more for a product, they also expect better quality, and sellers need to meet this expectation. One possible interpretation is that quality rises with cost, which in turn leads to higher ratings.
7. WORD CLOUD
We segregated the reviews according to their ratings into positive reviews (4 or 5 star) and negative reviews (1 or 2 star). Both types of reviews share certain common words, such as “work”, “battery” and “screen”. The most frequently used words in positive reviews are “great”, “good”, “camera”, “price”, “excellent”, etc. In negative reviews, words such as “return”, “back”, “problem” and “charge” are prevalent.
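A word cloud is driven by per-word frequencies, so the segmentation above can be sketched as a frequency count per rating group. The stopword list, tokenizer and sample reviews below are simplified assumptions, not the study's actual preprocessing.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "to", "of", "i", "this", "for", "my", "was"}

def segment(rating):
    """Map a star rating to the segments used above; 3-star reviews are left out."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return None

def top_words(reviews, label, k=5):
    """Most frequent non-stopword tokens among reviews in the given segment."""
    counts = Counter()
    for rating, text in reviews:
        if segment(rating) != label:
            continue
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(token for token in tokens if token not in STOPWORDS)
    return counts.most_common(k)

# Hypothetical (rating, text) pairs
sample = [
    (5, "Great camera, great price"),
    (1, "Battery problem, had to return it"),
    (3, "Okay phone"),
]
print(top_words(sample, "positive"))
print(top_words(sample, "negative"))
```

The resulting (word, count) pairs are what a word-cloud renderer would scale font sizes by.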
8. SENTIMENT ANALYSIS
The sentiment analysis shows that the majority of reviews have positive sentiment, with the negative sentiment score close to half of the positive score. Among the eight emotions, “trust”, “joy” and “anticipation” have the highest scores. High scores for “joy” and “anticipation” could be because the phones were newly delivered. The top score for “trust” suggests that the reviewers write with conviction and trust the product.
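Emotion scores of this kind are typically produced by looking up each word in a word-emotion lexicon and counting matches per category. The study does not name its lexicon or tool, so the sketch below uses a tiny made-up lexicon (`TOY_LEXICON`) that covers only a few categories.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> categories it counts toward.
TOY_LEXICON = {
    "great": {"positive", "joy"},
    "love": {"positive", "joy", "trust"},
    "reliable": {"positive", "trust"},
    "excited": {"positive", "anticipation", "joy"},
    "broken": {"negative"},
    "problem": {"negative"},
}

def score_text(text, lexicon=TOY_LEXICON):
    """Count lexicon hits per sentiment or emotion category in one review."""
    scores = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        for category in lexicon.get(token, ()):
            scores[category] += 1
    return scores

def score_corpus(texts, lexicon=TOY_LEXICON):
    """Sum the per-review category counts over a collection of reviews."""
    total = Counter()
    for text in texts:
        total.update(score_text(text, lexicon))
    return total

print(score_corpus(["Love it, great and reliable phone", "Screen arrived broken"]))
```

Comparing the corpus totals for the positive and negative categories gives the kind of ratio described above.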
Amazon’s product review platform shows that most reviewers gave 4-star and 3-star ratings to unlocked mobile phones. The average review length is close to 230 characters. We also found that lengthier reviews tend to be rated more helpful and that there is a positive correlation between price and rating. Sentiment analysis shows that positive sentiment is prevalent among the reviews and that, in terms of emotions, ‘trust’, ‘anticipation’ and ‘joy’ have the highest scores.
It’d be interesting to perform further analysis based on brand (for example, Samsung vs. Apple). We can also look at building a model to predict the helpfulness of a review and the rating based on the review text. Corpus-based and knowledge-based methods can be used to determine the semantic similarity of review text. There are many more insights to be unveiled from the Amazon reviews.