Fake Review Detection Using Machine Learning Techniques

Abhinandan V., Aishwarya C. A., Arshiya Sultana
DOI: 10.4018/IJFC.2020070104
Online reviews play a vital role in today's business and commerce. In e-commerce, reviews are a key indicator of a product's success or failure. Businesses with good reviews get a lot of free exposure, and well-reviewed pages show up near the top of search results. Fake reviews, however, are everywhere online. An online fake review is one written by someone who has not actually used the product or service. Because of cut-throat competition, some sellers are now willing to resort to unfair means to make their products stand out. This work introduces supervised machine learning techniques to detect fake online reviews and to block the malicious users who post them.
Many approaches and techniques have been proposed to detect fake reviews with ever higher accuracy. Sun et al. (2016) divided these approaches into two categories:

    Content-Based Methods: Content-based methods focus on the content of the review, that is, its text and what is said in it. Heydari et al. attempted to detect spam reviews by analyzing the linguistic features of the review. Ott et al. (n.d.) used three techniques to perform classification: genre identification, detection of psycholinguistic deception, and text categorization.

    1) Genre Identification: Ott et al. (n.d.) explored the part-of-speech (POS) distribution of the reviews. They used the frequency count of each POS tag as the features representing a review for classification.

    2) Detection of Psycholinguistic Deception: The psycholinguistic approach assigns psycholinguistic meanings to the important features of a review. The review features were built with the Linguistic Inquiry and Word Count (LIWC) software developed by Pennebaker et al. (2001).

    3) Text Categorization: Ott et al. (n.d.) experimented with n-gram features, which are now widely used in fake review detection. Other linguistic features have also been explored. For example, Feng et al. (n.d.) derived lexicalized and unlexicalized syntactic features by constructing sentence parse trees, and showed experimentally that these deep syntactic features improve prediction accuracy.

    Behavior-Based Features: These methods focus on the reviewer, that is, on the characteristics of the person who writes the review. Lim et al. (2010) addressed the problem of finding the users responsible for spam reviews and identified the following deceptive rating and review behaviors:

    1) Giving unfair ratings too often: Professional spammers generally post more fake reviews than genuine ones. Suppose a product has an average rating of 9.0 out of 10, but a reviewer has given it 4.0. If analyzing that reviewer's other reviews shows that they often give such unfair ratings, the reviewer can be flagged as a spammer.

    2) Giving good ratings to products from one's own country: Sometimes people post fake reviews to promote products from their own region. This type of spamming is mostly seen in movie reviews. Suppose that on an international movie website an Indian movie has a rating of 9.0 out of 10, and most of its reviewers are Indian. This kind of spamming can be detected using the reviewers' addresses.

    3) Reviewing a vast variety of products: Each person has specific interests and is generally not interested in every type of product; for example, someone who loves gaming may not be interested in classic literature. If some users review a far wider range of product types than typical reviewers do, we can suspect that their reviews are intentionally fake.

Some researchers have also used semi-supervised classification techniques.
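The genre-identification features described above (POS tag frequency counts, as explored by Ott et al.) can be sketched in Python as follows. This is a minimal illustration, not Ott et al.'s implementation: it assumes each review has already been tagged by some POS tagger, and the tag subset `TAGSET`, the function name `pos_frequency_features`, and the example review are hypothetical.

```python
from collections import Counter

def pos_frequency_features(tagged_tokens, tagset, normalize=True):
    """Count each POS tag in `tagset` over a review's (word, tag) pairs.

    Tags outside `tagset` are ignored. With normalize=True the counts are
    divided by their total so reviews of different lengths are comparable.
    """
    counts = Counter(tag for _, tag in tagged_tokens if tag in tagset)
    if not normalize:
        return [counts[tag] for tag in tagset]
    total = sum(counts.values())
    return [counts[tag] / total if total else 0.0 for tag in tagset]

# Hypothetical review, already tagged with Penn Treebank tags by some POS tagger.
TAGSET = ["NN", "JJ", "VBD", "PRP", "RB"]
review = [("I", "PRP"), ("really", "RB"), ("loved", "VBD"), ("this", "DT"),
          ("great", "JJ"), ("hotel", "NN"), ("and", "CC"), ("great", "JJ"),
          ("staff", "NN")]
print(pos_frequency_features(review, TAGSET, normalize=False))  # → [2, 2, 1, 1, 1]
print(pos_frequency_features(review, TAGSET))                   # relative frequencies
```

The resulting fixed-length vector, one entry per tag, is what a classifier would consume; normalizing is an added design choice, since the source only mentions raw frequency counts.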
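The psycholinguistic approach maps words to psychologically meaningful categories and uses the share of words in each category as features. LIWC's own dictionaries are not reproduced here; the sketch below assumes a tiny hypothetical lexicon (`LEXICON`, with made-up category names) and only shows the mechanics of LIWC-style counting, including prefix entries written with a trailing `*`.

```python
import re

# Hypothetical mini-lexicon in the spirit of LIWC; real LIWC dictionaries are far larger.
# Entries ending in "*" match any word that starts with the given prefix.
LEXICON = {
    "posemo": ["love", "great", "happ*", "amazing"],
    "self": ["i", "me", "my", "mine"],
    "space": ["location", "near", "downtown"],
}

def _matches(word, entry):
    """True if `word` matches a lexicon entry (exact or prefix wildcard)."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry

def category_proportions(text, lexicon=LEXICON):
    """Percentage of the words in `text` that fall into each lexicon category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {cat: 0.0 for cat in lexicon}
    return {
        cat: 100.0 * sum(any(_matches(w, e) for e in entries) for w in words) / len(words)
        for cat, entries in lexicon.items()
    }

print(category_proportions("I love my stay, so happy!"))
```

Each category percentage becomes one feature of the review, which can then be combined with other features before classification.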
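For the text-categorization approach, n-gram counts can be fed to any supervised classifier. The sketch below pairs unigram and bigram features with a multinomial Naive Bayes classifier using add-one smoothing. The classifier choice, the whitespace tokenization, the class and function names, and the four training reviews are illustrative assumptions, not the setup used by Ott et al. or by this paper.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Return all 1..n_max-grams of a lower-cased, whitespace-tokenized text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.n_docs = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text))
        # Shared vocabulary: every n-gram seen in any class.
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_score(self, text, c):
        """log P(c) plus the sum of log P(f | c) over the n-grams f of `text`."""
        score = math.log(self.priors[c] / self.n_docs)
        denom = self.totals[c] + len(self.vocab)
        for f in ngrams(text):
            score += math.log((self.counts[c][f] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_score(text, c))

# Hypothetical toy training set; real experiments use a labeled review corpus.
TRAIN = [
    ("the room was clean and the staff helpful", "truthful"),
    ("the bathroom was small but the bed was comfortable", "truthful"),
    ("best hotel ever amazing amazing experience", "deceptive"),
    ("amazing luxury best experience of my life", "deceptive"),
]
model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("amazing best hotel"))  # → deceptive
```

Working in log space avoids numerical underflow when many small probabilities are multiplied; swapping in another classifier (for example a linear SVM) would reuse the same `ngrams` features.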
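Two of the reviewer behaviors listed by Lim et al. (2010), frequent unfair ratings and reviewing an unusually wide range of products, can be turned into simple per-reviewer features. The sketch below is a hypothetical illustration rather than Lim et al.'s formulation: the `Review` record, the 0 to 10 rating scale, the deviation threshold of 4.0, and the sample data are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Review:
    # Hypothetical review record; field names are illustrative assumptions.
    reviewer: str
    product: str
    category: str
    rating: float  # assumed 0 to 10 scale, as in the example above

def product_averages(reviews):
    """Mean rating of each product over the given reviews."""
    sums, counts = {}, {}
    for r in reviews:
        sums[r.product] = sums.get(r.product, 0.0) + r.rating
        counts[r.product] = counts.get(r.product, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}

def unfair_rating_ratio(reviews, reviewer, threshold=4.0):
    """Share of `reviewer`'s ratings that differ from the other reviewers'
    average for the same product by more than `threshold` points."""
    others = product_averages([r for r in reviews if r.reviewer != reviewer])
    # Only products that someone else has also rated can be compared.
    own = [r for r in reviews if r.reviewer == reviewer and r.product in others]
    if not own:
        return 0.0
    unfair = sum(abs(r.rating - others[r.product]) > threshold for r in own)
    return unfair / len(own)

def category_breadth(reviews, reviewer):
    """Number of distinct product categories the reviewer has reviewed."""
    return len({r.category for r in reviews if r.reviewer == reviewer})

# Hypothetical sample data.
REVIEWS = [
    Review("alice", "p1", "books", 9.0),
    Review("bob", "p1", "books", 9.0),
    Review("mallory", "p1", "books", 4.0),
    Review("alice", "p2", "games", 8.0),
    Review("bob", "p2", "games", 8.5),
    Review("mallory", "p2", "games", 2.0),
    Review("carol", "p3", "music", 9.5),
    Review("mallory", "p3", "music", 9.0),
]
for name in ("alice", "bob", "carol", "mallory"):
    print(name, unfair_rating_ratio(REVIEWS, name), category_breadth(REVIEWS, name))
```

Here the hypothetical reviewer `mallory` deviates by more than 4 points from the other reviewers on two of three products (a ratio of 2/3) and spans three categories. Thresholds like these would need tuning on real data, and the scores would typically be combined with content features before classification.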

