Evaluation of NoSQL Databases: MongoDB, Cassandra, HBase, Redis, Couchbase, OrientDB

Houcine Matallah, Ghalem Belalem, Karim Bouamrane
DOI: 10.4018/IJSSCI.2020100105

Abstract

The explosion of data quantities, reflected in the growth of volumes, numbers, and types, has led to the development of new techniques for locating and accessing data. The latest steps in this evolution have brought forth new technologies: cloud computing and big data. The new requirements and the difficulties encountered in managing data classified as “big data” have given rise to NoSQL and NewSQL systems. This paper develops a comparative study of the performance of six NoSQL solutions employed by major companies in the IT sector: MongoDB, Cassandra, HBase, Redis, Couchbase, and OrientDB. To compare the performance of these NoSQL systems, the authors use a widely used benchmarking tool, YCSB (Yahoo! Cloud Serving Benchmark). The contribution is to provide some guidance for choosing the appropriate NoSQL system for the type of data used and the type of processing performed on that data.

Introduction

The plethora of sources of digital data and the spread of computer science into different sectors and areas (Astrology, Meteorology, E-Commerce, E-Government, Multimedia, etc.) have caused data volumes to explode, in both scale and variety of types. It is extremely difficult to estimate the quantity of digital data produced every day by businesses, governments, and individuals, whether photographs, videos, texts, tweets, or emails.

Computer system designs since the nineties have used data warehouses (Inmon, 2005), which are usually centralized in servers connected to storage arrays. These architectures scale poorly, in the sense that it is hard to add capacity on demand. Indeed, faced with the growing volume of data, its wide heterogeneity, and its velocity, traditional DBMS and even data warehouses have struggled to adapt.

This revolution sweeping the world of IT has raised new issues that have led to the development of new technologies to contain and process these large volumes of data. The goal is to reach new orders of magnitude in capturing, searching, sharing, storing, analyzing, and presenting data. This new IT era has led to the replacement of traditional databases, limited by ACID constraints, with new solutions that respond to these changes. These new requirements have led to the emergence of the NoSQL movement (Cattell, 2011; Oussous, Benjelloun, Lahcen, & Belfkih, 2017) and the NewSQL movement (Aslett, 2011; Piekos, 2015).

Several open-source and proprietary NoSQL solutions have been designed, developed, and deployed by the big companies of the sector to manage the large volumes of data they handle. However, the lack of standardization and the wide range of solutions on the market complicate the choice of the model appropriate to the operating environment, making it genuinely difficult to determine the best NoSQL solution for a given user's needs.

The contribution of this paper is to provide indicators that can help interested parties decide which solutions their companies should adopt, by developing a comparative study of a set of NoSQL solutions widely deployed on the market. This study compares the performance of NoSQL databases from an experimental point of view. Note that the current work extends an earlier paper in which the performance of MongoDB and HBase was compared (Matallah, Belalem, & Bouamrane, 2017a).

Our study focuses on six data management solutions, each of which implements the same “MapReduce” algorithm in its kernel (Lattanzi, Moseley, Suri, & Vassilvitskii, 2011): MongoDB (Chodorow, 2013; Membrey, Plugge, & Hawkins, 2011), Cassandra (Lakshman & Malik, 2010), HBase (George, 2011), Redis (Macedo & Oliveira, 2011), Couchbase (Brown, 2012), and OrientDB (Tesoriero, 2013). Several benchmarks have been designed to evaluate and compare the available NoSQL solutions; the most commonly used is YCSB (Cooper, Silberstein, Tam, Ramakrishnan, & Sears, 2010).
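
For readers unfamiliar with the MapReduce pattern named above, the following is a minimal, single-process sketch in Python of its three phases (map, shuffle, reduce), applied to a word count. It is an illustration only and does not represent how any of the six databases implements or exposes MapReduce; every function name here (map_phase, shuffle, reduce_phase, word_mapper, count_reducer, word_count) is hypothetical and chosen for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def map_phase(records: Iterable[str],
              mapper: Callable[[str], Iterable[Pair]]) -> Iterator[Pair]:
    """Apply the mapper to every input record and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Group all emitted values by key (done across nodes in a real cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]],
                 reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> Iterator[Pair]:
    """Hypothetical mapper: emit (word, 1) for each lowercased word."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(key: str, values: List[int]) -> int:
    """Hypothetical reducer: sum the counts emitted for one word."""
    return sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)


if __name__ == "__main__":
    print(word_count(["NoSQL scales out", "nosql stores data"]))
```

In a distributed system the map and reduce steps run in parallel on different nodes and the shuffle moves data over the network; this sketch keeps everything in one process so that only the data flow is visible.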

This paper is organized as follows. In the first section, we expose the limitations of relational DBMS in large-scale distributed environments, which led to the emergence of NoSQL. In the second section, we present the NoSQL data management systems designed to meet the new needs imposed by scaling up. In the third section, we focus on the six NoSQL solutions compared and the benchmark used. After assessing the performance of each database, the experimental results of this comparative study are synthesized and analyzed in the fourth section. The paper concludes with a summary and some perspectives for future work.
