Introduction
Data is a collection of facts and information, and it is the source of knowledge for the entire world. Data can come from many sources, such as databases, flat files and online sources, and the volume arriving from these sources is huge. Taking online websites as one such source, there are more than 2000 tweets per second, more than ten thousand Google searches per second, and more than a million emails sent from another million websites. This huge volume of generated information, popularly called Big Data (McAfee, Brynjolfsson, Davenport, Patil, & Barton, 2012), is not just about being big; the crux lies in the management of the data. This is the role of Big Data storage systems, which are the storehouse of data. The most common forms of storage are the traditional ones, such as RAM (Random Access Memory) and disk drives. This paper analyses two distinct categories of Big Data storage systems: databases and cloud storage systems. Each storage system has its own unique features and supports CRUD operations (Create, Read, Update, and Delete) on data.
Databases are the most common source of data. The logical structure of a database defines how its data is organized. The earliest data model was the hierarchical data model, e.g., IBM's Information Management System (IMS), which was followed by other hierarchical databases (Tsichritzis & Lochovsky, 1976). Further evolution of databases led to the relational model (Rumbaugh, Blaha, Premerlani, Eddy, & Lorensen, 1991), in which data is represented as tuples, or rows, that together form a relation; systems built on this model are called Relational Database Management Systems (RDBMS). An RDBMS uses Structured Query Language (SQL) as its data query language.
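As a minimal sketch of the relational model and of the CRUD operations mentioned earlier, the following Python example uses the standard-library `sqlite3` module with an in-memory database. The table name `sensor`, its columns and the helper `crud_demo` are hypothetical and chosen only for illustration; they do not come from the systems surveyed here.

```python
import sqlite3


def crud_demo():
    """Run Create, Read, Update and Delete against one in-memory relation."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # A relation has a fixed schema; every row is one tuple of that relation.
    cur.execute(
        "CREATE TABLE sensor (id INTEGER PRIMARY KEY, name TEXT, reading REAL)"
    )

    # Create: insert two tuples.
    cur.executemany(
        "INSERT INTO sensor (id, name, reading) VALUES (?, ?, ?)",
        [(1, "alpha", 20.5), (2, "beta", 31.0)],
    )

    # Read: fetch all tuples in a deterministic order.
    query = "SELECT id, name, reading FROM sensor ORDER BY id"
    rows_after_insert = cur.execute(query).fetchall()

    # Update: change one attribute of one tuple.
    cur.execute("UPDATE sensor SET reading = ? WHERE id = ?", (22.0, 1))

    # Delete: remove one tuple.
    cur.execute("DELETE FROM sensor WHERE id = ?", (2,))

    rows_final = cur.execute(query).fetchall()
    conn.commit()
    conn.close()
    return rows_after_insert, rows_final


if __name__ == "__main__":
    before, after = crud_demo()
    print(before)
    print(after)
```

Each SQL statement maps onto one CRUD operation, and the fixed column list is what makes the stored data "structured" in the relational sense.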
The transition from RDBMS to NoSQL is very significant (Hadjigeorgiou, 2013). RDBMS has several advantages (Jatana, Puri, Ahuja, Kathuria, & Gosain, 2012); for example, data is stored in a structured way, which helps in maintaining entity relationships. However, when the data volume is huge and the data context changes over time, a new kind of system becomes essential. NoSQL (Not Only SQL) not only supports the storing of datasets but also supports durability, reliability, availability and scalability (Han, Haihong, Le, & Du, 2011). Rather than following the ACID properties (Atomicity, Consistency, Isolation, Durability), NoSQL databases follow CAP (Consistency, Availability, and Partition Tolerance). For transaction-related applications, RDBMS is better than NoSQL databases.
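To illustrate why a transactional RDBMS suits transaction-related applications, the sketch below shows atomicity, one of the ACID properties, using Python's standard-library `sqlite3` module. The `account` table, the helper names `make_bank`, `transfer` and `balances`, and the amounts are all hypothetical and serve only as an example.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two accounts (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        # The connection context manager commits if the block succeeds and
        # rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # This statement violates the CHECK constraint when the source
            # account would go negative, which undoes the credit above.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account"))
```

A transfer that cannot complete leaves both balances untouched, so the database never exposes a state in which money was credited but not debited.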
NoSQL databases offer better management of structured, semi-structured and unstructured data (Moniruzzaman & Hossain, 2013, p. 19; Leavitt, 2010). There are four types of NoSQL databases. (a) Key-Value: In this type of NoSQL database, data is stored in groups, and each group is identified by a unique identifier known as a key. Amazon S3 and Azure follow this type of storage structure to store large, voluminous datasets. (b) Document: Here, groups of data that have variable attributes are stored together as a document. Each document is identified by a key and represented in XML, JSON or BSON format. CouchDB and MongoDB are examples of document NoSQL databases. (c) Graph: In a network-based system, an instance of one entity is connected with instances of other entities, and these connections have explicit meaning for the stored dataset. In this situation, a graph database stores the dataset by holding information about how, and in what way, each instance is connected with others. OrientDB and Neo4j are the most popular graph-based NoSQL databases. (d) Column-family: In this type of storage, data is stored column-wise rather than as horizontal tuples. This makes data operations (i.e., access and storage) faster. Cassandra and HBase are examples of column-family NoSQL databases.
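The four data models above can be sketched with plain Python data structures. This is a simplified illustration only, not the API or the internal storage layout of any of the products named; all keys, field names and helper functions (`neighbours`, `to_columns`) are hypothetical.

```python
import json

# (a) Key-value: a value is stored and fetched only through its unique key.
kv_store = {"user:42": b"any opaque blob of bytes"}

# (b) Document: each record is a self-describing document, and different
# documents may carry different attributes.
documents = {
    "u1": {"name": "Asha", "email": "asha@example.com"},
    "u2": {"name": "Ben", "tags": ["admin"]},
}

# (c) Graph: entities are nodes, and each labelled edge records how one
# instance is connected to another.
edges = [
    ("u1", "FOLLOWS", "u2"),
    ("u2", "FOLLOWS", "u1"),
    ("u1", "LIKES", "post9"),
]


def neighbours(node, label):
    """Return the nodes reachable from `node` over edges with `label`."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


# (d) Column-wise layout, following the description above: the values of one
# column are kept together instead of being spread across row tuples.
rows = [
    {"id": 1, "name": "alpha", "reading": 20.5},
    {"id": 2, "name": "beta", "reading": 31.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of per-column value lists."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


if __name__ == "__main__":
    print(kv_store["user:42"])
    print(json.dumps(documents["u2"]))
    print(neighbours("u1", "FOLLOWS"))
    print(to_columns(rows)["reading"])
```

Reading a single attribute from the column-wise layout touches one list rather than every row, which is the intuition behind the faster access described for column-family stores.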