Statistical Visualization of Big Data Through Hadoop Streaming in RStudio

Chitresh Verma, Rajiv Pandey
DOI: 10.4018/978-1-6684-3662-2.ch035

Abstract

Data visualization provides a visual representation of a data set so that the data can be interpreted meaningfully from a human perspective. Statistical visualization calls for tools, algorithms, and techniques that can support and render graphical models. This chapter explores the features of R and RStudio in detail. The combination of Hadoop and R for Big Data analytics and data visualization is demonstrated through code snippets. The integration of R and Hadoop is explained in detail with the help of a utility called the Hadoop streaming jar. The various R packages and their integration with Hadoop operations in the R environment are explained through examples. The process of data streaming is presented using the different readers of the Hadoop streaming package. A case-based statistical project is considered in which the data set is visualized after dual execution using Hadoop MapReduce and an R script.
Chapter Preview

Data Visualization

Data visualization is done not only with standard charts and graphs but also with more technologically advanced forms such as infographics, real-time dials and gauges, and heat maps (Spakov & Miniotas, 2015). Visualization results such as charts and bars can also be interactive, changing at the click of a button. Data visualization is a well-developed domain in which accomplished designers and data scientists have worked to build effective visual combinations for data interpretation. It is not only a creative exercise; it also decodes the data for the viewer in a meaningful way. In other words, data visualization bridges the gap between the actual data and the logical inference drawn from it. A data designer uses imagination to build a representation of the data that the audience can easily comprehend. Every combination of data and illustration serves this one purpose.

What Is Data Visualization?

Data visualization is the process of extracting meaningful information from a vast amount of data and then showing it in pictorial form so that end users can understand it better (Chen et al., 2007). It is the science of filtering and isolating the data and then visualizing it using different representation techniques.
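The filter, isolate, and visualize sequence described above can be illustrated with a minimal sketch. It is written in Python using only the standard library, not the R and Hadoop tooling that the chapter itself covers. The sample records, the function names, and the text-based bar rendering are all assumptions made for illustration.

```python
# Hypothetical sample records (region, sales); invented for illustration only.
records = [
    ("north", 120), ("south", 45), ("north", 80),
    ("east", 0), ("south", 95), ("east", 60),
]

def filter_records(rows, min_value):
    """Filtering step: keep only rows whose value is at least min_value."""
    return [(key, value) for key, value in rows if value >= min_value]

def aggregate(rows):
    """Isolating step: sum the values for each key."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals

def render_bars(totals, width=20):
    """Visualizing step: one text bar per key, scaled so the largest fills width."""
    if not totals:
        return []
    peak = max(totals.values())
    lines = []
    for key in sorted(totals):
        # Integer arithmetic keeps the bar length deterministic.
        bar_length = totals[key] * width // peak if peak > 0 else 0
        lines.append(f"{key:<6}|{'#' * bar_length} {totals[key]}")
    return lines

if __name__ == "__main__":
    kept = filter_records(records, min_value=1)
    for line in render_bars(aggregate(kept)):
        print(line)
```

The same three stages apply regardless of tooling: only the final rendering step would change if the output were a chart rather than text bars.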

To the viewer, the product of data visualization may look like information simply moving from point A to point B. The data visualization process involves not only designing reports and charts but also presenting them in a way that the viewer can interpret with the least amount of effort.
