Challenges in Big Data Analysis

M. Govindarajan

Source Title: Encyclopedia of Information Science and Technology, Fifth Edition

DOI: 10.4018/978-1-7998-3479-3.ch041

Abstract

Big data brings new opportunities to modern society and new challenges to data scientists. On one hand, big data holds great promise for discovering subtle population patterns and heterogeneities that cannot be detected with small-scale data. On the other hand, the massive sample size and high dimensionality of big data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. Prior to analysis, data must be well constructed. However, given the variety of datasets in big data, the efficient representation, access, and analysis of unstructured or semi-structured data remain challenging. Understanding how data can be preprocessed is important for improving data quality and the analysis results. The purpose of this chapter is to highlight the big data challenges and to provide a brief description of each.

Background

David Lazer et al. (2009) discuss an emerging field that leverages the capacity to collect and analyze data at a scale that may reveal patterns of individual and group behavior. Stadler et al. (2010) developed an efficient EM algorithm for numerical optimization with provable convergence properties. High dimensionality also gives rise to incidental endogeneity, a phenomenon in which many unrelated covariates are incidentally correlated with the residual noise. This endogeneity creates statistical biases and causes model selection inconsistency, which can lead to wrong scientific discoveries (Liao and Jiang, 2011; Fan and Liao, 2012). Jianqing Fan et al. (2013) give an overview of the salient features of Big Data and of how these features drive a paradigm change in statistical and computational methods as well as computing architectures. Nawsher Khan et al. (2014) comprehensively survey and classify the various attributes of Big Data, including its nature, definitions, rapid growth rate, volume, management, analysis, and security. Their study also proposes a data life cycle that uses the technologies and terminologies of Big Data. Future research directions in this field are determined based on opportunities and several open issues in Big Data domination. These research directions facilitate the exploration of the domain and the development of optimal techniques to address Big Data. Lenka Venkata Satyanarayana (2015) provides an in-depth analysis of the different platforms available for performing big data analytics. That paper surveys the hardware platforms available for big data analytics and assesses the advantages and drawbacks of Big Data. D. P. Acharjya et al. (2016) explore the potential impact of big data challenges, open research issues, and the various tools associated with them. Akhil et al. (2017) analyzed the potential effect of big data challenges, open research issues, and the different tools related to them. Ripon Patgiri (2018) presents a study report on numerous research issues and challenges of Big Data as applied to very large datasets. Reihaneh H. Hariri et al. (2019) review previous work in big data analytics and present a discussion of open challenges and future directions for recognizing and mitigating uncertainty in this domain.

Key Terms in this Chapter

Big Data: Big data holds great promise for discovering subtle population patterns and heterogeneities that cannot be detected with small-scale data.

Scalability: The scalability issue of big data has led towards cloud computing, which now aggregates multiple disparate workloads with varying performance goals into very large clusters.

Incidental Endogeneity: Incidental endogeneity is another subtle issue raised by high dimensionality.

Big Data Problems: Big data problems include heterogeneity, noise accumulation, spurious correlations, and incidental endogeneity, in addition to balancing statistical accuracy and computational efficiency.

Heterogeneity: Big data are often created via aggregating many data sources corresponding to different sub-populations.

Noise Accumulation: Analyzing Big Data requires us to simultaneously estimate or test many parameters. The resulting estimation errors accumulate when a decision or prediction rule depends on a large number of such parameters.

Spurious Correlation: High dimensionality also brings spurious correlation, referring to the fact that many uncorrelated random variables may have high sample correlations in high dimensions.
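
The spurious correlation effect described in the key terms can be illustrated with a small simulation. The following is a minimal sketch, not taken from the chapter: it draws one Gaussian variable and a growing number of other Gaussian variables, all independent of each other, and reports the largest absolute sample correlation found. The function names, the sample size of 50, and the feature counts are assumptions chosen for illustration only.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def max_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |sample correlation| between one Gaussian variable and
    n_features other Gaussian variables, all drawn independently.

    Hypothetical helper for illustration; every true correlation is zero.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(target, feature)))
    return best


if __name__ == "__main__":
    # Fixed sample size, growing dimensionality.
    for p in (10, 100, 1000):
        value = max_spurious_correlation(n_samples=50, n_features=p)
        print(f"features={p:5d}  max |correlation|={value:.3f}")
```

Because the seed is fixed, the larger runs reuse the same leading features as the smaller ones, so the reported maximum cannot decrease as the number of features grows. Under these assumptions, a variable that is truly unrelated to the target can still look like a strong predictor once enough candidates are screened.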
