Data Mining for Junior Data Scientists: Data Analytics With Python

Copyright: © 2023 | Pages: 56
DOI: 10.4018/978-1-6684-4730-7.ch012

Abstract

Junior data scientists need to learn computer programming because data science software packages do not always meet the requirements of a given analysis. Python offers a large set of libraries for data analysis, including NumPy, Pandas, Matplotlib, Seaborn, and Scikit-learn. NumPy and Pandas help organize datasets during the pre-processing stage, while Matplotlib and Seaborn provide a range of data visualization commands. These visualization tools support data exploration, for example through histograms and scatter plots, and they display data mining results such as the outcome of a cluster analysis. Scikit-learn is a widely used library in the data science industry that provides data mining commands for regression, decision trees, and cluster analysis, covering both supervised and unsupervised learning. Junior data scientists should therefore learn Python programming for data science applications, especially when they work with software packages that require models to be edited using Python commands.

Introduction

Data analysis workflows today rely on several kinds of tools: data extraction tools, data survey tools, data preparation tools, data analysis tools that apply data mining techniques, and data visualization tools. Each step has ready-to-use software, available both as off-the-shelf applications and as programming languages. Python supports the entire workflow: it can be used to manipulate datasets, import routines, develop data visualizations, and perform data analysis with data mining techniques using libraries (Massaron & Mueller, 2015; Mueller & Massaron, 2019).
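
As a rough illustration of how one Python script can span this workflow, the sketch below prepares a small dataset with NumPy and Pandas, applies a cluster analysis with Scikit-learn, and draws the result with Matplotlib. It is not taken from the chapter: the synthetic data, the column names x and y, and the choice of k-means with two clusters are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): two well-separated groups of points.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
df = pd.DataFrame(np.vstack([group_a, group_b]), columns=["x", "y"])

# Pre-processing with Pandas: simulate one missing value, then drop incomplete rows.
df.loc[0, "x"] = np.nan
clean = df.dropna().reset_index(drop=True)

# Data mining with Scikit-learn: unsupervised k-means clustering into two groups.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
clean["cluster"] = model.fit_predict(clean[["x", "y"]])

# Visualization with Matplotlib: scatter plot colored by assigned cluster.
fig, ax = plt.subplots()
ax.scatter(clean["x"], clean["y"], c=clean["cluster"])
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("k-means clusters on synthetic data")
```

To view the figure, one could call fig.savefig with a file name of choice, or switch to an interactive backend and call plt.show(). On real data the pre-processing step would usually involve more than dropping missing rows, and the number of clusters would need to be justified rather than assumed.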