Future Outlier Detection Algorithm for Smarter Industry Application Using ML and AI: Explainable AI and ML for Smart Industry Evolution Using ML/AI Algorithms and Implementations

Kunal Dhibar, Prasenjit Maji
DOI: 10.4018/978-1-6684-8785-3.ch008

Abstract

Outliers are prevalent in many real-world investigations. Even a few aberrant data points can cause model misspecification, biased parameter estimates, and poor forecasts. Outliers in a time series are typically created at unknown moments in time by dynamic intervention models. As a result, recognizing outliers is the starting point for every statistical investigation. Outlier detection has attracted significant attention in a variety of domains, most notably machine learning and artificial intelligence. Strong outliers (anomalies) are classified into point, contextual, and collective outliers. The most significant difficulties in outlier detection include the narrow boundary between outlying and normal regions, the tendency of new data and noise to resemble genuine data, unlabeled datasets, and varying interpretations of outliers in different applications.

Introduction

Most real-world datasets contain observations that do not conform to the general structure or characteristics of the dataset. Outliers are observations that differ markedly from most of the data points in the dataset. Outlier identification is a problem that must be addressed in a variety of applications, including fraud detection and prevention (e.g., potentially malicious use of credit and debit cards or other kinds of monetary transactions), healthcare data analysis (e.g., recognizing changing responses to therapeutic interventions among patient populations), fault detection in production processes, and detection of network intrusions, among others. Additionally, the presence of outliers affects several data processing tasks, so outlier observations must be limited or eliminated. Recognizing outliers in multidimensional data is a hard task that becomes increasingly difficult with high-dimensional datasets.

Outlier identification is a significant topic in data mining that has been widely investigated for many years. An anomaly has been defined as an observation in a dataset that appears to be inconsistent with the remainder of that dataset. Outlier mining methods may be applied in a range of areas, notably credit card fraud identification, detection of network intrusions, and environmental monitoring. Outlier mining involves two primary tasks. First, we must define which data are considered outliers in a given set. Second, an efficient technique for computing these outliers is required. The outlier problem was initially investigated by the statistical community. Statistical methods assume that the given dataset follows a known distribution, such as the normal distribution, and an object is referred to as an outlier if it deviates significantly from that distribution. However, finding a suitable distribution for a given dataset is difficult in practice. To address this shortcoming, the data management community has proposed a number of model-free approaches. Distance-based outliers and density-based outliers are two examples.
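The contrast between the statistical and the model-free view can be illustrated with a minimal sketch. It is not the chapter's algorithm: the function names, the sample data, and both thresholds are hypothetical choices made for illustration. The first function flags values that lie far from the mean in units of standard deviation, which implicitly assumes roughly normal data. The second flags points whose k-th nearest neighbour is far away, which assumes no distribution at all.

```python
import math
import statistics


def zscore_outliers(values, threshold=2.5):
    """Statistical view: flag values more than `threshold` standard
    deviations from the mean (assumes roughly normal data).

    The default of 2.5 is an illustrative assumption. In small samples a
    single extreme value inflates the standard deviation, so the common
    cut-off of 3 can miss it.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / std > threshold]


def knn_distance_outliers(points, k=2, threshold=2.0):
    """Distance-based view: flag points whose Euclidean distance to their
    k-th nearest neighbour exceeds `threshold`. No distribution is assumed.

    Requires at least k + 1 points; k and threshold are illustrative.
    """
    outliers = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        if dists[k - 1] > threshold:
            outliers.append(p)
    return outliers


# Hypothetical sample data: one obviously aberrant reading and point.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.0, 9.7, 10.1, 25.0]
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (5.0, 5.0)]

print(zscore_outliers(readings))
print(knn_distance_outliers(points))
```

Both functions depend on a user-chosen threshold, and the distance-based one compares every pair of points, so it scales quadratically with dataset size. Density-based methods refine the distance-based idea by comparing each point's local density with that of its neighbours.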

Outlier detection is a prominent subject in the data mining field because, unlike other data mining methods, it aims to find patterns that occur rarely. An outlier is an observation that differs substantially from, and is inconsistent with, the bulk of a dataset, as though it were generated by a different mechanism. Outlier detection is critical because outliers may provide both raw patterns and actionable insights about a dataset. Outlier detection is used in a broad variety of settings, including criminal identification, credit card fraud analysis, detection of network intrusions, medical diagnostics, fault detection in safety-critical systems, and abnormality detection in image processing. Considerable effort has recently gone into outlier detection research, and several approaches have been described. Current research on outlier detection may be classified into three types, based on whether label information is available or can be used to build the detection approach: unsupervised, supervised, and semi-supervised strategies.

Figure 1. Different outlier approaches

Literature Review

Hawkins established a widely accepted definition of an outlier, stating that outliers are measurements that differ considerably from the rest of the data. Outlier identification is essential for extracting significant and meaningful information from data. Outlier detection techniques are used to find anomalies in settings such as high-dimensional data, uncertain data, real-time data, network data, and time series data.

  • Outlier recognition is achievable through examination of the initial dataset and analysis of the model.

  • By using only the permitted operations. This characteristic relies on algorithmic statistics that favor one controller over another.
