An Engineering Domain Knowledge-Based Framework for Modelling Highly Incomplete Industrial Data

Han Li, Zhao Liu, Ping Zhu
Copyright: © 2021 | Pages: 19
DOI: 10.4018/IJDWM.2021100103

Abstract

Missing values in industrial data restrict its applications. Although such incomplete data contains enough information for engineers to support subsequent development, it still has too many missing values for algorithms to establish precise models, because engineering domain knowledge is not considered and valuable information is not fully captured. Therefore, this article proposes an engineering domain knowledge-based framework for modelling incomplete industrial data. The raw datasets are partitioned and processed at different scales. First, hierarchical features are combined to decrease the missing ratio. Next, to fill the missing values in the special data, which is identified for classifying the samples, samples with only part of the features present are fully utilized, rather than removed, to establish a local imputation model. Then the samples are divided into different groups to transfer information. A series of industrial datasets is analyzed to verify the feasibility of the proposed method.

Introduction

With the rapid development of computing power and data science, statistical and machine learning methods are being applied to a wide range of fields. Numerous studies have been devoted to building accurate and efficient data models (Martin, Sequera & Huerga, 2017; Chiu, Tsai & Li, 2020; Shao, Zhu, Wang, Liu & Liu, 2020). The process of designing and manufacturing industrial products generates large amounts of data. Among the many industrial categories, the automotive industry is a representative field. Furthermore, it has relatively complete information systems in which large amounts of data have accumulated.

The industrial data from automotive research and development includes design target data, simulation analysis data, test data, manufacturing data, operation data, etc. (Fang, Sun, Qiu & Kim, 2017; Xianping, 2019). These datasets are diverse and hierarchical. Because the development cycle of new products is long, collecting a sample takes a long time and the amount of data is limited. Therefore, data in the automobile industry is generally high-dimensional with a small sample size. In addition, as the automotive manufacturing industry develops, the range of recorded data gradually expands, and new indicators and parameters are constantly introduced. Hence all newly introduced attributes are absent from the earlier records. Besides, manual data collection and the digitization process can also introduce errors and missing values. As a result, dealing with fragmentary data collected at various stages of the development process is inevitable.

As a key technology in the field of artificial intelligence, data mining and knowledge discovery technology aims to acquire novel and useful knowledge through data processing. Data mining and knowledge discovery technology consists of data acquisition, data pre-processing, data mining, evaluation, and application (Mariscal, Óscar & Covadonga, 2010). The key is to establish descriptive or predictive models through clustering, association analysis, classification, regression, and other machine learning methods.

Various researchers have applied these methods to engineering (Fotouhi & Montazerigh, 2013; Baraldi, Cannarile, Maio & Zio, 2016; Du, Wang, Yang & Niu, 2019). Specifically, much research has been done on occupant protection design with the help of data mining methods (Zhao, Jin, Cao & Wang, 2010; Zhang, Ma, Chen & Zhang, 2013; Nie, Tang, Liu, Chang & Zhang, 2018).

These proposed methods are effective, but they are mostly based on simulated data without missing values. The test data accumulated during research and development, however, usually contains numerous missing values, which hinder the building of accurate models.

In general, missing data mechanisms are classified into three patterns, where p denotes the probability that a record has a missing value for an attribute: (1) missing completely at random (MCAR), when p depends on neither the observed data nor the missing data; (2) missing at random (MAR), when p may depend on the observed data but not on the missing data; and (3) not missing at random (NMAR), when p may depend on the value of the attribute itself. The approaches developed for handling missing values can be broadly divided into three types (Liu, Pan, Dezert & Martin, 2016):

  • The first type, which is the simplest yet effective in some situations, is to fill in default values or to directly remove the records with missing values.

  • The second type is to process datasets without filling the missing values, such as the research by Pelckmans, Brabanter, Suykens & Moor (2005).

  • The third type is to impute the missing values by statistical analysis or machine learning methods. A great number of studies are devoted to solving the problem of missing values in this way.
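
The three missingness mechanisms and the first and third handling types described above can be illustrated with a minimal Python sketch. It is not the framework proposed in this article. The function names (`make_masks`, `apply_mask`, `delete_records`, `mean_impute`), the median thresholds, and the missing rate of 0.3 are all hypothetical choices made for illustration, and mean imputation stands in for the far richer statistical and machine learning imputers found in the literature.

```python
import random
import statistics


def make_masks(x, y, rate=0.3, seed=0):
    """Return masks (True = y is missing) under MCAR, MAR and NMAR.

    x is a fully observed attribute; y is the attribute that loses values.
    The thresholds and rates are illustrative assumptions only.
    """
    rng = random.Random(seed)
    x_med = statistics.median(x)
    y_med = statistics.median(y)
    # MCAR: p is the same constant for every record.
    mcar = [rng.random() < rate for _ in y]
    # MAR: p depends only on the observed attribute x.
    mar = [rng.random() < (2 * rate if xi > x_med else 0.0) for xi in x]
    # NMAR: p depends on the value of y itself.
    nmar = [rng.random() < (2 * rate if yi > y_med else 0.0) for yi in y]
    return {"MCAR": mcar, "MAR": mar, "NMAR": nmar}


def apply_mask(y, mask):
    """Replace masked entries of y with None."""
    return [None if m else yi for yi, m in zip(y, mask)]


def delete_records(x, y_obs):
    """First type: drop every record whose y value is missing."""
    return [(xi, yi) for xi, yi in zip(x, y_obs) if yi is not None]


def mean_impute(y_obs):
    """Third type, simplest form: fill gaps with the observed mean.

    Requires at least one observed value.
    """
    observed = [yi for yi in y_obs if yi is not None]
    fill = statistics.fmean(observed)
    return [fill if yi is None else yi for yi in y_obs]


if __name__ == "__main__":
    x = [float(i) for i in range(100)]
    y = [2.0 * xi for xi in x]
    for name, mask in make_masks(x, y).items():
        y_obs = apply_mask(y, mask)
        kept = delete_records(x, y_obs)
        filled = mean_impute(y_obs)
        print(name, "records kept:", len(kept),
              "mean after imputation:", round(statistics.fmean(filled), 2))
```

In this sketch, deletion shrinks an already small sample, while mean imputation keeps every record but fills each gap with the same value. Under the MAR and NMAR masks the surviving values are no longer representative of the full attribute, so both simple strategies can distort the data.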
