This chapter provides a comprehensive introduction to a self-adaptive ReLU neural network method. The purpose is to design a nearly minimal neural network architecture that achieves a prescribed accuracy for a given task in scientific machine learning, such as approximating a function or the solution of a partial differential equation. Starting with a small one-hidden-layer neural network, the method enhances the network adaptively by adding neurons to the current hidden layer or to a new hidden layer, based on the accuracy of the current approximation. In addition, the method provides a natural process for obtaining a good initialization when training the current network. The initialization of newly added neurons at each adaptive step is also discussed in detail.
1. Introduction
Given a data set $\{(x_i, y_i)\}_{i=1}^{N}$ with $x_i \in \Omega = [-1,1]^d$ and positive weights $\{w_i\}_{i=1}^{N}$, consider the discrete least-squares problem: find a ReLU neural network function $f_{nn}$ with $l$ hidden layers (defined in Section 2) such that

$$L(f_{nn}) = \min_{v} L(v), \qquad (1)$$

where the minimum is taken over all ReLU neural network functions $v$ with $l$ hidden layers, and $L(\cdot)$ is the least-squares loss functional given by

$$L(v) = \sum_{i=1}^{N} w_i \big(v(x_i) - y_i\big)^2.$$

For a prescribed tolerance $\varepsilon > 0$, this chapter presents a self-adaptive algorithm, the adaptive neuron enhancement (ANE) method, that adaptively constructs a nearly optimal network such that the neural network approximation $f_{nn}(x)$ satisfies

$$L(f_{nn}) \le \varepsilon\, L(0), \qquad (2)$$

where $L(0) = \sum_{i=1}^{N} w_i\, y_i^2$ is the square of the weighted $l^2$ norm of the output data $\{y_i\}_{i=1}^{N}$.
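The loss and the stopping test (2) can be illustrated with a minimal sketch. It assumes the loss has the weighted form sum_i w_i (f(x_i) - y_i)^2, which is consistent with L(0) being the squared weighted l2 norm of the output data. The function names `weighted_ls_loss` and `meets_tolerance`, the toy 1D data, and the two candidate functions are illustrative assumptions, not part of the chapter.

```python
def weighted_ls_loss(f, xs, ys, ws):
    """Weighted discrete least-squares loss: sum_i w_i * (f(x_i) - y_i)^2."""
    return sum(w * (f(x) - y) ** 2 for x, y, w in zip(xs, ys, ws))


def meets_tolerance(f, xs, ys, ws, eps):
    """Relative stopping test L(f) <= eps * L(0)."""
    # L(0) is the loss of the zero function, i.e. sum_i w_i * y_i^2.
    loss_zero = weighted_ls_loss(lambda x: 0.0, xs, ys, ws)
    return weighted_ls_loss(f, xs, ys, ws) <= eps * loss_zero


# Toy 1D data sampled from y = |x| on [-1, 1] with uniform weights (assumed).
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [abs(x) for x in xs]
ws = [1.0 / len(xs)] * len(xs)


def relu(t):
    return max(t, 0.0)


def exact(x):
    # Two ReLU neurons reproduce |x| exactly.
    return relu(x) + relu(-x)


def crude(x):
    # A constant guess that ignores the data.
    return 0.5


exact_ok = meets_tolerance(exact, xs, ys, ws, eps=1e-3)
crude_ok = meets_tolerance(crude, xs, ys, ws, eps=1e-3)
```

Because the test is relative to L(0), the tolerance does not depend on the scale of the output data.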
A multi-layer ReLU neural network is described in this chapter as a set of continuous piecewise linear functions. Hence each network function is piecewise linear with respect to a partition of the domain. This partition, referred to as the (domain) physical partition (see Section 3), captures the geometric features of the function and hence plays a critical role in the design of a self-adaptive neural network method. Determining the physical partition of a network function is in general computationally expensive, especially when the input dimension d is high. To circumvent this difficulty, we introduce a network indicator function that can easily determine such a partition.
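To make the piecewise linear structure concrete, the sketch below evaluates a one-hidden-layer ReLU network in one dimension, lists the breakpoints that form its partition of [-1, 1], and records which neurons are active at a point. The activation pattern is used here only as a simple stand-in for identifying the linear piece that contains a point; it is not the network indicator function of Section 3. All names and parameter values are illustrative assumptions.

```python
def relu(t):
    return max(t, 0.0)


def shallow_net(x, weights, biases, coeffs, c0=0.0):
    """Evaluate c0 + sum_i coeffs[i] * relu(weights[i] * x + biases[i]) for scalar x."""
    return c0 + sum(c * relu(w * x + b) for w, b, c in zip(weights, biases, coeffs))


def breakpoints(weights, biases, lo=-1.0, hi=1.0):
    """Points in (lo, hi) where some neuron switches on or off."""
    pts = {-b / w for w, b in zip(weights, biases) if w != 0.0}
    return sorted(p for p in pts if lo < p < hi)


def activation_pattern(x, weights, biases):
    """Tuple of 0/1 flags recording which neurons are active at x."""
    return tuple(1 if w * x + b > 0.0 else 0 for w, b in zip(weights, biases))


# Hypothetical parameters for three hidden neurons.
weights = [1.0, 1.0, -1.0]
biases = [0.5, -0.25, 0.0]
coeffs = [1.0, -2.0, 0.5]

pts = breakpoints(weights, biases)

# One sample point inside each of the four subintervals cut out by the breakpoints.
samples = [-0.75, -0.25, 0.1, 0.5]
patterns = [activation_pattern(x, weights, biases) for x in samples]
```

In this example each subinterval has its own activation pattern, and the network is linear on each of them. In higher dimensions the breakpoints become hyperplanes and the number of pieces can grow quickly, which is one way to see why computing the partition explicitly is expensive.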
The idea of the ANE method is similar to that of standard adaptive mesh-based numerical methods, and it may be written as loops of the form

train → estimate → mark → enhance. (3)

Starting with a small one-hidden-layer network, the step train iteratively solves the optimization problem of the current network; the step estimate computes the error of the current approximation; the step mark identifies local regions that need refinement; and the step enhance adds new neurons to the current network with a good initialization. This adaptive algorithm learns not only from the given information (data, function, partial differential equation) but also from the current computer simulation.
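The control flow of loop (3) can be sketched with a generic driver. To keep the example runnable without a training library, the toy instance below is a mesh-based stand-in rather than the ANE method itself: the "model" is a list of nodes for piecewise linear interpolation on [-1, 1], and "enhance" bisects marked intervals instead of adding neurons. The driver signature, the midpoint error indicator, and the marking fraction are all assumptions made for illustration.

```python
def adaptive_loop(model, train, estimate, mark, enhance, tol, max_steps=20):
    """Generic train -> estimate -> mark -> enhance driver (hypothetical interface)."""
    for _ in range(max_steps):
        model = train(model)              # solve the problem on the current model
        local_errors = estimate(model)    # one error indicator per local region
        if sum(local_errors) <= tol:
            break
        marked = mark(local_errors)       # indices of regions that need refinement
        model = enhance(model, marked)    # enlarge the model where marked
    return model


def target(x):
    # Target with a kink-like singularity at 0, so refinement should cluster there.
    return abs(x) ** 0.5


def train(nodes):
    # Interpolation needs no optimization; just keep the nodes ordered.
    return sorted(nodes)


def estimate(nodes):
    # Squared midpoint interpolation error on each interval, scaled by its length.
    errors = []
    for a, b in zip(nodes, nodes[1:]):
        mid = 0.5 * (a + b)
        gap = target(mid) - 0.5 * (target(a) + target(b))
        errors.append((b - a) * gap ** 2)
    return errors


def mark(errors, fraction=0.5):
    # Mark every interval whose indicator is at least a fraction of the largest one.
    threshold = fraction * max(errors)
    return [i for i, e in enumerate(errors) if e >= threshold]


def enhance(nodes, marked):
    # Bisect each marked interval.
    new_nodes = [0.5 * (nodes[i] + nodes[i + 1]) for i in marked]
    return sorted(nodes + new_nodes)


final_nodes = adaptive_loop([-1.0, 0.0, 1.0], train, estimate, mark, enhance, tol=1e-4)
```

In this toy run the nodes accumulate near x = 0, where the target is least smooth, which mirrors the intent of the mark and enhance steps: add resolution only where the current approximation is poor.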
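When the current error does not satisfy (2), an efficient ANE method relies on strategies that address the following questions at each adaptive step: (a) how many new neurons should be added to the last hidden layer, and (b) when should a new hidden layer be added?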
By exploiting the geometric features of the current approximation, the enhancement strategy (see Section 4) determines the number of new neurons to be added to the last hidden layer. A new layer is added if a computable quantity, which measures the improvement rate between two consecutive networks per relative increase in the number of parameters, is small.
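One possible reading of the layer-adding test is sketched below: divide the relative reduction in loss between two consecutive networks by the relative growth in their parameter counts, and suggest a new layer when this ratio falls below a threshold. The exact quantity used by the method is defined later in the chapter; the formula, the function names, and the threshold value 0.5 here are assumptions chosen only to illustrate the idea.

```python
def improvement_per_parameter(loss_prev, loss_curr, params_prev, params_curr):
    """Relative loss reduction divided by relative parameter growth (assumed form)."""
    relative_improvement = (loss_prev - loss_curr) / loss_prev
    relative_growth = (params_curr - params_prev) / params_prev
    return relative_improvement / relative_growth


def should_add_layer(loss_prev, loss_curr, params_prev, params_curr, threshold=0.5):
    """Suggest a new hidden layer when widening the last layer stops paying off."""
    rate = improvement_per_parameter(loss_prev, loss_curr, params_prev, params_curr)
    return rate < threshold  # threshold is a hypothetical tuning parameter


# Widening from 30 to 45 parameters cut the loss from 0.10 to 0.04: keep widening.
keep_widening = should_add_layer(0.10, 0.04, 30, 45)

# Doubling the parameters from 45 to 90 barely moved the loss: consider a new layer.
go_deeper = should_add_layer(0.04, 0.038, 45, 90)
```

Normalizing by parameter growth keeps the test from rewarding an enhancement that improves the loss only slightly while adding many neurons.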