Statistical Model Selection for Seasonal Big Time Series Data

Brian Guangshi Wu, Dorin Drignei
Copyright: © 2023 | Pages: 14
DOI: 10.4018/978-1-7998-9220-5.ch182

Abstract

Time series exhibiting seasonal behavior are common in areas such as environmental sciences and economics. Given the current capabilities to generate and store large amounts of data, in particular seasonal time series recorded at a large number of time points, new modeling and computational challenges arise. This article addresses statistical model selection for such big seasonal time series data as follows. A small sample of model orders is obtained, the corresponding time series models are fitted, and an information criterion for each of them is computed. Kriging-based methods are used to emulate the information criterion at any new set of model orders, followed by an efficient global optimization (EGO) algorithm to identify the optimal orders, thus selecting the model. Both simulated and real seasonal big time series data are used to illustrate the method, showing that the model orders are accurately and efficiently identified.
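The emulate-then-optimize loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: `toy_criterion` is a hypothetical stand-in for fitting a model and computing its information criterion, the kriging correlation parameter `theta` is fixed rather than estimated, the prediction variance uses a simplified formula, and the candidate orders are limited to a small (p, q) grid.

```python
import math
from itertools import product

def toy_criterion(order):
    # Hypothetical stand-in for the expensive step: fitting the model with
    # these orders and returning its information criterion.
    p, q = order
    return 100.0 + (p - 3) ** 2 + 2.0 * (q - 1) ** 2

def solve(A, b):
    # Gaussian elimination with partial pivoting for the system A x = b.
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def corr(a, b, theta=0.5):
    # Gaussian correlation; theta is fixed here (assumption), not estimated.
    return math.exp(-theta * sum((u - v) ** 2 for u, v in zip(a, b)))

def fit_kriging(X, y):
    # Correlation matrix with a small nugget on the diagonal for stability.
    n = len(X)
    R = [[corr(X[i], X[j]) + (1e-8 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    mu = sum(solve(R, y)) / sum(solve(R, [1.0] * n))  # constant-mean estimate
    resid = [v - mu for v in y]
    w = solve(R, resid)
    sigma2 = max(sum(a * b for a, b in zip(resid, w)) / n, 0.0)
    return R, mu, w, sigma2

def predict(model, X, x_new):
    # Kriging mean and (simplified) standard deviation at a new point.
    R, mu, w, sigma2 = model
    r = [corr(x_new, xi) for xi in X]
    mean = mu + sum(a * b for a, b in zip(r, w))
    quad = sum(a * b for a, b in zip(r, solve(R, r)))
    return mean, math.sqrt(sigma2 * max(1.0 - quad, 0.0))

def expected_improvement(mean, sd, best):
    # Expected improvement over the current best value (minimization).
    if sd < 1e-12:
        return 0.0
    z = (best - mean) / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + sd * pdf

def ego(criterion, candidates, design, budget=8, tol=1e-6):
    X = list(design)
    y = [criterion(x) for x in X]
    for _ in range(budget):
        model = fit_kriging(X, y)
        remaining = [c for c in candidates if c not in X]
        if not remaining:
            break
        scored = []
        for c in remaining:
            mean, sd = predict(model, X, c)
            scored.append((expected_improvement(mean, sd, min(y)), c))
        ei, nxt = max(scored)
        if ei < tol:  # no candidate promises a meaningful improvement
            break
        X.append(nxt)
        y.append(criterion(nxt))
    i = min(range(len(y)), key=y.__getitem__)
    return X[i], y[i], X

grid = list(product(range(5), range(5)))           # 25 candidate (p, q) pairs
design = [(0, 0), (0, 4), (4, 0), (4, 4), (2, 2)]  # small initial sample
best, value, visited = ego(toy_criterion, grid, design)
print(best, value, len(visited))
```

In this sketch the criterion is evaluated at most 13 times (5 design points plus a budget of 8), compared with 25 evaluations for a full search of the grid. With real data each evaluation is a model fit, which is where the saving would matter.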

Introduction

Seasonal time series abound in areas such as environmental sciences and economics. For example, seasonal temperatures (e.g., Chen et al., 2016; Murthy & Kumar, 2021), seasonal precipitation (e.g., Sayemuzzaman & Jha, 2014; Martin et al., 2020), and seasonal wind speed (e.g., Shih, 2021) are common in environmental sciences, while seasonal business cycles (e.g., Gregory & Smith, 1996) and seasonal labor data (e.g., Liebensteiner, 2014) can be found in economics. Analyzing such data sets provides useful insight into seasonal patterns that affect human activities and economic development. However, now that very large amounts of data can be collected, classical statistical analysis methods reach their limits, and big time series data sets pose new computational and modeling challenges.

In time series analysis, order identification refers to the selection of a time series model characterized by non-negative integer orders; it is followed by parameter estimation, diagnostic checking, and forecasting. Although it is a critically important early step, order identification is perhaps the least developed of these steps. For autoregressive AR(p) processes, the partial autocorrelation function is zero after lag p, which identifies the AR order p. Similarly, the autocorrelation function of moving average MA(q) processes is zero after lag q, which provides a convenient way to identify the MA order q. For mixed autoregressive moving average ARMA(p,q) models, Tsay and Tiao (1984) proposed ESACF, a least-squares type method based on the extended sample autocorrelation function, which uses a sequence of linear regression models to identify the orders (p,q). The same authors proposed a related method, SCAN, based on a canonical correlation approach (Tsay & Tiao, 1985). Both methods are applied through tables from which the orders p and q are read, and both can also be used for integrated ARMA (i.e., ARIMA) models. However, they are not directly applicable to other time series models, such as seasonal autoregressive integrated moving average (SARIMA) models or certain types of nonlinear models. Cross-validation is a potential alternative for time series model selection, but it may be nontrivial to apply because of the inherent serial correlation (Bergmeir et al., 2018).
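The cutoff property of the partial autocorrelation function can be checked numerically. The sketch below is an illustration under assumed values, not part of the chapter: it takes an AR(2) process with hypothetical coefficients 0.5 and 0.3, computes its theoretical autocorrelations from the Yule-Walker recursion, and converts them to partial autocorrelations with the Durbin-Levinson recursion. The function names are chosen here for illustration.

```python
def ar2_acf(phi1, phi2, max_lag):
    # Theoretical autocorrelations rho_0..rho_max_lag of a stationary AR(2),
    # obtained from the Yule-Walker equations.
    rho = [1.0, phi1 / (1.0 - phi2)]
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[:max_lag + 1]

def pacf_from_acf(rho):
    # Durbin-Levinson recursion: partial autocorrelations at lags
    # 1..len(rho)-1, given autocorrelations rho[0..].
    pacf = []
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        prev.append(phi_kk)
        pacf.append(phi_kk)
    return pacf

pacf = pacf_from_acf(ar2_acf(0.5, 0.3, 6))
for lag, value in enumerate(pacf, start=1):
    print(lag, round(value, 6))
```

The partial autocorrelation at lag 2 equals the second AR coefficient, and the values at lags 3 and beyond vanish up to floating-point rounding. With observed data one works with the sample partial autocorrelations instead, which are only approximately zero beyond lag p and are usually compared with approximate bounds of about ±1.96/√n.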

The most commonly used method for time series model selection evaluates an information criterion for a few plausible time series models and chooses the model that minimizes the criterion (e.g., Brockwell & Davis, 2016; Shumway & Stoffer, 2017). When a small set of plausible models is not available, one computes and minimizes the information criterion exhaustively over a large enough grid of orders (Brockwell & Davis, 2016). However, choosing the best model in this exhaustive way is computationally challenging for big time series data. Such data could be a univariate large-sample time series (e.g., the appliances energy consumption time series of length 19,735 in Candanedo et al., 2017) or a collection of large-sample time series (e.g., 438 stocks over 1,495 days in Lunde et al., 2016), or they could occur in a less common format (e.g., a temporal sequence of 606 facial expressions in Xu et al., 2020; Wang et al., 2020).
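Exhaustive minimization of an information criterion can be illustrated on a deliberately small case. The sketch below is a simplification, not the chapter's procedure: it considers only pure AR(p) candidates rather than SARIMA models, simulates a series from an assumed AR(2) process, and uses an approximate Gaussian BIC of the form n·log(σ̂²) + p·log(n), with the innovation variance σ̂² taken from Yule-Walker fits computed by the Durbin-Levinson recursion. All function names and parameter values are illustrative.

```python
import math
import random

def simulate_ar2(n, phi1, phi2, seed=0, burn=200):
    # Simulate an AR(2) series with standard normal innovations,
    # discarding an initial burn-in segment.
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n + burn):
        x.append(phi1 * x[-1] + phi2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[-n:]

def autocov(x, max_lag):
    # Sample autocovariances at lags 0..max_lag.
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]

def bic_by_ar_order(x, max_p):
    # Approximate BIC for AR(0), ..., AR(max_p). The innovation variance
    # of each Yule-Walker fit comes from the Durbin-Levinson recursion.
    n = len(x)
    g = autocov(x, max_p)
    sigma2 = g[0]
    prev = []
    bic = [n * math.log(sigma2)]  # order 0: no penalty term
    for k in range(1, max_p + 1):
        num = g[k] - sum(prev[j] * g[k - 1 - j] for j in range(k - 1))
        phi_kk = num / sigma2
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        prev.append(phi_kk)
        sigma2 *= 1.0 - phi_kk ** 2
        bic.append(n * math.log(sigma2) + k * math.log(n))
    return bic

MAX_P = 6
series = simulate_ar2(2000, 0.5, 0.3)
bic = bic_by_ar_order(series, MAX_P)
best_p = min(range(MAX_P + 1), key=bic.__getitem__)
print("selected AR order:", best_p)
```

Here the whole grid has only seven candidates and each fit is cheap. For SARIMA models the grid runs over several orders at once and every candidate requires a full model fit on the long series, so the number and cost of evaluations grow quickly.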

Key Terms in this Chapter

Model Selection: A procedure that chooses the best model among a set of competing models.

Time Series: A sequence of observations recorded over time.

Computer Experiment: An experiment intended to study how the changes in the inputs of a computer model affect the outputs.

Optimization Algorithm: An iterative process leading to an optimal solution.

Kriging: An interpolation method initially developed in geostatistics.

Emulator: A simplified model that mimics the behavior of a more complex model.
