Modeling and forecasting of climatic parameters : univariate SARIMA versus multivariate vector autoregression approach

Agriculture sector throughout the world including Bangladesh is extremely vulnerable to the negative consequences of climate change as evident in a good number of studies. Accurate climate forecasting may prove a valuable resource in mitigating these consequences in agriculture. The study aims to identify the best performing forecasting method by comparing the forecasting abilities of univariate seasonal autoregressive integrated moving average (SARIMA) and multivariate vector autoregression (VAR) models in forecasting monthly maximum and minimum temperatures, humidity, and cloud coverage in Bangladesh. Though the univariate time series investigate the influence of the past values of a single time series on the future values of that specific series, the VAR approach forecast multivariate time series simultaneously incorporating the interrelationship among the groups of variables. Monthly forecasts of climatic parameters for in-sample over the period 1972-2008 and out-of-sample from 2009-2013 were generated via a univariate SARIMA and a VAR approach. Different forecast accuracy measures reveal that VAR model give better forecast than univariate SARIMA model. The forecast results using VAR(9) model from January 2014 to December 2021 show that maximum and minimum temperature, as well as humidity are increasing while the cloud coverage is decreasing, that is, consistent with global warming. Moreover, the impulse response function results exhibit the fluctuated and significant dynamic relationships in future among the foresaid climatic variables. Thus, findings of the study can potentially allow Bangladeshi farmers and other actors in the agriculture sector to make proper planning to abate unwanted impacts or reap the expected benefits of favourable climate. Introduction It is now widely accepted that the impacts of climate change is inevitable (e.g., IPCC, 2001) in many parts of the world particularly in the developing countries and that proper planning, decision making are necessary to adapt with these changes. Climate change is a serious threat to crop productivity especially in the food insecure regions like south Asia. Rice growing countries of South Asia including Bangladesh is very much vulnerable to climate change. So, food security is totally weakened and in secured in those countries (Amin et al., 2015; Al-Amin et al., 2017). Economic, food security and agricultural production throughout the world are highly defenseless due to climate variability. Farmers and other decision makers in agriculture are not ready for the weather conditions which occur and accordingly make their decisions based on previous experiences (Jones et al., 2000). Therefore, in temperate environment precise investigation, analysis and forecast of climate parameters are imperative for better management of agriculture. Climate predictions to the agriculture sector may help in mitigating some of the adverse effects of climatic variability (Easterling and Mjelde, 1987). If climate predictions are successfully implemented, it may be able to guide the population at risk to diminish vulnerability to climate change. Several studies have shown the potential benefits of using climate forecasts on the decision making process in agriculture (Jones et al., 2000; Hansen, 2002). Many methods have been commonly used for modeling and forecasting of the climatic parameters (Dastorani et al., 2016). Among various modeling of climate variables, time series modeling is a key technique for long term forecast and decision making (Chatfield, 2001). Different statistical and graphical methods are used to select the best statistical model for forecasting a series. Many researchers have used time series modeling technique to forecast climatic variables (Dastorani et al., 2016). They concluded that combining different time series models is good for accurate forecasting (Dastorani et al., 2016). Finding the best model among various models is very important for accurate forecasting. Among two times series models (univariate and multivariate) the univariate time series model investigates the influence of the past values of a single time series on the future values of that specific series. In contrast, the multivariate time series model forecast a group of variables simultaneously incorporating the interrelationship among them. Climatic variables generally exhibit seasonality-periodic fluctuations, so among the univariate time series models, the seasonal autoregressive integrated moving average (SARIMA) model is useful in this context. Furthermore, the multivariate vector autoregression (VAR) model is used for more accurate forecasting of a group of variables incorporating the interrelationship among the variables. ARTICLE INFO Article history: Received: 25 March 2018 Accepted: 25 April 2018


Introduction
It is now widely accepted that the impacts of climate change is inevitable (e.g., IPCC, 2001) in many parts of the world particularly in the developing countries and that proper planning, decision making are necessary to adapt with these changes.Climate change is a serious threat to crop productivity especially in the food insecure regions like south Asia.Rice growing countries of South Asia including Bangladesh is very much vulnerable to climate change.So, food security is totally weakened and in secured in those countries (Amin et al., 2015;Al-Amin et al., 2017).Economic, food security and agricultural production throughout the world are highly defenseless due to climate variability.Farmers and other decision makers in agriculture are not ready for the weather conditions which occur and accordingly make their decisions based on previous experiences (Jones et al., 2000).Therefore, in temperate environment precise investigation, analysis and forecast of climate parameters are imperative for better management of agriculture.
Climate predictions to the agriculture sector may help in mitigating some of the adverse effects of climatic variability (Easterling and Mjelde, 1987).If climate predictions are successfully implemented, it may be able to guide the population at risk to diminish vulnerability to climate change.Several studies have shown the potential benefits of using climate forecasts on the decision making process in agriculture (Jones et al., 2000;Hansen, 2002).Many methods have been commonly used for modeling and forecasting of the climatic parameters (Dastorani et al., 2016).Among various modeling of climate variables, time series modeling is a key technique for long term forecast and decision making (Chatfield, 2001).Different statistical and graphical methods are used to select the best statistical model for forecasting a series.Many researchers have used time series modeling technique to forecast climatic variables (Dastorani et al., 2016).They concluded that combining different time series models is good for accurate forecasting (Dastorani et al., 2016).Finding the best model among various models is very important for accurate forecasting.Among two times series models (univariate and multivariate) the univariate time series model investigates the influence of the past values of a single time series on the future values of that specific series.In contrast, the multivariate time series model forecast a group of variables simultaneously incorporating the interrelationship among them.Climatic variables generally exhibit seasonality-periodic fluctuations, so among the univariate time series models, the seasonal autoregressive integrated moving average (SARIMA) model is useful in this context.Furthermore, the multivariate vector autoregression (VAR) model is used for more accurate forecasting of a group of variables incorporating the interrelationship among the variables.
To the best of our knowledge, a number of studies used single (e.g.univariate or multivariate) time series models or comparison within the single approach in simulation and prediction of single climate parameters.But few studies incorporate the comparison between multivariate vector autoregression (VAR) and univariate seasonal autoregressive integrated moving average (SARIMA) for climatic parameters.For instance, Dastorani et al. (2016) compare stochastic time series models for predicting rainfall in Iran.Ferdous and Baten (2011) used least squares method for trend analysis of climatic temperature, rainfall, relative humidity and sunshine in Bangladesh.Shahin et al. (2012) used VAR model for forecasting temperature, humidity and cloud coverage of Rajshahi district in Bangladesh.On this backdrop, our study aims to develop multivariate vector autoregression (VAR) forecast models for forecasting monthly minimum temperature, maximum temperature, humidity and cloud coverage in temperate climate condition-a case of Barisal District (South-central) of Bangladesh and also, compare the performance of the VAR model with univariate SARIMA model.Furthermore, our study determines the response of one climate variable for the shock of other variables in future using impulse response function (IRF).

Data
The data set of our study was collected from Bangladesh Meteorological Department (BMD) over the period 1972 to 2013 (i.e.42 years) of Barisal district covering 780 observations including year-wise monthly average maximum temperature, minimum temperature, humidity, and cloud coverage.The measurement units of these variables are Degree Celsius, Percent and Octas, respectively.Barisal district was selected because of the vulnerability of this district's due to climate change.We split the entire data into two parts as training data set over the period 1972 to 2008 and test data set over period 2009 to 2013.The training dataset is used to estimate the model whereas the test dataset is used for assessing the accuracy of the forecasts using the fitted model.Afterwards, we estimate the parameters of the model using the training set and use the test data as an out-of-sample trial run to analyze how well the model predicts future unknown data.

Model specification Unit Root Test
In order to determine whether the climatic variables are stationary in our study, the different unit root tests: Augmented Dickey Fuller (Dickey and Fuller, 1979), Phillips-Perron (Phillips and Perron, 1988) and Kwiatkowski-Phillips-Schmidt-Shin (Kwiatkowski et al., 1992) where L indicates lag operator, moving average (MA) polynomial of order q with vector of coefficients ), , , ( 1  Box and Jenkins (1970), is commonly used.The best model is obtained with a minimum value of Akaike information criterion (AIC), Corrected AIC (AICc) or Bayesian information criterion (BIC).The Ljung-Box test for autocorrelation (Ljung and Box 1978) and the Kolmogorov-Smirnov test (Chakravarti et al., 1967) for normality as well as Normal Q-Q plot and histogram of the residuals is used to check the adequacy of each model.Finding appropriate values for p, d and q can be complicated.Thus, in our study, we used the statistical software R to fit SARIMA models to climate data using the auto.arima()function in 'forecast' package proposed by Hyndman and Khandakar (2008).Hyndman and Khandakar algorithm which coalesces unit root tests, minimization of the AICc and Maximum Likelihood Estimation (MLE) to attain an ARIMA model is employed in the function auto.arima() in R.These algorithms are suitable for both seasonal and nonseasonal data.

Vector Autoregression (VAR)
The univariate time series model forecasts a series considering the unidirectional relationship between the variables.However, in the real sense, there are many cases where bidirectional relationship exists among the group of variables.For instance, the variables we have considered in our study: maximum temperature, minimum temperature, humidity and cloud coverage are interconnected naturally.In such situation, the vector autoregression (VAR) model is more suitable which modeled as if the variables influence each other equally.In this case, all variables are now treated as "endogenous".In our study, we would use the pair-wise Granger causality tests proposed by Granger (1969) and popularized by Sims (1972), which tests whether an endogenous variable can be treated as exogenous.If x Granger-cause y and y Granger-cause x it would mean that x and y are both endogenous.Therefore, in this circumstance a VAR model is required.
Vector autoregressions (VARs) were introduced by Sims (1980).Following Shahin et al. (2012)  , can be written as: A VAR(p) model can be written in matrix form as: The log-likelihood function is written as follows: The maximum likelihood (ML) estimate of B and Σ are written as follows: The lag length for the VAR(p) model finds out which minimizes some model selection criteria.The most common criteria are-the Akaike Information Criterion (AIC) (Akaike, 1974), Schwarz Information Criteria (SIC) (Schwarz, 1978), Hannan-Quinn information criteria (HQ) (Hannan and Quinn, 1979), Final Prediction Error (FPE) (Akaike, 1969) and Likelihood Ratio (LR) statistic (Lutkepohl, 1991).In our study, the minimum values of AIC, SC, HQ and FPE have provided the number of relative time lags and can be computed as follows: where, p>k and under the null hypothesis: 0 :  Breusch (1978) and Godfrey (1978) are most commonly applied for testing the lack of serial correlation as well as the multivariate Jarque-Bera normality test, (Jarque and Bera, 1987) are applied to the residuals of a VAR(p) model.

Forecasting
For a given empirical VAR, h-step forecasts can be calculated recursively according to . 2 Together with forecasts, impulse response analysis and forecast error variance decomposition are other tools for investigating the dynamic relationships.

Impulse Responses
A shock to the j-th variable not only affects itself but also transmitted to all of the other response variables through the dynamic structure of the vector autoregression model.An impulse response function draws the consequence of a one-time shock to one of the innovations on the present and future observation of the response variables.

Variance Decomposition
Variance decomposition partitions the variation in a response variable into the element shocks to the vector autoregression.Thus, it provides information about the comparative magnitude of each random innovation.
The Forecast Error Variance Decomposition (FEVD) is obtained from the orthogonal impulse response coefficient matrices; . n Ψ The FEVD facilitates the user to investigate the performance of variable j to the h-step forecast error variance of variable m.Mathematically, the forecast error variance is calculated as follows:  Hyndman and Koehler (2006) is the Mean Absolute Scaled Error (MASE).Our study also incorporates this measure to select the best forecasting model.

Results and Discussion
It is vividly evident that the PP test for maximum temperature (MAXT), minimum temperature (MINT), humidity (HUM) and cloud coverage (CLOUD) are stationary at 1 % significance level (Table 1).The ADF and KPSS tests also reveal that all the climatic parameters are stationary.(5,0,4)(0,0,2)[12], SARIMA (5,0,0)(2,0,0)[12], SARIMA (5,0,1) (0,0,2) [12], respectively.After estimating the parameters of these models, we assessed the adequacy by analyzing their residuals.Figure 1 and 2 show the Q-Q plot and histogram for residuals.Both figures represent that the standardized residuals for the models approximated nomal distribution.In addition, the Kolmogorov-Smirnov test gives no reason to reject the assumption that thedistribution of residuals is normal as well as the Ljung-Box test ensures that there is no autocorrelation in the residuals (p> 0.10) for each fitted model as shown in Table 3.Thus, we can say the SARIMA models as shown in Table 2 fit the data very well.To trace out the endogeneity of the variables the Granger causality test was used.The Table 4 indicates bidirectional causality among the climatic parameters.We know that, if X (say) is granger cause of Y (say) and Y (say) is granger cause of X (say) is called bidirectional causality between X (say) and Y (say).Hence, the Granger causality test suggests that there is bidirectional causation between all variables.
The sequential modified Likelihood Ratio test statistics (LR), Final Prediction Error (FPE), Akaike Information Criteria (AIC), Schwarz Information Criteria (SC) and Hannan-Quinn information criteria (HQ) listed in Table 5 were used to select the order of VAR model.The LR statistics, FPE, AIC and HQ indicate that, the appropriate lag order of VAR is 9 but SC is 5. Ivanow & Kilian, (2001) suggest, in the context of VAR models, that AIC tends to be more accurate with monthly data.Therefore, our study chose lag length 9 of VAR model on the basis of LR, FPE, AIC and HQ criteria.Figure 3 shows the actual, fitted and residual of VAR(9) model of these variables in which the fitted values are presented by dot lines.The diagnostic checking of the selected VAR( 9) model had performed based on the residuals.In our study for testing the lack of serial correlation and normality in residuals of VAR( 9) model we used the Portmanteau test and the multivariate Jarque-Bera test.The Portmanteau test ensures that there is no serial correlation in the residuals as well as the multivariate Jarque-Bera test shows that the residuals of the VAR( 9) model are normally distributed as shown in Table 6.In Figure 4, the approximately straight line of Q-Q plot of the residuals series indicates the normal distribution.And the histogram of residuals series shown in Figure 5 to Figure 8 respectively obtained from model VAR( 9) suggest that the shocks may be roughly normally distributed.Thus, the dynamics of the system are fully captured.We computed the predicted values using the fitted VAR(9) model.A simultaneous computation system is used in computing these predicted values.The predicted values of MAXT, MINT, HUM and CLOUD for VAR(9) from January, 2014 to December, 2021 with 95% confidence interval (CI) are evident in Figure 9.
On the other hand, Table 9 shows the predicted values of these four climatic parameters from January, 2017 to December, 2021.The forecasted values imply that the maximum and minimum temperatures as well as humidity exhibit slightly upward trend, on the contrary, cloud coverage exhibits slightly downward trend.Our findings are consisting with Shahin et al. (2012).
As we know, the VAR model is over-parameterized systems in which the individual parameters can hardly be interpreted meaningfully.For this reason, other methods such as Forecast Error Variance Deco mposition (FEVD) and Impulse Response Function (IRF) are also used.Forecast Error Variance (FEV) is used to examine the short-run dynamic interactions between the variables.Table 10 represents the variance decompositions for the forecast horizons 1, 6, 12, 18 and 24 months.The response of MAXT to MINT, HUM and CLOUD reveal that the MAXT appears to be less exogenous in the system which explains 71 % of its FEV after 18 months.Likewise, the MINT, HUM and CLOUD account for 17%, 2% and 10%, respectively of the variation in the MAXT.Moreover, the response of MINT to MAXT, HUM and CLOUD show that the MINT explains 48% of its FEV after 18 months.The MAXT, HUM and CLOUD explain over 38%, 2% and 12%, respectively of the variation in MINT.The response of HUM to MAXT, MINT and CLOUD indicate that the HUM explains 54% of its FEV after 18 months.The MAXT, MINT and CLOUD explain 13%, 30% and 3% of the variation in HUM.The response of CLOUD to MAXT, MINT and HUM, we see that the CLOUD explains 40% of its FEV after 18 months.The MAXT, MINT and HUM explain 11%, 45% and 5% of the variation in CLOUD.So, the dependency suggests that, in short run, the MAXT, MINT, HUM and CLOUD are approximately very important relationship in future.Monte Carlo simulation was used to compute the response of each variable and standard errors.Our study takes into consideration with 50000 repetitions.The response in each variable is expressed to one S.D. innovation.For one standard deviation increase in the shock the graphical representations of the impulse response functions are shown in Figure 10.The immediate response of MAXT to MINT has a significant effect at 5% level with its 4th lag and explains that after 4 months the MAXT decreases by 0.288770% in response to the 0.04719% increase in the shock of MINT.Also, the HUM has a significant effect on MAXT at 5% level and we see that the MAXT may increase 0.117957% in response to the increase of 0.05717% in the shock of HUM with its 7 th lag.Likewise, after 5 months the MAXT significantly decreases by 0.250462% in response to the increase of 0.05180% in the shock of CLOUD at 5% level.Similarly, we observe that the response of MINT is significant to the shock of MAXT, HUM and CLOUD at 5% level.We see that after 10 months the MINT decreases by 0.486623% in response to the increase of 0.06125% in the shock of MAXT; also after 11 months 0.139879% increase in MINT corresponds to the increase 0.06818% in the shock of HUM.On the other hand, when the shock of CLOUD increases of 0.06303% the response of MINT significantly decreases by 0.333300% after 5 months.
Furthermore, we find that the impulse of MAXT, MINT and CLOUD exert significant effects on the response of HUM at 5% significance level.The response of HUM increases of 0.351624% if the shock of MAXT increases by 0.13057% after 4 months.Also when the impulse of MINT increases of 0.11680% the response of HUM decreases by 0.547854% after 7 months.Besides, the overall response of HUM is small, when the shock is CLOUD.Finally, we see that the response of CLOUD is significant to the shock of MAXT, MINT and HUM at 5% level.We observe that for 0.03022% increase in the shock of MAXT the response of CLOUD decreases by 0.111212% after 9 months and the reaction of CLOUD increase by 0.158626% for 0.02014% increases in the impulse of MINT after 12 months.Moreover, it is obvious that, when the shock is HUM, the almost overall reaction of CLOUD is positive but very small.In nutshell, we can say that for the shock of one variable does not have a permanent effect on the system.From Figure 10, we observe a fluctuated effect (e.g.positive and negative) of one response variable for the impulse of other variables in future.

Conclusions
Climate change impacts are global concern throughout the world and for that matter of Bangladesh where lives and livelihoods mainly rely on agriculture.In addition, the country is frequently cited as one of the most susceptible countries to climate change because of its disadvantageous geographic location; flat and low-lying topography; higher density of population; increasing poverty level; the dependence of many livelihoods on climate sensitive sectors, particularly agriculture and fisheries.Many of the anticipated adverse effects of climate change, such as higher temperatures, enhanced humidity, and shrink cloud coverage, will aggravate the existing stresses that already impede development in Bangladesh.For that reason, our study endeavours for appropriate statistical time series modeling to ensure better forecast of four climatic parameters in order to mitigate the vulnerabilities and allow enough room to farmers to dwindle unwanted consequences of climate change, facilitating rational agricultural decisions for boosting agricultural productivity.We used univariate seasonal autoregressive integrated moving average (SARIMA) and multivariate vector autoregression (VAR) for modeling and comparing the forecasting abilities of the climatic data.

Fig. 9 .
Fig. 9. Combined graphs for forecasted value of VAR(9) model from January, 2014 to December, 2021 were carried out.
After estimating VAR-model, it is of crucial interest to see whether the residuals satisfy the model's Accuracy MeasuresFinding the best forecast time series model it is pivotal to trace out the forecast accuracy of test data.The h-step forecast errors can be found as follows:

Table 1 . Unit root test for time series MAXT, MINT, HUM and CLOUD
Note: ***, ** and * represent the statistical significantat1%, 5% and 10% levels, respectively.The Augmented Dickey-Fuller (ADF) and the Phillips-Perron (PP) tested the null hypothesis of that the relevant series contains a Unit root I(1) against the alternative that it does not, while the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tested the null hypothesis that the series are I(0).

Table 5 . Various selection criteria for lag order of VAR
Note: * indicates lag order selected by the criterion LR: sequential modified LR test statistic (each test at 5% level), FPE: Final prediction error AIC: Akaike information criterion, SC: Schwarz information criterion HQ: Hannan-Quinn information criterion

Table 10 . Forecast error variance decomposition from VAR(9) model
The different unit root tests of stationarity reveal that maximum temperature, minimum temperature, humidity and cloud coverage are stationary at level.The pair-wise granger causality tests exhibit bi-directional causality among the variables.The best-fit SARIMA models with lowest AIC, AICc, BIC and estimated error variance for maximum, minimum temperature, humidity, and cloud coverage were fitted and the diagnostic results confirm the model adequacy.Based on different model selection criteria a 9 th order VAR model was fitted to our data and the diagnostic checking of the selected VAR(9)model using the Portmanteau test and the multivariate Jarque-Bera test ensure the lack of serial correlation and normality of the residuals of climatic parameters.Forecast accuracy measures include RMSE, MAE, MAPE and MASE in the case of in-sample data over the periodand out-of-sample data from 2009-2013 show that VAR(9) model give better forecast than univariate SARIMA model.The FEV decomposition suggests that the selected climate variables are important in future because they explain the future variation.Together with this, the impulse response function tells us the time path of one variables response to the shock of others variables in future.And finally, the monthly forecasts result using VAR(9) from January 2014 to December 2021 reveals that maximum temperature, minimum temperature and humidity are slightly increasing while the cloud coverage is decreasing minimally.Therefore, proper attention has to be taken to reduce environmental degradation as priority basis from government's and individual's perspectives.