Modeling and Short-Term Forecasts of Indicators for COVID-19 Outbreak in 25 Countries at the end of March

Background: The coronavirus, which originated in Wuhan, causing the disease called COVID-19, spread more than 200 countries and continents end of the March. There is a lot of data since the virus started. However, these data will be explanatory when accurate analyzes are made and will allow future predictions to be made. In this study, it was aimed to model the outbreak with different time series models and also predict the indicators. Methods: The data was collected from 25 countries which have different process at least 20 days. ARIMA(p,d,q), Simple Exponential Smoothing, Holts Two Parameter, Browns Double Exponential Smoothing Models were used. The prediction and forecasting values were obtained for the countries. Trends and seasonal effects were also evaluated. Results: China has almost under control according to forecasting. The cumulative death prevalence in Italy and Spain will be the highest, followed by the Netherlands, France, England, China, Denmark, Belgium, Brazil and Sweden respectively as of the first week of April. The highest daily case prevalence was observed in Belgium, America, Canada, Poland, Ireland, Netherlands, France and Israel between 10% and 12%.The lowest rate was observed in China and South Korea. Turkey was one of the leading countries in terms of ranking these criteria. The prevalence of the new case and the recovered were higher in Spain than Italy. Conclusions: More accurate predictions for the future can be obtained using time series models with a wide range of data from different countries by modelling real time and retrospective data.


INTRODUCTION
Coronavirus disease (COVID-19, SARS-CoV-2), which started in Wuhan, China in December 2019, has been recognized as a global threat and it turned into a pandemic that threatened two hundred countries at the end of March. Although there are differences in various indicators such as cumulative case and cumulative death among countries, the outbreak can be estimated with various mathematical models. For this purpose, various models such as logistics model, Bertalanffy model, Gompertz model, Cubic Curve, Exponential Curve, ARIMA models, phenomenological models, optimization algorithms have been examined before. But the estimates have been made for several countries such as China, Japan and USA, whose data are relatively cases reached very high at the end of March, and the outbreak period has exceeded one month. Under these conditions, modeling the data collected from many countries to define the outbreak process and the characteristics of the outbreak, to take precautions and to create projection for the countries where the outbreak has begun or will began is of great importance [1][2][3][4][5][6][7][8][9] . In this study, it was aimed to model the outbreak for six indicators from twenty five countries with different starting dates and processing at least twenty days of data using the end of March data. For this purpose, ARIMA (p, d, q), Simple Exponential Smoothing Model, Holt's Two Parameter Model, Brown's Double Exponential Smoothing Models were used. Then prediction and forecasting values were presented for the countries.

Data -Indicators and Models
In the study, twenty five countries were selected which cumulative cases exceeding 1000 among the 160 countries have exposed to the COVID-19 outbreak before March 15 and data for six indicators were modeled. The countries selected for modeling were divided into periods at certain intervals according to the start Time of the outbreak and was presented in Table 1 as a summary. The number of countries struggling with the outbreak increased to 200 at the end of March but data from 40 countries that did not contain sufficient data for modeling and started to struggle the outbreak after March 15 were excluded from the study. Since at least one of the countries which was announcing the first case at different periods, was selected for modeling, the results to be obtained from these countries will create a prediction for other countries under similar conditions and for countries that have just started the outbreak.  ARIMA (p,d,q):ARIMA time series are nonstationary time series, AR, MA and ARMA processes are stationary series. Generally, nonstationary processes are encountered in applications. Because, in practice, data generally varies due to seasonal, irregular movements or incidental reasons. The process becomes stationary when the nonstationary ARIMA time series is taken from the appropriate degree. ARIMA processes are expressed in ARIMA (p, d, q) notation. Here, p indicates the degree of autoregressive process, d is the difference of degrees, and q is the degree of the moving averages process. The ARIMA (p, d, q) process includes both AR (p), MA (q) and ARIMA (p, q) processes. The ARIMA process is an integrated autoregressive moving average process. In the ARIMA models, where d = 1, p and q is 0, then the ARIMA (0,1,0) process is obtained, which is defined as the random walk process [10][11][12] . When the first difference of the non-stationary series is taken (d=1), the ∆ = − −1 = ′ equation is obtained. If the series is not still stationary, a second difference is taken. ∆ 2 = ∆� ! � = ! − −1 ! = − 2 −1 + −2 is obtained. In this case, d = 2. Therefore, the non-stationary series is made stationary and shows the ARMA process. The model takes the form of ARIMA (p, d, q). In the method, trial method is used to determine the appropriate p, d and q values. Therefore, this method has the disadvantage, but it gives very good results in short term estimations. While d value is determined as the number of differences in the series, p and q values are obtained by examining partial autocorrelation and autocorrelation functions, respectively. The least squares method or nonlinear methods are used in parameter estimates. Assessment of goodness of fit is investigated by AIC and BIC criteria. In addition, compliance with the white noise process is checked for model suitability. The model must comply with the white noise process [10][11][12] . Simple Exponential Smoothing Model: The advantage of the model is that it can be applied in series with stochastic trends and the estimates can be updated by evaluating the latest changes in the data. The updating is done by giving different weights to the historical data. In cases where periodic and irregular fluctuations are very large, when there is no trend and seasonality, the use of exponential smoothing methods enables more accurate estimations. In the exponential smoothing method, a smoothing coefficient is used and it gives decreasing values according to the distances from today's period 13. According to the seasonal component, the model is = −1 + ( −1 − −1 ). In the model is the observation value at time t and the is the smoothed observation value at time t. The forecast error is equal to −1 − −1 . The smoothing parameter takes values between 0 and 1. Generally 0.05≤ ≤ 0.3 range is used. For optimal selecting smoothing parameters, researchers can perform different range of values. The , which gives the lowest average standard error value, will be the best choice13. The simple exponential smoothing method is only suitable for time series that move around an average. In the moving averages method, equal weight is given to the period values by experiment while different weights are given in the simple exponential smoothing method and these values are becoming exponential shape. In the simple exponential smoothing method, higher values are given to recently obtained values in weighting. The simple exponential smoothing method is often used to estimate short-term predictions13. Holt's Two Parameter Model: The Holt's two parameter model is used when there is a trend in the data. But in the data there is no seasonality. In the method, while each forecast is obtained, the previous forecast is updated and new forecast values are obtained. Since the method also takes into account the trend in the data, it is slightly more complicated than moving averages and simple exponential smoothing methods14. Holt's two parameter model is created with the help of the following equations.
= + (1 − )( −1 + −1 ) = ( − −1 ) + (1 − ) −1 � + = + In the above functions, p is the number of periods predicted, while the value of is the new smoothed value, while is the trend estimate value and is the actual value in period t. The � + is the post period prediction value14. Method uses trend estimates, so in the model there is two coefficients. α is the smoothing coefficient and the β is the coefficient used for trend estimation. These coefficients ranges between 0 and 1. The alpha and beta coefficients that make the total error square minimum are chosen as the most appropriate parameter value. Brown's Double Exponential Smoothing Model: This model was obtained as a result of the development of the simple exponential smoothing method. While the series contains trends, it does not include seasonality. Equations of the method is given in the below 15.
Here is the predicted level in time t, while is the predicted trend in time. If there is an increase and decrease in the series in the data set, Brown's model gives better results. In the method, two different smoothed series are used, which are located in two different centers. Brown's method tries to create a linear equation. Estimates include single and double smoothed constants. Therefore, the method takes its name from this feature 15 .

Statistical Analysis
In modeling, cumulative cases, cumulative deaths, daily cases, daily deaths, cumulative recovered and active cases, which are indicators of the COVID-19 outbreak, were used. The models with the most successful results were selected and 10-day forecasts after the modelling were calculated.
In the modelling process, autocorrelation values were calculated in the data and seasonality and trend of the data were examined. The largest R-square value and the smallest RMSE and Normalized BIC values were used for the model performance.

RESULTS
The results of the China is very important for many countries because the COVID-19 outbreak first began in China. In addition, China has caused the outbreak to spread to many countries and has been fighting this epidemic for 3 months. Also, as of the end of March, it is the fifth country with the highest number of cases and total deaths in the world When the modeling results of China data were investigated considering six different indicators, It was seen that the first four indicators were modeled with high success, and sufficient success was reached in the modeling of cumulative recovered (Table 2). However, there was no suitable time series model that successfully models active cases. (R-squared < 0,50). The 10-day forecast results after the model building date for the indicators modeled with high success were given in Table 3. When the table was evaluated, it was observed that the number of cases did not change much except recovered cases in the first half of April in China.  Estimates of different indicator data obtained from China are shown in Figure 1. From the graphs, it was seen that there was no significant change in the number of cumulative cases and the number of cumulative deaths as of the end of March. But the number of cumulative recovered was seen to increase. Daily cases and daily death were close to the minimum level.

Figure 1. Predicting and forecasting of indicators for China
In 11 out of 24 countries that detected the first Coronavirus cases in the period of 16-31 January, the total number of cases was over 1000. So the data of the indicators used in the prediction of the outbreak process were modeled and the results are presented in Table 4 and Table 5. It was seen that the activated cases can be modeled successfully in 6 out of 11 countries, and there was no time series model that produces a successful forecast for other countries (Table  4, R-squared < 0,50). In addition, success in modeling the number of cumulative recovered numbers in several countries was between 70% and 85%, while the prediction success of models for other indicators was calculated around 99%. Italy was the second country with the highest number of cases in the world and the country with the highest number of deaths as of the end of March. In general, it was observed that the increase in indicator results continues, but the prevalence in the daily cases and daily death was slower than other indicators when Table 5 and Figure 2, which include the results of the forecast for the next 10 days, were examined.

Figure 2. Predicting and forecasting of indicators for Italy
Spain was the 2nd country with the highest total number of deaths and the 4th country with the total number of cases. The forecast values for 10 days after modeling were presented in Table 6 and Figure 3. It can be seen from the graphs that the number of cases and deaths continues to increase.  South Korea was a country that has controlled the number of cumulative cases and cumulative deaths since the outbreak. The number of new deaths per day has never exceeded 10. After modeling the obtained data, the 10-day predicted results were given in Table 7 and Figure 4. It was predicted that the changes in the indicators will not change in the first half of April when the table is evaluated.  When the 10-day forecast values given in Table 8 and Figure 5 were examined, it was seen that there will be an increase in the indicators similar to the current days and the cumulative death in the cumulative case is very low (1%).  The results for France were summarized in Table 9 and Figure 6. The increase rate in the analyzed indicators continues. In the first half of April, the cumulative recovered prevalence in the cumulative case were expected to be around 14%. The mortality rate in positive cases is expected around 6%.  Figure 6. Predicting and forecasting of indicators for France USA data results were given in Table 10 and Figure 7. The increase rate in the analyzed indicators continues. In the first half of April, the daily case prevalence in cumulative case was expected to be 12% and the cumulative deaths prevalence is around 2%.  The results of UK data were given in Table 11 and Figure 8. The rate of increase in the indicator values examined was similar in the first week of April. In this period, the cumulative recovered prevalence was quite low. The cumulative active case prevalence was found to be quite high (90%).

Figure 8. Predicting and forecasting of indicators for UK
When Canada data was evaluated, it is seen that the number of daily deaths is estimated to be quite low (Table 12 and Figure 9). The cumulative recovered prevalence was expected to be around 7% and the cumulative active case prevalence was around 87% in the first week of April.  Figure 9. Predicting and forecasting of indicators for Canada When Sweden's data was evaluated, it was predicted that the new daily case prevalence will be around 5% in the first week of April, and the daily mortality and recovered prevalence will still be at a very low level. In addition, cumulative death prevalence in cumulative case was estimated at 4% levels (Table 13 and Figure 10).  Figure 10. Predicting and forecasting of indicators for Sweden At the end of the modeling for Australia data, it was predicted that the number of daily deaths will be very low in the first week of April, and the cumulative death prevalence will be observed at a very low level in the cumulative case. In addition, daily new cases and cumulative recovered prevalence were estimated to be around 4% (Table  14 and Figure 11).

Figure 11. Predicting and forecasting of indicators for Australia
When Malaysia results were analyzed, it was predicted that the number of daily deaths will be very low and the cumulative death prevalence in the cumulative case will be 1% in the first week of April. In addition, the daily case prevalence was estimated to be around 6% and the cumulative recovered prevalence around 15% (Table 15 and Figure 12). These results emphasized that Malaysia has shown successful results in struggle with the outbreak.

Figure 12. Predicting and forecasting of indicators for Malaysia
Model performance criteria obtained when Belgium data exposed to the outbreak between February 1-15 were modeled (Table 16). For Belgium, it was predicted that the daily mortality prevalence would be low in the cumulative case, the daily case prevalence would be around 15% and the cumulative recovered prevalence would be around 10% in the first week of April (Table 17 and Figure 13).  Figure 13. Predicting and forecasting of indicators for Belgium Iran and Israel, which were exposed to the outbreak between 16-20 February, model performance criteria obtained and the model results were given in Table 18. When Iran data is modeled, it was predicted that the daily mortality prevalence in the cumulative case will be low in the first week of April, the daily case prevalence would be around 6% and the cumulative recovered prevalence will be around 30% (Table 19 and Figure 14).  Figure 14. Predicting and forecasting of indicators for Iran When Israel data was modeled, it is predicted that there would be no new daily deaths in the first week of April, the daily case prevalence in the cumulative case would be around 10% and the cumulative recovered prevalence would be around 2% (Table 20 and Figure 15). The model performance criteria obtained when the data of 7 countries selected among the countries exposed to the outbreak between 21-29 February were modeled are given in Table 21.  It was predicted that the number of new daily deaths will be low in the first week of April, the daily case prevalence in the cumulative case would be around 6-10% and the cumulative recovered prevalence would be around 75% when Switzerland data was modeled (Table 22 and Figure 16).  Figure 16. Predicting and forecasting of indicators for Switzerland When the Netherlands results were analyzed, it was predicted that cumulative recovered and daily new deaths are very low in the first week of April, the daily case prevalence in the cumulative case would be around 8-10% and the cumulative active case prevalence would be reach around 92% (Table 23 and Figure 17). Table 23. Forecasting for 10 days in Netherlands Figure 17. Predicting and forecasting of indicators for Netherlands When the results of Austria data were evaluated, it was predicted that cumulative death, daily new death and cumulative recovered prevalence would be quite low in the first week of April. In addition, the cumulative recovered prevalence in the cumulative case was estimated to be around 2% (Table 24 and Figure 18). Table 24. Forecasting for 10 days in Austria On the other hand, the cumulative active case ratio in the cumulative case was estimated at 95% (Table 27 and Figure 21). Table 27. Forecasting for 10 days in Denmark  Table 29 shows the model performance criteria for data from two countries selected from countries exposed to the outbreak between 1-7 March. When Portugal's results were analyzed, it was predicted that daily new death and cumulative recovered prevalence would be quite low in the first week of April. On the other hand, the prevalence of daily cases in the cumulative cases was at most 15% and it was predicted that this prevalence will decrease gradually, but the cumulative active cases prevalence would be 95% (Table 30 and Figure 23). According to results, the cumulative death prevalence in Italy and Spain is the highest, followed by the Netherlands, France, UK China, Denmark, Belgium, Brazil and Sweden respectively. The daily new case proportion in cumulative cases can also be considered as an indicator of an increase or decrease in the severity of the outbreak. The highest values was observed in Belgium, USA, Canada, Poland, Ireland, Netherlands, France, and Israel, which vary between 10% and 12% and the lowest proportions was observed in China and South Korea. The daily deaths proportion in the cumulative case was low in many countries and similar. But it is lowest in Netherlands. The cumulative recovered proportion in the cumulative case is the important criteria for outbreak. This proportion was the highest in China, followed by South Korea. However, the highest value was in Switzerland (16.45%) among the ten countries exposed to the outbreak after February 21. The prevalence was found 1.75% for Turkey. Finally, the active case proportion in cumulative case was another criterion that can be used to evaluate the course of the outbreak. The predicted active case proportions are high in most of countries. This result shows that there are many people in the active disease period.

DISCUSSION AND CONCLUSION
The results obtained from the evaluations made with limited epidemiological data at the beginning of March or earlier have transferred some information to the present. As of the end of March, the longest 3.5 months and the shortest 1 month data are available and their evaluation is of great importance. The results of this study gives an important information for the countries struggling with the outbreak in less than one month or who have not yet been exposed to the outbreak. Also future models that can be simulated against similar risks will be obtained. It was predicted that the outbreak in China was almost under control as a result of the models obtained with this study. The number of cases and deaths has increased rapidly in Italy and the country could not control the outbreak since the outbreak started. However, it was seen that there was a relatively increase in the proportion of the recovered. Both the rapid case increase to date and the increase seen in future forecasts show that Spain has not yet been able to control the outbreak. The new case and recovered case proportions were higher than Italy. It is seen that there has not been a significant decrease in the severity of the outbreak, although it has been 2.5 months since Italy and Spain announced the first case. South Korea, which started to fight the outbreak in the same period, was the most successful country in terms of results, followed by Germany. The results in Iran was better than Israel, which is in the same period, and France that the outbreak started 1 month ago. The results of Turkey was similar to Germany and Ireland.
The differences and advantages of this study from the modelling studies published before can be summarized as follows.
• Modeling and comparative analysis of real-time and retrospective data of 25 countries with a total number of cases exceeding 1000 among infected countries until 10 March, • The number of data of the most countries has reached sufficient size to develop a good model • It is also the evaluation of trend and seasonal effect in time series models.