Population forecasts for Bangladesh, using a Bayesian methodology.

Population projection for many developing countries can be a challenging task for demographers, mostly because reliable data are scarce. The objective of this paper is to present an overview of the existing methods for population forecasting and to propose an alternative based on Bayesian statistics that brings the formality of statistical inference to the forecasts. The analysis has been carried out using the Markov Chain Monte Carlo (MCMC) technique for Bayesian methodology available in the software WinBUGS. Convergence diagnostic techniques available in WinBUGS have been applied to ensure the convergence of the chains, which is necessary for the implementation of MCMC. The Bayesian approach allows the use of observed data and expert judgements by means of appropriate priors, and it has made possible more realistic population forecasts, along with the associated uncertainty.


INTRODUCTION
A widely used method of forecasting the age- and sex-specific population for future years stratifies the initial population by age and sex and generates projections by applying survival ratios and birth rates, followed by an additive adjustment for net migration. To obtain this information, statisticians analyze the behaviour of the related variables on the basis of past data and then draw inferences from the analysis to make forecasts of the desired variable. At present, there are two major paradigms in statistics for the purpose of data analysis, namely conventional (frequentist) and Bayesian statistics. The use of Bayesian methodology in data analysis is comparatively new and has found wide support in the last two decades from experts in various disciplines. Probably, the main reason behind the increasing support is its flexibility and generality, which allow it to deal with complex situations. In addition, the Bayesian method is often preferred over the classical approach in parameter estimation when the likelihood function has an intractable form (1).
There are a number of methodologies used for population projections. One of the most popular is the cohort component method, which is based on estimates of the future levels of fertility, mortality, sex composition, migration, and other parameters. Many studies have examined the relative performance of simple mathematical models, extrapolation based on time-series, and cohort component models of population forecasting. Most have found that constant-growth mathematical models or standard time-series models of population growth are at least as accurate as cohort component models (2)(3)(4).
The present study is not intended to assess the relative accuracy of various projection models. Rather, it aims only to investigate the usefulness of the cohort component method in making population projections for Bangladesh, using a Bayesian approach. Bayesian analysis has been applied to the cohort component model to provide a neat and transparent way of estimation. It provides probabilistic point estimates of the parameters, along with the highest posterior density (HPD) interval or Bayesian credible interval. The Bayesian credible interval is a measure of uncertainty; it is based on statistical theory and data on error distributions, and it provides an explicit estimate of the probability that a given range will contain the future population. This approach develops statistical prediction intervals to accompany population forecasts (5)(6)(7). Prediction intervals provide extremely valuable information to data-users and improve the quality of decision-making based on population forecasts.

LITERATURE REVIEW
A cohort component strategy of population projection is based on the logic of a general population-component methodology, which examines separately the components of population change: fertility, mortality, and net migration. The cohort-component model of population projection (CC-MPP) is perhaps the iconic method in demography (8)(9)(10)(11)(12)(13)(14)(15)(16). This classic method projects forward in time a population defined by age, according to a specified life table and set of age-specific fertility rates, taking into account the net migration at each age. A very basic equation can show the whole model:

P(t+n)=P(t)+Births−Deaths+Immigrants−Emigrants
where, t is the starting point of time; n is the projection interval; P(t) is the population-size at time t; and P(t+n) is the population size at time t+n. If we put immigrants and emigrants together, then we get:

P(t+n)=P(t)+Births−Deaths+Net Migrants
where Net Migrants=Immigrants−Emigrants. A population grows through the addition of births and in-migrants and declines through the subtraction of deaths and out-migrants.
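As a minimal sketch, the balancing equation above can be written as a small function. The function name and the numbers in the usage line are hypothetical and serve only to illustrate the arithmetic.

```python
def project_population(p_t, births, deaths, immigrants, emigrants):
    """Apply P(t+n) = P(t) + Births - Deaths + Net Migrants for one interval."""
    net_migrants = immigrants - emigrants
    return p_t + births - deaths + net_migrants


# Hypothetical counts for one projection interval.
p_next = project_population(p_t=1000, births=30, deaths=10,
                            immigrants=5, emigrants=8)
```

In a full cohort component projection the same accounting is applied separately to each age and sex group rather than to the total population.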
The term 'fertility' refers to the ability of an individual to give a live birth (or births). This is equally applicable to a group or an entire population. Age-specific fertility rates are required to project the number of births. Future fertility projections are made by projecting the course of the total fertility rate (TFR) over time and translating this TFR into age-specific fertility rates. In general, the projection of TFR is divided into assumptions regarding the level at which fertility eventually becomes constant in a country or a region and the path taken from the current to the eventual level. Once fertility reaches its eventual level, the population will reach a stable age-structure and a constant growth rate, assuming that mortality and migration rates are also fixed. If the eventual fertility level is at replacement level and net migration is zero, the growth rate will eventually be zero. Both the projected pace of fertility decline and the assumed eventual fertility level are important for determining trends in population-size and age-structure. The lower the assumed eventual fertility level, the more important the pace of fertility decline becomes to the projected population-size (17).
Births in cohort component models are typically projected by applying projected age-specific birth rates to projections of the female population by age. In this approach, the size and age composition of the female population of childbearing ages have a major impact on the projected number of births. Since most mothers for the first 25 years of the projection period are already alive at the time the projection is made, the size and age composition of the female population are the most predictable elements in short-term fertility projections.
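The calculation described above, applying age-specific birth rates to the female population by age, can be sketched as follows. The function name, the age groups, and all numbers are hypothetical; the rates are assumed to be annual births per woman.

```python
def project_births(female_pop_by_age, asfr_by_age):
    """Sum of (women in age group) x (annual births per woman in that group)."""
    if len(female_pop_by_age) != len(asfr_by_age):
        raise ValueError("population and rate vectors must have equal length")
    return sum(w * f for w, f in zip(female_pop_by_age, asfr_by_age))


# Hypothetical female population and rates for two childbearing age groups.
women = [1000, 2000]
rates = [0.10, 0.05]
annual_births = project_births(women, rates)
```

Because the women counts enter the sum directly, any error in the projected female population carries over proportionally to the projected births.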
Time-series techniques have been used for projecting births or birth rates. Several authors have applied time-series methods by themselves, using autoregressive integrated moving average (ARIMA) methods to forecast total births (18)(19)(20). While these efforts yielded some insights into the use of time-series methods on fertility, the forecasts ignored the advantage of using cohort component methods (21). This omission was partially remedied by Lee (22)(23), who applied time-series methods to TFR, the sum of all age-specific rates that occur in a given year. In our study, we have applied the Gompertz model to TFR, using Bayesian methodology.
The representation of mortality data via a parametric model has attracted the attention of actuaries, demographers, and statisticians for over a century. One of the most common models is the logistic curve (13). In this paper, we apply a Bayesian analysis to this curve, using the MCMC technique to produce the required posterior summaries. For other Bayesian work relating to mortality smoothing and life-table construction, see (24-25); Carlin (26) used MCMC methods but not in a parametric curve-modelling context.
Table 1 provides the TFR in Bangladesh from 1991 to 2001, which has been used to fit a fertility model for making future fertility projections. Using these data and the Gompertz growth model, a WinBUGS program has been developed to make a Bayesian analysis of the data and to provide projections of the TFR. For the Bayesian analysis, we need to provide prior distributions for all the parameters a, b, c, d and τ. An extensive discussion on the choice of priors is available in the BUGS manual (28).

MATERIALS AND METHODS
Mortality projections are based on projecting future life-expectancy at birth for males and females, defined as the average lifespan of a child born today if current age-specific mortality levels were held fixed in the future. In developing countries where mortality remains high, future life-expectancy will be determined by the efficiency of local health services, the spread of traditional (e.g. malaria) and new (e.g. AIDS) diseases, and the general standards of living and education. In this paper, we do not take new epidemics (AIDS) into account.
The life-expectancy at birth (the average number of years lived by a newborn baby if he/she experiences the current age-specific mortality pattern) is projected on the basis of the past increase in the life-expectancy at birth. A logistic curve has been fitted to the trends in life-expectancy at birth; this assumes that the increase in life-expectancy at birth follows an S-shaped curve. The logic behind using the logistic curve is that when the life-expectancy at birth is very low, the increase is expected to be slow due to poor health facilities. Once health facilities are provided and socioeconomic conditions improve, the life-expectancy increases at a faster rate. At higher levels of life-expectancy, the rate of increase slows, and life-expectancy stabilizes at the biological maximum. To project the population from one year to the next, survival rates by age and sex are needed and, to obtain future survival rates, future life tables may be constructed. Model life tables developed by the United Nations (29) and a logistic growth model are used. In this model, the life-expectancy at birth Q_ij in the year t_i has been assumed to follow a normal distribution.
In this paper, we follow the time-series tradition in developing a method to forecast the TFR and then convert it to age-specific fertility rates on the basis of the base-year age-specific fertility rates. Multiplying these forecasts by forecasts of the size of the age-specific female population then yields fertility forecasts derived from both the time-series and the demographic cohort component traditions. In this way, the advantage of the demographic tradition in taking account of the predictability of the size and age composition of the female population can be combined with the more statistically rigorous time-series techniques of modelling the short-term variability of the age-specific fertility rates.
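The conversion of a forecast TFR into age-specific fertility rates on the basis of a base-year schedule can be sketched as follows. The text does not spell out the conversion rule, so this sketch assumes proportional scaling of the base-year rates over five-year age groups; the function names and the base-year rates are hypothetical.

```python
def tfr_from_asfr(asfr, age_width=5):
    """TFR implied by a schedule of annual rates for equal-width age groups."""
    return age_width * sum(asfr)


def scale_asfr_to_tfr(base_asfr, target_tfr, age_width=5):
    """Rescale the base-year schedule so that it sums to the target TFR.

    Assumption: the age pattern of fertility keeps its base-year shape.
    """
    base_tfr = tfr_from_asfr(base_asfr, age_width)
    if base_tfr <= 0:
        raise ValueError("base-year schedule must imply a positive TFR")
    factor = target_tfr / base_tfr
    return [rate * factor for rate in base_asfr]


# Hypothetical base-year rates for ages 15-19, ..., 45-49 (implied TFR = 3.0).
base_asfr = [0.05, 0.15, 0.20, 0.10, 0.05, 0.03, 0.02]
forecast_asfr = scale_asfr_to_tfr(base_asfr, target_tfr=2.1)
```

Proportional scaling keeps the relative age pattern fixed; a projection that expects childbearing to shift to older or younger ages would need a different rule.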
Let Y_i denote the TFR in Bangladesh in the year t_i (i=1, 2, ..., 11), where i indexes successive years starting from 1991 (i=1); the data are given in Table 1. The best-known growth model is that of Gompertz (27), and it is used here for the TFR: Y_i in the year t_i is assumed to follow a normal distribution with mean h_i and common precision τ. Non-informative priors have been assigned to all the parameters of the model. The nonlinear regression model for the TFR is:

Y_i = h_i + e_i
where h_i is the deterministic part and e_i is the disturbance, with e_i ~ iid N(0, τ), where τ is the precision (=1/variance). In the Gompertz curve for h_i, d is the lower asymptote, c is the upper asymptote, b is the rate at which the fertility increases, and a is the parameter that determines the shape of the curve; the non-informative priors are placed on a, b, c, d, and τ. More details, with examples of the MCMC implementation in Bayesian inference, can be found elsewhere (31)(32)(33)(34)(35).
MCMC methods are a class of iterative algorithms for sampling from probability distributions, based on constructing a Markov chain that has the desired distribution as its equilibrium distribution (33).
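To make the idea concrete, the following is a minimal sketch of one MCMC algorithm, a random-walk Metropolis sampler. It is illustrative only and is not a description of the samplers WinBUGS uses internally. The toy model (normal data with known standard deviation and a flat prior on the mean), the function names, and the data are all assumptions.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0):
    """Log posterior of mu up to a constant: normal likelihood, flat prior."""
    return -sum((y - mu) ** 2 for y in data) / (2.0 * sigma ** 2)


def metropolis(data, n_iter=5000, step=0.5, start=0.0, seed=1):
    """Random-walk Metropolis sampler; returns every visited value of mu."""
    rng = random.Random(seed)
    mu = start
    current_lp = log_posterior(mu, data)
    samples = []
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            mu, current_lp = proposal, proposal_lp
        samples.append(mu)
    return samples


data = [3.2, 2.9, 3.1, 3.0, 2.8]          # hypothetical observations
draws = metropolis(data)[1000:]           # discard the first 1000 as burn-in
posterior_mean = sum(draws) / len(draws)
```

The chain starts far from the bulk of the posterior, which is why the early draws are discarded before the posterior mean is computed.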
The MCMC tool in WinBUGS has been used to obtain the posterior distribution of the unknown parameters in the model. In the process, we need to run a number of chains for each parameter for a long time. When the chains have run for a sufficiently large number of iterations and have reached the stationary distribution, the samples obtained by running the chains further can be treated as draws from the posterior distribution of the parameter. WinBUGS provides a number of inbuilt diagnostics to assess the convergence of chains. For a more formal approach to convergence diagnosis, the software also provides an implementation of the techniques described in Brooks and Gelman (34), and a facility for outputting monitored samples in a format that is compatible with the CODA software (36).
In practice, WinBUGS allows multiple chains for each parameter to run simultaneously. Running multiple chains is a way to check the convergence of MCMC simulations. Two chains have been used in the model for this problem. If the different chains do not mix sufficiently even after a long run, this is evidence of a lack of convergence. Once the diagnostics have convinced us that the chains have converged, we need to run the simulation for a further number of iterations to obtain samples that can be used for posterior inference. The more samples we save, the more accurate our posterior estimates will be. Once we have run enough updates and are satisfied with the history of the chains, we discard the earlier samples and obtain the summary statistics only from the samples generated afterwards.
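The multiple-chain check can be illustrated with the classic variance-based potential scale reduction factor. This is a sketch of the general idea and not necessarily the exact statistic WinBUGS reports; the function name and the simulated chains are assumptions.

```python
import math
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains (needs >= 2)."""
    n = len(chains[0])
    if len(chains) < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    within = mean([variance(c) for c in chains])          # W
    between = n * variance([mean(c) for c in chains])     # B
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(0)
chain_a = [rng.gauss(0.0, 1.0) for _ in range(500)]
chain_b = [rng.gauss(0.0, 1.0) for _ in range(500)]
chain_shifted = [x + 10.0 for x in chain_b]   # a chain stuck somewhere else

rhat_mixed = gelman_rubin([chain_a, chain_b])
rhat_unmixed = gelman_rubin([chain_a, chain_shifted])
```

Values close to 1 suggest that the chains are sampling the same distribution, while values well above 1 indicate that they have not mixed.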

RESULTS
Table 2 presents the summary statistics of the estimated parameters of the fertility model, obtained by discarding 10,000 initial updates as burn-in and then running a further 80,000 updates. During these updates, none of the diagnostics indicated any symptom of non-convergence of the chains. The number of iterations to run after convergence of the chains is assessed on the basis of the Monte Carlo error (MC error) for each parameter. The MC error is an estimate of the difference between the mean of the sampled values (which we are using as our estimate of the posterior mean for each parameter) and the true posterior mean.
In the mortality model, the life-expectancy at birth Q_ij is assumed to follow a normal distribution with mean p_ij and common precision τ_j. Non-informative priors have been assigned to all the parameters of the model. The non-linear regression model is:

Q_ij = p_ij + ε_ij
where p_ij is the deterministic part and ε_ij is the random error, with ε_ij ~ N(0, τ_j), where τ_j is the precision. In the logistic curve for p_ij, q_1 is the upper asymptote, q_4 is the lower asymptote, q_2 and q_3 are the other parameters that define the shape of the curve, and e is the base of the natural logarithm.
Future international migration is more difficult to project than fertility or mortality. Migration can be volatile since short-term changes in economic, social, or political factors often play an important role. In addition, projections are generally based on past trends and current policies since no single, compelling theory of migration exists; however, data on historical migration are sparse for Bangladesh. In this work, we assumed that the population is closed, i.e. no migration takes place or, even if it does, the net effect is zero.
The sex ratio at birth, which divides the future number of newborns into males and females, is set at a female-to-male ratio of 100:105 based on the results of the last five years, and it is held constant from 2001 onward.
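As a small worked example, a female-to-male ratio of 100:105 means that 105 of every 205 newborns are male. The function name and the birth total below are hypothetical.

```python
def split_births_by_sex(total_births, males_per_100_females=105.0):
    """Divide total births into (male, female) using the sex ratio at birth."""
    male_share = males_per_100_females / (100.0 + males_per_100_females)
    male_births = total_births * male_share
    return male_births, total_births - male_births


# Hypothetical annual total of births.
male_births, female_births = split_births_by_sex(205000)
```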

Diagnostics
The Bayesian approach can face serious computational difficulties because the posterior distributions are likely to involve complicated mathematical expressions. Many of these difficulties are handled far more easily using MCMC methods, which enable us to carry out analysis on a wide range of Bayesian statistical models.
The WinBUGS manual suggests, as a rule of thumb, that the simulation should be run until the MC error for each parameter of interest is less than about 5% of the sample standard deviation, and this was followed in our analysis. Table 2 shows that the MC error for each parameter was less than 5% of the sample standard deviation.
Table 3 presents the summary statistics of the estimated parameters of the mortality model for life-expectancy at birth for both males and females, obtained by discarding 10,000 initial updates as burn-in and then running a further 70,000 updates. During these updates, none of the diagnostics indicated any symptom of non-convergence of the chains. While running our model in WinBUGS, we monitored five nodes: q_1, q_2, q_3, q_4, and τ.
The fitted and forecasted models for both males and females are depicted in Figure 2.
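The 5% rule of thumb for the MC error can also be checked on saved samples outside WinBUGS. The sketch below estimates the MC error with a batch-means calculation, which is one common estimator and is assumed here rather than taken from the paper; the function names, the number of batches, and the simulated samples are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def batch_means_mc_error(samples, n_batches=50):
    """Estimate the Monte Carlo standard error of the mean by batch means."""
    batch_size = len(samples) // n_batches
    if batch_size < 1:
        raise ValueError("not enough samples for the requested batches")
    batch_means = [mean(samples[i * batch_size:(i + 1) * batch_size])
                   for i in range(n_batches)]
    return stdev(batch_means) / math.sqrt(n_batches)


def passes_rule_of_thumb(samples, threshold=0.05, n_batches=50):
    """True if the MC error is below `threshold` times the sample SD."""
    return batch_means_mc_error(samples, n_batches) < threshold * stdev(samples)


rng = random.Random(42)
well_mixed = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # independent draws
trending = [float(i) for i in range(5000)]                # chain still drifting

ok_well_mixed = passes_rule_of_thumb(well_mixed)
ok_trending = passes_rule_of_thumb(trending)
```

A chain that is still drifting has batch means that differ widely from one another, so its estimated MC error stays large relative to the sample standard deviation and the check fails.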

DISCUSSION
The final calculations of the cohort component method combine the results from the mortality, migration, and fertility modules. On the basis of the forecasts of the population growth components, the forecasted population of Bangladesh from 2006 to 2051 is presented in the Appendix.
The present study was an attempt to show the application and suitability of the MCMC tool in Bayesian data analysis for fitting population data and projecting the future population, using the cohort component model. The use of the Bayesian approach in fitting the components of growth models allows for further extensions over classical estimation methods, leading to more realistic forecasts and associated uncertainty measures. The cohort component population projection method follows the process of demographic change and is viewed as a more reliable projection method than those that primarily rely on census data or information that reflects population change. In this paper, we have presented the basics of the implementation of Bayesian data analysis with an illustration of population projection. We have not performed a sensitivity analysis with different prior distributions, mainly because the selected priors were non-informative. These priors did not provide substantial information to the posterior distribution. However, they were necessary for the implementation of the Bayesian data analysis.

Limitations
In this study, we were unable to provide forecasts for the migration component because the data for Bangladesh are sparse. To overcome this problem, we made a strong assumption, and this is the major drawback of our study. Apart from this shortcoming, the total fertility rate declines to replacement level in 2010 and afterwards, which is unrealistic for Bangladesh; however, it is evident from Figures 2 and 3 that the mortality component has fitted very well. In both the fertility and mortality models, we have applied non-informative priors, which is also a limitation of this study. We hope to explore these areas further in future, using Bayesian methods motivated by the arguments provided throughout this paper.

Conclusions
By applying Bayesian methods to the growth components, a more realistic summary of population forecasts has been produced because the approach allows formal incorporation of expert judgement embodied in priors, which in turn alters the forecasted population characteristics and their levels of uncertainty. In this paper, we have applied non-informative priors to the fertility and mortality models and, thus, a large level of uncertainty in the forecasted population has resulted. This level of uncertainty could be reduced through the inclusion of informative priors. Moreover, informative priors based purely on expert opinions regarding the future of population growth rates could have been included. Such prior information would result in further reductions in the estimated uncertainty due to the added information in the parameter estimation and model-choice procedures.