Weighting National Survey Data in Bangladesh: Why, How and Which Weight?

Background: Weighting national survey data makes the sample more representative of the target population. The weighting procedure is a thorough exercise that yields several types of weights. However, authors differ considerably on which weight to use, leaving researchers baffled. As a result, researchers often use survey data without the weights, which leads to erroneous conclusions. In addition, although powerful statistical software is available, researchers from developing countries are mostly unable to use it because of its high cost. In this article, we share our experience of weighting recent national surveys in Bangladesh using Microsoft Excel.
Objectives: The overall objective was to perform sample weighting of a national survey of Bangladesh using Excel. The specific objectives were to create different weighting variables, describe their features and identify the appropriate weight to be used for analysis.
Methods: We generated four types of weights: the base weight calculated from the probabilities of selection, and the non-response adjusted, population calibration adjusted, and trimmed weights. We compared the distribution of the population by sex and age using unweighted numbers and numbers weighted by each of the four weights. We then calculated weighted means, medians, ranges, standard errors, confidence intervals, variances, multiplicative effects and design effects with these four weights. In addition, we compared the weighted prevalence of a key variable of the survey using these four weights.
Results: We compared the unweighted distribution with the weighted ones and found that weighting makes the sample distribution conform to the target population. Among the four calculated weights, the trimmed weight had a narrow standard error and variance, and the smallest design and multiplicative effects. It yielded an acceptable prevalence and distribution of prevalence of mental disorders.
Conclusion: Among the four weights, we show that the trimmed weight met all parameters of good quality and precision. We performed this complex exercise using Microsoft Excel, which is widely available to researchers in Bangladesh. Therefore, we recommend using the trimmed weight for national level surveys in Bangladesh in a similar context.


Introduction
Sample surveys are among the most important methods of collecting health data from which conclusions can be drawn about a reference population. 1 However, accurate inference cannot be drawn without treating the sample data. 2 Weighting corrects imperfections in the sample, thereby preventing bias and other differences between the sample and the reference population. 3 In complex sample surveys, four types of imperfections emerge: unequal probabilities of selection, multistage selection, stratification of the sample into reporting domains, and non-response. 4 Ignoring these leads to incorrect inferences from a survey.
Although a sample survey can support conclusions about a reference population, the results may be influenced by sampling and non-sampling errors. 1,5 Among the non-sampling errors, non-response (both unit non-response and item non-response) is addressed rigorously through weighting. 6 Adjustment for non-sampling error can differ depending on the data collection technique of the survey: digital or paper-and-pencil. 7 Digital data collection offers a more orderly way to adjust for non-sampling error than paper-based collection. 8 Several recent national level household (HH) surveys in Bangladesh used digital data collection tools. 9-11 In addition, weighting adjusts the weighted sample distribution for key variables of interest (for example, age, race, and sex) to make it conform to a known population distribution. 3 Applying proper weights makes it possible to produce design-unbiased estimates of the parameters of interest. 12 Thus the weighting procedure is a critical step after the survey data have been collected and all the essential steps of data processing have been completed. 13 However, there is no universally held protocol for calculating weights. The aim of the weighting procedure is to calculate the 'final weight', starting with the base weight and applying the non-response, population calibration and trimming adjustments in turn.
Testing the variability of the calculated non-response rates and weights is an important step in generating acceptable weights. 9 High variation in weights can give some observations too much importance, distorting the results. 14 If the sampling design is not informative, using the weights should not introduce any significant differences in the estimates. 12 Conversely, if the sampling design turns out to be informative, the use of weighted estimators will produce "better" results. 12 Additionally, a trimming procedure can be applied, although the process varies between researchers. 15,16 To some researchers, weighting itself is like a black box. 13 Handling different types of weights and deciding which to use sometimes leaves researchers confused. In addition, although many powerful statistical software packages are available for complex sample analysis, they are used much less in the financial context of a developing country like Bangladesh. As a result, survey data are often used by researchers without the weights, leading to erroneous conclusions. 12 Given the influence weights have on survey results, it is important that researchers understand enough about the weighting process to be discerning users of survey data. 13 In this article we briefly describe the weighting process and our approach to identifying which weight to use, and explain the reasons for selecting one. So far, no scientific article on weighting national level survey data has been published by researchers from Bangladesh, and this article is the first of its kind in the context of performing the exercise using Microsoft Excel.

Materials and Methods
Brief overview of the weighting procedure: A detailed step-by-step procedure of weighting is described elsewhere. 17 The following is a brief description. The non-response adjusted base weight is calculated by successively multiplying the base weight by the three non-response factors (PSU, HH and individual).
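The two formulas given in the table footnotes (base weight = 1/p1 x 1/p2 x 1/p3 x 1/p4, and non-response adjusted base weight = base weight x PSU x HH x individual non-response factors) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Excel workbook used in the study; the function names and all numeric values below are hypothetical.

```python
from math import prod


def base_weight(p_psu, p_hh, p_sex, p_person):
    """Inverse of the overall selection probability: 1/p1 x 1/p2 x 1/p3 x 1/p4."""
    probs = (p_psu, p_hh, p_sex, p_person)
    if not all(0 < p <= 1 for p in probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    return 1.0 / prod(probs)


def nonresponse_adjusted_weight(w_base, f_psu, f_hh, f_person):
    """Multiply the base weight by the three non-response factors in turn."""
    return w_base * f_psu * f_hh * f_person


# Hypothetical respondent: the probabilities and factors are made up for illustration.
example_base = base_weight(p_psu=0.01, p_hh=0.1, p_sex=0.5, p_person=0.5)
example_adjusted = nonresponse_adjusted_weight(example_base, 1.0, 1.1, 1.05)
print(round(example_base, 2), round(example_adjusted, 2))
```

In a spreadsheet the same calculation is one product formula per respondent row, which is why the exercise is feasible in Excel.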

Population calibration:
The goal was to bring the weighted sums of the sample data in line with the corresponding age- and sex-matched counts of the target population. 20,21 If recent population data are unavailable, the projected population is estimated first (not described here); the population calibration factor (r) is then calculated.

Calculating post stratification adjustment factor (r)
The population calibration factor is calculated by division, residence, gender and the five age groups, resulting in 160 (8 x 2 x 2 x 5) adjustment cells. The post-stratification adjustment is calculated as: r = (projected population in an adjustment cell) / (sum of the non-response adjusted base weights in that cell). Population calibrated weights are calculated by multiplying the non-response adjusted base weight by the population calibration factor.
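A minimal sketch of this cell-wise calibration follows, assuming each respondent record carries an adjustment-cell label and a non-response adjusted weight. The cell labels, weights and projected counts are hypothetical; the survey itself uses 160 cells.

```python
from collections import defaultdict


def calibration_factors(records, projected_population):
    """r per cell = projected population in the cell / sum of adjusted weights in the cell."""
    weight_sums = defaultdict(float)
    for cell, weight in records:
        weight_sums[cell] += weight
    return {cell: projected_population[cell] / total
            for cell, total in weight_sums.items()}


def calibrate(records, factors):
    """Population calibrated weight = non-response adjusted weight x r of the record's cell."""
    return [(cell, weight * factors[cell]) for cell, weight in records]


# Hypothetical data: two cells instead of the 160 used in the survey.
records = [("A", 100.0), ("A", 300.0), ("B", 250.0)]
projected = {"A": 500.0, "B": 200.0}

factors = calibration_factors(records, projected)
calibrated = calibrate(records, factors)
print(factors)
print(calibrated)
```

By construction, the calibrated weights in each cell sum to that cell's projected population, which is the property the calibration step is meant to achieve.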

Trimming of weight:
We applied this procedure to the population calibrated weight. 13 We first identified the extreme weights and fixed a cut-off value. Weights above the cut-off value were trimmed, and the trimmed excess was distributed equally among the non-trimmed weights; this was repeated until no weight was above the cut-off point. 15,16 We checked all the calculation steps of the weighting process, including the distribution of the weights, taking special notice of the extreme values and back-tracking these for possible errors.
Role of the funding source: No funding was required to undertake the exercise described in the manuscript.
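The iterative trimming described above can be sketched as follows. Two points are assumptions rather than statements from the text: the cut-off is computed once from the median of the input weights and then held fixed, and "equally distributed" is read as giving every non-trimmed weight the same additive share of the excess. The function name and the example weights are hypothetical.

```python
from statistics import median


def trim_weights(weights, multiplier=3.5):
    """Cap weights at multiplier x median and spread the excess over the other weights."""
    cutoff = multiplier * median(weights)  # assumption: cut-off fixed from the input weights
    trimmed = list(weights)
    while True:
        over = [i for i, w in enumerate(trimmed) if w > cutoff]
        if not over:
            break
        excess = sum(trimmed[i] - cutoff for i in over)
        for i in over:
            trimmed[i] = cutoff
        under = [i for i, w in enumerate(trimmed) if w < cutoff]
        if not under:
            break  # nothing left to absorb the excess; the total is not preserved here
        share = excess / len(under)  # assumption: equal additive share per weight
        for i in under:
            trimmed[i] += share
    return trimmed, cutoff


# Hypothetical weights chosen so that two passes are needed.
example_weights = [1.0, 1.0, 1.0, 1.0, 3.4, 10.0]
example_trimmed, example_cutoff = trim_weights(example_weights)
print(example_cutoff, [round(w, 2) for w in example_trimmed])
```

Each pass caps at least one additional weight, so the loop ends after at most as many passes as there are weights. When non-trimmed weights remain to absorb the excess, the sum of the weights is unchanged by trimming.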

Results
The overall response rate of the survey was 90.4% (table I). 24 We calculated the four weights using the mathematical formulas mentioned in the Methods section. We used Microsoft Excel ("Excel") from the Microsoft Office 365 bundle for this exercise.

'Base weight' calculation:
The probabilities of selection of PSUs and HHs, the sex randomization and the individual selection probability were taken into consideration. This was applied to 16 strata comprising eight divisions and two residence strata. This procedure yielded a total base weight of 79 422 102 (table II).

Non-response factor calculation:
i. PSU-level non-response factor: This is also calculated for the 16 domains. The value of the PSU non-response factor is '1' for 15 domains; the exception is the one domain where one PSU was dropped. The mean PSU non-response factor is 1.0008.
ii. HH non-response factor: This is calculated in all 496 PSUs. The mean HH non-response factor is 1.1002.
iii. Person non-response factor: The mean person non-response factor is 1.1002.
When the base weight is adjusted with the non-response factors, the sum of the adjusted base weights stands at 92 569 866 (table II).
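The text does not spell out how each non-response factor is computed (the details are deferred to reference 17). A common form, assumed here purely for illustration, is the inverse of the response rate within the adjustment unit. The sketch also derives the overall inflation implied by the two reported totals; the counts in the HH example are hypothetical.

```python
def nonresponse_factor(n_selected, n_responded):
    """Assumed form: eligible selected units divided by responding units (>= 1)."""
    if n_responded <= 0 or n_responded > n_selected:
        raise ValueError("need 0 < n_responded <= n_selected")
    return n_selected / n_responded


# Hypothetical PSU in which 20 of 22 selected households responded.
hh_factor = nonresponse_factor(22, 20)

# Totals reported in the text (table II): sum of base weights and of adjusted base weights.
sum_base = 79_422_102
sum_adjusted = 92_569_866
overall_ratio = sum_adjusted / sum_base

print(round(hh_factor, 4), round(overall_ratio, 4))
```

The ratio of the two reported totals shows that the combined non-response adjustment inflated the weights by roughly 17% overall.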
Calculating the projected population from census data: The total projected population of adults aged 18 years or more is 102 161 911. We accommodated the change in the number of divisions from seven to eight (table II). 25
Population weight/calibration (r): This is calculated in 160 domains: eight divisions, two residence strata, two sex strata and five age groups. In each domain, the projected population of that domain is divided by the sum of the non-response adjusted base weights of that domain. The mean 'r' was 1.33 (table II).

Calculating population calibration and nonresponse adjusted weight:
This weight is the product of the base weight; the PSU, HH and individual non-response factors; and the population calibration factor. The sum of the calculated adjusted weights was 102 948 678 (table II).
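As a quick consistency check, this sum can be compared with the projected adult population of 102 161 911, using the percentage formula given in the table footnotes (sum of weights x 100, divided by the projected population):

```latex
\[
\frac{102\,948\,678 \times 100}{102\,161\,911} \approx 100.8\%
\]
```

The calibrated weights therefore sum to slightly more than the projected population, an excess of 786 767 or about 0.8%.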

Trimming of weight:
In our exercise, we trimmed the non-response and population calibration adjusted base weight. The median of this weight was 9 091.9. The cut-off was set at 3.5 times the median, that is, 31 821.7. 15,16 Any weight above 31 821.7 was trimmed and fixed at that value (table III).
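The cut-off follows directly from the reported median:

```latex
\[
3.5 \times 9\,091.9 = 31\,821.65 \approx 31\,821.7
\]
```

The table footnotes quote the cut-off as 31821.53. That figure is consistent with an unrounded median of about 9 091.87, which rounds to the 9 091.9 reported here, so the small difference appears to come from rounding.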

Comparing the calculated weights:
A comparison was made between the distribution of the projected population and that of the unweighted sample to show the differences in distribution by age and residence (figure 2A and B). The unweighted sample distribution is not similar to the population distribution. However, when the same comparison is made with distributions weighted by any of the four calculated weights, the weighted distribution closely matches that of the population. The best match was achieved by the sample distribution weighted with the population calibrated and trimmed weights (figure 2).
All the weights except the trimmed weight show a wide range, denoting instability of the calculated weights. The sum of the calculated weights gradually increased from the base weight to the trimmed weight. The sums of the population calibrated weights and the trimmed weights thus stand at 100.8% of the projected population (table III). The distribution of the trimmed weight is more centrally oriented, as denoted by the smaller difference between maximum and minimum and the narrower standard error and confidence interval compared with the other weights. The multiplicative effect for the trimmed weight is 1.5, and it is the only weight with a value below 2. 9
We also checked the effect of the different weights on the prevalence of mental disorders according to the NMHS 2019. 11 The unweighted prevalence is 17.3%, which is very close to the weighted prevalence (15.8%-16.8%). We also calculated the prevalence in the urban-rural and male-female domains and found no notable difference. The prevalence of mental disorders tends to decrease from the unweighted to the weighted results, but we consider this difference negligible. We calculated the design effect of the unweighted and weighted calculations. Though it is somewhat increased in the weighted results, we observe the lowest design effect for the base weights (1.7) and the trimmed weights (1.7) (table IV).
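Two of the quantities compared here can be sketched in Python. The multiplicative effect follows the formula in the table footnotes, 1 + (sample variance) / (mean weight)^2. The weighted prevalence is written as the weighted share of cases, which is the usual estimator and is assumed here rather than taken from the text. The function names and the toy data are hypothetical, and the design effect is not reproduced because it depends on the full sampling design.

```python
from statistics import mean, variance


def multiplicative_effect(weights):
    """1 + (sample variance of the weights) / (mean weight)^2."""
    m = mean(weights)
    return 1.0 + variance(weights) / (m * m)


def weighted_prevalence(has_condition, weights):
    """Weighted percentage of respondents with the condition (assumed estimator)."""
    total = sum(weights)
    cases = sum(w for flag, w in zip(has_condition, weights) if flag)
    return 100.0 * cases / total


# Toy data for illustration only.
toy_weights = [1.0, 2.0, 3.0]
toy_flags = [True, False, False]

meff = multiplicative_effect(toy_weights)
prevalence = weighted_prevalence(toy_flags, toy_weights)
print(round(meff, 2), round(prevalence, 2))
```

When all weights are equal, the variance term is zero and the multiplicative effect is exactly 1, so values well above 1 indicate highly unequal weights.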
* Calculated from non-response weights and population calibration adjusted base weights
† Strata code contains divisional (two digits from left), residence (third digit from left), sex (fourth digit from left) and age group (last digit from right) codes
‡ Calculated by, base weight = 1/p1 x 1/p2 x 1/p3 x 1/p4, where p1 = primary sampling unit, p2 = household, p3 = sex randomization and p4 = individual selection probabilities
§ PSU: primary sampling unit; HH: household
‖ Calculated by, non-response adjusted base weight = base weight x non-response weights (PSU x HH x individual)
¶ Projected population of Bangladesh aged ≥18 years is based on Census 2011 (25)
** Population calibration factor = (projected population in a domain) / (non-response adjusted weights in that domain)
†† Population calibrated weight = non-response adjusted weight x population calibration factor
‡‡ Trimmed weight: calculated by trimming any population calibration adjusted weight beyond 3.5 times the median weight (31821.53) and setting it at that level. The excess weight trimmed is then equally distributed among the non-trimmed weights. This was run twice, until no weight was more than 3.5 times the median weight.

* Calculated by, base weight = 1/p1 x 1/p2 x 1/p3 x 1/p4, where p1 = primary sampling unit, p2 = household, p3 = sex randomization and p4 = individual selection probabilities
† Calculated by, non-response adjusted base weight = base weight x non-response weights (PSU x HH x individual)
‡ Calculated by multiplying the non-response adjusted base weight by the population calibration factor
§ All weights above 3.5 times the median (31821.53) are set at that value, the excess weight is equally distributed among the non-trimmed weights, and the process is repeated until no weight is above the cut-off value
‖ Percent difference from the projected population (25) of 102 161 911, calculated as (sum of weights x 100) / 102 161 911
¶ Multiplicative effect = 1 + (sample variance) / (mean weight)^2

Discussion
We calculated four weights using Microsoft Excel: the base, non-response adjusted, population calibrated and trimmed weights. We presented here the weighting process for the recently conducted NMHS 2019 and tested the weights for quality. 9,11 It is claimed that weighting with the base weight only is an efficient method because it is simple to construct. 13,17 It may be completed after the mapping and listing activity, before data collection. It avoids meticulous non-response calculations and the need for population projection and calibration. Thus the base weight can be used as the final weight for a survey when the response rate is 90% or more. 24 Otherwise, calculating a non-response and population calibration adjusted base weight is recommended. 9 However, we generated all four weights even though the survey response rate was acceptable and fresh census data were unavailable.
In the NMHS Bangladesh 2019, data were collected through handheld computers and item non-response was absent. 8 However, the weighting procedure corrected the sample distribution for unit non-response. Compared with the unweighted distribution of the sample, the weighted distributions were more reflective of the population distribution and size.
Despite a sampling design with equal allocation of PSUs to the urban-rural and male-female strata, the calculated weights corrected the sample distribution for variables such as sex and residence to make it conform to the population in distribution and size, achieving one of the prime objectives of the weighting exercise. 3,15 The bias induced by the design effect is also limited by the small design effect in the weighted results. 25 A small design effect will help to estimate a smaller required sample size for future studies, which is much needed in a low-resource country. 26
In addition, we calculated trimmed weights. 15 Some authors do not recommend this procedure, as it might yield inaccurate results by introducing a small bias. 14,27 However, it also greatly reduces standard errors. 24 The disadvantage of weighting data is that precision is reduced to some extent. 25 Some researchers worry about dealing with highly unequal weights, which trimming might address, thus improving precision. 15 In our study, the trimmed weight was stable and provided a favorable result, as suggested by others. 14
We performed the non-response adjustment at the subgroup level, without taking into account the characteristics of each individual, a weakness noted by some authors. 28 We also used a basic calibration procedure to obtain the population adjusted weight, so that the exercise could be performed in Microsoft Excel using simple statistical techniques. 9 Despite recommendations to use expensive statistical software for weighting data, we used the easily available and affordable Microsoft Excel for this exercise and generated the weights through a simple step-by-step procedure. This has implications for wider applicability of the weighting procedure in surveys, to generate high-quality results in resource-limited settings like Bangladesh.
We tested these weights for compensation of non-sampling errors, variability and accuracy, and compared them with the population distribution. 9,13 It has been argued that even if weights reduce bias, they might greatly inflate the variance of estimates. 29 Although we encountered a small overall loss of precision when the base weight was used, this was gradually removed when we used the other weights. The results calculated using the trimmed weight were the most precise. Except for the trimmed weight, the other weights had wider ranges of values, denoting instability. In our data we showed that the trimming procedure generated a weight that struck a good balance between instability and accuracy. 30

Conclusion
Weighting compensated for the non-sampling errors, corrected the imperfections in the sample and prevented bias between the sample and the reference population, in contrast to the unweighted sample. We found that the trimmed weight was the most acceptable among the four weights. Using the trimmed weight yields more nationally representative and precise results and makes them comparable with other national data.