Unbiased Modified Two-Parameter Estimator for the Linear Regression Model

This study centers on estimating the parameters of a linear regression model in the presence of multicollinearity. Multicollinearity undermines the efficiency of the Ordinary Least Squares (OLS) estimator, and several alternative estimators have been developed as remedies. This study introduces a new unbiased modified two-parameter estimator based on prior information and derives its properties. The new estimator is compared with existing estimators in terms of Mean Square Error (MSE), and a numerical example and a Monte Carlo simulation are used to illustrate its performance.


Introduction
The linear regression model is expressed as

Y = Xβ + e,

where Y is an n×1 vector of observations on the dependent variable, X is an n×p matrix of predictor variables, β is a p×1 vector of unknown regression coefficients, and e is an n×1 vector of random errors with eᵢ ~ N(0, σ²). The Ordinary Least Squares (OLS) estimator of β is given as

β̂ = S⁻¹X′Y,

where S = X′X is a p×p symmetric matrix, X′Y is a p×1 vector, and β̂ ~ N(β, σ²S⁻¹). The OLS estimator is unbiased and possesses minimum variance among linear unbiased estimators. However, a notable limitation of this estimator arises when the predictor variables are highly correlated. This condition is termed multicollinearity, in whose presence the OLS estimator becomes unstable and gives misleading regression results. Several biased estimators have been proposed in the literature to overcome the problem of multicollinearity. Hoerl et al. [1] proposed the Ridge Regression (RR) estimator.
It is given as

β̂_RR = (S + kI)⁻¹X′Y,

where k > 0 is the ridge parameter (or biasing constant).
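As a concrete illustration of the two estimators above, the following sketch computes the OLS and ridge estimates on simulated data (the data, dimensions, and value of k are our own illustrative choices, not from this study):

```python
import numpy as np

# Simulated data; dimensions and coefficients are illustrative choices.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, 2.0, -1.0])
y = X @ beta + rng.normal(size=n)

S = X.T @ X                                # S = X'X, a p x p symmetric matrix
beta_ols = np.linalg.solve(S, X.T @ y)     # OLS: S^{-1} X'Y

k = 0.5                                    # ridge (biasing) parameter
beta_rr = np.linalg.solve(S + k * np.eye(p), X.T @ y)  # RR: (S + kI)^{-1} X'Y
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit inverse of S, which is numerically preferable when S is ill-conditioned.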
Amin et al. [2] proposed the Modified Ridge Regression (MRR) estimator based on prior information. The estimator is given as

β̂_MRR = (S + kI)⁻¹(X′Y + kb),

where b is prior information on β. The OLS estimator is a special case of this estimator when k = 0, and β̂_MRR tends to b as k tends to infinity.
Liu [3] proposed an estimator to overcome the limitations of the RR estimator by combining the benefits of the estimators given by [1] and [4]. It is given as

β̂_d = (S + I)⁻¹(X′Y + dβ̂),

where 0 < d < 1. Dorugade [5] and Dorugade [6] introduced a Modified Two-Parameter (MTP) estimator, given as

β̂_MTP(k, d) = (S + kdI)⁻¹X′Y,

where k > 0 and 0 < d < 1. This is a general estimator that includes the OLS and RR estimators as special cases: when k = 0 or d = 0 it reduces to the OLS estimator, and when d = 1 it reduces to the RR estimator.
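The special-case structure of the MTP estimator can be checked numerically. This sketch, with illustrative data and an arbitrary value of k of our choosing, verifies that d = 0 (or k = 0) recovers the OLS estimate and d = 1 recovers the ridge estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = X @ np.ones(p) + rng.normal(size=n)
S, Xty, I = X.T @ X, X.T @ y, np.eye(p)

def mtp(k, d):
    # Modified two-parameter estimator: (S + kd I)^{-1} X'Y
    return np.linalg.solve(S + k * d * I, Xty)

k = 0.5
beta_ols = np.linalg.solve(S, Xty)          # OLS
beta_rr = np.linalg.solve(S + k * I, Xty)   # ridge with parameter k
```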
Although these estimators mitigate the problem of multicollinearity, they are biased. Unbiased estimators have also been proposed by some researchers; their advantage over biased estimators is that they reduce variance without introducing bias.
Crouse et al. [7] proposed an unbiased ridge estimator with prior information J. The estimator is defined as

β̂_UR(k, J) = (S + kI)⁻¹(X′Y + kJ),

where J ~ N(β, (σ²/k)I) and J is uncorrelated with β̂. Amin et al. [8] proposed an Almost Unbiased Two-Parameter (AUTP) estimator and compared it with the OLS and Two-Parameter (TP) estimators based on the MSE criterion. Wu [9] introduced an Unbiased Two-Parameter (UTP) estimator with prior information based on the TP estimator. It is defined as

β̂_UTP(k, d, J) = (S + kI)⁻¹(X′Y + kdβ̂ + k(1 − d)J),

where J is uncorrelated with β̂, E(J) = β, and S = X′X. In this study, a new estimator, referred to as the Unbiased Modified Two-Parameter (UMTP) estimator, is introduced to minimize the effect of multicollinearity in the linear regression model. The article is organized as follows. In Section 2, the new estimator is proposed and its properties are obtained. In Section 3, the proposed estimator is compared with other two-parameter estimators using the MSE criterion.

The new estimator and its properties
Dorugade [5] introduced the Modified Two-Parameter (MTP) estimator defined earlier, with k > 0 and 0 < d < 1. Consider the convex estimator

β̂(C) = Cβ̂ + (I − C)J,

where C is a p×p matrix, I is the p×p identity matrix, and β̂(C) is an unbiased estimator of β whenever E(J) = β. We define the new estimator, the Unbiased Modified Two-Parameter (UMTP) estimator, based on prior information as follows:

β̂_UMTP(k, d) = R_kd(X′Y + kdJ),

where R_kd = (S + kdI)⁻¹ and J ~ N(β, σ²(kdI)⁻¹) for k > 0, 0 < d < 1.
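A minimal sketch of computing a UMTP estimate, assuming simulated data and an illustrative prior draw J (all concrete values here are our own assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 3
beta = np.array([0.6, 0.6, 0.5])            # hypothetical true coefficients
sigma = 1.0
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(scale=sigma, size=n)

S, Xty, I = X.T @ X, X.T @ y, np.eye(p)
k, d = 0.8, 0.5

# Prior information J ~ N(beta, sigma^2 (kd I)^{-1}), drawn independently of y.
J = rng.normal(loc=beta, scale=sigma / np.sqrt(k * d))

# UMTP: R_kd (X'Y + kd J) with R_kd = (S + kd I)^{-1}
beta_umtp = np.linalg.solve(S + k * d * I, Xty + k * d * J)
```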

Determination of the variance of J (Var(J))
Recall that the proposed estimator can be written in the convex form β̂_UMTP(k, d) = R_kd Sβ̂ + (I − R_kd S)J, since I − R_kd S = kdR_kd. Thus R_kd S in the proposed estimator corresponds to C in the convex estimator. Therefore, for the proposed estimator, Var(J) = σ²(kdI)⁻¹ = (σ²/(kd))I.

Comparison of the proposed estimator with other existing estimators based on the MSE criterion
Since β̂ and β̂_UMTP(k, d) are both unbiased, their MSE matrices equal their variance matrices. Because J is uncorrelated with β̂ (and hence with X′Y),

Var(β̂_UMTP(k, d)) = R_kd(σ²S + (kd)²(σ²/(kd))I)R_kd = σ²R_kd(S + kdI)R_kd = σ²(S + kdI)⁻¹,

so that

Var(β̂) − Var(β̂_UMTP(k, d)) = σ²[S⁻¹ − (S + kdI)⁻¹],

which is a non-negative definite matrix for k > 0 and 0 < d < 1. Thus, according to Lemma 1, β̂_UMTP(k, d) is superior to β̂.
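The non-negative definiteness of this variance difference can be spot-checked numerically. The sketch below, using an arbitrary design matrix of our own choosing, confirms that all eigenvalues of σ²[S⁻¹ − (S + kdI)⁻¹] are non-negative:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))
S = X.T @ X
sigma2, k, d = 1.0, 0.7, 0.4
I = np.eye(3)

# Var(OLS) - Var(UMTP) = sigma^2 [S^{-1} - (S + kd I)^{-1}]
diff = sigma2 * (np.linalg.inv(S) - np.linalg.inv(S + k * d * I))
eigvals = np.linalg.eigvalsh(diff)   # real eigenvalues of the symmetric difference
```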

Selection of bias parameters k and d
The bias parameters k and d are required by all two-parameter estimators. They are crucial, as they play a vital role in controlling the bias of the regression towards the mean of the response variable [10].
To examine the performance of the proposed estimator against existing estimators, the bias parameters k and d must be chosen. In this study, k is chosen as a ridge parameter estimate and d as a constant in (0, 1).

Numerical example
The proposed estimator is illustrated using the Portland cement data, which exhibit multicollinearity. Multicollinearity occurs when two or more independent variables in a regression model are highly correlated with one another, so that one predictor can largely be predicted from the others; this redundancy skews the results of the regression model.
The Portland cement data set was initially used by [12]. We computed the regression coefficients and the MSE of the proposed unbiased modified two-parameter estimator β̂_UMTP(k, d) and of the other estimators considered in this study.

Simulation study
In order to investigate the performance of the proposed estimator, its MSE is compared with those of some existing estimators. The simulation process uses a linear regression model with fixed independent variables exhibiting different levels of multicollinearity. Consider the regression model

Y_i = β₀ + β₁X_{1i} + β₂X_{2i} + β₃X_{3i} + … + β_pX_{pi} + e_i,

where i = 1, 2, …, n and p = 3, 6. The independent variables were generated by the simulation process used by [2,13,14] and [15][16][17][18] as follows:

X_{ij} = (1 − ρ²)^{1/2}Z_{ij} + ρZ_{i,p+1}, i = 1, 2, …, n, j = 1, 2, …, p,

where X_{ij} are the generated independent variables, ρ controls the correlation between any two independent variables, Z_{ij} are random numbers from the standard normal distribution, and p is the number of independent variables. In this study, we take p = 3, 6 and ρ = 0.8, 0.9, 0.95, and 0.99.
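Under this generating scheme, each column has unit variance and any two distinct columns have correlation ρ² in expectation, so ρ close to 1 induces severe multicollinearity. A sketch of the generator (sample size and seed are our own illustrative choices):

```python
import numpy as np

def gen_predictors(n, p, rho, rng):
    # X_ij = sqrt(1 - rho^2) Z_ij + rho Z_{i,p+1}: each column has unit
    # variance and any two distinct columns have correlation rho^2.
    Z = rng.normal(size=(n, p + 1))
    return np.sqrt(1.0 - rho**2) * Z[:, :p] + rho * Z[:, [p]]

rng = np.random.default_rng(4)
X = gen_predictors(100_000, 3, 0.99, rng)
corr = np.corrcoef(X, rowvar=False)   # empirical correlations close to rho^2
```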
The parameter values were chosen such that β′β = 1, which is a common restriction in simulation studies of this type [19][20][21][22][23][24][25]. The data sets are simulated with sample sizes n = 20, 50, 100 and σ² = 1, 5, 10. The process is replicated 2000 times. We obtained the estimated MSE values of the OLS, MRR, MLIU, MTP, TP, UTP, and proposed UMTP estimators, respectively. Their respective MSEs are obtained by the following computation:

MSE(β̂) = (1/2000) Σ_{j=1}^{2000} Σ_{i=1}^{p} (β̂_{ij} − β_i)²,
where β̂_{ij} is the estimate of the i-th parameter in the j-th replication and β_i is the true parameter value. The estimated MSE values for different combinations of n, p, ρ, and σ² are presented in Tables 2 and 3. It is observed in Tables 2 and 3 that the new proposed estimator β̂_UMTP(k, d) is superior to the OLS estimator and to the other two-parameter estimators (MRR, MLIU, MTP, TP, and UTP), because it attains the minimum MSE. The new proposed estimator performs better for different numbers of independent variables and various levels of correlation (ρ) among them. It also performs better when n is small (n = 20) and for various values of the error variance (σ²). The new estimator is an unbiased estimator that overcomes the problem of multicollinearity and can be used in place of the other estimators considered in this study.
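A compact version of such a Monte Carlo comparison for OLS versus UMTP can be sketched as follows; the seed, parameter values, and single fixed design here are our own choices, not the study's full factorial setup:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, rho, sigma, k, d, reps = 20, 3, 0.99, 1.0, 0.8, 0.5, 2000

# Fixed multicollinear design (correlation between columns ~ rho^2).
Z = rng.normal(size=(n, p + 1))
X = np.sqrt(1 - rho**2) * Z[:, :p] + rho * Z[:, [p]]
beta = np.full(p, 1 / np.sqrt(p))            # satisfies beta'beta = 1
S, I = X.T @ X, np.eye(p)

sse_ols = sse_umtp = 0.0
for _ in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    Xty = X.T @ y
    b_ols = np.linalg.solve(S, Xty)
    J = rng.normal(beta, sigma / np.sqrt(k * d))     # prior draw, independent of y
    b_umtp = np.linalg.solve(S + k * d * I, Xty + k * d * J)
    sse_ols += np.sum((b_ols - beta) ** 2)
    sse_umtp += np.sum((b_umtp - beta) ** 2)

mse_ols, mse_umtp = sse_ols / reps, sse_umtp / reps
```

With this ill-conditioned design, the estimated MSE of the UMTP estimator should come out below that of OLS, mirroring the pattern reported in Tables 2 and 3.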

Conclusion
A new estimator, the Unbiased Modified Two-Parameter (UMTP) estimator, based on prior information, is proposed to minimize the effect of multicollinearity in the linear regression model. A Monte Carlo simulation study across different combinations of d, k, n, p, and σ² was carried out, and the MSE criterion was used to compare the new estimator with the OLS and the other existing two-parameter estimators reviewed in this study. Real-life data with a multicollinearity problem were also used to evaluate its performance. The newly proposed UMTP estimator was observed to perform better than the existing estimators in the presence of multicollinearity.