Power of t-test for Simple Linear Regression Model with Non-normal Error Distribution: A Quantile Function Distribution Approach

One of the major assumptions of regression analysis is that the model error is normally distributed, and the error term of the simple linear regression model is generally assumed to be normal. In this paper, the g-and-k distribution is instead used as the distribution of the error in the simple linear regression model. A numerical study is conducted to see how strongly different degrees of departure from normality, measured by a set of skewness and kurtosis parameters, affect the size and power of the t-test for the simple linear regression model. The strength of the t-test is evaluated through its power function. The simulation results show that the performance of the t-test for the simple linear regression model with g-and-k errors is strongly affected in the presence of excess kurtosis and small samples (i.e. n < 100). The t-test is size robust under normality, and the skewness and kurtosis parameters have very little effect on the size of the t-test.


Introduction
Most statistical procedures, such as the t-test, tests for regression coefficients, analysis of variance, and the F-test of homogeneity of variance, rest on the fundamental assumption that the sampled data come from a normal distribution. Relying on this assumption requires either an effective test of whether it holds or a careful argument showing that its violation does not invalidate the procedure used. Much statistical research has been concerned with evaluating the effect of violations of assumptions on the true significance level of a test or on the efficiency of the regression model. In this paper, the effect of non-normality on the power of the test is therefore investigated numerically by varying the skewness and kurtosis parameters of the g-and-k distribution.

The g-and-k Distribution
Rayner and MacGillivray [13] examined the effect of non-normality on the distribution of (numerical) maximum likelihood estimators. The g-and-k distribution [12] can be defined in terms of its quantile function as

$$Q(u; A, B, g, k) = A + B\left[1 + c\,\frac{1 - \exp(-g z_u)}{1 + \exp(-g z_u)}\right]\left(1 + z_u^2\right)^{k} z_u, \qquad 0 < u < 1, \qquad (1)$$

where A and B > 0 are location and scale parameters respectively, g measures skewness in the distribution, k > -½ measures kurtosis (in the general sense of peakedness/tailedness) in the distribution, z_u is the uth quantile of a standard normal variate, and c is a constant chosen to help produce proper distributions. For g = k = 0 the quantile function (1) reduces to A + B z_u, the quantile function of a normal variate with mean A and standard deviation B; with A = 0 and B = 1 it is just the quantile function of a standard normal variate.
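As a minimal sketch (not the authors' R code), the quantile function (1) can be coded directly and used to draw g-and-k errors by inverse-transform sampling: draw U uniform on (0, 1) and return Q(U). The function names `gk_quantile` and `sample_gk` are hypothetical, and the skewness factor uses the identity (1 - e^(-x))/(1 + e^(-x)) = tanh(x/2), which is algebraically equal to the fraction in (1).

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # N(0, 1); inv_cdf(u) gives z_u


def gk_quantile(u, A=0.0, B=1.0, g=0.0, k=0.0, c=0.83):
    """Quantile function (1) of the g-and-k distribution at probability u in (0, 1)."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    if B <= 0:
        raise ValueError("scale parameter B must be positive")
    z = _STD_NORMAL.inv_cdf(u)
    # (1 - exp(-g z)) / (1 + exp(-g z)) equals tanh(g z / 2)
    skew_factor = 1.0 + c * math.tanh(g * z / 2.0)
    kurt_factor = (1.0 + z * z) ** k
    return A + B * skew_factor * kurt_factor * z


def sample_gk(n, A=0.0, B=1.0, g=0.0, k=0.0, c=0.83, rng=random):
    """Draw n g-and-k variates by inverse-transform sampling."""
    draws = []
    while len(draws) < n:
        u = rng.random()  # uniform on [0, 1)
        if u > 0.0:       # inv_cdf is undefined at u = 0
            draws.append(gk_quantile(u, A, B, g, k, c))
    return draws


# With g = k = 0, A = 0, B = 1 the 0.975 quantile is the familiar normal value.
print(gk_quantile(0.975))
print(sample_gk(3, g=0.5, k=0.5, rng=random.Random(42)))
```

Passing a seeded `random.Random` instance keeps the draws reproducible across runs.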
The sign of the skewness parameter indicates the direction of skewness: g < 0 indicates that the distribution is skewed to the left, and g > 0 that it is skewed to the right. Increasing or decreasing the absolute value of g increases or decreases the skewness in the indicated direction. When g = 0 the distribution is symmetric. The kurtosis parameter k of the g-and-k distribution behaves similarly.
Increasing k increases the level of kurtosis and vice versa. The value k = 0 corresponds to no extra kurtosis added to the standard normal base distribution. However, this distribution can also represent less kurtosis than the normal distribution, since k > -1/2 can take negative values. If curves with even less kurtosis are required, a base distribution with less kurtosis than the standard normal distribution can be used.
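The direction-of-skewness and kurtosis statements above can be checked without simulation by evaluating quantile-based shape measures on (1). The sketch below swaps in Bowley's quartile skewness and Moors' octile kurtosis; these are standard robust measures, but they are not the measures used in the paper. `gk_quantile` repeats a hypothetical helper for equation (1) so the block runs on its own. For the normal case (g = k = 0) Bowley's measure is 0 and Moors' measure is about 1.23.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gk_quantile(u, A=0.0, B=1.0, g=0.0, k=0.0, c=0.83):
    """Quantile function (1) of the g-and-k distribution (hypothetical helper)."""
    z = _STD_NORMAL.inv_cdf(u)
    return A + B * (1.0 + c * math.tanh(g * z / 2.0)) * (1.0 + z * z) ** k * z


def bowley_skewness(g, k):
    """Quartile skewness: positive for right skew, negative for left skew."""
    q1, q2, q3 = (gk_quantile(p, g=g, k=k) for p in (0.25, 0.5, 0.75))
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


def moors_kurtosis(g, k):
    """Octile kurtosis (E7 - E5 + E3 - E1) / (E6 - E2), where Ei = Q(i/8)."""
    e = [gk_quantile(i / 8.0, g=g, k=k) for i in range(1, 8)]  # e[0] is E1
    return ((e[6] - e[4]) + (e[2] - e[0])) / (e[5] - e[1])


for g in (-1.0, 0.0, 0.5, 1.0):
    print(f"g={g:+.1f}, k=0: Bowley skewness = {bowley_skewness(g, 0.0):+.3f}")
for k in (-0.5, 0.0, 0.5, 1.0):
    print(f"g=0, k={k:+.1f}: Moors kurtosis = {moors_kurtosis(0.0, k):.3f}")
```

Because both measures are computed from the quantile function alone, they stay finite for every (g, k), which is convenient for a distribution defined only through its quantiles.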
For these distributions, c is the value of the 'overall asymmetry' (MacGillivray). For an arbitrary distribution the overall asymmetry can theoretically be as large as one, so it would appear that for c < 1, data or distributions could occur with skewness that cannot be matched by these distributions. However, for g ≠ 0, the larger the value chosen for c, the more restrictions on k are required to produce a completely proper distribution. Real data seldom produce overall asymmetry values greater than 0.8. We have used c = 0.83 throughout this paper. To examine the effect of different levels of non-normality on the test in the conventional linear regression model, we assume that the random error follows the g-and-k distribution.

Robustness
A test is robust when its nominal and actual sizes are not drastically different under slight model failure, that is, when the validity of the test result is not greatly affected by poorly structured data. The technical definition is therefore: if the actual Type I error rate of a test is close to the nominal rate, say 0.05, the test is said to be robust. In other words, it is resistant to violations of its assumptions.

Regression Model
We consider a basic regression model with only one independent variable and a linear regression function. The model [14] can be stated as follows:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n, \qquad (2)$$

where Y_i is the value of the response variable in the ith trial, β0 and β1 are the intercept and slope parameters, X_i is the value of the independent variable in the ith trial, and ε_i is a random error term with mean zero and variance σ². Model (2) is said to be simple, linear in the parameters, and linear in the independent variable. It is "simple" in the sense that there is only one independent variable, "linear in the parameters" because no parameter appears as an exponent or is multiplied or divided by another parameter, and "linear in the independent variable" because this variable appears only in the first power. A model that is linear in both the parameters and the independent variable is also called a first-order model.
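The parameters β0 and β1 in regression model (2) are called regression coefficients. β1 is the slope of the regression line; it measures the change in the mean of the probability distribution of Y per unit increase in X. β0 is the intercept of the regression line. If the scope of the model includes X = 0, β0 gives the mean of the probability distribution of Y at X = 0. When the scope of the model does not cover X = 0, β0 does not have any particular meaning as a separate term in the regression model.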
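For reference, the least squares estimates of the coefficients in model (2), together with the standard error of the slope estimate that the t-test relies on, can be sketched as follows. The helper name `fit_simple_ols` is hypothetical. The formulas are the usual ones: b1 = Sxy/Sxx, b0 = Ȳ - b1·X̄, and se(b1) = sqrt(MSE/Sxx) with MSE = SSE/(n - 2).

```python
import math


def fit_simple_ols(x, y):
    """Least squares fit of y = b0 + b1 * x; returns (b0, b1, se_b1)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with n >= 3")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                   # slope estimate
    b0 = y_bar - b1 * x_bar          # intercept estimate
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)              # error variance estimate on n - 2 degrees of freedom
    se_b1 = math.sqrt(mse / sxx)     # standard error of the slope estimate
    return b0, b1, se_b1


# Points lying exactly on Y = 2 + 3X recover beta0 = 2 and beta1 = 3 with zero standard error.
print(fit_simple_ols([0, 1, 2, 3, 4], [2, 5, 8, 11, 14]))
```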

Simulation Technique
In this paper we consider the simple linear regression model with one explanatory variable. In the classical simple linear regression model the error terms ε_i are assumed to be normally distributed; here, instead, we assume that the random error terms ε_i follow the g-and-k distribution. We examine the effect of non-normality on the size and power of the test of H0: β1 = 0 (slope = 0) by varying the skewness and kurtosis parameters of the g-and-k distribution. Using the g-and-k distribution allows us to quantify how far the data depart from normality through the values chosen for the g (skewness) and k (kurtosis) parameters. For g = k = 0, the quantile function of the g-and-k distribution is just the quantile function of a normal variate.
To study the power of the tests, expressions for the power curves are required. In practice, however, analytic expressions for these power functions are not readily obtainable. Instead, we conducted a simulation to estimate the power functions for various combinations of the g and k parameter values of the g-and-k error distribution. In simulating size and power, we considered a simple linear regression of y on one explanatory variable x, where x was generated from a uniform distribution. The slope coefficient β1 is varied from -4 to 4. The error distribution is the g-and-k distribution, with the skewness parameter g taken in the range (-2, 2) and the kurtosis parameter k taking the values -0.5, 0, 0.5 and 1. The location parameter A is set to 0, the scale parameter B to 1, and the constant c to 0.83; z_u denotes the uth quantile of the standard normal distribution N(0, 1). To obtain the power of the t-test, we test H0: β1 = 0 against the alternative H1: β1 ≠ 0. For each combination of g and k and each nonzero value of β1, we count the number of rejections of the null hypothesis in 10,000 simulation runs and divide this count by 10,000, which gives the estimated power of the test. To obtain the size, we generate data under the null hypothesis and repeat the same procedure. We considered different combinations of g and k for sample sizes n = 10, 20, 30 and 100. The level of significance is 0.05 throughout the simulation, and all computations were carried out in the statistical software R.
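The procedure above can be sketched in a self-contained way as below. This is an illustration under stated assumptions, not the authors' R code: X is drawn from Uniform(0, 1) (the interval is not given in the paper), the true intercept β0 is set to 0 (it does not affect the slope test), the two-sided 5% critical values of the t distribution with n - 2 degrees of freedom are hard-coded for the four sample sizes studied, and `rejection_rate` and the other helper names are hypothetical. With β1 = 0 the returned proportion estimates the size; with β1 ≠ 0 it estimates the power.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()
# Two-sided alpha = 0.05 critical values t_{0.975, n-2} for the sample sizes in the study.
T_CRIT = {10: 2.3060, 20: 2.1009, 30: 2.0484, 100: 1.9845}


def gk_error(rng, g, k, A=0.0, B=1.0, c=0.83):
    """One g-and-k draw via inverse transform of quantile function (1)."""
    u = rng.random()
    while u == 0.0:  # inv_cdf needs 0 < u < 1
        u = rng.random()
    z = _STD_NORMAL.inv_cdf(u)
    return A + B * (1.0 + c * math.tanh(g * z / 2.0)) * (1.0 + z * z) ** k * z


def slope_t_statistic(x, y):
    """t = b1 / se(b1) for the least squares slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return b1 / math.sqrt(mse / sxx)


def rejection_rate(n, beta1, g, k, reps=10_000, beta0=0.0, seed=None):
    """Proportion of reps in which H0: beta1 = 0 is rejected at alpha = 0.05."""
    rng = random.Random(seed)
    crit = T_CRIT[n]  # KeyError for sample sizes outside the study design
    rejections = 0
    for _ in range(reps):
        x = [rng.random() for _ in range(n)]  # X ~ Uniform(0, 1), an assumption
        y = [beta0 + beta1 * xi + gk_error(rng, g, k) for xi in x]
        if abs(slope_t_statistic(x, y)) > crit:
            rejections += 1
    return rejections / reps


# Estimated size (beta1 = 0) and power (beta1 = 2) with (g, k) = (0.5, 0.5), n = 20.
print(rejection_rate(20, 0.0, g=0.5, k=0.5, reps=2_000, seed=1))
print(rejection_rate(20, 2.0, g=0.5, k=0.5, reps=2_000, seed=1))
```

Hard-coding the four critical values keeps the sketch dependent only on the standard library; a statistics library's t quantile function could replace the table for other sample sizes.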

Size of t-test
Rejecting a true null hypothesis is called a Type I error, and the probability of committing it is the size of the test. First, we consider the effect of non-normality on the size of the test. To simulate the size of the t-test, we generate the explanatory variable x from a uniform distribution and the random error ε from the g-and-k distribution with location and scale parameters A = 0 and B = 1, respectively. Using the statistical software R, we generate data for sample sizes 10, 20, 30 and 100, and test H0: β1 = 0 against H1: β1 ≠ 0.

The test statistic is

$$t = \frac{\hat{\beta}_1}{\operatorname{se}(\hat{\beta}_1)},$$

which, under H0 and normal errors, follows a t distribution with n - 2 degrees of freedom; here β̂1 is the least squares estimate of β1 and se denotes its standard error. To determine the size of the test, we generate data under the null hypothesis, repeat the test 10,000 times, and divide the total number of rejections by 10,000. Tests are carried out at a nominal size of α = 0.05. Table 1 reports simulation results showing the effect of different levels of non-normality on the size of the t-test. For sample size 10, the t-test is size robust under normality; under non-normality, if the contamination is small, the effect on the size of the test is small. For sample sizes 20 and 30, both the skewness and kurtosis parameters have very little effect on the size of the t-test. For sample size 100, the t-test is almost size robust even under non-normality.
Table 1. Size of t-test for different combinations of (g,k) with varying sample sizes.
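To judge whether an estimated size is "close to" the nominal 0.05 in the sense of the robustness definition, it helps to know the Monte Carlo error of an estimate based on 10,000 replications. The estimated size is a binomial proportion, so its standard error is about sqrt(α(1 - α)/R) = sqrt(0.05 × 0.95 / 10,000) ≈ 0.0022. The hypothetical helper below flags an estimate as consistent with the nominal level when it lies within z standard errors of α; the choice z = 1.96 is an assumption for illustration, not a criterion used in the paper.

```python
import math


def mc_standard_error(alpha=0.05, reps=10_000):
    """Standard error of a rejection proportion estimated from reps replications."""
    return math.sqrt(alpha * (1.0 - alpha) / reps)


def consistent_with_nominal(size_hat, alpha=0.05, reps=10_000, z=1.96):
    """True if size_hat lies within z Monte Carlo standard errors of alpha (assumed rule)."""
    return abs(size_hat - alpha) <= z * mc_standard_error(alpha, reps)


print(round(mc_standard_error(), 4))   # standard error for 10,000 replications
print(consistent_with_nominal(0.0512))
print(consistent_with_nominal(0.0580))
```

Under this rule, estimated sizes roughly between 0.0457 and 0.0543 would be indistinguishable from 0.05 at 10,000 replications.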

Power of t-test
The power of a test is an important consideration: a more powerful test leads to better conclusions, so powerful tests are preferred. Failing to reject a false null hypothesis is called a Type II error, and the power of the test is the probability that a Type II error does not occur, i.e. the probability of rejecting a false null hypothesis.
To simulate power, we test H0: β1 = 0 against H1: β1 ≠ 0 using the statistic t = β̂1/se(β̂1), where se denotes the standard error. We generate data using β1 ∈ {-4, -3.5, -3.0, …, 3.0, 3.5, 4} and repeat the test procedure 10,000 times for each value. We count the number of rejections out of the 10,000 runs for each value in this set and divide by 10,000, with the level of significance α = 0.05. We estimate power for sample sizes n = 10, 20, 30 and 100 with various combinations of the shape parameters g and k to see how skewness and kurtosis affect the power of the t-test.
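The power curve described above can be sketched by looping over the β1 grid {-4, -3.5, …, 4}. The block repeats compact versions of hypothetical helpers so it runs on its own; as in any such sketch, X ~ Uniform(0, 1), β0 = 0, and the hard-coded t critical values are assumptions rather than details taken from the paper. A small `reps` keeps the example quick, whereas the study itself uses 10,000 replications per grid point.

```python
import math
import random
from statistics import NormalDist

_Z = NormalDist()
T_CRIT = {10: 2.3060, 20: 2.1009, 30: 2.0484, 100: 1.9845}  # t_{0.975, n-2}


def gk_error(rng, g, k, c=0.83):
    """Standard (A = 0, B = 1) g-and-k draw by inverse transform of (1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    z = _Z.inv_cdf(u)
    return (1.0 + c * math.tanh(g * z / 2.0)) * (1.0 + z * z) ** k * z


def rejects_h0(x, y, crit):
    """Two-sided t-test of H0: beta1 = 0 for the least squares slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return abs(b1 / math.sqrt(mse / sxx)) > crit


def power_curve(n, g, k, reps=10_000, seed=None):
    """Map each beta1 in {-4, -3.5, ..., 4} to its estimated rejection rate."""
    rng = random.Random(seed)
    grid = [i / 2.0 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0 (17 points)
    curve = {}
    for beta1 in grid:
        hits = 0
        for _ in range(reps):
            x = [rng.random() for _ in range(n)]  # X ~ Uniform(0, 1), assumed
            y = [beta1 * xi + gk_error(rng, g, k) for xi in x]
            hits += int(rejects_h0(x, y, T_CRIT[n]))
        curve[beta1] = hits / reps
    return curve


normal = power_curve(20, g=0.0, k=0.0, reps=300, seed=2)
heavy = power_curve(20, g=0.0, k=1.0, reps=300, seed=2)
for beta1 in (0.0, 1.0, 2.0, 3.0):
    print(f"beta1={beta1:+.1f}  normal: {normal[beta1]:.3f}  (g,k)=(0,1): {heavy[beta1]:.3f}")
```

The value at β1 = 0 is the estimated size, so each curve also carries the size estimate for its (g, k) combination.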
Figs. 1 through 7 show the power curves of the t-test for various combinations of (g,k) and sample sizes n = 10, 20, 30 and 100. In Fig. 1, for sample size 10, the curves for (g,k) = (0,0), (0,0.3), (0,0.5), (0,0.8), (0,1) show that, for fixed g = 0, increasing the kurtosis parameter k in the positive direction reduces the power of the test sharply relative to normal data, (g,k) = (0,0). The curves for (g,k) = (0,0), (0.5,0), (0.7,0), (1,0) show that, for fixed k = 0, increasing the skewness parameter g reduces the power of the t-test only slightly relative to normal data. The curves for (g,k) = (0,0), (0.5,0), (0,0.5), (0.5,0.5) and (0,0), (1,0), (0,1), (1,1) show that increasing the kurtosis parameter reduces the power more than increasing the skewness parameter does. In Fig. 2, the curves for (g,k) = (0,0), (0,-0.2), (0,-0.3), (0,-0.5) show the effect of negative kurtosis: as the kurtosis parameter becomes more negative, the t-test has higher power than with normal data. The curves for (g,k) = (0,0), (0.5,0), (0,-0.3), (0,-0.5) show that negative kurtosis increases the power relative to normal data, whereas varying the skewness parameter g decreases it slightly. Fig. 3 shows the same pattern of change in power for sample size 20 as the skewness and kurtosis parameters vary. In Fig. 4, the curves for (g,k) = (0,0), (-0.5,0), (-1,0), (-1.5,0), (-2,0) show that as g becomes more negative the power falls slightly below that of the normal curve. Figs. 5 and 6 show the change in power for different combinations of (g,k) for sample size 30 in the same way. In Fig. 7, for sample size n = 100, the curves for (g,k) = (0,0), (0,0.3), (0,0.5), (0,0.8), (0,1) show that the loss of power grows as the kurtosis parameter increases in the positive direction, whereas the curves for (g,k) = (0,0), (0.5,0), (1,0), (1.5,0), (2,0) show that the t-test is to some extent robust to increases in the skewness parameter. Fig. 7 also shows that for a large sample size, i.e. n = 100, the difference between the power in the normal and non-normal situations decreases to some extent.
… 7, it is clear that the t-test gives better power than in the normal situation when the data have negative kurtosis.
e. From Fig. 4 and Fig. 6, it is clear that the power of the test decreases slightly relative to normal data when the skewness parameter g moves away from zero in either the positive or the negative direction.
f. From Table 1, the skewness parameter g and the kurtosis parameter k have little effect on the size of the t-test.

Conclusion
The effect of non-normality on the power of the t-test for the simple linear regression model has been measured numerically, and the power curves are shown graphically. Our study can be distilled into the following four observations:
a. As the kurtosis parameter k increases in the positive direction, the t-test has less power than with normal data, whereas with negative kurtosis it has more power.
b. The skewness parameter g has very little effect on the power of the test.
c. As the sample size increases, the difference between the power in the normal and non-normal situations decreases.
d. The kurtosis parameter has more effect on the size and power of the test than the skewness parameter for the simple linear regression model.
