# Using Dummy Variables in Regression

An equivalent alternative to analysis of variance is regression on dummy variables. Dummy variables are indicator variables, normally coded 0 or 1. To use our 'CLASS' variable in a regression, we need to create these dummy variables:

X1 = 1 if 'CLASS' is a 1 (freshman), 0 otherwise

X2 = 1 if 'CLASS' is a 2 (sophomore), 0 otherwise

X3 = 1 if 'CLASS' is a 3 (junior), 0 otherwise

Only 3 dummy variables are needed even though there are 4 classes: if X1, X2, and X3 are all 0, then the student is a senior. Seniors therefore serve as the baseline group against which the other classes are compared.

The CODE statement in MINITAB allows us to create these variables from the existing 'CLASS' variable. Let's use columns C10, C11, and C12 to store the new variables. The following statements create them.

MTB> NAME C10= 'X1' C11= 'X2' C12= 'X3'

MTB> CODE (1)1 (2 3 4)0 'CLASS' 'X1'

MTB> CODE (2)1 (1 3 4)0 'CLASS' 'X2'

MTB> CODE (3)1 (1 2 4)0 'CLASS' 'X3'

The first CODE statement recodes the value 1 to a 1 and the values 2, 3, and 4 to a 0 for the variable 'CLASS', and stores the results in the variable 'X1'. The second and third statements do the same for sophomores and juniors.
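For readers who want to see the recoding spelled out, the following is a minimal Python sketch of what the three CODE statements do. It is an analogue, not MINITAB code: the function name `code`, the sample `class_col` values, and the choice to leave unlisted values unchanged are all assumptions made for illustration.

```python
# Python analogue of the three CODE statements (illustrative, not MINITAB).

def code(values, ones, zeros):
    """Return a new column: 1 where the value is in `ones`, 0 where it is in `zeros`."""
    recoded = []
    for v in values:
        if v in ones:
            recoded.append(1)
        elif v in zeros:
            recoded.append(0)
        else:
            # Assumption of this sketch: values not listed are left as they are.
            recoded.append(v)
    return recoded

# Hypothetical CLASS values (1=freshman, 2=sophomore, 3=junior, 4=senior).
class_col = [1, 4, 2, 3, 1, 4]

x1 = code(class_col, ones={1}, zeros={2, 3, 4})  # freshman indicator
x2 = code(class_col, ones={2}, zeros={1, 3, 4})  # sophomore indicator
x3 = code(class_col, ones={3}, zeros={1, 2, 4})  # junior indicator

# Print each student's class next to its three dummy values.
for row in zip(class_col, x1, x2, x3):
    print(row)
```

Each senior (class 4) ends up with 0 in all three columns, which is why no fourth dummy variable is required.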

We can then run a regression with 'TEST4' as the dependent variable and the dummy variables for 'CLASS' as the independent variables. The model is:

Y = B0 + B1*X1 + B2*X2 + B3*X3
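To see what the coefficients in this model mean, it can help to fit it to a small data set by hand. The Python sketch below uses hypothetical scores (not the data from this handout) and solves the least-squares normal equations with a small hand-written solver; the names `scores` and `solve` are assumptions for illustration. Under these assumptions, B0 comes out as the senior mean and each of B1, B2, B3 as the difference between that class's mean and the senior mean.

```python
# Hypothetical scores by class (1=freshman ... 4=senior); not the handout's data.
scores = {1: [80.0, 90.0, 85.0], 2: [70.0, 74.0], 3: [60.0, 68.0], 4: [75.0, 85.0]}

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# Build the design matrix: each row is [1, X1, X2, X3].
rows, y = [], []
for cls, vals in scores.items():
    for v in vals:
        rows.append([1.0, float(cls == 1), float(cls == 2), float(cls == 3)])
        y.append(v)

# Normal equations (X'X) b = X'y.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
b0, b1, b2, b3 = solve(xtx, xty)

# Compare the fitted coefficients with the class means.
means = {cls: sum(v) / len(v) for cls, v in scores.items()}
print("B0 =", round(b0, 3), "senior mean =", means[4])
print("B1 =", round(b1, 3), "freshman mean - senior mean =", means[1] - means[4])
print("B2 =", round(b2, 3), "sophomore mean - senior mean =", means[2] - means[4])
print("B3 =", round(b3, 3), "junior mean - senior mean =", means[3] - means[4])
```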

To run the regression, type the following command, where the 3 is the number of predictors:

MTB> REGRESS 'TEST4' 3 'X1' 'X2' 'X3'

The regression equation is

TEST4 = 81.5 + 1.83 X1 - 3.75 X2 - 4.50 X3

Predictor       Coef       Stdev   t-ratio        p
Constant      81.500       5.817     14.01    0.000
X1             1.833       6.991      0.26    0.796
X2            -3.750       8.227     -0.46    0.655
X3            -4.500       8.886     -0.51    0.619

s = 11.63     R-sq = 6.1%     R-sq(adj) = 0.0%

Analysis of Variance

SOURCE       DF          SS          MS         F        p
Regression    3       140.1        46.7      0.34    0.793
Error        16      2165.8       135.4
Total        19      2305.8

SOURCE       DF      SEQ SS
X1            1        96.9
X2            1         8.4
X3            1        34.7

Unusual Observations

Obs.       X1     TEST4       Fit  Stdev.Fit  Residual   St.Resid
  15     1.00     60.00     83.33       3.88    -23.33     -2.13R

R denotes an obs. with a large st.resid.

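We can see that the analysis of variance table produced by the regression is the same as the one produced by the Analysis of Variance command. With F = 0.34 and p = 0.793, there is no evidence that mean 'TEST4' scores differ among the classes.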
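The arithmetic behind the output can be checked with a few lines of Python. The sketch below simply copies the coefficients and sums of squares from the output above (the variable names are ours, chosen for illustration). It recovers the fitted mean for each class and recomputes the mean squares, F, R-sq, and s.

```python
# Values copied from the regression output above.
b0, b1, b2, b3 = 81.500, 1.833, -3.750, -4.500
ss_reg, ss_err, ss_total = 140.1, 2165.8, 2305.8
df_reg, df_err = 3, 16

# Fitted mean for each class: substitute that class's dummy values into the model.
class_means = {
    "freshman": b0 + b1,   # X1 = 1, X2 = X3 = 0
    "sophomore": b0 + b2,  # X2 = 1, X1 = X3 = 0
    "junior": b0 + b3,     # X3 = 1, X1 = X2 = 0
    "senior": b0,          # X1 = X2 = X3 = 0
}

# Analysis of variance quantities.
ms_reg = ss_reg / df_reg         # mean square for regression
ms_err = ss_err / df_err         # mean square for error
f_stat = ms_reg / ms_err         # F ratio
r_sq = 100 * ss_reg / ss_total   # R-sq as a percentage
s = ms_err ** 0.5                # residual standard deviation

for name, mean in class_means.items():
    print(f"{name:9s} {mean:.3f}")
print(f"MS regression = {ms_reg:.1f}, MS error = {ms_err:.1f}")
print(f"F = {f_stat:.2f}, R-sq = {r_sq:.1f}%, s = {s:.2f}")
```

The freshman mean computed this way agrees, after rounding, with the Fit value of 83.33 listed for observation 15, which has X1 = 1.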