# Using Dummy Variables in Regression

An alternative to analysis of variance is regression on dummy variables. Dummy variables are indicator variables, normally coded 0 or 1. To use our 'CLASS' variable in a regression, we need to create these dummy variables:

`X1 = 1 if 'CLASS' is a 1 (freshman), 0 otherwise`

`X2 = 1 if 'CLASS' is a 2 (sophomore), 0 otherwise`

`X3 = 1 if 'CLASS' is a 3 (junior), 0 otherwise`

Only 3 dummy variables are needed even though there are 4 classes: if X1, X2, and X3 are all 0, the student must be a senior. A fourth dummy variable would be redundant, because its value is fully determined by the other three.

The CODE statement in MINITAB allows us to create these variables from the existing 'CLASS' variable. Let's use columns C10, C11, and C12 to store the new variables. The following statements name the columns and create the variables.

`MTB> NAME C10= 'X1' C11= 'X2' C12= 'X3'`

`MTB> CODE (1)1 (2 3 4)0 'CLASS' 'X1'`

`MTB> CODE (2)1 (1 3 4)0 'CLASS' 'X2'`

`MTB> CODE (3)1 (1 2 4)0 'CLASS' 'X3'`

The first CODE statement recodes the value 1 to 1 and the values 2, 3, and 4 to 0 for the variable 'CLASS', and stores the results in the variable 'X1'. The second and third statements do the same for the values 2 and 3, storing the results in 'X2' and 'X3'.
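For readers who want to see the recoding outside MINITAB, here is a minimal Python sketch of the same idea. The `class_col` values and the helper name `code` are made up for illustration; this is a simplified analogue of the CODE statement, not a reimplementation of it.

```python
# Hypothetical CLASS values (1=freshman, 2=sophomore, 3=junior, 4=senior).
# These are illustrative only, not the course data set.
class_col = [1, 4, 2, 3, 1, 4]

def code(values, mapping):
    """Recode each value through `mapping` (a simplified stand-in for CODE)."""
    return [mapping[v] for v in values]

# One recoding per dummy variable, mirroring the three CODE statements.
x1 = code(class_col, {1: 1, 2: 0, 3: 0, 4: 0})
x2 = code(class_col, {2: 1, 1: 0, 3: 0, 4: 0})
x3 = code(class_col, {3: 1, 1: 0, 2: 0, 4: 0})

# Print each CLASS value next to its (X1, X2, X3) pattern.
for cls, dummies in zip(class_col, zip(x1, x2, x3)):
    print(cls, dummies)
```

In the printed rows, every senior (CLASS 4) shows the pattern (0, 0, 0), which is why no fourth dummy variable is needed.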

We can now run a regression with 'TEST4' as the dependent variable and the three dummy variables for 'CLASS' as the independent variables. The model is:

`Y = B0 + B1*X1 + B2*X2 + B3*X3`

To run the regression, type:

`MTB> REGRESS 'TEST4' 3 'X1' 'X2' 'X3'`

The regression equation is

`TEST4 = 81.5 + 1.83 X1 - 3.75 X2 - 4.50 X3`

    Predictor       Coef       Stdev   t-ratio        p
    Constant      81.500       5.817     14.01    0.000
    X1             1.833       6.991      0.26    0.796
    X2            -3.750       8.227     -0.46    0.655
    X3            -4.500       8.886     -0.51    0.619

    s = 11.63       R-sq = 6.1%      R-sq(adj)= 0.0%

    Analysis of Variance

    SOURCE       DF          SS         MS         F        p
    Regression    3       140.1       46.7      0.34    0.793
    Error        16      2165.8      135.4
    Total        19      2305.8

    SOURCE       DF      SEQ SS
    X1            1        96.9
    X2            1         8.4
    X3            1        34.7

    Unusual Observations
    Obs.      X1     TEST4       Fit Stdev.Fit  Residual   St.Resid
     15     1.00     60.00     83.33      3.88    -23.33     -2.13R

    R denotes an obs. with a large st.resid.
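One way to read the fitted equation is to plug in the dummy pattern for each class. The short Python sketch below does this with the coefficients reported in the output; the names `patterns`, `predict`, and `fitted` are illustrative choices, not MINITAB features.

```python
# Coefficients copied from the regression output (Constant, X1, X2, X3).
b0, b1, b2, b3 = 81.500, 1.833, -3.750, -4.500

# Dummy pattern (X1, X2, X3) for each class; senior is the all-zero baseline.
patterns = {
    "freshman": (1, 0, 0),
    "sophomore": (0, 1, 0),
    "junior": (0, 0, 1),
    "senior": (0, 0, 0),
}

def predict(x1, x2, x3):
    """Evaluate the fitted equation for one dummy pattern."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

fitted = {name: predict(*x) for name, x in patterns.items()}
for name, value in fitted.items():
    print(f"{name:9s} {value:.2f}")
```

This prints 83.33 for freshmen, 77.75 for sophomores, 77.00 for juniors, and 81.50 for seniors. The freshman value agrees with the Fit of 83.33 reported for observation 15, which has X1 = 1. The constant is the fitted value for seniors, and each coefficient is the difference between that class and seniors.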

The analysis of variance table in this regression output (F = 0.34, p = 0.793) is the same as that produced by the Analysis of Variance command.
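The equivalence can also be checked numerically outside MINITAB. The following is a minimal Python sketch on made-up scores (not the TEST4 data): it fits the dummy-variable regression by solving the normal equations with a small hand-written solver, then compares the regression sum of squares with the between-group sum of squares from a one-way analysis of variance. The data, the `solve` helper, and all variable names are assumptions for illustration.

```python
# Hypothetical scores by class; illustrative only, not the TEST4 data.
scores = {
    1: [85.0, 78.0, 90.0],  # freshmen
    2: [70.0, 82.0],        # sophomores
    3: [75.0, 79.0, 71.0],  # juniors
    4: [88.0, 80.0],        # seniors (baseline: all dummies 0)
}

# Build design-matrix rows [1, X1, X2, X3] and the response vector.
X, y = [], []
for cls, values in scores.items():
    for v in values:
        X.append([1.0] + [1.0 if cls == k else 0.0 for k in (1, 2, 3)])
        y.append(v)

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Normal equations: (X'X) b = X'y.
p = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
coef = solve(XtX, Xty)

# Regression sum of squares: fitted values about the grand mean.
grand = sum(y) / len(y)
fits = [sum(c * x for c, x in zip(coef, row)) for row in X]
ss_regression = sum((f - grand) ** 2 for f in fits)

# One-way ANOVA between-group sum of squares from the class means.
means = {cls: sum(v) / len(v) for cls, v in scores.items()}
ss_between = sum(len(v) * (means[cls] - grand) ** 2 for cls, v in scores.items())

print("coefficients:", [round(c, 3) for c in coef])
print("SS regression:", round(ss_regression, 3))
print("SS between:   ", round(ss_between, 3))
```

In this sketch the intercept equals the senior mean, each slope equals a class mean minus the senior mean, and the two sums of squares agree up to rounding, which is the reason the two analysis of variance tables match.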