# SPSS Techniques Series: Statistics on Likert Scale Surveys

Disclaimer

What is a Likert Scale?

Reading in the survey from optical mark scanning sheet data

Producing a Table of the frequency results

Producing Means and Standard Deviations

Using T-Tests to compare groups

About the assumptions of the T-Test

An Alternative to T-Tests: the Chi Square Statistic

T-Tests on pairs of questions

Computing Subscales and what to do about reverse wording

Statistics on subscales

### Disclaimer

This paper contains a description and example of how to run t-tests on individual Likert scale questions. Although this is often done in practice, it is NOT a statistically valid technique, since Likert scale questions do not possess a normal probability distribution. It is presented in this paper since it is commonly asked about. This is further discussed below.

### What is a Likert Scale?

A Likert scale measures the extent to which a person agrees or disagrees with a statement. The most common scale runs from 1 to 5, often coded as 1=strongly disagree, 2=disagree, 3=not sure, 4=agree, and 5=strongly agree.

### Reading in the survey from optical mark scanning sheet data

Since Likert scale questions most often range from 1 to 5, an optical mark scanning (OMR) sheet can be used for data entry. If we have a survey consisting of 20 Likert scale questions, an SPSS program to read in the data and produce frequencies would be this:

```/* likert.sps spss program written by Mary Howard on 5-1-91.
/* This demonstrates how to read from
/* a file containing OMR data, and run frequencies on
/* each of the questions.

data list
file='likert.dat'
/id 60-62 sex 67-67 (a) q1 to q20 70-89

variable labels
q1 'question 1'
/q2 'question 2'
/q3 'question 3'
/q4 'question 4'
/q5 'question 5'
/q6 'question 6'
/q7 'question 7'
/q8 'question 8'
/q9 'question 9'
/q10 'question 10'
/q11 'question 11'
/q12 'question 12'
/q13 'question 13'
/q14 'question 14'
/q15 'question 15'
/q16 'question 16'
/q17 'question 17'
/q18 'question 18'
/q19 'question 19'
/q20 'question 20'

value labels
sex 'M' 'male'
'F' 'female'
/q1 to q20
1 '1) strongly disagree'
2 '2) disagree'
3 '3) not sure'
4 '4) agree'
5 '5) strongly agree'

frequencies
variables = sex q1 to q20```

*****

```SEX
                                                       Valid     Cum
Value Label                 Value Frequency  Percent  Percent  Percent

female                          F       68     56.7     56.7     56.7
male                            M       52     43.3     43.3    100.0
                                    -------  -------  -------
                            Total      120    100.0    100.0

Valid cases     120      Missing cases     0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - -
Q1        question 1
                                                       Valid     Cum
Value Label                 Value Frequency  Percent  Percent  Percent

2) disagree                     2        1       .8      1.3      1.3
3) not sure                     3        5      4.2      6.3      7.5
4) agree                        4       15     12.5     18.8     26.2
5) strongly agree               5       59     49.2     73.8    100.0
                                .       40     33.3   Missing
                                    -------  -------  -------
                            Total      120    100.0    100.0

Valid cases      80      Missing cases    40```

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - -

```Q2        question 2
                                                       Valid     Cum
Value Label                 Value Frequency  Percent  Percent  Percent

2) disagree                     2        2      1.7      2.4      2.4
3) not sure                     3        2      1.7      2.4      4.7
4) agree                        4       25     20.8     29.4     34.1
5) strongly agree               5       56     46.7     65.9    100.0
                                .       35     29.2   Missing
                                    -------  -------  -------
                            Total      120    100.0    100.0

Valid cases      85      Missing cases    35```

-----------------------------------------------------------------------

```Q11        question 11
                                                       Valid     Cum
Value Label                 Value Frequency  Percent  Percent  Percent

1) strongly disagree            1       10      8.3     10.5     10.5
2) disagree                     2       11      9.2     11.6     22.1
3) not sure                     3       13     10.8     13.7     35.8
4) agree                        4        9      7.5      9.5     45.3
5) strongly agree               5       52     43.3     54.7    100.0
                                .       25     20.8   Missing
                                    -------  -------  -------
                            Total      120    100.0    100.0

Valid cases      95      Missing cases    25
-----------------------------------------------------------------------```
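To make the column layout in the data list statement concrete, here is a minimal sketch in Python (used only for illustration; it is not part of the SPSS program). It slices one fixed-width record at the same columns (id in 60-62, sex in 67, q1 to q20 in 70-89) and tallies one question the way the frequencies output does, with blank marks counted as missing. The helper names `parse_record`, `frequency_table`, and `make_line`, and the three sample records, are made up for this example.

```python
from collections import Counter

def parse_record(line):
    """Slice one fixed-width OMR record at the columns named in the data list."""
    line = line.ljust(89)                        # pad short lines so every slice exists
    answers = [int(ch) if ch in "12345" else None   # blank or stray mark = missing
               for ch in line[69:89]]               # columns 70-89 hold q1 to q20
    return {"id": line[59:62].strip(),           # columns 60-62
            "sex": line[66],                     # column 67, alphabetic
            "q": answers}

def frequency_table(values):
    """Return rows of (value, count, percent, valid percent) and the missing count."""
    counts = Counter(v for v in values if v is not None)
    total = len(values)
    valid = sum(counts.values())
    rows = [(v, n, 100.0 * n / total, 100.0 * n / valid)
            for v, n in sorted(counts.items())]
    return rows, total - valid

def make_line(ident, sex, marks):
    """Build a hypothetical record: filler, id, filler, sex, filler, 20 marks."""
    return " " * 59 + ident + " " * 4 + sex + " " * 2 + marks

records = [parse_record(make_line("001", "F", "5" * 20)),
           parse_record(make_line("002", "M", "4" + " " + "5" * 18)),
           parse_record(make_line("003", "F", "5" * 20))]

rows, missing = frequency_table([r["q"][1] for r in records])   # question 2
for value, count, pct, valid_pct in rows:
    print(value, count, round(pct, 1), round(valid_pct, 1))
print("missing:", missing)
```

As in the SPSS output, the percent column is based on all cases, while the valid percent column is based only on the cases that answered the question.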

### Producing a Table of the frequency results

The TABLES procedure can produce presentation-quality tables with the value labels as the column headings. This is especially useful for Likert scale questions. Using the TABLES command, we can present the frequency results for all questions on one page.

To run the tables procedure, replace the frequencies code in the above program with the following code:

```tables
format= margins(1,80) dbox light nspace cwidth(13,5)
/table = q1+q2+q3+q4+q5+q6+q7+q8+q9+q10+q11+q12+
q13+q14+q15+q16+q17+q18+q19+q20 by (labels) > (statistics)
/statistics = count('cnt'(f3.0)) cpct('pct')
/ttitle = 'Results of questions 1 to 20'```

The format statement specifies the following formats:

• margins at 1 and 80.
• dissertation boxing (no lines in the body of the table).
• light, non-bold printing
• no spacing between variables
• column width of 13 for the first column (containing descriptions), and 5 for all other columns.

The table statement requests that all the questions be listed down the rows of the table. The plus sign between questions indicates that they should be concatenated together. Across the columns, the keyword "(labels)" indicates that the labels should form the headings to the columns. A greater than sign precedes the keyword "(statistics)", indicating that the statistics should be nested under the labels.

The "statistics" statement requests that the count and the count percent be printed. A format follows the count to request it be printed as a whole number. A label follows the percent.

The "ttitle" statement prints a title for the table.

```                         Results of questions 1 to 20
________________________________________________________________________
-------------------------------------------------------------------------
              1) strongly 2) disagree3) not sure  4) agree   5) strongly
               disagree                                         agree
              ----------- ---------------------- ----------- -----------
              cnt   pct   cnt  pct   cnt   pct   cnt   pct   cnt   pct
-------------------------------------------------------------------------
question 1                   1   1.3%   5   6.3%   15  18.8%   59  73.8%
question 2                   2   2.4%   2   2.4%   25  29.4%   56  65.9%
question 3                   1   1.1%  12  13.3%   31  34.4%   46  51.1%
question 4                              2   2.7%   17  23.0%   55  74.3%
question 5       2   2.2%   11  12.0%  17  18.5%   27  29.3%   35  38.0%
question 6      15  14.9%   22  21.8%  15  14.9%   20  19.8%   29  28.7%
question 7       3   3.4%    8   9.1%  18  20.5%   23  26.1%   36  40.9%
question 8       1   1.1%    4   4.3%  11  11.7%   32  34.0%   46  48.9%
question 9                   5   5.6%  10  11.1%   26  28.9%   49  54.4%
question 10                  4   5.3%   6   8.0%   17  22.7%   48  64.0%
question 11     10  10.5%   11  11.6%  13  13.7%    9   9.5%   52  54.7%
question 12      7   6.4%   15  13.6%  20  18.2%   29  26.4%   39  35.5%
question 13                  2   2.2%   6   6.7%   16  17.8%   66  73.3%
question 14                  6   6.2%   5   5.2%   33  34.0%   53  54.6%
question 15     18  16.5%   24  22.0%  20  18.3%   25  22.9%   22  20.2%
question 16      2   2.2%    2   2.2%   6   6.6%   24  26.4%   57  62.6%
question 17      1   1.3%    2   2.5%   4   5.1%   17  21.5%   55  69.6%
question 18     26  22.4%   35  30.2%  19  16.4%   22  19.0%   14  12.1%
question 19                  1   1.1%  10  11.1%   18  20.0%   61  67.8%
question 20                  3   3.1%   7   7.3%   36  37.5%   50  52.1%
-------------------------------------------------------------------------```

### Producing Means and Standard Deviations

The DESCRIPTIVES procedure in SPSS produces means and standard deviations for variables. It also prints the minimum and maximum values. It is reasonable to print means for Likert scale questions, since the coded number gives us a feel for the direction of the average answer. The standard deviation is also important, as it gives us an indication of the typical distance of the answers from the mean. A low standard deviation means that most observations cluster around the mean. A high standard deviation means that there was a lot of variation in the answers. A standard deviation of 0 is obtained when all responses to a question are the same. The minimum and maximum values tell us the range of answers given by our survey population. The following code produces descriptive statistics for questions 1 to 20.

```descriptives
variables = q1 to q20

Valid
Variable      Mean    Std Dev   Minimum  Maximum      N  Label

Q1            4.65        .66        2         5     80  question 1
Q2            4.59        .66        2         5     85  question 2
Q3            4.36        .75        2         5     90  question 3
Q4            4.72        .51        3         5     74  question 4
Q5            3.89       1.11        1         5     92  question 5
Q6            3.26       1.45        1         5    101  question 6
Q7            3.92       1.14        1         5     88  question 7
Q8            4.26        .90        1         5     94  question 8
Q9            4.32        .88        2         5     90  question 9
Q10           4.45        .86        2         5     75  question 10
Q11           3.86       1.45        1         5     95  question 11
Q12           3.71       1.26        1         5    110  question 12
Q13           4.62        .71        2         5     90  question 13
Q14           4.37        .85        2         5     97  question 14
Q15           3.08       1.39        1         5    109  question 15
Q16           4.45        .89        1         5     91  question 16
Q17           4.56        .81        1         5     79  question 17
Q18           2.68       1.34        1         5    116  question 18
Q19           4.54        .74        2         5     90  question 19
Q20           4.39        .76        2         5     96  question 20```
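As a check on the arithmetic, the mean and standard deviation for question 1 can be recomputed from its frequency counts. The sketch below is in Python, for illustration only; the function name `mean_and_sd` is made up, and the n - 1 divisor is the usual sample formula, which matches the printed value here.

```python
import math

# Q1 frequency counts from the frequencies output earlier: answer -> count.
q1_counts = {2: 1, 3: 5, 4: 15, 5: 59}

def mean_and_sd(counts):
    """Mean and sample standard deviation (n - 1 divisor) from a frequency table."""
    n = sum(counts.values())
    mean = sum(value * k for value, k in counts.items()) / n
    ss = sum(k * (value - mean) ** 2 for value, k in counts.items())
    return n, mean, math.sqrt(ss / (n - 1))

n, mean, sd = mean_and_sd(q1_counts)
print(n, round(mean, 2), round(sd, 2))   # 80 4.65 0.66

# When every response is the same, the standard deviation is 0.
print(mean_and_sd({4: 10})[2])           # 0.0
```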

### Using T-Tests to compare groups

T-tests compare the means of two groups. You might wish to know if the males answered a question differently from the females. First, form a new variable that codes sex as 1=male, 2=female, then do a t-test comparing the two groups. This is necessary because the "sex" variable is alphabetic, and the t-test procedure requires that the grouping variable be numeric. The code is this:

```numeric sexnum (f1)
recode sex ('M'=1) ('F'=2) into sexnum
t-test
groups= sexnum(1,2)
/variables = q1 to q20```

The numeric statement declares a new variable, "sexnum". "f1" specifies that the variable should be numeric, 1 digit long. The recode statement uses the alphabetic sex variable read off the OMR sheet to form a new variable that is numeric. The t-test procedure in SPSS then compares groups 1 and 2 of the variable sexnum (male and female). The variables statement specifies that a t-test is to be performed on each of the variables q1 to q20. This will produce 20 t-tests.

```
t-tests for independent samples of SEXNUM

GROUP 1 - SEXNUM  EQ  1
GROUP 2 - SEXNUM  EQ  2

Variable         Number             Standard    Standard
                of Cases     Mean   Deviation     Error
------------------------------------------------------------
Q1        question 1
     GROUP 1      40       4.4750      .716       .113
     GROUP 2      40       4.8250      .549       .087

                | Pooled Variance Estimate | Separate Variance Estimate
                |                          |
    F   2-tail  |    t   Degrees of 2-tail |    t    Degrees of  2-tail
  Value  Prob.  |  Value   Freedom   Prob. |  Value    Freedom    Prob.
----------------+--------------------------+------------------------------
   1.70   .103  |  -2.45     78      .016  |  -2.45     73.12      .017
```

*****

```
Variable         Number             Standard    Standard
                of Cases     Mean   Deviation     Error
------------------------------------------------------------
Q2        question 2
     GROUP 1      37       4.3784      .828       .136
     GROUP 2      48       4.7500      .438       .063

                | Pooled Variance Estimate | Separate Variance Estimate
                |                          |
    F   2-tail  |    t   Degrees of 2-tail |    t    Degrees of  2-tail
  Value  Prob.  |  Value   Freedom   Prob. |  Value    Freedom    Prob.
----------------+--------------------------+------------------------------
   3.58   .000  |  -2.67     83      .009  |  -2.48     51.33      .017
```

*****

```
Variable         Number             Standard    Standard
                of Cases     Mean   Deviation     Error
------------------------------------------------------------
Q3        question 3
     GROUP 1      40       4.4750      .751       .119
     GROUP 2      50       4.2600      .751       .106

                | Pooled Variance Estimate | Separate Variance Estimate
                |                          |
    F   2-tail  |    t   Degrees of 2-tail |    t    Degrees of  2-tail
  Value  Prob.  |  Value   Freedom   Prob. |  Value    Freedom    Prob.
----------------+--------------------------+------------------------------
   1.00  1.000  |   1.35     88      .180  |   1.35     83.72      .181
```
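The numbers in the question 1 output can be reproduced by hand, which shows what the pooled and separate variance estimates actually compute. The sketch below is in Python, for illustration only. The group counts are taken from the SEXNUM by Q1 crosstabulation in the chi-square section of this paper; the helper name `summarize` is made up, and the separate variance degrees of freedom use the Welch-Satterthwaite approximation, which I am assuming is what SPSS applies since it reproduces the printed 73.12. The 2-tail probabilities are not computed here, since they require the t distribution.

```python
import math

# Q1 answer counts for each group (answer -> count), from the crosstabulation.
males = {3: 5, 4: 11, 5: 24}      # sexnum = 1
females = {2: 1, 4: 4, 5: 35}     # sexnum = 2

def summarize(counts):
    """Return n, mean, and sample variance from a frequency table."""
    n = sum(counts.values())
    mean = sum(v * k for v, k in counts.items()) / n
    var = sum(k * (v - mean) ** 2 for v, k in counts.items()) / (n - 1)
    return n, mean, var

n1, m1, v1 = summarize(males)
n2, m2, v2 = summarize(females)

# Pooled variance estimate: one common variance, n1 + n2 - 2 degrees of freedom.
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Separate variance estimate: each group keeps its own variance and the
# degrees of freedom are approximated.
se1, se2 = v1 / n1, v2 / n2
t_separate = (m1 - m2) / math.sqrt(se1 + se2)
df_separate = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))

# F value: the larger group variance divided by the smaller.
f_value = max(v1, v2) / min(v1, v2)

print(round(t_pooled, 2), df_pooled)                  # -2.45 78
print(round(t_separate, 2), round(df_separate, 2))    # -2.45 73.12
print(round(f_value, 2))                              # 1.7
```

The two t values agree here because the groups are the same size; they differ for question 2, where the group sizes and variances are unequal.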

### About the assumptions of the T-Test

Mendenhall, in the book "Statistics for Management and Economics", writes:

"It is important to note that the Student's t and the corresponding tabulated critical values are based on the assumption that the sampled population possesses a normal probability distribution."

He goes on to write:

"Fortunately, this point is of little consequence, as it can be shown that the distribution of the t statistic possesses nearly the same shape as the theoretical t distribution for populations that are nonnormal but possess a mound shaped probability distribution."

These are important points when one considers doing a t-test on a Likert scale question. A Likert scale question with only 5 possible answers cannot possess a normal probability distribution, because the range of answers is discrete, not continuous (presumably one is not allowed to answer 1.3 or 2.55). So the researcher should at least make sure the distribution is mound shaped. If you are planning t-tests, check the frequency results of your questions first.

In reality, many researchers run t-tests on every question. You should be skeptical of your results if the underlying distribution is not mound shaped. Use the results of t-tests not for hard scientific proof, but rather for indications of trends in the data.

For example, when we run a t-test on question 1 by sex, the 2-tail probability is .017 under the separate variance estimate (.016 under the pooled variance estimate), which would be significant at the .05 level. But then look back at the frequency distribution for question 1. No one answered 1, 1 person answered 2, 5 people answered 3, 15 people answered 4, and 59 people answered 5. The responses are clearly not mound shaped, but heavily skewed towards response 5. Thus it would be incorrect to conclude from this test that males differ from females in the way they answered this question, since we have severely violated the assumptions of the test.

### An Alternative to T-Tests: the Chi Square Statistic

Likert scale questions have a range of answers that is discrete, not continuous. The chi-square statistic is designed for use in a multinomial experiment, where the outcomes are counts that fall into categories. The chi-square statistic determines whether observed counts in cells are different from expected counts.

In the t-test above we wanted to determine if the responses to questions differed by sex. Rather than doing a t-test, we can run a chi-square statistic. Since the chi-square statistic assumes a discrete distribution rather than a normal distribution, the results will be statistically valid and can be used as scientific proof. There is an assumption that all expected counts be greater than or equal to 5. We can print the expected counts in SPSS to check this.

To run a chi-square statistic on every question by sex, use the Crosstabs procedure, like this:

crosstabs tables = sexnum by q1 to q20 /cells = count row column expected resid /statistics = chisq

```      SEXNUM  by  Q1  question 1

Q1                            Page 1 of 1
Count  |
Exp Val |
Row Pct |2) disag3) not s 4) agree 5) stron
Col Pct |ree     ure               gly agre   Row
Residual|     2 |     3  |     4  |     5  | Total
SEXNUM     -------+-------+--------+--------+--------+
1  |     0 |     5  |    11  |    24  |    40
|    .5 |   2.5  |   7.5  |  29.5  | 50.0%
|   .0% | 12.5%  | 27.5%  | 60.0%  |
|   .0% |100.0%  | 73.3%  | 40.7%  |
|   -.5 |   2.5  |   3.5  |  -5.5  |
+-------+--------+--------+--------+
2  |     1 |     0  |     4  |    35  |    40
|    .5 |   2.5  |   7.5  |  29.5  | 50.0%
|  2.5% |   .0%  | 10.0%  | 87.5%  |
|100.0% |   .0%  | 26.7%  | 59.3%  |
|    .5 |  -2.5  |  -3.5  |   5.5  |
+-------+--------+--------+--------+
Column       1       5       15       59       80
Total    1.3%    6.3%    18.8%    73.8%   100.0%

Chi-Square              Value           DF               Significance
----------------        -----------       ----              ------------

Pearson                  11.31751          3                  .01013
Likelihood Ratio         13.77762          3                  .00322
Mantel-Haenszel test for  5.65936          1                  .01736
linear association

Minimum Expected Frequency-     .500
Cells with Expected Frequency < 5 -     4 OF     8 ( 50.0%)

Number of Missing Observations: 40```

*****

```      SEXNUM  by  Q2  question 2

Q2                            Page 1 of 1
Count  |
Exp Val |
Row Pct |2) disag 3) not s 4) agree 5) stron
Col Pct |ree     ure               gly agre   Row
Residual|     2 |     3  |     4  |     5  | Total
SEXNUM     ------+--------+--------+--------+--------+
1  |     2 |     2  |    13  |    20  |    37
|    .9 |    .9  |  10.9  |  24.4  | 43.5%
|  5.4% |  5.4%  | 35.1%  | 54.1%  |
|100.0% |100.0%  | 52.0%  | 35.7%  |
|   1.1 |   1.1  |   2.1  |  -4.4  |
+-------+--------+--------+--------+
2  |     0 |     0  |    12  |    36  |    48
|   1.1 |   1.1  |  14.1  |  31.6  | 56.5%
|   .0% |   .0%  | 25.0%  | 75.0%  |
|   .0% |   .0%  | 48.0%  | 64.3%  |
|  -1.1 |  -1.1  |  -2.1  |   4.4  |
+-------+--------+--------+--------+
Column       2       2       25       56       85
Total    2.4%    2.4%    29.4%    65.9%   100.0%

Chi-Square          Value          DF               Significance
--------------------   -----------       ----              ------------

Pearson                  7.31033          3                  .06264
Likelihood Ratio         8.79340          3                  .03217
Mantel-Haenszel test for 6.62466          1                  .01006
linear association

Minimum Expected Frequency -    .871
Cells with Expected Frequency < 5 -     4 OF     8 ( 50.0%)

Number of Missing Observations: 35```

*****

```      SEXNUM  by  Q3  question 3

Q3                            Page 1 of 1
Count  |
Exp Val |
Row Pct |2) disag 3) not s 4) agree 5) stron
Col Pct |ree     ure               gly agre   Row
Residual|     2  |    3  |     4  |     5  | Total
SEXNUM     ------+--------+-------+--------+--------+
1  |     1  |    3  |    12  |    24  |    40
|    .4  |  5.3  |  13.8  |  20.4  | 44.4%
|  2.5%  | 7.5%  | 30.0%  | 60.0%  |
|100.0%  |25.0%  | 38.7%  | 52.2%  |
|    .6  | -2.3  |  -1.8  |   3.6  |
+--------+-------+--------+--------+
2  |     0  |    9  |    19  |    22  |    50
|    .6  |  6.7  |  17.2  |  25.6  | 55.6%
|   .0%  |18.0%  | 38.0%  | 44.0%  |
|   .0%  |75.0%  | 61.3%  | 47.8%  |
|   -.6  |  2.3  |   1.8  |  -3.6  |
+-------+--------+--------+--------+
Column       1      12       31       46       90
Total    1.1%   13.3%    34.4%    51.1%   100.0%

Chi-Square            Value          DF               Significance
--------------------     -----------       ----              ------------

Pearson                    4.61345          3                  .20239
Likelihood Ratio           5.09372          3                  .16506
Mantel-Haenszel test for   1.80598          1                  .17899
linear association

Minimum Expected Frequency -    .444
Cells with Expected Frequency < 5 -     2 OF     8 ( 25.0%)
Number of Missing Observations: 30```

*****

```     SEXNUM  by  Q6  question 6

Q6                                     Page 1 of 1
Count  |
Exp Val |
Row Pct |1) stron 2) disag 3) not s 4) agree 5) stron
Col Pct |gly disaree      ure               gly agre   Row
Residual|     1  |    2  |     3  |     4  |     5  | Total
SEXNUM     -------+--------+-------+--------+--------+--------+
1  |    12  |   15  |     9  |     9  |     6  |    51
|   7.6  | 11.1  |   7.6  |  10.1  |  14.6  | 50.5%
| 23.5%  |29.4%  | 17.6%  | 17.6%  | 11.8%  |
| 80.0%  |68.2%  | 60.0%  | 45.0%  | 20.7%  |
|   4.4  |  3.9  |   1.4  |  -1.1  |  -8.6  |
+--------+-------+--------+--------+--------+
2  |     3  |    7  |     6  |    11  |    23  |    50
|   7.4  | 10.9  |   7.4  |   9.9  |  14.4  | 49.5%
|  6.0%  |14.0%  | 12.0%  | 22.0%  | 46.0%  |
| 20.0%  |31.8%  | 40.0%  | 55.0%  | 79.3%  |
|  -4.4  | -3.9  |  -1.4  |   1.1  |   8.6  |
+--------+-------+--------+--------+--------+
Column      15      22       15       20       29      101
Total   14.9%   21.8%    14.9%    19.8%    28.7%   100.0%

Chi-Square           Value           DF               Significance
--------------------   -----------       ----              ------------

Pearson                   19.06658          4                  .00076
Likelihood Ratio          20.18692          4                  .00046
Mantel-Haenszel test for  18.16309          1                  .00002
linear association

Minimum Expected Frequency -   7.426
Number of Missing Observations: 19```

In each cell SPSS has printed the count, the expected count, the row percent, the column percent, and the residual. The expected count is computed as the row total times the column total divided by the overall total. The residual is computed as the count minus the expected count. These residuals form the basis of the chi-square statistic. In order to derive the chi-square statistic, the residual for each cell is squared, then divided by the expected count. Then this value is added over all cells to become the chi-square statistic. The degrees of freedom is the number of rows minus 1 times the number of columns minus 1. The Pearson chi-square statistic is the most commonly used.

SPSS will print the number of cells with an expected frequency less than 5. A large proportion of cells like this will mean that the assumptions of the chi-square test are being violated, and one should not make inferences from the statistics. In the examples above, questions 1 and 2 have 50% of cells with expected frequency less than 5, and therefore their chi-square statistics are not statistically valid. In question 6, none of the expected frequencies are less than 5, so these results are statistically valid.
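The computation described above can be traced step by step for question 1. The sketch below is in Python, for illustration only; the function name `pearson_chi_square` is made up. It rebuilds the expected counts, the Pearson chi-square statistic, the degrees of freedom, and the number of cells with an expected count below 5 from the observed counts in the SEXNUM by Q1 table. The significance level is not computed, since it requires the chi-square distribution.

```python
# Observed counts from the SEXNUM by Q1 table: rows are sexnum 1 and 2,
# columns are answers 2, 3, 4, and 5.
observed = [[0, 5, 11, 24],
            [1, 0, 4, 35]]

def pearson_chi_square(table):
    """Return the chi-square statistic, degrees of freedom, and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # expected count = row total * column total / overall total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    # squared residual divided by the expected count, summed over all cells
    chi_sq = sum((obs - exp) ** 2 / exp
                 for obs_row, exp_row in zip(table, expected)
                 for obs, exp in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df, expected

chi_sq, df, expected = pearson_chi_square(observed)
small = sum(1 for row in expected for e in row if e < 5)

print(round(chi_sq, 5), df)   # 11.31751 3
print(expected[0])            # [0.5, 2.5, 7.5, 29.5]
print(small, "of", 8)         # 4 of 8
```

Running the same function on the question 6 counts gives a minimum expected count of about 7.4, so none of its cells fall below 5.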

### T-Tests on pairs of questions

Some surveys using Likert scale questions are set up with pairs of responses, such as this:

```                                           Do Now     Will do in
future
description                       (1) 1 2 3 4 5  (2) 1 2 3 4 5
description                       (3) 1 2 3 4 5  (4) 1 2 3 4 5
description                       (5) 1 2 3 4 5  (6) 1 2 3 4 5
description                       (7) 1 2 3 4 5  (8) 1 2 3 4 5
description                       (9) 1 2 3 4 5 (10) 1 2 3 4 5
description                      (11) 1 2 3 4 5 (12) 1 2 3 4 5
description                      (13) 1 2 3 4 5 (14) 1 2 3 4 5
description                      (15) 1 2 3 4 5 (16) 1 2 3 4 5
description                      (17) 1 2 3 4 5 (18) 1 2 3 4 5
description                      (19) 1 2 3 4 5 (20) 1 2 3 4 5```

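To compare the first column to the second column, you can use the paired t-test (though the same problems with normality exist as discussed above). In the code below, score1 and score2 are the means of the "Do Now" and "Will do in future" questions, so the last pair compares the two columns overall. The paired t-test coding would be this: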

``` compute score1=mean(q1,q3,q5,q7,q9,q11,q13,q15,q17,q19)
compute score2=mean(q2,q4,q6,q8,q10,q12,q14,q16,q18,q20)
t-test
pairs= q1 q2
/q3 q4
/q5 q6
/q7 q8
/q9 q10
/q11 q12
/q13 q14
/q15 q16
/q17 q18
/q19 q20
/score1 score2```
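What makes the test "paired" is that it works on each person's difference between the two answers, not on the two columns separately. A minimal sketch of that arithmetic follows, in Python for illustration only. The function name `paired_t` and the six respondents are hypothetical, and dropping a pair when either answer is missing is an assumption of this sketch.

```python
import math

def paired_t(first, second):
    """Paired t statistic: a one-sample t-test on the within-person differences."""
    diffs = [a - b for a, b in zip(first, second)
             if a is not None and b is not None]   # skip pairs with a missing answer
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    return mean_diff / math.sqrt(var / n), n - 1

# Hypothetical answers from six respondents; None marks a missing answer.
do_now = [3, 4, 2, 5, 3, None]
will_do = [4, 5, 3, 5, 4, 2]

t, df = paired_t(do_now, will_do)
print(round(t, 2), df)   # -4.0 4
```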

### Computing Subscales and what to do about reverse wording

In surveys that contain a series of likert scale questions we are often interested in combining the questions to arrive at a score or series of scores for a person. Usually these scores are computed as the mean of all or selected questions. An overall scale would be the mean of all the questions. A subscale would be the mean of a series of questions related to one topic.

A researcher will often design a survey in which some questions have reverse wording. This is usually done to force the person taking the survey to read the questions carefully. Prior to computing a scale that is the mean of a series of questions, assign points to each question so that the reverse worded questions receive the opposite number of points from the positively worded questions.

Suppose in our questions 1 through 20 the following questions have reverse wording: 3, 4, 7, 10, 12, 15, 16, and 20. Since our scale is 1 to 5, with 5 being strongly agree, we will want to assign points to the reverse worded questions like this: if the answer is 1, give 5 points; if the answer is 2, give 4 points; if the answer is 3, give 3 points; if the answer is 4, give 2 points; and if the answer is 5, give 1 point. In other words, the points equal 6 minus the answer.

The questions that are not reverse worded would get the same number of points as the answer.

To accomplish the coding, create 20 new variables to hold the points. Let's call them points1 to points20. Then the code would be the following:

``` numeric points1 to points20(f1)
recode q3 q4 q7 q10 q12 q15 q16 q20
(1=5) (2=4) (3=3) (4=2) (5=1) (else=sysmis) into
points3 points4 points7 points10 points12 points15 points16 points20
recode q1 q2 q5 q6 q8 q9 q11 q13 q14 q17 q18 q19
(1,2,3,4,5=copy) (else=sysmis) into
points1 points2 points5 points6 points8 points9 points11
points13 points14 points17 points18 points19```

Some people just recode the original questions. I prefer to create new variables, since the code is clearer and there is less chance of making a mistake. I have found it is useful to do recodes on the questions that are positively worded as well as those that are negatively worded. Although it may seem that you are just copying the values, this coding forces any illegal values (such as a "6") to be coded as missing for the points.
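The same point assignment can be written compactly, since on a 1 to 5 scale the reversed points are simply 6 minus the answer. The sketch below is in Python, for illustration only; the function name `assign_points` and the sample answers are made up. Like the recodes above, it turns any illegal value into a missing value.

```python
REVERSED = {3, 4, 7, 10, 12, 15, 16, 20}   # reverse worded question numbers

def assign_points(question, answer):
    """Return the points for one answer, or None (missing) for an illegal value."""
    if answer not in (1, 2, 3, 4, 5):
        return None
    return 6 - answer if question in REVERSED else answer

# Hypothetical answers: question number -> answer; the 6 is an illegal value.
answers = {1: 5, 3: 5, 4: 2, 7: 6}
points = {q: assign_points(q, a) for q, a in answers.items()}
print(points)   # {1: 5, 3: 1, 4: 4, 7: None}
```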

Once you have assigned points to each question, you can compute your scales. Suppose that you want an overall scale (questions 1 to 20), and also 3 subscales, where the first subscale is the mean of questions 1, 3, 4, 5, 9, 11, and 12, the second subscale is the mean of questions 2, 6, 7, 8, 13, 15, and 17, and the third subscale is the mean of questions 10, 14, 16, 18, 19, and 20. Note that it is important to compute means for subscales, not sums, since subscales with more items are likely to have higher sums than subscales with fewer items.

To compute the subscales, the code would be this:

``` numeric scale1 scale2 scale3 overall (f4.2)
compute scale1= mean(points1,points3,points4,points5,points9,points11,points12)
compute scale2=mean(points2,points6,points7,points8,points13,points15,points17)
compute scale3=mean(points10,points14,points16,points18,points19,points20)
compute overall=mean(points1 to points20)```
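To show what these scale scores look like for one person, here is a minimal sketch in Python, for illustration only. The names `SUBSCALES`, `mean_of_valid`, and `score_person` and the sample respondent are made up. The sketch averages whichever points are present and skips the missing ones, which is how I understand the SPSS mean function to behave; treat that as an assumption and check it against your own output.

```python
# Question numbers in each subscale, as listed in the text.
SUBSCALES = {
    "scale1": [1, 3, 4, 5, 9, 11, 12],
    "scale2": [2, 6, 7, 8, 13, 15, 17],
    "scale3": [10, 14, 16, 18, 19, 20],
}

def mean_of_valid(values):
    """Mean of the non-missing values, or None when every value is missing."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None

def score_person(points):
    """points maps question number (1 to 20) to points, with None for missing."""
    scores = {name: mean_of_valid([points.get(q) for q in items])
              for name, items in SUBSCALES.items()}
    scores["overall"] = mean_of_valid([points.get(q) for q in range(1, 21)])
    return scores

# Hypothetical respondent: 4 points everywhere, question 10 blank, 2 on question 20.
person = {q: 4 for q in range(1, 21)}
person[10] = None
person[20] = 2

scores = score_person(person)
print(scores["scale1"], scores["scale2"], scores["scale3"])   # 4.0 4.0 3.6
```

Because each score is a mean, scale3 stays on the same 1 to 5 range as the other two subscales even though it has one fewer question.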

### Statistics on subscales

Means and standard deviations:

To print descriptive statistics for your scales, use the "descriptives" procedure in SPSS, like this:

```  descriptives
variables= scale1 scale2 scale3 overall

Valid
Variable      Mean    Std Dev  Minimum   Maximum      N  Label

SCALE1        3.14        .67     1.00      5.00    120
SCALE2        3.71        .49     2.33      5.00    120
SCALE3        3.03        .68     1.00      5.00    120
OVERALL       3.24        .35     2.50      4.50    120```

T-Tests:

To see if the scales are different among different groups, such as males and females, use the t-test procedure, as was done on the individual variables:

```t-test
groups=sexnum
/variables = scale1 scale2 scale3 overall```

The normality assumption is not as much of an issue when doing t-tests on subscales. Since subscales are derived as a mean of a series of questions, they are likely to have more of a range of values than one question would have. However, frequency distributions of the responses should be analyzed to see if the distribution is mound shaped.

```
t-tests for independent samples of SEXNUM

GROUP 1 - SEXNUM  EQ  1
GROUP 2 - SEXNUM  EQ  2

Variable         Number             Standard    Standard
                of Cases     Mean   Deviation     Error
------------------------------------------------------------
SCALE1
     GROUP 1      52       3.0505      .498       .069
     GROUP 2      68       3.2167      .768       .093

                | Pooled Variance Estimate | Separate Variance Estimate
                |                          |
    F   2-tail  |    t   Degrees of 2-tail |    t    Degrees of  2-tail
  Value  Prob.  |  Value   Freedom   Prob. |  Value    Freedom    Prob.
----------------+--------------------------+------------------------------
   2.37   .002  |  -1.36    118      .177  |  -1.43    115.22      .155
```

*****

```
Variable         Number             Standard    Standard
                of Cases     Mean   Deviation     Error
------------------------------------------------------------
SCALE2
     GROUP 1      52       3.6407      .526       .073
     GROUP 2      68       3.7632      .453       .055

                | Pooled Variance Estimate | Separate Variance Estimate
                |                          |
    F   2-tail  |    t   Degrees of 2-tail |    t    Degrees of  2-tail
  Value  Prob.  |  Value   Freedom   Prob. |  Value    Freedom    Prob.
----------------+--------------------------+------------------------------
   1.35   .246  |  -1.37    118      .174  |  -1.34    100.53      .183
```

*****

```
Variable         Number             Standard    Standard
                of Cases     Mean   Deviation     Error
------------------------------------------------------------
SCALE3
     GROUP 1      52       2.7292      .549       .076
     GROUP 2      68       3.2564      .682       .083

                | Pooled Variance Estimate | Separate Variance Estimate
                |                          |
    F   2-tail  |    t   Degrees of 2-tail |    t    Degrees of  2-tail
  Value  Prob.  |  Value   Freedom   Prob. |  Value    Freedom    Prob.
----------------+--------------------------+------------------------------
   1.54   .108  |  -4.56    118      .000  |  -4.69    117.66      .000
```

*****

```
t-tests for independent samples of SEXNUM

GROUP 1 - SEXNUM  EQ  1
GROUP 2 - SEXNUM  EQ  2

Variable         Number             Standard    Standard
                of Cases     Mean   Deviation     Error
------------------------------------------------------------
OVERALL
     GROUP 1      52       3.1175      .291       .040
     GROUP 2      68       3.3271      .359       .043

                | Pooled Variance Estimate | Separate Variance Estimate
                |                          |
    F   2-tail  |    t   Degrees of 2-tail |    t    Degrees of  2-tail
  Value  Prob.  |  Value   Freedom   Prob. |  Value    Freedom    Prob.
----------------+--------------------------+------------------------------
   1.52   .123  |  -3.44    118      .001  |  -3.53    117.54      .001
```

Correlations:

To see how strongly the subscales are correlated with each other and with the overall scale, produce a correlation matrix.

```correlations
variables= scale1 scale2 scale3 overall```

The correlation procedure will produce a correlation matrix. For each combination of variables, the Pearson Product-Moment Coefficient of Correlation is computed. This correlation coefficient, commonly called r, measures the linear correlation between two variables. Thus r=0 implies no linear correlation between two variables. A positive value for r would imply that if you plotted the two variables, a line could be drawn through the points that sloped upward and to the right; a negative value indicates that it slopes downward to the right.

The significance of the correlation is also given. The significance is determined by using regression with the model being y= b0 + b1 x, where b0 is the intercept of the line (beta zero), and b1 is the slope of the line (beta one). The null hypothesis of this test is that b1 is equal to zero, that there is no linear relationship between the two variables. The alternative hypothesis is that b1 is not equal to zero, implying that there is a significant slope to the line, either negative or positive.

Although it is not printed by the correlations procedure, the r squared coefficient can be obtained from the correlation coefficient by simply squaring r. The r squared coefficient measures the proportion of the variance of a variable y that is explained by another variable x. A correlation coefficient of r=.1 implies an r squared of .01, or 1% of the variance (which is very small). A correlation coefficient of r=.5 implies an r squared of .25, or 25% of the variance (which still leaves 75% of the variance not explained).
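For readers who want to see the arithmetic, here is a minimal sketch of the correlation coefficient and of squaring it, in Python for illustration only. The function name `pearson_r` and the five pairs of scale scores are hypothetical; the two coefficients squared at the end are taken from the correlation matrix printed at the end of this section.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scale scores for five respondents.
scale_a = [2.0, 3.0, 3.5, 4.0, 4.5]
scale_b = [2.5, 2.0, 3.5, 3.0, 4.0]
r = pearson_r(scale_a, scale_b)
print(round(r, 3), round(r * r, 3))

# r squared for two coefficients from the printed correlation matrix.
print(round(.6619 ** 2, 3))   # scale1 with overall: 0.438
print(round(.2088 ** 2, 3))   # scale1 with scale3:  0.044
```

So even the significant correlation between scale1 and scale3 explains only about 4% of the variance.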

There is an assumption in correlation that both variables have a normal distribution. You should be careful not to correlate categorical variables with continuous or other categorical variables. An example would be our "sexnum" variable above, where 1=male and 2=female. You certainly would not wish to make inferences based on the coding of categorical variables; notice that we could have coded 1's to be females and 2's to be males (or 50 and 1000, for that matter).

```
                 - -  Correlation Coefficients  - -

             SCALE1     SCALE2     SCALE3     OVERALL

SCALE1       1.0000     -.0554      .2088*     .6619**
SCALE2       -.0554     1.0000      .0864      .4806**
SCALE3        .2088*     .0864     1.0000      .6247**
OVERALL       .6619**    .4806**    .6247**   1.0000

* - Signif. LE .05      ** - Signif. LE .01      (2-tailed)

" . " is printed if a coefficient cannot be computed
```
