Introduction to SPSS

Account information

Some Useful DCL Command Strings

Running an SPSS Job on the DEC Alpha

Running an SPSS Job from a Subdirectory

Sample Survey

Coding Data

Entering Data using the EVE Text Editor

Creating an SPSS Program

Indentation and Syntax in SPSS

Frequency Tables

Labels for Values

Labels for Variables

Crosstabulation

Tables

Walkthrough


Preliminaries

In order to use SPSS, you must first obtain a DEC Alpha account. Contact the ITS Computer Consulting Center in ITTC 36 (phone 319-273-5555) for an application.

Some useful DCL command strings

DCL stands for Digital Command Language. The DCL prompt is the dollar sign. From it you can enter DCL command strings and run SPSS. DCL command strings are composed of commands, parameters, and qualifiers. A command is an instruction to the operating system. A parameter defines what the command is to operate on. A qualifier defines how that action will occur.

Here are some useful DCL command strings:

DIRECTORY -- list the files in your directory
PRINT <filespec> -- print a file to the printer
TYPE <filespec> -- display a file on the screen
LOOK <filespec> and TYPE/PAGE <filespec> -- display a file on the screen, a page at a time
DELETE <filespec> -- delete a file
PURGE -- delete all but the most recent version of your files
HELP -- invoke the help screens
COPY <filespec> <filespec> -- copy a file
ED <filespec> -- enter the EVE text editor
LOGOUT -- log off the system

All DCL commands may be abbreviated. All that is needed is enough letters to make the command unique. Four characters are always sufficient. Complete file specifications (<filespec>) have the following form:

 

node::device:[directory]filename.type;version

The node is for networked computers and is "ACAD" for the DEC Alpha at UNI. The device is a logical name for a disk pack. For faculty, this is FAC. For students, this is STU1 or STU2. The directory is your username, or a subdirectory you have created. Note that the brackets are required.

Both file names and file types can be up to 39 characters long. They may consist of letters, numbers, the underscore, the hyphen, and the dollar sign. File names may start with either a letter or a number, but not the underscore, hyphen, or dollar sign. Version numbers are not required in most instances. Usually, all that is needed is a file name and a file type.
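The file specification rules above can be checked mechanically. The following Python sketch is only an illustration and is not part of DCL: the pattern, the function name parse_filespec, and the choice to make every part except the file name optional are assumptions made for this example.

```python
import re

# Characters allowed in names, per the rules described above.
NAME = r"[A-Za-z0-9][A-Za-z0-9_$-]{0,38}"   # starts with a letter or number, 39 max
TYPE = r"[A-Za-z0-9_$-]{1,39}"               # assumption: no first-character rule

# node::device:[directory]filename.type;version
FILESPEC = re.compile(
    r"^(?:(?P<node>[A-Za-z0-9_$]+)::)?"
    r"(?:(?P<device>[A-Za-z0-9_$]+):)?"
    r"(?:\[(?P<directory>[A-Za-z0-9_$.-]+)\])?"
    rf"(?P<filename>{NAME})"
    rf"(?:\.(?P<type>{TYPE}))?"
    r"(?:;(?P<version>\d+))?$"
)

def parse_filespec(spec):
    """Return a dict of the filespec parts, or None if the text does not fit."""
    match = FILESPEC.match(spec)
    return match.groupdict() if match else None

full = parse_filespec("ACAD::FAC:[HOWARD]RECYCLE.LIS;3")
short = parse_filespec("RECYCLE.SPS")
bad = parse_filespec("_RECYCLE.SPS")   # a file name may not start with an underscore

print(full)
print(short)
print(bad)
```

Parts that are left out of a specification come back as None, which mirrors the point that a file name and file type are usually all that is needed.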

 

Running an SPSS Job on the DEC Alpha

Typically, two files need to be created prior to running SPSS. First, there needs to be a file that contains the data. The results of a survey, for example, need to be taken from the survey forms and entered into a file.

The second file needed is a file that contains SPSS commands. This is a program written in the SPSS language. The program contains instructions as to how to read the data. It also contains procedures for producing results from the data.

These files are usually created using the editor. Once you have created these two files, you can then run SPSS. The command string to do so is:

$ SPSS programfile

For example, if you have named your program file "RECYCLE.SPS", and you want your output to be placed in a file called "RECYCLE.LIS", you would type this:

$ SPSS RECYCLE

Here we have used default file types: ".SPS" is the default file type for SPSS command files and ".LIS" is the default file type for SPSS listings.

A message will appear indicating the status of your job, like this:

Job programfile (queue SPS$BATCH, entry 533) started on SPS$BATCH

Batch queue SPS$BATCH, on TOPGUN::

 

     Jobname Username Entry Status
     ------- -------- ----- -------
     RECYCLE Howard 533 Executing

Job programfile (queue SPS$BATCH, entry 999) started on SPS$BATCH

Your SPSS job has been submitted to the proper batch queue.  You will be notified when it finishes.

Results are being written to: FAC:[HOWARD]RECYCLE.LIS

When the job is done, a message like this will appear:

Job programfile (queue SPS$BATCH, entry 999) completed

Note that the queue is on node "TOPGUN". This is because the SPSS package is actually located on the VAX 4000 workstation, whose logical name is "TOPGUN". The DEC Alpha takes care of submitting the job to the VAX 4000 and receiving the results from it, so this should not be a concern. However, if the VAX 4000 workstation is down, you cannot run SPSS.

To find out where your job is in the queue, type: $ show queue sps$batch/all

To delete your job from the queue, type: $ delete/entry=nnn, where nnn is your entry number in the queue.

The listing file will contain carriage control characters, and is 132 columns wide unless you include an option statement in your SPSS program, described below.

Use the LOOK utility to see the file. The commands "L" and "R" in the LOOK utility will move you left and right.

The listing can be produced in an 80-column width instead by including this line as the first line of your SPSS program:

set width=80

To print your program and results, type this:

$ PRINT programfile.SPS {prints the program }


$ PRINT programfile.LIS {prints the output }

Running an SPSS Job from a Subdirectory

To run an SPSS job from a subdirectory, normally both the SPSS program and the data file accessed by that program must reside in that subdirectory. To run the program from that directory, type:

$ SPSS programfile

For example, to run a program "RECYCLE.SPS" in a Subdirectory, you would type:

$ SPSS RECYCLE

The file created by SPSS, "programfile.LIS", would then be created in that Subdirectory.

Sample Survey

We will use the following survey to illustrate how to write an SPSS program.

Suppose that the city of Cedar Falls is interested in recycling. They have hired you as a consultant. You have developed the following survey to study the problem. The survey will be distributed to residents who have garbage collection.

1. If the city provided a recycling center at the landfill for newspapers and cans, would you use it?

a) Yes, for both newspapers and cans

b) Yes, only for newspapers

c) Yes, only for cans

d) No

2. If the city provided a recycling center at the College Square Mall for newspapers and cans, would you use it?

a) Yes, for both newspapers and cans

b) Yes, only for newspapers

c) Yes, only for cans

d) No

3. Would you be willing to pay an extra $20 per year in garbage fees to have the city pick up newspapers and cans at your home in order to recycle them?

a) Yes

b) No

c) Unsure

4. What is your sex?

a) Male

b) Female

5. What is your family income?

a) $0 to $10,000

b) $10,001 to $20,000

c) $20,001 to $30,000

d) over $30,000

Coding the data

To enter the survey on the computer so that it can be analyzed by SPSS, we first need to code the data. We should decide on codes for each answer. Also we should consider that some questions may not be answered. This must be coded as well. Use numbers rather than letters, as computers are faster at computing with numbers. Here is a coding scheme for the questionnaire above.

Question 1:

0 = No Response

1 = a) Yes, for both newspapers and cans

2 = b) Yes, only for newspapers

3 = c) Yes, only for cans

4 = d) No

Question 2:

0 = No Response

1 = a) Yes, for both newspapers and cans

2 = b) Yes, only for newspapers

3 = c) Yes, only for cans

4 = d) No

Question 3:

0 = No Response

1 = a) Yes

2 = b) No

3 = c) Unsure

Question 4:

0 = No Response

1 = a) Male

2 = b) Female

Question 5:

0 = No Response

1 = a) $0 to $10,000

2 = b) $10,001 to $20,000

3 = c) $20,001 to $30,000

4 = d) over $30,000

We then code each survey in preparation for putting the data into a file. Writing out the coded data on graph paper or computer coding sheets often helps new users to make fewer mistakes. Then we can enter the coded data into a file in the computer.

You should number each survey. Use a sequential number such as 1, 2, 3, etc. rather than some other key like a social security number. Write this number on the original survey. This allows checking between the coded data and the surveys.

Use one line per survey, unless it does not fit. In that case take several lines to code a survey. When your survey is small, you may find it useful to put blank spaces between each data item. For large surveys, it is usually preferable not to have blank spaces since more data items can fit on a line. You must be consistent about which column contains the answer to a question. In an SPSS program with fixed column input, the data is described using column numbers.

For our example, we would enter the survey ID, followed by the five coded answers, like this:

001 1 2 1 1 4
002 1 1 1 2 3
003 4 4 2 1 2
004 2 3 1 1 4
005 1 1 1 1 1
006 1 1 1 1 4
007 2 2 2 2 1

Survey #3 in this coding has responded d) to question 1, d) to question 2, b) to question 3, a) to question 4, and b) to question 5.
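Reading a coded line back into lettered answers is a quick way to check coding against the original survey form. The Python sketch below is an illustration only. The names CHOICES and decode are made up for this example, and it assumes the data items are separated by blanks as in the sample lines.

```python
# Number of lettered choices for each question, taken from the survey above.
CHOICES = {"q1": 4, "q2": 4, "q3": 3, "q4": 2, "q5": 4}

def decode(line):
    """Turn one coded line into (survey id, {question: letter or None})."""
    fields = line.split()
    survey_id = int(fields[0])
    answers = {}
    for question, code in zip(CHOICES, fields[1:]):
        value = int(code)
        if value == 0:
            answers[question] = None               # 0 = No Response
        elif 1 <= value <= CHOICES[question]:
            answers[question] = "abcd"[value - 1]  # 1 = a), 2 = b), ...
        else:
            raise ValueError(f"{question}: code {value} is not in the scheme")
    return survey_id, answers

survey_id, answers = decode("003 4 4 2 1 2")
print(survey_id, answers)
```

A code outside the scheme, such as a 5 for question 4, raises an error instead of being silently accepted.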

Entering the data using the EVE editor

To enter the EVE editor on the DEC Alpha, type ED (an abbreviation of EDIT) followed by the file name. If we wanted to call our data file "RECYCLE.DAT", then we would type:

$ EDIT RECYCLE.DAT

To start entering data, just start typing. The [End of file] marker will move down to the next line. To access the command line, press the <Do> key. At the bottom of your screen you will see this:

Command:

Here you can type in any of the EVE commands. There are also keys that are useful in editing. Here are some of the most useful commands and keys (keys shown in brackets):

access the command line -- <Do>


access the HELP screens -- <Help> or HELP


turn insert on/off -- <CTRL/A> or <F14>


delete a character -- <comma on numeric pad>


delete a word -- <minus on numeric pad>


delete a line -- <PF4 on numeric pad>


save the file and exit -- <Do> EXIT or <CTRL/Z>


quit without saving -- <Do> QUIT

Note that since each survey can be coded using one line on the file, there would be as many lines in this file as there are questionnaires. When you enter the data into a file, do not enter any descriptions or labels in this file. If we did enter descriptions, SPSS would try to read these descriptions as data, and an error would occur. There should be no blank lines in this file.
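Stray labels and blank lines are easy to spot before SPSS ever sees the file. The Python sketch below is a hedged illustration, not an SPSS or DCL feature. The pattern assumes the layout of the sample data (a three-digit id followed by five single-digit answers, each preceded by one blank), and the name find_bad_lines is hypothetical.

```python
import re

# Layout assumed from the sample data: a 3-digit id, then five
# single-digit answers, each preceded by one blank.
LINE_PATTERN = re.compile(r"^\d{3}( \d){5}$")

def find_bad_lines(lines):
    """Return (line number, text) for every line that does not fit the layout."""
    problems = []
    for number, text in enumerate(lines, start=1):
        stripped = text.rstrip("\n")
        if not LINE_PATTERN.match(stripped):
            problems.append((number, stripped))
    return problems

# A small sample with a blank line and a description mixed into the data.
sample = ["001 1 2 1 1 4\n", "\n", "Survey two\n", "003 4 4 2 1 2\n"]
problems = find_bad_lines(sample)
print(problems)
```

For a real file, the list of lines could come from open("recycle.dat").readlines() once the file has been copied to a machine that runs Python.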

When you are done entering the data, make sure that the [End of file] marker is directly below the last line of data, then press the <Do> key and type EXIT. You will then return to the DCL prompt ($).

Creating the SPSS program

An SPSS program contains the commands to interpret the data and produce the statistics desired. An SPSS program to read the data and print it would be this:

 

 /* RECYCLE.SPS SPSS Program written by Mary Howard.
 /* This program does the analysis of a survey on recycling.
 /* Written to demonstrate the use of SPSS.

data list
      file = 'recycle.dat'
     /id 1-3 q1 5 q2 7 q3 9 q4 11 q5 13 

list
     variables = all

Statements following the /* are comments. Note that comments may not start in column 1, so these comments start in column 2. Comments are useful in documenting the name of the file containing the program, the date written, and the purpose of the program. When you have many programs on your disk, these comments are useful in determining where a program is and what it does.

The "data list" statement starts in column 1. It describes which columns of the data file contain the data we wish to analyze. Each column, or range of columns, is given a name, called a variable. In our example, "q1" is a variable name that refers to the answers to question 1 of our survey. We indicate that it is in column 5 of the data file. Variable names must be 8 characters or fewer. They may be composed of letters and numbers, but must start with a letter.
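To make the idea of fixed column input concrete, here is a small Python sketch that picks the same columns out of a data line. It only illustrates the column arithmetic and is not how SPSS itself is implemented. The names COLUMNS and read_record are made up for this example.

```python
# Column positions copied from the "data list" statement (1-based, inclusive).
COLUMNS = {"id": (1, 3), "q1": (5, 5), "q2": (7, 7),
           "q3": (9, 9), "q4": (11, 11), "q5": (13, 13)}

def read_record(line):
    """Pick each variable out of its columns and convert it to an integer."""
    record = {}
    for name, (first, last) in COLUMNS.items():
        # Python slices are 0-based and exclude the end, hence first - 1.
        record[name] = int(line[first - 1:last])
    return record

record = read_record("004 2 3 1 1 4")
print(record)
```

Because the positions are fixed, shifting a data item by even one column would put a blank where a number is expected, which is why consistent columns matter.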

"List" is a procedure that prints the data. The variable names will be displayed at the top of the columns. It would produce output like this:

 

id q1 q2 q3 q4 q5

1 1 2 1 1 4
2 1 1 1 2 3
3 4 4 2 1 2
4 2 3 1 1 4
5 1 1 1 1 1
6 1 1 1 1 4
7 2 2 2 2 1

Indentations in SPSS

In SPSS, a procedure name must start in column one. Subcommands may be placed on the same line or be continued on the lines after that. No line in a procedure after the first may start in column one, since SPSS would interpret it as a new procedure. All subcommands after the first one (the one directly following the procedure name) must start with a slash.

A convention we will use here is to put the procedure name on a line by itself, followed by each subcommand indented on lines following it, like this:

procedurename
     subcommand
    /subcommand
    /subcommand

procedurename
     subcommand
    /subcommand
    /subcommand
    /subcommand

procedurename
     subcommand

Note that sometimes subcommands are required in a procedure or are related to each other; consult the SPSS User's Guide if you are unsure. The convention used here is to put one subcommand on a line (unless it does not fit). Also, we will write out each subcommand in full, even where parts of it could be abbreviated, because abbreviations can make a program hard for new users to read and understand. Use blank lines between procedures; this also makes code easier to read.

Frequency Tables

A frequency table shows the number of persons that responded in a certain way on a question. It also shows percentages. To include frequency tables in our program, enter the text editor with the program and add these lines to the program:

 

frequencies
      variables = q1 q2 q3 q4 q5

On the output listing we would find five frequency tables. Each would look somewhat like this:

 

q1

                                                             Valid     Cum
        Value Label              Value  Frequency  Percent  Percent  Percent
                                     1         4     57.1     57.1     57.1
                                     2         2     28.6     28.6     85.7
                                     4         1     14.3     14.3    100.0
                                          -------  -------  -------
                                 Total         7    100.0    100.0

      Valid cases       7      Missing cases     0

Here we see that the value 1 occurs 4 times, the value 2 occurs 2 times, and the value 4 occurs 1 time for question 1.
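The Percent and Cum Percent columns can be reproduced by hand. The Python sketch below recomputes them for question 1 as a check on the arithmetic. It is an illustration only: the function name frequency_table is hypothetical, and the sketch does not treat any value as missing, so Percent and Valid Percent would be the same.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]            # cumulative frequency so far
        rows.append((value, counts[value],
                     round(100 * counts[value] / total, 1),
                     round(100 * running / total, 1)))
    return rows

q1 = [1, 1, 4, 2, 1, 1, 2]   # answers to question 1 in the sample data
table = frequency_table(q1)
for row in table:
    print(row)
```

The cumulative percent is computed from the running count and not by adding rounded percents, which avoids rounding drift in longer tables.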

Labels for values

Adding labels for values can improve the readability of output. In frequency tables in SPSS, value labels provide a description next to the value on the frequency table. To add labels to the SPSS program, insert this code just below the "data list" portion of the program:

 

value labels 
       q1    0  'No Response'  
             1  'a) Yes, for both newspapers and cans'
             2  'b) Yes, only for newspapers'
             3  'c) Yes, only for cans'
             4  'd) No'
       /q2   0  'No Response'
             1  'a) Yes, for both newspapers and cans'
             2  'b) Yes, only for newspapers'
             3  'c) Yes, only for cans'
             4  'd) No'
       /q3   0  'No Response'
             1  'a) Yes'
             2  'b) No'
             3  'c) Unsure'
       /q4   0  'No Response'
             1  'a) Male' 
             2  'b) Female'
       /q5   0  'No Response'
             1  'a) $0 to $10,000'
             2  'b) $10,001 to $20,000'
             3  'c) $20,001 to $30,000'
             4  'd) over $30,000'

Although up to 60 characters are allowed for value labels, some procedures use fewer. The "frequencies" procedure uses only 20. You may wish to modify labels accordingly in your own work.
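A quick way to see which labels will be cut is to apply the 20-character limit yourself. The Python sketch below does this for the question 1 labels. It assumes the cut is a plain truncation to the first 20 characters, and the names used are made up for this example.

```python
FREQUENCIES_LABEL_WIDTH = 20   # width the text above gives for "frequencies"

labels = [
    "No Response",
    "a) Yes, for both newspapers and cans",
    "b) Yes, only for newspapers",
    "c) Yes, only for cans",
    "d) No",
]

def as_printed(label, width=FREQUENCIES_LABEL_WIDTH):
    """Return the part of the label that fits in the given width."""
    return label[:width]

for label in labels:
    flag = "cut" if len(label) > FREQUENCIES_LABEL_WIDTH else "ok "
    print(flag, as_printed(label))
```

Labels flagged as cut are candidates for shorter wording, for example 'a) Both' in place of the full answer text.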

Labels for variables

Variable labels help improve the readability of variable names, since variable names must be no more than 8 characters. Variable labels can be up to 120 characters, though some procedures use fewer. To code variable labels in our SPSS program, we would add these lines just below the code for the "value labels":

 

variable labels
          q1 '1. Center at Landfill'
         /q2 '2. Center at Mall'
         /q3 '3. Willing to pay extra'
         /q4 '4. Sex'
         /q5 '5. Family Income'

Here is what the frequency tables would look like with both value and variable labels added:

q1        1. Center at Landfill
                                                             Valid     Cum
        Value Label              Value  Frequency  Percent  Percent  Percent

      a) Yes, for both new           1         4     57.1     57.1     57.1
      b) Yes, only for new           2         2     28.6     28.6     85.7
      d) No                          4         1     14.3     14.3    100.0
                                          -------  -------  -------
                                 Total         7    100.0    100.0

      Valid cases       7      Missing cases     0
      - - - - - - - - - - - - - -- - - - - - - - - - - - - - - - - - - - -
      q2        2. Center at Mall
                                                             Valid     Cum
        Value Label              Value  Frequency  Percent  Percent  Percent

      a) Yes, for both new           1         3     42.9     42.9     42.9
      b) Yes, only for new           2         2     28.6     28.6     71.4
      c) Yes, only for can           3         1     14.3     14.3     85.7
      d) No                          4         1     14.3     14.3    100.0
                                          -------  -------  -------
                                 Total         7    100.0    100.0

      Valid cases       7      Missing cases     0
      - - - - - - - - - - - - - -- - - - - - - - - - - - - - - - - - - - -
      q3        3. Willing to pay extra
                                                             Valid     Cum
        Value Label              Value  Frequency  Percent  Percent  Percent

      a) Yes                         1         5     71.4     71.4     71.4
      b) No                          2         2     28.6     28.6    100.0
                                          -------  -------  -------
                                 Total         7    100.0    100.0

      Valid cases       7      Missing cases     0
      - - - - - - - - - - - - - -- - - - - - - - - - - - - - - - - - - - -
      q4        4. Sex
                                                             Valid     Cum
        Value Label              Value  Frequency  Percent  Percent  Percent

      a) Male                        1         5     71.4     71.4     71.4
      b) Female                      2         2     28.6     28.6    100.0
                                          -------  -------  -------
                                 Total         7    100.0    100.0

      Valid cases       7      Missing cases     0
      - - - - - - - - - - - - - -- - - - - - - - - - - - - - - - - - - - -
      q5        5. Family Income
                                                             Valid     Cum
        Value Label              Value  Frequency  Percent  Percent  Percent

      a) $0 to $10,000               1         2     28.6     28.6     28.6
      b) $10,001 to $20,00           2         1     14.3     14.3     42.9
      c) $20,001 to $30,00           3         1     14.3     14.3     57.1
      d) over $30,000                4         3     42.9     42.9    100.0
                                          -------  -------  -------
                                 Total         7    100.0    100.0

      Valid cases       7      Missing cases     0
_ _ _ _ _ _ _ _ _ _ _ _ _ __ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Crosstabs

A crosstabulation presents the results of one variable by another variable. Perhaps we want to find out how question 1 was answered by sex. The program statements to do so would be:

 

crosstabs
        table = q1 by q4

Here is what the crosstabs would look like:

 

Q1  1. Center at Landfill  by  q4 4. Sex

                          Q4          Page 1 of 1
                  Count  |
                         |a) Male b) Femal
                         |        e          Row
                         |     1 |     2  | Total
      Q1         --------+-------+-------+
                      1  |     3 |     1  |     4
        a) Yes, for both |       |        |  57.1
                         +-------+-------+
                      2  |     1 |     1  |     2
        b) Yes, only for |       |        |  28.6
                         +-------+-------+
                      4  |     1 |        |     1
        d) No            |       |        |  14.3
                         +-------+-------+
                  Column       5       2        7
                   Total    71.4    28.6    100.0

      Number of Missing Observations: 0
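The counts in a crosstabulation are simply tallies of pairs of answers. The Python sketch below recounts the cells, row totals, and column totals for question 1 by question 4 from the sample data. It is an illustration of the arithmetic only, and the variable names are made up for this example.

```python
from collections import Counter

# (q1, q4) pairs for the seven sample surveys, in id order.
pairs = [(1, 1), (1, 2), (4, 1), (2, 1), (1, 1), (1, 1), (2, 2)]

cells = Counter(pairs)                        # count for each (row, column) cell
row_totals = Counter(q1 for q1, _ in pairs)   # totals for each q1 answer
col_totals = Counter(q4 for _, q4 in pairs)   # totals for each q4 answer
n = len(pairs)

for q1 in sorted(row_totals):
    counts = [cells[(q1, q4)] for q4 in sorted(col_totals)]
    row_percent = round(100 * row_totals[q1] / n, 1)
    print(q1, counts, row_totals[q1], row_percent)
print("column totals", [col_totals[q4] for q4 in sorted(col_totals)])
```

A cell with no surveys in it, such as females answering d) to question 1, counts as zero here and is shown as an empty cell in the SPSS output.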

Tables

The "tables" procedure in SPSS produces high quality tables. Its features include complex crosstabulations, frequency counts, breakdowns, and statistical summaries of continuous as well as categorical data.

"Tables" can produce higher quality crosstabulation results than the "crosstabs" procedure. To produce a crosstabulation of question 1 with question 4 using the "tables" procedure, we would add this code to our program:

 

tables
     format = dbox light margins(1,80) cwidth(15,6)
    /table = q1 by q4
    /statistics = count cpct('pct':q4)
    /ttitle = 'Results of question 1 by question 4 sex'

The "format" subcommand instructs SPSS how to print the table. We request that margins of the table be between columns 1 and 80. "dbox" asks for dissertation boxing, where there are lines under the headings of the table but none in the body of the table. "light" requests light, non-bold output.

The "table" subcommand indicates which variables are in the table. The "by" keyword separates those variables down the page (the stub) from those variables across the page (the banner). Here we request question 1 to be in the stub dimension and question 4 to be in the banner dimension.

The "statistics" subcommand indicates which statistics are to be printed. "count" requests the frequency count, as in "crosstabs". "cpct" requests the count percent. In parentheses we specify a label for the cpct statistic, 'pct'. The ":q4" instructs SPSS to calculate the percents down the columns of question 4. This will result in values that add to 100% in each column, so that we can compare males to females.
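Percents calculated down the columns divide each cell count by its column total, not by the number of surveys. The Python sketch below works this out for question 1 by question 4 using the sample data. It is a check on the arithmetic and not a description of how SPSS is implemented. The name column_percent is hypothetical.

```python
from collections import Counter

# (q1, q4) pairs for the seven sample surveys, in id order.
pairs = [(1, 1), (1, 2), (4, 1), (2, 1), (1, 1), (1, 1), (2, 2)]

cells = Counter(pairs)
col_totals = Counter(q4 for _, q4 in pairs)   # 5 males (1), 2 females (2)

def column_percent(q1, q4):
    """Percent of the q4 column that falls in row q1."""
    return round(100 * cells[(q1, q4)] / col_totals[q4], 1)

for q4 in sorted(col_totals):
    column = {q1: column_percent(q1, q4) for q1 in (1, 2, 4)}
    print(q4, column, sum(column.values()))
```

Three of the five males chose answer a), giving 60 percent, while one of the two females did, giving 50 percent. Dividing by the column total is what makes the two groups comparable despite their different sizes.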

 

The "ttitle" subcommand prints a title at the top of the page.

The resulting table is shown below:

 

                 Results of question 1 by question 4 sex

                 ___________________________________________
                 -------------------------------------------
                                          4. Sex
                                 ---------------------------
                                   a) Male      b) Female
                                 --------------------------
                                 Count  pct   Count   pct
                 -------------------------------------------
                 1. Center at
                    Landfill
                 a) Yes, for
                    both
                    newspapers
                    and cans        3   60.0%     1   50.0%
                 b) Yes, only
                    for
                    newspapers      1   20.0%     1   50.0%
                 d) No              1   20.0%
                 -------------------------------------------

A more complex table would present the results of questions 1, 2, and 3 by question 4 in a single table.

 

tables
       format = dbox light margins(1,80) cwidth(15,6)
      /ftotal = total
      /table = q1+q2+q3 by q4 + total
      /statistics = count cpct('pct':q4)
      /ttitle = 'Results of questions 1, 2, and 3 by sex'

Here we have added a "ftotal" subcommand and changed the "table" subcommand. The "ftotal" subcommand indicates there will be a total variable named "total" following the detail of the table.

In the "table" subcommand, remember from the previous example that the "by" keyword separates the stub from the banner. To the left of "by", we have "q1+q2+q3". The plus is the concatenation symbol. This asks for the results of q1 to be followed by the results of q2, then the results of q3. To the right of "by" we have "q4 + total". This asks for the banner to contain the variable q4, followed by the total variable "total" defined in the "ftotal" subcommand.

Here is what the resulting table would look like:

 

                    Results of questions 1, 2, and 3 by sex

          _________________________________________________________
          ---------------------------------------------------------
                                   4. Sex                TOTAL
                          ----------------------------------------
                             a) Male     b) Female   Count   pct
                          --------------------------
                          Count  pct   Count   pct
          ---------------------------------------------------------
          1. Center at
             Landfill
          a) Yes, for
             both
             newspapers
             and cans         3  60.0%     1   50.0%     4   57.1%
          b) Yes, only
             for
             newspapers       1  20.0%     1   50.0%     2   28.6%
          d) No               1  20.0%                   1   14.3%

          2. Center at
             Mall
          a) Yes, for
             both
             newspapers
             and cans         2  40.0%     1   50.0%     3   42.9%
          b) Yes, only
             for
             newspapers       1  20.0%     1   50.0%     2   28.6%
          c) Yes, only
             for cans         1  20.0%                   1   14.3%
          d) No               1  20.0%                   1   14.3%

          3. Willing to
             pay extra
          a) Yes              4  80.0%     1   50.0%     5   71.4%
          b) No               1  20.0%     1   50.0%     2   28.6%
          ---------------------------------------------------------

Walkthrough

Here is the analysis of the sample data with SPSS, using the techniques discussed above. Sign on to the DEC Alpha, and create a data file. Call it "RECYCLE.DAT". From the DCL prompt, type:

$ EDIT RECYCLE.DAT

Once in the editor, enter the following lines:

001 1 2 1 1 4
002 1 1 1 2 3
003 4 4 2 1 2
004 2 3 1 1 4
005 1 1 1 1 1
006 1 1 1 1 4
007 2 2 2 2 1

Save the file by pressing the <Do> key and typing EXIT.

Now create the SPSS program. Let us call it "RECYCLE.SPS". From the DCL prompt, type:

$ EDIT RECYCLE.SPS

Once in the editor, enter the following lines:

 

 /* RECYCLE.SPS SPSS Program written by Mary Howard.
 /* This program does the analysis of a survey on recycling.
 /* Written to demonstrate the use of SPSS.

data list
      file = 'recycle.dat'
     /id 1-3 q1 5 q2 7 q3 9 q4 11 q5 13

value labels 
      q1 0  'No Response'  
           1  'a) Yes, for both newspapers and cans'
           2  'b) Yes, only for newspapers'
           3  'c) Yes, only for cans'
           4  'd) No'
     /q2 0  'No Response'
           1  'a) Yes, for both newspapers and cans'
           2  'b) Yes, only for newspapers'
           3  'c) Yes, only for cans'
           4  'd) No'
     /q3 0  'No Response'
           1  'a) Yes'
           2  'b) No'
           3  'c) Unsure'
     /q4 0  'No Response'
           1  'a) Male' 
           2  'b) Female'
     /q5 0  'No Response'
           1  'a) $0 to $10,000'
           2  'b) $10,001 to $20,000'
           3  'c) $20,001 to $30,000'
           4  'd) over $30,000'

 

variable labels
      q1 '1. Center at Landfill'
     /q2 '2. Center at Mall'
     /q3 '3. Willing to pay extra'
     /q4 '4. Sex'
     /q5 '5. Family Income'

 

list
     variables = all

frequencies
      variables = q1 q2 q3 q4 q5

crosstabs
      table = q1 by q4

tables
     format = dbox light margins(1,80) cwidth(15,6)
    /table = q1 by q4
    /statistics = count cpct('pct':q4)
    /ttitle = 'Results of question 1 by question 4 sex'

 

tables
     format = dbox light margins(1,80) cwidth(15,6)
    /ftotal = total
    /table = q1+q2+q3 by q4 + total
    /statistics = count cpct('pct':q4)
    /ttitle = 'Results of questions 1, 2, and 3 by sex'

Save the file by pressing the <Do> key and typing EXIT.

Now run the program. From the DCL prompt, type:

$ SPSS RECYCLE

If the program runs successfully, the following should appear:

End of job: 68 command lines 0 errors 0 warnings 2 CPU seconds

Now look at the output.

$ look recycle.lis

You can use <Next Screen>, <Prev Screen>, and the arrow keys to look at the output. To return to the DCL prompt, type "Q".

Then print the output to the printer. Type this:

$ PRINT RECYCLE.LIS

Once it is printed you will get a message on your screen, like this:

Job RECYCLE (queue SYS$PRINT,entry 297) completed

You can then logout and go to the I/O window in 19 Business Building to pick up your output.
