Note that each class has the same width and that each observation in
the data set falls into one and only one class. The frequency
of each class is the number of observations that lie in that class. The
data can now be displayed using a histogram.
Note that the values of the observations are graphed on the horizontal
axis and the frequency of each class is graphed on the vertical axis. The
histogram gives a picture of how the data is distributed. The higher the
bar, the more observations are in the class.
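The counting behind a histogram can be sketched in a few lines of Python. This is only an illustration: the observations, the starting value, the class width, and the function name class_frequencies are all made up here and are not the data set discussed above.

```python
def class_frequencies(data, start, width, n_classes):
    """Count how many observations fall in each equal-width class.

    Class i covers start + i*width up to (but not including)
    start + (i+1)*width, so every observation lands in exactly one class.
    """
    counts = [0] * n_classes
    for x in data:
        i = int((x - start) // width)
        if not 0 <= i < n_classes:
            raise ValueError(f"{x} is outside the classes")
        counts[i] += 1
    return counts


# Hypothetical whole-number observations, made up for illustration only.
scores = [52, 57, 61, 48, 55, 59, 63, 54, 50, 56]
freqs = class_frequencies(scores, start=45, width=5, n_classes=4)

# Print a sideways histogram: one row per class, one star per observation.
for i, f in enumerate(freqs):
    low = 45 + 5 * i
    print(f"{low}-{low + 4}: {'*' * f}")
```

The frequencies always add up to the number of observations, because no observation is counted twice or left out.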
2) There are three different ways to determine the
"center" or most "typical" value of a data set. They
are the mean, median, and mode.
Consider the data set: 3, 5, 1, 14, and 7
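Mean – The average of the observations in a
data set. To find it, add the observations and divide by the number of
observations:

mean = (3 + 5 + 1 + 14 + 7) / 5 = 30 / 5 = 6
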
Median – The "middle" value of the data set. To
find it, first put the data in order: 1, 3, 5, 7, 14. The median is the
observation that lies in the middle, in this case 5. Now consider the data: 1,
3, 5, 7, 14, 23. With an even number of observations there is no single
middle value, so the median is the average of the two middle observations,
5 and 7, which is 6.
Mode – The most frequent observation in the data set. There
is no mode in the data set above since each observation occurs only once.
The mode of the data set: 3, 5, 7, 1, 5, 3, 5, 14 would be 5.
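The three measures can be checked with a short Python sketch. It uses the standard statistics module for the mean and median. The helper modes is a hypothetical name written for this example. It returns an empty list when every observation occurs only once, which matches the "no mode" case described above.

```python
import statistics
from collections import Counter


def modes(data):
    """Return the most frequent value(s), or [] if nothing repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # every observation occurs once: no mode
    return [value for value, count in counts.items() if count == top]


data = [3, 5, 1, 14, 7]

mean_value = statistics.mean(data)                       # (3+5+1+14+7) / 5
median_odd = statistics.median(data)                     # middle of 1, 3, 5, 7, 14
median_even = statistics.median([1, 3, 5, 7, 14, 23])    # average of 5 and 7
no_mode = modes(data)
one_mode = modes([3, 5, 7, 1, 5, 3, 5, 14])

print(mean_value, median_odd, median_even, no_mode, one_mode)
```

Note that statistics.median sorts the data itself, so the observations do not need to be put in order first.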
3) Data that has a bell shape like that below is said to have a normal
distribution. The normal distribution is very important and many
sets of data encountered in the real world resemble a normal distribution
when displayed graphically. For a normal distribution the mean, median,
and mode are equal. The standard deviation (st
dev) is a measure of how spread out the curve is around the mean. The
larger the st dev the more spread out the curve.
The values –1, –2, –3 and 1, 2, 3 are called z-scores
and stand for one, two, or three st dev below or above the mean,
respectively. If the curve above had mean of 55 and st dev of 5, then we
could label the curve with actual observation values instead of z-scores
as follows:
We would say, for example, 45 has a z-score of –2 since it lies two
st dev below the mean and 60 has a z-score of 1 since it lies one st dev
above the mean. The mean has a z-score of zero since it is zero st dev
from the mean. The z-score of an observation such as 42 is not so clear.
It is clear, however, that 42 lies between 2 and 3 st dev below the mean
and thus it makes sense that the z-score should lie somewhere between –3
and –2. The exact value can be found using the following formula:

z-score = (observation – mean) / st dev

Thus the exact z-score for 42 will be:

z-score = (42 – 55) / 5 = –13 / 5 = –2.6

which is between –3 and –2 as we predicted. Thus 42 lies 2.6 st dev
below the mean. Now suppose we would like to know what observation lies
1.7 st dev above the mean, that is, has a z-score of 1.7. Clearly the
value must be somewhere between 60 and 65. It can be found using the
following formula:

observation = mean + (z-score)(st dev)

Thus the observation will be:

observation = 55 + (1.7)(5) = 55 + 8.5 = 63.5

These two formulas can be used to find a z-score when given an
observation and an observation when given a z-score.
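Both conversions are simple enough to write as two small Python functions. The sketch below assumes the usual definition of a z-score, the number of st dev an observation lies from the mean. The function names z_score and observation_from_z are chosen for this example only.

```python
def z_score(observation, mean, st_dev):
    """How many st dev the observation lies above (+) or below (-) the mean."""
    return (observation - mean) / st_dev


def observation_from_z(z, mean, st_dev):
    """The observation that lies z st dev from the mean."""
    return mean + z * st_dev


# The running example: mean of 55 and st dev of 5.
z_42 = z_score(42, mean=55, st_dev=5)               # between -3 and -2
x_17 = observation_from_z(1.7, mean=55, st_dev=5)   # between 60 and 65

print(f"z-score of 42: {z_42:.1f}")
print(f"observation with z-score 1.7: {x_17:.1f}")
```

The two functions undo each other: converting an observation to a z-score and back returns the original observation, apart from tiny rounding differences in the computer's arithmetic.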
4) For normal distributions the empirical rule says:
68% of the observations lie between 1 st dev below the mean and 1 st
dev above the mean.
95% of the observations lie between 2 st dev below the mean and 2 st
dev above the mean.
99.7% of the observations lie between 3 st dev below the mean and 3
st dev above the mean.
Consider the example of a normal distribution with a mean of 55 and a
st dev of 5.
The empirical rule states that 68% of the observations in the data set
will lie between 50 and 60, 95% of the observations will lie between 45
and 65, and 99.7% of the observations will lie between 40 and 70.
Memorizing the empirical rule will give you good intuition for how
normal data is distributed. Suppose you observed an observation that lay
4 st dev above the mean. You should immediately realize that this is a
very rare observation since only 0.3% of the observations lie more than
three st dev from the mean.
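The empirical rule can be checked numerically with the NormalDist class in Python's standard statistics module (Python 3.8 or later). The sketch below uses the example mean of 55 and st dev of 5. The function name share_within is made up for this illustration. The computed shares are slightly more precise than the rounded percentages in the rule.

```python
from statistics import NormalDist


def share_within(k, mean=55.0, st_dev=5.0):
    """Return the interval mean +/- k st dev and the share of a normal
    distribution that lies inside it."""
    dist = NormalDist(mu=mean, sigma=st_dev)
    low, high = mean - k * st_dev, mean + k * st_dev
    return low, high, dist.cdf(high) - dist.cdf(low)


for k in (1, 2, 3):
    low, high, share = share_within(k)
    print(f"within {k} st dev: {low:g} to {high:g}, about {share:.1%}")

# Share of observations lying more than 4 st dev above the mean.
tail_4 = 1 - NormalDist(mu=55, sigma=5).cdf(55 + 4 * 5)
print(f"more than 4 st dev above the mean: about {tail_4:.4%}")
```

The last line shows how rare an observation 4 st dev above the mean is: the share is far smaller than the 0.3% that lies more than three st dev from the mean.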