STA 113 2.0 Descriptive Statistics

Measures of Spread

Dr. Thiyanga S. Talagala
Department of Statistics, Faculty of Applied Sciences
University of Sri Jayewardenepura, Sri Lanka

Which cricketer would you recommend for the upcoming match?

The scores from the last 10 matches are listed below:

Cricketer 1

100 85 86 79 80
80 80 88 75 87

Cricketer 2

150 0 1 298 2
10 0 250 0 129

Stem and Leaf Plot

Cricketer 1

75 79 80 80 80
85 86 87 88 100

  The decimal point is 1 digit(s) to the right of the |

   7 | 59
   8 | 0005678
   9 | 
  10 | 0

Cricketer 2

0 0 0 1 2
10 129 150 250 298

  The decimal point is 1 digit(s) to the right of the |

   0 | 00012
   1 | 0
   2 | 
   3 | 
   4 | 
   5 | 
   6 | 
   7 | 
   8 | 
   9 | 
  10 | 
  11 | 
  12 | 9
  13 | 
  14 | 
  15 | 0
  16 | 
  17 | 
  18 | 
  19 | 
  20 | 
  21 | 
  22 | 
  23 | 
  24 | 
  25 | 0
  26 | 
  27 | 
  28 | 
  29 | 8
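The two displays above can be reproduced with a short script. This is a minimal sketch, assuming non-negative integer scores, with the stem taken as the tens and the leaf as the units digit; the function name `stem_and_leaf` is hypothetical and is not part of the original slides.

```python
from collections import defaultdict

cricketer_1 = [100, 85, 86, 79, 80, 80, 80, 88, 75, 87]
cricketer_2 = [150, 0, 1, 298, 2, 10, 0, 250, 0, 129]


def stem_and_leaf(values):
    """Return one text row per stem (tens), listing the sorted leaves (units).

    Assumes non-negative integers. Empty stems between the smallest and
    largest stem are kept so that gaps in the data stay visible.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(str(leaf))
    lo, hi = min(leaves), max(leaves)
    return [
        f"{stem:>4} | {''.join(leaves.get(stem, []))}"
        for stem in range(lo, hi + 1)
    ]


for row in stem_and_leaf(cricketer_1):
    print(row)
```

Calling `stem_and_leaf(cricketer_2)` gives 30 rows, most of them empty, which is a quick visual cue that the second cricketer's scores are far more spread out.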

Dot plot

Binwidth: 1 (Bins: 1, 2, 3, ...)

Cricketer 1

75 79 80 80 80
85 86 87 88 100

Cricketer 2

0 0 0 1 2
10 129 150 250 298

Bins: 0-5, 5-10, ...

Cricketer 1

75 79 80 80 80
85 86 87 88 100

Cricketer 2

0 0 0 1 2
10 129 150 250 298

Individual value plot

Measures of Dispersion

Range: Maximum - Minimum

Cricketer 1

75 79 80 80 80
85 86 87 88 100

\[\text{Range} = 100-75 = 25\]

Cricketer 2

0 0 0 1 2
10 129 150 250 298

\[\text{Range} = 298-0 = 298\]
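The same two calculations can be checked in a few lines of code. This is a minimal sketch; the helper name `data_range` is hypothetical and chosen only for illustration.

```python
cricketer_1 = [100, 85, 86, 79, 80, 80, 80, 88, 75, 87]
cricketer_2 = [150, 0, 1, 298, 2, 10, 0, 250, 0, 129]


def data_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)


print(data_range(cricketer_1))  # 25
print(data_range(cricketer_2))  # 298
```

Only two values from each list enter the calculation, so the other eight scores could change without affecting the result.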

Advantages

  • Easy to compute

  • Easy to understand

Disadvantages

  • It only takes into account the maximum and the minimum value.

  • Highly sensitive to outliers.

  • Does not provide information about the spread of data between the minimum and maximum values, nor does it indicate whether the data points are clustered or evenly distributed.

Variance

  • Variance is the mean of the squared deviations from the mean.

  • Measure of the spread of the data around the mean.

Population Variance

\[\text{Population variance} = \sum_{i=1}^N \frac{(x_i-\mu)^2}{N}\]

\(N\) - population size

\(\mu\) - population mean

Sample Variance

\[\text{Sample variance} = \sum_{i=1}^n \frac{(x_i-\bar{x})^2}{n-1}\]

\(n\) - sample size

\(\bar{x}\) - sample mean
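The two formulas differ only in the divisor: \(N\) for a population and \(n-1\) for a sample. The sketch below implements both directly from the definitions and applies them to the two cricketers' scores; the function names are hypothetical, and treating the 10 scores as a sample is an assumption made for illustration.

```python
cricketer_1 = [100, 85, 86, 79, 80, 80, 80, 88, 75, 87]
cricketer_2 = [150, 0, 1, 298, 2, 10, 0, 250, 0, 129]


def sum_of_squared_deviations(values):
    """Sum of (x_i - mean)^2 over all values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def population_variance(values):
    """Divide by N (the data are the whole population)."""
    return sum_of_squared_deviations(values) / len(values)


def sample_variance(values):
    """Divide by n - 1 (the data are a sample from a larger population)."""
    return sum_of_squared_deviations(values) / (len(values) - 1)


# Both cricketers have the same mean score of 84.
print(sum(cricketer_1) / 10, sum(cricketer_2) / 10)

print(round(sample_variance(cricketer_1), 2))  # 48.89
print(round(sample_variance(cricketer_2), 2))  # 13332.22
```

Python's standard library offers `statistics.variance` (sample) and `statistics.pvariance` (population), which should agree with these hand-written versions.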

Compute sample variance.

Cricketer 1

75 79 80 80 80
85 86 87 88 100

Cricketer 2

0 0 0 1 2
10 129 150 250 298

Variance: Advantages and Disadvantages

Advantages:

  • Variance considers all data points in the dataset.

Disadvantages:

  • Sensitive to outliers and extreme values.

  • The units of variance are the square of the units of the original data, which can make interpretation difficult. For example, if the data are in meters, the variance will be in square meters.

  • Variance is less intuitive to understand than other measures of dispersion like the range or interquartile range. People often find the concept of squared deviations harder to grasp.

Standard deviation

\[\text{Standard deviation} = \sqrt{\text{Variance}}\]

  • The variance and the standard deviation are measures of the spread of the data around the mean. They summarise how close each observed data value is to the mean value.

  • Standard deviation is expressed in the same units as the original values (e.g., minutes or meters).
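As a rough illustration of the square-root relationship, the sketch below computes the sample standard deviation of each cricketer's scores with Python's standard `statistics` module. Using the sample version (divisor \(n-1\)) is an assumption; `statistics.pstdev` would give the population version instead.

```python
import math
import statistics

cricketer_1 = [100, 85, 86, 79, 80, 80, 80, 88, 75, 87]
cricketer_2 = [150, 0, 1, 298, 2, 10, 0, 250, 0, 129]

for name, scores in [("Cricketer 1", cricketer_1), ("Cricketer 2", cricketer_2)]:
    variance = statistics.variance(scores)  # sample variance, divisor n - 1
    sd = math.sqrt(variance)                # standard deviation = sqrt(variance)
    print(name, round(sd, 2))

# Approximate output:
# Cricketer 1 6.99
# Cricketer 2 115.47
```

The standard deviations are in the same units as the scores, so they can be compared directly with the common mean of 84: Cricketer 1's scores typically sit within a few units of it, while Cricketer 2's are scattered far more widely.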

Compute standard deviation.

Cricketer 1

75 79 80 80 80
85 86 87 88 100

Cricketer 2

0 0 0 1 2
10 129 150 250 298

In datasets with a small spread, all values are close to the mean, resulting in a small variance and standard deviation.

Your turn

If the variance of a dataset is 0, what can you conclude about the values within the dataset?

Next week

  • Other measures of central tendency and measures of dispersion

  • Interpretation of measurements

  • Other methods of visualizing numerical data