Summary Measures (Cont.)
The variance of a sample of \(n\) observations \(x_1, x_2, x_3, \ldots, x_n\) having mean \(\bar{x}\) is defined as
\[s^2 = \frac{\sum_{i=1}^n(x_i - \bar{x})^2}{n-1}\]
Alternatively, the sample variance can be written in the following forms:
\[s^2 = \frac{1}{n-1}[\sum_{i=1}^n x_i^2 - \frac{(\sum_{i=1}^nx_i)^2}{n}]\]
or
\[s^2 = \frac{1}{n-1}\left[\sum_{i=1}^n x_i^2 - n\bar{x}^2\right]\]
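As a quick numerical check, the three forms of the sample variance can be compared in Python. This is a minimal sketch: the data values and variable names are illustrative, and the second shortcut form is written with \(n\bar{x}^2\) (the squared mean).

```python
import statistics

data = [1, 3, 4, 6, 7, 8, 10, 12, 14, 15]  # illustrative values
n = len(data)
mean = sum(data) / n

# Definition: sum of squared deviations from the mean, divided by n - 1
s2_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Shortcut 1: sum of squares minus (square of the sum) / n
s2_sums = (sum(x ** 2 for x in data) - sum(data) ** 2 / n) / (n - 1)

# Shortcut 2: sum of squares minus n times the squared mean
s2_mean = (sum(x ** 2 for x in data) - n * mean ** 2) / (n - 1)

# The standard library result is printed for comparison
print(s2_definition, s2_sums, s2_mean, statistics.variance(data))
```

All four printed values should agree up to floating-point rounding.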
Quartiles
Percentiles
Quartiles are descriptive measures that split the ordered data into four quarters (four equal parts).
Q1 - first (lower) quartile
Q2 - second (middle) quartile
Q3 - third (upper) quartile
First quartile
The value for which 25% of the observations are smaller and 75% are larger
\[Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ ordered observation}\]
Second quartile
Same as the median
Third quartile
The value for which 75% of the observations are smaller and 25% are larger
\[Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{ ordered observation}\]
If the resulting position is an integer, take the ordered observation at that position.
For a non-integer position \(p\), let \(k\) be the integer part and \(d\) be the fractional part (e.g., for \(p=2.75\), \(k=2\) and \(d=0.75\)), and interpolate between the two neighbouring observations:
\[Q_q = ((1-d) \times \text{ value at position }k )+ (d \times \text{ value at position } (k+1))\]
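The positioning rule and the interpolation step can be sketched in Python as follows. The function names and the sample data are hypothetical and chosen only for illustration; the comparison with `statistics.quantiles` relies on its default `method="exclusive"`, which also works with \(n+1\) positions.

```python
import statistics


def value_at_position(sorted_data, p):
    """Return the value at 1-based position p, interpolating linearly
    between neighbours when p is not an integer."""
    n = len(sorted_data)
    if not 1 <= p <= n:
        raise ValueError("position falls outside the data")
    k = int(p)   # integer part
    d = p - k    # fractional part
    if d == 0:
        return sorted_data[k - 1]
    return (1 - d) * sorted_data[k - 1] + d * sorted_data[k]


def quartiles(data):
    """Quartiles from the (n+1)/4, 2(n+1)/4 and 3(n+1)/4 positions."""
    ordered = sorted(data)
    n = len(ordered)
    return tuple(value_at_position(ordered, q * (n + 1) / 4) for q in (1, 2, 3))


sample = [3, 6, 7, 9, 12, 15, 18, 20, 22, 30]  # hypothetical data
q1, q2, q3 = quartiles(sample)
print(q1, q2, q3)

# Cross-check against the standard library
print(statistics.quantiles(sample, n=4))
```

For this sample the positions are 2.75, 5.5 and 8.25, so each quartile is a weighted average of two neighbouring observations.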
Your turn
Find the quartiles for
1,3,4,6,7,8,10,12,14,15
first decile = 10th percentile
Q1 = 25th percentile
Q2 = 50th percentile
Q3 = 75th percentile
ninth decile = 90th percentile
Location of a percentile
The following formula allows us to approximate the location of any percentile.
\[L_p = (n+1)\frac{p}{100}\]
where \(L_p\) is the location of the \(p^{th}\) percentile.
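A minimal Python sketch of the location formula is shown below. The function name and the sample size \(n = 15\) are hypothetical and used only to illustrate the arithmetic.

```python
def percentile_location(n, p):
    """Location L_p = (n + 1) * p / 100 of the p-th percentile (1-based)."""
    return (n + 1) * p / 100


n = 15  # hypothetical sample size
for p in (10, 25, 50, 75, 90):
    print(p, percentile_location(n, p))
```

When the location is not a whole number, the percentile can be obtained by interpolating between the two neighbouring ordered observations, in the same way as for the quartiles.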
5, 5, 10, 12, 13, 14, 17, 19, 27, 38
Find \(L_{25}\), \(L_{50}\) and \(L_{75}\).
\[IQR = Q_3 - Q_1\]
This measure considers the spread in the middle 50% of the data.
It is not influenced by extreme values.
1, 4, 8, 9, 11, 5, 4, 3, 2, 20, 3, 7, 8, 10, 2, 6, 7, 2, 20, 30
Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
---|---|---|---|---|---|
1.00 | 3.00 | 6.50 | 8.10 | 9.25 | 30.00 |
Let’s draw the box and whisker plot
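Before drawing the plot, the summary values and the IQR for the 20 observations can be reproduced with the sketch below. Two points are assumptions rather than statements from the slides: `method="inclusive"` is chosen because it matches the reported 1st Qu. and 3rd Qu. values, and the \(1.5 \times IQR\) fences are a common convention for flagging outliers on a box and whisker plot.

```python
import statistics

data = [1, 4, 8, 9, 11, 5, 4, 3, 2, 20,
        3, 7, 8, 10, 2, 6, 7, 2, 20, 30]

# "inclusive" interpolates between positions 1 and n of the ordered data,
# which reproduces the quartiles shown in the summary
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumption: the common 1.5 * IQR rule for flagging outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)

print(min(data), q1, q2, statistics.mean(data), q3, max(data))
print("IQR:", iqr, "fences:", lower_fence, upper_fence, "outliers:", outliers)
```

Note that the \((n+1)/4\) positioning rule places \(Q_3\) at position 15.75 and gives 9.75 rather than 9.25, so different quartile conventions can produce slightly different boxes.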
s1 | s2 | s3 | s4 |
---|---|---|---|
-0.4941825 | 0.9915941 | 1.2651757 | -0.4547021 |
-0.6687905 | 0.7621325 | 0.4623908 | -0.5227580 |
-0.5469909 | 0.4472162 | 0.1096497 | -0.9385280 |
-0.6231902 | 0.9652850 | 1.6850990 | -1.0626077 |
1.4458192 | 0.7953235 | 0.1734091 | -0.7810612 |
1.4321517 | 0.9403680 | 0.2576630 | -0.9400142 |
-1.7834318 | 0.2037412 | 0.9451751 | -0.4921403 |
-0.1310000 | 0.5811571 | 0.0841218 | -0.9899774 |
-1.1957376 | 0.5004076 | 0.9543987 | -0.6680712 |
0.4298630 | 0.6224225 | 0.2428342 | -0.2485180 |
0.6814681 | 0.3328808 | 0.8645305 | -0.7541138 |
-0.7584425 | 0.6059351 | 0.8742853 | -0.6532270 |
0.4903058 | 0.5572273 | 0.0927645 | -1.1676951 |
-0.0359587 | 0.9757127 | 1.4798731 | -1.0630715 |
-0.4126305 | 0.3127026 | 0.6411980 | -0.9250913 |
-0.6958339 | 0.8282239 | 0.6666132 | -1.2498425 |
1.1344751 | 0.5325348 | 0.1791371 | -0.7756105 |
-0.2479113 | 0.2439792 | 0.5884629 | -0.8957835 |
0.3880169 | 0.0597912 | 0.5377590 | -1.2037738 |
-0.6955792 | 0.7714712 | 0.7566644 | -0.3658032 |
-1.2602048 | 0.8607332 | 0.0237823 | -0.8675267 |
-0.4121627 | 0.9078947 | 0.3648342 | -1.3747030 |
-1.0600587 | 0.5853123 | 0.0991288 | -0.9682996 |
-0.6901896 | 0.8314007 | 0.1282657 | -0.8799173 |
-0.1512943 | 0.7785955 | 0.3227742 | -1.3436420 |
-0.8817142 | 0.2555937 | 0.2120927 | -0.8572044 |
2.3139477 | 0.3352818 | 3.4199927 | -0.7016565 |
2.6254204 | 0.9125926 | 0.3246409 | -1.2752869 |
0.0997495 | 0.1433173 | 0.5174605 | -0.3899073 |
-0.6003863 | 0.5621560 | 0.8610009 | -0.8642701 |
0.7316799 | 0.7373602 | 0.2427046 | -1.1415947 |
0.5253369 | 0.4902503 | 2.0961923 | -0.7712001 |
0.2979311 | 0.4586207 | 0.2769770 | -0.1074875 |
Row | s1 | s2 | s3 | s4 |
---|---|---|---|---|
34 | 0.1707726 | 0.9417774 | 0.5032580 | -0.6044107 |
35 | -0.3198814 | 0.8077437 | 0.0181263 | -1.2421812 |
36 | -0.8186460 | 0.6444308 | 6.8161623 | -1.1468929 |
37 | -0.0072054 | 0.8448965 | 0.3052239 | -0.8844985 |
38 | -0.4512637 | 0.3424624 | 1.3661048 | -1.3386497 |
39 | -0.1925807 | 0.5625122 | 0.9218917 | -0.8877711 |
40 | 2.2657184 | 0.6736529 | 0.2075009 | -0.4708897 |
41 | -0.9951849 | 0.8909363 | 5.0017890 | -0.9994745 |
42 | -0.2856625 | 0.8033804 | 0.0968602 | -0.7467907 |
43 | 0.2191878 | 0.1782850 | 1.5306568 | -0.7682978 |
44 | -1.0367488 | 0.8343284 | 0.3886233 | -1.3388111 |
45 | 1.1718172 | 0.2599477 | 2.9639756 | -0.7148029 |
46 | 0.1918960 | 0.5094981 | 0.5536213 | -0.7586514 |
47 | 0.5286750 | 0.7987479 | 0.4308663 | -0.9896018 |
48 | 1.5910981 | 0.6943393 | 0.6255345 | -1.1681160 |
49 | -1.1722861 | 0.9982673 | 0.4587839 | -0.9371935 |
50 | 0.1934595 | 0.6716770 | 0.3116617 | -1.1103634 |
51 | -1.4356298 | 0.9698931 | 5.7776384 | 0.8623060 |
52 | -0.7890743 | 0.5939492 | 0.0224797 | 1.2137335 |
53 | -0.6106055 | 0.0478229 | 3.0745683 | 0.9983251 |
54 | -2.3119511 | 0.5605901 | 0.8132757 | 0.6957828 |
55 | 0.8667858 | 0.3635374 | 0.2700010 | 0.6795177 |
56 | 0.4041022 | 0.7116650 | 0.6794256 | 0.8919491 |
57 | 2.0842797 | 0.6958857 | 0.3890802 | 1.4396962 |
58 | -1.6350715 | 0.4293465 | 0.9316933 | 1.5557211 |
59 | -1.0041166 | 0.7481887 | 2.2529575 | 0.9289430 |
60 | -0.1833006 | 0.8444459 | 0.3561461 | 1.2954135 |
61 | -0.6370463 | 0.6549463 | 1.5700440 | 1.4239294 |
62 | 0.7891126 | 0.9183704 | 3.4331633 | 0.9424039 |
63 | -1.9343009 | 0.8069813 | 1.7880099 | 1.0407747 |
64 | 1.0142296 | 0.9055865 | 0.5375650 | 0.0505981 |
65 | 1.9339288 | 0.2774429 | 0.4135352 | 1.1525336 |
66 | 0.0475422 | 0.9947034 | 1.3488308 | 1.2897422 |
67 | 0.2263924 | 0.8236884 | 0.8586223 | 0.4717407 |
Row | s1 | s2 | s3 | s4 |
---|---|---|---|---|
68 | -0.4959669 | 0.5652228 | 1.1851333 | 1.1189474 |
69 | 0.8229532 | 0.8178838 | 0.1777389 | 0.1525808 |
70 | 0.9298399 | 0.9716167 | 2.0837673 | 1.1283459 |
71 | -0.8396403 | 0.6955667 | 1.1255832 | 0.8254749 |
72 | -2.6643294 | 0.5633113 | 2.6622062 | 0.4479653 |
73 | 1.6035095 | 0.3289823 | 0.7328378 | 1.1938157 |
74 | -0.7712137 | 0.6223501 | 3.6241769 | 1.0612159 |
75 | -0.3840300 | 0.3932550 | 0.1783265 | 0.9074435 |
76 | 0.4037572 | 0.8684061 | 0.0584704 | 0.7707421 |
77 | -0.2177293 | 0.7424245 | 1.3926143 | 0.7044549 |
78 | -0.0787400 | 0.8584837 | 0.1558730 | 0.7731302 |
79 | -1.3780882 | 0.4996357 | 0.5496975 | 1.1174710 |
80 | -0.4246498 | 0.9272462 | 0.0421114 | 1.0224212 |
81 | 0.3751303 | 0.2122582 | 0.0489879 | 1.2401247 |
82 | 1.0816647 | 0.8919946 | 5.6415263 | 1.1950053 |
83 | -0.1589649 | 0.4317767 | 0.2723074 | 0.6052308 |
84 | 0.2382120 | 0.1271924 | 0.0322854 | 0.6628394 |
85 | 0.7576892 | 0.7173759 | 0.5761504 | 0.7795326 |
86 | -0.9829056 | 0.3897945 | 0.4854600 | 0.8800581 |
87 | 0.3364830 | 0.4760740 | 2.5864094 | 1.1548429 |
88 | -0.2949718 | 0.9238953 | 1.3136616 | 0.5118688 |
89 | 0.7969451 | 0.6609144 | 7.4977078 | 1.2616219 |
90 | -0.6483374 | 0.3351259 | 1.1174526 | 0.1745040 |
91 | -0.3334201 | 0.9155915 | 0.2567393 | 1.0479585 |
92 | -0.9726304 | 0.6654774 | 0.3236723 | 0.8446453 |
93 | -0.8009399 | 0.2512914 | 1.4986115 | 0.4152249 |
94 | -1.2797944 | 0.9317872 | 2.0137554 | 0.8929735 |
95 | -1.0862999 | 0.5974472 | 0.4877782 | 0.3485280 |
96 | -1.1646197 | 0.9956436 | 0.0835399 | 0.6462357 |
97 | -0.0201359 | 0.7754757 | 0.3588838 | 1.6385065 |
98 | -0.4628470 | 0.6599176 | 2.7117010 | 0.7595622 |
99 | -0.4063630 | 0.4857695 | 0.5379405 | 0.8379056 |
100 | -0.0119749 | 0.8521035 | 1.3446080 | 0.5883827 |
Histogram
Box and whisker Plot
Prices of Chocolate in Rupees (LKR):
50, 75, 100, 125, 150, 175, 200, 225, 250, 275
Prices of Chocolate in USD:
5, 7, 9, 12, 15, 18, 20, 23, 26, 0
Which group of chocolate prices exhibits the greater variation?
Relative measure of variation
It is always expressed as a percentage rather than in the units of the particular data.
This is useful when comparing two or more sets of data that are measured in different units.
\[CV = \frac{s}{\bar{x}}\times 100\%\]
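The coefficient of variation can be computed directly from raw data, as in the minimal sketch below. The function name is hypothetical; the data are the chocolate prices in LKR listed earlier, and \(s\) is taken to be the sample standard deviation (divisor \(n-1\)).

```python
import statistics


def coefficient_of_variation(values):
    """CV = (sample standard deviation / mean) * 100, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100


prices_lkr = [50, 75, 100, 125, 150, 175, 200, 225, 250, 275]
cv_lkr = coefficient_of_variation(prices_lkr)
print(round(cv_lkr, 2))
```

Because the result is unit-free, it can be compared with the CV of a data set recorded in another currency or unit.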
The coefficient of variation of the heights of 30 people selected at random from a given village is found to be 15%. The mean weight of the selected group is 72 kg, with a standard deviation of 8 kg.
The obtained results show that
the weight is more variable than height.
the weight is less variable than height.
height and weight have the same degree of variation.
height and weight values are identical.
In-class diagram:
Skewness describes the degree and direction of asymmetry in the data.
Formula used to calculate skewness in Excel:
\[\text{Skewness} = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{s}\right)^3\]
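The formula can be sketched in Python as follows, taking the denominator inside the sum to be the sample standard deviation \(s\). The function name and both data sets are hypothetical.

```python
import statistics


def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are required")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


symmetric = [1, 2, 3, 4, 5]             # hypothetical, symmetric about 3
right_tailed = [1, 2, 2, 3, 4, 5, 11]   # hypothetical, long right tail
print(sample_skewness(symmetric), sample_skewness(right_tailed))
```

The symmetric data should give a value of essentially zero, and the data with the long right tail a positive value.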
Skewness is the degree of asymmetry of a distribution.
If the frequency distribution has a longer “tail” to the right of the central maximum than to the left, the distribution is said to be skewed to the right (or to have a positive skewness).
If the reverse is true, it is said to be skewed to the left (or to have a negative skewness).
Pearson’s first coefficient of skewness
\[\text{Skewness} = \frac{\text{mean}-\text{mode}}{\text{standard deviation}}\]
or
Pearson’s second coefficient of skewness
To avoid using the mode
\[\text{Skewness} = \frac{3(\text{mean}-\text{median})}{\text{standard deviation}}\]
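Both Pearson coefficients can be computed with the standard library, as in this minimal sketch. The function names and the data are hypothetical, and the standard deviation is assumed to be the sample version (divisor \(n-1\)).

```python
import statistics


def pearson_first(values):
    """(mean - mode) / standard deviation."""
    return (statistics.mean(values) - statistics.mode(values)) / statistics.stdev(values)


def pearson_second(values):
    """3 * (mean - median) / standard deviation."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)


data = [1, 2, 2, 3, 4, 5, 11]  # hypothetical: mean 4, median 3, mode 2
print(round(pearson_first(data), 4), round(pearson_second(data), 4))
```

Both coefficients are positive here because the mean lies above the median and the mode, which is what a long right tail produces.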
If \(Mean = Mode = Median\), the coefficient of skewness is zero (symmetrical distribution).
If \(Mean > Mode\), the coefficient of skewness is positive.
If \(Mean < Mode\), the coefficient of skewness is negative.
Karl Pearson’s coefficient of skewness has a positive sign for positively skewed distributions and a negative sign for negatively skewed distributions.
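Equating Karl Pearson’s first and second coefficients gives the empirical relationship
\(3 \times Median = Mode + 2 \times Mean\)
Symmetric distribution: \(Mean = Median = Mode\)
Positively skewed distribution: \(Mean > Median > Mode\)
Negatively skewed distribution: \(Mean < Median < Mode\)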
Degree of peakedness of a distribution
Usually taken relative to a normal distribution
Type | Kurtosis | Excess Kurtosis |
---|---|---|
Mesokurtic | =3 | =0 |
Leptokurtic | >3 | >0 |
Platykurtic | <3 | <0 |
Excess Kurtosis = Kurtosis - 3
Find the formulas for kurtosis and excess kurtosis.
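One common moment-based definition is \(\text{Kurtosis} = m_4 / m_2^2\), where \(m_k\) is the \(k\)-th central moment computed with divisor \(n\). The sketch below assumes that definition; other software may apply sample-size corrections, and the slides do not state which version produced the tabulated values. The function names and the data are hypothetical.

```python
def kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (equal to 3 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2


def excess_kurtosis(values):
    """Excess kurtosis = kurtosis - 3."""
    return kurtosis(values) - 3


flat = [1, 2, 3, 4, 5]  # hypothetical, evenly spread values
print(kurtosis(flat), excess_kurtosis(flat))
```

Evenly spread values like these give a kurtosis below 3, which corresponds to the platykurtic row of the table.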
species | Mean | Median | Mode | SD | Q1 | Q2 | Q3 | Kurtosis | Skewness |
---|---|---|---|---|---|---|---|---|---|
Adelie | 190.1027 | 190 | 190 | 6.521825 | 186 | 190 | 195.0 | 3.327738 | 0.0795785 |
Chinstrap | 195.8235 | 196 | 187 | 7.131894 | 191 | 196 | 201.0 | 2.956063 | -0.0092622 |
Gentoo | 217.2353 | 216 | 215 | 6.585431 | 212 | 216 | 221.5 | 2.322009 | 0.3640858 |
The distributions of flipper lengths for each species appear to be roughly symmetric.
Gentoo penguins have longer flippers than both Adélie and Chinstrap penguins, which have similar flipper lengths.
There are two notable outliers in the flipper lengths for Adélie penguins, which are visible in the boxplot.