STA 113 2.0 Descriptive Statistics
Relationships Between Numerical Variables
Dr. Thiyanga S. Talagala
Department of Statistics, Faculty of Applied Sciences
University of Sri Jayewardenepura, Sri Lanka
Scatter plot
- Allows us to see visually how two variables are related to each other.
Figure 1: Scatter plot of body mass vs flipper length
Measures of Association: Covariance
We can quantify how two variables move together by a summary measure called the covariance.
The sample covariance of two variables, \(X\) and \(Y\), is given by the formula:
\[Cov(X, Y) = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})\]
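As a minimal sketch of the formula above, the following Python computes the sample covariance directly from the definition. The helper name `sample_cov` and the data values are made up for illustration; they are not taken from the data behind Figure 1.

```python
from statistics import mean

def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of products of deviations from the means, divided by n - 1
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (len(x) - 1)

# Illustrative values only (not real measurements)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = sample_cov(x, y)
print(cov_xy)  # 1.5
```

The positive result says that, in this toy data, larger values of x tend to occur with larger values of y.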
What is the covariance of \(X\) with \(X\)?
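One way to work this out is to substitute \(Y = X\) into the covariance formula:
\[Cov(X, X) = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X}) = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2 = S_x^2\]
That is, the covariance of a variable with itself is its sample variance.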
Covariance
The problem with using the covariance to measure the relationship between two quantitative variables is that we can interpret only the direction of the relationship, not its strength, because the size of the covariance depends on the units in which the variables are measured.
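A small sketch may make this limitation concrete. The flipper lengths and body masses below are invented for illustration, and `sample_cov` is a hypothetical helper. Changing the unit of body mass from grams to kilograms changes the covariance by a factor of 1000, although the relationship itself is unchanged.

```python
from statistics import mean

def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator (hypothetical helper)."""
    x_bar, y_bar = mean(x), mean(y)
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (len(x) - 1)

# Made-up measurements, not the data shown in Figure 1
flipper_mm = [180.0, 190.0, 200.0, 210.0, 220.0]
mass_g = [3500.0, 3700.0, 4200.0, 4600.0, 5000.0]
mass_kg = [m / 1000 for m in mass_g]  # same masses, different unit

cov_g = sample_cov(flipper_mm, mass_g)
cov_kg = sample_cov(flipper_mm, mass_kg)

print(cov_g)   # covariance in mm x g
print(cov_kg)  # covariance in mm x kg, 1000 times smaller
```

Both values are positive, so the direction is the same, but their magnitudes cannot be read as strength.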
Measures of Association: Pearson’s Product Moment Correlation Coefficient (\(r\))
\[Corr(X, Y)=\frac{\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^n(X_i-\bar{X})^2\sum_{i=1}^n(Y_i-\bar{Y})^2}}\]
We can show that
\[Corr(X, Y) = \frac{Cov(X, Y)}{S_xS_y}\]
where \(S_x\) and \(S_y\) are the sample standard deviations of \(X\) and \(Y\).
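The identity above can be sketched in Python by dividing the sample covariance by the product of the two sample standard deviations. The helper names `sample_cov` and `pearson_r` and the data are assumptions made for illustration. The example also checks that, unlike the covariance, \(r\) does not change when body mass is converted from grams to kilograms.

```python
from statistics import mean, stdev

def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator (hypothetical helper)."""
    x_bar, y_bar = mean(x), mean(y)
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (len(x) - 1)

def pearson_r(x, y):
    """Correlation as Cov(X, Y) / (S_x * S_y); stdev() uses n - 1."""
    return sample_cov(x, y) / (stdev(x) * stdev(y))

# Made-up measurements, not the data shown in Figure 1
flipper_mm = [180.0, 190.0, 200.0, 210.0, 220.0]
mass_g = [3500.0, 3700.0, 4200.0, 4600.0, 5000.0]
mass_kg = [m / 1000 for m in mass_g]

r_g = pearson_r(flipper_mm, mass_g)
r_kg = pearson_r(flipper_mm, mass_kg)

print(round(r_g, 4))
print(round(r_kg, 4))  # same value: r is unit-free
```

Because the units cancel in the ratio, \(r\) always lies between -1 and 1, which is what makes its magnitude interpretable as strength.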
Interpretations
| Value of \(r\) | Interpretation |
|---|---|
| r = 1 | Perfect positive linear correlation |
| 1 > r ≥ 0.8 | Strong positive linear correlation |
| 0.8 > r ≥ 0.4 | Moderate positive linear correlation |
| 0.4 > r > 0 | Weak positive linear correlation |
| r = 0 | No correlation |
| 0 > r ≥ -0.4 | Weak negative linear correlation |
| -0.4 > r ≥ -0.8 | Moderate negative linear correlation |
| -0.8 > r > -1 | Strong negative linear correlation |
| r = -1 | Perfect negative linear correlation |
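The interpretation bands above can be written as a small lookup function. This is only a sketch: the function name `interpret_r` is hypothetical, and the cut-offs are copied from the bands listed above, which are a rule of thumb rather than a universal standard.

```python
def interpret_r(r):
    """Return the verbal label for a correlation coefficient r (hypothetical helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 1:
        return "Perfect positive linear correlation"
    if r >= 0.8:
        return "Strong positive linear correlation"
    if r >= 0.4:
        return "Moderate positive linear correlation"
    if r > 0:
        return "Weak positive linear correlation"
    if r == 0:
        return "No correlation"
    if r >= -0.4:
        return "Weak negative linear correlation"
    if r >= -0.8:
        return "Moderate negative linear correlation"
    if r > -1:
        return "Strong negative linear correlation"
    return "Perfect negative linear correlation"

for value in (0.95, 0.5, 0.1, 0, -0.3, -0.6, -0.9):
    print(value, "->", interpret_r(value))
```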
Scatter plot matrix
Pearson’s correlation coefficient = 0
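The plot for this slide is not reproduced here, so the following is only an assumed illustration of one way a zero coefficient can arise. In the made-up data below, \(Y\) is completely determined by \(X\) through \(Y = X^2\), yet Pearson's \(r\) is 0 because the relationship is not linear. The helper name `pearson_r` is hypothetical.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Correlation as Cov(X, Y) / (S_x * S_y), with n - 1 denominators."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Symmetric x values and a purely quadratic relationship (illustrative data)
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(r)  # 0.0: no linear association, although y depends entirely on x
```

A coefficient of 0 therefore rules out a linear relationship only, which is one reason to look at the scatter plot as well as the number.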
Which plot has the highest correlation coefficient?