STA 113 2.0 Descriptive Statistics

Relationships Between Numerical Variables

Dr. Thiyanga S. Talagala
Department of Statistics, Faculty of Applied Sciences
University of Sri Jayewardenepura, Sri Lanka

Scatter plot

  • Allows us to see visually how two variables are related to each other.

Figure 1: Scatter plot of body mass vs flipper length

Measures of Association: Covariance

We can quantify how two variables move together by a summary measure called the covariance.

The sample covariance of two variables, \(X\) and \(Y\), is given by the formula:

\[Cov(X, Y) = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})\]

What is the covariance of \(X\) with \(X\)?
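As a minimal sketch, the sample covariance formula can be computed directly in Python. The function name `sample_covariance` and the data values are made up for illustration. The last lines also check the question above numerically: the covariance of \(X\) with itself equals the sample variance of \(X\).

```python
from statistics import mean, variance

def sample_covariance(x, y):
    """Sample covariance with the n - 1 divisor."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of products of deviations from the means, divided by n - 1
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)

# Hypothetical data, made up for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
cov_xx = sample_covariance(x, x)

print(cov_xy)        # 1.5
print(cov_xx)        # 2.5
print(variance(x))   # 2.5, the sample variance of x
```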

Covariance

The problem with using the covariance to measure the relationship between two quantitative variables is that we can interpret only the direction of the relationship, not its strength, because the magnitude of the covariance depends on the units of measurement.
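A small sketch of why the size of the covariance is hard to interpret: re-expressing one variable in different units changes the covariance, even though the relationship itself is unchanged. The variable names and values below are hypothetical.

```python
from statistics import mean

def sample_covariance(x, y):
    """Sample covariance with the n - 1 divisor."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)

# Hypothetical data, made up for illustration
length_m = [1, 2, 3, 4, 5]                # lengths in metres
length_cm = [100 * v for v in length_m]   # the same lengths in centimetres
y = [2, 4, 5, 4, 5]

cov_m = sample_covariance(length_m, y)
cov_cm = sample_covariance(length_cm, y)

print(cov_m)    # 1.5
print(cov_cm)   # 150.0
```

The sign is positive in both cases, but the magnitude is 100 times larger when the lengths are recorded in centimetres.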

Measures of Association: Pearson’s Product Moment Correlation Coefficient (\(r\))

\[Corr(X, Y)=\frac{\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^n(X_i-\bar{X})^2\sum_{i=1}^n(Y_i-\bar{Y})^2}}\]
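A minimal sketch of Pearson's \(r\) computed from sums of deviations: the numerator is the sum of products of deviations, and the denominator is the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the data are made up for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson's product moment correlation coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data, made up for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 4))   # 0.7746
```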

We can show that

\[Corr(X, Y) = \frac{Cov(X, Y)}{S_xS_y}\]
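One way to see this, assuming \(S_x\) and \(S_y\) denote the sample standard deviations with the \(n-1\) divisor (the slides do not define them explicitly):

\[S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2}, \qquad S_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar{Y})^2}\]

\[\frac{Cov(X, Y)}{S_xS_y} = \frac{\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\frac{1}{n-1}\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2\sum_{i=1}^n (Y_i - \bar{Y})^2}} = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2\sum_{i=1}^n (Y_i - \bar{Y})^2}}\]

The \(\frac{1}{n-1}\) factors cancel, so the correlation does not depend on the units of \(X\) or \(Y\).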

Interpretations

Value of r           Interpretation
r = 1                Perfect positive linear correlation
1 > r ≥ 0.8          Strong positive linear correlation
0.8 > r ≥ 0.4        Moderate positive linear correlation
0.4 > r > 0          Weak positive linear correlation
r = 0                No linear correlation
0 > r > -0.4         Weak negative linear correlation
-0.4 ≥ r > -0.8      Moderate negative linear correlation
-0.8 ≥ r > -1        Strong negative linear correlation
r = -1               Perfect negative linear correlation
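The interpretation table can be sketched as a small lookup function. This is an illustration only: the name `interpret_r` is hypothetical, and it assumes the cut-offs are applied symmetrically to \(|r|\), so that \(r = -0.8\) counts as strong just as \(r = 0.8\) does.

```python
def interpret_r(r):
    """Map a correlation coefficient to a verbal label (cut-offs 0.4 and 0.8 on |r|)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "No linear correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "Perfect"
    elif size >= 0.8:
        strength = "Strong"
    elif size >= 0.4:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction} linear correlation"

print(interpret_r(0.85))   # Strong positive linear correlation
print(interpret_r(-0.5))   # Moderate negative linear correlation
print(interpret_r(0.1))    # Weak positive linear correlation
```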

Scatter plot matrix

Pearson’s correlation coefficient = 0
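A correlation of 0 does not by itself mean the variables are unrelated. As a hedged illustration (not necessarily the case plotted on the slide), the made-up data below follow \(y = x^2\) exactly, yet Pearson's \(r\) is 0 because the relationship is not linear.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson's product moment correlation coefficient."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: y is completely determined by x, but not linearly
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]   # [4, 1, 0, 1, 4]

r = pearson_r(x, y)
print(r)   # 0.0
```

The positive and negative products of deviations cancel exactly, which is why a scatter plot should be examined alongside the coefficient.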

Which plot has the highest correlation coefficient?