3 Introduction to Time Series Forecasting
3.1 Notation
\hat{Y}_{T+h|T} - The forecast of the time series Y at time T+h, made using the information available up to time T.
3.2 Simple time series forecasting techniques
Average method
Naïve method (random walk method)
Seasonal naive method
Drift method
3.3 Example: Electricity Demand Forecasting
# Plot some forecasts (aelec: an electricity demand ts object defined earlier)
library(forecast)
library(ggplot2)
autoplot(aelec) +
  autolayer(meanf(aelec, h=11),
            series="Mean", PI=FALSE) +
  autolayer(naive(aelec, h=11),
            series="Naïve", PI=FALSE) +
  autolayer(snaive(aelec, h=11),
            series="Seasonal naïve", PI=FALSE) +
  ggtitle("Forecasts from Mean, NAIVE and SNAIVE") +
  xlab("Year") + ylab("Value") +
  guides(colour=guide_legend(title="Forecast"))
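The drift method from the list above can be overlaid in the same way; a minimal sketch, assuming aelec and the forecast/ggplot2 packages are loaded as above:
# Drift method (random walk with drift) forecasts for the same series
autoplot(aelec) +
  autolayer(rwf(aelec, drift=TRUE, h=11),
            series="Drift", PI=FALSE) +
  ggtitle("Forecasts from the drift method") +
  xlab("Year") + ylab("Value") +
  guides(colour=guide_legend(title="Forecast"))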
3.4 Time Series and Stochastic Processes
The terms stochastic process and time series are closely related but not the same.
A stochastic process is a collection of random variables indexed by time (or space),
\{X_t : t \in T\},
where T is the index set (e.g., discrete or continuous time).
A time series is a single realization (observed data) of a stochastic process: the actual sequence of observations collected over time.
In short:
Stochastic process = model/theory: the probability mechanism generating all possible paths.
Time series = observed data: one observed path (the single trajectory we actually have).
3.5 Statistical Properties
Mean function
Let \{X_1, X_2, \ldots\} be a sequence of random variables indexed by time.
The mean function of {X_t} is
\mu_X(t)=E(X_t).
Covariance function
The covariance function of {X_t} is
\gamma_X(r, s)=Cov(X_r, X_s)=E[(X_r-\mu_X(r))(X_s-\mu_X(s))]
for all integers (r) and (s).
Autocovariance function
The autocovariance function of \{X_t\} at lag (h) is defined by
\gamma_X(h):=\gamma_X(h, 0)=\gamma_X(t+h, t)=Cov(X_{t+h}, X_t),
or, more simply,
\gamma_X(h)=Cov(X_{t+h}, X_t).
Autocorrelation function
The autocorrelation function of {X_t} at lag (h) is
\rho_X(h)=\frac{\gamma_X(h)}{\gamma_X(0)}=Cor(X_{t+h}, X_t).
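These definitions can be checked numerically; a minimal sketch in R (the simulated series x is illustrative):
# Compare the built-in sample ACF at lag 1 with a direct computation
set.seed(1)
x <- rnorm(100)
acf(x, plot = FALSE)$acf[2]                                 # sample autocorrelation at lag 1
xbar <- mean(x); n <- length(x)
sum((x[-1] - xbar) * (x[-n] - xbar)) / sum((x - xbar)^2)    # the same quantity computed by hand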
3.6 Weakly stationary
A time series {X_t} is called weakly stationary if
\mu_X(t) is independent of t (the mean is constant).
Var(X_t) = \sigma^2 < \infty (the variance is constant and finite).
\gamma_X(t+h, t) is independent of t for each h: the autocovariance depends only on the lag h, i.e. on how far apart two points are, and not on the actual time t.
In other words, the statistical properties of the time series (mean, variance, autocorrelation, etc.) do not depend on the time at which the series is observed; in particular there is no trend or seasonality. However, a time series with cyclic behaviour (but with no trend or seasonality) is stationary.
3.7 Strict stationarity of a time series
A time series \{X_t\} is called strictly stationary if the random vectors (X_1, X_2, \ldots, X_n) and (X_{1+h}, X_{2+h}, \ldots, X_{n+h}) have the same joint distribution for all integers h and all n > 0.
3.8 1. Independent and identically distributed (iid) noise
- No trend or seasonal component.
- Observations are independent and identically distributed (iid) random variables with zero mean.
Notation: \{X_t\} \sim IID(0, \sigma^2)
IID noise plays an important role as a building block for more complicated time series.
3.9 2. White noise
If {X_t} is a sequence of uncorrelated random variables, each with zero mean and variance \sigma^2, then such a sequence is referred to as white noise.
3.10 Every IID(0, \sigma^2) sequence is WN(0, \sigma^2), but not conversely. Why?
1. White Noise (WN)
A sequence \{X_t\} is called white noise with mean 0 and variance \sigma^2, written WN(0, \sigma^2), if:
- \mathbb{E}[X_t] = 0 for all t.
- \mathrm{Var}(X_t) = \sigma^2 for all t.
- \mathrm{Cov}(X_t, X_s) = 0 for all t \neq s (uncorrelated across time).
Notice: uncorrelated \neq independent.
2. IID(0, \sigma^2)
A sequence \{X_t\} is IID(0, \sigma^2) if:
- \mathbb{E}[X_t] = 0.
- \mathrm{Var}(X_t) = \sigma^2.
- X_t are independent and identically distributed.
3. Why every IID(0, \sigma^2) is WN(0, \sigma^2)
- Independence \;\Rightarrow\; zero correlation.
- So, an i.i.d. sequence automatically satisfies the white noise conditions (same mean, same variance, no correlation).
Therefore:
IID(0, \sigma^2) \;\;\Rightarrow\;\; WN(0, \sigma^2).
4. Why not conversely?
The reverse is not always true, because white noise only requires uncorrelatedness, not full independence.
That means a sequence could be white noise but still have dependence in higher moments (nonlinear dependence).
3.11 5. Example of WN but not IID
Let \{Z_t\} be i.i.d. N(0,1). Define
X_t = Z_t \cdot Z_{t-1}.
Then:
- \mathbb{E}[X_t] = 0,
- \mathrm{Var}(X_t) = 1,
- For t \neq s, \mathrm{Cov}(X_t, X_s) = 0. ✅ So it is white noise.
But the sequence is not independent (because X_t depends on Z_{t-1}, which also appears in X_{t-1}).
Thus,
X_t \sim WN(0,1) \quad \text{but not} \quad IID(0,1).
3.12 Simulation example
IID series
set.seed(123)
# Parameters
n <- 200 # length of series
sigma <- 1 # standard deviation
# IID(0, sigma^2) ~ Normal(0, sigma^2)
iid_seq <- rnorm(n, mean = 0, sd = sigma)
# Quick check
mean(iid_seq) # should be ~0
[1] -0.008570445
var(iid_seq) # should be ~sigma^2
[1] 0.8895506
acf(iid_seq) # autocorrelations ~ 0
White noise
set.seed(123)
n <- 200
Z <- rnorm(n, mean = 0, sd = 1)
# Construct WN but not IID
wn_not_iid <- Z[-1] * Z[-n] # X_t = Z_t * Z_{t-1}, length n-1
# Quick check
mean(wn_not_iid) # ~0
[1] -0.05650406
var(wn_not_iid) # ~1
[1] 0.8196189
acf(wn_not_iid) # uncorrelated -> ACF ~ 0
Side-by-side visualisation
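A minimal sketch of such a comparison, assuming iid_seq and wn_not_iid from the code above:
# Time plots and ACFs of the two simulated series, side by side
par(mfrow = c(2, 2))
plot.ts(iid_seq, main = "IID(0,1) sequence")
plot.ts(wn_not_iid, main = "WN(0,1) but not IID")
acf(iid_seq, main = "ACF: IID sequence")
acf(wn_not_iid, main = "ACF: WN, not IID")
par(mfrow = c(1, 1))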
3.13 3. Random walk
A random walk process is obtained by cumulatively summing iid random variables. If \{S_t, t=0, 1, 2, \ldots\} is a random walk process, then
\begin{aligned} S_0 &= 0 \\ S_1 &= 0 + X_1 \\ S_2 &= 0 + X_1 + X_2 \\ &\vdots \\ S_t &= X_1 + X_2 + \cdots + X_t. \end{aligned}
Question
Is \{S_t, t=0, 1, 2, \ldots\} a weakly stationary process?
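One way to explore this question is by simulation; a sketch (the object names are illustrative):
# Simulate many random walk paths and estimate Var(S_t) across paths at each t
set.seed(123)
n_paths <- 500
n_time  <- 200
paths <- replicate(n_paths, cumsum(rnorm(n_time)))   # each column is one path S_1, ..., S_T
var_t <- apply(paths, 1, var)                         # estimated variance of S_t over the paths
plot(var_t, type = "l", xlab = "t", ylab = "Estimated Var(S_t)")
# The estimated variance grows roughly linearly in t, suggesting the process is not weakly stationary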
3.14 Identifying non-stationarity in the mean
- Time series plot
- ACF plot
The ACF of a stationary time series will drop to zero relatively quickly.
The ACF of a non-stationary series decreases slowly.
For a non-stationary series, the ACF at lag 1 is often large and positive.
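The contrast is easy to see in a simulated example; a sketch comparing a white noise series with a random walk:
# ACF of a stationary series vs a non-stationary series
set.seed(123)
stationary_x    <- rnorm(200)            # white noise: stationary
nonstationary_x <- cumsum(rnorm(200))    # random walk: non-stationary in the mean
par(mfrow = c(1, 2))
acf(stationary_x, main = "Stationary: ACF drops quickly")
acf(nonstationary_x, main = "Non-stationary: ACF decays slowly")
par(mfrow = c(1, 1))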
3.15 Backshift notation:
BX_t=X_{t-1}
3.16 Ordinary differencing
The first-order differencing can be defined as
\nabla X_t = X_t-X_{t-1}=X_t-BX_t=(1-B)X_t where \nabla=1-B.
The second-order differencing
\nabla^2X_t=\nabla(\nabla X_t)=\nabla(X_t-X_{t-1})=\nabla X_t - \nabla X_{t-1}=(X_t-X_{t-1})-(X_{t-1}-X_{t-2})=X_t-2X_{t-1}+X_{t-2}.
In practice, we seldom need to go beyond second-order differencing.
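In R, ordinary differencing is available through diff(); a minimal sketch:
# First and second ordinary differences of a simulated series
set.seed(123)
x  <- cumsum(rnorm(50))                 # a series with a stochastic trend
d1 <- diff(x)                           # first difference: (1 - B) X_t
d2 <- diff(x, differences = 2)          # second difference: (1 - B)^2 X_t
all.equal(d2, diff(d1))                 # the second difference equals the difference of the first difference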
3.17 Seasonal differencing
Differencing between an observation and the corresponding observation from the previous seasonal period (e.g., the previous year for monthly data).
\nabla_mX_t=X_t-X_{t-m}=(1-B^m)X_t, where m is the seasonal period (the number of observations per cycle). For monthly data, m=12; for quarterly data, m=4.
For monthly series
\nabla_{12}X_t=X_t-X_{t-12}
3.18 Twice-differenced series
\nabla\nabla_{12}X_t=\nabla_{12}X_t-\nabla_{12}X_{t-1}=(X_t-X_{t-12})-(X_{t-1}-X_{t-13}).
If seasonality is strong, the seasonal differencing should be done first.
3.19 Deterministic trend vs Stochastic trend
Deterministic trend
Y_t = f(t) + \epsilon_t
where \epsilon_t \sim iid(0, \sigma^2), \quad t = 1, 2, \ldots, T.
Mean of the process is time dependent, but the variance of the process is constant.
A trend is deterministic if it is a nonrandom function of time: a predictable, fixed function of time. If you know the form of the function, you can determine the trend exactly.
Stochastic trend
A stochastic trend is driven by random shocks (also called innovations, disturbances, or error terms) that accumulate over time.
1. Random walk
Y_t = Y_{t-1} + \epsilon_t
Random walk has a stochastic trend.
Model behind naive method.
A trend is said to be stochastic if it is a random function of time.
2. Random walk with drift
Y_t = \alpha + Y_{t-1} + \epsilon_t
Random walk with drift has a stochastic trend and a deterministic trend.
Model behind drift method.
3.20 Random walk
\begin{aligned} Y_t &= Y_{t-1} + \epsilon_t \\ Y_1 &= Y_0 + \epsilon_1 \\ Y_2 &= Y_1 + \epsilon_2=Y_0 + \epsilon_1 + \epsilon_2\\ Y_3 &= Y_2 + \epsilon_3=Y_0 + \epsilon_1 + \epsilon_2 +\epsilon_3\\ &\vdots \\ Y_t &=Y_{t-1} + \epsilon_t=Y_0 + \epsilon_1 + \epsilon_2 + \epsilon_3 +\cdots+ \epsilon_t = Y_0 + \sum_{i=1}^{t} \epsilon_i \end{aligned}
Mean: E(Y_t) = Y_0.
Variance: Var(Y_t)=t \sigma^2.
3.21 Random walk with drift
\begin{aligned} Y_t &= \alpha + Y_{t-1} + \epsilon_t \\ Y_1 &= \alpha+Y_0 + \epsilon_1 \\ Y_2 &= \alpha+ Y_1 + \epsilon_2=2 \alpha+Y_0 + \epsilon_1 + \epsilon_2\\ Y_3 &= \alpha+ Y_2 + \epsilon_3= 3 \alpha+ Y_0 + \epsilon_1 + \epsilon_2 +\epsilon_3\\ &\vdots \\ Y_t &= \alpha+Y_{t-1} + \epsilon_t= t \alpha+ Y_0 + \epsilon_1 + \epsilon_2 + \epsilon_3 +\cdots+ \epsilon_t \\ Y_t &= t \alpha + Y_0 + \sum_{i=1}^{t} \epsilon_i \end{aligned}
It has a deterministic trend (Y_0 + t \alpha) and a stochastic trend \sum_{i=1}^{t} \epsilon_i.
Mean: E(Y_t) = Y_0 + t\alpha
Variance: Var(Y_t) = t\sigma^2.
Both the mean and the variance depend on t, so the process is not stationary.
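A short simulation of this decomposition (the parameter values below are illustrative):
# Random walk with drift: deterministic trend plus accumulated shocks
set.seed(123)
n <- 200; alpha <- 0.5; y0 <- 10
eps <- rnorm(n)
y <- y0 + alpha * (1:n) + cumsum(eps)    # Y_t = Y_0 + t*alpha + sum of epsilons
plot.ts(y, main = "Random walk with drift")
lines(y0 + alpha * (1:n), lty = 2)       # the deterministic trend component Y_0 + t*alpha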
3.22 Common trend removal (de-trending) procedures
- Deterministic trend: Time-trend regression
The trend can be removed by fitting a deterministic polynomial time trend. The residual series after removing the trend gives the de-trended series.
- Stochastic trend: Differencing
A process whose trend is removed by differencing is also known as a difference-stationary process. Both procedures are sketched below.
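A minimal sketch of the two de-trending procedures in base R; the simulated series and names below are illustrative:
# De-trending by regression (deterministic trend) vs differencing (stochastic trend)
set.seed(123)
t_idx <- 1:120
y <- 2 + 0.05 * t_idx + rnorm(120)       # a series with a linear deterministic trend
fit <- lm(y ~ t_idx)                     # time-trend regression
detrended <- residuals(fit)              # de-trended series
differenced <- diff(y)                   # first difference (difference-stationary approach)
par(mfrow = c(1, 2))
plot.ts(detrended, main = "De-trended by regression")
plot.ts(differenced, main = "First difference")
par(mfrow = c(1, 1))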
3.23 Remove seasonality
Take seasonal differencing
3.24 Example: Differencing on AirPassengers Data
The built-in AirPassengers dataset (monthly airline passengers, 1949–1960) has trend + seasonality.
# Load data
data("AirPassengers")
ts_data <- AirPassengers
#par(mfrow = c(3,2))
# 1. Original series
plot(ts_data, main = "Original Series", ylab = "Passengers")
acf(ts_data, main = "ACF: Original Series")
# 2. First difference (remove trend)
diff1 <- diff(ts_data, differences = 1)
plot(diff1, main = "1st Difference (Remove Trend)", ylab = "Difference")
acf(diff1, main = "ACF: 1st Difference")
# 3. Seasonal difference (lag = 12, remove seasonality)
diff_seasonal <- diff(diff1, lag = 12)
plot(diff_seasonal, main = "Seasonal Difference (Remove Seasonality)", ylab = "Difference")
acf(diff_seasonal, main = "ACF: Seasonal Difference")
# 4. Seasonal difference only (lag = 12, remove seasonality from the original series)
diff_seasonal_only <- diff(ts_data, lag = 12)
plot(diff_seasonal_only, main = "Seasonal Difference (Remove Seasonality)", ylab = "Difference")
acf(diff_seasonal_only, main = "ACF: Seasonal Difference")
head(ts_data, 14)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126
head(diff1, 14)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949       6  14  -3  -8  14  13   0 -12 -17 -15  14
1950  -3  11  15
head(diff_seasonal, 14)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1950       5   1  -3  -2  10   8   0   0  -8  -4  12
1951   8  -6  13
head(diff_seasonal_only, 14)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1950 3 8 9 6 4 14 22 22 22 14 10 22
1951 30 24
3.25 Notation: I(d)
Integrated of order d: the series can be made stationary by differencing d times.
- Known as an I(d) process.
Question: Show that random walk process is an I(1) process.
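As a hint, note that differencing the random walk once removes the stochastic trend:
\nabla Y_t = Y_t - Y_{t-1} = \epsilon_t,
which is white noise and hence stationary, so a single difference suffices and the random walk is I(1).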
The random walk process is called a unit root process. (If one of the roots of the process's characteristic equation equals one, the process is called a unit root process.)
3.26 Variance stabilization
Transform the series.
E.g.:
- Square root: W_t = \sqrt{Y_t}
- Logarithm: W_t = \log(Y_t)
The logarithm is very useful.
It is also interpretable: changes in a log value are relative (percentage) changes on the original scale.
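A quick illustration with the AirPassengers series used earlier, where the seasonal variation grows with the level of the series:
# Log transform stabilises the growing seasonal variance
par(mfrow = c(1, 2))
plot(AirPassengers, main = "Original scale", ylab = "Passengers")
plot(log(AirPassengers), main = "Log scale", ylab = "log(Passengers)")
par(mfrow = c(1, 1))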