8  Statistical Inference: Estimation

In this section we discuss how sample data can be used to estimate the values of population parameters.

Point estimation and interval estimation are the two forms of population parameter estimation based on sample data.

8.1 Point Estimation

Point estimation provides a single best guess for the population parameter.

The statistical properties of point estimators (unbiasedness, efficiency, consistency) are beyond the scope of this book.

| Population Parameter | Point Estimator |
|---|---|
| Population Mean | Sample Mean |
| Population Variance | Sample Variance |
| Population Proportion | Sample Proportion |

Note: Each point estimator provides the best single-value estimate of its corresponding population parameter, based on a random sample.
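
As a minimal sketch, the three point estimators in the table can be computed in R as shown below. The data values and the variable names (`tail_length`, `tagged`) are hypothetical and serve only as an illustration.

```r
# Hypothetical sample: tail lengths (cm) of 8 animals (illustrative values only)
tail_length <- c(61, 58, 64, 60, 59, 63, 62, 57)

# Hypothetical indicator: TRUE if the animal carries a tag
tagged <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)

# Sample mean: point estimate of the population mean
mean_hat <- mean(tail_length)

# Sample variance (n - 1 denominator): point estimate of the population variance
var_hat <- var(tail_length)

# Sample proportion: point estimate of the population proportion
prop_hat <- mean(tagged)

c(mean = mean_hat, variance = var_hat, proportion = prop_hat)
```

Note that `var()` divides by n - 1 rather than n, and that taking the mean of a logical vector gives the proportion of `TRUE` values.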

Question:

A zoologist collected data on the body weight (in kg) of 6 randomly selected adult cheetahs from a wildlife reserve.
The recorded weights are as follows:

42, 47, 39, 45, 44, 43

Additionally, out of these 6 cheetahs, 4 were identified as healthy based on veterinary examination.

Tasks:

  1. Estimate the population mean body weight of adult cheetahs.
  2. Estimate the population variance of body weight.
  3. Estimate the population proportion of healthy cheetahs.

8.2 Interval Estimation

While a point estimate gives a single best guess for a population parameter, it does not indicate how reliable that estimate is.
Interval estimation provides a range of plausible values within which the true population parameter is likely to lie.

The general form of a confidence interval (CI) is:

\text{Point Estimate} \ \pm\ \text{Margin of Error}

Confidence Interval for the Population Mean (\mu)

| Condition | Confidence Interval Formula | Distribution Used |
|---|---|---|
| Population standard deviation \sigma known | \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} | Standard Normal (z) |
| \sigma unknown, population normal | \bar{x} \pm t_{\alpha/2,\,df} \frac{s}{\sqrt{n}} | Student’s t-distribution (df = n-1) |
| \sigma unknown, large sample | \bar{x} \pm t_{\alpha/2,\,df} \frac{s}{\sqrt{n}} or \bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}} | Student’s t-distribution or Standard Normal distribution |
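
The formulas above can be sketched in R roughly as follows. The helper `mean_ci()` and the data in `x` are hypothetical and not part of any package; the function uses the z-based formula when a known \sigma is supplied and the t-based formula (df = n-1) otherwise.

```r
# Hypothetical helper: confidence interval for a population mean.
#   x          numeric sample
#   conf_level confidence level, e.g. 0.95
#   sigma      known population standard deviation (NULL if unknown)
mean_ci <- function(x, conf_level = 0.95, sigma = NULL) {
  n <- length(x)
  alpha <- 1 - conf_level
  if (!is.null(sigma)) {
    # sigma known: standard normal critical value
    crit <- qnorm(1 - alpha / 2)
    se <- sigma / sqrt(n)
  } else {
    # sigma unknown: t critical value with n - 1 degrees of freedom
    crit <- qt(1 - alpha / 2, df = n - 1)
    se <- sd(x) / sqrt(n)
  }
  # Point estimate plus and minus the margin of error
  mean(x) + c(-1, 1) * crit * se
}

# Illustrative data (hypothetical values)
x <- c(10.2, 9.8, 10.5, 10.1, 9.9, 10.3)

mean_ci(x)                # sigma unknown: t-based interval
mean_ci(x, sigma = 0.25)  # sigma assumed known: z-based interval

# For large samples the t and z critical values are close
c(t = qt(0.975, df = 99), z = qnorm(0.975))
```

The last line suggests why either distribution is acceptable in the large-sample row: as the degrees of freedom grow, the t critical value approaches the standard normal one.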

Confidence Interval for the Population Proportion (p)

| Condition | Confidence Interval Formula | Distribution Used |
|---|---|---|
| Large sample: n\hat{p} \ge 5 and n(1-\hat{p}) \ge 5 | \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} | Standard Normal |
| Small sample or extreme proportions | Use exact (Clopper–Pearson) or Wilson interval methods | Binomial-based |
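
A minimal sketch of these intervals in R is given below. The counts (`successes`, `trials`) are hypothetical. The large-sample interval is computed directly from the formula; for the alternatives, base R's `binom.test()` reports the exact (Clopper–Pearson) interval and `prop.test()` with `correct = FALSE` reports the Wilson score interval.

```r
# Hypothetical counts: 45 "successes" in 100 trials
successes <- 45
trials <- 100
p_hat <- successes / trials

# Check the large-sample condition for the normal approximation
large_sample_ok <- trials * p_hat >= 5 && trials * (1 - p_hat) >= 5

# Large-sample (normal approximation) 95% interval from the formula
z <- qnorm(0.975)
wald_ci <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / trials)

# Wilson score interval (no continuity correction)
wilson_ci <- prop.test(successes, trials, conf.level = 0.95,
                       correct = FALSE)$conf.int

# Exact (Clopper-Pearson) interval
exact_ci <- binom.test(successes, trials, conf.level = 0.95)$conf.int

large_sample_ok
wald_ci
wilson_ci
exact_ci
```

With a moderate sample and a proportion near 0.5, the three intervals should be close to one another; they tend to differ more when the sample is small or the proportion is near 0 or 1.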

Example:

A zoologist is studying the body length (in cm) of a rare frog species in a rainforest. She randomly captures 10 frogs and measures their lengths:

Sample data (body lengths in cm):

7.8, 8.2, 7.5, 8.0, 7.9, 8.1, 7.6, 8.3, 7.7, 8.0

Assume body lengths are normally distributed. Construct a 95% confidence interval (CI) for the population mean.
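
For reference, the calculation can be sketched by hand as follows. Since \sigma is unknown and the population is assumed normal, the t-based formula with df = n-1 = 9 applies. Intermediate values are rounded, and the critical value t_{0.025,\,9} \approx 2.262 is taken from a standard t-table.

```latex
\bar{x} = \frac{79.1}{10} = 7.91, \qquad s = \sqrt{\frac{0.609}{9}} \approx 0.2601

\frac{s}{\sqrt{n}} \approx \frac{0.2601}{\sqrt{10}} \approx 0.0823

\bar{x} \pm t_{0.025,\,9}\,\frac{s}{\sqrt{n}} \approx 7.91 \pm 2.262 \times 0.0823 \approx 7.91 \pm 0.186

\text{95\% CI} \approx (7.72,\ 8.10)
```

Here 79.1 is the sum of the ten observations and 0.609 is the sum of squared deviations from the sample mean.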

Interpretation:

We are 95% confident that the true mean body length of this frog species in the rainforest lies between 7.72 cm and 8.10 cm.

  • This does not mean that 95% of the frogs have lengths in this range. It refers to the population mean.

  • If we repeated this sampling many times, about 95% of the resulting confidence intervals would contain the true mean.
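
The repeated-sampling interpretation can be illustrated with a small simulation. This is only a sketch: the "true" mean and standard deviation (`true_mean`, `true_sd`) are made-up values chosen for the simulation, not estimates from the frog data.

```r
set.seed(1)  # for reproducibility

# Hypothetical population settings (assumed for the simulation only)
true_mean <- 8
true_sd <- 0.25
n <- 10        # sample size per study
n_rep <- 2000  # number of repeated samples

# For each repetition: draw a sample, build a 95% t-interval,
# and record whether it contains the true mean
covered <- replicate(n_rep, {
  x <- rnorm(n, mean = true_mean, sd = true_sd)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Proportion of intervals that contain the true mean
mean(covered)
```

The reported proportion should be close to 0.95. Each individual interval either contains the true mean or it does not; the 95% describes the long-run behaviour of the procedure.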

R code:

# Sample data (frog body lengths in cm)
frog_lengths <- c(7.8, 8.2, 7.5, 8.0, 7.9, 8.1, 7.6, 8.3, 7.7, 8.0)
# Compute 95% confidence interval using t-distribution
t_test_result <- t.test(frog_lengths, conf.level = 0.95)
# Display results
t_test_result
#> 
#>     One Sample t-test
#> 
#> data:  frog_lengths
#> t = 96.159, df = 9, p-value = 7.215e-15
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  7.723916 8.096084
#> sample estimates:
#> mean of x 
#>      7.91 
# Access CI directly
t_test_result$conf.int
#> [1] 7.723916 8.096084
#> attr(,"conf.level")
#> [1] 0.95

Question 1

A researcher measures the wing length (cm) of a random sample of 15 hummingbirds:

Data: 6.5, 6.7, 6.9, 6.4, 6.8, 6.6, 6.7, 6.5, 6.9, 6.8, 6.6, 6.7, 6.5, 6.8, 6.6

Construct a 95% confidence interval for the average wing length of hummingbirds. Assume that wing lengths are normally distributed.

Question 2

A zoologist studies a population of butterflies in a forest. She randomly captures 120 butterflies and finds that 78 of them have blue wings.

Estimate the population proportion of blue-winged butterflies with a 95% confidence interval. Assume the sample is random and large enough for the normal approximation.