STA 529 2.0 Data Mining
Week 1: June 29, 2024
Week 2: July 7, 2024
Data
library(mlbench)
data(BostonHousing)
Practical 1: Working with missing data
Practical 2: Classification
Week 3: July 14, 2024
Practical 3: Introduction to tidymodels
Extra reading: Click here
Week 4: August 11, 2024
Week 5: August 18, 2024
Week 6: August 25, 2024
Association rules:
Example 1: Lab 5.1
Example 2: Lab 5.2
Example 3:
library(arules)
data(Groceries)
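A minimal sketch of how rules might be mined from the Groceries transactions loaded above, using `apriori()` from arules. The support and confidence thresholds (0.01 and 0.5) are illustrative assumptions, not values prescribed by the lab.

```r
library(arules)

# Groceries ships with arules as a transactions object
data(Groceries)

# Mine rules; the thresholds below are assumed for illustration only
rules <- apriori(
  Groceries,
  parameter = list(support = 0.01, confidence = 0.5)
)

# Show the five rules with the highest lift
inspect(head(sort(rules, by = "lift"), 5))
```

Lowering `support` typically yields more rules, so it is worth adjusting both thresholds and comparing the results.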
Example 3: Lab 5.3
Lab 6:
This question is based on "house.datamining2023.csv". The dataset is saved on your local machine. It contains information on housing characteristics in various geographical locations.
The variable description is as follows:
longitude - The geographical coordinate specifying the east-west position of a location.
latitude - The geographical coordinate specifying the north-south position of a location.
housing_median_age - The median age of houses in a specific area.
total_rooms - The total number of rooms in all housing units in a specific area.
total_bedrooms - The total number of bedrooms in all households in a specific area.
population - The total population of a specific area.
households - The total number of households in a specific area.
median_income - The median income of households in a specific area.
median_house_value - The median value of houses in a specific area.
ocean_proximity - The proximity of the housing unit to the ocean, categorized into different classes.
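A first look at the data could be sketched as follows. The helper `inspect_housing()` and the file path are hypothetical (adjust the path to wherever the file is saved); the column names are taken from the variable description above.

```r
# Columns listed in the variable description
expected_cols <- c(
  "longitude", "latitude", "housing_median_age", "total_rooms",
  "total_bedrooms", "population", "households", "median_income",
  "median_house_value", "ocean_proximity"
)

# Hypothetical helper: checks the columns and summarises missing values
inspect_housing <- function(house) {
  absent <- setdiff(expected_cols, names(house))
  if (length(absent) > 0) {
    stop("Missing columns: ", paste(absent, collapse = ", "))
  }
  list(
    n_rows = nrow(house),
    n_missing = colSums(is.na(house[expected_cols])),
    proximity_counts = table(house$ocean_proximity, useNA = "ifany")
  )
}

# Assumed location of the file; only read it if it is actually there
path <- "house.datamining2023.csv"
if (file.exists(path)) {
  house <- read.csv(path)
  print(inspect_housing(house))
}
```

Checking the missing-value counts first helps decide whether imputation is needed before any modelling.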
Week 7: September 1, 2024
Week 8: September 15, 2024
Week 9: October 6, 2024
Mid-semester project presentations and discussion of errors
Week 10: October 13, 2024
Dataset: https://github.com/rfordatascience/tidytuesday/tree/master/data/2020/2020-02-11#data-dictionary
Develop a model to predict which actual hotel stays included children and/or babies
library(readr)
hotels <- read_csv("https://tidymodels.org/start/case-study/hotels.csv")
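One possible starting point for the prediction task is a plain logistic regression fitted with tidymodels. This is a sketch under assumptions: the helper `fit_children_model()` is hypothetical, the four predictors are chosen only for illustration, and the outcome `children` is assumed to take the values "children" and "none".

```r
library(tidymodels)

# Hypothetical helper: fits a logistic regression and reports test-set ROC AUC
fit_children_model <- function(hotels) {
  # The outcome must be a factor for classification
  hotels <- hotels %>%
    mutate(across(where(is.character), as.factor))

  # Stratified split so both sets keep a similar share of stays with children
  set.seed(123)
  hotel_split <- initial_split(hotels, prop = 0.75, strata = children)
  hotel_train <- training(hotel_split)
  hotel_test <- testing(hotel_split)

  # Illustrative predictor set (an assumption, not the course's choice)
  lr_wf <- workflow() %>%
    add_formula(children ~ hotel + lead_time + adults + average_daily_rate) %>%
    add_model(logistic_reg() %>% set_engine("glm"))

  lr_fit <- fit(lr_wf, data = hotel_train)

  # "children" is the first factor level, so it is treated as the event
  auc <- augment(lr_fit, new_data = hotel_test) %>%
    roc_auc(truth = children, .pred_children)

  list(fit = lr_fit, auc = auc$.estimate)
}
```

Calling `fit_children_model(hotels)` on the data frame read above returns the fitted workflow and the test-set ROC AUC. Stays with children are likely to be the minority class, so accuracy alone can be misleading and ROC AUC is a more informative first metric.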