Data mining: the process of discovering interesting patterns and knowledge from massive amounts of data.
What makes a pattern interesting?
Healthcare: Google Flu Trends (GFT) analysis project
Fraud Detection: Identifying fraudulent transactions by analyzing patterns
Retail: Understanding purchasing patterns to optimize product placement
Telecommunication: Identifying customers likely to leave and targeting retention efforts (churn prediction)
Education: Predicting student outcomes and identifying at-risk students (student performance analysis)
Data mining is the core step in the KDD process.
Data Preparation
Data Mining
Pattern/Model Evaluation
Knowledge Presentation
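As a rough illustration of how the four steps above chain together, here is a minimal sketch in Python. The function names (`prepare`, `mine`, `evaluate`, `present`, `kdd`) and the frequency threshold are hypothetical and chosen only for this example. Real KDD pipelines use far richer preparation and mining methods than simple counting.

```python
from collections import Counter

def prepare(raw):
    """Data preparation: drop missing/empty entries, trim spaces, lower-case."""
    return [r.strip().lower() for r in raw if r and r.strip()]

def mine(items):
    """Data mining: count how often each item occurs (a toy 'pattern')."""
    return Counter(items)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that meet a frequency threshold."""
    return {k: v for k, v in patterns.items() if v >= min_count}

def present(patterns):
    """Knowledge presentation: readable lines, most frequent first, ties by name."""
    ordered = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{name}: {count}" for name, count in ordered]

def kdd(raw, min_count=2):
    """Run the four steps in order on a list of raw item strings."""
    return present(evaluate(mine(prepare(raw)), min_count))

if __name__ == "__main__":
    purchases = ["Milk", " milk", None, "bread", "", "Eggs", "eggs "]
    for line in kdd(purchases):
        print(line)
```

Each step consumes the output of the previous one, so a step can be swapped (for example, a different mining method) without touching the others.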
Scaling data
Data reduction
Data discretization
Data aggregation
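To make three of the operations listed above concrete, here is a minimal sketch in plain Python of min-max scaling, equal-width discretization, and aggregation by group mean. These are only one common choice for each operation. The helper names and the sample values are hypothetical.

```python
from collections import defaultdict

def min_max_scale(values):
    """Scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Discretization: assign each value a 0-based equal-width bin index."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def aggregate_mean(records):
    """Aggregation: average the values of (key, value) pairs per key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, value in records:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

if __name__ == "__main__":
    ages = [20, 30, 40, 60]
    print(min_max_scale(ages))
    print(equal_width_bins(ages, 2))
    print(aggregate_mean([("north", 10), ("north", 20), ("south", 5)]))
```

Scaling keeps every value but changes its range, while discretization and aggregation trade detail for a more compact representation.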
Improves data quality
Masks sensitive data
Improves completeness of data
Time-consuming
Requires specialized skills and knowledge
Data loss
High cost
Structured, semi-structured, and unstructured data
Spatial, temporal, and spatio-temporal data
Stored vs streaming data
Data mining tasks are generally divided into two major categories:
Predictive tasks
Descriptive tasks
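The difference between the two categories can be sketched with one toy example of each. The predictive example uses a 1-nearest-neighbour classifier to guess a label for an unseen value. The descriptive example counts item pairs that co-occur in shopping baskets. Both techniques, the function names, and the data are assumptions chosen for illustration, not methods prescribed by these notes.

```python
from collections import Counter
from itertools import combinations

def predict_1nn(train, query):
    """Predictive: return the label of the training point closest to query.

    train is a list of (numeric_feature, label) pairs.
    """
    nearest = min(train, key=lambda pair: abs(pair[0] - query))
    return nearest[1]

def frequent_pairs(baskets, min_count):
    """Descriptive: item pairs that appear together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

if __name__ == "__main__":
    # Hypothetical data: support calls per month -> whether the customer left
    history = [(5, "stay"), (40, "churn")]
    print(predict_1nn(history, 30))

    baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["eggs"]]
    print(frequent_pairs(baskets, 2))
```

The predictive function outputs a value for a new, unlabeled case, while the descriptive function only summarizes regularities already present in the data.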
In-class demo