1 Introduction to Data Mining
1.1 What is Data Mining?
• The process of discovering interesting patterns and knowledge from huge amounts of data.
1.2 What do we mean by interesting patterns?
• Interesting patterns: Valid, Novel, Useful, Understandable
Example
• Retailers collect data about customer purchases at the checkout counters
• Customer purchasing patterns: identify which items are frequently sold together.
• Products that are likely to be purchased together.
Why is it useful?
• Retailers can make purchase suggestions to their customers.
• It suggests how to arrange items in a store as a strategy for boosting sales.
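The retail example above can be sketched in a few lines of Python. This is a minimal illustration, not a full frequent-itemset algorithm: the baskets, the helper names `pair_counts` and `frequent_pairs`, and the `min_support` threshold are all assumptions made up for this example. It counts how often each pair of items appears in the same basket and keeps the pairs found in at least a given fraction of baskets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout data: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for pairs bought together in at least
    a fraction `min_support` of all baskets."""
    n = len(baskets)
    return {
        pair: count / n
        for pair, count in pair_counts(baskets).items()
        if count / n >= min_support
    }


pairs = frequent_pairs(transactions, min_support=0.6)
for pair, support in sorted(pairs.items()):
    print(pair, support)
```

With this toy data, four pairs reach the 60% threshold, for example bread with milk and beer with diapers. A retailer could use such pairs for purchase suggestions or shelf placement.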
1.3 Characteristics of Big Data: 5 V’s of Big Data
Volume: size
Velocity: how quickly is the data generated?
Variety: diversity
Veracity: quality of data
Value: how useful?
1.4 What motivates the development of the data mining field?
• Scalability
• High dimensionality
• Heterogeneous and complex data
• Data ownership and distribution
1.5 Data Mining Tasks
Predictive tasks: Predict the value of a particular attribute based on the values of other attributes
Descriptive tasks: Find human-interpretable patterns that describe data
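The difference between the two kinds of task can be sketched on one small dataset. The records, column meanings, and function names below are hypothetical and chosen only for illustration. The predictive part fits a least-squares line so that one attribute (sales) can be estimated from another (advertising spend). The descriptive part summarizes the same data as average sales per region, which a person can read directly.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (region, advertising spend, weekly sales)
records = [
    ("north", 1.0, 3.0),
    ("north", 2.0, 5.0),
    ("south", 3.0, 7.0),
    ("south", 4.0, 9.0),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def mean_sales_by_region(rows):
    """Descriptive summary: average sales for each region."""
    groups = defaultdict(list)
    for region, _spend, sales in rows:
        groups[region].append(sales)
    return {region: mean(values) for region, values in groups.items()}


# Predictive task: estimate sales for an unseen spend value.
spend = [r[1] for r in records]
sales = [r[2] for r in records]
slope, intercept = fit_line(spend, sales)
predicted = intercept + slope * 5.0

# Descriptive task: a human-interpretable summary of the existing data.
summary = mean_sales_by_region(records)

print(predicted)
print(summary)
```

A straight line is only one possible predictive model; it is used here because it is short enough to write out in full.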
1.6 Data Quality
Range: How narrow or wide is the scope of the data?
Relevancy: Is the data relevant to the problem?
Recency: How recently was the data generated?
Robustness: Signal-to-noise ratio
Reliability: How accurate is the data?
1.7 Applications
Web mining: recommendation systems
Screening images: Early warning of ecological disasters
Marketing and sales
Diagnosis
Load forecasting
Decisions involving judgement
Many more…