Pattern Mining: Market Basket Analysis
Market Basket Analysis
Also called affinity analysis
A form of unsupervised learning (there is no target variable)
Frequent itemset mining: discovering which groups of products tend to be purchased together
Basic concepts
Transaction dataset
Itemset: a set of items, e.g., {i1, i2}
Suppose we have 100 items. Find the total number of itemsets.
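One way to work the exercise: each of the n items is either in or out of a given itemset, so there are 2^n subsets of the items. Excluding the empty set, 100 items give:

\[ 2^{100} - 1 \approx 1.27 \times 10^{30} \text{ itemsets} \]

Enumerating and counting every itemset is therefore infeasible even for a modest catalogue, which is the motivation for pruning the search as Apriori does.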
Association rule
\[ \text{Milk} \Rightarrow \text{Bread} \quad [\text{Support} = 2\%,\ \text{Confidence} = 60\%] \]
IF part (antecedent): Milk
THEN part (consequent): Bread
Support and confidence measure the strength of the association between the antecedent and consequent itemsets.
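For reference, these are the standard definitions for a rule X ⇒ Y over a dataset of N transactions, where X is the antecedent itemset and Y the consequent itemset (the symbols X, Y, and N are introduced here for illustration):

\[ \text{Support}(X \Rightarrow Y) = \frac{\text{number of transactions containing } X \cup Y}{N} \]

\[ \text{Confidence}(X \Rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)} \]

Read against the Milk ⇒ Bread example: support = 2% means 2% of all transactions contain both milk and bread, and confidence = 60% means 60% of the transactions that contain milk also contain bread.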
Apriori algorithm
Minimum support count: 2 (2/9 ≈ 22% of the 9 transactions)
Minimum confidence: 70%
Step 1:
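Convert the data into a binary incidence matrix: one row per transaction, one column per item, with 1 if the item appears in the transaction and 0 otherwise.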
Transaction dataset
| TID | Items |
|---|---|
| 1 | i1, i2, i5 |
| 2 | i2, i4 |
| 3 | i2, i3 |
| 4 | i1, i2, i4 |
| 5 | i1, i3 |
| 6 | i2, i3 |
| 7 | i1, i3 |
| 8 | i1, i2, i3, i5 |
| 9 | i1, i2, i3 |
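A minimal sketch of Step 1 in plain Python, assuming the transactions are held as sets keyed by TID (the variable names `transactions`, `items`, and `matrix` are illustrative, not from the course material). A pandas DataFrame would work equally well; plain lists keep the example dependency-free.

```python
# Transactions from the table above, keyed by TID
transactions = {
    1: {"i1", "i2", "i5"},
    2: {"i2", "i4"},
    3: {"i2", "i3"},
    4: {"i1", "i2", "i4"},
    5: {"i1", "i3"},
    6: {"i2", "i3"},
    7: {"i1", "i3"},
    8: {"i1", "i2", "i3", "i5"},
    9: {"i1", "i2", "i3"},
}

# Column order: every distinct item, sorted
items = sorted(set().union(*transactions.values()))

# One row per transaction: 1 if the item is in that basket, else 0
matrix = {
    tid: [int(item in basket) for item in items]
    for tid, basket in transactions.items()
}

print("TID", *items)
for tid, row in matrix.items():
    print(tid, *row)
```

Summing a column of the matrix gives that item's support count: i1 = 6, i2 = 7, i3 = 6, i4 = 2, i5 = 2.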
Step 2:
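Find the frequent itemsets: keep only itemsets whose support count is at least 2 (the minimum support count). Apriori builds them level by level: frequent k-itemsets are joined to form candidate (k+1)-itemsets, and any candidate with an infrequent subset is pruned before its support is counted.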
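The level-wise search in Step 2 can be sketched as below. This is an illustrative implementation, not a library API: `support_count` and `apriori` are hypothetical helper names, and the join step simply unions any two frequent (k-1)-itemsets whose union has k items, a simplification of the textbook prefix-based join. The prune step then discards every candidate with an infrequent (k-1)-subset, so the candidates that get counted are the same.

```python
from itertools import combinations

# The 9 transactions from the table, as sets of items
transactions = [
    {"i1", "i2", "i5"}, {"i2", "i4"}, {"i2", "i3"}, {"i1", "i2", "i4"},
    {"i1", "i3"}, {"i2", "i3"}, {"i1", "i3"}, {"i1", "i2", "i3", "i5"},
    {"i1", "i2", "i3"},
]
MIN_SUPPORT_COUNT = 2


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket)


def apriori(transactions, min_count):
    """Return {frozenset: support count} for every frequent itemset."""
    items = sorted(set().union(*transactions))
    # Level 1: frequent single items
    current = {}
    for item in items:
        count = support_count(frozenset([item]), transactions)
        if count >= min_count:
            current[frozenset([item])] = count
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join: union pairs of frequent (k-1)-itemsets that give exactly k items
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count the surviving candidates and keep the frequent ones
        current = {}
        for c in candidates:
            count = support_count(c, transactions)
            if count >= min_count:
                current[c] = count
        frequent.update(current)
        k += 1
    return frequent


frequent = apriori(transactions, MIN_SUPPORT_COUNT)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this dataset the search stops at size 3: the frequent 3-itemsets are {i1, i2, i3} and {i1, i2, i5} (support count 2 each), and the only 4-item candidate, {i1, i2, i3, i5}, is pruned because {i2, i3, i5} is not frequent.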
Step 3:
Generate association rules from the frequent itemsets: compute the confidence and lift of each rule, and keep rules whose confidence is at least 70%.
Confidence and Lift
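Lift compares a rule's confidence with how often the consequent occurs anyway:

\[ \text{Lift}(X \Rightarrow Y) = \frac{\text{Confidence}(X \Rightarrow Y)}{\text{Support}(Y)} \]

A lift above 1 means X and Y appear together more often than they would if they were independent. The sketch below applies Step 3 to the 9 transactions. To stay self-contained it finds the frequent itemsets by brute-force enumeration (feasible only because there are 5 items) rather than by Apriori; the names `count`, `frequent`, and `rules` are illustrative.

```python
from itertools import combinations

transactions = [
    {"i1", "i2", "i5"}, {"i2", "i4"}, {"i2", "i3"}, {"i1", "i2", "i4"},
    {"i1", "i3"}, {"i2", "i3"}, {"i1", "i3"}, {"i1", "i2", "i3", "i5"},
    {"i1", "i2", "i3"},
]
N = len(transactions)
MIN_SUPPORT_COUNT = 2
MIN_CONFIDENCE = 0.70


def count(itemset):
    """Support count: transactions containing every item in `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket)


# Brute-force search for frequent itemsets (only viable for tiny item sets)
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = frozenset(combo)
        c = count(s)
        if c >= MIN_SUPPORT_COUNT:
            frequent[s] = c

# Split each frequent itemset into antecedent X and consequent Y
rules = []
for itemset, n_xy in frequent.items():
    if len(itemset) < 2:
        continue
    for r in range(1, len(itemset)):
        for ante in combinations(sorted(itemset), r):
            X = frozenset(ante)
            Y = itemset - X
            # X and Y are subsets of a frequent itemset, so both are frequent too
            confidence = n_xy / frequent[X]
            lift = confidence / (frequent[Y] / N)
            if confidence >= MIN_CONFIDENCE:
                rules.append((sorted(X), sorted(Y), n_xy / N, confidence, lift))

for X, Y, sup, conf, lift in sorted(rules, key=lambda rule: -rule[4]):
    print(f"{X} => {Y}: support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

With a 70% threshold, six rules survive and all have confidence 1.0: i5 ⇒ {i1, i2} (lift 2.25), i5 ⇒ i1 and {i2, i5} ⇒ i1 (lift 1.5), and i4 ⇒ i2, i5 ⇒ i2, {i1, i5} ⇒ i2 (lift 9/7 ≈ 1.29). Rules such as i1 ⇒ i2 (confidence 4/6 ≈ 67%) fall just below the cut-off.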
In-class demonstration