Pattern Mining: Market Basket Analysis
Market Basket Analysis
Affinity analysis
Unsupervised learning
Frequent itemset mining: discover which groups of products tend to be purchased together.
Basic concepts
Transaction dataset
Itemset: a set of one or more items
Suppose we have 100 items. Find the total number of itemsets.
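As a worked answer to this exercise, and assuming the empty set is not counted as an itemset: each of the n items is either in or out of an itemset, so there are 2^n - 1 non-empty itemsets. The sketch below checks that formula by brute-force enumeration on five items and then evaluates it for n = 100. The function name `count_itemsets` is illustrative, not part of the course material.

```python
from itertools import combinations


def count_itemsets(n_items: int) -> int:
    """Number of non-empty itemsets over n_items distinct items (2^n - 1)."""
    return 2 ** n_items - 1


# Brute-force check on a small item universe: list every non-empty subset.
small_items = ["i1", "i2", "i3", "i4", "i5"]
enumerated = [
    subset
    for size in range(1, len(small_items) + 1)
    for subset in combinations(small_items, size)
]
assert len(enumerated) == count_itemsets(len(small_items))

# For 100 items the count is far too large to enumerate, which is why
# Apriori prunes the search space instead of testing every itemset.
print(count_itemsets(100))
```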
Association rule
\[ \text{Milk} \Rightarrow \text{Bread} \quad [\text{Support} = 2\%,\ \text{Confidence} = 60\%] \]
IF (Antecedent)
THEN (Consequent)
Support and confidence measure the strength of association between the antecedent and consequent itemsets.
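A minimal sketch of the two measures, using the usual definitions: support of an itemset is the fraction of transactions that contain it, and confidence of a rule A ⇒ B is the support count of A ∪ B divided by the support count of A. The nine transactions are the ones used in the Apriori example of these notes; the function names are illustrative assumptions.

```python
# The nine transactions from the Apriori example in these notes.
transactions = [
    {"i1", "i2", "i5"},
    {"i2", "i4"},
    {"i2", "i3"},
    {"i1", "i2", "i4"},
    {"i1", "i3"},
    {"i2", "i3"},
    {"i1", "i3"},
    {"i1", "i2", "i3", "i5"},
    {"i1", "i2", "i3"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for basket in transactions if itemset <= basket)


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent => consequent."""
    antecedent_count = support_count(antecedent, transactions)
    if antecedent_count == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    return support_count(antecedent | consequent, transactions) / antecedent_count


# Example rule: i5 => i1
print(support({"i1", "i5"}, transactions))
print(confidence({"i5"}, {"i1"}, transactions))
```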
Apriori algorithm
Minimum support count: 2 (2/9 ≈ 22%)
Minimum confidence: 70%
Step 1:
Translate data into binary incidence matrix format.
Transaction dataset
| TID | Items |
|---|---|
| 1 | i1, i2, i5 |
| 2 | i2, i4 |
| 3 | i2, i3 |
| 4 | i1, i2, i4 |
| 5 | i1, i3 |
| 6 | i2, i3 |
| 7 | i1, i3 |
| 8 | i1, i2, i3, i5 |
| 9 | i1, i2, i3 |
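Step 1 could be sketched as follows: each transaction becomes a row and each item a column, with 1 if the item is in the basket and 0 otherwise. This uses only plain Python; the variable names are assumptions made for illustration.

```python
# Transaction dataset keyed by TID, copied from the table above.
transactions = {
    1: ["i1", "i2", "i5"],
    2: ["i2", "i4"],
    3: ["i2", "i3"],
    4: ["i1", "i2", "i4"],
    5: ["i1", "i3"],
    6: ["i2", "i3"],
    7: ["i1", "i3"],
    8: ["i1", "i2", "i3", "i5"],
    9: ["i1", "i2", "i3"],
}

# Column order: every distinct item, sorted by name.
items = sorted({item for basket in transactions.values() for item in basket})

# One 0/1 row per transaction: 1 if the item occurs in the basket.
matrix = {
    tid: [1 if item in basket else 0 for item in items]
    for tid, basket in transactions.items()
}

# Column sums are the support counts of the single items.
item_counts = [sum(row[col] for row in matrix.values()) for col in range(len(items))]

print("TID", *items)
for tid, row in matrix.items():
    print(f"{tid:>3}", *(f"{cell:>2}" for cell in row))
print("sum", *(f"{count:>2}" for count in item_counts))
```

The column sums give the support count of each single item, which is the input to the next step.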
Step 2:
Select the itemsets whose support count is at least the minimum support count of 2.
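A minimal sketch of the level-wise search behind this step, assuming the standard Apriori join-and-prune scheme: frequent k-itemsets are joined into (k+1)-item candidates, and a candidate is kept only if all of its k-item subsets are frequent. The function name `apriori` and its signature are illustrative, not a library API.

```python
from itertools import combinations

transactions = [
    {"i1", "i2", "i5"},
    {"i2", "i4"},
    {"i2", "i3"},
    {"i1", "i2", "i4"},
    {"i1", "i3"},
    {"i2", "i3"},
    {"i1", "i3"},
    {"i1", "i2", "i3", "i5"},
    {"i1", "i2", "i3"},
]


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    items = sorted({item for basket in transactions for item in basket})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Count each candidate and keep those meeting the threshold.
        level = {}
        for candidate in candidates:
            count = sum(1 for basket in transactions if candidate <= basket)
            if count >= min_count:
                level[candidate] = count
        frequent.update(level)

        # Join: unions of two frequent itemsets that are exactly one item larger.
        size += 1
        joined = {a | b for a in level for b in level if len(a | b) == size}

        # Prune: every (size - 1)-subset of a candidate must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        ]
    return frequent


frequent_itemsets = apriori(transactions, min_count=2)
for itemset, count in sorted(frequent_itemsets.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On the nine transactions above this yields five frequent 1-itemsets, six frequent 2-itemsets and two frequent 3-itemsets ({i1, i2, i3} and {i1, i2, i5}).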
Step 3:
Generate association rules: compute confidence and lift
Confidence and Lift
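A sketch of rule generation on the same nine transactions, using the usual definitions: confidence(A ⇒ B) = count(A ∪ B) / count(A), and lift(A ⇒ B) = confidence(A ⇒ B) / support(B), where a lift above 1 suggests A and B occur together more often than independence would predict. To keep the example self-contained, the frequent itemsets are found here by brute-force enumeration rather than by Apriori, which is only reasonable because there are just five items. All names are illustrative.

```python
from itertools import combinations

transactions = [
    {"i1", "i2", "i5"},
    {"i2", "i4"},
    {"i2", "i3"},
    {"i1", "i2", "i4"},
    {"i1", "i3"},
    {"i2", "i3"},
    {"i1", "i3"},
    {"i1", "i2", "i3", "i5"},
    {"i1", "i2", "i3"},
]
N = len(transactions)
MIN_COUNT = 2      # minimum support count from the notes
MIN_CONF = 0.70    # minimum confidence from the notes


def count(itemset):
    """Support count: transactions containing every item of the itemset."""
    return sum(1 for basket in transactions if itemset <= basket)


items = sorted(set().union(*transactions))

# Brute-force shortcut (assumption): test every itemset with at least 2 items.
frequent = [
    frozenset(combo)
    for size in range(2, len(items) + 1)
    for combo in combinations(items, size)
    if count(frozenset(combo)) >= MIN_COUNT
]

rules = []
for itemset in frequent:
    # Every non-empty proper subset can serve as the antecedent.
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            antecedent = frozenset(combo)
            consequent = itemset - antecedent
            conf = count(itemset) / count(antecedent)
            lift = conf / (count(consequent) / N)
            if conf >= MIN_CONF:
                rules.append((antecedent, consequent, conf, lift))

for antecedent, consequent, conf, lift in rules:
    print(f"{sorted(antecedent)} => {sorted(consequent)}  "
          f"confidence={conf:.2f}  lift={lift:.2f}")
```

With these thresholds, six rules pass the 70% confidence cut, all with i4 or i5 in the antecedent. Rules such as i1 ⇒ i2 (confidence 4/6 ≈ 67%) fall just short.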
In-class demonstration