An end-to-end distributed pipeline, built on Apache Spark, that takes 3.2M+ Instacart transactions from raw CSV to association rules. Benchmarks run at several dataset sizes demonstrate how the pipeline scales.
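The first stage turns line-level order data into one basket per order. Below is a minimal single-machine sketch. It assumes a simplified CSV with hypothetical `order_id` and `product_name` columns. The public Instacart Market Basket release keys line items by `product_id` and keeps product names in a separate products file, so a join would come first. In Spark, the same grouping would typically be a `groupBy` on the order ID with `collect_set`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, simplified input: one row per (order, product) pair.
RAW_CSV = """order_id,product_name
1,Banana
1,Organic Avocado
1,Limes
2,Banana
2,Banana
3,Organic Avocado
3,Limes
"""


def build_baskets(csv_text: str) -> dict[str, frozenset[str]]:
    """Group CSV rows into one basket (a set of distinct products) per order."""
    baskets: defaultdict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Sets drop repeated products within an order: itemset mining only
        # cares whether an item is present in a basket, not how many times.
        baskets[row["order_id"]].add(row["product_name"].strip())
    return {order_id: frozenset(items) for order_id, items in baskets.items()}


if __name__ == "__main__":
    for order_id, items in build_baskets(RAW_CSV).items():
        print(order_id, sorted(items))
```

Deduplicating within a basket matters in practice. Frequent-pattern miners such as Spark's FP-Growth expect each transaction to contain unique items.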
| Metric | Value |
|---|---|
| Transactions processed | 3.2M+ |
| Frequent itemsets | 797 |
| Association rules | 436 |
| Max lift score | 73.8× |
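The frequent-itemset and rule counts above depend on the minimum support and confidence thresholds, which this page does not list. Spark MLlib's built-in miner for this task is FP-Growth (`pyspark.ml.fpm.FPGrowth`). To show what "frequent itemset" means, the sketch below swaps in a simpler Apriori-style level-wise search in plain Python, run on toy baskets that are illustrative only. An itemset counts as frequent when its support (the fraction of baskets containing it) meets `min_support`.

```python
from itertools import combinations

# Toy baskets (illustrative only, not project data).
TOY_BASKETS = [
    frozenset({"Banana", "Organic Avocado", "Limes"}),
    frozenset({"Banana", "Organic Avocado"}),
    frozenset({"Banana", "Limes"}),
    frozenset({"Organic Avocado", "Limes"}),
    frozenset({"Banana"}),
]


def frequent_itemsets(
    baskets: list[frozenset[str]], min_support: float
) -> dict[frozenset[str], float]:
    """Apriori-style level-wise search returning {itemset: support}."""
    n = len(baskets)
    result: dict[frozenset[str], float] = {}
    # Level-1 candidates: every distinct single item.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    size = 1
    while candidates:
        # Keep candidates whose support (fraction of baskets containing
        # them) meets the threshold.
        level: dict[frozenset[str], float] = {}
        for cand in candidates:
            support = sum(1 for b in baskets if cand <= b) / n
            if support >= min_support:
                level[cand] = support
        result.update(level)
        # Join frequent itemsets of this size into candidates one item larger,
        # pruning any candidate with an infrequent subset (Apriori property).
        size += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in level for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
    return result


if __name__ == "__main__":
    found = frequent_itemsets(TOY_BASKETS, min_support=0.4)
    for itemset, support in sorted(found.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        print(sorted(itemset), support)
```

Apriori rescans every basket at each level, which gets expensive at millions of transactions. FP-Growth avoids explicit candidate generation by compressing the baskets into a prefix tree (FP-tree), which is why it is the usual choice at this scale.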
| Antecedent | Consequent | Confidence | Lift | Support |
|---|---|---|---|---|
| Total 2% Lowfat Greek Yogurt (Blueberry) | Total 2% Greek Yogurt (Strawberry) | 45.8% | 48.8× | 0.003 |
| Non Fat Raspberry Yogurt | Icelandic Skyr Blueberry Yogurt | 44.2% | 73.8× | 0.0023 |
| Apple Honeycrisp Organic + Org. Hass Avocado | Bag of Organic Bananas | 44.2% | 3.8× | 0.0021 |
| Cucumber Kirby + Organic Avocado | Banana | 41.8% | 2.8× | 0.002 |
| Organic Raspberries + Org. Hass Avocado | Bag of Organic Bananas | 43.3% | 3.7× | 0.0034 |
| Boneless Skinless Chicken Breasts | Banana | 28.8% | 1.95× | 0.0045 |
| Green Bell Pepper | Organic Baby Spinach | 16.8% | 2.23× | 0.0029 |
| Limes + Banana | Organic Avocado | 23.1% | 4.2× | 0.0023 |
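Each row's metrics follow the standard definitions, assuming the Support column is the rule's support. Support is the fraction of orders containing both antecedent and consequent. Confidence is support(A ∪ C) / support(A). Lift is confidence / support(C), so a lift above 1 means the consequent is bought more often alongside the antecedent than on its own. The sketch below computes all three from toy baskets; the basket data and the `support` and `rule_metrics` helpers are illustrative, not taken from this project.

```python
# Toy baskets (illustrative only, not project data).
BASKETS = [
    frozenset({"Limes", "Banana", "Organic Avocado"}),
    frozenset({"Limes", "Organic Avocado"}),
    frozenset({"Banana"}),
    frozenset({"Banana"}),
]


def support(baskets: list[frozenset[str]], items: frozenset[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)


def rule_metrics(
    baskets: list[frozenset[str]],
    antecedent: frozenset[str],
    consequent: frozenset[str],
) -> dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_a = support(baskets, antecedent)
    s_c = support(baskets, consequent)
    if s_a == 0 or s_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    s_ac = support(baskets, antecedent | consequent)
    confidence = s_ac / s_a  # P(consequent | antecedent)
    lift = confidence / s_c  # confidence relative to the consequent's base rate
    return {"support": s_ac, "confidence": confidence, "lift": lift}


if __name__ == "__main__":
    print(rule_metrics(BASKETS, frozenset({"Limes"}), frozenset({"Organic Avocado"})))
    # → {'support': 0.5, 'confidence': 1.0, 'lift': 2.0}
```

The same relation explains the very high lifts in the table. If the table uses these definitions, row 2's 44.2% confidence and 73.8× lift imply that Icelandic Skyr Blueberry Yogurt appears in only about 0.442 / 73.8 ≈ 0.0060 of orders. A moderate confidence on a rare consequent produces a large lift.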