Implementing the Apriori Algorithm in Python
The Apriori algorithm is a machine learning algorithm for association rule mining, most often used for market basket analysis. It finds associations between items in large transactional datasets. A common real-world application is product recommendation, where items are suggested to users based on the contents of their shopping cart. Companies like Walmart have used this algorithm to improve product suggestions and drive sales.
In this article, we’ll walk through a step-by-step implementation of the Apriori algorithm in Python using the mlxtend library.
Step 1: Importing Required Libraries
Before we begin, we import the necessary Python libraries: pandas, NumPy, and mlxtend.
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
Step 2: Loading and exploring the data
We start by loading a popular groceries dataset. Each record holds a customer ID (Member_number), a transaction date (Date), and the purchased item (itemDescription). You can download the dataset from here.
import pandas as pd
df = pd.read_csv("/content/Groceries_dataset.csv")
print(df.head())
- Each row represents one item in a customer's basket on a given date.
- To use the Apriori algorithm we must convert this into full transactions per customer per visit.
Step 3: Group Items by Transaction
We group items purchased together by the same customer on the same day to form one transaction.
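To make the grouping concrete, here is a minimal plain-Python sketch on a few made-up rows in the same long format. The rows and the names `rows`, `baskets` and `toy_transactions` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical toy rows in long format: (member number, date, item).
rows = [
    (1001, "01-01-2015", "whole milk"),
    (1001, "01-01-2015", "rolls/buns"),
    (1001, "05-01-2015", "soda"),
    (1002, "01-01-2015", "whole milk"),
    (1002, "01-01-2015", "yogurt"),
]

# Items that share the same (member, date) key belong to one basket.
baskets = defaultdict(list)
for member, date, item in rows:
    baskets[(member, date)].append(item)

# One list of items per transaction.
toy_transactions = list(baskets.values())
print(toy_transactions)
```

On the real dataset, pandas performs the same grouping: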
basket = df.groupby(['Member_number', 'Date'])['itemDescription'].apply(list).reset_index()
transactions = basket['itemDescription'].tolist()
print(transactions[:5])  # preview the first five transactions only
Step 4: Convert to One-Hot Format
Apriori needs the data in a True/False (one-hot) format that answers, for every basket and every item, "Did the item appear in the basket?" We use mlxtend's TransactionEncoder for this:
from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_array = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(te_array, columns=te.columns_)
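As a rough picture of what the encoded table contains, the sketch below builds the same kind of True/False matrix by hand for three made-up baskets. It illustrates the idea only and is not how TransactionEncoder is implemented; the names `toy_transactions`, `columns` and `encoded` are hypothetical.

```python
# Hypothetical toy baskets (not taken from the groceries dataset).
toy_transactions = [
    ["whole milk", "rolls/buns"],
    ["soda"],
    ["whole milk", "yogurt"],
]

# One column per distinct item, in sorted order.
columns = sorted({item for basket in toy_transactions for item in basket})

# One row per basket: True where the basket contains the column's item.
encoded = [[item in basket for item in columns] for basket in toy_transactions]

print(columns)
for row in encoded:
    print(row)
```

Each row is one transaction and each column is one item, which is the shape the frequent-itemset search expects.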
Step 5: Run Apriori Algorithm
Now we find frequent itemsets: combinations of items that often occur together. Here min_support=0.01 keeps only itemsets that appear in at least 1% of transactions.
from mlxtend.frequent_patterns import apriori
frequent_itemsets = apriori(df_encoded, min_support=0.01, use_colnames=True)
print("Total Frequent Itemsets:", frequent_itemsets.shape[0])
Output:
Total Frequent Itemsets: 69
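For intuition about the search itself, here is a minimal plain-Python sketch of the level-wise Apriori idea: count single items, join the frequent ones into larger candidates, and prune any candidate that has an infrequent subset. It is a teaching sketch under simplifying assumptions, not mlxtend's implementation; the function name `apriori_sketch` and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    current = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count the survivors and keep those that meet the threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets.
toy = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["milk", "soda"],
    ["bread", "butter", "milk"],
]
frequent = apriori_sketch(toy, min_support=0.5)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])
```

In this toy run, {bread, butter, milk} is never counted: its subset {butter, milk} falls below the threshold, so the candidate is pruned. That pruning is what keeps the search manageable on large datasets.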
Step 6: Generate Association Rules
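Now we derive rules such as "If bread and butter are bought, milk is also likely to be bought." Each rule A → B is described by three metrics:
- Support: Fraction of transactions that contain both A and B.
- Confidence: Estimated probability of buying B given that A is bought, computed as support(A and B) / support(A).
- Lift: Confidence divided by the support of B. A lift above 1 means A and B are bought together more often than expected if they were independent.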
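The three metrics can be checked by hand on a tiny example. The sketch below computes them for the rule bread → butter over five made-up baskets; the data and names (`toy`, `support`, `sup_rule`) are hypothetical, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical toy baskets.
toy = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "soda"},
    {"bread", "butter", "milk"},
]


def support(itemset):
    # Fraction of baskets that contain every item of the itemset.
    return Fraction(sum(itemset <= basket for basket in toy), len(toy))


antecedent = {"bread"}
consequent = {"butter"}

sup_rule = support(antecedent | consequent)   # 3 of 5 baskets
confidence = sup_rule / support(antecedent)   # (3/5) / (4/5)
lift = confidence / support(consequent)       # (3/4) / (3/5)

print("support   :", float(sup_rule))     # 0.6
print("confidence:", float(confidence))   # 0.75
print("lift      :", float(lift))         # 1.25
```

A lift of 1.25 says that butter shows up in baskets with bread 25% more often than it does across all baskets. On the real data, mlxtend computes the same quantities for every rule: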
from mlxtend.frequent_patterns import association_rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.1)
rules = rules[rules['antecedents'].apply(lambda x: len(x) >= 1) & rules['consequents'].apply(lambda x: len(x) >= 1)]
print("Association Rules:", rules.shape[0])
rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head(5)
Step 7: Visualize the Most Popular Items
Let’s see which items are most frequently bought:
import matplotlib.pyplot as plt
top_items = df['itemDescription'].value_counts().head(10)
top_items.plot(kind='bar', title='Top 10 Most Purchased Items')
plt.xlabel("Item")
plt.ylabel("Count")
plt.show()
The resulting bar chart shows that whole milk is the most frequently bought item, followed by other vegetables, rolls/buns, and soda.