Implementing Apriori algorithm in Python

Last Updated : 26 May, 2025

The Apriori algorithm is an association rule mining algorithm commonly used for market basket analysis. It finds associations between items in large transactional datasets by identifying itemsets that frequently occur together. A common real-world application is product recommendation, where items are suggested to users based on their shopping cart contents. Companies like Walmart have used this algorithm to improve product suggestions and drive sales.

In this article we'll walk through a step-by-step implementation of the Apriori algorithm in Python using the mlxtend library.

Step 1: Importing Required Libraries

Before we begin, we need to import the necessary Python libraries: pandas, NumPy and mlxtend.

Python
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

Step 2: Loading and Exploring the Data

We start by loading a popular groceries dataset. This dataset contains customer transactions with details like customer ID, transaction date and the item purchased. You can download the dataset from here.

Python
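import pandas as pd
# Adjust the path to wherever the CSV is stored (/content/ is the default Colab directory)
df = pd.read_csv("/content/Groceries_dataset.csv")
# Preview the first five rows
print(df.head())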

Output:

[Figure: Dataset (output of df.head())]
  • Each row represents one item in a customer's basket on a given date.
  • To use the Apriori algorithm we must convert this into full transactions per customer per visit.

Step 3: Group Items by Transaction

We group items purchased together by the same customer on the same day to form one transaction.

Python
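# Collect all items bought by the same member on the same date into one list
basket = df.groupby(['Member_number', 'Date'])['itemDescription'].apply(list).reset_index()
# Keep only the item lists: one list per transaction
transactions = basket['itemDescription'].tolist()
# Print the first five transactions rather than the whole list
print(transactions[:5])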

Output:

[Figure: Group items]
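
To make the grouping step concrete, here is a minimal plain-Python sketch of the same idea on a few made-up rows. The rows and the names `rows`, `baskets` and `toy_transactions` are illustrative assumptions, not part of the groceries dataset.

```python
from collections import defaultdict

# Hypothetical toy rows: (Member_number, Date, itemDescription)
rows = [
    (1000, "15-03-2015", "whole milk"),
    (1000, "15-03-2015", "yogurt"),
    (1000, "24-06-2014", "pastry"),
    (1001, "15-03-2015", "soda"),
    (1001, "15-03-2015", "whole milk"),
]

# Collect items under a (member, date) key: one list per shopping visit
baskets = defaultdict(list)
for member, date, item in rows:
    baskets[(member, date)].append(item)

toy_transactions = list(baskets.values())
print(toy_transactions)
# [['whole milk', 'yogurt'], ['pastry'], ['soda', 'whole milk']]
```

Member 1000 ends up with two separate transactions because the two visits fall on different dates.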

Step 4: Convert to One-Hot Format

Apriori needs the data in a True/False (one-hot) format, where each column answers the question "Did this item appear in the basket?". We use mlxtend's TransactionEncoder for this:

Python
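from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
# Learn the set of distinct items, then build a boolean array (rows = transactions, columns = items)
te_array = te.fit(transactions).transform(transactions)
# Wrap the array in a DataFrame with one named column per item
df_encoded = pd.DataFrame(te_array, columns=te.columns_)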
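
As a rough illustration of what the encoded table looks like, the following plain-Python sketch builds a True/False matrix by hand for three made-up baskets. It is a simplified stand-in for the encoder, not mlxtend's implementation, and the names `toy_transactions`, `columns` and `encoded` are assumptions for this example.

```python
# Hypothetical toy baskets
toy_transactions = [
    ["whole milk", "yogurt"],
    ["pastry"],
    ["soda", "whole milk"],
]

# One column per distinct item, in sorted order
columns = sorted({item for basket in toy_transactions for item in basket})

# One row per transaction: True where the item is in the basket
encoded = [[item in basket for item in columns] for basket in toy_transactions]

print(columns)
for row in encoded:
    print(row)
# ['pastry', 'soda', 'whole milk', 'yogurt']
# [False, False, True, True]
# [True, False, False, False]
# [False, True, True, False]
```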

Step 5: Run Apriori Algorithm

Now we find frequent itemsets, that is, combinations of items that often occur together. Here min_support=0.01 keeps only itemsets that appear in at least 1% of transactions, and use_colnames=True reports itemsets by item name rather than by column index.

Python
from mlxtend.frequent_patterns import apriori
frequent_itemsets = apriori(df_encoded, min_support=0.01, use_colnames=True)
print("Total Frequent Itemsets:", frequent_itemsets.shape[0])

Output:

Total Frequent Itemsets: 69
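
To show what the support threshold and the level-wise search do, here is a minimal pure-Python sketch of the Apriori idea on five made-up baskets. It is a teaching sketch under stated assumptions, not mlxtend's implementation. The helper names `support` and `toy_apriori` and the toy data are hypothetical.

```python
from itertools import combinations

# Hypothetical toy baskets
toy_transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk", "soda"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def toy_apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets (level-wise search)."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Keep only candidates that meet the support threshold
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Build (k+1)-item candidates from the frequent k-itemsets
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-item subset must itself be frequent
        current = [
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent

result = toy_apriori(toy_transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a threshold of 0.5, the three single items bread, butter and milk and all three pairs among them are frequent. Soda appears in only one basket out of five, and the triple appears in only two, so both are dropped.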

Step 6: Generate Association Rules

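Now we find rules like "If bread and butter are bought, milk is also likely to be bought." In the code, metric="confidence" with min_threshold=0.1 keeps only rules whose confidence is at least 0.1. Each rule is described by three measures: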

  • Support: The fraction of transactions that contain all items in the rule.
  • Confidence: The probability of buying item B given that item A is bought.
  • Lift: How much more often A and B occur together than expected if they were independent. A lift above 1 indicates a positive association.

Python
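from mlxtend.frequent_patterns import association_rules
# Keep rules whose confidence is at least 0.1
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.1)
# Safeguard: keep only rules whose antecedent and consequent are both non-empty
rules = rules[rules['antecedents'].apply(lambda x: len(x) >= 1) & rules['consequents'].apply(lambda x: len(x) >= 1)]
print("Association Rules:", rules.shape[0])
# Show the key metrics for the first five rules (displays in a notebook cell)
rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head(5)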

Output:

[Figure: Association rules]
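
The three metrics can also be worked out by hand. The sketch below does so for the hypothetical rule bread → butter on five made-up baskets, using exact fractions to avoid rounding noise. The data and the `support` helper are assumptions for illustration only.

```python
from fractions import Fraction

# Hypothetical toy baskets
toy_transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk", "soda"},
]

def support(itemset):
    """Exact fraction of toy transactions containing every item in itemset."""
    hits = sum(1 for t in toy_transactions if itemset <= t)
    return Fraction(hits, len(toy_transactions))

antecedent = {"bread"}
consequent = {"butter"}

rule_support = support(antecedent | consequent)   # P(A and B)
confidence = rule_support / support(antecedent)   # P(B | A)
lift = confidence / support(consequent)           # P(B | A) / P(B)

print("support:", float(rule_support))   # 0.6
print("confidence:", float(confidence))  # 0.75
print("lift:", float(lift))              # 0.9375
```

Here the lift is slightly below 1: butter is bought in 80% of all baskets but in only 75% of the baskets that contain bread, so in this toy data bread does not make butter more likely.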

Step 7: Visualize the Most Popular Items

Let’s see which items are most frequently bought:

Python
import matplotlib.pyplot as plt
top_items = df['itemDescription'].value_counts().head(10)
top_items.plot(kind='bar', title='Top 10 Most Purchased Items')
plt.xlabel("Item")
plt.ylabel("Count")
plt.show()

Output:

[Figure: Most Purchased Items]

As the chart shows, whole milk is the most frequently bought item, followed by other vegetables, rolls/buns and soda.

