
How to Perform a Two-Way ANOVA in Python

Last Updated : 05 Apr, 2025

Two-way ANOVA (Analysis of Variance) is a statistical test used to determine how two categorical factors affect the mean of a continuous response variable. It answers three questions: whether the first factor affects the response, whether the second factor affects the response, and whether the two factors interact, that is, whether the effect of one factor depends on the level of the other.
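
In the usual textbook notation, a two-way ANOVA with interaction can be sketched as the linear model below. The symbols are not from this article and are introduced only for illustration.

```latex
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
```

Here Y_{ijk} is the response of the k-th observation at level i of the first factor and level j of the second, \mu is the overall mean, \alpha_i and \beta_j are the main effects, (\alpha\beta)_{ij} is the interaction effect and \varepsilon_{ijk} is the random error. The three F-tests in the ANOVA table correspond to the null hypotheses that all \alpha_i, all \beta_j and all (\alpha\beta)_{ij} are zero.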

Implementing Two-Way ANOVA in Python

Let us consider an example in which scientists want to know whether plant growth is affected by fertilizing frequency and watering frequency. They planted 30 plants and allowed them to grow for six months under different fertilizing and watering frequencies. After six months, they recorded the height of each plant in centimeters. Below is the step-by-step implementation:

Step 1: Import libraries.

First, we import numpy, pandas and statsmodels.

Python
import numpy as np 
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

Step 2: Loading data.

Let us create a pandas DataFrame that consists of the following three variables:

  • Fertilizer: how frequently each plant was fertilized, either daily or weekly.
  • Watering: how frequently each plant was watered, either daily or weekly.
  • height: the height of each plant (in centimeters) after six months.
Python
dataframe = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                          'Watering': np.repeat(['daily', 'weekly'], 15),
                          'height': [14, 16, 15, 15, 16, 13, 12, 11, 14, 
                                     15, 16, 16, 17, 18, 14, 13, 14, 14, 
                                     14, 15, 16, 16, 17, 18, 14, 13, 14, 
                                     14, 14, 15]})
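
Before fitting a model, it can help to count how many plants fall into each combination of the two factors. The following is a minimal sketch using pd.crosstab on the DataFrame exactly as built above; the variable name design is an assumption made for this illustration.

```python
import numpy as np
import pandas as pd

heights = [14, 16, 15, 15, 16, 13, 12, 11, 14,
           15, 16, 16, 17, 18, 14, 13, 14, 14,
           14, 15, 16, 16, 17, 18, 14, 13, 14,
           14, 14, 15]

# The DataFrame exactly as built in Step 2.
dataframe = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                          'Watering': np.repeat(['daily', 'weekly'], 15),
                          'height': heights})

# Count the plants in each Fertilizer x Watering cell.
# `design` is a hypothetical name used only in this sketch.
design = pd.crosstab(dataframe['Fertilizer'], dataframe['Watering'])
print(design)
```

With this data, the table has 15 plants in the daily/daily cell, 15 in the weekly/weekly cell and none in the two mixed cells, because both columns are built with the same np.repeat call. A two-way ANOVA can only separate the two factors when every combination of levels is observed.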

Step 3: Conducting two-way ANOVA.

To perform a two-way ANOVA, the statsmodels library provides the anova_lm() function.

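  • sm.stats.anova_lm(model, typ=2): Here model is the fitted model results object and typ selects the type of ANOVA test (sums of squares) to perform, that is 1, 2 or 3. Note that the keyword is typ, not type; an unrecognized keyword such as type is ignored and the default typ=1 is used instead.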
Python
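# Fit a linear model with both main effects and their interaction.
model = ols('height ~ C(Fertilizer) + C(Watering) + C(Fertilizer):C(Watering)', data=dataframe).fit()
anova_result = sm.stats.anova_lm(model, typ=2)  # the keyword is `typ`, not `type`
print(anova_result)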

Output:

[ANOVA table with rows C(Fertilizer), C(Watering), C(Fertilizer):C(Watering) and Residual]

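The p-values in the output show that none of the terms has a statistically significant effect on plant height. The p-values for Fertilizer (0.913305), Watering (0.990865) and the interaction between Fertilizer and Watering (0.904053) are all greater than 0.05, so we fail to reject the null hypothesis for each term. The Residual row represents the variance that the model does not explain. Its 28 degrees of freedom equal the number of observations (30) minus the number of independent parameters the model could estimate (2); they are not a count of residual observations. Only two parameters could be estimated because Fertilizer and Watering take identical values in every row of this DataFrame, so the two factors are perfectly confounded and their separate effects cannot be told apart from this data. The individual p-values should be read with that caveat.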

