Analyzing Selling Price of Used Cars Using Python

Last Updated : 28 Mar, 2025

Analyzing the selling price of used cars is essential for making informed decisions in the automotive market. Using Python, we can efficiently process and visualize data to uncover key factors influencing car prices. This analysis not only aids buyers and sellers but also enables predictive modeling for future price estimation. This article will explore how to analyze the selling price of used cars using Python.

Step 1: Understanding the Dataset

The dataset contains various attributes of used cars, including price, make, body style, engine size, horsepower and more. Our goal is to analyze these factors and determine their impact on selling price. To download the file used in this example, click here.

Problem Statement: Our friend Otis wants to sell his car but isn't sure about the price. He wants to maximize profit while ensuring a reasonable deal for buyers. To help Otis we will analyze the dataset and determine the factors affecting car prices.

Step 2: Converting `.data` File to `.csv`

If the dataset is in .data format, follow these steps to convert it to .csv:

Open MS Excel.
Go to Data > From Text and select the .data file.
Select Comma as the delimiter.
Save the file as .csv.
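The same conversion can also be scripted rather than done by hand. The sketch below is an alternative under stated assumptions, not the method used for the original file: it assumes pandas is installed (see Step 3) and that the `.data` file is comma-separated with no header row. The function name `convert_data_to_csv` and the sample file contents are hypothetical.

```python
import os
import tempfile

import pandas as pd


def convert_data_to_csv(data_path, csv_path):
    """Read a comma-separated, header-less .data file and save it as .csv."""
    # header=None keeps the first row as data instead of using it as column names
    frame = pd.read_csv(data_path, header=None)
    # to_csv defaults write a header row (0, 1, 2, ...) and a leading index column
    frame.to_csv(csv_path)
    return frame


# Demo on a tiny throwaway file (made-up rows, not the real dataset)
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "sample.data")
dst = os.path.join(tmp_dir, "sample.csv")
with open(src, "w") as fh:
    fh.write("3,?,alfa-romero,gas\n1,158,audi,gas\n")

converted = convert_data_to_csv(src, dst)
print(converted.shape)
```

Writing with the `to_csv` defaults produces a header row plus an index column, which is the layout the loading code in Step 4 appears to expect, since it reads a header row and then drops the first column.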

Now we can proceed with loading the dataset into Python.

Step 3: Install and Import Required Python Libraries

To analyze the data, install the following Python libraries using the command below:

pip install pandas numpy matplotlib seaborn scipy

Then import the libraries: NumPy, pandas, Matplotlib, seaborn and SciPy.

Python

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
import scipy.stats  # makes sp.stats available on SciPy versions that do not load it automatically

Step 4: Load the Dataset

Now we load the dataset into a pandas DataFrame and preview its first five rows.

Python

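# Load the CSV file produced in Step 2
df = pd.read_csv('output.csv')

# Drop the first column (assumed here to be a saved row index, not a car attribute)
df = df.iloc[:, 1:]

# Preview the first five rows
df.head()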

Output:

Step 5: Assign Column Headers

To make our dataset more readable we assign column headers:

Python

headers = ["symboling", "normalized-losses", "make", 
           "fuel-type", "aspiration","num-of-doors",
           "body-style","drive-wheels", "engine-location",
           "wheel-base","length", "width","height", "curb-weight",
           "engine-type","num-of-cylinders", "engine-size", 
           "fuel-system","bore","stroke", "compression-ratio",
           "horsepower", "peak-rpm","city-mpg","highway-mpg","price"]

# Assign the 26 column names, then preview the labeled data
df.columns = headers
df.head()

Output:
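As an alternative sketch, column names can be supplied while reading the file instead of afterwards. This assumes the raw comma-separated file has no header row. The in-memory sample and the shortened four-name header list are illustrative only.

```python
import io

import pandas as pd

# Two made-up rows with only four columns (the real file has 26)
raw_text = "3,?,alfa-romero,gas\n1,158,audi,gas\n"
short_headers = ["symboling", "normalized-losses", "make", "fuel-type"]

# header=None marks the file as header-less; names= supplies the labels
sample = pd.read_csv(io.StringIO(raw_text), header=None, names=short_headers)
print(sample.columns.tolist())
```

Naming the columns at read time avoids a separate assignment step, but it only applies when the file really has no header row of its own.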

Step 6: Check for Missing Values

Missing values can impact our analysis. Let's check if any columns contain missing values.

Python

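# Work on a copy so the original DataFrame stays unchanged
data = df.copy()

# Report, per column, whether any value is missing
# (isnull() is an alias of isna(), so one call is enough)
data.isna().any()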

Output:

[Figure: Missing Values]
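Note that `isna()` only detects values pandas already treats as missing. Step 8 mentions that this dataset can contain `?` as a placeholder, and `isna()` reports such strings as ordinary values. A minimal sketch, on made-up sample values, of counting the placeholders and converting them to NaN:

```python
import numpy as np
import pandas as pd

# Made-up sample; '?' stands in for an unknown value
sample = pd.DataFrame({
    "normalized-losses": ["?", "164", "?"],
    "price": ["13495", "?", "16500"],
})

# isna() sees the '?' strings as ordinary values
print(sample.isna().sum())

# Count the placeholders explicitly, then replace them with NaN
placeholder_counts = (sample == "?").sum()
cleaned = sample.replace("?", np.nan)
print(cleaned.isna().sum())
```

After the replacement, the usual missing-value tools such as `isna()` and `dropna()` work as expected on the cleaned frame.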

Step 7: Convert MPG to L/100km

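Fuel economy is reported differently across regions, so we convert city fuel economy from miles per gallon (MPG) to liters per 100 kilometers (L/100km). For US gallons the conversion is L/100km = 235.215 / MPG; the code rounds the constant to 235.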

Python

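# Convert city fuel economy from MPG to L/100km
data['city-mpg'] = 235 / data['city-mpg']

# Rename the column to reflect the new unit
# (the key must match the existing name 'city-mpg' exactly, hyphen included)
data.rename(columns={'city-mpg': 'city-L / 100km'}, inplace=True)

print(data.columns)

data.dtypes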

Output:
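For reference, the constant 235 can be derived from the unit definitions. The sketch below assumes US gallons; the helper name `mpg_to_l_per_100km` is hypothetical.

```python
# Conversion constants (US units)
KM_PER_MILE = 1.609344
LITERS_PER_US_GALLON = 3.785411784


def mpg_to_l_per_100km(mpg):
    """Convert miles per US gallon to liters per 100 kilometers."""
    km_per_liter = mpg * KM_PER_MILE / LITERS_PER_US_GALLON
    return 100 / km_per_liter


# The factor for 1 MPG is the constant that the code above rounds to 235
factor = mpg_to_l_per_100km(1)
print(round(factor, 3))
print(round(mpg_to_l_per_100km(21), 2))
```

Because the relationship is a reciprocal, a higher MPG value always maps to a lower L/100km value.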

Step 8: Convert Price Column to Integer

The price column should be numeric, but it may contain the placeholder string `?` where the price is unknown. We drop those rows and convert the column to integers:

Python

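# Inspect the distinct values to spot non-numeric entries such as '?'
data.price.unique()

# Keep only rows with a known price; copy() avoids a SettingWithCopyWarning below
data = data[data.price != '?'].copy()

# Convert the cleaned column from strings to integers
data['price'] = data['price'].astype(int)

data.dtypes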

Output:
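An alternative sketch of the same cleanup uses `pd.to_numeric`, which handles any unparseable entry rather than only `?`. The sample values below are made up for illustration.

```python
import pandas as pd

sample = pd.DataFrame({"make": ["audi", "bmw", "dodge"],
                       "price": ["13950", "?", "6377"]})

# errors='coerce' turns anything unparseable (here '?') into NaN
sample["price"] = pd.to_numeric(sample["price"], errors="coerce")

# Drop the rows whose price could not be parsed, then store as integers
sample = sample.dropna(subset=["price"]).reset_index(drop=True)
sample["price"] = sample["price"].astype(int)
print(sample)
```

This variant is a little more defensive, since it does not rely on knowing the exact placeholder string in advance.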

Step 9: Normalize Features

To compare features measured on different scales, we rescale length, width and height by dividing each column by its maximum, so the largest value in each becomes 1. We then categorize cars by price, dividing the price range into three equal-width bins: Low, Medium and High.

Python

# Simple feature scaling: divide each column by its maximum
data['length'] = data['length'] / data['length'].max()
data['width'] = data['width'] / data['width'].max()
data['height'] = data['height'] / data['height'].max()

# Binning: four equally spaced edges define three price groups
bins = np.linspace(data['price'].min(), data['price'].max(), 4)
group_names = ['Low', 'Medium', 'High']
data['price-binned'] = pd.cut(data['price'], bins,
                              labels=group_names,
                              include_lowest=True)

print(data['price-binned'])

# Bar chart of how many cars fall into each price group
counts = data['price-binned'].value_counts().reindex(group_names)
plt.bar(group_names, counts)
plt.show()

Output:

[Figure: Normalization Features]
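To make the scaling and binning concrete, here is a small worked sketch on made-up numbers (three hypothetical cars, not rows from the dataset):

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "length": [150.0, 175.0, 200.0],
    "price": [5000, 20000, 45000],
})

# Simple feature scaling: the largest value becomes 1
sample["length-scaled"] = sample["length"] / sample["length"].max()

# Four equally spaced edges define three equal-width price bins
bins = np.linspace(sample["price"].min(), sample["price"].max(), 4)
sample["price-binned"] = pd.cut(sample["price"], bins,
                                labels=["Low", "Medium", "High"],
                                include_lowest=True)
print(sample)
```

Here the bin edges are roughly 5000, 18333, 31667 and 45000, so the three cars land in Low, Medium and High respectively. `include_lowest=True` is what keeps the cheapest car inside the first bin.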

Step 10: Convert Categorical Data to Numerical

Most machine learning models require numerical input. We convert categorical variables into numerical ones using one-hot encoding, which creates one indicator column per category:

Python

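# One-hot encode fuel-type (displayed only; the result is not stored in data)
pd.get_dummies(data['fuel-type']).head()

# Summary statistics for the numerical columns
data.describe()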

Output:

[Figure: Convert Categorical Data to Numerical]
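The call above only displays the indicator columns. A minimal sketch of attaching them to a DataFrame, using made-up rows and the assumed prefix `fuel-type`, might look like this:

```python
import pandas as pd

sample = pd.DataFrame({"make": ["audi", "mazda", "toyota"],
                       "fuel-type": ["gas", "diesel", "gas"]})

# One indicator column per category; dtype=int gives 0/1 instead of True/False
dummies = pd.get_dummies(sample["fuel-type"], prefix="fuel-type", dtype=int)

# Attach the indicator columns and drop the original text column
encoded = pd.concat([sample.drop(columns="fuel-type"), dummies], axis=1)
print(encoded)
```

The resulting frame has the columns `make`, `fuel-type_diesel` and `fuel-type_gas`, with exactly one of the two indicators set to 1 in each row.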

Step 11: Data Visualization

Python

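# Distribution of price across all cars
plt.boxplot(data['price'])
plt.show()

# Price distribution for each drive-wheels category
sns.boxplot(x='drive-wheels', y='price', data=data)
plt.show()

# Relationship between engine size and price
plt.scatter(data['engine-size'], data['price'])
plt.title('Scatterplot of Engine Size vs Price')
plt.xlabel('Engine size')
plt.ylabel('Price')
plt.grid()
plt.show()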

Output:

Step 12: Grouping Data by Drive-Wheels and Body-Style

Grouping data helps identify trends based on key variables:

Python

test = data[['drive-wheels', 'body-style', 'price']]
data_grp = test.groupby(['drive-wheels', 'body-style'], 
                         as_index = False).mean()

data_grp

Output:
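To see what the grouping computes, here is a sketch on five made-up cars. The values are illustrative and not taken from the dataset.

```python
import pandas as pd

sample = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "rwd", "rwd", "rwd"],
    "body-style": ["sedan", "sedan", "sedan", "wagon", "wagon"],
    "price": [8000, 10000, 20000, 15000, 17000],
})

# Average price for every (drive-wheels, body-style) pair that occurs
sample_grp = sample.groupby(["drive-wheels", "body-style"],
                            as_index=False)["price"].mean()
print(sample_grp)
```

Only the three combinations present in the data appear in the result: fwd sedans average 9000, rwd sedans 20000 and rwd wagons 16000. There is no fwd wagon row because no such car exists in the sample.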

Step 13: Create a Pivot Table & Heatmap

Python

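# Reshape: one row per drive-wheels value, one column per body-style
data_pivot = data_grp.pivot(index='drive-wheels',
                            columns='body-style')
data_pivot

# Heatmap of mean price; with the RdBu colormap, red is low and blue is high
plt.pcolor(data_pivot, cmap='RdBu')
plt.colorbar()
plt.show()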

Output:
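As a possible shortcut, `pd.pivot_table` can group, average and reshape in a single call. The sketch below uses made-up rows and is an alternative to the two-step approach, not a replacement the article prescribes.

```python
import pandas as pd

sample = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "rwd", "rwd", "rwd"],
    "body-style": ["sedan", "sedan", "sedan", "wagon", "wagon"],
    "price": [8000, 10000, 20000, 15000, 17000],
})

# Group, average and reshape in one step; missing combinations become NaN
sample_pivot = pd.pivot_table(sample, values="price",
                              index="drive-wheels", columns="body-style",
                              aggfunc="mean")
print(sample_pivot)
```

The fwd/wagon cell is NaN because the sample contains no such car. Cells like this show up as gaps when the table is drawn as a heatmap.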

Step 14: Perform ANOVA Test

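The Analysis of Variance (ANOVA) test helps determine whether different groups have significantly different means. It returns an F-statistic and a p-value: a large F-statistic with a small p-value suggests that the group means differ. Here we compare the prices of two makes, Honda and Subaru.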

Python

from scipy import stats

# Compare the mean price of two makes with a one-way ANOVA
data_anova = data[['make', 'price']]
grouped_anova = data_anova.groupby('make')
anova_results = stats.f_oneway(
    grouped_anova.get_group('honda')['price'],
    grouped_anova.get_group('subaru')['price']
)
print(anova_results)

# Regression plot of price against engine size
sns.regplot(x='engine-size', y='price', data=data)
plt.ylim(0,)
plt.show()

Output:
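To illustrate how the ANOVA result is read, here is a sketch on two made-up groups. The numbers are not from the dataset, and the 0.05 threshold is a common convention assumed here rather than something the analysis above specifies.

```python
from scipy import stats

# Made-up samples for two hypothetical groups
group_a = [1, 2, 3]
group_b = [11, 12, 13]

# f_oneway returns the F-statistic and the p-value
f_stat, p_value = stats.f_oneway(group_a, group_b)
print(f_stat, p_value)

# Assumed convention: treat p < 0.05 as a significant difference in means
means_differ = p_value < 0.05
print(means_differ)
```

The groups are far apart relative to their internal spread, so the F-statistic is large (150 for these values) and the p-value falls well below 0.05.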

This step-by-step analysis helps in understanding the key factors influencing the selling price of used cars. Careful data cleaning, visualization and statistical testing make the findings more reliable and easier to interpret.


sanmaypaniker
