Open In App

Predicting Stock Price Direction using Support Vector Machines

Last Updated : 02 Jul, 2025
Comments
Improve
Suggest changes
Like Article
Like
Report

Predicting stock price direction is a key goal for traders and analysts. Support Vector Machines (SVM) is a machine learning algorithm that can help classify whether a stock's price will rise or fall. In this article, we'll demonstrate how to apply SVM to predict stock price movements using historical data, covering data preparation, model training, and evaluation.

Step-by-Step Implementation

Step 1: Import the libraries

Here we will import pandas, scikit learn and matplotlib.

Python
# Machine learning
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# For data manipulation
import pandas as pd
import numpy as np

# To plot
import matplotlib.pyplot as plt
# To ignore warnings
import warnings
warnings.filterwarnings("ignore")

Step 2: Read Stock  data

We will Read the Stock Data Downloaded From Yahoo Finance Website. You can download dataset from here.

Python
# Read the csv file using read_csv 
# method of pandas
df = pd.read_csv('RELIANCE.csv')
df

Output:

Screenshot-2024-09-17-232800
dataset

Step 3: Data Preparation 

The data needed to be processed before use such that the date column should act as an index to do that and will drop date column.

Python
# Changes The Date column as index columns
df.index = pd.to_datetime(df['Date'])
df

# drop The original date column
df = df.drop(['Date'], axis='columns')
df

Output:

Screenshot-2024-09-17-232935
Data Preparation 

Step 4: Define the explanatory variables

Explanatory or independent variables are used to predict the value response variable. The X is a dataset that holds the variables which are used for prediction. The X consists of variables such as 'Open - Close' and 'High - Low'. These can be understood as indicators based on which the algorithm will predict tomorrow's trend. Feel free to add more indicators and see the performance.

Python
# Create predictor variables
df['Open-Close'] = df.Open - df.Close
df['High-Low'] = df.High - df.Low

# Store all predictor variables in a variable X
X = df[['Open-Close', 'High-Low']]
X.head()

Output:

Screenshot-2025-07-02-100504
Explanatory variables

Step 5: Define the target variable

The target variable is the outcome which the machine learning model will predict based on the explanatory variables. If tomorrow's price is greater than today's price then we will buy the particular Stock else we will have no position in the. We will store +1 for a buy signal and 0 for a no position in y. We will use where() function from NumPy to do this.

Python
# Target variables
y = np.where(df['Close'].shift(-1) > df['Close'], 1, 0)
y

Output:

array([1, 1, 0, ..., 1, 0, 0])

Step 6: Split the data into train and test

We will split data into training and test data sets. This is done so that we can evaluate the effectiveness of the model in the test dataset. We will split 80% data for training and 20% for testing.

Python
split_percentage = 0.8
split = int(split_percentage*len(df))

# Train data set
X_train = X[:split]
y_train = y[:split]

# Test data set
X_test = X[split:]
y_test = y[split:]

Step 7: Support Vector Classifier (SVC)

We will use Support Vector Machines by SVC() function from sklearn.svm.SVC library to create our classifier model using the fit() method on the training data set.

Python
# Support vector classifier
cls = SVC().fit(X_train, y_train)

Step 8: Classifier accuracy

This code calculates and prints the accuracy of your model on both the training and testing data which were split 80/20 to check for overfitting.

Python
print("The data was split into training and testing sets using an 80/20 split.")

# Calculate training accuracy
train_accuracy = accuracy_score(y_train, cls.predict(X_train))

# Calculate testing accuracy
test_accuracy = accuracy_score(y_test, cls.predict(X_test))

print(f"Training Accuracy: {train_accuracy}")
print(f"Testing Accuracy: {test_accuracy}")

Output:

Screenshot-2025-07-02-101052
Training and testing accuracy

Step 9: Different Kernels

This code trains and tests SVC models with different kernels like linear, polynomial, RBF and sigmoid to see how the kernel affects prediction accuracy on the test data. It then prints the accuracy for each kernel.

Python
from sklearn.metrics import accuracy_score

# Linear kernel
cls_linear = SVC(kernel='linear').fit(X_train, y_train)
y_pred_linear = cls_linear.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)
print(f"Accuracy with Linear Kernel: {accuracy_linear}")

# Polynomial kernel
cls_poly = SVC(kernel='poly', degree=3).fit(X_train, y_train)
y_pred_poly = cls_poly.predict(X_test)
accuracy_poly = accuracy_score(y_test, y_pred_poly)
print(f"Accuracy with Polynomial Kernel (degree=3): {accuracy_poly}")

# RBF kernel (default)
cls_rbf = SVC(kernel='rbf').fit(X_train, y_train)
y_pred_rbf = cls_rbf.predict(X_test)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)
print(f"Accuracy with RBF Kernel: {accuracy_rbf}")

# Sigmoid kernel
cls_sigmoid = SVC(kernel='sigmoid').fit(X_train, y_train)
y_pred_sigmoid = cls_sigmoid.predict(X_test)
accuracy_sigmoid = accuracy_score(y_test, y_pred_sigmoid)
print(f"Accuracy with Sigmoid Kernel: {accuracy_sigmoid}")

Output:

Screenshot-2025-07-02-100816
Accuracy with different Kernels

Step 10: Strategy implementation

We will predict the signal (buy or sell) using the cls.predict() function.

Python
df['Predicted_Signal'] = cls.predict(X)

Calculate Daily returns 

Python
# Calculate daily returns
df['Return'] = df.Close.pct_change()

Calculate Strategy Returns

Python
# Calculate strategy returns
df['Strategy_Return'] = df.Return *df.Predicted_Signal.shift(1)

Calculate Cumulative Returns

Python
# Calculate Cumulutive returns
df['Cum_Ret'] = df['Return'].cumsum()
df

Output :

Screenshot
Calculate Cumulative Returns

Calculate Strategy Cumulative Returns

Python
# Plot Strategy Cumulative returns 
df['Cum_Strategy'] = df['Strategy_Return'].cumsum()
df

Output

Screenshot-2024-09-17-233341
Calculate Strategy Cumulative Returns

Step 11: Visualizing Strategy Returns vs Original Returns

Python
import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(Df['Cum_Ret'],color='red')
plt.plot(Df['Cum_Strategy'],color='blue')

Output:

download
Plot Strategy Returns vs Original Returns

As You Can See Our Strategy Seem to be Totally Outperforming the Performance of The Reliance Stock. Our Strategy (Blue Line) Provided the return of 18.87 % in the last 1 year whereas the stock of Reliance Industries (Red Line) Provides the Return of just 5.97%  in the last 1 year. We can further fine tune our model for better accuracy.

Step 12: Let's see our model result on different stocks

1.  TCS

Stock Return Over Last 1 year -  48%
Strategy result - 48.9 %

2. ICICI BANK

Stock Return Over Last 1 year -  48%
Strategy result - 48.9 %

Get the complete notebook link from here:

Notebook link : click here.


Next Article

Similar Reads