Python in Finance: Real Time Data Streaming within Jupyter Notebook

Last Updated : 28 Jun, 2024

Python is one of the most widely used languages in data science and machine learning, largely because of its extensive, user-friendly libraries such as numpy, pandas, matplotlib, and seaborn. With these libraries, financial professionals can analyse large datasets with relatively little code.

Why Real-Time Data Streaming?

Real-time data streaming enables financial professionals to monitor and analyze live market data, track stock prices, analyze trading volumes, and react promptly to market changes. This capability is essential for tasks such as algorithmic trading, portfolio management, and risk assessment.

Setting Up the Environment and Imports

To get started with real-time data streaming in a Jupyter Notebook, you need to set up your environment with the necessary libraries. The key libraries we will use are pandas for data handling, requests for calling the API, and plotly for visualization.

Install Required Libraries:

pip install pandas requests plotly

Import Libraries:

import pandas as pd
import requests
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import time

Get an API Key:

You need to sign up at Alpha Vantage to get your free API key.
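Rather than pasting the key into the notebook, one option is to read it from an environment variable. The sketch below assumes a variable named ALPHAVANTAGE_API_KEY, which is a name chosen here for illustration and not something Alpha Vantage requires; load_api_key is likewise a hypothetical helper.

```python
import os

# Hypothetical variable name; pick whatever fits your setup.
ENV_VAR = "ALPHAVANTAGE_API_KEY"


def load_api_key(env=None, var_name=ENV_VAR):
    """Return the API key stored in the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return key
```

Calling load_api_key() with no arguments reads from os.environ; passing a dictionary instead makes the helper easy to try out without touching the real environment.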

Fetching Real-Time Data from Alpha Vantage

Alpha Vantage provides real-time data through its TIME_SERIES_INTRADAY endpoint. Here’s how to set it up:

Fetch Data:

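The fetch_data() function calls the Alpha Vantage API and returns the parsed JSON response. It relies on the SYMBOL, INTERVAL, and API_KEY constants defined in the complete code further down.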

Python

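def fetch_data():
    # Let requests build the query string so no stray whitespace ends up in the URL
    url = 'https://www.alphavantage.co/query'
    params = {
        'function': 'TIME_SERIES_INTRADAY',
        'symbol': SYMBOL,
        'interval': INTERVAL,
        'apikey': API_KEY,
    }
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # Fail early on HTTP errors
    return response.json()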
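In practice, Alpha Vantage often answers a rejected or throttled call with a JSON body that simply lacks the time series, so it helps to inspect the payload before processing it. The sketch below assumes the diagnostic text arrives under keys such as "Error Message", "Note", or "Information"; those key names reflect commonly observed responses and should be checked against the current API documentation. find_api_problem is a hypothetical helper, not part of any library.

```python
# Keys under which Alpha Vantage has been observed to report problems (assumption).
DIAGNOSTIC_KEYS = ("Error Message", "Note", "Information")


def find_api_problem(payload, interval="1min"):
    """Return a description of what is wrong with the payload, or None if it looks usable."""
    series_key = f"Time Series ({interval})"
    if series_key in payload:
        return None
    for key in DIAGNOSTIC_KEYS:
        if key in payload:
            return f"{key}: {payload[key]}"
    return f"Missing '{series_key}' in response"


# Usage with hand-written payloads (not real API output)
ok = {"Meta Data": {}, "Time Series (1min)": {}}
throttled = {"Note": "call frequency exceeded"}
print(find_api_problem(ok))
print(find_api_problem(throttled))
```

Returning a message instead of raising leaves the caller free to decide whether to retry, wait, or stop.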

Process Data:

The process_data() function takes the JSON response and converts its time series into a pandas DataFrame indexed by timestamp, with numeric columns sorted from oldest to newest.

Python

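def process_data(data):
    key = f'Time Series ({INTERVAL})'
    if key not in data:
        # Rate-limit notices and error messages come back as JSON without the time series
        raise ValueError(f'Unexpected API response: {data}')
    df = pd.DataFrame.from_dict(data[key], orient='index')
    df = df.astype(float)                # Values arrive as strings
    df.index = pd.to_datetime(df.index)  # Timestamps arrive as strings too
    df = df.sort_index()                 # Oldest bar first
    return df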
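To make the transformation concrete, the sketch below runs the same steps (string-to-float conversion, timestamp parsing, chronological sort) on a tiny hand-written payload using only the standard library. The field names follow the '4. close' naming used elsewhere in this article; the prices and timestamps are invented sample values, and close_series is a hypothetical helper.

```python
from datetime import datetime

# Hand-written sample in the shape process_data() expects; values are invented.
sample = {
    "Time Series (1min)": {
        "2024-06-28 10:01:00": {"1. open": "170.10", "4. close": "170.25"},
        "2024-06-28 10:00:00": {"1. open": "170.00", "4. close": "170.10"},
    }
}


def close_series(payload, interval="1min"):
    """Return (timestamp, close) pairs sorted from oldest to newest."""
    bars = payload[f"Time Series ({interval})"]
    rows = [
        (datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), float(fields["4. close"]))
        for stamp, fields in bars.items()
    ]
    return sorted(rows)


for stamp, close in close_series(sample):
    print(stamp, close)
```

The response lists the newest bar first, which is why both this sketch and process_data() sort before plotting.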

Initial Plot

The initial plot is created using plotly.

Python

# Fetch initial data
data = fetch_data()
df = process_data(data)

# Create initial plot
fig = make_subplots(rows=1, cols=1)
trace = go.Scatter(x=df.index, y=df['4. close'], mode='lines', name=SYMBOL)
fig.add_trace(trace)
fig.show()

Update Plot

The update_plot() function, defined in the complete code further down, adds newly fetched data to the existing DataFrame and refreshes the trace. The loop shown here calls it once a minute and keeps running until the kernel is interrupted.

Python

# Periodically fetch and update data
while True:
    new_data = fetch_data()
    new_df = process_data(new_data)
    update_plot(new_df)
    time.sleep(60)  # Update every minute
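A bare while True loop blocks the notebook cell until the kernel is interrupted and stops at the first exception. One possible variation is sketched below: it caps the number of iterations and skips a failed fetch without ending the loop. poll, fetch, and handle are hypothetical names; in this article's code, fetch would wrap fetch_data() plus process_data(), and handle would be update_plot(). The sleep function is injectable only so the loop can be exercised without waiting.

```python
import time


def poll(fetch, handle, interval_seconds=60, max_iterations=10, sleep=time.sleep):
    """Call fetch() then handle(result) repeatedly; return how many updates succeeded."""
    successes = 0
    for i in range(max_iterations):
        try:
            handle(fetch())
            successes += 1
        except Exception as exc:  # Keep polling after a transient failure
            print(f"Update {i + 1} failed: {exc}")
        if i < max_iterations - 1:
            sleep(interval_seconds)
    return successes


# Usage with stand-in callables and a no-op sleep
received = []
count = poll(lambda: {"close": 1.0}, received.append, max_iterations=3, sleep=lambda s: None)
print(count, len(received))
```

Keeping the interval at 60 seconds or longer also helps stay within the request limits of a free API key.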

Complete Code

Python

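import pandas as pd
import requests
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import time

API_KEY = 'YOUR-API-KEY'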
SYMBOL = 'IBM'
INTERVAL = '1min'  # 1min, 5min, 15min, 30min, 60min

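def fetch_data():
    # Let requests build the query string so no stray whitespace ends up in the URL
    url = 'https://www.alphavantage.co/query'
    params = {
        'function': 'TIME_SERIES_INTRADAY',
        'symbol': SYMBOL,
        'interval': INTERVAL,
        'apikey': API_KEY,
    }
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # Fail early on HTTP errors
    return response.json()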

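def process_data(data):
    key = f'Time Series ({INTERVAL})'
    if key not in data:
        # Rate-limit notices and error messages come back as JSON without the time series
        raise ValueError(f'Unexpected API response: {data}')
    df = pd.DataFrame.from_dict(data[key], orient='index')
    df = df.astype(float)                # Values arrive as strings
    df.index = pd.to_datetime(df.index)  # Timestamps arrive as strings too
    df = df.sort_index()                 # Oldest bar first
    return df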

# Fetch initial data
data = fetch_data()
df = process_data(data)

# Create initial plot
fig = make_subplots(rows=1, cols=1)
trace = go.Scatter(x=df.index, y=df['4. close'], mode='lines', name=SYMBOL)
fig.add_trace(trace)
fig.show()

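# Function to update the plot
def update_plot(new_data):
    global df
    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
    df = pd.concat([df, new_data])
    # Successive responses overlap, so keep only the latest row per timestamp
    df = df[~df.index.duplicated(keep='last')].sort_index()
    with fig.batch_update():
        fig.data[0].x = df.index
        fig.data[0].y = df['4. close']
    fig.show()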

# Periodically fetch and update data
while True:
    new_data = fetch_data()
    new_df = process_data(new_data)
    update_plot(new_df)
    time.sleep(60)  # Update every minute
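
Because each TIME_SERIES_INTRADAY response covers a window of recent bars rather than only the newest one, consecutive responses overlap, and naively appending them stores the same timestamps more than once. The sketch below shows the merge idea with plain dictionaries keyed by timestamp; merge_bars is a hypothetical helper and the values are invented. With pandas, a comparable result can be had by concatenating the frames and then dropping duplicated index entries.

```python
def merge_bars(existing, incoming):
    """Merge two {timestamp: close} mappings; incoming values win on overlap."""
    merged = dict(existing)
    merged.update(incoming)
    # ISO-style timestamps sort chronologically as plain strings
    return dict(sorted(merged.items()))


old = {"2024-06-28 10:00:00": 170.10, "2024-06-28 10:01:00": 170.25}
new = {"2024-06-28 10:01:00": 170.30, "2024-06-28 10:02:00": 170.40}
combined = merge_bars(old, new)
print(len(combined))
```

Letting the incoming value win on overlap means a bar that was still forming during the previous fetch is replaced by its later, more complete version.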