Python in Finance: Real-Time Data Streaming within Jupyter Notebook
Python is one of the most widely used languages in data science and machine learning, largely because it offers extensive, user-friendly libraries such as NumPy, pandas, Matplotlib, and seaborn. With these libraries, financial professionals can easily analyze large datasets.
Why Real-Time Data Streaming?
Real-time data streaming enables financial professionals to monitor and analyze live market data, track stock prices, analyze trading volumes, and react promptly to market changes. This capability is essential for tasks such as algorithmic trading, portfolio management, and risk assessment.
Setting Up the Environment and Imports
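To get started with real-time data streaming in a Jupyter Notebook, you need to set up your environment with the necessary libraries. The key libraries we will use are pandas for data handling, requests for calling the API, and plotly for visualization. Note that the code below polls the API over HTTP at a fixed interval rather than opening a streaming connection.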
Install Required Libraries:
pip install pandas requests plotly
Import Libraries:
import time

import pandas as pd
import requests
import plotly.graph_objects as go
from plotly.subplots import make_subplots
Get an API Key:
You need to sign up at Alpha Vantage to get your free API key.
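Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch, assuming you store the key in an environment variable; the variable name ALPHAVANTAGE_API_KEY and the helper load_api_key are hypothetical choices, not something Alpha Vantage requires:

```python
import os


def load_api_key(env_var: str = "ALPHAVANTAGE_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

You could then write API_KEY = load_api_key() in place of the literal string in the complete code.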
Fetching Real-Time Data from Alpha Vantage
Alpha Vantage provides real-time data through its TIME_SERIES_INTRADAY endpoint. Here’s how to set it up:
Fetch Data:
The fetch_data() function makes an API call to Alpha Vantage to fetch the latest data.
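def fetch_data():
    # SYMBOL, INTERVAL and API_KEY are the settings defined in the complete code
    params = {
        'function': 'TIME_SERIES_INTRADAY',
        'symbol': SYMBOL,
        'interval': INTERVAL,
        'apikey': API_KEY,
    }
    # requests URL-encodes params into the query string
    response = requests.get('https://www.alphavantage.co/query', params=params, timeout=10)
    response.raise_for_status()  # raise on HTTP errors such as 5xx
    return response.json()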
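To see exactly which URL the request targets, the parameters can be URL-encoded without making a network call. This sketch uses the standard library's urllib.parse.urlencode; build_intraday_url is a hypothetical helper name, and passing the same dictionary to requests.get(..., params=...) yields an equivalent query string:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def build_intraday_url(symbol: str, interval: str, api_key: str) -> str:
    """Return the TIME_SERIES_INTRADAY request URL (hypothetical helper)."""
    params = {
        "function": "TIME_SERIES_INTRADAY",
        "symbol": symbol,
        "interval": interval,
        "apikey": api_key,
    }
    # urlencode escapes each value and joins the key=value pairs with '&'
    return f"{BASE_URL}?{urlencode(params)}"


print(build_intraday_url("IBM", "1min", "demo"))
# https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=IBM&interval=1min&apikey=demo
```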
Process Data:
The process_data() function converts the JSON response into a pandas DataFrame indexed by timestamp, with one float column per field (1. open, 2. high, 3. low, 4. close, 5. volume).
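def process_data(data):
    key = f'Time Series ({INTERVAL})'
    if key not in data:
        # Error and rate-limit replies do not include the time series key
        raise ValueError(f'Unexpected API response: {data}')
    df = pd.DataFrame.from_dict(data[key], orient='index')
    df = df.astype(float)                # values arrive as strings
    df.index = pd.to_datetime(df.index)  # timestamp strings -> DatetimeIndex
    df = df.sort_index()                 # oldest bar first
    return df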
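To check process_data() without spending API calls, you can feed it a small hand-made dictionary shaped like the intraday response. The prices and volumes below are made up for illustration, not real IBM quotes, and the block repeats process_data() and sets INTERVAL so it runs on its own:

```python
import pandas as pd

INTERVAL = "1min"


def process_data(data):
    time_series = data[f"Time Series ({INTERVAL})"]
    df = pd.DataFrame.from_dict(time_series, orient="index")
    df = df.astype(float)                # the API returns numbers as strings
    df.index = pd.to_datetime(df.index)  # timestamp keys -> DatetimeIndex
    return df.sort_index()               # oldest bar first


# Made-up sample payload; the newer bar is listed first so sort_index() has work to do
sample = {
    "Time Series (1min)": {
        "2024-01-02 10:01:00": {"1. open": "101.0", "2. high": "102.0",
                                "3. low": "100.5", "4. close": "101.5",
                                "5. volume": "1200"},
        "2024-01-02 10:00:00": {"1. open": "100.0", "2. high": "101.2",
                                "3. low": "99.8", "4. close": "101.0",
                                "5. volume": "1500"},
    }
}

df = process_data(sample)
print(df["4. close"].tolist())  # [101.0, 101.5]
```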
Initial Plot
The initial plot is a Plotly line chart of the closing prices.
# Fetch initial data
data = fetch_data()
df = process_data(data)
# Create initial plot
fig = make_subplots(rows=1, cols=1)
trace = go.Scatter(x=df.index, y=df['4. close'], mode='lines', name='IBM')
fig.add_trace(trace)
fig.show()
Update Plot
The update_plot() function, defined in the complete code below, redraws the chart with new data. The loop fetches the latest data every minute and passes it to update_plot().
# Periodically fetch and update data (interrupt the kernel to stop)
while True:
    new_data = fetch_data()
    new_df = process_data(new_data)
    update_plot(new_df)
    time.sleep(60)  # Update every minute
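Each TIME_SERIES_INTRADAY call returns a window of recent bars (the latest 100 by default), so consecutive responses overlap and simply stacking them repeats timestamps. DataFrame.append was also removed in pandas 2.0, leaving pd.concat as the replacement. A minimal sketch of the merge step, assuming both frames come from process_data(); merge_bars is a hypothetical helper name:

```python
import pandas as pd


def merge_bars(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Combine two bar frames, keeping the newest copy of each timestamp."""
    combined = pd.concat([old, new])
    # A timestamp present in both frames keeps the row from `new`
    combined = combined[~combined.index.duplicated(keep="last")]
    return combined.sort_index()


idx = pd.to_datetime(["2024-01-02 10:00", "2024-01-02 10:01", "2024-01-02 10:02"])
old = pd.DataFrame({"4. close": [100.0, 101.0]}, index=idx[:2])
new = pd.DataFrame({"4. close": [101.2, 101.8]}, index=idx[1:])  # overlaps at 10:01
merged = merge_bars(old, new)
print(merged["4. close"].tolist())  # [100.0, 101.2, 101.8]
```

Keeping the last copy matters because the most recent bar can still be revised while its minute is in progress.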
Complete Code
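import time

import pandas as pd
import plotly.graph_objects as go
import requests
from IPython.display import clear_output
from plotly.subplots import make_subplots

API_KEY = 'YOUR-API-KEY'
SYMBOL = 'IBM'
INTERVAL = '1min'  # 1min, 5min, 15min, 30min, 60min

def fetch_data():
    # Request the latest intraday bars for SYMBOL at the chosen INTERVAL
    params = {
        'function': 'TIME_SERIES_INTRADAY',
        'symbol': SYMBOL,
        'interval': INTERVAL,
        'apikey': API_KEY,
    }
    response = requests.get('https://www.alphavantage.co/query', params=params, timeout=10)
    response.raise_for_status()
    return response.json()

def process_data(data):
    key = f'Time Series ({INTERVAL})'
    if key not in data:
        # Error and rate-limit replies do not include the time series key
        raise ValueError(f'Unexpected API response: {data}')
    df = pd.DataFrame.from_dict(data[key], orient='index')
    df = df.astype(float)
    df.index = pd.to_datetime(df.index)
    return df.sort_index()

# Fetch initial data
data = fetch_data()
df = process_data(data)

# Create initial plot
fig = make_subplots(rows=1, cols=1)
trace = go.Scatter(x=df.index, y=df['4. close'], mode='lines', name=SYMBOL)
fig.add_trace(trace)
fig.show()

# Function to update the plot
def update_plot(new_data):
    global df
    # DataFrame.append was removed in pandas 2.0, so use pd.concat; responses
    # overlap, so keep only the newest row for each timestamp
    df = pd.concat([df, new_data])
    df = df[~df.index.duplicated(keep='last')].sort_index()
    with fig.batch_update():
        fig.data[0].x = df.index
        fig.data[0].y = df['4. close']
    clear_output(wait=True)  # replace the previous chart instead of stacking a new one
    fig.show()

# Periodically fetch and update data (interrupt the kernel to stop)
while True:
    new_data = fetch_data()
    new_df = process_data(new_data)
    update_plot(new_df)
    time.sleep(60)  # Update every minute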
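The while True loop runs until you interrupt the kernel, and a single failed request (a network error or a rate-limit reply) stops it with an exception. A minimal standard-library sketch of a bounded variant that keeps going after transient errors; poll, max_updates, and the injected sleep argument are hypothetical names chosen here so the loop can be tested without actually waiting:

```python
import time
from typing import Callable


def poll(fetch: Callable[[], dict], handle: Callable[[dict], None],
         interval_s: float = 60.0, max_updates: int = 10,
         sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fetch() then handle() up to max_updates times; return the success count."""
    successes = 0
    for attempt in range(max_updates):
        try:
            handle(fetch())
            successes += 1
        # requests' exceptions derive from OSError; bad payloads raise ValueError/KeyError
        except (OSError, ValueError, KeyError) as exc:
            print(f"Update {attempt + 1} failed: {exc}")
        if attempt < max_updates - 1:
            sleep(interval_s)  # wait before the next request
    return successes
```

In the notebook this could replace the infinite loop as poll(fetch_data, lambda data: update_plot(process_data(data))), which also routes errors from process_data() through the same handler.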
Output