Fastest Way to Read Excel File in Python
Reading Excel files is a common task in data analysis and processing. Python provides several libraries to handle Excel files, each with its advantages in terms of speed and ease of use. This article explores the fastest methods to read Excel files in Python.
Using pandas
pandas is a powerful and flexible data analysis library for Python. Its read_excel function loads a worksheet directly into a DataFrame. It is not the fastest option, because it delegates parsing to an engine such as openpyxl and then builds a DataFrame, but it is widely used for its ease of use and versatility.
import pandas as pd
# Read Excel file
df = pd.read_excel('data.xlsx')
print(df.head())
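pandas.read_excel supports several Excel formats, including .xlsx, .xls and .xlsb, by using a different engine for each. By default it reads the first sheet; pass a sheet name, a zero-based position, a list of these, or None to the sheet_name parameter to read other sheets or several sheets at once.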
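As a minimal sketch of reading several sheets in one call, the example below first writes a small two-sheet workbook so that it runs on its own. The file name, sheet names and data are made up for illustration, and the openpyxl package is assumed to be installed as the .xlsx engine.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example is self-contained.
# The path, sheet names and values are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"Topic": ["Python"]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"Topic": ["Algorithms"]}).to_excel(writer, sheet_name="Sheet2", index=False)

# sheet_name=None returns a dict that maps each sheet name to a DataFrame.
sheets = pd.read_excel(path, sheet_name=None)
for name, frame in sheets.items():
    print(name, frame.shape)

# A list of names (or zero-based positions) loads only the selected sheets.
subset = pd.read_excel(path, sheet_name=["Sheet2"])
```

Loading only the sheets that are needed avoids parsing the rest of the workbook.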
Note: Use usecols to load only specific columns. Use nrows to limit the number of rows read. Specify dtype for columns to avoid type inference overhead.
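The three options in the note can be combined in a single call. The following is a sketch under assumptions: the workbook is generated on the fly, and the column names and values are invented for the example.

```python
import os
import tempfile

import pandas as pd

# Write a small workbook so the example runs on its own (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
pd.DataFrame({
    "Topic": ["Python", "Data Structures", "Algorithms"],
    "Difficulty Level": ["Beginner", "Intermediate", "Advanced"],
    "Views": [120, 85, 60],
}).to_excel(path, index=False)

df = pd.read_excel(
    path,
    usecols=["Topic", "Views"],   # load only these two columns
    nrows=2,                      # stop after the first two data rows
    dtype={"Views": "float64"},   # fix the column type instead of inferring it
)
print(df)
```

How much time these options save depends on the file, so it is worth timing them on real data.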
Using openpyxl
openpyxl is another popular library for reading and writing Excel files. It is particularly useful for working with .xlsx files.
from openpyxl import load_workbook
# Load workbook
wb = load_workbook('data.xlsx', read_only=True)
# Select a sheet
ws = wb['Sheet1']
# Read data
li = []
for row in ws.iter_rows(values_only=True):
    li.append(row)
print(li[:5])
Output:
[('Topic', 'Description', 'Difficulty Level', 'Link'),
('Python', 'Learn Python programming language.', 'Beginner', 'https://www.geeksforgeeks.org/python-programming-language/'),
('Data Structures', 'Study of data structures', 'Intermediate', 'https://www.geeksforgeeks.org/data-structures/'),
('Algorithms', 'Learn various algorithms', 'Advanced', 'https://www.geeksforgeeks.org/fundamentals-of-algorithms/'),
('Machine Learning', 'Introduction to Machine Learning ', 'Intermediate', 'https://www.geeksforgeeks.org/machine-learning/')]
openpyxl provides fine-grained control over reading and writing Excel files. The read_only mode significantly improves performance when reading large files.
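In read-only mode, rows are streamed rather than loaded all at once, so a reader can stop early. The sketch below assumes a small workbook written on the spot with invented data. It reads only the first three rows, turns them into dictionaries keyed by the header, and closes the workbook explicitly, since read-only workbooks keep the file open until closed.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Write a small workbook so the example runs on its own (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
wb_out = Workbook()
ws_out = wb_out.active
ws_out.title = "Sheet1"
ws_out.append(["Topic", "Difficulty Level"])
ws_out.append(["Python", "Beginner"])
ws_out.append(["Algorithms", "Advanced"])
ws_out.append(["Machine Learning", "Intermediate"])
wb_out.save(path)

wb = load_workbook(path, read_only=True)
try:
    ws = wb["Sheet1"]
    # max_row limits how far the streaming reader goes into the sheet.
    rows = ws.iter_rows(min_row=1, max_row=3, values_only=True)
    header = next(rows)
    records = [dict(zip(header, row)) for row in rows]
finally:
    # Release the file handle held by the read-only workbook.
    wb.close()

print(records)
```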
Using xlrd for .xls format
xlrd is a library for reading data and formatting information from Excel files in the legacy .xls format. Since version 2.0, xlrd reads only .xls files; for newer .xlsx files, use openpyxl or pandas.
import xlrd
# Open workbook
wb = xlrd.open_workbook('data.xls')
# Select a sheet
sheet = wb.sheet_by_name('Sheet1')
# Read data
li = []
for idx in range(sheet.nrows):
    row = sheet.row(idx)
    li.append([cell.value for cell in row])
print(li[:5])
Output:
[['Topic', 'Description', 'Difficulty Level', 'Link'],
['Python', 'Learn Python programming language.', 'Beginner', 'https://www.geeksforgeeks.org/python-programming-language/'],
['Data Structures', 'Study of data structures', 'Intermediate', 'https://www.geeksforgeeks.org/data-structures/'],
['Algorithms', 'Learn various algorithms', 'Advanced', 'https://www.geeksforgeeks.org/fundamentals-of-algorithms/'],
['Machine Learning', 'Introduction to Machine Learning', 'Intermediate', 'https://www.geeksforgeeks.org/machine-learning/']]
Using pyxlsb for .xlsb Binary Excel Format
pyxlsb is a library for reading Excel files in the Binary Excel format (.xlsb). Because .xlsb stores the workbook as binary records rather than XML, large files are typically smaller and can be quicker to parse than the equivalent .xlsx.
import pyxlsb
# Read .xlsb file
with pyxlsb.open_workbook('data.xlsb') as wb:
    # Sheets are indexed from 1 in pyxlsb
    with wb.get_sheet(1) as sheet:
        li = []
        for row in sheet.rows():
            # Each cell exposes its value through the v attribute
            li.append([cell.v for cell in row])
print(li[:5])
Output:
[['Topic', 'Description', 'Difficulty Level', 'Link'],
['Python', 'Learn Python programming language.', 'Beginner', 'https://www.geeksforgeeks.org/python-programming-language/'],
['Data Structures', 'Study of data structures', 'Intermediate', 'https://www.geeksforgeeks.org/data-structures/'],
['Algorithms', 'Learn various algorithms', 'Advanced', 'https://www.geeksforgeeks.org/fundamentals-of-algorithms/'],
['Machine Learning', 'Introduction to Machine Learning', 'Intermediate', 'https://www.geeksforgeeks.org/machine-learning/']]
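Each of the libraries above can also serve as an engine for pandas.read_excel, selected with the engine parameter. The helper below is a hypothetical sketch, not part of pandas: pick_engine and read_any_excel are names invented here, and the mapping covers only the formats discussed in this article.

```python
from pathlib import Path

# Engine names accepted by pandas.read_excel for each file extension.
ENGINES = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
    ".xlsb": "pyxlsb",
}


def pick_engine(path):
    """Return the pandas engine name for an Excel path (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported Excel extension: {suffix!r}") from None


def read_any_excel(path, **kwargs):
    """Read an Excel file into a DataFrame with an explicitly chosen engine."""
    import pandas as pd  # imported lazily so pick_engine works without pandas

    return pd.read_excel(path, engine=pick_engine(path), **kwargs)


print(pick_engine("data.xlsb"))
```

Naming the engine explicitly makes it obvious which optional package must be installed for a given file type.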
pyxlsb is designed for reading binary Excel files. For large datasets, the binary format can be much faster to read than XML-based .xlsx, although actual timings depend on the file and the machine.
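Since speed depends on the file and the machine, a small timing harness helps compare readers on real data. The sketch below uses only the standard library; time_reader is a name invented for this example, and fake_reader is a stand-in that should be replaced with a real call such as pandas.read_excel or one of the loops shown earlier.

```python
import time


def time_reader(reader, path, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        reader(path)
        best = min(best, time.perf_counter() - start)
    return best


def fake_reader(path):
    # Stand-in for a real reader; it does no file I/O at all.
    return [path] * 1000


elapsed = time_reader(fake_reader, "data.xlsx")
print(f"best of 3 runs: {elapsed:.6f} s")
```

Taking the best of several runs reduces the influence of disk caching and other background activity on the comparison.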