Library companion to the paper Efficient Shapley Performance Attribution for Least-Squares Regression by Logan Bell, Nikhil Devanathan, and Stephen Boyd.
The results in the reference paper were generated using a more performant but harder-to-use implementation of the same algorithm. This benchmark code and the numerical experiments from the reference paper can be found at cvxgrp/ls-spa-benchmark. We recommend caution when using the benchmark code.
To install this package, execute
pip install ls_spa
Import ls_spa by adding
from ls_spa import ls_spa
to the top of your Python file.
ls_spa has the following dependencies:
- numpy
- scipy
- pandas
- joblib
Optional dependencies are
- marimo, for using the demo notebook
- matplotlib, for plotting in the demo notebook
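We assume that you have imported ls_spa and that you have an N×p training feature matrix X_train, an M×p testing feature matrix X_test, a length-N training response vector y_train, and a length-M testing response vector y_test, for positive integers N, M, and p. Compute the Shapley attribution with
attrs = ls_spa(X_train, X_test, y_train, y_test).attribution
attrs will be a NumPy array containing the Shapley values of your features.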
The ls_spa function computes Shapley values for the given data using
the LS-SPA method described in the companion paper. It takes arguments:
- X_train: Training feature matrix (NumPy array or pandas DataFrame).
- X_test: Testing feature matrix (NumPy array or pandas DataFrame).
- y_train: Training response vector (NumPy array or pandas Series).
- y_test: Testing response vector (NumPy array or pandas Series).
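As a rough illustration of the four inputs listed above, the following sketch builds synthetic data with matching shapes. The sizes N, M, and p, the coefficient vector theta_true, and the noise level are arbitrary assumptions made for this example; none of them come from the companion paper.

```python
import numpy as np

# Hypothetical sizes: N training rows, M testing rows, p features.
N, M, p = 200, 100, 5
rng = np.random.default_rng(0)

# Arbitrary "true" coefficients, used only to generate example responses.
theta_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0])

# Feature matrices share the same number of columns (p).
X_train = rng.standard_normal((N, p))
X_test = rng.standard_normal((M, p))

# Response vectors have one entry per row of the matching feature matrix.
y_train = X_train @ theta_true + 0.1 * rng.standard_normal(N)
y_test = X_test @ theta_true + 0.1 * rng.standard_normal(M)

# With ls_spa installed, these arrays can be passed in directly:
#     from ls_spa import ls_spa
#     attrs = ls_spa(X_train, X_test, y_train, y_test).attribution
```

The training and testing matrices must agree on the number of columns, and each response vector must have as many entries as its feature matrix has rows.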
We present a complete Python script that utilizes LS-SPA to compute the Shapley attribution on the data from the toy example described in the companion paper.
# Imports
import numpy as np
from ls_spa import ls_spa
# Data loading
# Open the archive once, then pull out each array by name
data = np.load("./data/toy_data.npz")
X_train, X_test, y_train, y_test = (data[key] for key in ("X_train", "X_test", "y_train", "y_test"))
# Compute Shapley attribution with LS-SPA
results = ls_spa(X_train, X_test, y_train, y_test)
# Print attribution
print(results)
This example uses data from the data directory of this repository.
The line print(results) prints a dashboard of information generated while
computing the Shapley attribution, including the attribution itself.
To extract just the vector of Shapley values, use results.attribution.
For more info, see optional arguments.
The demo notebook walks through the process of
computing Shapley values on the data for the toy example in the
companion paper. It then uses ls_spa to compute the Shapley attribution
on the same data.
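To make the idea of computing Shapley values concrete, here is a minimal brute-force sketch for a small number of features. It is not the LS-SPA algorithm and not the demo's code: it simply averages each feature's marginal gain in out-of-sample R² over every feature ordering. It assumes R² is measured against the zero predictor (so the empty feature set scores 0), uses no regularization, and runs on synthetic stand-in data rather than the toy data. The helper names r_squared and shapley_exact are hypothetical.

```python
import itertools
import math

import numpy as np


def r_squared(X_train, X_test, y_train, y_test, subset):
    """Out-of-sample R^2 of least squares restricted to the features in `subset`.

    Assumption: R^2 is measured against the zero predictor, so the empty
    feature set scores exactly 0.
    """
    cols = list(subset)
    if not cols:
        return 0.0
    theta, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
    resid = X_test[:, cols] @ theta - y_test
    return 1.0 - (resid @ resid) / (y_test @ y_test)


def shapley_exact(X_train, X_test, y_train, y_test):
    """Average each feature's marginal R^2 gain over all p! orderings."""
    p = X_train.shape[1]
    attribution = np.zeros(p)
    for perm in itertools.permutations(range(p)):
        prev = 0.0
        for k, j in enumerate(perm):
            # R^2 after adding feature j to the features that precede it.
            cur = r_squared(X_train, X_test, y_train, y_test, perm[: k + 1])
            attribution[j] += cur - prev
            prev = cur
    return attribution / math.factorial(p)


# Synthetic stand-in data with made-up coefficients (not the toy data).
rng = np.random.default_rng(0)
X_train, X_test = rng.standard_normal((50, 3)), rng.standard_normal((40, 3))
theta_true = np.array([1.0, -2.0, 0.0])
y_train = X_train @ theta_true + 0.1 * rng.standard_normal(50)
y_test = X_test @ theta_true + 0.1 * rng.standard_normal(40)

attribution = shapley_exact(X_train, X_test, y_train, y_test)
full_r2 = r_squared(X_train, X_test, y_train, y_test, range(3))
```

Because the marginal gains telescope along each ordering, the attributions computed this way sum to the R² of the model that uses all features. The cost grows as p!, which is why this approach is only practical for a handful of features.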
ls_spa takes the optional arguments:
- reg: Ridge regularization parameter (Default: 0.0).
- max_samples: Maximum number of feature permutations to sample (Default: 8192).
- batch_size: Number of permutations to process per batch (Default: 256).
- tolerance: Stopping criterion for estimation error (Default: 0.01).
- seed: Seed for random number generation (Default: 42).
- perms: Permutation sampling method (Default: None). Options include:
  - None: Auto-select "exact" for p < 9 features, otherwise "random"
  - "exact": Enumerate all permutations (only feasible for p < 9)
  - "random": Uniformly random permutations
  - "argsort": Quasi-Monte Carlo permutations using argsort
  - "permutohedron": Quasi-Monte Carlo permutations from permutohedron lattice
  - Custom array or tuple of permutations
- antithetical: Use antithetical (paired) sampling for variance reduction (Default: True).
- return_attribution_history: Return convergence history of attributions (Default: False).
- n_jobs: Number of parallel jobs; use -1 for all CPU cores (Default: 1).
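The permutation options can be pictured with a short sketch. The helpers below are hypothetical and are not part of ls_spa; they show exhaustive enumeration and uniformly random sampling. The antithetical pairing shown here, where each sampled permutation is followed by its reverse, is one common form of paired sampling and is an assumption about what the option does, not a description of the library's internals.

```python
import itertools

import numpy as np


def exact_permutations(p):
    """All p! orderings of the features; only practical for small p."""
    return np.array(list(itertools.permutations(range(p))))


def random_permutations(p, n, rng, antithetical=True):
    """Draw n uniformly random permutations of range(p).

    Assumption: with antithetical=True, each sampled permutation is followed
    by its reverse, so only n // 2 independent draws are made. In that case
    n should be even and at least 2.
    """
    if not antithetical:
        return np.array([rng.permutation(p) for _ in range(n)])
    half = np.array([rng.permutation(p) for _ in range(n // 2)])
    paired = np.empty((2 * len(half), p), dtype=half.dtype)
    paired[0::2] = half            # sampled permutations
    paired[1::2] = half[:, ::-1]   # their reversed partners
    return paired


rng = np.random.default_rng(42)
all_perms = exact_permutations(4)          # 4! = 24 rows
batch = random_permutations(10, 256, rng)  # one batch of 256 permutations
```

Enumeration grows factorially (8! is already 40320 orderings), which is consistent with the exact option being limited to small p.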
ls_spa returns a ShapleyResults object. The ShapleyResults object
has the fields:
- attribution: Array of Shapley values for each feature.
- theta: Array of regression coefficients with all features.
- r_squared: Out-of-sample R² with all features.
- overall_error: Estimated error (95th percentile L2 norm) in Shapley attribution vector.
- attribution_errors: Array of estimated errors for each feature's attribution.
- error_history: Array of error estimates after each batch. None if using exact computation.
- attribution_history: Array of attribution estimates over time. None if return_attribution_history=False.
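A common next step is to rank features by their attribution. The sketch below uses a plain NumPy array as a stand-in for results.attribution; the numbers and the feature names are made up for illustration and are not output from ls_spa.

```python
import numpy as np

# Placeholder values standing in for results.attribution; these numbers are
# made up for illustration and are not output from ls_spa.
attribution = np.array([0.41, 0.02, 0.35, 0.10])
feature_names = ["age", "height", "income", "tenure"]  # hypothetical names

# Sort feature indices from largest to smallest attribution.
order = np.argsort(attribution)[::-1]
ranked = [(feature_names[i], float(attribution[i])) for i in order]

for name, value in ranked:
    print(f"{name:>8s}  {value:.2f}")
```

If X_train is a pandas DataFrame, its column labels are a natural source for the feature names.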
If you use this code for research, please cite the associated paper.
@article{Bell2024,
title = {Efficient Shapley performance attribution for least-squares regression},
volume = {34},
ISSN = {1573-1375},
url = {http://dx.doi.org/10.1007/s11222-024-10459-9},
DOI = {10.1007/s11222-024-10459-9},
number = {5},
journal = {Statistics and Computing},
publisher = {Springer Science and Business Media LLC},
author = {Bell, Logan and Devanathan, Nikhil and Boyd, Stephen},
year = {2024},
month = jul
}