-
Notifications
You must be signed in to change notification settings - Fork 7.2k
[Data] Add credential provider support to read_unity_catalog API #60606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[Data] Add credential provider support to read_unity_catalog API #60606
Conversation
Introduce a credential provider pattern for Databricks authentication, enabling custom credential sources while maintaining backward compatibility. Changes: - Add DatabricksCredentialProvider base class and StaticCredentialProvider - Add credential_provider parameter to DatabricksUCDatasource and read_databricks_tables() - Add retry on 401 with credential invalidation - Move Databricks tests to dedicated test file The credential provider abstraction allows users to implement custom credential sources that support token refresh and other authentication patterns beyond static tokens. Backward compatibility: Existing code using the token parameter or DATABRICKS_TOKEN environment variable continues to work unchanged. Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
- Add module-level constants DEFAULT_TOKEN_ENV_VAR and DEFAULT_HOST_ENV_VAR
for environment variable names
- Add logging to _detect_databricks_host() instead of silently swallowing
exceptions
- Extract 401 retry logic into reusable _request_with_401_retry() function
to reduce code duplication
Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a credential_provider abstraction for Databricks authentication, which is a great improvement for flexibility and maintainability. It refactors read_unity_catalog and read_databricks_tables to use this new abstraction, including a well-implemented 401 retry mechanism for token refresh. The changes are well-tested with comprehensive new test files. My main feedback is to consolidate duplicated helper functions for handling HTTP requests and retries into a single shared module to improve code reuse.
a4acafe to
7d2429f
Compare
Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
7b8c8ab to
c477f37
Compare
Summary
credential_providerparameter toread_unity_catalog()API for flexible authenticationUnityCatalogConnectorto use credential provider abstraction with 401 retry logicChanges
credential_providerparameter toread_unity_catalog(). Maintains backward compatibility - accepts eithercredential_providerORurl/tokenparameters._build_headers()helper for constructing auth headers from credential provider_request_with_401_retry()for automatic token refresh on authentication failuresUnityCatalogConnectorto use credential provider instead of raw tokenWhy
This aligns Unity Catalog with the credential provider abstraction introduced for Databricks UC datasource, enabling: