Skip to content

Conversation

@ankur-anyscale
Copy link
Contributor

Summary

  • Add credential_provider parameter to read_unity_catalog() API for flexible authentication
  • Refactor UnityCatalogConnector to use credential provider abstraction with 401 retry logic
  • Add comprehensive test coverage for Unity Catalog datasource

Changes

  • read_api.py: Add optional credential_provider parameter to read_unity_catalog(). Maintains backward compatibility - accepts either credential_provider OR url/token parameters.
  • uc_datasource.py:
    • Add _build_headers() helper for constructing auth headers from credential provider
    • Add _request_with_401_retry() for automatic token refresh on authentication failures
    • Update UnityCatalogConnector to use credential provider instead of raw token
  • test_uc_datasource.py: New test file with tests for header building, 401 retry logic, connector initialization, and public API. Note that no unit test existed before.

Why

This aligns Unity Catalog with the credential provider abstraction introduced for Databricks UC datasource, enabling:

  • Token refresh on 401 errors (expired tokens)
  • Custom credential providers (e.g., OAuth, service principals)
  • Consistent authentication pattern across Databricks datasources
  Introduce a credential provider pattern for Databricks authentication,
  enabling custom credential sources while maintaining backward compatibility.

  Changes:
  - Add DatabricksCredentialProvider base class and StaticCredentialProvider
  - Add credential_provider parameter to DatabricksUCDatasource and read_databricks_tables()
  - Add retry on 401 with credential invalidation
  - Move Databricks tests to dedicated test file

  The credential provider abstraction allows users to implement custom
  credential sources that support token refresh and other authentication
  patterns beyond static tokens.

  Backward compatibility: Existing code using the token parameter or
  DATABRICKS_TOKEN environment variable continues to work unchanged.

Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
  - Add module-level constants DEFAULT_TOKEN_ENV_VAR and DEFAULT_HOST_ENV_VAR
    for environment variable names
  - Add logging to _detect_databricks_host() instead of silently swallowing
    exceptions
  - Extract 401 retry logic into reusable _request_with_401_retry() function
    to reduce code duplication

Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
Signed-off-by: ankur <ankur@anyscale.com>
@ankur-anyscale ankur-anyscale requested a review from a team as a code owner January 30, 2026 13:08
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a credential_provider abstraction for Databricks authentication, which is a great improvement for flexibility and maintainability. It refactors read_unity_catalog and read_databricks_tables to use this new abstraction, including a well-implemented 401 retry mechanism for token refresh. The changes are well-tested with comprehensive new test files. My main feedback is to consolidate duplicated helper functions for handling HTTP requests and retries into a single shared module to improve code reuse.

@ray-gardener ray-gardener bot added the data Ray Data-related issues label Jan 30, 2026
@ankur-anyscale ankur-anyscale force-pushed the ankur/add_unity_catalog_changes branch from a4acafe to 7d2429f Compare January 30, 2026 14:10
@ankur-anyscale ankur-anyscale force-pushed the ankur/add_unity_catalog_changes branch from 7b8c8ab to c477f37 Compare January 30, 2026 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues

1 participant