Skip to content

Conversation

@elliot-barn
Copy link
Contributor

Generating data depsets for data ci images

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
@elliot-barn elliot-barn added the go add ONLY when ready to merge, run all tests label Jan 30, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the data CI image build process to use depsets, which is a good step towards better dependency management. It also includes several dependency version upgrades. I've found a couple of critical issues in the Wanda configuration files that will likely cause build failures. There's also a minor improvement that can be made to the new Dockerfile for clarity and best practices.

@@ -0,0 +1,9 @@
name: "databuild-py$PYTHON"
froms: ["cr.ray.io/rayproject/oss-ci-base_ml-py$PYTHON"]
dockerfile: ci/docker/data.build.Dockerfile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The dockerfile path appears to be incorrect. It's pointing to ci/docker/data.build.Dockerfile, but given the changes in this PR, it should probably point to the new ci/docker/data.base.build.Dockerfile to use the new depset-based installation logic.

dockerfile: ci/docker/data.base.build.Dockerfile
name: "datamongobuild-py$PYTHON"
froms: ["cr.ray.io/rayproject/oss-ci-base_ml-py$PYTHON"]
dockerfile: ci/docker/data.build.Dockerfile
dockerfile: t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The dockerfile path has been changed to t, which is not a valid file path. This will likely cause the build to fail. Please correct the path. If this build is being deprecated, consider removing the file instead.

dockerfile: ci/docker/data.build.Dockerfile
- name: databuild-multipy
label: "wanda: databuild-py{{matrix}}"
wanda: ci/docker/data.build.wanda.yaml
wanda: ci/docker/data.base.build.wanda.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The wanda configuration file path has been updated. Please ensure that the new file ci/docker/data.base.build.wanda.yaml correctly points to the new ci/docker/data.base.build.Dockerfile. I've left a specific comment on that file regarding a potential mismatch.


SHELL ["/bin/bash", "-ice"]

COPY . .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The COPY . . command seems redundant as the required lock file is already explicitly copied on line 10. This broad COPY could unintentionally bring in other files from the build context, potentially increasing image size or causing unexpected behavior. For better clarity, hermeticity, and caching, it's best to remove this line if it's not strictly necessary for other files not declared in the wanda.yaml srcs.

name: "datamongobuild-py$PYTHON"
froms: ["cr.ray.io/rayproject/oss-ci-base_ml-py$PYTHON"]
dockerfile: ci/docker/data.build.Dockerfile
dockerfile: t
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dockerfile path set to invalid value t

High Severity

The dockerfile field in datamongo.build.wanda.yaml is changed to t, which is not a valid Dockerfile path. This appears to be an accidental edit (possibly partial deletion). The previous value was ci/docker/data.build.Dockerfile. This will cause the datamongobuild-multipy CI build step to fail when trying to locate the Dockerfile.

Fix in Cursor Fix in Web

@@ -0,0 +1,9 @@
name: "databuild-py$PYTHON"
froms: ["cr.ray.io/rayproject/oss-ci-base_ml-py$PYTHON"]
dockerfile: ci/docker/data.build.Dockerfile
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanda config references wrong Dockerfile for depset build

High Severity

The new data.base.build.wanda.yaml references ci/docker/data.build.Dockerfile instead of the newly created ci/docker/data.base.build.Dockerfile. The wanda config includes data_base_depset_py$PYTHON.lock as a source, but the referenced Dockerfile uses install-dependencies.sh instead of the depset lock file. The lock file will be copied but never used.

Fix in Cursor Fix in Web

srcs:
- python/deplocks/ci/data_base_depset_py$PYTHON.lock
build_args:
- DOCKER_IMAGE_BASE_BUILD=cr.ray.io/rayproject/oss-ci-base_ml-py$PYTHON
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing PYTHON_DEPSET build argument in wanda config

Medium Severity

The data.base.build.wanda.yaml does not pass PYTHON_DEPSET as a build argument, so the Dockerfile would use its default value. However, the default python/deplocks/ci/data_base_depset_py$PYTHON.lock references $PYTHON which is never defined as a Dockerfile ARG, causing it to expand to an empty string. The resulting path data_base_depset_py.lock (without version) won't match the actual lock file. The similar datatfds.build.wanda.yaml correctly passes PYTHON_DEPSET explicitly.

Additional Locations (1)

Fix in Cursor Fix in Web

- python/requirements/ml/data-test-requirements.txt
- python/requirements/data/data-base.txt
constraints:
- /tmp/ray-deps/requirements_compiled_py3.13.txt
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded Python 3.13 constraints for 3.10/3.12 builds

Medium Severity

The constraints path hardcodes requirements_compiled_py3.13.txt but the depset builds target Python 3.10 and 3.12 (via build_arg_sets). All other depset configs in rayimg.depsets.yaml use ${PYTHON_VERSION} to match the constraints file to the target Python version. Using 3.13 constraints for 3.10/3.12 builds could result in dependency versions that are incompatible with those Python versions.

Fix in Cursor Fix in Web

if [[ -n "$ARROW_MONGO_VERSION" ]]; then
# Older versions of Arrow Mongo require an older version of NumPy.
pip install numpy==1.23.5
fi
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code conditional branches never execute

Low Severity

The conditional blocks for ARROW_MONGO_VERSION and RAY_CI_JAVA_BUILD will never execute because the corresponding data.base.build.wanda.yaml doesn't pass these build args. Both ARGs default to empty, so the if [[ -n "$ARROW_MONGO_VERSION" ]] and if [[ $RAY_CI_JAVA_BUILD == 1 ]] conditions always evaluate to false. This appears to be code copied from data.build.Dockerfile that isn't needed for this simpler depset-based build.

Additional Locations (1)

Fix in Cursor Fix in Web

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.


ARG ARROW_MONGO_VERSION=
ARG RAY_CI_JAVA_BUILD=
ARG PYTHON_DEPSET=python/deplocks/ci/data_base_depset_py$PYTHON.lock
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing ARG declaration causes empty variable expansion

High Severity

The PYTHON_DEPSET ARG default value uses $PYTHON, but PYTHON is never declared as an ARG in this Dockerfile. Without an ARG PYTHON declaration, the variable expands to an empty string, causing the path to resolve to python/deplocks/ci/data_base_depset_py.lock (missing the version number). The wanda file also doesn't pass PYTHON or PYTHON_DEPSET as a build argument to override this default.

Fix in Cursor Fix in Web

@aslonnie aslonnie marked this pull request as draft January 31, 2026 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests

2 participants