-
Notifications
You must be signed in to change notification settings - Fork 7.2k
[DNR][deps] generating data depsets #60616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the data CI image build process to use depsets, which is a good step towards better dependency management. It also includes several dependency version upgrades. I've found a couple of critical issues in the Wanda configuration files that will likely cause build failures. There's also a minor improvement that can be made to the new Dockerfile for clarity and best practices.
| @@ -0,0 +1,9 @@ | |||
| name: "databuild-py$PYTHON" | |||
| froms: ["cr.ray.io/rayproject/oss-ci-base_ml-py$PYTHON"] | |||
| dockerfile: ci/docker/data.build.Dockerfile | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| name: "datamongobuild-py$PYTHON" | ||
| froms: ["cr.ray.io/rayproject/oss-ci-base_ml-py$PYTHON"] | ||
| dockerfile: ci/docker/data.build.Dockerfile | ||
| dockerfile: t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - name: databuild-multipy | ||
| label: "wanda: databuild-py{{matrix}}" | ||
| wanda: ci/docker/data.build.wanda.yaml | ||
| wanda: ci/docker/data.base.build.wanda.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ci/docker/data.base.build.Dockerfile
Outdated
|
|
||
| SHELL ["/bin/bash", "-ice"] | ||
|
|
||
| COPY . . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The COPY . . command seems redundant as the required lock file is already explicitly copied on line 10. This broad COPY could unintentionally bring in other files from the build context, potentially increasing image size or causing unexpected behavior. For better clarity, hermeticity, and caching, it's best to remove this line if it's not strictly necessary for other files not declared in the wanda.yaml srcs.
| name: "datamongobuild-py$PYTHON" | ||
| froms: ["cr.ray.io/rayproject/oss-ci-base_ml-py$PYTHON"] | ||
| dockerfile: ci/docker/data.build.Dockerfile | ||
| dockerfile: t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dockerfile path set to invalid value t
High Severity
The dockerfile field in datamongo.build.wanda.yaml is changed to t, which is not a valid Dockerfile path. This appears to be an accidental edit (possibly partial deletion). The previous value was ci/docker/data.build.Dockerfile. This will cause the datamongobuild-multipy CI build step to fail when trying to locate the Dockerfile.
| @@ -0,0 +1,9 @@ | |||
| name: "databuild-py$PYTHON" | |||
| froms: ["cr.ray.io/rayproject/oss-ci-base_ml-py$PYTHON"] | |||
| dockerfile: ci/docker/data.build.Dockerfile | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wanda config references wrong Dockerfile for depset build
High Severity
The new data.base.build.wanda.yaml references ci/docker/data.build.Dockerfile instead of the newly created ci/docker/data.base.build.Dockerfile. The wanda config includes data_base_depset_py$PYTHON.lock as a source, but the referenced Dockerfile uses install-dependencies.sh instead of the depset lock file. The lock file will be copied but never used.
| srcs: | ||
| - python/deplocks/ci/data_base_depset_py$PYTHON.lock | ||
| build_args: | ||
| - DOCKER_IMAGE_BASE_BUILD=cr.ray.io/rayproject/oss-ci-base_ml-py$PYTHON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing PYTHON_DEPSET build argument in wanda config
Medium Severity
The data.base.build.wanda.yaml does not pass PYTHON_DEPSET as a build argument, so the Dockerfile would use its default value. However, the default python/deplocks/ci/data_base_depset_py$PYTHON.lock references $PYTHON which is never defined as a Dockerfile ARG, causing it to expand to an empty string. The resulting path data_base_depset_py.lock (without version) won't match the actual lock file. The similar datatfds.build.wanda.yaml correctly passes PYTHON_DEPSET explicitly.
Additional Locations (1)
| - python/requirements/ml/data-test-requirements.txt | ||
| - python/requirements/data/data-base.txt | ||
| constraints: | ||
| - /tmp/ray-deps/requirements_compiled_py3.13.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoded Python 3.13 constraints for 3.10/3.12 builds
Medium Severity
The constraints path hardcodes requirements_compiled_py3.13.txt but the depset builds target Python 3.10 and 3.12 (via build_arg_sets). All other depset configs in rayimg.depsets.yaml use ${PYTHON_VERSION} to match the constraints file to the target Python version. Using 3.13 constraints for 3.10/3.12 builds could result in dependency versions that are incompatible with those Python versions.
| if [[ -n "$ARROW_MONGO_VERSION" ]]; then | ||
| # Older versions of Arrow Mongo require an older version of NumPy. | ||
| pip install numpy==1.23.5 | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dead code conditional branches never execute
Low Severity
The conditional blocks for ARROW_MONGO_VERSION and RAY_CI_JAVA_BUILD will never execute because the corresponding data.base.build.wanda.yaml doesn't pass these build args. Both ARGs default to empty, so the if [[ -n "$ARROW_MONGO_VERSION" ]] and if [[ $RAY_CI_JAVA_BUILD == 1 ]] conditions always evaluate to false. This appears to be code copied from data.build.Dockerfile that isn't needed for this simpler depset-based build.
Additional Locations (1)
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
|
|
||
| ARG ARROW_MONGO_VERSION= | ||
| ARG RAY_CI_JAVA_BUILD= | ||
| ARG PYTHON_DEPSET=python/deplocks/ci/data_base_depset_py$PYTHON.lock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing ARG declaration causes empty variable expansion
High Severity
The PYTHON_DEPSET ARG default value uses $PYTHON, but PYTHON is never declared as an ARG in this Dockerfile. Without an ARG PYTHON declaration, the variable expands to an empty string, causing the path to resolve to python/deplocks/ci/data_base_depset_py.lock (missing the version number). The wanda file also doesn't pass PYTHON or PYTHON_DEPSET as a build argument to override this default.


Generating data depsets for data ci images