Skip to content

[libcu++] Use cccl runtime in thrust tests pt 2#9633

Open
pciolkosz wants to merge 3 commits into
NVIDIA:mainfrom
pciolkosz:adopt-cccl-runtime-2800-set2
Open

[libcu++] Use cccl runtime in thrust tests pt 2#9633
pciolkosz wants to merge 3 commits into
NVIDIA:mainfrom
pciolkosz:adopt-cccl-runtime-2800-set2

Conversation

@pciolkosz

Copy link
Copy Markdown
Contributor

Continuing #9591 with second set of changes to Use CCCL Runtime in thrust tests

@pciolkosz pciolkosz requested a review from a team as a code owner June 30, 2026 02:44
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 30, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 30, 2026
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8030360d-0760-4a45-8138-05d4bb7bb432

📥 Commits

Reviewing files that changed from the base of the PR and between d65a9d3 and 39e419b.

📒 Files selected for processing (2)
  • thrust/testing/cuda/cccl_runtime_test_helper.cuh
  • thrust/testing/cuda/merge.cu
🚧 Files skipped from review as they are similar to previous changes (2)
  • thrust/testing/cuda/cccl_runtime_test_helper.cuh
  • thrust/testing/cuda/merge.cu

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Improved reliability of CUDA merge and set-intersection test validations by strengthening end-iterator and output buffer checks.
    • Reduced flaky synchronization in GPU tests by standardizing stream usage and synchronization patterns.
  • Refactor

    • Updated device-side merge and intersection kernels to use device-callable functors and the shared test runtime workflow.
    • Modernized CUDA stream-based test paths to use the unified stream abstraction and buffer utilities.
  • Chores

    • Added pinned host buffer helpers for randomized test data and generalized result comparison utilities.

Walkthrough

Adds pinned host/device buffer helpers and broadens assert_equal in the CUDA test runtime header. Updates four CUDA tests to use device functors, cuda::stream, cuda::launch, and cuda::make_device_buffer instead of raw kernels and manual stream management.

Changes

CUDA Test Runtime and Algorithm Test Refactors

Layer / File(s) Summary
Test runtime helper: buffer generators and assert_equal generalization
thrust/testing/cuda/cccl_runtime_test_helper.cuh
Adds <cuda/buffer>, pinned host buffer creation, random integer/sample buffer generators, and a generic assert_equal that indexes Expected directly.
merge.cu: functor kernel and buffer-based test refactor
thrust/testing/cuda/merge.cu
Replaces the CUDA global merge kernel with a device functor, rewrites the device test around random_integers_buffer and cuda::launch, and updates the streams test to use cuda::stream and assert_equal.
merge_by_key.cu: functor kernel and buffer-based test refactor
thrust/testing/cuda/merge_by_key.cu
Replaces the CUDA global merge-by-key kernel with a device functor, rewrites device and streams tests to use cuda::make_device_buffer, cuda::launch, cuda::stream, and assert_equal.
set_intersection.cu: functor and stream refactor
thrust/testing/cuda/set_intersection.cu
Replaces the CUDA global set-intersection kernel and updates both tests to use cuda::make_device_buffer, cuda::stream, and assert_equal.
set_intersection_by_key.cu: functor and stream refactor
thrust/testing/cuda/set_intersection_by_key.cu
Replaces the CUDA global set-intersection-by-key kernel and updates both tests to use cuda::make_device_buffer, cuda::launch, cuda::stream, and assert_equal.

Suggested reviewers

  • ericniebler
  • davebayer

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
thrust/testing/cuda/merge.cu (1)

30-32: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: Use constexpr for these compile-time values.

n, sizes, and num_sizes are compile-time constants. As per coding guidelines, “All variables that can be evaluated at compile time must be declared constexpr.”

-  const size_t n         = 10000;
-  const size_t sizes[]   = {0, 1, n / 2, n, n + 1, 2 * n};
-  const size_t num_sizes = sizeof(sizes) / sizeof(size_t);
+  constexpr size_t n         = 10000;
+  constexpr size_t sizes[]   = {0, 1, n / 2, n, n + 1, 2 * n};
+  constexpr size_t num_sizes = sizeof(sizes) / sizeof(sizes[0]);

Source: Coding guidelines

thrust/testing/cuda/merge_by_key.cu (1)

30-30: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: Make the returned iterator pair const.

end is not modified after initialization. As per coding guidelines, “All variables that are not modified must be declared const, including ... function return values.”

-    auto end = thrust::merge_by_key(
+    const auto end = thrust::merge_by_key(

Source: Coding guidelines

thrust/testing/cuda/set_intersection.cu (1)

28-29: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: Declare read-only input buffers const.

a and b are only used as inputs in both tests; use const auto for these handles. As per coding guidelines, “All variables that are not modified must be declared const.”

Also applies to: 63-64

Source: Coding guidelines

thrust/testing/cuda/set_intersection_by_key.cu (1)

24-24: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

suggestion: Declare read-only locals const.

end, a_key, b_key, and a_val are not modified after initialization; use const auto. As per coding guidelines, “All variables that are not modified must be declared const.”

Also applies to: 44-46, 91-93

Source: Coding guidelines


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 81309f84-15ae-449c-add0-312e73ef848a

📥 Commits

Reviewing files that changed from the base of the PR and between e1cb696 and d65a9d3.

📒 Files selected for processing (5)
  • thrust/testing/cuda/cccl_runtime_test_helper.cuh
  • thrust/testing/cuda/merge.cu
  • thrust/testing/cuda/merge_by_key.cu
  • thrust/testing/cuda/set_intersection.cu
  • thrust/testing/cuda/set_intersection_by_key.cu
Comment thread thrust/testing/cuda/cccl_runtime_test_helper.cuh
@github-actions

Copy link
Copy Markdown
Contributor

🥳 CI Workflow Results

🟩 Finished in 1h 01m: Pass: 100%/70 | Total: 19h 47m | Max: 1h 01m | Hits: 92%/152517

See results here.


for (cuda::std::size_t i = 0; i < size; ++i)
{
result[i] = static_cast<T>(unittest::generate_random_integer<RandomT>{}(static_cast<unsigned int>(first + i)));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: Isn't this extremely inefficient if size is large? This should be limited like the unittest::random_integers function

@pciolkosz pciolkosz Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by limited? random_integers just passes the size to transform and doesn't cap it

Comment on lines +77 to +79
for (cuda::std::size_t i = 0; i < size; ++i)
{
result[i] = static_cast<T>(unittest::generate_random_sample<RandomT>{}(static_cast<unsigned int>(first + i)));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants