Skip to content

[libcu++] Additional peer device copy testing#9636

Open
pciolkosz wants to merge 1 commit into
NVIDIA:mainfrom
pciolkosz:cccl_rt_cross_device_extra_tests
Open

[libcu++] Additional peer device copy testing#9636
pciolkosz wants to merge 1 commit into
NVIDIA:mainfrom
pciolkosz:cccl_rt_cross_device_extra_tests

Conversation

@pciolkosz

Copy link
Copy Markdown
Contributor

This PR adds testing for copy_bytes and buffer copy constructor across peer devices

@pciolkosz pciolkosz requested a review from a team as a code owner June 30, 2026 05:07
@pciolkosz pciolkosz requested a review from wmaxey June 30, 2026 05:07
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 30, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 30, 2026
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cbf59bba-5261-49d9-9587-f47d4beaaaf6

📥 Commits

Reviewing files that changed from the base of the PR and between c054ead and 5cb667d.

📒 Files selected for processing (2)
  • libcudacxx/test/libcudacxx/cuda/ccclrt/algorithm/copy.cu
  • libcudacxx/test/libcudacxx/cuda/containers/buffer/copy.cu
🚧 Files skipped from review as they are similar to previous changes (2)
  • libcudacxx/test/libcudacxx/cuda/ccclrt/algorithm/copy.cu
  • libcudacxx/test/libcudacxx/cuda/containers/buffer/copy.cu

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Added multi-GPU test coverage for copying between peer GPUs using both byte-copy and buffer APIs.
    • Validates stream-ordered behavior and memory-pool-backed peer accessibility where supported.
  • Bug Fixes
    • Improved multi-GPU pinned host buffer handling in copy tests to use safer unsynchronized access reads.
    • Added additional safeguards to skip peer and memory-pool-dependent tests when the environment doesn’t support the required features.

Walkthrough

Fixes host pinned buffer accesses in an existing copy_bytes test to use get_unsynchronized(0). Adds two new multi-GPU peer-device copy tests: one for copy_bytes with copy_configuration location hints, and one for cuda::make_buffer with pool-based cross-device access.

Changes

Multi-GPU peer copy tests

Layer / File(s) Summary
Fix get_unsynchronized in existing test
libcudacxx/test/libcudacxx/cuda/ccclrt/algorithm/copy.cu
Replaces data()[0] with get_unsynchronized(0) for host pinned buffer write and read in the existing stream-device copy_bytes test.
New copy_bytes peer device test
libcudacxx/test/libcudacxx/cuda/ccclrt/algorithm/copy.cu
Adds "copy_bytes can copy between peer device buffers": skips without 2+ GPUs, peer access, or memory pools; sets up per-device streams and pools with cross-device access; copies host→device, peer device→device, and device→host; then asserts the result.
New cuda::make_buffer peer device test
libcudacxx/test/libcudacxx/cuda/containers/buffer/copy.cu
Adds cuda/memory_pool include and a new test for peer-device cuda::make_buffer copying with pool access checks, allocation validation, content comparison, and stream synchronization.

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 51e6ac2b-ecff-4192-86c1-fc0fd1cea10d

📥 Commits

Reviewing files that changed from the base of the PR and between e1cb696 and c054ead.

📒 Files selected for processing (2)
  • libcudacxx/test/libcudacxx/cuda/ccclrt/algorithm/copy.cu
  • libcudacxx/test/libcudacxx/cuda/containers/buffer/copy.cu
Comment on lines +161 to +178
if (cuda::devices.size() < 2)
{
return;
}

cuda::device_ref source_device{0};
auto peers = source_device.peers();
if (peers.empty())
{
return;
}

cuda::device_ref destination_device = peers.front();
if (!source_device.attribute(cuda::device_attributes::memory_pools_supported)
|| !destination_device.attribute(cuda::device_attributes::memory_pools_supported))
{
return;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

important: Add a short comment for each early-return skip path here. These branches silently skip the test when multi-GPU peer copies or memory pools are unavailable, and the test rules require unsupported or skipped cases to be motivated. As per coding guidelines, "If a test is unsupported, expected to fail, disabled, or skipped on a platform, motivate it with a comment."

Source: Coding guidelines

Comment thread libcudacxx/test/libcudacxx/cuda/containers/buffer/copy.cu Outdated
@github-actions

This comment has been minimized.

@pciolkosz pciolkosz force-pushed the cccl_rt_cross_device_extra_tests branch from c054ead to 5cb667d Compare June 30, 2026 17:04
@github-actions

Copy link
Copy Markdown
Contributor

🥳 CI Workflow Results

🟩 Finished in 4h 23m: Pass: 100%/71 | Total: 1d 00h | Max: 1h 05m | Hits: 99%/326659

See results here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

1 participant