[WIP] Disagg write filecache for compaction tasks#10937
Conversation
Initialize FileCache and register remote cache paths for disaggregated storage nodes, and add dt_enable_write_filecache (default false) as the runtime gate for upcoming write-side local staging.
Propagate an optional DMFileBlockInputStreamBuilder hook through getPlacedStream and enable it from prepareMerge, prepareMergeDelta, and prepareSplitPhysical when dt_enable_write_filecache is set.
Introduce collectMetaV2MergedFilesForLocalRead to map read columns to deduplicated physical .merged S3 objects, with unit tests for noop, column filtering, and remote MetaV2 dedup behavior.
Download deduplicated .merged S3 objects before building DMFileReader when write FileCache local read is enabled, pin FileSegmentPtr for reader lifetime, and fall back to direct read on per-object failures.
Replace ProfileEvents with tiflash_storage_write_filecache_staging counters and staged-bytes counter, and update local staging unit tests accordingly.
Cover prepareMergeDelta, prepareMerge, and prepareSplitPhysical staging paths in SegmentTestS3, and fix FileCache test teardown to re-init S3FileCachePool so multiple gtests can run in one process.
Signed-off-by: JaySon-Huang <tshent@qq.com>
…logging Instrument prepareMergeDelta/Merge/SplitPhysical and remote DMFile upload with tiflash_storage_subtask_* metrics. Track remote upload duration in createNewStable and log stable column count in Segment::info().
Return prepare/remote upload durations from prepareMerge and prepareMergeDelta, and print them in segmentMerge/MergeDelta finish logs.
Track prepare and S3 upload durations in SplitInfo and print them in segmentSplit finish logs. Move getPlacedStream above the gtest visibility split in Segment.h.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds local FileCache staging for MetaV2 DMFile ChangesWrite FileCache Local Staging
Sequence Diagram(s)sequenceDiagram
participant DMFileBlockInputStreamBuilder
participant DMFileLocalStaging
participant FileCache
participant DMFileReader
DMFileBlockInputStreamBuilder->>DMFileLocalStaging: tryDownloadMetaV2MergedFilesForLocalRead(...)
DMFileLocalStaging->>FileCache: downloadImpl(..., download_type)
FileCache-->>DMFileLocalStaging: FileSegmentPtr list
DMFileBlockInputStreamBuilder->>DMFileReader: construct(..., local_read_files)
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dbms/src/Storages/DeltaMerge/tests/gtest_dm_segment_s3.cpp (1)
133-150: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winRestore
dt_enable_write_filecachein teardown.Line 228 mutates a process-wide setting on
global_context, butTearDown()only clearsFileCache. That leaks the flag into later same-process gtests and can silently change read-path behavior outside this suite.Suggested fix
class SegmentTestS3 : public DB::base::TiFlashStorageTestBasic { public: SegmentTestS3() = default; @@ void SetUp() override { @@ auto & global_context = TiFlashTestEnv::getGlobalContext(); + orig_dt_enable_write_filecache = global_context.getSettingsRef().dt_enable_write_filecache; global_context.getTMTContext().initS3GCManager(nullptr); @@ void TearDown() override { + db_context->getGlobalContext().getSettingsRef().dt_enable_write_filecache = orig_dt_enable_write_filecache; shutdownWriteFileCache(); @@ bool already_initialize_data_store = false; bool already_initialize_write_ps = false; DB::PageStorageRunMode orig_mode = PageStorageRunMode::ONLY_V3; + bool orig_dt_enable_write_filecache = false; };Also applies to: 225-229
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/DeltaMerge/tests/gtest_dm_segment_s3.cpp` around lines 133 - 150, The TearDown() in gtest_dm_segment_s3.cpp leaves the process-wide dt_enable_write_filecache setting changed, which can leak into later tests. Restore the original value during teardown alongside the existing cleanup in TearDown(), using the same global_context pattern already used for orig_mode and the remote_data_store reset so the suite leaves shared state unchanged.
🧹 Nitpick comments (2)
dbms/src/Storages/DeltaMerge/File/DMFileLocalStaging.cpp (1)
36-43: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winPrecompute merged-file sizes before the collection loop.
getMergedFileSize()scansmerged_filesfor every logical subfile. On wide MetaV2 files this makes staging discovery quadratic in a background compaction path. Build anumber -> sizemap once before the loop and reuse it.♻️ Possible tweak
namespace { -std::optional<UInt64> getMergedFileSize(const DMFileMetaV2 & dmfile_meta, UInt32 number) -{ - for (const auto & merged_file : dmfile_meta.merged_files) - { - if (merged_file.number == number) - return merged_file.size; - } - return std::nullopt; -} } // namespace @@ std::unordered_map<String, LocalReadObject> objects_by_key; + std::unordered_map<UInt32, UInt64> merged_file_sizes; + merged_file_sizes.reserve(dmfile_meta->merged_files.size()); + for (const auto & merged_file : dmfile_meta->merged_files) + merged_file_sizes.emplace(merged_file.number, merged_file.size); + for (const auto & logical_filename : logical_filenames) { @@ - const auto merged_file_size = getMergedFileSize(*dmfile_meta, merged_file_info.number); - if (!merged_file_size.has_value()) + const auto size_it = merged_file_sizes.find(merged_file_info.number); + if (size_it == merged_file_sizes.end()) { @@ "merged_number={}", @@ objects_by_key.emplace( s3_fname.toFullKey(), LocalReadObject{ .s3_key = s3_fname.toFullKey(), - .file_size = merged_file_size.value(), + .file_size = size_it->second, }); }Also applies to: 82-100
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/DeltaMerge/File/DMFileLocalStaging.cpp` around lines 36 - 43, Precompute the merged-file sizes once instead of calling getMergedFileSize() for every subfile in DMFileLocalStaging::collect staging logic, since the current linear scan over dmfile_meta.merged_files makes wide MetaV2 files quadratic. Build a number-to-size lookup from dmfile_meta.merged_files before the collection loop, then reuse that map inside the loop where the logical subfiles are processed; keep the existing getMergedFileSize() behavior only if it is still needed elsewhere.dbms/src/Storages/DeltaMerge/Segment.h (1)
133-137: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse
Float64for the new timing fields.These new API fields are part of the storage-layer contract, so they should follow the repo’s explicit-width type convention instead of introducing raw
double.Suggested change
- double prepare_seconds = 0; - double remote_upload_seconds = 0; + Float64 prepare_seconds = 0; + Float64 remote_upload_seconds = 0;As per coding guidelines,
**/*.{cpp,h,hpp}: Use explicit width types fromdbms/src/Core/Types.h:UInt8,UInt32,Int64,Float64,String.Also applies to: 373-380, 406-413
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/DeltaMerge/Segment.h` around lines 133 - 137, The new timing fields in Segment should use the repo’s explicit-width floating type instead of raw double. Update the `prepare_seconds` and `remote_upload_seconds` members in `Segment` to `Float64`, and check the related storage-layer fields in the same diff locations so the API contract stays consistent with the `dbms/src/Core/Types.h` type convention.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dbms/src/Storages/DeltaMerge/tests/gtest_dm_segment_s3.cpp`:
- Around line 381-383: The test is asserting an absolute value for the
cumulative Prometheus metric returned by stagingAttempt(), which can vary
depending on earlier tests in the same binary. Capture a baseline value before
calling mergeDeltaToRemoteStable() or before the action that stages files, then
assert the delta from stagingAttempt() matches the expected increment instead of
comparing against zero. Use stagingAttempt(), mergeDeltaToRemoteStable(), and
readRows() as the key references when updating the test logic.
---
Outside diff comments:
In `@dbms/src/Storages/DeltaMerge/tests/gtest_dm_segment_s3.cpp`:
- Around line 133-150: The TearDown() in gtest_dm_segment_s3.cpp leaves the
process-wide dt_enable_write_filecache setting changed, which can leak into
later tests. Restore the original value during teardown alongside the existing
cleanup in TearDown(), using the same global_context pattern already used for
orig_mode and the remote_data_store reset so the suite leaves shared state
unchanged.
---
Nitpick comments:
In `@dbms/src/Storages/DeltaMerge/File/DMFileLocalStaging.cpp`:
- Around line 36-43: Precompute the merged-file sizes once instead of calling
getMergedFileSize() for every subfile in DMFileLocalStaging::collect staging
logic, since the current linear scan over dmfile_meta.merged_files makes wide
MetaV2 files quadratic. Build a number-to-size lookup from
dmfile_meta.merged_files before the collection loop, then reuse that map inside
the loop where the logical subfiles are processed; keep the existing
getMergedFileSize() behavior only if it is still needed elsewhere.
In `@dbms/src/Storages/DeltaMerge/Segment.h`:
- Around line 133-137: The new timing fields in Segment should use the repo’s
explicit-width floating type instead of raw double. Update the `prepare_seconds`
and `remote_upload_seconds` members in `Segment` to `Float64`, and check the
related storage-layer fields in the same diff locations so the API contract
stays consistent with the `dbms/src/Core/Types.h` type convention.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 30a9f205-8988-4058-8474-aedf3a6b827a
📒 Files selected for processing (22)
dbms/src/Common/TiFlashMetrics.hdbms/src/Interpreters/Settings.hdbms/src/Server/Server.cppdbms/src/Server/StorageConfigParser.cppdbms/src/Server/StorageConfigParser.hdbms/src/Server/tests/gtest_storage_config.cppdbms/src/Storages/DeltaMerge/DeltaMergeStore.hdbms/src/Storages/DeltaMerge/DeltaMergeStore_InternalSegment.cppdbms/src/Storages/DeltaMerge/File/DMFile.hdbms/src/Storages/DeltaMerge/File/DMFileBlockInputStream.cppdbms/src/Storages/DeltaMerge/File/DMFileBlockInputStream.hdbms/src/Storages/DeltaMerge/File/DMFileLocalStaging.cppdbms/src/Storages/DeltaMerge/File/DMFileLocalStaging.hdbms/src/Storages/DeltaMerge/File/DMFileReader.cppdbms/src/Storages/DeltaMerge/File/DMFileReader.hdbms/src/Storages/DeltaMerge/File/tests/gtest_dm_meta_version.cppdbms/src/Storages/DeltaMerge/Segment.cppdbms/src/Storages/DeltaMerge/Segment.hdbms/src/Storages/DeltaMerge/StableValueSpace.cppdbms/src/Storages/DeltaMerge/StableValueSpace.hdbms/src/Storages/DeltaMerge/tests/gtest_dm_segment.cppdbms/src/Storages/DeltaMerge/tests/gtest_dm_segment_s3.cpp
| segment = mergeDeltaToRemoteStable(); | ||
| ASSERT_EQ(0, stagingAttempt()); | ||
| ASSERT_EQ(100, readRows(segment)); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Use a baseline for stagingAttempt() instead of asserting zero.
Line 382 reads a cumulative Prometheus metric. If any earlier test in the same binary staged files, this becomes non-zero and the test fails even though the code under test behaved correctly.
Suggested fix
TEST_F(SegmentTestS3, WriteFileCacheEnabledWithoutFileCache)
try
{
ASSERT_EQ(FileCache::instance(), nullptr);
setDtEnableWriteFileCache(true);
+ const auto attempt_before = stagingAttempt();
segment = mergeDeltaToRemoteStable();
- ASSERT_EQ(0, stagingAttempt());
+ ASSERT_EQ(attempt_before, stagingAttempt());
ASSERT_EQ(100, readRows(segment));
}
CATCH📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| segment = mergeDeltaToRemoteStable(); | |
| ASSERT_EQ(0, stagingAttempt()); | |
| ASSERT_EQ(100, readRows(segment)); | |
| ASSERT_EQ(FileCache::instance(), nullptr); | |
| setDtEnableWriteFileCache(true); | |
| const auto attempt_before = stagingAttempt(); | |
| segment = mergeDeltaToRemoteStable(); | |
| ASSERT_EQ(attempt_before, stagingAttempt()); | |
| ASSERT_EQ(100, readRows(segment)); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@dbms/src/Storages/DeltaMerge/tests/gtest_dm_segment_s3.cpp` around lines 381
- 383, The test is asserting an absolute value for the cumulative Prometheus
metric returned by stagingAttempt(), which can vary depending on earlier tests
in the same binary. Capture a baseline value before calling
mergeDeltaToRemoteStable() or before the action that stages files, then assert
the delta from stagingAttempt() matches the expected increment instead of
comparing against zero. Use stagingAttempt(), mergeDeltaToRemoteStable(), and
readRows() as the key references when updating the test logic.
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dbms/src/Storages/KVStore/tests/gtest_kvstore_fast_add_peer.cpp`:
- Around line 15-17: The entire RegionKVStoreTestFAP suite is being compiled out
with the top-level `#if` 0, which removes coverage for stable fast-add-peer paths.
Remove the file-wide disable and keep gtest_kvstore_fast_add_peer.cpp compiling,
then mark only the genuinely flaky test cases as disabled using DISABLED_ naming
or narrower conditional guards around those specific tests. Use the
RegionKVStoreTestFAP test class and the individual test names in this file to
isolate the unstable cases while preserving restart, checkpoint, empty-segment,
existing-peer, and table-not-found coverage.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a493622c-f685-4d9a-8e8d-b69a31acfe30
📒 Files selected for processing (1)
dbms/src/Storages/KVStore/tests/gtest_kvstore_fast_add_peer.cpp
Pass DownloadType into downloadImpl and print it in Download success logs.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
dbms/src/Storages/S3/FileCache.h (1)
384-388: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winRename the new parameter to camelCase.
Line 388 introduces
download_type, which breaks the repository's C++ naming convention. Please rename it todownloadTypehere and inFileCache::downloadImplso the declaration and definition stay consistent.As per coding guidelines,
**/*.{cpp,h,hpp}: "Method and variable names should usecamelCase."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/S3/FileCache.h` around lines 384 - 388, The new parameter name in FileCache::downloadImpl uses snake_case and violates the C++ camelCase convention. Rename download_type to downloadType in both the declaration in FileCache and the corresponding FileCache::downloadImpl definition/call sites so the signature stays consistent and matches the repository style.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@dbms/src/Storages/S3/FileCache.h`:
- Around line 384-388: The new parameter name in FileCache::downloadImpl uses
snake_case and violates the C++ camelCase convention. Rename download_type to
downloadType in both the declaration in FileCache and the corresponding
FileCache::downloadImpl definition/call sites so the signature stays consistent
and matches the repository style.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 61613a14-e157-40ff-8c6f-ac6ffd538ebf
📒 Files selected for processing (2)
dbms/src/Storages/S3/FileCache.cppdbms/src/Storages/S3/FileCache.h
Collect large standalone `.dat` subfiles in addition to `.merged` blobs when preparing write-side FileCache local reads, with unit test coverage.
Signed-off-by: JaySon-Huang <tshent@qq.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
dbms/src/Storages/DeltaMerge/DeltaMergeStore_Ingest.cpp (1)
234-246: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse camelCase for the new local/constant names.
slow_split_ingest_log_threshold_seconds,split_ingest_log, andwatch_ingestintroduce new snake_case identifiers in a C++ file. Please rename them to match the repo’s camelCase convention.Suggested rename
-constexpr double slow_split_ingest_log_threshold_seconds = 10.0; +constexpr double slowSplitIngestLogThresholdSeconds = 10.0; struct SplitIngestLogContext { const Stopwatch & watch; - bool isSlow() const { return watch.elapsedSeconds() > slow_split_ingest_log_threshold_seconds; } + bool isSlow() const { return watch.elapsedSeconds() > slowSplitIngestLogThresholdSeconds; } double elapsedSeconds() const { return watch.elapsedSeconds(); } @@ - Stopwatch watch; - SplitIngestLogContext split_ingest_log{watch}; + Stopwatch watch; + SplitIngestLogContext splitIngestLog{watch}; @@ - split_ingest_log.debugOrInfoLevel(), + splitIngestLog.debugOrInfoLevel(), @@ - split_ingest_log.elapsedSeconds(), + splitIngestLog.elapsedSeconds(), @@ - split_ingest_log.elapsedSeconds(), + splitIngestLog.elapsedSeconds(), @@ - split_ingest_log.debugOrInfoLevel(), + splitIngestLog.debugOrInfoLevel(), @@ - split_ingest_log.elapsedSeconds(), + splitIngestLog.elapsedSeconds(), @@ - Stopwatch watch_ingest; + Stopwatch watchIngest; @@ - { GET_METRIC(tiflash_storage_subtask_duration_seconds, type_ingest).Observe(watch_ingest.elapsedSeconds()); }); + { GET_METRIC(tiflash_storage_subtask_duration_seconds, type_ingest).Observe(watchIngest.elapsedSeconds()); });As per coding guidelines,
**/*.{cpp,h,hpp}: Method and variable names should usecamelCase.Also applies to: 268-269, 625-625
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/DeltaMerge/DeltaMergeStore_Ingest.cpp` around lines 234 - 246, The new snake_case identifiers violate the C++ camelCase naming convention; rename slow_split_ingest_log_threshold_seconds, split_ingest_log, and watch_ingest to camelCase throughout DeltaMergeStore_Ingest, including any related uses in SplitIngestLogContext and the ingest logging code. Update all references consistently so the new names match the repo’s method/variable style and the code remains readable and searchable.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@dbms/src/Storages/DeltaMerge/DeltaMergeStore_Ingest.cpp`:
- Around line 234-246: The new snake_case identifiers violate the C++ camelCase
naming convention; rename slow_split_ingest_log_threshold_seconds,
split_ingest_log, and watch_ingest to camelCase throughout
DeltaMergeStore_Ingest, including any related uses in SplitIngestLogContext and
the ingest logging code. Update all references consistently so the new names
match the repo’s method/variable style and the code remains readable and
searchable.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 9fba919a-dd9a-4d3d-8304-8514fbacd4b6
📒 Files selected for processing (6)
dbms/src/Common/TiFlashMetrics.hdbms/src/Storages/DeltaMerge/DeltaMergeStore_Ingest.cppdbms/src/Storages/DeltaMerge/DeltaMergeStore_InternalSegment.cppdbms/src/Storages/DeltaMerge/File/DMFileLocalStaging.cppdbms/src/Storages/DeltaMerge/File/DMFileLocalStaging.hdbms/src/Storages/DeltaMerge/File/tests/gtest_dm_meta_version.cpp
🚧 Files skipped from review as they are similar to previous changes (5)
- dbms/src/Storages/DeltaMerge/File/DMFileLocalStaging.h
- dbms/src/Common/TiFlashMetrics.h
- dbms/src/Storages/DeltaMerge/DeltaMergeStore_InternalSegment.cpp
- dbms/src/Storages/DeltaMerge/File/DMFileLocalStaging.cpp
- dbms/src/Storages/DeltaMerge/File/tests/gtest_dm_meta_version.cpp
|
/test pull-unit-next-gen |
Signed-off-by: JaySon-Huang <tshent@qq.com>
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
|
@JaySon-Huang: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit
New Features
dt_enable_write_filecache(default:false) to enable FileCache staging for disaggregated TiFlash DeltaMerge activity.Bug Fixes
Tests