Skip to content

Enable quiescence in production and add timeout config #10001

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

yyforyongyu
Copy link
Member

This PR enables the quiescence protocol to run in non-dev env. A new timeout config value is introduced to replace the old default value, which controls how long the channel can be quiescent. In addition, the itest is fixed and updated to check the timeout behavior.

@yyforyongyu yyforyongyu added this to the v0.20.0 milestone Jun 27, 2025
@yyforyongyu yyforyongyu requested a review from Copilot June 27, 2025 13:23
@yyforyongyu yyforyongyu self-assigned this Jun 27, 2025
@yyforyongyu yyforyongyu added dynamic commitments size/micro small bug fix or feature, less than 15 mins of review, less than 250 labels Jun 27, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enables the quiescence protocol to run in non-development environments, replacing the old fixed timeout with a configurable value. Key changes include passing the new timeout config throughout peer connection, updating the configuration and validation for the htlcswitch, and revising integration tests to verify the timeout behavior.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
server.go Added QuiescenceTimeout propagation in peerConnected configuration.
sample-lnd.conf Documented the new config option with usage guidance.
peer/brontide.go Updated Config to include the QuiescenceTimeout value in link creation.
lncfg/protocol.go Adjusted NoQuiescence() to enable quiescence in production.
lncfg/htlcswitch.go Introduced new timeout constants, config field, and validation logic.
itest/lnd_quiescence_test.go Expanded integration tests to assert timeout behavior and node reconnection.
htlcswitch/quiescer.go Updated error message formatting and removed unused timeout constant.
htlcswitch/link.go Updated ChannelLinkConfig and link initialization to use the new timeout.
docs/release-notes/release-notes-0.20.0.md Added release note documentation for the new config.
config.go Set default QuiescenceTimeout value in the overall config.
Copy link
Contributor

coderabbitai bot commented Jun 27, 2025

Important

Review skipped

Auto reviews are limited to specific labels.

🏷️ Labels to auto review (1)
  • llm-review

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.
@yyforyongyu yyforyongyu force-pushed the finalize-stfu branch 6 times, most recently from 4951959 to 4240edd Compare July 1, 2025 12:00

// DefaultQuiescenceTimeout specifies the default value to be used for
// `QuiescenceTimeout`.
DefaultQuiescenceTimeout = 60 * time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the current behavior if this is hit? Do we drop the connection, then try to reconnect again? Or is there some messages that both sides send to open things up again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the link we will simply disconnect. then due to the connection is persistent as we have a channel, both sides will try to reconnect.

@@ -147,7 +147,7 @@ func (l *ProtocolOptions) NoExperimentalEndorsement() bool {

// NoQuiescence returns true if quiescence is disabled.
func (l *ProtocolOptions) NoQuiescence() bool {
return true
return false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to enable this for non-dev envs if nothing user facing uses it yet?

Also where do we even check this value? IIUC there's no standalone feature bit. The RPC call doesn't check the value:

// Quiesce initiates the quiescence process for the channel with the given
// channel ID. This method will block until the channel is fully quiesced.
func (s *Server) Quiesce(_ context.Context, in *QuiescenceRequest) (
*QuiescenceResponse, error) {
txid, err := lnrpc.GetChanPointFundingTxid(in.ChanId)
if err != nil {
return nil, err
}
op := wire.NewOutPoint(txid, in.ChanId.OutputIndex)
cid := lnwire.NewChanIDFromOutPoint(*op)
ln, err := s.cfg.Switch.GetLink(cid)
if err != nil {
return nil, err
}
select {
case result := <-ln.InitStfu():
mkResp := func(b lntypes.ChannelParty) *QuiescenceResponse {
return &QuiescenceResponse{Initiator: b.IsLocal()}
}
return fn.MapOk(mkResp)(result).Unpack()
case <-s.quit:
return nil, fmt.Errorf("server shutting down")
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we have this config,

lnd/peer/brontide.go

Lines 1450 to 1451 in 7be45c3

DisallowQuiescence: p.cfg.DisallowQuiescence ||
!p.remoteFeatures.HasFeature(lnwire.QuiescenceOptional),

and the DisallowQuiescence is set here,

lnd/server.go

Line 4531 in 7be45c3

DisallowQuiescence: s.cfg.ProtocolOptions.NoQuiescence(),

Do we need to enable this for non-dev envs if nothing user facing uses it yet?

Can also update it once dyn is in place tho, was just preparing for dyn.

//
// TODO(yy): When refactoring the peer connection we need to have an RPC
// that reports how many times the peer has been connected.
ht.AssertConnected(alice, bob)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nice didn't know that, yeah updated!

@Roasbeef
Copy link
Member

Roasbeef commented Jul 2, 2025

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enables the quiescence protocol in production environments and introduces a new configuration option, --htlcswitch.quiescencetimeout, to control the maximum duration a channel can be quiescent. The changes are well-implemented, propagating the new configuration value from the top-level config down to the channel link where it's used. The integration test has been significantly improved to cover the new timeout behavior, including node restarts and disconnects due to timeouts. The documentation, including release notes and the sample configuration file, has been updated accordingly. The code is clean and the changes are consistent with the PR's objective. I have one minor suggestion for improving the grammar in the release notes for better clarity.

@yyforyongyu
Copy link
Member Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enables the quiescence protocol in production environments and introduces a configurable timeout for it. The changes are well-implemented across the configuration, peer management, and HTLC switch logic. The integration test has been significantly improved to cover the new timeout functionality, including peer disconnection and reconnection scenarios.

I've found a couple of minor documentation issues in comments that could be improved for clarity. Other than that, the changes look solid and well-tested.

This commit makes removes the `defaultQuiescenceTimeout` and makes it
configurable as different nodes have different network environment. In
addition the default timeout has been increased from 30s to 60s.
The original test lets Bob send an HTLC to Alice, but Bob doesn't have
any balance to begin with. This commit now fixes it by explicitly
checking sending payment is allowed before quiescence, and forbidden
after.
@yyforyongyu
Copy link
Member Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enables the quiescence protocol for production use and introduces a configurable timeout. The changes are well-implemented across the configuration, peer management, and htlcswitch components. The integration test has been substantially improved to validate the new timeout behavior and various quiescence scenarios. I've left a few minor comments regarding documentation and test code clarity. Overall, this is a solid contribution.

@yyforyongyu yyforyongyu requested a review from ellemouton July 2, 2025 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dynamic commitments size/micro small bug fix or feature, less than 15 mins of review, less than 250
2 participants