Healthcheck fails when JetStream account is removed from configuration #5459

Open
pcsegal opened this issue May 21, 2024 · 6 comments
Labels
defect Suspected defect such as a bug or regression stale This issue has had no activity in a while

Comments

@pcsegal

pcsegal commented May 21, 2024

Observed behavior

When I do the following:

  • Create a NATS cluster with JetStream enabled, configured with an account named acc1.
  • Create a KV bucket in acc1.
  • Stop NATS server.
  • Change configuration, removing account acc1 and adding account acc2.
  • Start the server again. When it loads, the node that contained the KV bucket replica fails its healthcheck with the warning below:
[970687] 2024/05/20 15:38:10.744645 [WRN] Healthcheck failed: "JetStream can not lookup account \"acc1\": account missing"

Expected behavior

NATS should still load if an account is removed from the configuration.

Server and client version

Server: 2.10.14
Client: 0.1.1

Host environment

Ubuntu 20.04, amd64.

Steps to reproduce

Here is a gist with an example reproducing the issue:

https://gist.github.com/pcsegal/532d15b827d9b13f8a1456e95f1ebc52

The script test-cluster.sh and the accompanying files should all be in the same directory.

The script runs through the described scenario.

In the end, the node in which the KV bucket was placed should be unable to load. It should show the following warning in the logs:

[970687] 2024/05/20 15:38:10.744645 [WRN] Healthcheck failed: "JetStream can not lookup account \"acc1\": account missing"

In turn, the other nodes will show the following warning:

Update Stream Account acc1, error on lookup: account missing
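
For anyone who wants to detect this condition from logs, here is a minimal Python sketch that pulls the missing account name out of the two warnings quoted above. The function name `missing_accounts` is hypothetical, and the patterns are copied only from the messages in this report, so other server versions may word them differently.

```python
import re

# Patterns derived from the two warnings quoted in this issue (assumption:
# the wording is stable; adjust if your server version logs it differently).
_PATTERNS = (
    # Healthcheck warning; the quotes around the name may be backslash-escaped.
    re.compile(r'can not lookup account \\?"([^"\\]+)\\?": account missing'),
    # Warning logged by the other nodes in the cluster.
    re.compile(r'Update Stream Account (\S+), error on lookup: account missing'),
)


def missing_accounts(log_lines):
    """Return a sorted list of account names reported as missing."""
    found = set()
    for line in log_lines:
        for pattern in _PATTERNS:
            match = pattern.search(line)
            if match:
                found.add(match.group(1))
    return sorted(found)
```

Feeding it the lines of a server log would list the accounts that still have JetStream state on disk but no longer exist in the configuration.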
@pcsegal pcsegal added the defect Suspected defect such as a bug or regression label May 21, 2024
@pcsegal pcsegal changed the title Healtcheck fails when JetStream account is removed from configuration May 21, 2024
@derekcollison
Member

As the system user, run the following:

nats server account purge acc1

@pcsegal
Author

pcsegal commented May 21, 2024

Thank you.

If I understand correctly, this needs to be run before I remove the account from the configuration, right?

@derekcollison
Member

It can be run at any time. If you run it now, it will instruct the system to remove any JetStream artifacts from that account that are still on the system.

@pcsegal
Author

pcsegal commented May 28, 2024

Thank you.

So, in a situation where accounts represent tenancies that can be decommissioned, forgetting to purge the account first could lead to downtime if a stream with only one replica happens to live on the node that failed the healthcheck.

If I want to automate account purging, can something like the NACK operator help here? I see that NACK allows managing accounts via CRDs.

@Jarema
Member

Jarema commented May 28, 2024

NACK does not allow purging accounts.

However, you can achieve that programmatically by sending a request to $JS.API.ACCOUNT.PURGE.{ACC_NAME} from a client connected with a system account user. That achieves the same result as the CLI call.
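
As a rough illustration of that request, here is a minimal Python sketch. It assumes the nats-py client is installed. The helper names `purge_subject` and `purge_account`, the empty request payload, and the 5-second timeout are assumptions made for this example, not something confirmed in this thread.

```python
import re

PURGE_SUBJECT_PREFIX = "$JS.API.ACCOUNT.PURGE."


def purge_subject(account: str) -> str:
    """Build the purge subject for one account name."""
    # The name becomes a single subject token, so reject empty names,
    # whitespace, dots, and wildcard characters.
    if not account or re.search(r"[\s.*>]", account):
        raise ValueError(f"invalid account name for a subject token: {account!r}")
    return PURGE_SUBJECT_PREFIX + account


async def purge_account(account: str, servers: str, timeout: float = 5.0,
                        **connect_options) -> bytes:
    """Send the purge request and return the raw reply payload.

    connect_options must authenticate as a user of the system account.
    """
    # Imported lazily so purge_subject() works without nats-py installed.
    import nats

    nc = await nats.connect(servers, **connect_options)
    try:
        # Assumption: an empty payload is enough for this request.
        reply = await nc.request(purge_subject(account), b"", timeout=timeout)
        return reply.data
    finally:
        await nc.close()
```

A call might look like `asyncio.run(purge_account("acc1", "nats://localhost:4222", user="sys", password="..."))`, where the URL and credentials are placeholders for your own system account user. The reply payload is returned unparsed, so inspect it before treating the purge as done.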

@pcsegal
Author

pcsegal commented May 28, 2024

Thank you; how about purging streams? Would NACK help with purging individual streams (rather than the entire account) when a stream CRD is deleted?

@github-actions github-actions bot added the stale This issue has had no activity in a while label Jul 24, 2024
3 participants