
Streams getting marked as orphaned and deleted when transitioning from one non-clustered server to a cluster #5467

Open
anthonyjacques20 opened this issue May 23, 2024 · 6 comments
Labels: defect (Suspected defect such as a bug or regression), stale (This issue has had no activity in a while)

Comments

@anthonyjacques20

Observed behavior

Streams are getting marked as orphaned and deleted when transitioning from a single non-clustered server to a cluster, even though the account information is the same between the single non-clustered server and the cluster.

We also noticed that if we go from clustered servers down to a single non-clustered server and then back to clustered servers, the data does not get orphaned. We confirmed that the data was stored in the same location when going from the cluster to the single server, and that the data was visible while running the single server. This seems to contradict the behavior when going from one non-clustered server to a cluster.

Expected behavior

I don't expect any streams to be orphaned and deleted if the account the stream is tied to is still valid, even when going from a non-clustered server to clustered servers.

Server and client version

Server version: 2.10.9 and 2.10.15 with same behavior
CLI version: 0.0.35

Host environment

First saw the error on Linux in a Helm deployment, but I was able to reproduce it on my Intel-based Mac, so I don't think it's environment specific.

Steps to reproduce

  1. Start the non-clustered server (using n1_nocluster.txt; a config sketch follows these steps)
  2. Add stream
nats stream add test
? Subjects test.>
? Storage file
? Replication 1
? Retention Policy Limits
? Discard Policy Old
? Stream Messages Limit -1
? Per Subject Messages Limit -1
? Total Stream Size -1
? Message TTL -1
? Max Message Size -1
? Duplicate tracking time window 2m0s
? Allow message Roll-ups No
? Allow message deletion Yes
? Allow purging subjects or the entire stream Yes
Stream test was created

Information for Stream test created 2024-05-21 20:50:25

             Subjects: test.>
             Replicas: 1
              Storage: File

Options:

            Retention: Limits
     Acknowledgements: true
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: unlimited
        Maximum Bytes: unlimited
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited


State:

             Messages: 0
                Bytes: 0 B
             FirstSeq: 0
              LastSeq: 0
     Active Consumers: 0
  3. Add data to the stream
nats pub test.hello "{{ Random 100 1000 }}" -H Count:{{Count}} --count 100

100 / 100 [======================================================================================]    0s
  4. Shut down the non-clustered server
  5. Start all 3 clustered servers (a sketch of the cluster config follows these steps)
  6. Verify the test stream was orphaned and deleted via the logs
[WRN] Detected orphaned stream '$G > test', will cleanup
  7. Verify the test stream was deleted via nats stream report
nats stream report
Obtaining Stream stats

No Streams defined
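
For reference, the config files mentioned in the steps are not attached inline here. Below is a minimal sketch of what the non-clustered config (n1_nocluster.txt) and one of the three clustered configs might look like; the ports, server name, store_dir path, cluster name, and route addresses are assumptions for illustration only, not the reporter's actual files.

# assumed contents of n1_nocluster.txt (single non-clustered server with JetStream)
server_name: n1
port: 4222
jetstream {
  store_dir: "./data/n1"
}

# assumed contents of one of the three clustered configs (hypothetical n1_cluster.txt)
# note: same server_name and store_dir as the non-clustered run, so the existing
# stream data is found on disk when the clustered server starts
server_name: n1
port: 4222
jetstream {
  store_dir: "./data/n1"
}
cluster {
  name: C1
  port: 6222
  routes: [
    nats-route://127.0.0.1:6222
    nats-route://127.0.0.1:6223
    nats-route://127.0.0.1:6224
  ]
}

The other two cluster configs would differ only in server_name, ports, and store_dir.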
@anthonyjacques20 added the defect label on May 23, 2024
@derekcollison
Member

Moving from non-clustered to clustered, you should create a snapshot of all assets from your account using the NATS CLI.

nats account snapshot <dir to store snapshots>

Use -h to see additional options.
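
For context, the end-to-end migration might look roughly like the sketch below. The snapshot subcommand is taken verbatim from the comment above; the restore counterpart and exact flags vary by CLI version, so treat this as an assumption and confirm with nats account --help.

# with the single non-clustered server still running, back up all JetStream
# assets for the account into a local directory
nats account snapshot ./js-backup

# shut down the single server and start the 3-node cluster, then restore the
# assets into the cluster (restore subcommand assumed; verify for your CLI version)
nats account restore ./js-backup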

@anthonyjacques20
Author

Definitely, we missed this part of the process in one of our upgrades.

But is this expected behavior? It seems odd that going cluster -> non-cluster -> cluster doesn't orphan any streams but going from non-cluster -> cluster does.

@derekcollison
Member

Yes, expected, since when the server starts in clustered mode it tries to resolve the meta-layer assignments to local assets. You will not have any meta-layer assignments, so it sees any local assets as orphans.

@anthonyjacques20
Author

Gotcha, can we make it not do that? From the end-user perspective, if the accounts are the same and the data is there, I'm not sure why we would expect the data to be deleted when going from non-clustered to clustered.

@derekcollison
Member

They are very different setups in how they work. In clustered mode there is a meta layer that is shared and replicated between all servers in the system. It's like the DNA for the system. That is not needed in single server mode.

We could look into it, but it would be low priority, since folks usually just snapshot and restore from a single server into a cluster, which is what we recommend.

@anthonyjacques20
Author

Yeah, understood...I figured there was a good technical reason.
Is it documented anywhere that the data is expected to be deleted? I don't recall reading it anywhere, and if this isn't going to be changed (or at least not in the short term), can we add it somewhere? I think "recommending a snapshot" and "your data will be deleted" will elicit very different responses. Documenting it here in the k8s repo would be really useful.

I know it's not a super common pattern, but deleting the data when enabling clustering was very unexpected behavior that has (and had) huge repercussions.

The github-actions bot added the stale label on Jul 20, 2024