
[Bug] The consumer minOffset does not advance until the container is restarted #8327

tongtaodragon opened this issue Jun 24, 2024 · 1 comment
Before Creating the Bug Report

  • I found a bug, not just asking a question, which should be created in GitHub Discussions.

  • I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.

  • I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

Red Hat 8.0

RocketMQ version

Broker: 4.8.0
Client SDK: 4.9.3

JDK Version

Oracle JDK 1.8.0_121

Describe the Bug

In our production environment, we have run into this issue several times, but we cannot reproduce it reliably. It happens during upgrades: we run several container instances and perform rolling upgrades. When it occurs, the consumer minOffset of one queue on one broker stays stuck at a fixed value, yet the consumer can still consume later messages successfully, so the number of accumulated messages keeps growing. After we restarted the problematic consumer instance, the issue disappeared.

  1. In the broker log, we did not find any warn or error messages related to this.
  2. In the client log, we did not find any warn or error messages related to this either. Since the number of accumulated messages exceeded 2000, which is larger than maxSpan, some flow-control warnings were logged (see the sketch below).
  3. We compared the cleanExpiredMessage thread between the problematic instance and a normal one; it worked well on both. We found an old issue related to the cleanExpiredMessage thread that can result in this symptom, but it was fixed in SDK 4.9.3.
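For reference, the flow-control warning mentioned in item 2 comes from the pull path: when the offset span of the in-flight messages in a ProcessQueue exceeds consumeConcurrentlyMaxSpan (2000 by default), the client postpones the next pull and logs a warning. A rough sketch of that check, paraphrased from DefaultMQPushConsumerImpl in the 4.x client (names and log text simplified, not the exact source):

    // Paraphrased from DefaultMQPushConsumerImpl.pullMessage (4.x client); simplified.
    if (!this.consumeOrderly) {
        if (processQueue.getMaxSpan() > this.defaultMQPushConsumer.getConsumeConcurrentlyMaxSpan()) {
            // Postpone the next pull and occasionally log the flow-control
            // warning that shows up in the client log.
            this.executePullRequestLater(pullRequest, PULL_TIME_DELAY_MILLS_WHEN_FLOW_CONTROL);
            if ((queueMaxSpanFlowControlTimes++ % 1000) == 0) {
                log.warn("the queue's messages span too long, so do flow control, maxSpan={}",
                    processQueue.getMaxSpan());
            }
            return;
        }
    }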

Steps to Reproduce

We suspect that one consume thread exits abnormally and therefore never sets ConsumeStartTimeStamp on the messages it was handed. We then reproduced the symptom with the hack below (see also the sketch after these steps).

In ConsumeMessageConcurrentlyService.java, we changed the code as follows.

  1. Add one static member as below:

    private static int count = 0;

  2. Add code to the run method of class ConsumeRequest as below:

    @Override
    public void run() {
        if (this.processQueue.isDropped()) {
            log.info("xxxx");
            return;
        }

        // Hack: make the very first ConsumeRequest return without consuming,
        // simulating a consume thread that exits abnormally before
        // ConsumeStartTimeStamp is set and before the result is processed.
        if (count == 0) {
            count++;
            return;
        }
        // ... the original consume logic continues unchanged ...
    }

  3. Recompile the rocketmq-client jar and install it.

  4. Produce some messages to the queue.

  5. Start one consumer instance; the issue can then be observed.
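Why this hack pins minOffset: the push consumer's ProcessQueue keeps in-flight messages in a TreeMap keyed by queue offset, and the offset committed back to the broker is the first key of that map. The batch swallowed by the hack is never passed to processConsumeResult, so its messages are never removed, and the first key, hence the consumer offset and the broker-side minOffset, can never advance even though later messages keep being consumed. A self-contained toy model of that bookkeeping (an illustration, not the real ProcessQueue):

    import java.util.TreeMap;

    // Toy model of ProcessQueue's offset bookkeeping (illustration only).
    class OffsetTracker {
        private final TreeMap<Long, String> inflight = new TreeMap<>();
        private long queueOffsetMax = -1;

        void put(long offset, String msg) {
            inflight.put(offset, msg);
            queueOffsetMax = Math.max(queueOffsetMax, offset);
        }

        // Mirrors the idea behind ProcessQueue.removeMessage: the committable
        // offset is the smallest offset still in flight; if nothing is in
        // flight, it is one past the largest offset ever seen.
        long remove(long offset) {
            inflight.remove(offset);
            return inflight.isEmpty() ? queueOffsetMax + 1 : inflight.firstKey();
        }
    }

    public class StuckOffsetDemo {
        public static void main(String[] args) {
            OffsetTracker q = new OffsetTracker();
            for (long i = 100; i < 105; i++) q.put(i, "msg-" + i);
            // Offset 100 is "lost" by the hacked ConsumeRequest and never removed,
            // so the committable offset stays pinned at 100.
            for (long i = 101; i < 105; i++) {
                System.out.println("committable offset = " + q.remove(i)); // prints 100
            }
        }
    }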

What Did You Expect to See?

After some time, the queue should return to normal: minOffset should keep growing and the number of accumulated messages should drop back to normal. We do not want to restart the consumer instance, since it would consume old messages again.

What Did You See Instead?

minOffset keeps a fixed value and never grows, while the number of accumulated messages keeps growing.

Additional Context

In our case:

  1. clustering consume mode
  2. DefaultMQPushConsumer
  3. concurrent consumption (MessageListenerConcurrently)
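For context, a minimal consumer setup matching the three points above might look like the following (group, topic, and nameserver address are placeholders, not from our environment):

    import java.util.List;
    import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
    import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyContext;
    import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
    import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
    import org.apache.rocketmq.common.message.MessageExt;
    import org.apache.rocketmq.common.protocol.heartbeat.MessageModel;

    public class ConsumerSetup {
        public static void main(String[] args) throws Exception {
            DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("demo_consumer_group");
            consumer.setNamesrvAddr("127.0.0.1:9876");          // placeholder address
            consumer.setMessageModel(MessageModel.CLUSTERING);  // clustering consume mode
            consumer.subscribe("DemoTopic", "*");               // placeholder topic
            consumer.registerMessageListener(new MessageListenerConcurrently() {
                @Override
                public ConsumeConcurrentlyStatus consumeMessage(List<MessageExt> msgs,
                                                                ConsumeConcurrentlyContext context) {
                    // Returning CONSUME_SUCCESS lets the client remove the batch
                    // from the ProcessQueue and advance the committed offset.
                    return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
                }
            });
            consumer.start();
        }
    }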
@tongtaodragon (Author):
The issue was confirmed in the production environment. We took a live dump and searched for a message that had been stuck for several days. In that message's properties, there is no CONSUME_START_TIME property. We still do not know why the consume thread exits abnormally.
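This observation fits the mechanism described above: the expired-message cleaner only resends a message whose consume start timestamp is present and older than the consume timeout, so a message that never received CONSUME_START_TIME is never treated as expired. A rough sketch of the check, paraphrased from ProcessQueue.cleanExpiredMsg in the 4.x client (locking and send-back details omitted, not the exact source):

    // Paraphrased from ProcessQueue.cleanExpiredMsg (4.x client); simplified.
    String startTs = MessageAccessor.getConsumeStartTimeStamp(msgTreeMap.firstEntry().getValue());
    boolean expired = startTs != null && !startTs.isEmpty()
        && System.currentTimeMillis() - Long.parseLong(startTs)
           > pushConsumer.getConsumeTimeout() * 60 * 1000;
    if (!expired) {
        // A message without CONSUME_START_TIME never looks expired, so it is
        // never sent back to the broker and never removed from the TreeMap.
        // Its offset stays the first key forever, pinning minOffset.
        break;
    }
    // otherwise: send the message back to the broker and remove it locally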
