[Bug] Timed task not trigger #16197

Open
2 of 3 tasks
1032851561 opened this issue Jun 21, 2024 · 14 comments
1032851561 commented Jun 21, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

image

image

[INFO] 2024-06-21 16:57:01.556 +0800 o.a.d.s.q.QuartzScheduler:[104] - Add job, job name: job_39, group name: jobgroup_1
[INFO] 2024-06-21 16:57:01.606 +0800 o.a.d.s.q.QuartzScheduler:[137] - schedule job trigger, triggerName: job_39, triggerGroupName: jobgroup_1, cronExpression: 10 * * * * ? *, startDate: Fri Jun 21 16:57:01 CST 2024, endDate: Wed Jun 21 00:00:00 CST 2124

My timed task was added successfully but never triggers.

What you expected to happen

The task should be triggered every minute.
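
For reference, the expression in the log above, `10 * * * * ? *`, uses Quartz's seven-field layout (seconds, minutes, hours, day-of-month, month, day-of-week, year), so it should fire at second 10 of every minute. The following is a minimal sketch that handles only this "fixed second, every minute" shape; it is not a general cron parser, and `next_fire_times` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

def next_fire_times(start: datetime, second: int = 10, count: int = 3) -> list:
    """Next `count` instants at or after `start` whose seconds field equals `second`."""
    candidate = start.replace(second=second, microsecond=0)
    if candidate < start:
        # Second 10 of the current minute has already passed; move to the next minute.
        candidate += timedelta(minutes=1)
    return [candidate + timedelta(minutes=i) for i in range(count)]

# Start date taken from the log line above (time zone ignored for simplicity).
start = datetime(2024, 6, 21, 16, 57, 1, 606000)
fires = next_fire_times(start)
for fire in fires:
    print(fire.isoformat())
```

With the start date from the log, the first expected fire times would be 16:57:10, 16:58:10 and 16:59:10.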

How to reproduce

Create a 'shell' task that prints a message, then bring the timed task online.

Anything else

No response

Version

3.2.x

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

1032851561 added the bug and Waiting for reply labels Jun 21, 2024
ruanwenjun removed the Waiting for reply label Jun 21, 2024
ruanwenjun (Member) commented Jun 21, 2024

Is there any error log on the master, or any error command in t_ds_error_command?
You can check the scheduler execution count via the ds_master_quartz_job_executed metric.
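
One way to check for that metric is to search the master's Prometheus text output for it. The sketch below assumes the metrics are exposed in the Prometheus text exposition format; the helper names are hypothetical, and any URL passed to `fetch_metrics` (for example an actuator endpoint on the master) is an assumption to adapt to your deployment.

```python
import urllib.request

def find_metric_lines(exposition_text: str, metric_name: str) -> list:
    """Return sample lines whose metric name starts with `metric_name`."""
    matches = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(metric_name):
            matches.append(line)
    return matches

def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the raw metrics text (the URL is an assumption; adjust it)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")

# Offline demo using the sample line reported in the reply below.
sample = 'ds_master_consume_command_count_total{application="master-server",} 0.0\n'
print(find_metric_lines(sample, "ds_master_quartz_job_executed"))
print(find_metric_lines(sample, "ds_master_consume_command_count"))
```

Prefix matching is used so that suffixed series names (such as a `_total` counter) are also found.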

ruanwenjun self-assigned this Jun 21, 2024
1032851561 (Author) commented

> Is there any error log on the master, or any error command in t_ds_error_command? You can check the scheduler execution count via the ds_master_quartz_job_executed metric.

  1. There is no error log in the master or API service.
  2. There were two records in t_ds_error_command. I deleted them, but that did not help.
  3. I could not find the ds_master_quartz_job_executed metric; I only found ds_master_consume_command_count_total{application="master-server",} 0.0
  4. ProcessScheduleTask#executeInternal has never run. Does it run in the dolphinscheduler-api server?
  5. After many cycles have passed, no new record has been generated in the t_ds_process_instance table.
ruanwenjun (Member) commented Jun 22, 2024

ProcessScheduleTask#executeInternal runs on the master. You need to provide more information, e.g. your cluster information and whether this bug can be reproduced.

1032851561 (Author) commented Jun 23, 2024

> ProcessScheduleTask#executeInternal runs on the master. You need to provide more information, e.g. your cluster information and whether this bug can be reproduced.

The bug always occurs: no timed job triggers.
My cluster: Docker deployment, 1 master, 1 worker, 1 API server, PostgreSQL database.

The sequence of events was:

  1. All timed tasks ran normally for a long time.
  2. A Docker exception occurred; the master, worker, and API server went down.
  3. I started the cluster again; no task triggered.
  4. I found errors in the master, with log lines like 'Master handle command xxx error'.
  5. I manually changed record xxx in the t_ds_process_instance table: state -> 7.
  6. I deleted all records in t_ds_error_command.
  7. I restarted the master; there were no more error logs.

I can't see the ProcessScheduleTask log line in the master (scheduled fire time :{}, fire time......), so is something wrong with Quartz?

image
image

master.log

1032851561 (Author) commented

> ProcessScheduleTask#executeInternal runs on the master. You need to provide more information, e.g. your cluster information and whether this bug can be reproduced.

I tried to debug the master:
image

Running the SQL directly:
image

ruanwenjun (Member) commented

> 4. found error in master, some log like 'Master handle command xxx error'

This is caused by the master failing to handle the command; you can find the reason in t_ds_error_command or in the master error log.

1032851561 (Author) commented Jun 24, 2024

> This is caused by the master failing to handle the command; you can find the reason in t_ds_error_command or in the master error log.

image

#16197 (comment)

ruanwenjun (Member) commented

> image
> #16197 (comment)

If you delete the records from t_ds_error_command, you can no longer find out why the command handling failed. I am not sure why you deleted them; these records do not affect the system.

1032851561 (Author) commented

> If you delete the records from t_ds_error_command, you can no longer find out why the command handling failed. I am not sure why you deleted them; these records do not affect the system.

My question is not why the command handler failed, but why ProcessScheduleTask does not execute. It is a Quartz job, and it never triggers.

Please see this: #16197 (comment)

ruanwenjun (Member) commented

> My question is not why the command handler failed, but why ProcessScheduleTask does not execute. It is a Quartz job, and it never triggers.
>
> Please see this: #16197 (comment)

I'm still not sure what your problem is. Currently, a DolphinScheduler timed process goes through two steps:

  1. Generate command by quartz task
  2. Execute the command.
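
The two steps can be pictured with a toy in-memory model. This is only a sketch and not DolphinScheduler's code: the class and method names are hypothetical, and the plain lists merely stand in for the command table, t_ds_error_command, and t_ds_process_instance.

```python
from dataclasses import dataclass, field

@dataclass
class ToyScheduler:
    # Stand-ins for the command, error-command, and instance tables (toy model only).
    commands: list = field(default_factory=list)
    error_commands: list = field(default_factory=list)
    instances: list = field(default_factory=list)

    def quartz_fire(self, schedule_id: int) -> None:
        """Step 1: the timed job only records a command; it runs nothing itself."""
        self.commands.append({"schedule_id": schedule_id})

    def consume_commands(self, handler) -> None:
        """Step 2: the master turns each pending command into a process instance."""
        pending, self.commands = self.commands, []
        for command in pending:
            try:
                self.instances.append(handler(command))
            except Exception as exc:
                # A command that fails is parked together with the failure reason.
                self.error_commands.append({**command, "message": str(exc)})

toy = ToyScheduler()
toy.consume_commands(lambda c: f"instance-for-{c['schedule_id']}")
print(len(toy.instances))  # step 1 has not fired yet, so there is nothing to execute
toy.quartz_fire(39)
toy.consume_commands(lambda c: f"instance-for-{c['schedule_id']}")
print(toy.instances)
```

In this model, "no new process instances and no new error records" is what one would see when step 1 never fires, while a failure in step 2 leaves a record in the error list.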

Do you mean step one is going wrong? Many things can prevent step one from executing,
e.g. incorrect Quartz metadata, a blocked Quartz main thread, or a DB lock.
You can find some details in the log and check whether there is a deadlock in the DB.
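
When checking the Quartz metadata, it may help to remember that, as far as I understand Quartz's JDBC job store, a scheduler only picks up triggers that belong to its own scheduler name, are in the WAITING state, and are already due. The sketch below uses an in-memory SQLite table with a few qrtz_triggers column names to show how a looser hand-written query can return rows that such a filter skips. The rows and scheduler names are made up, and the query is a simplification rather than Quartz's exact SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE qrtz_triggers (
           sched_name TEXT, trigger_name TEXT, trigger_group TEXT,
           trigger_state TEXT, next_fire_time INTEGER)"""
)
# Made-up rows; next_fire_time is in epoch milliseconds, as Quartz stores it.
rows = [
    ("SchedulerA", "job_1", "jobgroup_1", "WAITING", 1_000),
    ("SchedulerA", "job_2", "jobgroup_1", "ERROR", 1_000),
    ("SchedulerB", "job_3", "jobgroup_1", "WAITING", 1_000),
    ("SchedulerA", "job_4", "jobgroup_1", "WAITING", 9_000),
]
conn.executemany("INSERT INTO qrtz_triggers VALUES (?, ?, ?, ?, ?)", rows)

def acquirable(conn, sched_name: str, now_ms: int) -> list:
    """Triggers that a scheduler named `sched_name` would treat as due at `now_ms`."""
    cursor = conn.execute(
        """SELECT trigger_name FROM qrtz_triggers
           WHERE sched_name = ? AND trigger_state = 'WAITING'
             AND next_fire_time <= ?
           ORDER BY next_fire_time, trigger_name""",
        (sched_name, now_ms),
    )
    return [name for (name,) in cursor]

# A looser query that only looks at the fire time.
loose = conn.execute(
    "SELECT COUNT(*) FROM qrtz_triggers WHERE next_fire_time <= ?", (2_000,)
).fetchone()[0]
print(loose, acquirable(conn, "SchedulerA", 2_000))
```

This does not establish the cause in this issue; it only illustrates which columns are worth comparing between the query the code runs and the query run by hand.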

1032851561 (Author) commented

> Do you mean step one is going wrong? Many things can prevent step one from executing, e.g. incorrect Quartz metadata, a blocked Quartz main thread, or a DB lock. You can find some details in the log and check whether there is a deadlock in the DB.

Yes, step one is wrong: it never triggers. The Quartz main thread is running; it queries the qrtz_triggers table to find timed jobs that are due. When I debug the master service remotely, the code returns 0 records, but running the SQL directly in the database returns 3 records.

ruanwenjun (Member) commented

> Yes, step one is wrong: it never triggers. The Quartz main thread is running; it queries the qrtz_triggers table to find timed jobs that are due. When I debug the master service remotely, the code returns 0 records, but running the SQL directly in the database returns 3 records.

Is the date on the master machine correct?
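
The clock matters because Quartz stores a trigger's next fire time as epoch milliseconds and, to my understanding, compares it with the scheduler host's clock. A small sketch for reading such a value against a given clock reading follows; `describe_next_fire` is a hypothetical helper and the sample value is made up.

```python
from datetime import datetime, timezone

def describe_next_fire(next_fire_ms: int, now_ms: int) -> str:
    """Say whether a trigger is already due relative to the given clock reading."""
    fire_at = datetime.fromtimestamp(next_fire_ms / 1000, tz=timezone.utc)
    delta_s = (next_fire_ms - now_ms) / 1000
    if delta_s <= 0:
        return f"due since {fire_at.isoformat()} ({abs(delta_s):.0f}s ago)"
    return f"not due until {fire_at.isoformat()} (in {delta_s:.0f}s)"

# Made-up value: 2024-06-21 08:57:10 UTC, i.e. 16:57:10 at +08:00.
next_fire_ms = 1_718_960_230_000
print(describe_next_fire(next_fire_ms, now_ms=next_fire_ms + 60_000))
print(describe_next_fire(next_fire_ms, now_ms=next_fire_ms - 3_600_000))
```

If the host clock were an hour behind, a trigger that looks overdue to a person reading the table would still be reported as "not due" by this comparison.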

1032851561 (Author) commented

> Is the date on the master machine correct?

The date is correct.

ruanwenjun (Member) commented

@1032851561 If this occurs again, please provide the full logs of your masters. I have no further ideas for now.
