SAMZA-2464: Container shuts down when task fails to remove old state checkpoint dirs #1283

Merged: 2 commits, Mar 3, 2020

@bkonold (Contributor) commented Feb 20, 2020

Symptom: The container shuts down with an exception when TaskInstance.commit invokes TaskStorageManager.removeOldCheckpoints.

Cause: Concurrent modification of checkpoint directories by other processes or threads can cause a FileNotFoundException to be thrown, which shuts down the container. An IOException may also be thrown for other miscellaneous failures. These failures should not shut down the container; they should be logged so that processing can continue.

Tests: Added a unit test that fails when an exception from removeOldCheckpoints is not caught.
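
For illustration only, here is a minimal sketch of the log-and-continue pattern described above. It is not the code from this PR: the class `CheckpointCleanupSketch` and the methods `deleteRecursively` and `removeOldCheckpointDir` are hypothetical names, and it uses `java.nio.file`, where a missing directory surfaces as `NoSuchFileException` (a subclass of `IOException`) rather than `FileNotFoundException`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; names do not come from the Samza code base.
public class CheckpointCleanupSketch {

    /** Recursively deletes a directory. Throws if the directory is missing or a delete fails. */
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
        }
    }

    /**
     * Best-effort cleanup of an old checkpoint directory.
     * Returns true if the directory was removed, false if the failure was logged and swallowed.
     */
    static boolean removeOldCheckpointDir(Path dir) {
        try {
            deleteRecursively(dir);
            return true;
        } catch (IOException | UncheckedIOException e) {
            // Assumption: cleanup failures are non-fatal, so log and let the caller keep processing.
            System.err.println("Failed to remove old checkpoint dir " + dir + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("old-checkpoint");
        Files.createFile(dir.resolve("state.sst"));
        // Directory exists, so it is removed.
        System.out.println(removeOldCheckpointDir(dir));
        // Directory is already gone (as if another thread removed it): logged, not rethrown.
        System.out.println(removeOldCheckpointDir(dir));
    }
}
```

The second call in `main` stands in for the concurrent-modification case: the directory has already disappeared, and the caller continues instead of failing. A unit test in the spirit of the one added here could assert exactly that, namely that the cleanup call returns normally when the underlying delete throws.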

API Changes: None
Upgrade instructions: None
Usage instructions: None

@prateekm (Contributor) left a comment

LGTM, thanks. Minor logging suggestion.

@prateekm merged commit f8bfe87 into apache:master on Mar 3, 2020