JDK-8327860 : Java processes get killed, leaving no hs_err/stack trace on macOS 14.4
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 8,11,17,21,22
  • Priority: P1
  • Status: Resolved
  • Resolution: External
  • OS: os_x
  • CPU: aarch64
  • Submitted: 2024-03-12
  • Updated: 2024-03-28
  • Resolved: 2024-03-28
JDK 23: Resolved
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Sub Tasks
JDK-8328219 :  
Description
The test java/awt/color/MTICC_ColorSpaceToFrom.java started failing with exit code 137 after updating macOS to 14.4.
The failure was detected on aarch64. The test passes on x86-64. It also passes on previous versions of Sonoma.
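
For readers unfamiliar with the exit code: 137 is consistent with the common Unix convention of reporting a signal-terminated child as 128 plus the signal number, and 137 - 128 = 9 is SIGKILL, which matches the diagnostic report. The sketch below only illustrates that arithmetic. The helper names are hypothetical, and it assumes the harness follows the 128+N convention.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical helper: returns the signal number implied by an exit code
 * under the 128+N convention, or 0 if the code does not look like a
 * signal termination. */
static int signal_from_exit_code(int exit_code) {
    assert(exit_code >= 0 && exit_code <= 255);
    return exit_code > 128 ? exit_code - 128 : 0;
}

/* Hypothetical helper: true if the exit code implies termination by SIGKILL. */
static int was_sigkilled(int exit_code) {
    return signal_from_exit_code(exit_code) == SIGKILL;
}
```

With this convention, was_sigkilled(137) is true, while a normal exit code such as 0 or 1 maps to no signal.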

Diagnostic report related to this failure looks like:

======================8<----------------------

Process:               java [4820]
Path:                  /Users/USER/*/java
Identifier:            java
Version:               ???
Code Type:             ARM-64 (Native)
Parent Process:        java [4810]
Responsible:           Terminal [2470]
User ID:               502

Date/Time:             2024-03-11 23:36:05.6272 +0000
OS Version:            macOS 14.4 (23E214)
Report Version:        12
Anonymous UUID:        53991BD1-7C07-984F-8362-CC485246D2D9


Time Awake Since Boot: 4100 seconds

System Integrity Protection: enabled

Crashed Thread:        34  Java: Thread-7825

Exception Type:        EXC_CRASH (SIGKILL)
Exception Codes:       0x0000000000000000, 0x0000000000000000

Termination Reason:    Namespace GUARD, Code 5 

. . .

Thread 34 Crashed:: Java: Thread-7825
0   libjvm.dylib                  	       0x1062d6ec0 _SafeFetchN_fault + 0
1   libjvm.dylib                  	       0x1062331a4 ObjectMonitor::TrySpin(JavaThread*) + 408
2   libjvm.dylib                  	       0x106232b44 ObjectMonitor::enter(JavaThread*) + 228
3   libjvm.dylib                  	       0x10637436c ObjectSynchronizer::enter(Handle, BasicLock*, JavaThread*) + 392
4   libjvm.dylib                  	       0x1062e6600 SharedRuntime::monitor_enter_helper(oopDesc*, BasicLock*, JavaThread*) + 156
5   libjvm.dylib                  	       0x1062e66c8 SharedRuntime::complete_monitor_locking_C(oopDesc*, BasicLock*, JavaThread*) + 76
6   ???                           	       0x1164b5bb8 ???
7   ???                           	         0x10454a8 ???

. . .

======================8<----------------------

See the attached java-2024-03-11-233605.ips

No hotspot crash logs were generated for this failure.
The failure was observed on JDK 17 and 21. Reproducibility: ~80%

It could not be reproduced on 23-ea+13 or 22+36-2370.
Comments
Please see https://support.apple.com/en-us/109035
28-03-2024

Resolved - External would be better.
26-03-2024

With macOS 14.4.1 this issue no longer reproduces. Resolving as Won't Fix.
26-03-2024

Jerry, thanks for testing. Regarding 14.4.1, see https://support.apple.com/en-us/109035
macOS Sonoma 14.4.1
This update provides bug fixes for your Mac, including:
.....
Apps that include Java may quit unexpectedly
26-03-2024

Using the standalone C test case, I was able to reproduce the issue on macOS Sonoma 14.4 (M1 Pro). I just completed the recently released update to 14.4.1 (1.5 GB), and I can confirm that the issue seems to be resolved. Prior to the update, I got the SIGKILL. After the update, I get the expected: Bus error: 10
25-03-2024

Tried the reproducer on both my M1 & M2 w/14.4 (SIP disabled)... failed w/SIGKILL as expected. I guess I just haven't yet encountered the failure in my JVM usage. :(
17-03-2024

[~lcable] You might try Stefan K’s reproducer posted a few days ago (see comment above) to see whether SIP being enabled or disabled makes a difference. The consensus seems to be that it doesn’t make a difference but this is easy to try and should answer the question definitively. https://bugs.openjdk.org/browse/JDK-8327860?focusedId=14656913&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14656913
16-03-2024

17, 19, 20, & 21 ... but I suspect, as [~dholmes] points out, SIP may not alter runtime behaviors ... it was just a passing thought, I don't wish to distract from the investigation...
15-03-2024

[~lcable] Which version of Java are you running? 17 and 21 crash fairly reliably while 23 doesn't, perhaps, by accident.
15-03-2024

Typo, I meant 14.4.
15-03-2024

I only asked since I have 2x M1 and M2 systems with 14.4 (SIP disabled) and I have yet to see this failure mode. thx
15-03-2024

[~lcable] There are no reports like this in any configuration except on macOS 14.4.
15-03-2024

[~lcable] AFAIK SIP only controls whether unauthorised/unsigned apps can get launched, it doesn't affect execution of said apps after that.
15-03-2024

does this occur if System Integrity Protection is disabled?
14-03-2024

[~ikrylov] I don't know if you have seen this bug. With the newly released macOS 14.4, we see a difference in behavior when reading unmapped or protected memory. In previous versions of the OS we get SIGSEGVs and SIGBUSes, which we can catch and handle. With 14.4 the JVM gets forcefully killed with a SIGKILL. Do you have any channels at Apple that could help look into this problem?
14-03-2024
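
To illustrate what "catch and handle" means in the comment above, here is a minimal sketch of a fault-tolerant read built on signal handlers. It is not HotSpot's SafeFetch implementation. The names safe_fetch_int, map_inaccessible_page and on_fault are hypothetical, and the sketch is single-threaded only. It assumes the OS delivers SIGSEGV or SIGBUS for the faulting read, which is exactly the assumption macOS 14.4 broke by sending SIGKILL instead.

```c
#define _DEFAULT_SOURCE 1   /* expose MAP_ANONYMOUS and sigsetjmp on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Jump target used by the fault handler. A single global buffer means this
 * sketch is not thread-safe. */
static sigjmp_buf probe_env;

/* Handler for SIGSEGV/SIGBUS: abandon the faulting read and resume at the
 * sigsetjmp call in safe_fetch_int. */
static void on_fault(int sig) {
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Hypothetical helper: returns *addr, or fallback if the read faults. */
static int safe_fetch_int(const volatile int *addr, int fallback) {
    struct sigaction sa, old_segv, old_bus;
    volatile int result = fallback;

    assert(addr != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* savemask = 1 so the signal mask is restored after siglongjmp. */
    if (sigsetjmp(probe_env, 1) == 0) {
        result = *addr;      /* may raise SIGSEGV or SIGBUS */
    } else {
        result = fallback;   /* reached via on_fault */
    }

    /* Put the previous handlers back on both paths. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return result;
}

/* Hypothetical helper: maps len bytes with no access permissions, or
 * returns NULL if the mapping fails. */
static void *map_inaccessible_page(size_t len) {
    void *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Calling safe_fetch_int on a valid int returns its value, and calling it on a page from map_inaccessible_page returns the fallback. On macOS 14.4, a thread that had called pthread_jit_write_protect_np(0) would be killed before on_fault could run.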

Just a progress update ...

The problem arises when a memory access fault (aka page protection fault) arises when executing VM code, and in response to that macOS 14.4 sends the process a SIGKILL instead of a SIGBUS. The VM cannot catch or ignore SIGKILL so the VM process just vanishes with no hs_err file or any trace of what went wrong other than what macOS records.

We discovered that if we switch the JIT protection mode to EXEC rather than WRITE, then these page faults again raise the expected SIGBUS, not a SIGKILL. Note the faulting accesses are predominantly reads and have nothing at all to do with accessing a MAP_JIT memory region.

We have been working on a patch that switches the JIT protection mode to EXEC around these potential faulting memory accesses. It has been a bit of an exercise in whack-a-mole finding them all, and testing is still in progress. This change in behavior has been reported to Apple.
14-03-2024
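
The mitigation described in the progress update above can be sketched roughly as follows. This is not the actual HotSpot patch. The helpers set_jit_mode and read_in_exec_mode are hypothetical, and the sketch assumes the calling thread was in WRITE mode on entry (the pthread API offers no way to query the current mode, so real code has to track it). It also omits the fault path: if the read faults, the signal handler would have to restore the mode.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* pthread_jit_write_protect_np is an Apple silicon API; on other platforms
 * this sketch compiles the mode switch down to a no-op. */
#if defined(__APPLE__) && defined(__aarch64__)
#define HAS_JIT_WRITE_PROTECT 1
#else
#define HAS_JIT_WRITE_PROTECT 0
#endif

/* Hypothetical helper: exec = 1 selects EXEC mode (MAP_JIT pages executable),
 * exec = 0 selects WRITE mode (MAP_JIT pages writable), for this thread. */
static void set_jit_mode(int exec) {
#if HAS_JIT_WRITE_PROTECT
    pthread_jit_write_protect_np(exec);
#else
    (void)exec;
#endif
}

/* Hypothetical helper: performs one potentially faulting read in EXEC mode,
 * then returns the thread to WRITE mode.
 * Assumption: the thread was in WRITE mode on entry. */
static intptr_t read_in_exec_mode(const volatile intptr_t *addr) {
    assert(addr != NULL);
    set_jit_mode(1);            /* EXEC: a fault here raises SIGBUS again */
    intptr_t value = *addr;
    set_jit_mode(0);            /* back to WRITE for the surrounding VM code */
    return value;
}
```

The point of the design is that only the narrow window around the read runs in EXEC mode, which is why every potentially faulting access has to be found and wrapped individually.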

Updated ILW assessment after investigation and more information about this issue:
ILW = HMH = P1
I: High - Crash
L: Medium - fails in many code paths but not all the time
W: High - no known work-around
Note: this bug is specific to macOS 14.4. As additional information comes to light about likelihood or possible work-around, the ILW can be re-evaluated.
14-03-2024

Right. We have a few APIs and tests that poke at unmapped or protected memory. All of those will fail in a similar manner to the original SafeFetch problem and the stand-alone reproducer.
13-03-2024

Some tests fail for seemingly the same reason. For example, test/hotspot/jtreg/runtime/ErrorHandling/MachCodeFramesInErrorFile.java:

Exception Type:        EXC_CRASH (SIGKILL)
Exception Codes:       0x0000000000000000, 0x0000000000000000

Termination Reason:    Namespace GUARD, Code 5

. . .

Thread 2 Crashed:
0   libjvm.dylib                  	       0x104032a48 Unsafe_GetLong(JNIEnv_*, _jobject*, _jobject*, long) + 276
13-03-2024

macOS 14.3.1:
```
addr = 0x1022c4000
Bus error: 10
```
macOS 14.4:
```
addr = 0x100158000
Killed: 9
```
12-03-2024

In HotSpot, the pthread_jit_write_protect_np call is made from os::current_thread_enable_wx from Threads::create_vm.
12-03-2024

I've managed to narrow this down to this small reproducer:
```
#include <stdio.h>
#include <sys/mman.h>
#include <pthread.h>

int main() {
  // Make this thread's MAP_JIT pages writable (WRITE mode).
  pthread_jit_write_protect_np(0);

  // Map a region with no access permissions (prot = 0, i.e. PROT_NONE).
  char* mem = (char*)mmap(0, 16 * 1024, 0, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
  fprintf(stderr, "addr = %p\n", mem);

  // This read faults: SIGBUS is expected, but macOS 14.4 delivers SIGKILL.
  char value = *mem;
  fprintf(stderr, "value = %c\n", value);

  return 0;
}
```
This results in the expected SIGBUS on 14.3, but on 14.4 it results in a SIGKILL.
12-03-2024

ILW = HMM = P2
12-03-2024

I've moved this over to the HotSpot component, since this is a problem with HotSpot's SafeFetch.
12-03-2024

JDK-8320317 removes this specific call to SafeFetch. We could probably still hit similar issues with other calls to SafeFetch.
12-03-2024