JEP draft: Null-Restricted Value Class Types (Preview)

Owner	Dan Smith
Type	Feature
Scope	SE
Status	Draft
Discussion	valhalla dash dev at openjdk dot org
Effort	XL
Duration	XL
Created	2023/09/22 23:57
Updated	2023/12/20 21:18
Issue	8316779

Summary

Allow the type of a variable storing value objects to exclude null, enabling more compact storage and other optimizations at run time. This is a preview language and VM feature.

Goals

Introduce a new kind of type for value classes that excludes null from its value set, much like primitive types that cannot be null.
Allow a value class to "opt in" to the automatic creation of an appropriate default value used to initialize fields and arrays that don't store null.
Allow larger value classes to further "opt in" to non-atomic encodings in fields and arrays that don't store null.
Support compatible migration of existing classes. Apply these properties to value classes in the Java platform, including the classes used for primitive boxing.

Non-Goals

It is not a goal to support null-restricted types for identity classes or classes that do not provide a default value.

Motivation

Value objects are special objects that lack identity, and so can be freely duplicated and re-encoded by the JVM at run time. One especially useful optimization that can be applied to these objects is heap flattening, in which a reference to a value object is encoded as a compact bit vector of the object's field values, without a pointer to a different memory location. The bit vector can be stored directly in a field or an array of a value class type. This encoding strategy usually leads to smaller memory footprint and better locality than a standard encoding using heap-allocated objects and pointers.
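To make the idea of a flattened encoding concrete, here is a hand-written analogue in today's Java. An array of (x, y) pairs is stored as one interleaved double[] rather than as an array of pointers to separate heap objects. The names FlatPointArray and Point are hypothetical. A JVM performing heap flattening would do this transparently, without any user code.

```java
/**
 * Hypothetical hand-flattened array of points: each element's two fields are
 * stored inline in a single double[], with no per-element pointer or header.
 */
final class FlatPointArray {
    private final double[] data;

    FlatPointArray(int length) {
        data = new double[length * 2]; // two doubles per element, zero-initialized
    }

    int length() { return data.length / 2; }

    void set(int i, Point p) {
        data[2 * i] = p.x();
        data[2 * i + 1] = p.y();
    }

    Point get(int i) {
        // Re-materialize a Point from the inline field values.
        return new Point(data[2 * i], data[2 * i + 1]);
    }

    public static void main(String[] args) {
        FlatPointArray ps = new FlatPointArray(3);
        ps.set(1, new Point(2.0, 3.0));
        System.out.println(ps.get(1)); // Point[x=2.0, y=3.0]
        System.out.println(ps.get(0)); // Point[x=0.0, y=0.0]
    }
}

/** An ordinary (identity) record; a Point[] holds pointers to heap objects. */
record Point(double x, double y) {}
```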

However, in comparison to primitive types, heap flattening of value class types can be inefficient, because it must account for null references. These are typically encoded by reserving some bits for a "null flag", and those bits are then unavailable to encode the object's field values. So, for example, a boxed Integer requires 32 bits to encode the int value, and at least 1 more bit for the null flag, probably leading to a 64-bit encoding.
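The cost of the null flag can be seen by encoding a nullable Integer by hand. This sketch uses a hypothetical NullableIntCodec class to pack the 32-bit value and a 1-bit flag into a long. Because 33 bits do not fit in an int, the next natural slot is 64 bits. It illustrates the arithmetic only and does not describe any JVM's actual layout.

```java
final class NullableIntCodec {
    // Bit 32 of the long is the "non-null" flag; bits 0-31 hold the int value.
    static final long PRESENT = 1L << 32;

    /** Encode a possibly-null Integer into 33 meaningful bits of a long. */
    static long encode(Integer value) {
        if (value == null) return 0L;            // flag clear: null
        return PRESENT | (value & 0xFFFF_FFFFL); // flag set, value in the low 32 bits
    }

    /** Decode the long back into an Integer, or null if the flag is clear. */
    static Integer decode(long bits) {
        if ((bits & PRESENT) == 0) return null;
        return (int) bits;                       // narrowing keeps the low 32 bits
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(-7)));   // -7
        System.out.println(decode(encode(null))); // null
        // Bits actually needed: 32 for the value plus 1 for the flag.
        System.out.println(64 - Long.numberOfLeadingZeros(PRESENT)); // 33
    }
}
```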

Further, heap flattening of value class types is limited by the integrity requirements of objects and references: the flattened data must be small enough to read and write atomically, or else the encoded data may become corrupted. On common platforms, "small enough" may mean as few as 32 or 64 bits. So while many small value classes can be flattened, most value classes that declare two or more fields will have to be encoded as ordinary heap objects (unless the fields store primitives of types boolean, char, byte, or short). Even a boxed Double requires at least 65 bits (counting one for a null flag), which exceeds the atomic read/write capabilities of many systems.

Primitive types do not have these constraints: a primitive-typed field is implicitly initialized to a zero value (or the equivalent) on creation, rather than null; and large primitive variables, of types long or double, are allowed to be non-atomically updated (see JLS 17.7). Thus, for example, a large array of type int has half the memory footprint of a flattened array of type Integer.

If the Java language had a type representing references to instances of a value class but not null, then there would be no need for a null flag, and the flattened storage could have a footprint no larger than the footprint of the class's fields.

This storage would need to be initialized to something, and so classes that intend to support this feature would need to allow for a default value, something like the 0 value used to initialize int-typed storage.

Some value classes might further be willing to tolerate corrupt data created by non-atomic reads and writes. Without the need to track null flags, the JVM's data integrity requirements could be relaxed, allowing classes that opt in to mimic the specified behavior of long and double. This choice would shift responsibility to their users for managing concurrency and handling any bugs arising from races.

Description

The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags. More comprehensive requirements and implementation details for the language, JVM, and standard libraries can be found in subtasks of this JEP.

Null-restricted types

A null-restricted type is a reference type expressed with the name of a value class followed by the ! symbol. It asserts that the value of a given variable or expression will not be null.

Null-restricted types may appear in variable declarations, array allocations (as the component type), and casts.

void printAll(Range! r) {
    // r is guaranteed to be non-null here, so no null check is needed
    for (int i = r.start(); i < r.end(); i++)
        System.out.println(i);
}

printAll(new Range(5, 50));
printAll(null); // compiler error

A normal class type can be converted to a null-restricted type, and vice versa, much like the type Integer can be converted to and from the type int. When converting to a null-restricted type, a null check occurs at run time.

Range r = new Range(1, 3);
printAll(r);
r = null;
printAll(r); // NullPointerException

Object o = null;
r = (Range!) o; // NullPointerException
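In today's Java, the closest analogue of this conversion is unboxing, which throws NullPointerException on null. The sketch below approximates the compiler-inserted null check for a Range! parameter with Objects.requireNonNull. The nested Range record and the printAll method (which returns a count here, for testing) are simplified stand-ins for the examples above, not the preview feature itself.

```java
import java.util.Objects;

final class NullCheckDemo {
    record Range(int start, int end) {}

    // Approximates "printAll(Range! r)": the null check happens on entry,
    // much as it would when converting a Range to a Range!.
    static int printAll(Range r) {
        Range nonNull = Objects.requireNonNull(r);
        int count = 0;
        for (int i = nonNull.start(); i < nonNull.end(); i++) {
            System.out.println(i);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        printAll(new Range(1, 3)); // prints 1 and 2
        Integer boxed = null;
        try {
            int unboxed = boxed;   // unboxing null throws NullPointerException
        } catch (NullPointerException e) {
            System.out.println("unboxing null: NPE");
        }
        try {
            printAll(null);
        } catch (NullPointerException e) {
            System.out.println("printAll(null): NPE");
        }
    }
}
```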

Arrays of null-restricted types can be assigned to non-restricted supertypes, and a null check continues to be enforced at run time, similarly to other array storage checks.

Range![] a1 = new Range![3];
a1[0] = new Range(-3, 0);

Range[] a2 = a1;
a2[1] = null; // ArrayStoreException

Object[] a3 = a2;
a3[2] = new Object(); // ArrayStoreException
a3[2] = null; // ArrayStoreException
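Java arrays already perform a dynamic store check when a subtype array is viewed through a supertype, and the null check described above extends that mechanism. The following example shows today's behavior for a String[] viewed as Object[]. A null store succeeds here, whereas under this proposal a Range![] would also reject it. The helper name storeFails is illustrative.

```java
final class ArrayStoreDemo {
    /** Returns true if storing value into arr[index] throws ArrayStoreException. */
    static boolean storeFails(Object[] arr, int index, Object value) {
        try {
            arr[index] = value;
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] strings = new String[3];
        Object[] objects = strings; // covariant array assignment is allowed

        System.out.println(storeFails(objects, 0, "ok"));         // false
        System.out.println(storeFails(objects, 1, new Object())); // true: not a String
        System.out.println(storeFails(objects, 2, null));         // false: null is allowed today
    }
}
```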

Zero instances

When objects and arrays are created, each enclosed field or array component is automatically initialized to an appropriate default value. This ensures that if the program attempts to read from the variable before its first write, a predictable value can be found (rather than, say, garbage data).

Each primitive type has a zero-like default value: 0, 0.0, false, etc. For normal reference types, the default value is null. But a variable with a null-restricted type cannot store null, so what is its default value?

Range![] a = new Range![100];
Range r = a[5]; // not null...

The answer is that the default value of a null-restricted type is a zero instance of the given value class. The zero instance is created by simply setting each of the class's instance fields to its own default value. Unlike null, the zero instance is a real, fully-functional object.

Range r = a[5];
System.out.println(r); // Range[start=0, end=0]
int size = r.size(); // 0

Implicit constructors

Notice that the zero instance of a value class is created automatically, without any execution of code in the class. Not all value classes will be comfortable with this behavior, or be willing to accept the zero instance as a valid object in their domain. For example, for a Name record with some String fields, the zero instance would be a name with all fields set to null.

value record Name(String first, String last) {
    public String toString() { return "%s %s".formatted(first, last); }
    // zero instance toString: 'null null'
}
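A zero instance can be approximated in today's Java for an ordinary record by passing each component's default value to the canonical constructor. The reflective helper below (ZeroInstances.of is a hypothetical name) obtains a type's default value by reading element 0 of a freshly allocated one-element array. Unlike a true zero instance, this approach does execute the constructor, so a record whose constructor rejects all-default arguments would fail here.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;

final class ZeroInstances {
    /** The default value of any type: 0, 0.0, false, (char) 0, or null. */
    static Object defaultValue(Class<?> type) {
        return Array.get(Array.newInstance(type, 1), 0);
    }

    /** Builds a record with every component set to its default value. */
    static <R extends Record> R of(Class<R> recordClass) {
        RecordComponent[] components = recordClass.getRecordComponents();
        Class<?>[] types = new Class<?>[components.length];
        Object[] args = new Object[components.length];
        for (int i = 0; i < components.length; i++) {
            types[i] = components[i].getType();
            args[i] = defaultValue(types[i]);
        }
        try {
            // The canonical constructor's parameter types match the component types.
            Constructor<R> canonical = recordClass.getDeclaredConstructor(types);
            canonical.setAccessible(true);
            return canonical.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create zero instance", e);
        }
    }

    record Name(String first, String last) {
        public String toString() { return "%s %s".formatted(first, last); }
    }

    record Range(int start, int end) {}

    public static void main(String[] args) {
        System.out.println(of(Name.class));  // null null
        System.out.println(of(Range.class)); // Range[start=0, end=0]
    }
}
```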

For this reason, creation of a zero instance must be authorized by the class, and many value classes will choose not to opt in. A class opts in by declaring a zero-argument constructor with the implicit modifier, which allows the zero instance to be created automatically at run time.

value record Range(int start, int end) {
    public implicit Range();
    
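    public Range(int start, int end) {
        if (start > end) throw new IllegalArgumentException();
        this.start = start; // an explicit canonical constructor must assign every field
        this.end = end;
    }

    public int size() { return end - start; }
}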

The implicit constructor must always be public and can be invoked directly, producing a zero instance without executing any code. At run time, the implicit constructor gives the JVM permission to create zero instances without invoking a constructor at all.

If a value class declares an implicit constructor, it must not be an inner class, and its zero instance must not contain itself through circular null-restricted field types.

value class ListNode {
    public implicit ListNode();

    Object val;
    ListNode! next; // error
}
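The rule against a zero instance containing itself amounts to a cycle check over the graph whose edges are null-restricted field types. A compiler might implement it with a depth-first search like the one sketched below. The map-based model of classes and the name hasZeroInstanceCycle are hypothetical, chosen only to illustrate the check.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ZeroCycleCheck {
    /**
     * Returns true if the zero instance of 'start' would (transitively) contain
     * itself. 'nullRestrictedFields' maps each class name to the value class
     * types of its null-restricted fields. Nullable fields are omitted because
     * they are initialized to null and so break the chain.
     */
    static boolean hasZeroInstanceCycle(String start, Map<String, List<String>> nullRestrictedFields) {
        return visit(start, start, nullRestrictedFields, new HashSet<>());
    }

    private static boolean visit(String target, String current,
                                 Map<String, List<String>> fields, Set<String> seen) {
        for (String fieldType : fields.getOrDefault(current, List.of())) {
            if (fieldType.equals(target)) return true; // reached the start again: cycle
            if (seen.add(fieldType) && visit(target, fieldType, fields, seen)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // ListNode has a field "ListNode! next": rejected.
        System.out.println(hasZeroInstanceCycle("ListNode",
                Map.of("ListNode", List.of("ListNode"))));    // true
        // A Line holding two Point! fields is fine.
        Map<String, List<String>> ok = Map.of("Line", List.of("Point", "Point"), "Point", List.of());
        System.out.println(hasZeroInstanceCycle("Line", ok)); // false
    }
}
```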

If a value class with an implicit constructor extends an abstract class, that superclass must also declare an implicit constructor.

Because implicit constructors are necessary for the allocation of null-restricted fields and arrays, value classes that do not declare implicit constructors cannot be used as null-restricted types.

Non-atomic updates

A value class with an implicit constructor may also declare that it tolerates implicit creation of instances via non-atomic field and array updates. This means that, in a race condition, new class instances may be accidentally created by intermixing field values from other instances, without any code execution or other additional cooperation from the value class.

A value class opts in to allowing this behavior by implementing the LooselyConsistentValue interface:

value class Point implements LooselyConsistentValue {
    double x;
    double y;
    
    public implicit Point();
    
    public Point(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

This is strawman syntax, subject to change.

Users of a LooselyConsistentValue class are responsible for maintaining the integrity of their data, and can avoid unwanted instance creation by limiting access to a single thread, enforcing a synchronization protocol, or declaring a field volatile. Otherwise, unexpected instances may be created:

Point![] ps = { new Point(0.0, 1.0) }; 
new Thread(() -> ps[0] = new Point(2.0, 3.0)).start(); 
Point p = ps[0]; // may be (2.0, 1.0), among other possibilities
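Because a flattened Point! occupies two separate 64-bit slots, a write may become visible one field at a time. The single-threaded simulation below makes one possible interleaving deterministic: the writer's two stores are split, and a "reader" runs between them. It models the flattened storage as a double[], as a JVM might, and is a sketch of the effect rather than of any actual JVM behavior. The name tornRead is hypothetical.

```java
final class TearingSimulation {
    record Point(double x, double y) {}

    /**
     * Simulates one interleaving of a non-atomic write of 'update' over
     * 'initial' (stored flat as two doubles) and returns what a reader sees midway.
     */
    static Point tornRead(Point initial, Point update) {
        double[] flat = { initial.x(), initial.y() };  // hypothetical flattened Point![1]
        flat[0] = update.x();                          // writer: first 64-bit store
        Point observed = new Point(flat[0], flat[1]);  // reader runs between the stores
        flat[1] = update.y();                          // writer: second 64-bit store
        return observed;
    }

    public static void main(String[] args) {
        // Neither the old value nor the new one:
        System.out.println(tornRead(new Point(0.0, 1.0), new Point(2.0, 3.0))); // Point[x=2.0, y=1.0]
    }
}
```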

Some implicitly-constructible value classes have complex integrity constraints for non-zero field values (for example, the start index of a Range, declared above, must not exceed the end index). In that circumstance, it may not be appropriate for the class to implement the LooselyConsistentValue interface. This feature is designed for the subset of value classes that can comfortably operate on arbitrary combinations of field values.
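The concern about Range can be checked mechanically: if each field of a torn value may come from either of two valid instances, some combinations break the class invariant. This sketch enumerates the four combinations for two example Range values. It uses a plain record without constructor validation so that the invalid combinations can be represented at all; the names TornRanges and combinations are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class TornRanges {
    // No constructor validation here, so torn combinations can be constructed.
    record Range(int start, int end) {
        boolean isValid() { return start <= end; }
    }

    /** Every Range whose start and end each come from a or b. */
    static List<Range> combinations(Range a, Range b) {
        List<Range> out = new ArrayList<>();
        for (Range s : List.of(a, b))
            for (Range e : List.of(a, b))
                out.add(new Range(s.start(), e.end()));
        return out;
    }

    public static void main(String[] args) {
        for (Range r : combinations(new Range(0, 10), new Range(20, 30)))
            System.out.println(r + (r.isValid() ? "" : "  <- violates start <= end"));
    }
}
```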

Performance model

As described in the Value Objects JEP, the typical treatment of a standard value class is for local variables, method parameters, and expression results to use inline encodings, while fields and array components are only inlined if the value object, plus a null flag, can fit in an atomic word size (such as 64 bits).

Adding an implicit constructor to a value class enables null-restricted storage, avoiding the need to dedicate any bits to a null flag. So, for example, a field of type Long might be too large to store inline, but a field of type Long! should be safely inlinable on a 64-bit JVM.

For larger classes (as determined by the JVM implementation), implementing LooselyConsistentValue may also be necessary to enable inlining of these null-restricted fields and array components.

When flattened, a null-restricted class type should have a heap storage footprint and execution time (when fully optimized) comparable to the primitive types. For example, a Point!, given the class declaration above, can be expected to directly occupy 128 bits in fields and array components, and to avoid any allocation in stack computations. A field access simply references the first or second 64 bits. There are no additional pointers.
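The footprint reasoning in this section reduces to simple arithmetic. The sketch below computes the bits a flattened value would need, with or without a null flag, and applies an illustrative policy: flatten if the encoding fits an atomic word, or, for a null-restricted LooselyConsistentValue type, whenever it is not "too large". The 64-bit atomic limit and the 256-bit cut-off are assumptions made for this example only; real JVMs are free to choose their own encodings.

```java
final class FlatteningModel {
    static final int ATOMIC_WORD_BITS = 64;     // assumption: largest atomic load/store
    static final int MAX_NON_ATOMIC_BITS = 256; // assumption: "too large" cut-off

    /** Bits needed for the field values, plus one bit if null must be encoded. */
    static int encodedBits(int fieldBits, boolean nullRestricted) {
        return nullRestricted ? fieldBits : fieldBits + 1;
    }

    /** Illustrative policy for whether a field or array component is flattened. */
    static boolean flattened(int fieldBits, boolean nullRestricted, boolean looselyConsistent) {
        int bits = encodedBits(fieldBits, nullRestricted);
        if (bits <= ATOMIC_WORD_BITS) return true; // fits in one atomic access
        // Non-atomic encodings are only allowed for null-restricted, loosely consistent types.
        return nullRestricted && looselyConsistent && bits <= MAX_NON_ATOMIC_BITS;
    }

    public static void main(String[] args) {
        System.out.println(flattened(64, false, false)); // Long: 65 bits -> false
        System.out.println(flattened(64, true, false));  // Long!: 64 bits -> true
        System.out.println(flattened(128, true, false)); // Point! without opt-in -> false
        System.out.println(flattened(128, true, true));  // Point! (LooselyConsistentValue) -> true
    }
}
```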

Notably, null-restricted uses of a value class with an implicit constructor and a single instance field can be expected to have minimal overhead compared to operating on a value of the field's type directly.

However, JVMs are ultimately free to encode class instances however they see fit. Some classes may be considered too large to represent inline. Certain JVM components, in particular those that are less performance-tuned, may prefer to interact with instances as heap-allocated objects. An encoding might carry with it a cached heap pointer to reduce the overhead of future allocations. Etc.

Implicitly-Constructible Value Classes in the Standard Library

The following classes, which are considered value classes under JEP 401 when preview features are enabled, are further considered under this JEP to have an implicit constructor, despite not having declared such a constructor:

java.lang.Byte
java.lang.Short
java.lang.Integer
java.lang.Long
java.lang.Float
java.lang.Double
java.lang.Boolean
java.lang.Character
java.util.Optional

Reflection and erasure

Like parameterized types, null-restricted types are erased in compiled field and method signatures. There is no instance of java.lang.Class to represent Range!, and adding or removing ! in APIs is a binary compatible refactoring.
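Erasure of parameterized types can be observed directly in today's Java: two differently parameterized lists share one Class object at run time. This example shows only existing generics behavior; by the same reasoning, Range! would have no Class object distinct from Range.

```java
import java.util.ArrayList;
import java.util.List;

final class ErasureDemo {
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        // Type arguments are erased: both are plain ArrayList at run time.
        return strings.getClass() == ints.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true
    }
}
```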

However, unlike parameterized types, null restrictions are still enforced at run time. This is achieved with compiler-generated null checks, and through a new mechanism, called a CheckedType, that performs a dynamic check when fields and arrays are written to.

The CheckedType of an array expresses the array's dynamic store check, including any null check.

String[] a1 = new String[100];
CheckedType t1 = Array.getComponentType(a1);
t1.cast("abc"); // success
t1.cast(new Range(8, 12)); // ClassCastException
t1.cast(null); // success

Range![] a2 = new Range![100];
CheckedType t2 = Array.getComponentType(a2);
t2.cast("abc"); // ClassCastException
t2.cast(new Range(8, 12)); // success
t2.cast(null); // NullPointerException
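CheckedType is a proposed API, and its final shape may differ. To make the intended semantics concrete, here is a hypothetical model in today's Java, named SimpleCheckedType so as not to suggest it is the real class. Its cast method combines Class.cast, which throws ClassCastException on a mismatch but lets null through, with an optional null check.

```java
import java.util.Objects;

final class CheckedTypeModel {
    record Range(int start, int end) {}

    /** Hypothetical stand-in for a checked type: a class plus an optional null restriction. */
    record SimpleCheckedType(Class<?> type, boolean nullRestricted) {
        Object cast(Object value) {
            if (nullRestricted) {
                Objects.requireNonNull(value); // NullPointerException for a null-restricted type
            }
            return type.cast(value);           // ClassCastException on mismatch; null passes
        }
    }

    /** Returns the simple name of the exception cast throws, or "success". */
    static String outcome(SimpleCheckedType t, Object value) {
        try {
            t.cast(value);
            return "success";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        SimpleCheckedType t1 = new SimpleCheckedType(String.class, false); // like String
        SimpleCheckedType t2 = new SimpleCheckedType(Range.class, true);   // like Range!
        for (Object v : new Object[] { "abc", new Range(8, 12), null }) {
            System.out.println(outcome(t1, v) + " / " + outcome(t2, v));
        }
    }
}
```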

New arrays can be created using a CheckedType, and this mechanism should be preferred over array allocations using Class objects to represent the component type.

Range[] a3 = (Range[]) Array.newInstance(t2, 100);
a3[10] = null; // ArrayStoreException

If a field is declared with a null-restricted type, the Field.getCheckedType method will return the corresponding checked type.

Alternatives

Making use of primitive types, rather than declaring value classes, will often produce a program with equivalent or slightly better performance. However, this approach gives up the valuable abstractions provided by classes. It's easy to, say, interpret a double with the wrong units, pass an out-of-range int to a library method, or fail to keep two boolean flags together in the right order.
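As a small illustration of the abstraction given up, wrapping a double in a class lets the compiler catch a units mix-up that a raw double would let through. The Meters and Feet records below are hypothetical examples. Under this proposal, such single-field classes could become value classes with implicit constructors, which the performance model above expects to have minimal overhead in null-restricted uses.

```java
final class UnitsDemo {
    record Meters(double value) {
        Meters plus(Meters other) { return new Meters(value + other.value); }
    }

    record Feet(double value) {
        Meters toMeters() { return new Meters(value * 0.3048); }
    }

    public static void main(String[] args) {
        Meters run = new Meters(100.0);
        Feet climb = new Feet(328.084);
        // run.plus(climb);                // does not compile: Feet is not Meters
        Meters total = run.plus(climb.toMeters());
        System.out.println(total.value()); // approximately 200.0
    }
}
```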

Value classes provide useful performance benefits without needing implicit constructors and non-atomic/null-restricted storage. In some cases, field and array storage can already be inlined. But many classes cannot fit in an atomic word size, or have no room to spare for a null flag; and even if further engineering could increase that atomic word size to a comfortable level, null flags unnecessarily inflate memory footprint in many use cases. This JEP allows the memory footprint to match that of primitive types.

We considered many different approaches to the object model and type system before settling on a model in which compact flattened heap storage is simply a JVM optimization for a null-restricted reference type. This strategy avoids the conceptual overhead that comes from generalizing the existing model of primitive types. Developers already understand objects and classes, and null-restricted types are a simple language enhancement that is useful as a general-purpose feature.

Risks and Assumptions

There are security risks involved in allowing instance creation outside of constructors, via zero instances and non-atomic reads and writes. Developers will need to understand the implications, and recognize when it would be unsafe to declare an implicit constructor or implement the LooselyConsistentValue interface.

Dependencies

This JEP depends on Value Classes and Objects (Preview), which establishes the semantics of identity-free objects and implements value object inlining.

Building on this JEP, JEP 402: Enhanced Primitive Boxing (Preview) refactors the primitive wrapper classes as value classes with implicit constructors.

In the future, JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field and array layouts when parameterized by null-restricted value class types.

More general support for nullness features will be explored in a future JEP.