Extended Data Fig. 3: AlphaTensor’s network architecture. | Nature

From: Discovering faster matrix multiplication algorithms with reinforcement learning

The network takes as input a list of tensors containing the current state and the history of previous actions, together with a list of scalars, such as the time index of the current action. It produces two kinds of output: one representing the value, and the other inducing a distribution over the action space from which we can sample. Accordingly, the architecture consists of a common torso and two heads: the value head and the policy head. c is set to 512 in all experiments.
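The torso-and-two-heads layout described in the caption can be illustrated with a toy sketch. This is not AlphaTensor's implementation: the single dense layers, the ReLU, the softmax policy, and every name (`TwoHeadedNet`, `sample_action`, `n_actions`, the input sizes) are assumptions chosen for illustration, and the layers in the actual figure are more elaborate. The width parameter is named `c` only to echo the caption, and defaults to a small value here rather than 512 so that the example runs quickly.

```python
import math
import random


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (arbitrary toy initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TwoHeadedNet:
    """Hypothetical stand-in: a shared torso feeding a value head and a policy head."""

    def __init__(self, n_tensor_inputs, n_scalar_inputs, n_actions, c=8, seed=0):
        rng = random.Random(seed)
        n_in = n_tensor_inputs + n_scalar_inputs
        self.torso = init_layer(rng, c, n_in)            # common torso
        self.value_head = init_layer(rng, 1, c)          # scalar value output
        self.policy_head = init_layer(rng, n_actions, c)  # logits over actions

    def forward(self, tensors, scalars):
        # Flatten the list of tensors and append the scalars (assumed encoding).
        flat = [float(v) for t in tensors for v in t] + [float(s) for s in scalars]
        hidden = [max(0.0, h) for h in linear(*self.torso, flat)]  # ReLU
        value = linear(*self.value_head, hidden)[0]
        policy = softmax(linear(*self.policy_head, hidden))
        return value, policy


def sample_action(policy, rng):
    """Draw one action index from the policy distribution."""
    return rng.choices(range(len(policy)), weights=policy, k=1)[0]


# Usage: two flattened 2x2 "tensors" plus one scalar (a time index).
net = TwoHeadedNet(n_tensor_inputs=8, n_scalar_inputs=1, n_actions=5)
tensors = [[1, 0, 0, 1], [0, 1, 1, 0]]
scalars = [3]
value, policy = net.forward(tensors, scalars)
action = sample_action(policy, random.Random(1))
```

Both heads read the same `hidden` vector, so the features computed by the torso are shared between the value estimate and the action distribution; only the final projections differ.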
