Extended Data Fig. 3: AlphaTensor’s network architecture.
From: Discovering faster matrix multiplication algorithms with reinforcement learning
![Extended Data Fig. 3](https://cdn.statically.io/img/media.springernature.com/full/springer-static/esm/art%3A10.1038%2Fs41586-022-05172-4/MediaObjects/41586_2022_5172_Fig8_ESM.jpg)
The network takes as input a list of tensors containing the current state and the previous history of actions, together with a list of scalars, such as the time index of the current action. It produces two kinds of outputs: a value estimate, and a distribution over the action space from which actions can be sampled. Accordingly, the architecture consists of a common torso followed by two heads: a value head and a policy head. c is set to 512 in all experiments.
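The torso-plus-two-heads layout can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual architecture (which uses attention-based torso and head modules): the input dimension, action count, single-linear-layer heads, and ReLU torso are all assumptions introduced here for clarity; only the shared-torso/value-head/policy-head split and the channel width c = 512 come from the caption.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    return x @ w + b

def init(in_dim, n_actions, c=512):
    # c = 512 matches the channel width stated in the caption;
    # all other shapes are illustrative assumptions.
    return {
        "torso_w": rng.normal(0.0, 0.02, (in_dim, c)),
        "torso_b": np.zeros(c),
        "value_w": rng.normal(0.0, 0.02, (c, 1)),
        "value_b": np.zeros(1),
        "policy_w": rng.normal(0.0, 0.02, (c, n_actions)),
        "policy_b": np.zeros(n_actions),
    }

def forward(params, x):
    # Common torso: shared embedding used by both heads.
    h = np.maximum(0.0, linear(x, params["torso_w"], params["torso_b"]))
    # Value head: scalar estimate per input.
    value = linear(h, params["value_w"], params["value_b"])
    # Policy head: softmax distribution over the action space.
    logits = linear(h, params["policy_w"], params["policy_b"])
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return value, probs

# Hypothetical usage: a batch of 2 flattened state inputs of dimension 64,
# with a toy action space of 10 actions.
params = init(in_dim=64, n_actions=10)
x = rng.normal(size=(2, 64))
value, probs = forward(params, x)
```

Both heads read from the same torso output, so the state representation is computed once and shared; `probs` can then be sampled to select the next action.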