From the course: Applied AI: Getting Started with Hugging Face Transformers

Transformer training and inference

- [Instructor] Training a transformer follows a process similar to training any other deep learning model. We will briefly discuss the steps in this video. The first step in training a transformer is creating the transformer architecture. This requires decisions on a variety of hyperparameters: the number of encoder and decoder layers, the number of attention heads, the feedforward network architecture, and the normalization technique are some of the key decision points. Then, we initialize the weights and other parameters. Please note that there are weights in both the attention block and the feedforward network block, and that these blocks repeat in every layer of the encoder and decoder stacks. Then, we pass the training data through the encoder-decoder pipeline and predict the output. The output is then compared with the true labels to compute the cost. The cost is then used to update the weights across the…
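
The steps described above can be sketched in a few lines of PyTorch. This is a minimal illustration rather than the course's own code: the class name `TinySeq2Seq`, the layer sizes, the optimizer choice, and the toy copy task are all assumptions made for the example, and positional encodings are left out to keep it short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the toy run repeatable

# Step 1: architecture decisions. All sizes are illustrative assumptions.
VOCAB_SIZE, D_MODEL = 50, 32


class TinySeq2Seq(nn.Module):  # hypothetical name, not from the course
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL,
            nhead=4,               # number of attention heads
            num_encoder_layers=2,  # encoder stack depth
            num_decoder_layers=2,  # decoder stack depth
            dim_feedforward=64,    # width of the feedforward block
            dropout=0.0,
            batch_first=True,      # tensors are (batch, sequence, feature)
        )
        self.to_vocab = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, src, tgt_in):
        # Causal mask: each target position attends only to earlier positions.
        mask = self.transformer.generate_square_subsequent_mask(tgt_in.size(1))
        hidden = self.transformer(
            self.embed(src), self.embed(tgt_in), tgt_mask=mask
        )
        return self.to_vocab(hidden)


# Step 2: constructing the model initializes the attention and feedforward
# weights in every encoder and decoder layer.
model = TinySeq2Seq()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer
loss_fn = nn.CrossEntropyLoss()

# Toy training data (an assumption): the model learns to copy the source.
src = torch.randint(1, VOCAB_SIZE, (16, 6))
bos = torch.zeros(16, 1, dtype=torch.long)  # token 0 marks sequence start
tgt = torch.cat([bos, src], dim=1)
tgt_in, labels = tgt[:, :-1], tgt[:, 1:]

losses = []
for step in range(30):
    # Step 3: pass data through the encoder-decoder pipeline and predict.
    logits = model(src, tgt_in)  # shape: (16, 6, VOCAB_SIZE)
    # Step 4: compare predictions with the true labels to get the cost.
    loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), labels.reshape(-1))
    # Step 5: use the cost to update the weights in all layers.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Running the script prints the cost at the first and last step. On this small fixed batch the cost should fall as the weights are updated, which is the behavior the training loop is meant to show.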
