From the course: Hands-On AI: Build a Generative Language Model from Scratch

Measuring distance

- [Instructor] To dive into semantic similarity and understand which texts have similar meanings, we need to look at the concept of embeddings. We can think of an embedding as a vector representation of a word or a phrase. Imagine you could give a model a word and it could tell you where that word sits in a multi-dimensional space. Let's start by imagining words in a two-dimensional space. So what if we gave our model the words dog, cat, and car, and it could tell us where they were on an XY plane? Large language models may produce vectors with hundreds or even thousands of dimensions. Some models' entire purpose is to receive text as input and return vectors as output. So if we gave such a model the word tree, we may get one vector, and if we gave it the word plant, we may get another. Now, there are many ways of comparing these vectors, but for our particular task, for understanding which words have…
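The transcript is cut off before it names the comparison it goes on to use, so the following is only a minimal sketch of two common ways to compare embedding vectors: Euclidean distance and cosine similarity. The two-dimensional coordinates for dog, cat, and car are invented for illustration and do not come from any model, and the helper names (`euclidean_distance`, `cosine_similarity`, `nearest`) are hypothetical.

```python
import math

# Hypothetical 2-D embeddings, made up for illustration only.
# A real embedding model would return vectors with far more dimensions.
embeddings = {
    "dog": (2.0, 3.0),
    "cat": (2.5, 3.5),
    "car": (8.0, 1.0),
}


def euclidean_distance(a, b):
    """Straight-line distance between two vectors; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1 means more aligned."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Return the other word whose vector is closest by Euclidean distance."""
    others = (w for w in embeddings if w != word)
    return min(
        others,
        key=lambda w: euclidean_distance(embeddings[word], embeddings[w]),
    )


if __name__ == "__main__":
    for other in ("cat", "car"):
        d = euclidean_distance(embeddings["dog"], embeddings[other])
        s = cosine_similarity(embeddings["dog"], embeddings[other])
        print(f"dog vs {other}: distance={d:.3f}, cosine={s:.3f}")
```

With these made-up coordinates, dog sits much closer to cat than to car under both measures, which is the intuition the XY picture is meant to convey. Which measure suits a given task depends on whether vector length should matter: Euclidean distance is sensitive to it, while cosine similarity looks only at direction.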
