From the course: Complete Guide to Apache Kafka for Beginners

Topics, partitions, and offsets

Hi, this is Stephane from Conduktor, and welcome to this first lecture, on Kafka topics. A Kafka topic is a particular stream of data within your Kafka cluster. A Kafka cluster can have many topics. They could be named, for example, logs, purchases, twitter_tweets, trucks_gps, and so on. So a topic in Kafka is a stream of data. If you wanted to draw a parallel with databases, a topic is similar to a table in a database, but without all the constraints, because you can send whatever you want to a Kafka topic. There is no data verification, and I will explain what that means later on. You can have as many topics as you want in your Kafka cluster, and the way to identify a topic in a Kafka cluster is by its name. That's why I have logs, purchases, twitter_tweets, and trucks_gps: those are all names for my Kafka topics.

Kafka topics support any kind of message format, so you can send, for example, JSON, Avro, text, binary, whatever you want. The sequence of messages in a topic is called a data stream, and this is why Kafka is called a data streaming platform: you make data stream through topics. You cannot query topics. So topics are similar to tables in a database, but you cannot query them. Instead, to add data to a Kafka topic we're going to use Kafka producers, and to read data from a topic we're going to use Kafka consumers. But there is no querying capability within Kafka.

Okay, so topics are general, but you can divide them into partitions. A topic can be made up of, for example, 100 partitions. In my example, I'm going to have a Kafka topic with three partitions: partition 0, 1, and 2. The messages sent to a Kafka topic are going to end up in these partitions, and the messages within each partition are going to be ordered. So the first message sent to partition 0 will have the ID 0, the next one 1, then 2, and so on, all the way up to 9.
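
To make the partition numbering concrete, here is a minimal Python sketch. It is a toy in-memory model, not Kafka's API: the `ToyTopic` class and its `append` method are hypothetical names invented for illustration. It only shows that each partition is its own ordered list and that each partition numbers its messages independently, starting at 0.

```python
class ToyTopic:
    """Toy in-memory stand-in for a topic (hypothetical, not a Kafka class)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # One independent, append-only list per partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append a message to one partition and return the ID it received."""
        log = self.partitions[partition]
        message_id = len(log)  # next ID is the current length of this partition
        log.append(message)
        return message_id


topic = ToyTopic("trucks_gps", num_partitions=3)

# Ten messages written to partition 0 receive the IDs 0 through 9.
ids_p0 = [topic.append(0, f"message-{i}") for i in range(10)]

# Partition 1 has its own numbering, so its first message also gets ID 0.
first_id_p1 = topic.append(1, "another message")

print(ids_p0)
print(first_id_p1)
```

In this sketch the ID is simply the position of the message inside its partition, which is why the numbering of one partition says nothing about the others.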
And then as I keep on writing messages into my partition, this ID is going to increase. It's the same when I write data into partition 1 of my Kafka topic: the ID will keep on increasing, and so on. So the messages in these partitions, as they are written, get an ID that increments from 0 upward, and this ID is called a Kafka partition offset. You will hear me say offsets a lot in this course. As we can see, each partition has its own offsets.

Now, Kafka topics are also immutable. That means that once data is written into a partition, it cannot be changed. You cannot delete data in Kafka and you cannot update data in Kafka. You have to keep on writing to the partition.

Okay, so now let's take the example of trucks_gps to make it more concrete. Say you have a fleet of trucks, each truck has a GPS, and the GPS reports its position to Apache Kafka. Each truck will send a message to Kafka every 20 seconds, for example, and each message will contain some information, such as the truck ID and the truck position, for example the latitude and the longitude. So we have a bunch of trucks, and they're going to be data producers. They will send data into a Kafka topic named trucks_gps that will contain the positions of all trucks. So the trucks send the data into the trucks_gps topic. And because a topic is made of partitions, as we've seen, we choose to create a topic with 10 partitions. That's an arbitrary number, and I will tell you later on in this course how to select the number of partitions for your topic. Once this topic is created in Kafka, we have a use case for it. For example, we want to have consumers that will consume the trucks_gps data and send it into a location dashboard, so we can track the location of all our trucks in real time.
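
As a rough illustration of what one truck message might look like, here is a small Python sketch using JSON, one of the formats mentioned earlier. The field names `truck_id`, `latitude`, and `longitude` and the two helper functions are assumptions made up for this example. The lecture only says that a message carries the truck ID and its position.

```python
import json


def encode_position(truck_id, latitude, longitude):
    """Producer side: turn one GPS reading into bytes (hypothetical schema)."""
    payload = {
        "truck_id": truck_id,
        "latitude": latitude,
        "longitude": longitude,
    }
    return json.dumps(payload).encode("utf-8")


def decode_position(raw):
    """Consumer side: turn the bytes back into a dictionary."""
    return json.loads(raw.decode("utf-8"))


# A truck would produce something like this every 20 seconds.
raw = encode_position("truck-42", 48.8566, 2.3522)

# A consumer such as the location dashboard would decode it.
position = decode_position(raw)
print(position["truck_id"], position["latitude"], position["longitude"])
```

Because Kafka does no data verification, nothing in the topic enforces this shape. Producers and consumers have to agree on the format between themselves.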
Or maybe we also want to have a notification service consume the same stream of data, and the notification service will, for example, send notifications to customers when their delivery is close. This is why Kafka is very helpful: multiple services can read from the same stream of data.

Okay, so now let's note some important things about topics, partitions, and offsets. Once data is written to a partition, it cannot be changed. That's called immutability, and it's very important to understand this. Data in Kafka is only kept for a limited time. The default is one week, although that is configurable. That means that after one week your data will disappear. The offsets only have a meaning for a specific partition. As you can see, the offsets are repeated across partitions. So offset 3 in partition 0 represents a message, but it doesn't represent the same data as offset 3 in partition 1. And the offsets are not going to be reused, even if previous messages have been deleted. The offset keeps on increasing, one by one, as you send messages into your Kafka topic.

That also means that the order of messages is guaranteed only within a partition, not across partitions. That is very important to understand, and I will repeat it later on in this course. What this means is that the messages within each partition have increasing offsets, so they are in order, and we read them in the order of the offsets. But across partitions we have no control. If we need ordering, we'll see how we can achieve it. And the data, when sent to a Kafka topic, is going to be assigned to a random partition, for example 0, 1, or 2 in this example, unless you provide a key, and I will show you what a key does later on. And in a Kafka topic, you can have as many partitions as you want. Sometimes 3, sometimes 10, sometimes 100.
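
The offset rules just listed can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Kafka is implemented: `ToyPartition`, `expire_oldest`, and `read_from` are hypothetical names, and expiry is triggered by hand here instead of by a retention time.

```python
class ToyPartition:
    """Toy append-only log with offsets (hypothetical, not a Kafka class)."""

    def __init__(self):
        self.start_offset = 0  # offset of the oldest message still retained
        self.messages = []

    def append(self, message):
        """Append a message and return its offset."""
        offset = self.start_offset + len(self.messages)
        self.messages.append(message)
        return offset

    def expire_oldest(self, count):
        """Drop the oldest messages, the way retention eventually would."""
        count = min(count, len(self.messages))
        del self.messages[:count]
        self.start_offset += count  # offsets of expired messages are never reused

    def read_from(self, offset):
        """Yield (offset, message) pairs in offset order."""
        first = max(offset, self.start_offset)
        for i in range(first - self.start_offset, len(self.messages)):
            yield self.start_offset + i, self.messages[i]


p0 = ToyPartition()
p1 = ToyPartition()

for i in range(4):
    p0.append(f"p0-msg-{i}")      # offsets 0, 1, 2, 3 in partition 0
p1.append("p1-msg-0")             # offset 0 in partition 1, unrelated to p0

p0.expire_oldest(2)               # offsets 0 and 1 disappear from partition 0
next_offset = p0.append("p0-msg-4")

print(next_offset)                # 4
print(list(p0.read_from(0)))      # [(2, 'p0-msg-2'), (3, 'p0-msg-3'), (4, 'p0-msg-4')]
```

Each `ToyPartition` keeps its own counter, which mirrors the point that an offset only has meaning inside one partition and that reading in offset order gives ordering within that partition only.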
And again, we'll see how to determine the right number of partitions for our topic. Okay, so that's it for this lecture. We've seen what Kafka topics, partitions, and offsets are and how they hold messages, and we've already seen some specifics about Kafka. I hope you liked this lecture, and I will see you in the next one.
