Fluentd and Kafka

Fluentd and Kafka
Hadoop / Spark Conference Japan 2016 
Feb 8, 2016

Who are you?
• Masahiro Nakagawa
• github: @repeatedly
• Treasure Data Inc.
• Fluentd / td-agent developer
• Fluentd Enterprise support
• I love OSS :)
• D Language, MessagePack, The organizer of several meetups, etc…

Fluentd
• Pluggable streaming event collector
• Lightweight, robust and ﬂexible
• Lots of plugins on rubygems
• Used by AWS, GCP, MS and more companies
• Resources
• http://www.ﬂuentd.org/
• Webinar: https://www.youtube.com/watch?v=6uPB_M7cbYk

Popular case
App
Push
Push
Forwarder Aggregator Destination

• Distributed messaging system
• Producer - Broker - Consumer pattern
• Pull model, replication, etc 
 
 
 
 
 
Apache Kafka
App
Push
Pull
Producer Broker DestinationConsumer

Push vs Pull
• Push:
• Easy to transfer data to multiple destinations
• Hard to control stream ratio in multiple streams
• Pull:
• Easy to control stream ﬂow / ratio
• Should manage consumers correctly

There are 2 ways
• ﬂuent-plugin-kafka
• kafka-ﬂuentd-consumer

ﬂuent-plugin-kafka
• Input / Output plugin for kafka
• https://github.com/htgc/ﬂuent-plugin-kafka
• in_kafka, in_kafka_group, out_kafka, out_kafka_buffered
• Pros
• Easy to use and output support
• Cons
• Performance is not primary

Configuration example
<source>
@type kafka
topics web,system
format json
add_prefix kafka.
# more options
</source>
<match kafka.**>
@type kafka_buffered
output_data_type msgpack
default_topic metrics
compression_codec gzip
required_acks 1
</match>
https://github.com/htgc/fluent-plugin-kafka#usage

kafka fluentd consumer
• Stand-alone kafka consumer for fluentd
• https://github.com/treasure-data/kafka-fluentd-consumer
• Send cosumed events to fluentd’s in_forward
• Pros
• High performance and Java API features
• Cons
• Need Java runtime

Run consumer
• Edit log4j and fluentd-consumer properties
• Run following command: 
$ java  
-Dlog4j.configuration=file:///path/to/log4j.properties  
-jar path/to/kafka-fluentd-consumer-0.2.1-all.jar  
path/to/fluentd-consumer.properties

Properties example
fluentd.tag.prefix=kafka.event.
fluentd.record.format=regexp # default is json
fluentd.record.pattern=(?<text>.*) # for regexp format
fluentd.consumer.topics=app.* # can use Java Rege
fluentd.consumer.topics.pattern=blacklist # default is whitelist
fluentd.consumer.threads=5
https://github.com/treasure-data/kafka-fluentd-consumer/blob/master/config/fluentd-consumer.properties

With Fluentd example
<source>
@type forward
</source>
<source>
@type exec
command java -
Dlog4j.configuration=file:///path/to/
log4j.properties -jar /path/to/kafka-
fluentd-consumer-0.2.1-all.jar /path/
to/config/fluentd-
consumer.properties
tag dummy
format json
</source>
https://github.com/treasure-data/kafka-fluentd-consumer#run-kafka-consumer-for-fluentd-via-in_exec

Conclusion
• Kafka is now becomes important component 
on data platform
• Fluentd can communicate with Kafka
• Fluentd plugin and kafka consumer
• Building reliable and ﬂexible data pipeline with 
Fluentd and Kafka

Fluentd and Kafka

Related slideshows

More Related Content

What's hot

What's hot (20)

Similar to Fluentd and Kafka

Similar to Fluentd and Kafka (20)

More from N Masahiro

More from N Masahiro (20)

Recently uploaded

Recently uploaded (20)

Fluentd and Kafka