SlideShare a Scribd company logo
OCTOBER 11-14, 2016 • BOSTON, MA
Tuning Solr and its Pipeline for Logs
Rafał Kuć and Radu Gheorghe
Software Engineers, Sematext Group, Inc.
3
01
Agenda
Designing a Solr(Cloud) cluster for time-series data
Solr and operating system knobs to tune
Pipeline patterns and shipping options
4
01
Time-based collections, the single best improvement
14.10
indexing
5
01
Time-based collections, the single best improvement
14.10
indexing
15.10
6
01
Time-based collections, the single best improvement
14.10
indexing
15.10
7
01
Time-based collections, the single best improvement
14.10 15.10 ... 21.10
indexing
8
01
Time-based collections, the single best improvement
14.10 15.10 ... 21.10
indexing
Less merging ⇒ faster indexing
Quick deletes (of whole collections)
Search only some collections
Better use of caches
9
01
Load is uneven
Black
Friday
Saturday Sunday
1
0
01
Load is uneven
Need to “rotate” collections fast enough to
work with this (otherwise indexing and
queries will be slow)
These will be tiny
Black
Friday
Saturday Sunday
1
1
01
If load is uneven, daily/monthly/etc indices are suboptimal:
you either have poor performance or too many collections
Octi* is worried:
* this is Octi →
1
2
01
Solution: rotate by size
indexing
logs01
Size limit
1
3
01
Solution: rotate by size
indexing
logs01
Size limit
1
4
01
Solution: rotate by size
indexing
logs01
logs02
Size limit
1
5
01
Solution: rotate by size
indexing
logs01
logs02
Size limit
1
6
01
Solution: rotate by size
indexing
logs01
logs08...
logs02
1
7
01
Solution: rotate by size
indexingPredictable indexing and search performance
Fewer shards
logs01
logs08...
logs02
1
8
01
Dealing with size-based collections
logs01
logs08...
logs02
app (caches results)
stats
2016-10-11 2016-10-13 2016-10-12 2016-10-14
2016-10-18
doesn’t matter,
it’s the latest collection
1
9
01
Size-based collections handle spiky load better
Octi concludes:
2
0
01
Tiered cluster (a.k.a. hot-cold)
14 Oct
11 Oct
10 Oct
12 Oct
13 Oct
hot01 cold01 cold02
indexing, most searches longer-running (+cached) searches
2
1
01
Tiered cluster (a.k.a. hot-cold)
14 Oct
11 Oct
10 Oct
12 Oct
13 Oct
hot01 cold01 cold02
indexing, most searches longer-running (+cached) searches
⇒ Good CPU and IO* ⇒ Heap. Decent IO for replication&backup
* Ideally local SSD; avoid network storage unless it’s really good
2
2
01
Octi likes tiered clusters
Costs: you can use different hardware for different workloads
Performance (see costs): fewer shards, less overhead
Isolation: long-running searches don’t slow down indexing
2
3
01
AWS specifics
Hot tier:
c3 (compute optimized) + EBS and use local SSD as cache*
c4 (EBS only)
Cold tier:
d2 (big local HDDs + lots of RAM)
m4 (general purpose) + EBS
i2 (big local SSDs + lots of RAM)
General stuff:
EBS optimized
Enhanced Networking
VPC (to get access to c4&m4 instances)
* Use --cachemode writeback for async writing:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/ht
ml/Logical_Volume_Manager_Administration/lvm_cache_volume_creation.html
PIOPS is best but expensive
HDD - too slow (unless cold=icy)
⇒ General purpose SSDs
2
4
01
EBS
Stay under 3TB. More smaller (<1TB) drives in RAID0 give better, but shorter IOPS bursts
Performance isn’t guaranteed ⇒ RAID0 will wait for the slowest disk
Check limits (e.g. 160MB/s per drive, instance-dependent IOPS and network)
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html
3 IOPS/GB up to 10K (~3TB), up to 256kb/IO, merges up to 4 consecutive IOs
2
5
01
Octi’s AWS top picks
c4s for the hot tier: cheaper than c3s with similar performance
m4s for the cold tier: well balanced, scale up to 10xl, flexible storage via EBS
EBS drives < 3TB, otherwise avoids RAID0: higher chances of performance drop
2
6
01
Scratching the surface of OS options
Say no to swap
Disk scheduler: CFQ for HDD, deadline for SSD
Mount options: noatime, nodiratime, data=writeback, nobarrier
For bare metal: check CPU governor and THP*
because strict ordering is for the weak
* often it’s enabled, but /proc/sys/vm/nr_hugepages is 0
2
7
01
Schema and solrconfig
Auto soft commit (5s?)
Auto commit (few minutes?)
RAM buffer size + Max buffered docs
Doc Values for faceting
+ retrieving those fields (stored=false)
Omit norms, frequencies and positions
Don’t store catch-all field(s)
2
8
01
Relaxing the merge policy*
Merges are the heaviest part of indexing
Facets are the heaviest part of searching
Facets (except method=enum) depend on data size more than # of segments
Knobs:
segmentsPerTier: more segments ⇒ less merging
maxMergeAtOnce < segmentsPerTier, to smooth out those IO spikes
maxMergedSegmentMB: lower to merge more small segments, less bigger ones
* unless you only do “grep”. YMMV, of course. Keep an eye on open files, though
⇒ fewer open files
2
9
01
Some numbers: more segments, more throughput (10-15% here)
10 segmentsPerTier
10 maxMergeAtOnce
50 segmentsPerTier
50 maxMergeAtOnce
need to rotate
before this drop
3
0
01
Lower max segment (500MB from 5GB default)
less CPU fewer segments
3
1
01
There’s more...
SPM screenshots from all tests + JMeter test plan here:
https://github.com/sematext/lucene-revolution-samples/tree/master/2016/solr_logging
We’d love to hear about your own results!
correct spelling: sematext.com/spm
3
2
01
Increasing segments per tier while decreasing max merged
segment (by an order of magnitude) makes indexing better and
search latency less spiky
Octi’s conclusions so far
3
3
01
Optimize I/O and CPU by not optimizing
Unless you have spare CPU & IO (why would you?)
And unless you run out of open files
Only do this on “old” indices!
3
4
01
Optimizing the pipeline*
logs
log shipper(s)
Ship using which protocol(s)?
Buffer
Route to other destinations?
Parse
* for availability and performance/costs
Or log to Solr directly from app
(i.e. implement a new, embedded log shipper)
3
5
01
A case for buffers
performance & availability
allows batches and threads when Solr is down or can’t keep up
3
6
01
Types of buffers
Disk*, memory or a combination
On the logging host or centralized
* doesn’t mean it fsync()s for every message
file or local log shipper
Easy scaling; fewer moving parts
Often requires a lightweight shipper
Kafka/Redis/etc or central log shipper
Extra features (e.g. TTL)
One place for changes
bufferapp
bufferapp
3
7
01
Multiple destinations
* or flat files, or S3 or...
input buffer
processing
Outputs need to be in sync
input Processing may
cause backpressure
processing
*
3
8
01
Multiple destinations
input
Solr
offset
HDFS
offset
input
processing
processing
3
9
01
Just Solr and maybe flat files? Go simple with a local shipper
Custom, fast-changing processing & multiple destinations? Kafka as a central buffer
Octi’s pipeline preferences
4
0
01
Parsing unstructured data
Ideally, log in JSON*
Otherwise, parse
* or another serialization format
For performance and maintenance
(i.e. no need to update parsing rules)
Regex-based (e.g. grok)
Easy to build rules
Rules are flexible
Typically slow & O(n) on # of rules, but:
Move matching patterns to the top of the list
Move broad patterns to the bottom
Skip patterns including others that didn’t match
Grammar-based
(e.g. liblognorm, PatternDB)
Faster. O(1) on # of rules
Numbers in our 2015 session:
sematext.com/blog/2015/10/16/large-scale-log-analytics-with-solr/
4
1
01
Decide what gives when buffers fill up
input shipper
Can drop data here, but better buffer
when shipper buffer is full, app can block or drop data
Check:
Local files: what happens when files are rotated/archived/deleted?
UDP: network buffers should handle spiky load*
TCP: what happens when connection breaks/times out?
UNIX sockets: what happens when socket blocks writes**?
* you’d normally increase net.core.rmem_max and rmem_default
** both DGRAM and STREAM local sockets are reliable (vs Internet ones, UDP and TCP)
4
2
01
Octi’s flow chart of where to log
critical?
UDP. Increase network
buffers on destination,
so it can handle spiky
traffic
Paying with
RAM or IO?
UNIX socket. Local
shipper with memory
buffers, that can
drop data if needed
Local files. Make sure
rotation is in place or
you’ll run out of disk!
no
IO RAM
yes
4
3
01
Protocols
UDP: cool for the app, but not reliable
TCP: more reliable, but not completely
Application-level ACKs may be needed:
No failure/backpressure handling needed
App gets ACK when OS buffer gets it
⇒ no retransmit if buffer is lost*
* more at blog.gerhards.net/2008/05/why-you-cant-build-reliable-tcp.html
sender receiver
ACKs
Protocol Example shippers
HTTP Logstash, rsyslog, Fluentd
RELP rsyslog, Logstash
Beats Filebeat, Logstash
Kafka Fluentd, Filebeat, rsyslog, Logstash
4
4
01
Octi’s top pipeline+shipper combos
rsyslog
app
UNIX
socket
HTTP
memory+disk buffer
can drop
app
fileKafka
consumer
Filebeat
HTTP
simple & reliable
4
5
01
Conclusions, questions, we’re hiring, thank you
The whole talk was pretty much only conclusions :)
Feels like there’s much more to discover. Please test & share your own nuggets
http://www.relatably.com/m/img/oh-my-god-memes/oh-my-god.jpg
Scary word, ha? Poke us:
@kucrafal @radu0gheorghe @sematext
...or @ our booth here

More Related Content

What's hot

DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
PROIDEA
 
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerRunning High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Sematext Group, Inc.
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
SATOSHI TAGOMORI
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
Lucidworks
 
Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12
N Masahiro
 
ELK stack at weibo.com
ELK stack at weibo.comELK stack at weibo.com
ELK stack at weibo.com
琛琳 饶
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
Sematext Group, Inc.
 
Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)
Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)
Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)
Maarten Balliauw
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
Sadayuki Furuhashi
 
Scaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextScaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - Sematext
Rafał Kuć
 
Logstash family introduction
Logstash family introductionLogstash family introduction
Logstash family introduction
Owen Wu
 
How to create Treasure Data #dotsbigdata
How to create Treasure Data #dotsbigdataHow to create Treasure Data #dotsbigdata
How to create Treasure Data #dotsbigdata
N Masahiro
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
SATOSHI TAGOMORI
 
Fluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes MeetupFluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes Meetup
Sadayuki Furuhashi
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
Amazon Web Services
 
Scaling Flink in Cloud
Scaling Flink in CloudScaling Flink in Cloud
Scaling Flink in Cloud
Steven Wu
 
Building the Right Platform Architecture for Hadoop
Building the Right Platform Architecture for HadoopBuilding the Right Platform Architecture for Hadoop
Building the Right Platform Architecture for Hadoop
All Things Open
 
Fluentd meetup #2
Fluentd meetup #2Fluentd meetup #2
Fluentd meetup #2
Treasure Data, Inc.
 
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et KibanaJournée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Publicis Sapient Engineering
 

What's hot (19)

DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
 
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on DockerRunning High Performance & Fault-tolerant Elasticsearch Clusters on Docker
Running High Performance & Fault-tolerant Elasticsearch Clusters on Docker
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
 
Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12Dive into Fluentd plugin v0.12
Dive into Fluentd plugin v0.12
 
ELK stack at weibo.com
ELK stack at weibo.comELK stack at weibo.com
ELK stack at weibo.com
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)
Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)
Get more than a cache back! The Microsoft Azure Redis Cache (NDC Oslo)
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
 
Scaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - SematextScaling massive elastic search clusters - Rafał Kuć - Sematext
Scaling massive elastic search clusters - Rafał Kuć - Sematext
 
Logstash family introduction
Logstash family introductionLogstash family introduction
Logstash family introduction
 
How to create Treasure Data #dotsbigdata
How to create Treasure Data #dotsbigdataHow to create Treasure Data #dotsbigdata
How to create Treasure Data #dotsbigdata
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 
Fluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes MeetupFluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes Meetup
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
 
Scaling Flink in Cloud
Scaling Flink in CloudScaling Flink in Cloud
Scaling Flink in Cloud
 
Building the Right Platform Architecture for Hadoop
Building the Right Platform Architecture for HadoopBuilding the Right Platform Architecture for Hadoop
Building the Right Platform Architecture for Hadoop
 
Fluentd meetup #2
Fluentd meetup #2Fluentd meetup #2
Fluentd meetup #2
 
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et KibanaJournée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
Journée DevOps : Des dashboards pour tous avec ElasticSearch, Logstash et Kibana
 

Viewers also liked

How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
Sematext Group, Inc.
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Sematext Group, Inc.
 
Monitoring and Log Management for
Monitoring and Log Management forMonitoring and Log Management for
Monitoring and Log Management for
Sematext Group, Inc.
 
Top Node.js Metrics to Watch
Top Node.js Metrics to WatchTop Node.js Metrics to Watch
Top Node.js Metrics to Watch
Sematext Group, Inc.
 
Introduction to solr
Introduction to solrIntroduction to solr
Introduction to solr
Sematext Group, Inc.
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
Sematext Group, Inc.
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
thelabdude
 
Building Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Building Resilient Log Aggregation Pipeline with Elasticsearch & KafkaBuilding Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Building Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Sematext Group, Inc.
 
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on DockerRunning High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Sematext Group, Inc.
 
Docker Logging Webinar
Docker Logging  WebinarDocker Logging  Webinar
Docker Logging Webinar
Sematext Group, Inc.
 
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, TargetJourney of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Lucidworks
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
lucenerevolution
 
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Lucidworks
 
Solr Anti Patterns
Solr Anti PatternsSolr Anti Patterns
Solr Anti Patterns
Sematext Group, Inc.
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
Sematext Group, Inc.
 
Python Unittest
Python UnittestPython Unittest
Python Unittest
명규 최
 
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & ElasticsearchFrom Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
Sematext Group, Inc.
 
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, SematextTuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Lucidworks
 
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
Lucidworks
 
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Hortonworks
 

Viewers also liked (20)

How to Run Solr on Docker and Why
How to Run Solr on Docker and WhyHow to Run Solr on Docker and Why
How to Run Solr on Docker and Why
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
 
Monitoring and Log Management for
Monitoring and Log Management forMonitoring and Log Management for
Monitoring and Log Management for
 
Top Node.js Metrics to Watch
Top Node.js Metrics to WatchTop Node.js Metrics to Watch
Top Node.js Metrics to Watch
 
Introduction to solr
Introduction to solrIntroduction to solr
Introduction to solr
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Building Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Building Resilient Log Aggregation Pipeline with Elasticsearch & KafkaBuilding Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
Building Resilient Log Aggregation Pipeline with Elasticsearch & Kafka
 
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on DockerRunning High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
 
Docker Logging Webinar
Docker Logging  WebinarDocker Logging  Webinar
Docker Logging Webinar
 
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, TargetJourney of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
 
Solr Anti Patterns
Solr Anti PatternsSolr Anti Patterns
Solr Anti Patterns
 
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
From Zero to Production Hero: Log Analysis with Elasticsearch (from Velocity ...
 
Python Unittest
Python UnittestPython Unittest
Python Unittest
 
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & ElasticsearchFrom Zero to Hero - Centralized Logging with Logstash & Elasticsearch
From Zero to Hero - Centralized Logging with Logstash & Elasticsearch
 
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, SematextTuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
 
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
 
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
 

Similar to Tuning Solr & Pipeline for Logs

Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Kyle Hailey
 
UKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O StatisticsUKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O Statistics
Kyle Hailey
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
javier ramirez
 
Best Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on SolarisBest Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on Solaris
Jignesh Shah
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
Tanel Poder
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
Bob Ward
 
Oracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningOracle R12 EBS Performance Tuning
Oracle R12 EBS Performance Tuning
Scott Jenner
 
POLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloudPOLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloud
oysteing
 
Assignment 2 Theoretical
Assignment 2 TheoreticalAssignment 2 Theoretical
Assignment 2 Theoretical
Esteban Gonzalez
 
A presentaion on Panasas HPC NAS
A presentaion on Panasas HPC NASA presentaion on Panasas HPC NAS
A presentaion on Panasas HPC NAS
Rahul Janghel
 
Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of Ozone
Erik Krogen
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performance
solarisyougood
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
Amazon Web Services
 
Lamp Stack Optimization
Lamp Stack OptimizationLamp Stack Optimization
Lamp Stack Optimization
Dave Ross
 
Tuning Java Servers
Tuning Java Servers Tuning Java Servers
Tuning Java Servers
Srinath Perera
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performance
xKinAnx
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
MongoDB
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
Amazon Web Services
 
SQL Server It Just Runs Faster
SQL Server It Just Runs FasterSQL Server It Just Runs Faster
SQL Server It Just Runs Faster
Bob Ward
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
Nicolas Poggi
 

Similar to Tuning Solr & Pipeline for Logs (20)

Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
 
UKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O StatisticsUKOUG, Lies, Damn Lies and I/O Statistics
UKOUG, Lies, Damn Lies and I/O Statistics
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Best Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on SolarisBest Practices with PostgreSQL on Solaris
Best Practices with PostgreSQL on Solaris
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
 
Oracle R12 EBS Performance Tuning
Oracle R12 EBS Performance TuningOracle R12 EBS Performance Tuning
Oracle R12 EBS Performance Tuning
 
POLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloudPOLARDB: A database architecture for the cloud
POLARDB: A database architecture for the cloud
 
Assignment 2 Theoretical
Assignment 2 TheoreticalAssignment 2 Theoretical
Assignment 2 Theoretical
 
A presentaion on Panasas HPC NAS
A presentaion on Panasas HPC NASA presentaion on Panasas HPC NAS
A presentaion on Panasas HPC NAS
 
Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of Ozone
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performance
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
 
Lamp Stack Optimization
Lamp Stack OptimizationLamp Stack Optimization
Lamp Stack Optimization
 
Tuning Java Servers
Tuning Java Servers Tuning Java Servers
Tuning Java Servers
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performance
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
 
Deep Dive on Amazon Aurora
Deep Dive on Amazon AuroraDeep Dive on Amazon Aurora
Deep Dive on Amazon Aurora
 
SQL Server It Just Runs Faster
SQL Server It Just Runs FasterSQL Server It Just Runs Faster
SQL Server It Just Runs Faster
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 

More from Sematext Group, Inc.

Tweaking the Base Score: Lucene/Solr Similarities Explained
Tweaking the Base Score: Lucene/Solr Similarities ExplainedTweaking the Base Score: Lucene/Solr Similarities Explained
Tweaking the Base Score: Lucene/Solr Similarities Explained
Sematext Group, Inc.
 
OOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsOOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM apps
Sematext Group, Inc.
 
Is observability good for your brain?
Is observability good for your brain?Is observability good for your brain?
Is observability good for your brain?
Sematext Group, Inc.
 
Introducing log analysis to your organization
Introducing log analysis to your organization Introducing log analysis to your organization
Introducing log analysis to your organization
Sematext Group, Inc.
 
Solr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the UglySolr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the Ugly
Sematext Group, Inc.
 
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at ScaleMetrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
Sematext Group, Inc.
 
Tuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsTuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for Logs
Sematext Group, Inc.
 
(Elastic)search in big data
(Elastic)search in big data(Elastic)search in big data
(Elastic)search in big data
Sematext Group, Inc.
 
Side by Side with Elasticsearch and Solr
Side by Side with Elasticsearch and SolrSide by Side with Elasticsearch and Solr
Side by Side with Elasticsearch and Solr
Sematext Group, Inc.
 
Open Source Search Evolution
Open Source Search EvolutionOpen Source Search Evolution
Open Source Search Evolution
Sematext Group, Inc.
 
Elasticsearch and Solr for Logs
Elasticsearch and Solr for LogsElasticsearch and Solr for Logs
Elasticsearch and Solr for Logs
Sematext Group, Inc.
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
Sematext Group, Inc.
 

More from Sematext Group, Inc. (12)

Tweaking the Base Score: Lucene/Solr Similarities Explained
Tweaking the Base Score: Lucene/Solr Similarities ExplainedTweaking the Base Score: Lucene/Solr Similarities Explained
Tweaking the Base Score: Lucene/Solr Similarities Explained
 
OOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM appsOOPs, OOMs, oh my! Containerizing JVM apps
OOPs, OOMs, oh my! Containerizing JVM apps
 
Is observability good for your brain?
Is observability good for your brain?Is observability good for your brain?
Is observability good for your brain?
 
Introducing log analysis to your organization
Introducing log analysis to your organization Introducing log analysis to your organization
Introducing log analysis to your organization
 
Solr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the UglySolr on Docker - the Good, the Bad and the Ugly
Solr on Docker - the Good, the Bad and the Ugly
 
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at ScaleMetrics, Logs, Transaction Traces, Anomaly Detection at Scale
Metrics, Logs, Transaction Traces, Anomaly Detection at Scale
 
Tuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for LogsTuning Elasticsearch Indexing Pipeline for Logs
Tuning Elasticsearch Indexing Pipeline for Logs
 
(Elastic)search in big data
(Elastic)search in big data(Elastic)search in big data
(Elastic)search in big data
 
Side by Side with Elasticsearch and Solr
Side by Side with Elasticsearch and SolrSide by Side with Elasticsearch and Solr
Side by Side with Elasticsearch and Solr
 
Open Source Search Evolution
Open Source Search EvolutionOpen Source Search Evolution
Open Source Search Evolution
 
Elasticsearch and Solr for Logs
Elasticsearch and Solr for LogsElasticsearch and Solr for Logs
Elasticsearch and Solr for Logs
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 

Recently uploaded

What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024
Stephanie Beckett
 
AMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech DayAMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech Day
Low Hong Chuan
 
The Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - CoatueThe Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - Coatue
Razin Mustafiz
 
Zaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdfZaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdf
AmandaCheung15
 
Camunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptxCamunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptx
ZachWylie3
 
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdfDefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
Yury Chemerkin
 
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Snarky Security
 
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptxFIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Alliance
 
Keynote : Presentation on SASE Technology
Keynote : Presentation on SASE TechnologyKeynote : Presentation on SASE Technology
Keynote : Presentation on SASE Technology
Priyanka Aash
 
FIDO Munich Seminar: Securing Smart Car.pptx
FIDO Munich Seminar: Securing Smart Car.pptxFIDO Munich Seminar: Securing Smart Car.pptx
FIDO Munich Seminar: Securing Smart Car.pptx
FIDO Alliance
 
Retrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with RagasRetrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with Ragas
Zilliz
 
The History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal EmbeddingsThe History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal Embeddings
Zilliz
 
What's New in Copilot for Microsoft 365 June 2024.pptx
What's New in Copilot for Microsoft 365 June 2024.pptxWhat's New in Copilot for Microsoft 365 June 2024.pptx
What's New in Copilot for Microsoft 365 June 2024.pptx
Stephanie Beckett
 
It's your unstructured data: How to get your GenAI app to production (and spe...
It's your unstructured data: How to get your GenAI app to production (and spe...It's your unstructured data: How to get your GenAI app to production (and spe...
It's your unstructured data: How to get your GenAI app to production (and spe...
Zilliz
 
FIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Munich Seminar In-Vehicle Payment Trends.pptxFIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Alliance
 
Enterprise_Mobile_Security_Forum_2013.pdf
Enterprise_Mobile_Security_Forum_2013.pdfEnterprise_Mobile_Security_Forum_2013.pdf
Enterprise_Mobile_Security_Forum_2013.pdf
Yury Chemerkin
 
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI CertificationTrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc
 
Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...
Nohoax Kanont
 
Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1
DianaGray10
 
FIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptx
FIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptxFIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptx
FIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptx
FIDO Alliance
 

Recently uploaded (20)

What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024
 
AMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech DayAMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech Day
 
The Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - CoatueThe Path to General-Purpose Robots - Coatue
The Path to General-Purpose Robots - Coatue
 
Zaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdfZaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdf
 
Camunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptxCamunda Chapter NY Meetup July 2024.pptx
Camunda Chapter NY Meetup July 2024.pptx
 
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdfDefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
 
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...
 
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptxFIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
 
Keynote : Presentation on SASE Technology
Keynote : Presentation on SASE TechnologyKeynote : Presentation on SASE Technology
Keynote : Presentation on SASE Technology
 
FIDO Munich Seminar: Securing Smart Car.pptx
FIDO Munich Seminar: Securing Smart Car.pptxFIDO Munich Seminar: Securing Smart Car.pptx
FIDO Munich Seminar: Securing Smart Car.pptx
 
Retrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with RagasRetrieval Augmented Generation Evaluation with Ragas
Retrieval Augmented Generation Evaluation with Ragas
 
The History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal EmbeddingsThe History of Embeddings & Multimodal Embeddings
The History of Embeddings & Multimodal Embeddings
 
What's New in Copilot for Microsoft 365 June 2024.pptx
What's New in Copilot for Microsoft 365 June 2024.pptxWhat's New in Copilot for Microsoft 365 June 2024.pptx
What's New in Copilot for Microsoft 365 June 2024.pptx
 
It's your unstructured data: How to get your GenAI app to production (and spe...
It's your unstructured data: How to get your GenAI app to production (and spe...It's your unstructured data: How to get your GenAI app to production (and spe...
It's your unstructured data: How to get your GenAI app to production (and spe...
 
FIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Munich Seminar In-Vehicle Payment Trends.pptxFIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Munich Seminar In-Vehicle Payment Trends.pptx
 
Enterprise_Mobile_Security_Forum_2013.pdf
Enterprise_Mobile_Security_Forum_2013.pdfEnterprise_Mobile_Security_Forum_2013.pdf
Enterprise_Mobile_Security_Forum_2013.pdf
 
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI CertificationTrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification
 
Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...Generative AI technology is a fascinating field that focuses on creating comp...
Generative AI technology is a fascinating field that focuses on creating comp...
 
Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1
 
FIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptx
FIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptxFIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptx
FIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptx
 

Tuning Solr & Pipeline for Logs

  • 1. OCTOBER 11-14, 2016 • BOSTON, MA
  • 2. Tuning Solr and its Pipeline for Logs Rafał Kuć and Radu Gheorghe Software Engineers, Sematext Group, Inc.
  • 3. 3 01 Agenda Designing a Solr(Cloud) cluster for time-series data Solr and operating system knobs to tune Pipeline patterns and shipping options
  • 4. 4 01 Time-based collections, the single best improvement 14.10 indexing
  • 5. 5 01 Time-based collections, the single best improvement 14.10 indexing 15.10
  • 6. 6 01 Time-based collections, the single best improvement 14.10 indexing 15.10
  • 7. 7 01 Time-based collections, the single best improvement 14.10 15.10 ... 21.10 indexing
  • 8. 8 01 Time-based collections, the single best improvement 14.10 15.10 ... 21.10 indexing Less merging ⇒ faster indexing Quick deletes (of whole collections) Search only some collections Better use of caches
  • 10. 1 0 01 Load is uneven Need to “rotate” collections fast enough to work with this (otherwise indexing and queries will be slow) These will be tiny Black Friday Saturday Sunday
  • 11. 1 1 01 If load is uneven, daily/monthly/etc indices are suboptimal: you either have poor performance or too many collections Octi* is worried: * this is Octi →
  • 12. 1 2 01 Solution: rotate by size indexing logs01 Size limit
  • 13. 1 3 01 Solution: rotate by size indexing logs01 Size limit
  • 14. 1 4 01 Solution: rotate by size indexing logs01 logs02 Size limit
  • 15. 1 5 01 Solution: rotate by size indexing logs01 logs02 Size limit
  • 16. 1 6 01 Solution: rotate by size indexing logs01 logs08... logs02
  • 17. 1 7 01 Solution: rotate by size indexingPredictable indexing and search performance Fewer shards logs01 logs08... logs02
  • 18. 1 8 01 Dealing with size-based collections logs01 logs08... logs02 app (caches results) stats 2016-10-11 2016-10-13 2016-10-12 2016-10-14 2016-10-18 doesn’t matter, it’s the latest collection
  • 19. 1 9 01 Size-based collections handle spiky load better Octi concludes:
  • 20. 2 0 01 Tiered cluster (a.k.a. hot-cold) 14 Oct 11 Oct 10 Oct 12 Oct 13 Oct hot01 cold01 cold02 indexing, most searches longer-running (+cached) searches
  • 21. 2 1 01 Tiered cluster (a.k.a. hot-cold) 14 Oct 11 Oct 10 Oct 12 Oct 13 Oct hot01 cold01 cold02 indexing, most searches longer-running (+cached) searches ⇒ Good CPU and IO* ⇒ Heap. Decent IO for replication&backup * Ideally local SSD; avoid network storage unless it’s really good
  • 22. 2 2 01 Octi likes tiered clusters Costs: you can use different hardware for different workloads Performance (see costs): fewer shards, less overhead Isolation: long-running searches don’t slow down indexing
  • 23. 2 3 01 AWS specifics Hot tier: c3 (compute optimized) + EBS and use local SSD as cache* c4 (EBS only) Cold tier: d2 (big local HDDs + lots of RAM) m4 (general purpose) + EBS i2 (big local SSDs + lots of RAM) General stuff: EBS optimized Enhanced Networking VPC (to get access to c4&m4 instances) * Use --cachemode writeback for async writing: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/ht ml/Logical_Volume_Manager_Administration/lvm_cache_volume_creation.html
  • 24. PIOPS is best but expensive HDD - too slow (unless cold=icy) ⇒ General purpose SSDs 2 4 01 EBS Stay under 3TB. More smaller (<1TB) drives in RAID0 give better, but shorter IOPS bursts Performance isn’t guaranteed ⇒ RAID0 will wait for the slowest disk Check limits (e.g. 160MB/s per drive, instance-dependent IOPS and network) http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html 3 IOPS/GB up to 10K (~3TB), up to 256kb/IO, merges up to 4 consecutive IOs
  • 25. 2 5 01 Octi’s AWS top picks c4s for the hot tier: cheaper than c3s with similar performance m4s for the cold tier: well balanced, scale up to 10xl, flexible storage via EBS EBS drives < 3TB, otherwise avoids RAID0: higher chances of performance drop
  • 26. 2 6 01 Scratching the surface of OS options Say no to swap Disk scheduler: CFQ for HDD, deadline for SSD Mount options: noatime, nodiratime, data=writeback, nobarrier For bare metal: check CPU governor and THP* because strict ordering is for the weak * often it’s enabled, but /proc/sys/vm/nr_hugepages is 0
  • 27. 2 7 01 Schema and solrconfig Auto soft commit (5s?) Auto commit (few minutes?) RAM buffer size + Max buffered docs Doc Values for faceting + retrieving those fields (stored=false) Omit norms, frequencies and positions Don’t store catch-all field(s)
  • 28. 2 8 01 Relaxing the merge policy* Merges are the heaviest part of indexing Facets are the heaviest part of searching Facets (except method=enum) depend on data size more than # of segments Knobs: segmentsPerTier: more segments ⇒ less merging maxMergeAtOnce < segmentsPerTier, to smooth out those IO spikes maxMergedSegmentMB: lower to merge more small segments, less bigger ones * unless you only do “grep”. YMMV, of course. Keep an eye on open files, though ⇒ fewer open files
  • 29. 2 9 01 Some numbers: more segments, more throughput (10-15% here) 10 segmentsPerTier 10 maxMergeAtOnce 50 segmentsPerTier 50 maxMergeAtOnce need to rotate before this drop
  • 30. 3 0 01 Lower max segment (500MB from 5GB default) less CPU fewer segments
  • 31. 3 1 01 There’s more... SPM screenshots from all tests + JMeter test plan here: https://github.com/sematext/lucene-revolution-samples/tree/master/2016/solr_logging We’d love to hear about your own results! correct spelling: sematext.com/spm
  • 32. 3 2 01 Increasing segments per tier while decreasing max merged segment (by an order of magnitude) makes indexing better and search latency less spiky Octi’s conclusions so far
  • 33. 3 3 01 Optimize I/O and CPU by not optimizing Unless you have spare CPU & IO (why would you?) And unless you run out of open files Only do this on “old” indices!
  • 34. 3 4 01 Optimizing the pipeline* logs log shipper(s) Ship using which protocol(s)? Buffer Route to other destinations? Parse * for availability and performance/costs Or log to Solr directly from app (i.e. implement a new, embedded log shipper)
  • 35. 3 5 01 A case for buffers performance & availability allows batches and threads when Solr is down or can’t keep up
  • 36. 3 6 01 Types of buffers Disk*, memory or a combination On the logging host or centralized * doesn’t mean it fsync()s for every message file or local log shipper Easy scaling; fewer moving parts Often requires a lightweight shipper Kafka/Redis/etc or central log shipper Extra features (e.g. TTL) One place for changes bufferapp bufferapp
  • 37. 3 7 01 Multiple destinations * or flat files, or S3 or... input buffer processing Outputs need to be in sync input Processing may cause backpressure processing *
  • 39. 3 9 01 Just Solr and maybe flat files? Go simple with a local shipper Custom, fast-changing processing & multiple destinations? Kafka as a central buffer Octi’s pipeline preferences
  • 40. 4 0 01 Parsing unstructured data Ideally, log in JSON* Otherwise, parse * or another serialization format For performance and maintenance (i.e. no need to update parsing rules) Regex-based (e.g. grok) Easy to build rules Rules are flexible Typically slow & O(n) on # of rules, but: Move matching patterns to the top of the list Move broad patterns to the bottom Skip patterns including others that didn’t match Grammar-based (e.g. liblognorm, PatternDB) Faster. O(1) on # of rules Numbers in our 2015 session: sematext.com/blog/2015/10/16/large-scale-log-analytics-with-solr/
  • 41. 4 1 01 Decide what gives when buffers fill up input shipper Can drop data here, but better buffer when shipper buffer is full, app can block or drop data Check: Local files: what happens when files are rotated/archived/deleted? UDP: network buffers should handle spiky load* TCP: what happens when connection breaks/times out? UNIX sockets: what happens when socket blocks writes**? * you’d normally increase net.core.rmem_max and rmem_default ** both DGRAM and STREAM local sockets are reliable (vs Internet ones, UDP and TCP)
  • 42. 4 2 01 Octi’s flow chart of where to log critical? UDP. Increase network buffers on destination, so it can handle spiky traffic Paying with RAM or IO? UNIX socket. Local shipper with memory buffers, that can drop data if needed Local files. Make sure rotation is in place or you’ll run out of disk! no IO RAM yes
  • 43. 4 3 01 Protocols UDP: cool for the app, but not reliable TCP: more reliable, but not completely Application-level ACKs may be needed: No failure/backpressure handling needed App gets ACK when OS buffer gets it ⇒ no retransmit if buffer is lost* * more at blog.gerhards.net/2008/05/why-you-cant-build-reliable-tcp.html sender receiver ACKs Protocol Example shippers HTTP Logstash, rsyslog, Fluentd RELP rsyslog, Logstash Beats Filebeat, Logstash Kafka Fluentd, Filebeat, rsyslog, Logstash
  • 44. 4 4 01 Octi’s top pipeline+shipper combos rsyslog app UNIX socket HTTP memory+disk buffer can drop app fileKafka consumer Filebeat HTTP simple & reliable
  • 45. 4 5 01 Conclusions, questions, we’re hiring, thank you The whole talk was pretty much only conclusions :) Feels like there’s much more to discover. Please test & share your own nuggets http://www.relatably.com/m/img/oh-my-god-memes/oh-my-god.jpg Scary word, ha? Poke us: @kucrafal @radu0gheorghe @sematext ...or @ our booth here