Scylla Summit 2017: Performance Evaluation of Scylla as a Database Backend for JanusGraph

Performance Study for JanusGraph Storage Backends: Scylla, Cassandra, and HBase
Chin Huang, Software Engineer, IBM
Ted Chang, Performance Engineer, IBM

Chin Huang
Chin Huang is a software engineer at IBM Open Technologies. He
has worked on software development, solutions integration, and
performance evaluation for open source projects such as OpenStack,
JanusGraph, and various databases.
Ted Chang
Ted Chang is a software engineer at IBM Open Technologies and
Design for Performance. He has worked on various enterprise and
open source cloud solutions. His current focus is JanusGraph
performance characterization.

Agenda
▪ Overview – Graph database storage backends
▪ Performance evaluation scenarios and results
o Insert vertices (inserts)
o Insert edges (= search + update)
o Graph traversal (query)
▪ Lessons learned
▪ Q&A

Overview – Graph database storage backends

Overview
▪ JanusGraph is a highly scalable graph database optimized for storing and
querying large graphs.
▪ JanusGraph stores graphs in adjacency list format: a graph is stored as a
collection of vertices, each with its adjacency list (the vertex's incident edges
and properties).
▪ The storage layer is pluggable; the most common storage backends are
Cassandra and HBase. We want to add Scylla to the mix!
▪ Test workloads:
o Insert vertices - writes
o Insert edges - reads and writes
o Queries – reads
▪ Test environments: database clusters
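The JanusGraph data model documentation describes each adjacency list as one row in the storage backend, keyed by vertex id, with every edge and property stored as its own cell, and each edge stored in the rows of both endpoint vertices. The following Python sketch models that layout with plain dictionaries to show why a single row read returns a vertex's neighborhood. The class and method names are hypothetical, and the real JanusGraph encoding (binary, sorted columns) is far more compact than this simplified model.

```python
from collections import defaultdict


# Simplified model (an assumption for illustration, not JanusGraph's real
# binary encoding): each vertex id keys one "row"; every property and every
# incident edge is a separate cell (column -> value) in that row.
class AdjacencyListStore:
    def __init__(self):
        self.rows = defaultdict(dict)  # vertex_id -> {column: value}

    def add_vertex(self, vid, **props):
        for key, value in props.items():
            self.rows[vid][("prop", key)] = value

    def add_edge(self, out_vid, label, in_vid, **props):
        # The edge is recorded in BOTH endpoint rows, so either vertex
        # can find it with a single row read.
        self.rows[out_vid][("out", label, in_vid)] = dict(props)
        self.rows[in_vid][("in", label, out_vid)] = dict(props)

    def neighbors(self, vid, direction, label):
        # One row lookup yields the vertex's whole adjacency list;
        # we then filter its cells by direction and edge label.
        return [col[2] for col in self.rows[vid]
                if col[0] == direction and col[1] == label]


store = AdjacencyListStore()
store.add_vertex(1, name="alice")
store.add_vertex(2, name="bob")
store.add_edge(2, "Follows", 1)
print(store.neighbors(1, "in", "Follows"))   # [2]
print(store.neighbors(2, "out", "Follows"))  # [1]
```

Storing each edge twice trades extra writes for cheap traversal in either direction, which is why the edge-insert workload below stresses the backends differently from vertex inserts.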

Performance test environment
▪ Server spec
o Physical servers: x3650 M5, 2 sockets x 14 cores, 384 GB memory (12 x 32 GB)
o CPU: Intel Xeon Processor E5-2690 v4 14C 2.6GHz 35MB Cache 2400MHz
o Network interface: Emulex VFA5.2 ML2 Dual Port 10GbE SFP+ Adapter
o Disk: 720 GB SSD, RAID 5
o Operating system: Ubuntu 16.04.2 LTS
▪ Public tools
o JMeter - load testing tool
o nmon, nmon Analyser - system performance monitoring and analysis tools
o VisualVM - all-in-one Java troubleshooting/profiling tool
o GCeasy - garbage collection log analysis tool
o Prometheus and Grafana - monitoring dashboards
▪ Home-grown tools
o Graph schema loader, data generator, batch importer

Performance Test Topology
[Figure: the load injector sends insert, update, and query requests to JanusGraph, which stores data in a database cluster; the diagram shows three database nodes, each with Cassandra, HBase + HDFS + ZooKeeper, and Scylla]

Performance evaluation scenarios and results

Performance Evaluation: Insert Vertices
▪ 40 million vertices in total
▪ 2 properties for each vertex
▪ Insert scenario (writes only)
▪ Load injectors fully utilized to generate
load against the databases
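The talk mentions a home-grown data generator and batch importer but does not show them. The following Python sketch illustrates the general shape of such a tool under stated assumptions: the property names `name` and `joined` and the batch size of 10,000 are hypothetical (only the vertex count and the two-properties-per-vertex figure come from the slide). It also works out the scale: 40 million vertices with 2 properties each means 80 million property values to write.

```python
from itertools import islice

TOTAL_VERTICES = 40_000_000   # from the slide
PROPS_PER_VERTEX = 2          # from the slide
BATCH_SIZE = 10_000           # hypothetical; not stated in the talk


def generate_vertices(total):
    """Yield (vertex_number, properties) with two synthetic properties.

    The property names 'name' and 'joined' are assumptions for illustration.
    """
    for i in range(total):
        yield i, {"name": f"usr_{i}", "joined": 1_500_000_000 + i}


def batches(iterable, size):
    """Group an iterator into lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# In this sketch, each batch is assumed to be committed as one transaction
# by a load-injector thread.
total_properties = TOTAL_VERTICES * PROPS_PER_VERTEX
num_batches = -(-TOTAL_VERTICES // BATCH_SIZE)  # ceiling division
print(total_properties)  # 80000000
print(num_batches)       # 4000

# Small-scale usage: 25 vertices in batches of 10 gives sizes 10, 10, 5.
print([len(b) for b in batches(generate_vertices(25), 10)])  # [10, 10, 5]
```

Generating vertices lazily keeps the injector's memory flat regardless of the total count, so the injector itself is not the bottleneck when it is driven at full utilization.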

Performance Evaluation: Insert Edges
▪ 30 million edges in total
▪ 1 property for each edge
▪ Query and update scenario: each edge insert first looks up its
endpoint vertices (reads), then writes the edge (writes)
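Inserting an edge is a search followed by an update: both endpoint vertices must be found before the edge can be written. The following Python sketch makes that read/write mix explicit with in-memory stand-ins. Everything in it is hypothetical (`name_index`, `insert_edge`, and the `since` edge property); it assumes endpoints are resolved by their `name` property, as in the traversal query. At the slide's scale, and assuming no caching, this pattern implies roughly 60 million endpoint lookups for 30 million edge writes.

```python
# In-memory stand-ins (assumptions): a name -> vertex-id index and an edge list.
name_index = {f"usr_{i}": i for i in range(5)}
edges = []
stats = {"lookups": 0, "writes": 0}


def find_vertex(name):
    """Search step: resolve a vertex by its 'name' property (a read)."""
    stats["lookups"] += 1
    return name_index[name]


def insert_edge(out_name, label, in_name, **props):
    """Insert edge = search both endpoints (two reads), then write the edge."""
    out_id = find_vertex(out_name)
    in_id = find_vertex(in_name)
    edges.append((out_id, label, in_id, props))  # the update (one write)
    stats["writes"] += 1


insert_edge("usr_1", "Follows", "usr_0", since=2017)
insert_edge("usr_2", "Follows", "usr_0", since=2016)
print(edges[0])  # (1, 'Follows', 0, {'since': 2017})
print(stats)     # {'lookups': 4, 'writes': 2}
```

Because every write depends on two preceding reads, this workload exercises read latency as much as write throughput, unlike the insert-only vertex workload.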

Performance Evaluation: Graph Traversal
▪ Query: g.V().has('name', 'usr_name').in('Follows').as('a').out('Retweets').in('Tweets').has('name', 'usr_name').select('a').values('name').dedup()
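Assuming `Follows`, `Retweets`, and `Tweets` edges point from the acting user (follower to followee, retweeter to tweet, author to tweet), the traversal returns the names of `usr_name`'s followers who retweeted at least one of `usr_name`'s tweets. The following Python sketch emulates each Gremlin step over a small in-memory edge list to make those semantics concrete. The graph data and helper names are hypothetical, and this is not how JanusGraph executes the query (each step there becomes reads against the storage backend).

```python
# Toy graph (hypothetical data): edges are (out_vertex, label, in_vertex).
names = {1: "usr_name", 2: "bob", 3: "carol", 4: "dave"}  # user vertices
edges = [
    (2, "Follows", 1), (3, "Follows", 1), (4, "Follows", 1),
    (1, "Tweets", 10), (1, "Tweets", 12), (4, "Tweets", 11),
    (2, "Retweets", 10), (2, "Retweets", 12),
    (3, "Retweets", 11), (4, "Retweets", 10),
]
vertices = sorted({v for (o, _, i) in edges for v in (o, i)})


def in_(vs, label):
    """in(label): for each vertex, the out-vertices of its incoming edges."""
    return [o for v in vs for (o, l, i) in edges if i == v and l == label]


def out(vs, label):
    """out(label): for each vertex, the in-vertices of its outgoing edges."""
    return [i for v in vs for (o, l, i) in edges if o == v and l == label]


def has_name(vs, value):
    """has('name', value): keep vertices whose name matches."""
    return [v for v in vs if names.get(v) == value]


def query(user):
    result = []
    # g.V().has('name', user).in('Follows').as('a')
    for a in in_(has_name(vertices, user), "Follows"):
        # .out('Retweets').in('Tweets').has('name', user), per traverser 'a'
        authors = has_name(in_(out([a], "Retweets"), "Tweets"), user)
        # .select('a').values('name'): each surviving path emits a's name
        result.extend(names[a] for _ in authors)
    # .dedup(): keep the first occurrence of each name
    return list(dict.fromkeys(result))


print(query("usr_name"))  # ['bob', 'dave']
```

Before `dedup()`, "bob" is emitted twice because he retweeted two of `usr_name`'s tweets; "carol" is dropped because the tweet she retweeted was written by "dave". Each step fans out into further adjacency-list reads, which is why this workload is read-only.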

Lessons Learned
Scylla
• Easy clustering: multiple nodes can be added at once
• Self-tunes well, but documentation is lacking
• Load is evenly distributed across nodes
• Fully utilizes system resources
• Reported CPU utilization misrepresents the real load (Scylla's shards
busy-poll, so OS-level CPU% stays high even at light load)
• Good monitoring dashboards (Prometheus + Grafana)
• Works with existing Cassandra utility clients

Lessons Learned - Continued
Cassandra
• Cluster bootstrapping takes more effort
• Smaller memory footprint
HBase
• Uneven CPU% across nodes, caused by hot regions
• Read and write cache settings must be configured carefully for
better throughput

THANK YOU
chhuang@us.ibm.com
htchang@us.ibm.com
Please stay in touch
Any questions?
