Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based FPGA Accelerators

Calvin Hung
calvin.hung@wasaitech.com
Wasai Technology, Inc.
Weiting Chen
weiting.chen@intel.com
Intel Corp.

AGENDA
Background
Problem Definition
Solutions
Performance
Summary
Q&A

ABOUT US
Calvin Hung @wasaitech
Calvin is the co-founder, CEO and CTO of WASAI Technology, which specializes in FPGA-based datacenter acceleration for Apache Spark, Apache Hadoop and genomics analysis applications. He has more than 15 years of experience in software and hardware architecture co-design and performance optimization.
Weiting Chen (William) @intel
Weiting is a senior software engineer at Intel Software. He has worked on Big Data and Cloud solutions, including Spark, Hadoop, OpenStack, and Kubernetes, for more than 6 years.

MOTIVATION
▪ CPUs support SIMD instruction sets such as SSE, AVX2 and AVX-512. We would like to unleash that power in Spark.
▪ Many accelerators, such as FPGAs, GPUs and ASICs, can take functions offloaded from the CPU and speed up processing.

PROBLEM DEFINITION
▪ How can we avoid row-to-columnar and columnar-to-row conversion overhead during data processing?
▪ How can Spark leverage AVX support?
▪ How can accelerators (e.g. FPGA, GPU) be coordinated to work with the CPU in Spark 3.0?
▪ How can an FPGA help speed up Spark?
▪ How can we minimize data copy and serialization overhead when copying from host to device?
▪ How can we improve performance during DMA transfers?

SOLUTION: SPARK + ARROW + FPGA
A better way to run Spark with AVX and accelerator support
- Apache Arrow
- SPARK 3.0 New Features
- OAP Native SQL Engine (Intel)
- FPGA Accelerators (WASAI)

THE GOALS
End-to-End Columnar-to-Columnar Data Processing:
Avoid columnar-to-row and row-to-columnar conversion overhead when processing data.
AVX support (via OAP Native SQL):
Columnar-based Reader -> Columnar-based Data Processing (w/ AVX) -> Columnar-based Writer Result
FPGA integration and acceleration:
Columnar-based Reader -> Columnar-based Data Processing (w/ AVX) -> Columnar-based Data Copy to Device -> Columnar-based Data Processing (on FPGA) -> Columnar
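
To make the conversion overhead named in the goals concrete, here is a minimal sketch in plain Python. It is not Spark or OAP code: the two helper functions, the two-column schema and the sample values are all assumptions chosen for illustration. It shows that a pipeline with a row-based operator in the middle has to touch every value twice just to change layout, while a columnar operator works on the column as it was read.

```python
from typing import Dict, List, Tuple

Columns = Dict[str, List]   # column name -> list of values (illustrative layout)


def columns_to_rows(cols: Columns, names: List[str]) -> List[Tuple]:
    """Columnar-to-row conversion: visits every value once."""
    return list(zip(*(cols[name] for name in names)))


def rows_to_columns(rows: List[Tuple], names: List[str]) -> Columns:
    """Row-to-columnar conversion: visits every value once more."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


names = ["ss_item_sk", "ss_ext_sales_price"]
# Made-up sample data, standing in for what a columnar reader returns.
cols = {"ss_item_sk": [3175, 42, 3175], "ss_ext_sales_price": [10.0, 5.5, 2.5]}

# Mixed pipeline: columnar reader -> row-based operator -> columnar writer.
rows = columns_to_rows(cols, names)      # conversion 1
back = rows_to_columns(rows, names)      # conversion 2

# Columnar-to-columnar pipeline: the operator consumes the column directly.
total = sum(cols["ss_ext_sales_price"])
```

Both conversions are pure overhead: `back` holds exactly the data that `cols` already held.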

APACHE ARROW
Without Arrow:
• Each system has its own internal memory format
• 70-80% of computation is wasted on serialization and deserialization
• Similar functionality is implemented in multiple projects
With Arrow:
• All systems use the same memory format
• No overhead for cross-system communication
• Projects can share functionality
Reference: https://arrow.apache.org
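
The difference between exchanging data through serialization and sharing one memory format can be sketched with the Python standard library alone. This is an analogy, not the Arrow API: `array` and `memoryview` stand in for an Arrow buffer, and the date-key values are made up.

```python
import array
import json

# Producer side: one column stored as a contiguous buffer of 64-bit integers.
column = array.array("q", [20200101, 20200102, 20200103])

# Without a shared format: serialize, hand over, deserialize.
# The consumer ends up with a separate copy that had to be parsed.
wire = json.dumps(column.tolist())
copied = json.loads(wire)

# With a shared in-memory layout: the consumer reads the producer's buffer in place.
shared = memoryview(column)   # zero-copy view over the same memory

column[0] = 19990101          # a later write by the producer ...
first_shared = shared[0]      # ... is visible through the view (same memory)
first_copied = copied[0]      # ... but not in the deserialized copy
```

The view costs nothing to create regardless of column length, whereas the serialized path grows with the data.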

NEW FEATURES in SPARK3.0
SPARK-27396 Public APIs for extended Columnar Processing Support
https://issues.apache.org/jira/browse/SPARK-27396
▪ An interface for extending the columnar processing API
▪ Provides an opportunity to create a custom API for columnar data processing with OAP Native SQL Engine and FPGA support
▪ Advanced users can define a new interface to communicate with accelerators such as GPUs or FPGAs
SPARK-24615 Accelerator-aware task scheduling for Spark
https://issues.apache.org/jira/browse/SPARK-24615
▪ An interface for Spark to allocate accelerators at the task level
▪ Makes Spark tasks aware of accelerators such as GPUs and FPGAs
▪ Currently supports only GPUs
▪ FPGAs can be supported in the same way (vendor specific)
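
As a rough illustration of how an FPGA could be exposed through accelerator-aware scheduling, the sketch below builds the two pieces Spark 3.0 expects for a custom resource: the JSON that a resource discovery script prints, and the `spark.executor.resource.*` / `spark.task.resource.*` settings. The resource name `fpga`, the single device address `"0"` and the script path are assumptions for illustration, not values taken from the talk.

```python
import json
from typing import Dict, List


def discover(resource_name: str, addresses: List[str]) -> str:
    """Build the JSON document a Spark resource discovery script writes to stdout."""
    return json.dumps({"name": resource_name, "addresses": addresses})


def accelerator_conf(resource_name: str, per_executor: int, per_task: int,
                     script_path: str) -> Dict[str, str]:
    """Spark settings that request an accelerator per executor and per task."""
    executor_prefix = f"spark.executor.resource.{resource_name}"
    return {
        f"{executor_prefix}.amount": str(per_executor),
        f"{executor_prefix}.discoveryScript": script_path,
        f"spark.task.resource.{resource_name}.amount": str(per_task),
    }


# Hypothetical setup: one FPGA card per executor, one per task.
fpga_json = discover("fpga", ["0"])
conf = accelerator_conf("fpga", 1, 1, "/path/to/getFpgaResources.sh")  # placeholder path
```

Each entry in `conf` could be passed to `spark-submit` as a `--conf key=value` pair; how the card is actually detected and programmed remains vendor specific.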

OAP Native SQL Engine Plugin
Intel Optimized Analytics Package(OAP): Native SQL Engine
https://github.com/Intel-bigdata/OAP/
End-to-end columnar data processing for Spark with Intel AVX support
[Architecture diagram. Components: SPARK SQL, SPARK Catalyst, Arrow Data Source, Arrow Data Processing, Columnar Shuffle, Apache Arrow, Intel CPU, Other Accelerators (FPGA, GPU, …)]

END TO END, COLUMNAR TO COLUMNAR DATA PROCESSING
[Diagram: SPARK3.0] FileScan (Parquet Reader) -> ColumnarVector -> Columnar-to-Row -> InternalRow -> Row based Operators (Whole Stage Codegen) -> InternalRow -> Row-to-Columnar -> ColumnarVector -> FileWriter (Parquet Writer)
[Diagram: SPARK3.0 + OAP Native SQL Engine, with FPGA Acceleration] FileScan (Parquet Reader) -> Arrow -> Columnar based Operators -> Arrow -> FileWriter (Parquet Writer); Arrow data is also exchanged with FPGA Templates and FPGA Operators (Aggregation/GroupBy/…)

OAP NATIVE SQL ENGINE HIGHLIGHT
▪ Open-source columnar data processing engine for Spark
▪ Built on Apache Arrow
▪ Enables AVX support for SIMD instruction acceleration
▪ Leverages Spark 3.0 support
▪ Communicates with third-party accelerators
▪ Supports data source parsing, SQL operators, and columnar shuffle
▪ Supports common SQL operators such as filter, join, groupby and aggregate

SPARK + FPGA + ARROW
Why Spark SQL + FPGA
- Spark SQL essentially processes a structured, row-based dataset in a single query made up of a series of SQL operators. The operators can be simple while the dataset can be extremely large.
- An FPGA with highly specialized IP cores can handle such multiple-instruction, single-dataset analysis faster, and with better power and resource efficiency, than a CPU or GPU at the same total cost of ownership.
Why Arrow
- Offloading Spark SQL workloads from the Java runtime to an FPGA by writing a new WholeStageCodegen that invokes native function calls can be messy. Apache Arrow can instead hold columnar batch data in native memory and manage the memory references inside Spark.

SPARK SQL FPGA ACCELERATION
▪ SQL operators (Aggregation, GroupBy, Filter, Sort, Join, etc.)
▪ Apache Arrow is used for data transfer between the Java runtime and the FPGA to reduce data traffic
▪ The next step is to leverage Arrow::RecordBatch

SPARK(CPU ONLY)
[Diagram: SQL / RDD / Dataset / DataFrame -> Catalyst Optimization -> Code Generation -> Code Execution in the Spark Executor. Input -> Data Processed By CPU]
SPARK(CPU + FPGA)
[Diagram: SQL / RDD / Dataset / DataFrame -> Catalyst Optimization -> Code Generation -> Code Execution in the Spark Executor, with an added "Re-generate Physical Plan for FPGA" step. Input -> Data Processed By FPGA]
Spark Executor <-> FPGA Java Wrapper <-> Adaptor / JNI / Driver <-> FPGA (DMA Engine, Aggregation Engine, GroupBy Engine), exchanging Array[Byte] data blocks:
1. Get Mem Page for Input / Output
2. Fill the Input Data Block
3. Start FPGA Engine
4. Iterating the data by Spark API (Schema Specified)
5. Free Memory Pages
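
The five numbered steps in the diagram (get memory pages, fill the input data block, start the FPGA engine, iterate over the result, free the pages) can be sketched as a small lifecycle. Everything here is hypothetical: `MockFpgaEngine` is not the WASAI wrapper, the aggregation runs on the CPU, and the fixed record schema (int64 key, float64 value) is an assumption standing in for "schema specified".

```python
import struct
from typing import Dict, Iterator, List, Optional, Tuple


class MockFpgaEngine:
    """Hypothetical stand-in for an FPGA wrapper; the 'engine' runs on the CPU."""

    RECORD = struct.Struct("<qd")   # assumed schema: (int64 key, float64 value)

    def __init__(self) -> None:
        self.input: Optional[bytearray] = None
        self.output: Optional[bytearray] = None

    def get_pages(self, n_records: int) -> None:
        # Step 1: reserve byte buffers for input and output.
        self.input = bytearray(self.RECORD.size * n_records)
        self.output = bytearray()

    def fill_input(self, records: List[Tuple[int, float]]) -> None:
        # Step 2: pack the records into the input data block.
        for i, rec in enumerate(records):
            self.RECORD.pack_into(self.input, i * self.RECORD.size, *rec)

    def start(self) -> None:
        # Step 3: run the engine; here, a group-by-key sum over the raw bytes.
        sums: Dict[int, float] = {}
        for key, value in self.RECORD.iter_unpack(self.input):
            sums[key] = sums.get(key, 0.0) + value
        packed = (self.RECORD.pack(k, v) for k, v in sorted(sums.items()))
        self.output = bytearray(b"".join(packed))

    def results(self) -> Iterator[Tuple[int, float]]:
        # Step 4: iterate over the output block using the known schema.
        return self.RECORD.iter_unpack(self.output)

    def free_pages(self) -> None:
        # Step 5: release both buffers.
        self.input = None
        self.output = None


engine = MockFpgaEngine()
records = [(1, 2.0), (2, 1.5), (1, 3.0)]   # made-up input
engine.get_pages(len(records))
engine.fill_input(records)
engine.start()
grouped = list(engine.results())
engine.free_pages()
```

The point of the byte-buffer interface is that only flat `Array[Byte]`-style blocks cross the boundary, so the host side never has to hand individual objects to the device.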

SPARK SQL + ARROW PERFORMANCE
▪ A simple query from TPC-DS Q55 with a 300 GB dataset
▪ With Apache Arrow, the performance boost can be up to 33%, and the CPU is clearly offloaded.
SELECT ss_sold_date_sk, sum(ss_ext_sales_price) FROM store_sales
WHERE ss_item_sk = 3175 GROUP BY ss_sold_date_sk
[Bar chart: query time in Minutes (axis 0 to 14) for dataset sizes of 32, 90 and 300 GB; series: Arrow-boosted, Original]
Intel® Xeon® Gold 6120 CPU x2
DDR4 256GB
Intel PAC Arria10 x1
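
For readers who want to see what the benchmark query computes, here is a minimal columnar evaluation of the same filter, group-by and sum in plain Python. The function name `q55_like` and the sample column values are made up for illustration; this is not how Spark or the FPGA engines execute the query.

```python
from collections import defaultdict
from typing import Dict, List


def q55_like(ss_sold_date_sk: List[int], ss_item_sk: List[int],
             ss_ext_sales_price: List[float], item: int = 3175) -> Dict[int, float]:
    """SELECT ss_sold_date_sk, sum(ss_ext_sales_price) ... GROUP BY ss_sold_date_sk."""
    # WHERE ss_item_sk = 3175: scan one column and keep matching row positions.
    selected = [i for i, sk in enumerate(ss_item_sk) if sk == item]
    # GROUP BY ss_sold_date_sk, sum(ss_ext_sales_price): touch only two more columns.
    sums: Dict[int, float] = defaultdict(float)
    for i in selected:
        sums[ss_sold_date_sk[i]] += ss_ext_sales_price[i]
    return dict(sums)


# Made-up sample columns, one list per column of store_sales.
result = q55_like(
    ss_sold_date_sk=[2451000, 2451000, 2451001, 2451001],
    ss_item_sk=[3175, 7, 3175, 3175],
    ss_ext_sales_price=[10.0, 99.0, 2.5, 4.0],
)
```

Only the three referenced columns are read, which is why a columnar layout suits this kind of query.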

SYSTEM STACK
Storage / Data Format: JSON, Parquet
Distributed Execution (Big Data Cores): MapReduce, Spark SQL Engine, Spark RDD/DF
OS (Core System): CentOS, RHEL, Ubuntu
FPGA Accelerator (Accelerators): MapReduce Accelerator, Spark SQL Accelerator, Spark RDD/DF Accelerator, Data Decoder, Compressor
WASAI components: WASAI System Lib & Drivers, WASAI IOBooster, WASAI EvoCores

OTHER ACCELERATORS
Solution | Description | Workloads | Result
Spark RDD | groupByKey, foldByKey, etc. | microbench | 80%~3x performance boost. Shuffle size 90% to 99% reduction.
General Sort | General TimSort for both Hadoop & Spark | TeraSort microbench | 20% performance boost
Compression | Compression encoding/decoding | microbench | Ongoing
Erasure Coding | EC codec | microbench | Reach maximum throughput of PCIe
Input Format Parsing | JSON, CSV, Parquet format parser | microbench | 2X~7.8X performance boost
Test platform: Intel® Xeon® Gold 6120 CPU x2, DDR4 256GB, Intel PAC Arria10 x1

KEY TAKEAWAYS
▪ End-to-end columnar data processing can optimize performance on CPUs, FPGAs and other accelerators in the native layer.
▪ FPGAs can help accelerate Spark in many cases that involve CPU-intensive operations.
▪ Last but not least, Spark 3.0 support opens up many new opportunities for the future.

NEXT …
▪ More features in OAP Native SQL Engine
▪ OAP Native SQL Engine + FPGA integration
[Architecture diagram. Components: OAP Native SQL Engine Plugin, Apache Arrow, Arrow Data Source, Arrow Data Processing, Columnar Shuffle, Intel CPU, WASAI FPGA Accelerators, WASAI CodeGen]

CALL TO ACTION
We encourage you to try the OAP Native SQL Engine for Spark in the Wasai Spark SQL + FPGA solution
https://www.wasaitech.com/
Please contact
Intel: weiting.chen@intel.com
Wasai: calvin.hung@wasaitech.com

