A simple Spark standalone cluster for your testing environment
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
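To make the headline entry concrete, here is a minimal docker-compose sketch of such a standalone cluster (one master, one worker). The bitnami/spark image and its SPARK_MODE / SPARK_MASTER_URL environment variables are assumptions, not part of the original listing; swap in whatever Spark image you prefer.

```yaml
# Minimal sketch of a two-node Spark standalone cluster for testing.
# Assumes the bitnami/spark image and its documented environment
# variables (SPARK_MODE, SPARK_MASTER_URL, SPARK_WORKER_*).
version: "3.8"
services:
  spark-master:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=master
    ports:
      - "8080:8080"   # master web UI
      - "7077:7077"   # cluster manager port used by spark-submit
  spark-worker:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=1
    depends_on:
      - spark-master
```

Once the stack is up, `spark-submit --master spark://localhost:7077` from the host (or from another container on the same Docker network, using `spark-master` as the hostname) targets the cluster.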
Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an Apache Spark Performance Dashboard using container technology.
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Deploy your Spark Production Cluster on Kubernetes
Running Apache Spark with Docker Swarm
PySpark in Docker Containers
Spyrk-cluster is a data mini-lab built around the main technologies in use today. It's useful either for understanding how to configure a cluster or as a ready-made environment for testing with spark-submit or interactive jobs.
Examples and custom spark images for working with the spark-on-k8s operator on AWS
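For the Kubernetes-oriented entries above, the unit of deployment with the spark-on-k8s operator is a SparkApplication manifest. The sketch below follows the operator's v1beta2 CRD; the image name, service account, and application file are hypothetical placeholders.

```yaml
# Minimal sketch of a SparkApplication for the spark-on-k8s operator.
# The image, serviceAccount, and mainApplicationFile values below are
# placeholders; substitute your own registry and entry point.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: pi-example
  namespace: default
spec:
  type: Python
  mode: cluster
  image: my-registry/spark-py:3.5    # hypothetical custom image
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.5.0"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark            # assumed RBAC setup
  executor:
    instances: 2
    cores: 1
    memory: 512m
```

Applying the manifest with `kubectl apply -f` hands it to the operator, which then launches the driver and executor pods.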
RELK -- The Research Elastic Stack (Kafka, Beats, Zookeeper, Logstash, ElasticSearch, Kibana, Spark, & Jupyter -- All in Docker)
🐋 Docker image for AWS Glue Spark/Python
Uses the python3.6-alpine base image and adds Java, pandas, NumPy, PySpark, and Spark as runtime dependencies. This image can be used as the container image when running spark-submit on Kubernetes.
Docker image for the Archives Unleashed Toolkit
Standalone Spark setup with Hadoop and Hive, running in Docker containers.
Created by Matei Zaharia
Released May 26, 2014