Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are some of the 321 public repositories matching this topic:
- 50+ DockerHub public images for Docker & Kubernetes: DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak. (Shell; updated Jul 10, 2024)
- [PROJECT IS NO LONGER MAINTAINED] Wirbelsturm is a Vagrant- and Puppet-based tool for 1-click local and remote deployments, with a focus on big data tech like Kafka. (Shell; updated Feb 21, 2022)
- A Hadoop development and test environment built on Docker, including Hadoop, Hive, HBase, and Spark. (Shell; updated May 26, 2019)
- A Docker container with a full Hadoop cluster setup with Spark and Zeppelin. (Shell; updated Feb 2, 2020)
- Apache Hive Metastore as a standalone server in Docker. (Shell; updated Feb 29, 2024)
- Easy CPU profiling for Apache Spark applications. (Shell; updated Aug 20, 2020)
- The goal of this project is to build a Docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center, and pgAdmin. This cluster is intended solely for development use; do not run any production workloads on it. (Shell; updated Feb 27, 2023)
Created by Matei Zaharia
Released May 26, 2014
- Followers: 420
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia