A community index of third-party packages for Apache Spark.

Showing packages 1-50 of 62 for search tags:"Data Sources"

Integration utilities for using Spark with Apache Avro data

@databricks / Latest release: 4.0.0-s_2.11 (2017-10-30) / Apache-2.0 / (13)


Redshift Data Source for Apache Spark

@databricks / Latest release: 3.0.0-preview1 (2016-11-01) / Apache-2.0 / (3)


Spark SQL CSV data source

@databricks / Latest release: 1.5.0-s_2.11 (2016-09-07) / Apache-2.0 / (10)


Connecting Apache Spark with different data stores

@Stratio / Latest release: 0.7.0-RC1 (2015-01-14) / Apache-2.0 / (20)


An external PySpark module that works like R's read.csv or pandas' read_csv, with automatic type inference and null value handling. Parses CSV data into a SchemaRDD. No installation required; simply include pyspark_csv.py via SparkContext.

@seahboonsiew / No release yet / (1)
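
The "include via SparkContext" step above can be sketched as follows. This is a minimal illustration, not the package's documented usage: it assumes `sc` is an already running `pyspark.SparkContext`, and the helper name `ship_module` is hypothetical. The only Spark call used is `SparkContext.addPyFile`, which distributes a `.py` file to the executors for tasks run afterwards.

```python
def ship_module(sc, module_path="pyspark_csv.py"):
    """Distribute a single-file Python module to the Spark executors.

    `sc` is assumed to be a live pyspark.SparkContext (not created here).
    `ship_module` is a hypothetical helper name used only for this sketch.
    """
    # addPyFile registers a .py (or .zip) dependency for tasks executed
    # on this SparkContext from now on.
    sc.addPyFile(module_path)
    return module_path
```

After this call the module should be importable on the driver and workers (for example `import pyspark_csv`); the functions it exposes are described in the package's own README.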


MongoDB data source for Spark SQL

@Stratio / Latest release: 0.12.0 (2016-08-31) / Apache-2.0 / (14)


PySpark Cassandra brings back the fun in working with Cassandra data in PySpark.

@TargetHolding / Latest release: 0.3.5 (2016-03-30) / Apache-2.0 / (1)


Connects Spark to Cassandra

@datastax / Latest release: 2.4.0-s_2.11 (2018-11-29) / Apache-2.0 / (14)


Power BI API adapter for Apache Spark

@granturing / Latest release: 1.5.0_0.0.7 (2015-09-13) / Apache-2.0 / (0)


Spark connector for SequoiaDB

@SequoiaDB / Latest release: 1.12-s_2.11 (2015-03-30) / Apache-2.0 / (2)


Spark SQL IBM Cloudant External Datasource

@cloudant / No release yet / (1)


Deprecated; please see couchbase/couchbase-spark-connector

@couchbaselabs / Latest release: 1.0.0 (2015-10-20) / Apache-2.0 / (1)


Splittable SAS (.sas7bdat) Input Format for Hadoop and Spark SQL

@saurfang / Latest release: 3.0.0-s_2.12 (2020-09-13) / Apache-2.0 / (1)


Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.

@LucidWorks / Latest release: 2.0.1 (2016-06-09) / Apache-2.0 / (1)


Spark and Spark SQL integration for Succinct

@amplab / Latest release: 0.1.8 (2019-07-10) / Apache-2.0 / (1)


PySpark support for Elasticsearch

@TargetHolding / Latest release: 0.4.2 (2016-03-22) / Apache-2.0 / (1)


Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data/Compute Engine

@ddf-project / No release yet / (11)


Official integration between Apache Spark and Elasticsearch real-time search and analytics

@elastic / Latest release: 5.3.1 (2017-04-21) / Apache-2.0 / (3)


Geo Spatial Data Analytics on Spark

@harsha2010 / Latest release: 1.0.5-s_2.11 (2017-08-14) / Apache-2.0 / (1)


An Apache Spark utility for pulling Tweets from Gnip's PowerTrack in real time

@knoldus / No release yet / (1)


Spark Salesforce Wave Connector

@springml / Latest release: 1.2.0 (2018-04-25) / Apache-2.0 / (2)


Infinispan Spark Connector

@infinispan / Latest release: 0.9 (2018-11-05) / Apache-2.0 / (0)


Read Spark SQL Parquet files as RDD[Protobuf]

@saurfang / Latest release: 0.1.2-s_2.10 (2015-08-18) / Apache-2.0 / (0)


Spark on Aliyun, supporting interactions with Aliyun's base services.

@aliyun / No release yet / (1)


Spark mainframe connector

@Syncsort / Latest release: 1.0.0 (2015-09-01) / Apache-2.0 / (0)


Docker-based, End-to-End, Real-time, Advanced Analytics Big Data Reference Pipeline using Spark, Spark SQL, Spark Streaming, ML, MLlib, GraphX, Kafka, Cassandra, Redis, Apache Zeppelin, Spark-Notebook, iPython/Jupyter Notebook, Tableau, H2O Flow, Tachyon,

@fluxcapacitor / No release yet / (3)


XML data source for Spark SQL and DataFrames

@HyukjinKwon / Latest release: 0.1.1-s_2.10 (2015-11-19) / Apache-2.0 / (1)


Popular ML Datasets for Spark ML (MNIST, IRIS, CIFAR)

@cookieai / Latest release: 0.1.0 (2015-12-22) / Apache-2.0 / (0)


Google Spreadsheets datasource for Spark SQL and DataFrames

@potix2 / Latest release: 0.6.3-s_2.11 (2019-08-21) / Apache-2.0 / (1)


Spark Package to read and write PLY, LAS and XYZ lidar point clouds using Spark SQL.

@IGNF / Latest release: 0.1.0-s_2.10 (2015-12-08) / Apache-2.0 / (0)


Spark connector for Ryft ONE

@getryft / Latest release: 0.9.0 (2017-04-04) / other license / (1)


Spark connector for SFTP

@springml / Latest release: 1.1.3 (2018-10-01) / Apache-2.0 / (2)


Data source for querying SPARQL endpoints

@USU-Research / Latest release: 1.0.0-beta1-s_2.10 (2016-01-27) / Apache-2.0 / (0)


NetFlow data source for Spark SQL and DataFrames

@sadikovi / Latest release: 2.1.0-s_2.12 (2020-12-24) / Apache-2.0 / (2)


Spark uploader for S3

@knoldus / No release yet / (1)


The Official Couchbase Spark Connector

@couchbase / Latest release: 2.2.0 (2017-09-20) / Apache-2.0 / (2)


SnappyData: OLTP + OLAP Database built on Apache Spark

@SnappyDataInc / Latest release: 1.2.0-s_2.11 (2020-02-07) / Apache-2.0 / (4)


Connects Spark to Hazelcast

@erenavsarogullari / Latest release: 1.0.0-s_2.11 (2016-03-07) / Apache-2.0 / (0)


High-performance connector to object storage for Apache Spark. Supports IBM Cloud Object Storage and OpenStack Swift.

@SparkTC / Latest release: 1.1.4 (2021-12-07) / Apache-2.0 / (1)


The official Riak Spark Connector for Apache Spark with Riak TS and Riak KV

@basho / Latest release: 1.6.3 (2017-03-17) / Apache-2.0 / (2)


Google BigQuery support for Spark, SQL, and DataFrames

@spotify / Latest release: 0.2.2-s_2.10 (2017-11-29) / Apache-2.0 / (3)


Officially supported, Apache 2 licensed Neo4j Connector for Apache Spark.

@neo4j-contrib / Latest release: 5.3.1-s_2.13 (2024-07-08) / Apache-2.0 / (1)


Spark Receiver for SQL or NoSQL Databases like Cassandra, MongoDB, Elasticsearch or JDBC

@Stratio / Latest release: 0.1.0 (2016-06-30) / Apache-2.0 / (1)


Write your RDDs and DStreams to Kafka seamlessly

@BenFradet / Latest release: 0.4.0 (2017-07-22) / Apache-2.0 / (0)


Apache Spark datasource for OrientDB

@sbcd90 / No release yet / (1)


Spark SQL datasource for GitHub PR API

@lightcopy / Latest release: 1.3.0-s_2.10 (2016-12-25) / Apache-2.0 / (0)


A Spark datasource for the HadoopCryptoLedger library

@ZuInnoTe / Latest release: 1.3.2-s_2.12 (2021-12-24) / Apache-2.0 / (1)


A Spark datasource for the HadoopOffice library

@ZuInnoTe / Latest release: 1.7.0-s_2.13 (2022-10-29) / Apache-2.0 / (1)


Generic Connector for Apache Spark

@alvsanand / Latest release: 0.2.0-spark_2x-s_2.11 (2017-01-17) / Apache-2.0 / (1)


Spark TensorFlow Connector

@tapanalyticstoolkit / Latest release: 1.0.0-s_2.11 (2017-02-21) / Apache-2.0 / (3)