Skip to content

timvw/frameless-ext

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Frameless-ext

This library contains additional syntax for Frameless.

Build Status Maven Central

Usage

Import the dependency:

libraryDependencies += "be.icteam" %% "frameless_ext" % "2.0.0"

Enable the additional syntax with the following import statement:

import be.icteam.frameless.syntax._

And now you can create TypedColumns via a simple lambda on a TypeDataSet, eg:

val tds: TypedDataset[Event] = ???

// the compiler can infer the types ;)
val userColumn = tds.tc(_.user)
val dayColumn = tds.tc(_.day)

The available aggregation functions become more discoverable in your IDE as well:

val result: TypedDataset[(String, Long, Int)] = tds.
  .groupBy(e.tds(_.user))
  .agg(
    tds.tc(_.day).countDistinct,
    tds.tc(_.hour).max)

Here is the complete example:

case class Event(user: String, year: Int, month: Int, day: Int, hour: Int)

object Demo {

  import frameless._
  import frameless.syntax._
  import frameless.functions.aggregate._
  import be.icteam.frameless.syntax._
  import org.apache.log4j.{Level, LogManager}
  import org.apache.spark.sql.SparkSession

  def initSpark: SparkSession = {

    LogManager.getLogger("org").setLevel(Level.ERROR)

    SparkSession
      .builder()
      .appName("demo")
      .master("local[*]")
      .config("spark.ui.enabled", "false")
      .getOrCreate()
  }

  def main(args: Array[String]): Unit = {

    implicit val spark = initSpark

    import spark.implicits._

    val events = spark.createDataset(List(
      Event("tim", 2020, 9, 1, 7),
      Event("tim", 2020, 9, 1, 3),
      Event("tim", 2020, 9, 2, 5),
      Event("tim", 2020, 9, 2, 3),
      Event("tiebe", 2020, 9, 1, 2)
    ))

    val e = TypedDataset.create(events)

    val result: TypedDataset[(String, Long, Long, Int, Long)] = e
      .groupBy(e.tc(_.user))
      .agg(
        count[Event](),
        e.tc(_.day).countDistinct,
        e.tc(_.hour).max,
        e.tc(_.year).sum)

    val job = result.show(10, false)
    job.run()

  }
}

Development

Compile and test:

sbt +clean; +cleanFiles; +compile; +test

Install a snapshot in your local maven repository:

sbt +publishM2

Release

Set the following environment variables:

  • PGP_PASSPHRASE
  • PGP_SECRET
  • SONATYPE_USERNAME
  • SONATYPE_PASSWORD

Leveraging the ci-release plugin:

sbt ci-release

Find the most recent release:

git ls-remote --tags $REPO | \
  awk -F"/" '{print $3}' | \
  grep '^v[0-9]*\.[0-9]*\.[0-9]*' | \
  grep -v {} | \
  sort --version-sort | \
  tail -n1

Push a new tag to trigger a release via travis-ci:

v=v1.0.5
git tag -a $v  -m $v
git push origin $v

License

Code is provided under the Apache 2.0 license available at http://opensource.org/licenses/Apache-2.0, as well as in the LICENSE file. This is the same license used as Spark and Frameless.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages