From the course: Scala Essential Training for Data Science

Mapping Functions over RDDs

- [Instructor] Let's take a look at how we can apply mapping functions over RDDs, or Resilient Distributed Datasets. So let's start the Spark REPL. I'm running it from the bin directory of the Spark package that I installed, so you just need to navigate to whatever directory you installed it in, go to the bin subdirectory, and then start spark-shell. Okay, I am going to work with a list of random numbers, so I'm going to bring in a helper, import scala.util.Random, and I'm going to create a value called big range, or bigRng for short. I'm going to call scala.util.Random, and from that, I'm going to get the shuffle method, and I'm going to specify that I want a range of 1 to 100000. This'll give me the numbers from 1 to 100000 in random order. Great, so now I have a collection. We'll see here it's a collection of random numbers, so I'm just going to hit Ctrl+L to clear the screen. Now what I want to do is map this into an RDD. So I'll specify val bigPRng for…
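
The excerpt stops before the RDD is actually created, so what follows is only a rough sketch of the steps described so far, meant to be pasted into spark-shell, where the SparkContext is already available as `sc`. The names `bigRng` and `bigPRng` come from the transcript. The call to `sc.parallelize` and the doubling function passed to `map` are assumptions made for illustration and may not match what the instructor types next.

```scala
// Run inside spark-shell, which predefines the SparkContext as `sc`.
import scala.util.Random

// Shuffle the range 1 to 100000 into a random order (a local Scala collection).
val bigRng = Random.shuffle(1 to 100000)

// Assumption: turn the local collection into an RDD with parallelize.
// The transcript is cut off before showing how bigPRng is defined.
val bigPRng = sc.parallelize(bigRng)

// Assumption: an illustrative mapping function that doubles each element.
// map is a lazy transformation; nothing is computed yet.
val doubled = bigPRng.map(_ * 2)

// take is an action; it triggers the computation and returns
// the first five doubled values to the driver as an Array.
val firstFive = doubled.take(5)
```

Because `map` only records the transformation, the work happens when an action such as `take` or `count` runs, which is why the last line is needed to see any results.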
