From the course: Scala Essential Training for Data Science
Mapping Functions over RDDs - Scala Tutorial
- [Instructor] Let's take a look at how we can apply mapping functions over RDDs, or Resilient Distributed Datasets. So let's start the Spark REPL. I'm running it from the bin directory of the Spark package that I installed, so you just need to navigate to whatever directory you installed Spark in, go to the bin subdirectory, and then start spark-shell. Okay, I am going to work with a list of random numbers, so I'm going to import a helper, import scala.util.Random, and I'm going to create a value called big range, or bigRng for short. I'm going to call scala.util.Random, and from that object I'm going to get the shuffle method, and I'm going to specify that I want a range of one to 100000, and this'll give me those numbers in random order. Great, so now I have a collection. We'll see here it's a collection of numbers in random order, so I'm just going to hit Ctrl+L to clear the screen. Now what I want to do is convert this into an RDD. So I'll specify val bigPRng for…
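The steps described so far can be sketched roughly as follows. This is a minimal sketch, not the instructor's exact code: the transcript is cut off before the RDD is created, so everything after `bigRng` is an assumption about a typical continuation. The names `session`, `context`, `numbersRdd`, and `doubledRdd`, and the doubling function, are hypothetical and chosen only for illustration.

```scala
import scala.util.Random
import org.apache.spark.sql.SparkSession

// spark-shell already provides a session; getOrCreate() reuses it there
// and builds a local one when this is run outside the shell.
val session = SparkSession.builder()
  .master("local[*]")
  .appName("rdd-map-sketch")
  .getOrCreate()
val context = session.sparkContext

// From the transcript: shuffle the integers 1 to 100000 into random order.
// The result is an ordinary local collection, not yet an RDD.
val bigRng = Random.shuffle(1 to 100000)

// Assumed continuation: distribute the local collection as an RDD.
val numbersRdd = context.parallelize(bigRng)

// Map a function over every element. map is a lazy transformation,
// so no work is done on this line.
val doubledRdd = numbersRdd.map(_ * 2)

// Actions such as take and count trigger the actual computation.
val firstThree = doubledRdd.take(3)
val total = doubledRdd.count()
```

Inside spark-shell, the predefined `sc` value could be used in place of `context`; the explicit session is only there so the sketch also runs as a standalone script with Spark on the classpath.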