Cmu 2011 09.pptx

10/11/11
©
MapR
Conﬁden0al
1

MapR,
Implica0ons
for
Integra0on

CMU
–
September
2011

10/11/11
©
MapR
Conﬁden0al
2

Outline

•  MapR
system
overview

•  Map-‐reduce
review

•  MapR
architecture

•  Performance
Results

on
MapR

•  Architectural
implica0ons

•  Search
indexing
/
deployment

•  EM
algorithm
for
machine
learning

•  …
and
more
…

10/11/11
©
MapR
Conﬁden0al
3

Map-‐Reduce

!"
!"
!#
!#
$%&'()"*" +,&)!'%-(./%0)
"*#
12'!!3)"*4 536'-3)
!'%-(./%0)
"*7
8'(&'()"930)
"*:
@/-,9)
A.0B
@/-,9)
A.0B
!"!#
Input
Output

Shuﬄe

10/11/11
©
MapR
Confiden0al
4

BoQlenecks
and
Issues

•  Read-‐only
files

•  Many
copies
in
I/O
path

•  Shuffle
based
on
HTTP

•  Can’t
use
new
technologies

•  Eats
file
descriptors

•  Spills
go
to
local
file
space

•  Bad
for
skewed
distribu0on
of
sizes

10/11/11
©
MapR
Conﬁden0al
5

MapR
Areas
of
Development

Map

Reduce

Storage

Services

Ecosystem

HBase

Management

10/11/11
©
MapR
Confiden0al
6

MapR
Improvements

•  Faster
file
system

•  Fewer
copies

•  Mul0ple
NICS

•  No
file
descriptor
or
page-‐buf
compe00on

•  Faster
map-‐reduce

•  Uses
distributed
file
system

•  Direct
RPC
to
receiver

•  Very
wide
merges

10/11/11
©
MapR
Conﬁden0al
7

MapR
Innova0ons

•  Volumes

•  Distributed
management

•  Data
placement

•  Read/write
random
access
ﬁle
system

•  Allows
distributed
meta-‐data

•  Improved
scaling

•  Enables
NFS
access

•  Applica0on-‐level
NIC
bonding

•  Transac0onally
correct
snapshots
and
mirrors

10/11/11
©
MapR
Conﬁden0al
8

MapR's
Containers

l  Each
container
contains

l  Directories
&
ﬁles

l  Data
blocks

l  Replicated
on
servers

l  No
need
to
manage

directly

Files/directories
are
sharded
into
blocks,
which

are
placed
into
mini
NNs
(containers
)
on
disks

Containers
are

16-‐32
GB
segments

of
disk,
placed
on

nodes

10/11/11
©
MapR
Conﬁden0al
9

MapR's
Containers

l  Each
container
has
a

replica0on
chain

l  Updates
are
transac0onal

l  Failures
are
handled
by

rearranging
replica0on

10/11/11
©
MapR
Conﬁden0al
10

Container
loca0ons
and
replica0on

CLDB

N1,
N2

N3,
N2

N1,
N2

N1,
N3

N3,
N2

N1

N2

N3

Container
loca0on
database

(CLDB)
keeps
track
of
nodes

hos0ng
each
container
and

replica0on
chain
order

10/11/11
©
MapR
Confiden0al
11

MapR
Scaling

Containers
represent
16
-‐
32GB
of
data

l  Each
can
hold
up
to

1
Billion
files
and
directories

l  100M
containers
=

~
2
Exabytes

(a
very
large
cluster)

250
bytes
DRAM
to
cache
a
container

l  25GB
to
cache
all
containers
for
2EB
cluster

-  But
not
necessary,
can
page
to
disk

l  Typical
large
10PB
cluster
needs
2GB

Container-‐reports
are
100x
-‐
1000x

<

HDFS
block-‐reports

l  Serve
100x
more
data-‐nodes

l  Increase
container
size
to
64G
to
serve
4EB
cluster

l  Map/reduce
not
affected

10/11/11
©
MapR
Conﬁden0al
12

MapR's
Streaming
Performance

Read Write
0
250
500
750
1000
1250
1500
1750
2000
2250
Read Write
0
250
500
750
1000
1250
1500
1750
2000
2250
Hardware
MapR
HadoopMB

per

sec

Tests:

i.

16
streams
x
120GB

ii.

2000
streams
x
1GB

11
x
7200rpm
SATA
11
x
15Krpm
SAS

Higher
is
be;er

10/11/11
©
MapR
Conﬁden0al
13

Terasort
on
MapR

1.0
TB
0
10
20
30
40
50
60
3.5
TB
0
50
100
150
200
250
300
MapR
Hadoop
Elapsed

=me

(mins)

10+1
nodes:
8
core,
24GB
DRAM,
11
x
1TB
SATA
7200
rpm

Lower
is
be;er

10/11/11
©
MapR
Conﬁden0al
14

HBase
on
MapR

Records

per

second

Higher
is
be;er

0

5000

10000

15000

20000

25000

Zipﬁan
Uniform

MapR

Apache

YCSB
Random
Read

with
1
billion
1K
records

10+1
node
cluster:
8
core,
24GB
DRAM,
11
x
1TB
7200
RPM

10/11/11
©
MapR
Confiden0al
15

#
of
files
(m)

Rate(files/sec)
Op:

-‐
create
file

-‐
write
100
bytes

-‐
close

Notes:

-‐
NN
not
replicated

-‐
NN
uses
20G
DRAM

-‐
DN
uses

2G

DRAM

Out
of
box

Tuned

Small
Files
(Apache
Hadoop,
10
nodes)

10/11/11
©
MapR
Conﬁden0al
16

MUCH
faster
for
some
opera0ons

#
of
ﬁles
(millions)

Create

Rate

Same
10
nodes
…

10/11/11
©
MapR
Conﬁden0al
17

What
MapR
is
not

•  Volumes
!=
federa0on

•  MapR
supports
>
10,000
volumes
all
with

independent
placement
and
defaults

•  Volumes
support
snapshots
and
mirroring

•  NFS
!=
FUSE

•  Checksum
and
compress
at
gateway

•  IP
fail-‐over

•  Read/write/update
seman0cs
at
full
speed

•  MapR
!=
maprfs

10/11/11
©
MapR
Conﬁden0al
18

New
Capabili0es

10/11/11
©
MapR
Conﬁden0al
19

Alterna0ve
NFS
moun0ng
models

•  Export
to
the
world

•  NFS
gateway
runs
on
selected
gateway
hosts

•  Local
server

•  NFS
gateway
runs
on
local
host

•  Enables
local
compression
and
check
summing

•  Export
to
self

•  NFS
gateway
runs
on
all
data
nodes,
mounted

from
localhost

10/11/11
©
MapR
Conﬁden0al
20

Export
to
the
world

NFS

Server

NFS

Server

NFS

Server

NFS

Server
NFS

Client

10/11/11
©
MapR
Conﬁden0al
21

Client

NFS

Server

Local
server

Applica0on

Cluster
Nodes

10/11/11
©
MapR
Conﬁden0al
22

Cluster

Node

NFS

Server

Universal
export
to
self

Task

Cluster
Nodes

10/11/11
©
MapR
Conﬁden0al
23

Cluster

Node

NFS

Server

Task

Cluster

Node

NFS

Server

Task

Cluster

Node

NFS

Server

Task

Nodes
are
iden0cal

10/11/11
©
MapR
Conﬁden0al
24

Applica0on
architecture

•  High
performance
map-‐reduce
is
nice

•  But
algorithmic
ﬂexibility
is
even
nicer

10/11/11
©
MapR
Conﬁden0al
25

Sharded
text
Indexing

Map

Reducer

Input

documents

Local

disk
Search

Engine

Local

disk

Clustered

index
storage

Assign
documents

to
shards

Index
text
to
local
disk

and
then
copy
index
to

distributed
ﬁle
store

Copy
to
local
disk

typically
required
before

index
can
be
loaded

10/11/11
©
MapR
Conﬁden0al
26

Sharded
text
indexing

•  Mapper
assigns
document
to
shard

•  Shard
is
usually
hash
of
document
id

•  Reducer
indexes
all
documents
for
a
shard

•  Indexes
created
on
local
disk

•  On
success,
copy
index
to
DFS

•  On
failure,
delete
local
ﬁles

•  Must
avoid
directory
collisions

•  can’t
use
shard
id!

•  Must
manage
and
reclaim
local
disk
space

10/11/11
©
MapR
Conﬁden0al
27

Conven0onal
data
ﬂow

Map

Reducer

Input

documents

Local

disk
Search

Engine

Local

disk

Clustered

index
storage

Failure
of
a
reducer

causes
garbage
to

accumulate
in
the

local
disk

Failure
of
search

engine
requires

another
download

of
the
index
from

clustered
storage.

10/11/11
©
MapR
Confiden0al
28

Search

Engine

Simplified
NFS
data
flows

Map

Reducer

Input

documents

Clustered

index
storage

Failure
of
a
reducer

is
cleaned
up
by

map-‐reduce

framework

Search
engine

reads
mirrored

index
directly.

Index
to
task
work

directory
via
NFS

10/11/11
©
MapR
Confiden0al
29

Simplified
NFS
data
flows

Map

Reducer

Input

documents

Search

Engine

Mirrors

Search

Engine

Mirroring
allows

exact
placement

of
index
data

Aribitrary
levels

of
replica0on

also
possible

10/11/11
©
MapR
Conﬁden0al
30

How
about
another
one?

10/11/11
©
MapR
Conﬁden0al
31

K-‐means

•  Classic
E-‐M
based
algorithm

•  Given
cluster
centroids,

•  Assign
each
data
point
to
nearest
centroid

•  Accumulate
new
centroids

•  Rinse,
lather,
repeat

10/11/11
©
MapR
Conﬁden0al
32

Aggregate

new

centroids

K-‐means,
the
movie

Assign

to

Nearest

centroid

Centroids

I

n

p

u

t

10/11/11
©
MapR
Conﬁden0al
34

Average

models

Parallel
Stochas0c
Gradient
Descent

Train

sub

model

Model

I

n

p

u

t

10/11/11
©
MapR
Conﬁden0al
35

Update

model

Varia0onal
Dirichlet
Assignment

Gather

suﬃcient

sta0s0cs

Model

I

n

p

u

t

10/11/11
©
MapR
Conﬁden0al
36

Old
tricks,
new
dogs

•  Mapper

•  Assign
point
to
cluster

•  Emit
cluster
id,
(1,
point)

•  Combiner
and
reducer

•  Sum
counts,
weighted
sum
of
points

•  Emit
cluster
id,
(n,
sum/n)

•  Output
to
HDFS

Read
from

HDFS
to
local
disk

by
distributed
cache

WriQen
by

map-‐reduce

Read
from
local
disk

from
distributed
cache

10/11/11
©
MapR
Conﬁden0al
37

Old
tricks,
new
dogs

•  Mapper

•  Assign
point
to
cluster

•  Emit
cluster
id,
(1,
point)

•  Combiner
and
reducer

•  Sum
counts,
weighted
sum
of
points

•  Emit
cluster
id,
(n,
sum/n)

•  Output
to
HDFS

MapR
FS

Read

from

NFS

WriQen
by

map-‐reduce

10/11/11
©
MapR
Conﬁden0al
38

Poor
man’s
Pregel

•  Mapper

•  Lines
in
bold
can
use
conven0onal
I/O
via
NFS

38

while not done:!
read and accumulate input models!
for each input:!
accumulate model!
write model!
synchronize!
reset input format!
emit summary!

10/11/11
©
MapR
Conﬁden0al
39

Click
modeling
architecture

Feature

extrac0on

and

down

sampling

I

n

p

u

t

Side-‐data

Data

join

Sequen0al

SGD

Learning

Map-‐reduce

Now
via
NFS

10/11/11
©
MapR
Conﬁden0al
40

Click
modeling
architecture

Map-‐reduce
Map-‐reduce

Feature

extrac0on

and

down

sampling

I

n

p

u

t

Side-‐data

Data

join

Sequen0al

SGD

Learning

Map-‐reduce

cooperates

with
NFS

Sequen0al

SGD

Learning

Sequen0al

SGD

Learning

Sequen0al

SGD

Learning

10/11/11
©
MapR
Conﬁden0al
42

??

Hybrid
model
ﬂow

Map-‐reduce

Map-‐reduce

Feature
extrac0on

and

down
sampling

SVD

(PageRank)

(spectral)

Deployed

Model

Down

stream

modeling

10/11/11
©
MapR
Conﬁden0al
44

Map-‐reduce
Sequen0al

Hybrid
model
ﬂow

Feature
extrac0on

and

down
sampling

SVD

(PageRank)

(spectral)

Deployed

Model

Down

stream

modeling

10/11/11
©
MapR
Conﬁden0al
46

Trivial
visualiza0on
interface

output
is
visible
via
NFS

•  Legacy
visualiza0on
just
works

$ R!
> x <- read.csv(“/mapr/my.cluster/home/ted/data/foo.out”)!
> plot(error ~ t, x)!
> q(save=‘n’)!

10/11/11
©
MapR
Conﬁden0al
47

Conclusions

•  We
used
to
know
all
this

•  Tab
comple0on
used
to
work

•  5
years
of
work-‐arounds
have
clouded
our

memories

•  We
just
have
to
remember
the
future

Cmu 2011 09.pptx

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (6)

Similar to Cmu 2011 09.pptx

Similar to Cmu 2011 09.pptx (20)

More from MapR Technologies

More from MapR Technologies (20)

Recently uploaded

Recently uploaded (20)

Cmu 2011 09.pptx