The document provides an overview of big data concepts and architectures. It discusses key topics like Hadoop, HDFS, MapReduce, NoSQL databases, and MPP relational databases. It also covers network design considerations for big data, common traffic patterns in Hadoop, and how to optimize performance through techniques like data locality and quality of service policies.
Hoodie: How (And Why) We Built an Analytical Datastore on Spark - Vinoth Chandar
This talk explores a specific problem, ingesting petabytes of data at Uber, and why the team ended up building an analytical datastore from scratch using Spark. It then discusses the design choices and implementation approaches taken in building Hoodie to provide near-real-time data ingestion and querying using Spark and HDFS.
https://spark-summit.org/2017/events/incremental-processing-on-large-analytical-datasets/
SF Big Analytics 2020-07-28
An anecdotal history of the data lake and its various popular implementation frameworks: why certain tradeoffs were made to solve problems such as cloud storage, incremental processing, streaming and batch unification, mutable tables, and more.
Spark AI Summit, Oct 17 2019 - Kim Hammar, Jim Dowling
Hopsworks is an open-source data platform that can be used to both develop and operate horizontally scalable machine learning pipelines. A key part of our pipelines is the world’s first open-source Feature Store, based on Apache Hive, that acts as a data warehouse for features, providing a natural API between data engineers – who write feature engineering code in Spark (in Scala or Python) – and data scientists, who select features from the feature store to generate training/test data for models. In this talk, we will discuss how Databricks Delta solves several of the key challenges in building both feature engineering pipelines that feed our Feature Store and in managing the feature data itself. Firstly, we will show how expectations and schema enforcement in Databricks Delta can be used to provide data validation, ensuring that feature data does not have missing or invalid values that could negatively affect model training. Secondly, time-travel in Databricks Delta can be used to provide version management and experiment reproducibility for training/test datasets. That is, given a model, you can re-run the training experiment for that model using the same version of the data that was used to train the model. We will also discuss the next steps needed to build on this work. Finally, we will perform a live demo, showing how Delta can be used in end-to-end ML pipelines using Spark on Hopsworks.
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ... - Databricks
Uber has a real need to provide faster, fresher data to data consumers and products that run hundreds of thousands of analytical queries every day. Uber engineers will share the design, architecture and use cases of the second generation of Hudi, a self-contained Apache Spark library for building large-scale analytical datasets designed to serve such needs and beyond. Hudi (formerly Hoodie) was created to effectively manage petabytes of analytical data on distributed storage while supporting fast ingestion and queries. In this talk, we will discuss how we leveraged Spark as a general-purpose distributed execution engine to build Hudi, detailing tradeoffs and operational experience. We will also show how to ingest data into Hudi using the Spark Datasource/Streaming APIs and build notebooks/dashboards on top using Spark SQL.
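Hudi's core primitive, as described in the abstract, is the keyed upsert: a new batch of records replaces existing records that share a record key instead of being appended as duplicates. The pure-Python sketch below illustrates only the semantics; real Hudi applies this at scale on distributed storage via Spark, and the field names here are made up for the example.

```python
# Illustrative simulation of upsert semantics (last-writer-wins per record key).
# This is NOT Hudi's implementation, just the merge behavior it provides.

def upsert(table, batch, key_field="id"):
    """Merge a batch of records into a table: update rows whose key already
    exists, insert rows whose key is new."""
    merged = {row[key_field]: row for row in table}
    for row in batch:
        merged[row[key_field]] = row  # update if key exists, else insert
    return list(merged.values())

table = [{"id": 1, "fare": 10.0}, {"id": 2, "fare": 7.5}]
batch = [{"id": 2, "fare": 8.0}, {"id": 3, "fare": 12.0}]
table = upsert(table, batch)
# id 2 is updated in place, id 3 is inserted; no duplicate rows for id 2
```

Append-only ingestion would have left two rows for id 2; the upsert keeps exactly one row per key, which is what makes fast mutation of analytical tables possible.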
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splice Machine - Chicago Hadoop Users Group
John Leach, Co-Founder and CTO of Splice Machine, with 15+ years of software development and machine learning experience, will discuss how to use HBase co-processors to build an ANSI-99 SQL database with 1) parallelization of SQL execution plans, 2) ACID transactions with snapshot isolation and 3) consistent secondary indexing.
Transactions are critical in traditional RDBMSs because they ensure reliable updates across multiple rows and tables. Most operational applications require transactions, but even analytics systems use transactions to reliably update secondary indexes after a record insert or update.
In the Hadoop ecosystem, HBase is a key-value store with real-time updates, but it does not have multi-row, multi-table transactions, secondary indexes or a robust query language like SQL. Combining SQL with a full transactional model over HBase opens a whole new set of OLTP and OLAP use cases for Hadoop that were traditionally reserved for RDBMSs like MySQL or Oracle. However, a transactional HBase system has the advantage of scaling out with commodity servers, leading to 5x-10x cost savings over traditional databases like MySQL or Oracle.
HBase co-processors, introduced in release 0.92, provide a flexible and high-performance framework to extend HBase. In this talk, we show how we used HBase co-processors to support a full ANSI SQL RDBMS without modifying the core HBase source. We will discuss how endpoint transactions are used to serialize SQL execution plans over to regions so that computation is local to where the data is stored. Additionally, we will show how observer co-processors simultaneously support both transactions and secondary indexing.
The talk will also discuss how Splice Machine extended the work of Google Percolator, Yahoo Labs’ OMID, and the University of Waterloo on distributed snapshot isolation for transactions. Lastly, performance benchmarks will be provided, including full TPC-C and TPC-H results that show how Hadoop/HBase can be a replacement for traditional RDBMS solutions.
To view the accompanying slide deck: http://www.slideshare.net/ChicagoHUG/
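The snapshot-isolation model the talk builds on (after Percolator and OMID) can be sketched with a tiny multi-version key-value store: each write commits at a timestamp, and a transaction reads the newest version committed at or before its own snapshot timestamp. This is an illustrative toy, not Splice Machine's actual code; all names are made up.

```python
# Toy multi-version store demonstrating snapshot-isolation reads.

class MVCCStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value)
        self.clock = 0      # logical timestamp oracle

    def begin(self):
        """Start a transaction; it will read at this snapshot timestamp."""
        self.clock += 1
        return self.clock

    def write(self, key, value):
        """Commit a write at a fresh timestamp (auto-commit for simplicity)."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        visible = [(ts, v) for ts, v in self.versions.get(key, [])
                   if ts <= snapshot_ts]
        return max(visible)[1] if visible else None

store = MVCCStore()
store.write("balance", 100)
snap = store.begin()        # reader takes a snapshot
store.write("balance", 50)  # a concurrent writer commits after the snapshot
# the reader at `snap` still sees 100; a later snapshot sees 50
```

The key property is that a reader never sees writes committed after its snapshot, which is what lets SQL queries run consistently over HBase regions without locking out concurrent writers.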
This document provides an overview and deep dive into Robinhood's RDS Data Lake architecture for ingesting data from their RDS databases into an S3 data lake. It discusses their prior daily snapshotting approach, and how they implemented a faster change data capture pipeline using Debezium to capture database changes and ingest them incrementally into a Hudi data lake. It also covers lessons learned around change data capture setup and configuration, initial table bootstrapping, data serialization formats, and scaling the ingestion process. Future work areas discussed include orchestrating thousands of pipelines and improving downstream query performance.
The document discusses managing Hadoop, HBase and Storm clusters at Yahoo scale. It describes Yahoo's grid infrastructure which includes 3 data centers with over 45k nodes across 18 Hadoop clusters, 9 HBase clusters and 13 Storm clusters. It then provides details on the rolling upgrade processes for HDFS, YARN, HBase and Storm which involve minimizing downtime, upgrading components independently and verifying upgrades. CI/CD processes are used to automate software deployment and upgrades.
This document discusses using Sqoop to transfer data between relational databases and Hadoop. It begins by providing context on big data and Hadoop. It then introduces Sqoop as a tool for efficiently importing and exporting large amounts of structured data between databases and Hadoop. The document explains that Sqoop allows importing data from databases into HDFS for analysis and exporting summarized data back to databases. It also outlines how Sqoop works, including providing a pluggable connector mechanism and allowing scheduling of jobs.
The document discusses Rocana Search, a system built by Rocana to enable large scale real-time collection, processing, and analysis of event data. It aims to provide higher indexing throughput and better horizontal scaling than general purpose search systems like Solr. Key features include fully parallelized ingest and query, dynamic partitioning of data, and assigning partitions to nodes to maximize parallelism and locality. Initial benchmarks show Rocana Search can index over 3 times as many events per second as Solr.
This document discusses loading data from Hadoop into Oracle databases using Oracle connectors. It describes how the Oracle Loader for Hadoop and Oracle SQL Connector for HDFS can load data from HDFS into Oracle tables much faster than traditional methods like Sqoop by leveraging parallel processing in Hadoop. The connectors optimize the loading process by automatically partitioning, sorting, and formatting the data into Oracle blocks to achieve high performance loads. Measuring the CPU time needed per gigabyte loaded allows estimating how long full loads will take based on available resources.
Hadoop Infrastructure @Uber: Past, Present and Future - DataWorks Summit
Uber’s mission is to provide transportation as reliable as running water, and data plays a critical role in fulfilling that mission. At Uber, Hadoop plays a critical role in the data infrastructure. This talk covers the journey of Hadoop at Uber and future plans for scaling to billions of trips. We will talk about the most unique use cases Uber has and how the Hadoop ecosystem we built helped us on this journey. We will talk about how we scaled from 10 to 2,000 nodes, and in the future to tens of thousands of nodes. We will talk about our mistakes, learnings and wins, and how we process billions of events per day. We will talk about the unique challenges and real-world use cases, and how we will co-locate Uber’s service architecture with batch workloads (e.g. data pipelines, machine learning and analytical workloads). Uber has made a lot of improvements to the current Hadoop ecosystem and has uniquely solved some problems in ways that have not been done before. This presentation will give the audience an example to follow and encourage them to enhance the ecosystem themselves, growing the community around these projects and the big data space overall. The audience is anybody working on big data who wants to understand how to scale Hadoop and its ecosystem to tens of thousands of nodes. This talk will help them understand the Hadoop ecosystem and how to use it efficiently, and will also introduce some of the technologies the Uber team is building in the big data space.
Splice Machine is a SQL relational database management system built on Hadoop. It aims to provide the scalability, flexibility and cost-effectiveness of Hadoop with the transactional consistency, SQL support and real-time capabilities of a traditional RDBMS. Key features include ANSI SQL support, horizontal scaling on commodity hardware, distributed transactions using multi-version concurrency control, and massively parallel query processing by pushing computations down to individual HBase regions. It combines Apache Derby for SQL parsing and processing with HBase/HDFS for storage and distribution. This allows it to elastically scale out while supporting rich SQL, transactions, analytics and real-time updates on large datasets.
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0 - Adam Muise
The document discusses Hadoop 2.2.0 and new features in YARN and MapReduce. Key points include: YARN introduces a new application framework and resource management system that replaces the jobtracker, allowing multiple data processing engines besides MapReduce; MapReduce is now a library that runs on YARN; Tez is introduced as a new data processing framework to improve performance beyond MapReduce.
Unified Batch & Stream Processing with Apache Samza - DataWorks Summit
The traditional lambda architecture has been a popular solution for joining offline batch operations with real time operations. This setup incurs a lot of developer and operational overhead since it involves maintaining code that produces the same result in two, potentially different distributed systems. In order to alleviate these problems, we need a unified framework for processing and building data pipelines across batch and stream data sources.
Based on our experiences running and developing Apache Samza at LinkedIn, we have enhanced the framework to support: a) Pluggable data sources and sinks; b) A deployment model supporting different execution environments such as Yarn or VMs; c) A unified processing API for developers to work seamlessly with batch and stream data. In this talk, we will cover how these design choices in Apache Samza help tackle the overhead of lambda architecture. We will use some real production use-cases to elaborate how LinkedIn leverages Apache Samza to build unified data processing pipelines.
Speaker
Navina Ramesh, Sr. Software Engineer, LinkedIn
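The "unified processing API" idea from this abstract can be illustrated in plain Python (this is not Samza's actual API): the same processing logic runs unchanged over a bounded batch source and an unbounded stream source, instead of being maintained twice as in a lambda architecture. The sources and event names below are invented for the example.

```python
# One shared processing function, applied to both batch and stream inputs.

def count_clicks(events):
    """Shared logic: count events per user; works on any iterable source."""
    counts = {}
    for user in events:
        counts[user] = counts.get(user, 0) + 1
    return counts

batch_source = ["alice", "bob", "alice"]  # e.g. a day of data from HDFS

def stream_source():                       # e.g. a window of a Kafka topic
    yield from ["bob", "alice"]

# One code path instead of two divergent lambda-architecture implementations
batch_counts = count_clicks(batch_source)
stream_counts = count_clicks(stream_source())
```

Because the logic only depends on an iterable of events, the choice of source (bounded file vs. unbounded stream) becomes a deployment detail rather than a second codebase, which is the overhead the lambda architecture forces you to carry.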
This document summarizes Syncsort's high performance data integration solutions for Hadoop contexts. Syncsort has over 40 years of experience innovating performance solutions. Their DMExpress product provides high-speed connectivity to Hadoop and accelerates ETL workflows. It uses partitioning and parallelization to load data into HDFS 6x faster than native methods. DMExpress also enhances usability with a graphical interface and accelerates MapReduce jobs by replacing sort functions. Customers report TCO reductions of 50-75% and ROI within 12 months by using DMExpress to optimize their Hadoop deployments.
Hadoop 2 introduces the YARN framework to provide a common platform for multiple data processing paradigms beyond just MapReduce. YARN splits cluster resource management from application execution, allowing different applications like MapReduce, Spark, Storm and others to run on the same Hadoop cluster. HDFS 2 improves HDFS with features like high availability, federation and snapshots. Apache Tez provides a new data processing engine that enables pipelining of jobs to improve performance over traditional MapReduce.
Optimizing Delta/Parquet Data Lakes for Apache Spark - Databricks
Matthew Powers gave a talk on optimizing data lakes for Apache Spark. He discussed community goals like standardizing method signatures. He advocated for using Spark helper libraries like spark-daria and spark-fast-tests. Powers explained how to build better data lakes using techniques like partitioning data on relevant fields to skip data and speed up queries significantly. He also covered modern Scala libraries, incremental updates, compacting small files, and using Delta Lakes to more easily update partitioned data lakes over time.
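The partitioning technique mentioned above (laying out a data lake as one directory per partition value so queries can skip irrelevant data) can be sketched as follows. The paths and field names are made up for illustration; real engines like Spark perform this pruning from table metadata.

```python
# Illustrative partition pruning over Hive-style partitioned paths.

files = [
    "lake/country=US/part-0.parquet",
    "lake/country=US/part-1.parquet",
    "lake/country=CA/part-0.parquet",
    "lake/country=BR/part-0.parquet",
]

def prune(files, field, value):
    """Keep only files whose partition directory matches the predicate."""
    token = f"{field}={value}"
    return [f for f in files if token in f.split("/")]

# A query like WHERE country = 'CA' now touches one file instead of four.
ca_files = prune(files, "country", "CA")
```

The speed-up comes entirely from the layout: because the partition value is encoded in the directory name, the engine can discard whole directories without opening a single file.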
This document discusses optimizing a data warehouse by using Hadoop to handle large and changing datasets more efficiently. It outlines challenges with traditional data warehousing as data volumes grow. Requirements for an optimized solution include unlimited scalability, handling all data types, and supporting agile methodologies. The document then describes a process flow for offloading ELT and loading to Hadoop. It provides an example use case of updating large datasets on Hadoop more efficiently using partitioning and temporary tables to minimize impact. A demo is referenced to illustrate the approach.
Introduction to Hadoop Ecosystem was presented to Lansing Java User Group on 2/17/2015 by Vijay Mandava and Lan Jiang. The demo was built on top of HDP 2.2 and AWS cloud.
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around Comes Around - Reynold Xin
Introduction to MapReduce, GFS, HDFS, Spark, and differences between "Big Data" and database systems.
The document discusses big data and distributed computing. It provides examples of the large amounts of data generated daily by organizations like the New York Stock Exchange and Facebook. It explains how distributed computing frameworks like Hadoop use multiple computers connected via a network to process large datasets in parallel. Hadoop's MapReduce programming model and HDFS distributed file system allow users to write distributed applications that process petabytes of data across commodity hardware clusters.
Apache Hadoop, HDFS and MapReduce Overview - Nisanth Simon
This document provides an overview of Apache Hadoop, HDFS, and MapReduce. It describes how Hadoop uses a distributed file system (HDFS) to store large amounts of data across commodity hardware. It also explains how MapReduce allows distributed processing of that data by allocating map and reduce tasks across nodes. Key components discussed include the HDFS architecture with NameNodes and DataNodes, data replication for fault tolerance, and how the MapReduce engine works with a JobTracker and TaskTrackers to parallelize jobs.
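The map and reduce task allocation described above follows the canonical word-count flow: mappers emit key-value pairs, a shuffle groups them by key, and reducers aggregate each group. This minimal in-memory sketch shows only the data movement; real Hadoop distributes each phase across TaskTrackers.

```python
# In-memory simulation of the map -> shuffle -> reduce flow (word count).
from collections import defaultdict

def map_phase(docs):
    """Mapper: emit (word, 1) for every word in every input split."""
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data on hadoop", "hadoop stores big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts["hadoop"] == 2, counts["big"] == 2
```

In a real cluster the JobTracker schedules many mappers and reducers in parallel and the shuffle moves data over the network, but the per-record contract is exactly this simple.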
This presentation will give you information about:
1. HDFS Overview and Architecture
2. Configuring HDFS
3. Interacting With HDFS
4. HDFS Permissions and Security
5. Additional HDFS Tasks
6. HDFS Installation
7. Hadoop File System Shell
8. File System Java API
This document discusses distributed computing and Hadoop. It begins by explaining distributed computing and how it divides programs across several computers. It then introduces Hadoop, an open-source Java framework for distributed processing of large data sets across clusters of computers. Key aspects of Hadoop include its scalable distributed file system (HDFS), MapReduce programming model, and ability to reliably process petabytes of data on thousands of nodes. Common use cases and challenges of using Hadoop are also outlined.
This presentation provides information about Hadoop: what Hadoop is, how Hadoop overcomes the disadvantages of distributed systems, and an example MapReduce program.
The document describes the Hadoop ecosystem and its core components. It discusses HDFS, which stores large files across clusters and is made up of a NameNode and DataNodes. It also discusses MapReduce, which allows distributed processing of large datasets using a map and reduce function. Other components discussed include Hive, Pig, Impala, and Sqoop.
Hadoop is a Java software framework that supports data-intensive distributed applications and is developed under an open source license. It enables applications to work with thousands of nodes and petabytes of data.
This document provides an introduction to big data and related technologies. It defines big data as datasets that are too large to be processed by traditional methods. The motivation for big data is the massive growth in data volume and variety. Technologies like Hadoop and Spark were developed to process this data across clusters of commodity servers. Hadoop uses HDFS for storage and MapReduce for processing. Spark improves on MapReduce with its use of resilient distributed datasets (RDDs) and lazy evaluation. The document outlines several big data use cases and projects involving areas like radio astronomy, particle physics, and engine sensor data. It also discusses when Hadoop and Spark are suitable technologies.
Big data refers to large volumes of unstructured or semi-structured data that is difficult to process using traditional databases and analysis tools. The amount of data generated daily is growing exponentially due to factors like increased internet usage and data collection by organizations. Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. It uses HDFS for reliable storage and MapReduce as a programming model to process data in parallel across nodes.
EclipseCon Keynote: Apache Hadoop - An Introduction - Cloudera, Inc.
Todd Lipcon explains why you should be interested in Apache Hadoop, what it is, and how it works. Todd also brings to light the Hadoop ecosystem and real business use cases that evolve around Hadoop and the ecosystem.
Improving Apache Spark by Taking Advantage of Disaggregated Architecture - Databricks
Shuffle in Apache Spark is an intermediate phase that redistributes data across computing units, with one important primitive: shuffle data is persisted on local disks. This architecture suffers from some scalability and reliability issues. Moreover, the assumption of collocated storage does not always hold in today’s data centers. The hardware trend is moving to a disaggregated storage and compute architecture for better cost efficiency and scalability.
To address the issues of Spark shuffle and support disaggregated storage and compute architecture, we implemented a new remote Spark shuffle manager. This new architecture writes shuffle data to a remote cluster with different Hadoop-compatible filesystem backends.
Firstly, the failure of compute nodes will no longer cause shuffle data recomputation. Spark executors can also be allocated and recycled dynamically which results in better resource utilization.
Secondly, for most customers currently running Spark with collocated storage, it is usually challenging to upgrade the disks on every node to the latest hardware, like NVMe SSDs and persistent memory, because of cost considerations and system compatibility. With this new shuffle manager, they are free to build a separate cluster for storing and serving the shuffle data, leveraging the latest hardware to improve performance and reliability.
Thirdly, in the HPC world, more customers are trying Spark as their high-performance data analytics tool, while storage and compute in HPC clusters are typically disaggregated. This work will make their lives easier.
In this talk, we will present an overview of the issues of the current Spark shuffle implementation, the design of new remote shuffle manager, and a performance study of the work.
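The remote shuffle idea described above can be sketched as follows: instead of each executor writing shuffle blocks to its local disk, map tasks hash-partition their output and write each partition to a shared remote store keyed by (map_id, reduce_id), so reducers can fetch blocks even after the mapper's node is gone. The dict below stands in for a Hadoop-compatible filesystem; this is a toy model, not the talk's actual shuffle manager.

```python
# Toy model of a remote shuffle: map side writes partitions to a shared
# store; reduce side fetches its partition from every map output.

def shuffle_write(remote_store, map_id, records, num_reducers):
    """Map side: hash-partition records, write each partition remotely."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in records:
        partitions[hash(key) % num_reducers].append((key, value))
    for reduce_id, part in enumerate(partitions):
        remote_store[(map_id, reduce_id)] = part

def shuffle_read(remote_store, reduce_id, num_maps):
    """Reduce side: gather this reducer's partition from all map outputs."""
    out = []
    for map_id in range(num_maps):
        out.extend(remote_store.get((map_id, reduce_id), []))
    return out

store = {}  # stands in for a remote Hadoop-compatible filesystem
shuffle_write(store, 0, [("a", 1), ("b", 2)], num_reducers=2)
shuffle_write(store, 1, [("a", 3)], num_reducers=2)
# all records for a given key land with exactly one reducer
```

Because the blocks live in the remote store rather than on the mapper's disk, losing a compute node no longer forces shuffle recomputation, which is the first benefit the abstract lists.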
Storage and computation are getting cheaper and easily accessible on demand in the cloud. We now collect and store some really large data sets, e.g. user activity logs, genome sequencing, sensory data, etc. Hadoop and the ecosystem of projects built around it present simple and easy-to-use tools for storing and analyzing such large data collections on commodity hardware.
Topics Covered
* The Hadoop architecture.
* Thinking in MapReduce.
* Run some sample MapReduce Jobs (using Hadoop Streaming).
* Introduce Pig Latin, an easy-to-use data processing language.
Speaker Profile: Mahesh Reddy is an entrepreneur, chasing dreams. He works on large-scale crawl and extraction of structured data from the web. He is a graduate from IIT Kanpur (2000-05) and previously worked at Yahoo! Labs as a Research Engineer/Tech Lead on search and advertising products.
Big Data - Introduction (What is Big Data) - AmanCSE050
Contents
Big Data Characteristics
Explosion in Quantity of Data
Importance of Big Data
Usage Example in Big Data
Challenges in Big Data
Hadoop Ecosystem
The document discusses Hadoop, its components, and how they work together. It covers HDFS, which stores and manages large files across commodity servers; MapReduce, which processes large datasets in parallel; and other tools like Pig and Hive that provide interfaces for Hadoop. Key points are that Hadoop is designed for large datasets and hardware failures, HDFS replicates data for reliability, and MapReduce moves computation instead of data for efficiency.
Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of commodity hardware. It uses a simple programming model called MapReduce that automatically parallelizes and distributes work across nodes. Hadoop consists of Hadoop Distributed File System (HDFS) for storage and MapReduce execution engine for processing. HDFS stores data as blocks replicated across nodes for fault tolerance. MapReduce jobs are split into map and reduce tasks that process key-value pairs in parallel. Hadoop is well-suited for large-scale data analytics as it scales to petabytes of data and thousands of machines with commodity hardware.
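The block storage model described above (a file split into fixed-size blocks, each replicated across DataNodes for fault tolerance) can be sketched numerically. Sizes here are scaled down from the real defaults (128 MB blocks, replication factor 3), and the round-robin placement is a stand-in for HDFS's rack-aware placement policy.

```python
# Illustrative model of HDFS block splitting and replica placement.

def place_blocks(file_size, block_size, datanodes, replication=3):
    """Return {block_index: [datanodes holding a replica]}."""
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for b in range(num_blocks):
        # round-robin stand-in for the real rack-aware placement policy
        placement[b] = [datanodes[(b + r) % len(datanodes)]
                        for r in range(replication)]
    return placement

nodes = ["dn1", "dn2", "dn3", "dn4"]
layout = place_blocks(file_size=350, block_size=128, datanodes=nodes)
# ceil(350 / 128) -> 3 blocks, each with 3 replicas on distinct DataNodes
```

Losing any single DataNode leaves at least two replicas of every block, which is why HDFS tolerates the hardware failures the summary mentions without losing data.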
The document discusses big data and distributed computing. It explains that big data refers to large, unstructured datasets that are too large for traditional databases. Distributed computing uses multiple computers connected via a network to process large datasets in parallel. Hadoop is an open-source framework for distributed computing that uses MapReduce and HDFS for parallel processing and storage across clusters. HDFS stores data redundantly across nodes for fault tolerance.
Hadoop is a framework for distributed storage and processing of large datasets across clusters of computers. It addresses problems like hardware failure and combining data after analysis. The core components are HDFS for distributed storage and MapReduce for distributed processing. HDFS stores data as blocks across nodes and handles replication for reliability. The Namenode manages the file system namespace and metadata, while Datanodes store and retrieve blocks. Hadoop supports reliable analysis of large datasets in a distributed manner through its scalable architecture.
Similar to Cisco Connect Toronto 2015 Big Data - Sean McKeown
Cisco Connect Montreal 2018 - Network Slicing: Horizontal Virtualization - Cisco Canada
The document discusses network slicing, which is the next step in virtualization for 4G/5G mobile networks. Network slicing allows the core network to be partitioned into multiple logical networks or "slices", each with its own network functions to support the requirements of different services. This approach enables network resources and functions to be allocated to specific services or customer segments in a flexible manner. It reduces complexity compared to existing networks that must support many different services and customers on a single common infrastructure. The key benefits of network slicing include improved network agility and the ability to support diverse service requirements.
The document summarizes a Cisco presentation on next-generation datacenter security. It discusses how the majority of security teams' time is spent securing servers and data in the datacenter. It then covers challenges such as budget constraints, product overload, and complexity of threats. The presentation introduces Cisco's architectural approach to datacenter security focusing on threat prevention, visibility, segmentation, threat intelligence, automation, and analytics. It provides examples of Cisco solutions that integrate to deliver firewall, access control, analytics, and other capabilities.
Cisco Connect Montreal 2018 - Global Vision, Local Analysis - Cisco Canada
The document discusses Cisco's multi-cloud strategy and products. It introduces Cisco Container Platform (CCP) as a solution that automates deploying, running, and operating containers on physical or virtual machines. CCP is based on Kubernetes and provides integrated networking, management, security and analytics capabilities while allowing containers to run in hybrid cloud environments across VM, bare metal, Cisco HyperFlex, ACI and public clouds.
Cisco Connect Montreal 2018 - Security: Securing Your Mobility with Cisco - Cisco Canada
The document discusses Cisco's solutions for securing mobility, including Meraki SM, Cisco AMP for Endpoint, Cisco Umbrella, Cisco Cloudlock, Cisco Cloud Email Security, Cisco Threat Response, Identity Service Engine, and Cisco DUO Security. Representatives from Cisco provide overviews of each solution for securing users, data, and applications across SaaS, PaaS, and IaaS environments.
Cisco Connect Montreal 2018 - Collaboration: Hybrid Webex Services - Cisco Canada
Cisco Connect Montreal provided information on Cisco's Webex Hybrid Services which allow for integration between on-premises and cloud collaboration solutions. The key services discussed included Hybrid Directory Service for user synchronization, Hybrid Calendar Service for calendaring integration, Hybrid Call Service for calling capabilities, Hybrid Message Service for messaging interoperability, and the new Cisco Webex Edge service for enhanced audio, video mesh, and media experiences.
Cisco and Microsoft Integration - Cisco Connect Montreal 2018 - Cisco Canada
The document discusses Cisco and Microsoft integrations for collaboration. It describes major areas of integration including calling, messaging, meetings, email/calendar, content management, and instant messaging. It provides details on Cisco and Microsoft integrations for meetings, with examples of joining internal and external participants. The document also discusses Cisco Spark and Webex capabilities for open collaboration across organizations and platforms.
Cisco Connect Montreal 2018 - Model-Driven Programmability for IOS XR - Cisco Canada
This document summarizes a presentation on model-driven programmability for Cisco IOS XR. The presentation covers data models, management protocols like NETCONF and gRPC, the YANG Development Kit (YDK) SDK, and telemetry. It defines key concepts like model-driven manageability, native and open data models, protocol operations, and the benefits of the YDK for simplifying application development through model-driven abstractions. Example code demonstrates basic YDK usage and a potential peering configuration use case is outlined. Resources for further information are also provided.
Cisco Connect Montreal 2018 - SD-WAN: Delivering Intent-Based Networking to th... - Cisco Canada
The document discusses Cisco SD-WAN and its advantages over traditional and legacy WAN architectures. It highlights how Cisco SD-WAN uses a centralized control plane and software-defined intelligence to provide automated, predictive, and intent-based networking. This allows for flexible, scalable, and secure connectivity across hybrid WAN transports in a way that is simpler to manage and operate than hardware-centric WAN solutions.
Cisco Connect Toronto 2018 - DNA Automation: The Evolution to Intent-Based Net... - Cisco Canada
The document discusses Cisco's DNA Center and its capabilities for automating network management. It covers:
- Why intent-based networking is needed to reduce costs and errors from manual network changes
- How DNA Center supports intent-based networking by allowing administrators to define policies and have them automatically implemented across the network
- Key automation use cases DNA Center addresses like onboarding new devices, managing software upgrades, creating configuration templates, and deploying wireless networks
- Demonstrations of DNA Center's capabilities for plug-and-play deployment, software management, template configuration, and wireless provisioning
Cisco Connect Toronto 2018 - An Introduction to Cisco Kinetic - Cisco Canada
Robert Barton from Cisco presented on Cisco Kinetic, an IoT analytics platform. Cisco Kinetic consists of three modules: the Gateway Management Module for onboarding and managing IoT gateways at scale, the Edge and Fog Processing Module for analyzing IoT data in real-time at the edge, and the Data Control Module for securely routing IoT data between edge, fog, and cloud according to data policies. Cisco Kinetic aims to enable end-to-end IoT analytics across the entire network from device to cloud.
Cisco Connect Toronto 2018 - DevNet Overview - Cisco Canada
Hank Preston, a Cisco engineer, gave a presentation on DevNet and how it is helping developers. He discussed how DevNet has grown significantly, now with over 100,000 members and 500,000 learning labs completed. DevNet provides resources like APIs, sandboxes, and training to help developers build applications and automate networks. Preston emphasized that networks are becoming more programmable and automated through DevNet tools and platforms.
Cisco Connect Toronto 2018 - DNA Assurance - Cisco Canada
The document discusses Cisco's DNA Assurance solution. It provides an agenda that covers business requirements, context, learning, user requirements, technology requirements, and the various components of DNA Assurance including client assurance, network assurance, application assurance, and machine learning. It discusses challenges around network operations including time spent troubleshooting and replicating issues. It also covers how DNA Assurance uses concepts like context, learning, and design thinking to provide insights and automate remediation.
Cisco Connect Toronto 2018 - Network Slicing - Cisco Canada
The document discusses network slicing, which is the partitioning of network resources and functions to run selected applications, services, or connections in isolation from each other for specific business purposes. This allows mobile operators to offer virtual private networks on a common infrastructure through network slicing on an end-to-end basis across access, transport, and core networks. Slicing enables new revenue opportunities through network slices optimized for different vertical industries while simplifying service delivery and management.
Cisco Connect Toronto 2018 the intelligent network with cisco merakiCisco Canada
The document discusses Cisco Meraki's intelligent network and SD-WAN capabilities. It highlights that Meraki has over 14,000 customers using its SD-WAN, it has a renewal rate over 95%, and its newest product is WAN assurance. The presentation provides an overview of Meraki's cloud-managed solutions for wireless, switching, security, and other IT functions. It demonstrates Meraki's network monitoring and troubleshooting tools through examples and a demo of its capabilities.
Cisco Connect Toronto 2018 sixty to zeroCisco Canada
The document discusses automating security tasks through various solutions from Cisco. It introduces the Cisco Advanced Malware Protection (AMP) solution, which uses machine learning to detect known and unknown malware across endpoints, networks, and email. It also introduces Cisco Cognitive Threat Analytics, which analyzes web traffic using machine learning to detect anomalous and malicious activity inside organizations. The document provides examples of how these solutions can automate tasks like hunting for threats, detecting anomalies, and attributing suspicious activity to specific entities. It includes demos of the AMP and Cognitive Intelligence user interfaces.
1. Big Data Architecture and Deployment
Sean McKeown – Technical Solutions Architect
In partnership with:
2. Housekeeping Notes
Thank you for attending Cisco Connect Toronto 2015. A few housekeeping notes to ensure we all enjoy the session today:
§ Please ensure your cellphones/laptops are set to silent so no one is disturbed during the session
§ A power bar is available under each desk in case you need to charge your laptop (Labs only)
3. Agenda
§ Big Data Concepts and Overview
– Enterprise data management and big data
– Infrastructure attributes
– Hadoop, NoSQL and MPP architecture concepts
§ Hadoop and the Network
– Network behaviour, FAQs
§ Cisco UCS for Big Data
– Building a big data cluster with the UCS Common Platform Architecture (CPA)
– UCS networking, management, and scaling for big data
§ Q & A
5. “More data usually beats better algorithms.”
-Anand Rajaraman, SVP @WalmartLabs
6. The Explosion of Unstructured Data
• 1.8 trillion gigabytes of data was created in 2011…
• More than 90% is unstructured data
• Approx. 500 quadrillion files
• Quantity doubles every 2 years
• Most unstructured data is neither stored nor analysed!
[Chart: GB of data (in billions), 2005–2015, structured vs. unstructured data]
Source: Cloudera
7. What is Big Data?
When the size of the data itself is part of the problem.
8. What isn’t Big Data?
• Usually not blade servers (not enough local storage)
• Usually not virtualised (hypervisor only adds overhead)
• Usually not highly oversubscribed (significant east-west traffic)
• Usually not SAN/NAS
9. Classic NAS/SAN vs. New Scale-out DAS
• Traditional – separate compute from storage (bottlenecks, $$$)
• New – move the compute to the storage: a low-cost, DAS-based, scale-out clustered filesystem
11. Three Common Big Data Architectures
• NoSQL – fast key-value store/retrieve in real time
• Hadoop – distributed batch, query, and processing platform
• MPP Relational Database – scale-out BI/DW
12. Hadoop: A Closer Look
§ Hadoop is a distributed, fault-tolerant framework for storing and analysing data
§ Its two primary components are the Hadoop Distributed File System (HDFS) and the MapReduce application engine
[Diagram: Pig, Hive, and HBase running on MapReduce atop the Hadoop Distributed File System (HDFS)]
13. Hadoop: A Closer Look
§ Hadoop 2.0 (with YARN) adds the ability to run additional distributed application engines concurrently on the same underlying filesystem
[Diagram: MapReduce (with Pig and Hive), HBase, Impala, and Spark running on YARN (Resource Negotiator) atop HDFS]
14. Hadoop MapReduce Example: Word Count
Input → Map → Shuffle & Sort → Reduce → Output (analogous to: cat | grep | sort | uniq)
• Input splits: “the quick brown fox”, “the fox ate the mouse”, “how now brown cow”
• Map: each mapper emits (word, 1) pairs, e.g. (the, 1), (quick, 1), (brown, 1), (fox, 1)
• Shuffle & Sort: pairs are grouped by key and routed to reducers
• Reduce: each reducer sums the counts for its keys
• Output: ate, 1; brown, 2; cow, 1; fox, 2; how, 1; mouse, 1; now, 1; quick, 1; the, 3
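The Map → Shuffle & Sort → Reduce pipeline above can be sketched in a few lines of plain Python (a toy illustration of the data flow, not Hadoop's actual API):

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle & sort: group all values by key, as the framework does
    between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

inputs = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
counts = reduce_phase(shuffle(map_phase(inputs)))
# reproduces the slide's output, e.g. counts["the"] == 3
```

Each stage maps onto one step of the Unix analogy: map ≈ cat/grep, shuffle ≈ sort, reduce ≈ uniq -c.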
15. Hadoop Distributed File System
• Scalable & fault tolerant
• Filesystem is distributed, stored across all data nodes in the cluster
• Files are divided into multiple large blocks – 64MB default, typically 128MB – 512MB
• Data is stored reliably; each block is replicated 3 times by default
• Types of nodes:
– Name Node – manages HDFS
– Job Tracker – manages MapReduce jobs
– Data Node/Task Tracker – stores blocks/does work
[Diagram: a file split into blocks 1–6, replicated across data nodes 1–13 in three racks, each rack behind a ToR FEX/switch, alongside a dedicated Name Node and Job Tracker]
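The block-and-replica bookkeeping described above can be illustrated with a toy model (a sketch only; the real Name Node's placement policy is rack-aware and considers free space, and all names and sizes here are illustrative):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, a common HDFS block size
REPLICATION = 3                 # HDFS default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Number of blocks a file of `file_size` bytes occupies (ceiling division)."""
    return (file_size + block_size - 1) // block_size

def place_replicas(num_blocks, data_nodes, replication=REPLICATION):
    """Toy placement: spread each block's replicas round-robin over the nodes.
    (The real Name Node also accounts for rack topology and disk usage.)"""
    placement = {}
    for b in range(num_blocks):
        placement[b] = [data_nodes[(b + r) % len(data_nodes)]
                        for r in range(replication)]
    return placement

nodes = [f"datanode{i}" for i in range(1, 6)]
blocks = split_into_blocks(700 * 1024 * 1024)  # a 700 MB file -> 6 blocks
layout = place_replicas(blocks, nodes)
```

The point of the sketch is the arithmetic: a file is ceil(size / block_size) blocks, and every block exists on `replication` distinct nodes.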
17. “Failure is the defining difference between distributed and local programming”
- Ken Arnold, CORBA designer
18. HDFS Architecture
• The Name Node holds the filesystem namespace and file-to-block mappings, e.g.:
/usr/sean/foo.txt: blk_1, blk_2
/usr/jacob/bar.txt: blk_3, blk_4
• It also tracks which data nodes hold each block, e.g.:
Data node 1: blk_1
Data node 2: blk_2, blk_3
Data node 3: blk_3
[Diagram: blocks 1–4 replicated across data nodes 1–15 in three racks, each rack behind a ToR FEX/switch, all reachable from the Name Node via a switch]
23. Hadoop Network Design
• The network is the fabric – the ‘bus’ – of the ‘supercomputer’
• Big data clusters often create high east-west, any-to-any traffic flows compared to traditional DC networks
• Hadoop networks are typically isolated/dedicated; simple leaf-spine designs are ideal
• 10GE typical from server to ToR, low oversubscription from ToR to spine
• With Hadoop 2.0, clusters will likely have heterogeneous, multi-workload behaviour
24. Hadoop Network Traffic Types
• Small flows/messaging (admin related, heart-beats, keep-alive, delay-sensitive application messaging)
• Small – medium incast (Hadoop shuffle)
• Large flows (HDFS egress)
• Large pipeline (Hadoop replication)
26. Typical Hadoop Job Patterns
Different workloads can have widely varying network impact. The ratios below compare data read in the Map phase to data written in the Reduce phase, with the Shuffle crossing the network in between:
• Analyse (1:0.25)
• Transform (1:1)
• Explode (1:1.2)
27. Analyse Workload
Wordcount on 200K copies of the complete works of Shakespeare
[Graph: all traffic received on a single node (80 node run). The red line is the total traffic received by hpc064; the other symbols represent individual nodes sending traffic to hpc064. Markers show Maps Start, Reducers Start, Maps Finish, and Job Complete.]
Note: Due to the combination of the length of the Map phase and the reduced data set being shuffled, the network is utilised throughout the job, but by a limited amount.
28. Transform Workload (1TB Terasort)
[Graph: all traffic received on a single node (80 node run). The red line is the total traffic received by hpc064; the other symbols represent individual nodes sending traffic to hpc064. Markers show Maps Start, Reducers Start, Maps Finish, and Job Complete.]
29. Transform Workload (With Output Replication)
§ Replication of 3 enabled (1 copy stored locally, 2 stored remotely)
§ Each reduce output is now replicated, instead of just stored locally
[Graph: all traffic received on a single node (80 node run)]
Note: If output replication is enabled, then at the end of the job HDFS must store additional copies. For a 1TB sort, an additional 2TB will need to be replicated across the network.
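The note's arithmetic generalises: with one replica written locally, every extra replica of the job's output crosses the network. A small sketch (function name is illustrative):

```python
def remote_replication_bytes(output_bytes, replication=3):
    """Bytes that must cross the network when job output is replicated.
    One copy is written locally; the remaining replicas traverse the network."""
    return output_bytes * (replication - 1)

TB = 10 ** 12
extra = remote_replication_bytes(1 * TB, replication=3)
# 1 TB of reduce output with replication 3 -> 2 TB of network traffic
```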
30. Job Patterns – Summary
Job patterns have varying impact on network utilisation:
• Analyse – simulated with Shakespeare wordcount
• Extract Transform Load (ETL) – simulated with Yahoo TeraSort
• Extract Transform Load (ETL) with output replication – simulated with Yahoo TeraSort with output replication
31. Data Locality in Hadoop
The ability to process data where it is locally stored.
[Graph: traffic received by a single node over the life of a job, with markers for Maps Start, Reducers Start, Maps Finish, and Job Complete, and an initial spike in RX traffic during the Map phase]
Observations:
§ Notice the initial spike in RX traffic occurs before the Reducers kick in
§ It represents data that a map task needs but that is not local – sometimes a task is scheduled on a node that does not have the data available locally
§ Looking at the spike, it is mainly data from only a few nodes
33. Can Hadoop Really Use 10GE?
Definitely, so tune for it!
• Analytic workloads tend to be lighter on the network
• Transform workloads tend to be heavier on the network
• Hadoop has numerous parameters which affect the network. Take advantage of 10GE:
– mapred.reduce.slowstart.completed.maps
– dfs.balance.bandwidthPerSec
– mapred.reduce.parallel.copies
– mapred.reduce.tasks
– mapred.tasktracker.reduce.tasks.maximum
– mapred.compress.map.output
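As a hedged illustration, several of the parameters above live in mapred-site.xml. The values below are illustrative starting points only, not recommendations – the right settings depend on cluster size and workload:

```xml
<!-- mapred-site.xml fragment: illustrative values only; tune per cluster -->
<configuration>
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.80</value> <!-- hold reducers back until 80% of maps finish -->
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>20</value>   <!-- more parallel shuffle fetches to fill a 10GE pipe -->
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value> <!-- compress map output to cut shuffle bytes -->
  </property>
</configuration>
```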
34. Can QoS Help?
An example with HBase
[Diagram: MapReduce traffic (Map 1–N shuffling to Reducer 1–N, plus HDFS output replication) sharing the network with HBase traffic (clients issuing reads/writes and updates to Region Servers, plus major compaction between Region Servers and HDFS)]
36. ACI Fabric Load Balancing
Flowlet Switching
• Flowlet switching routes bursts of packets from the same flow independently, based on measured congestion of both external wires and internal ASICs
• Allows packets from the same flow to take different paths
• Maintains packet ordering
• Better path utilisation
• Transparent – nothing to modify at the host/app level
[Diagram: a TCP flow from H1 to H2 split into flowlets taking different paths through the fabric]
37. ACI Fabric Load Balancing
Dynamic Packet Prioritisation
Real traffic is a mix of large (elephant) and small (mice) flows.
• Standard (single priority): large flows severely impact performance (latency & loss) for small flows
• Dynamic Flow Prioritisation: the fabric automatically gives a higher priority to small flows
Key idea: the fabric detects the initial few flowlets of each flow and assigns them to a high-priority class.
38. Dynamic Packet Prioritisation
Helping heterogeneous workloads
§ 80-node test cluster
§ MemSQL used to generate heavy numbers of small flows – mice
§ Large file copy workload unleashes elephant flows that trample MemSQL performance
§ DPP enabled, helping to “protect” the mice from the elephants
[Chart: read queries/sec (in millions) for MemSQL only vs. MemSQL + Hadoop on a traditional network vs. MemSQL + Hadoop + Dynamic Packet Prioritization – a 2x improvement in reads per second with DPP]
39. Network Summary
• The network is the “system bus” of the Hadoop “supercomputer”
• Analytic- and ETL-style workloads can behave very differently on the network
• Minimise oversubscription, leverage QoS and DPP, and tune Hadoop to take advantage of 10GE
41. “Life is unfair, and the unfairness is distributed unfairly.”
-Russian proverb
42. Hadoop Server Hardware Evolving
Typical 2009 Hadoop node:
• 1RU server
• 4 x 1TB 3.5” spindles
• 2 x 4-core CPU
• 1 x GE
• 24 GB RAM
• Single PSU
• Running Apache
• $
Typical 2015 Hadoop node:
• 2RU server
• 12 x 4TB 3.5” or 24 x 1TB 2.5” spindles
• 2 x 6-12 core CPU
• 2 x 10GE
• 128-256 GB RAM
• Dual PSU
• Running commercial/licensed distribution
• $$$
Economics favour “fat” nodes:
• 6x-9x more data/node
• 3x-6x more IOPS/node
• Saturated gigabit, 10GE on the rise
• Fewer total nodes lowers licensing/support costs
• Increased significance of node and switch failure
43. Cisco UCS Common Platform Architecture
Building blocks for big data:
• UCS 6200 Series Fabric Interconnects
• Nexus 2232 Fabric Extenders (optional)
• UCS Manager
• UCS C220/C240 M4 Servers
• LAN, SAN, Management
44. UCS Reference Configurations for Big Data
Quarter-Rack UCS Solution for MPP, NoSQL – High Performance:
• 2 x UCS 6248
• 8 x C220 M4 (SFF)
• 2 x E5-2680v3
• 256GB
• 6 x 400-GB SAS SSD
Full Rack UCS Solution for Hadoop – Capacity-Optimised:
• 2 x UCS 6296
• 16 x C240 M4 (LFF)
• 2 x E5-2620v3
• 128GB
• 12 x 4TB 7.2K SATA
Full Rack UCS Solution for Hadoop, NoSQL – Balanced:
• 2 x UCS 6296
• 16 x C240 M4 (SFF)
• 2 x E5-2680v3
• 256GB
• 24 x 1.2TB 10K SAS
46. Hadoop and JBOD
Why not use RAID-5?
• It hurts performance:
– RAID-5 turns parallel sequential reads into slower random reads
– RAID-5 means speed is limited to the slowest device in the group
• It’s wasteful: Hadoop already replicates data, so there is no need for more replication
– Hadoop block copies serve two purposes: 1) redundancy and 2) performance (more copies available increases the data locality % for map tasks)
[Diagram: four parallel sequential reads across independent JBOD disks vs. reads striped across a RAID-5 group]
47. Can I Virtualise?
Yes you can (easy with UCS), but should you?
• Hadoop and most big data architectures can run virtualised
• However, this is typically not recommended for performance reasons:
– Virtualised data nodes will contend for storage and network I/O
– The hypervisor adds overhead, typically without benefit
• Some customers are running master/admin nodes (e.g. Name Node, Job Tracker, Zookeeper, gateways, etc.) in VMs, but consider the single point of failure
• UCS is ideal for virtualisation if you go this route
48. Does HDFS Support Storage Tiering?
An archiving example with Hortonworks
• HDP 2.2 uses the concept of storage types – SSD, DISK, ARCHIVE (other distros have similar features)
• The flag is set at a volume level
• Three associated storage policies control file placement:
– HOT: all replicas on DISK
– WARM: one replica on DISK, others on ARCHIVE
– COLD: all replicas on ARCHIVE
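As an illustration, these policies can be applied per path with the `hdfs storagepolicies` subcommand (available in Apache Hadoop 2.6+; the path below is hypothetical):

```shell
# List the storage policies the cluster supports
hdfs storagepolicies -listPolicies

# Mark an illustrative path as COLD: all replicas go to ARCHIVE volumes
hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD

# Confirm the policy now in effect on that path
hdfs storagepolicies -getStoragePolicy -path /data/archive
```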
50. UCS C3160 Dense Storage Rack Server
Up to 360TB in 4RU
• Server node: 2 x E5-2600 v2 CPUs, 128/256GB RAM, 1GB/4GB RAID cache
• 4 rows of hot-swappable, top-load 4TB/6TB HDD – 56 drives total
• Optional disk expansion: 4 x hot-swappable, rear-load LFF 4TB/6TB HDD
• Two 120GB SSDs (OS/Boot)
52. Cisco UCS: Physical Architecture
[Diagram: a clustered pair of 6200 fabric interconnects (Fabric A and Fabric B) with OOB management, uplink ports to SAN A/B and ETH 1/2, and server ports down to a blade chassis (half/full-width B200 blades with virtualised adapters (VICs) behind FEX A/B) and a rack-mount C240 with VIC attached via optional FEX A/B for scalability]
53. CPA: Single-connect Topology
Single wire for data and management, no oversubscription
• 2 x 10GE links per server for all traffic, data and management
• New (cheaper) bare-metal port licensing available
54. CPA: FEX Topology (Optional, For Scalability)
Single wire for data and management
• 2 x 10GE links per server for all traffic, data and management
• 8 x 10GE uplinks per FEX = 2:1 oversubscription (16 servers/rack), no port-channel (static pinning)
55. CPA Recommended FEX Connectivity
• The 2232 FEX has 4 buffer groups: ports 1-8, 9-16, 17-24, 25-32
• Distribute servers across port groups to maximise buffer performance and predictably distribute static pinning on uplinks
56. Virtualise the Physical Network Pipe
[Diagram: what you see – a physical 10GE cable from the server adapter through FEX A to fabric interconnect 6200-A; what you get – virtual cables (VN-Tag) presenting vNIC 1 as vEth 1 and vHBA 1 as vFC 1 on the fabric interconnect, defined in the server’s Service Profile]
ü Dynamic, rapid provisioning
ü State abstraction
ü Location independence
ü Blade or rack
57. “NIC bonding is one of Cloudera’s highest case drivers for misconfigurations.”
http://blog.cloudera.com/blog/2015/01/how-to-deploy-apache-hadoop-clusters-like-a-boss/
58. UCS Fabric Failover
• The fabric provides NIC failover capabilities, chosen when defining a service profile
• Avoids traditional NIC bonding in the OS
• Provides failover for both unicast and multicast traffic
• Works for any OS on bare metal
• (Also works for any hypervisor-based servers)
[Diagram: a Cisco VIC 1225 presenting vNIC 1 to the OS/hypervisor/VM over a physical 10GE cable, with a virtual cable to vEth 1 on 6200-A and a standby vEth 1 on 6200-B via redundant FEXes and the L1/L2 cluster links]
59. UCS Networking with Hadoop
• VNIC 1 on Fabric A with fabric failover to B (internal cluster traffic)
• VNIC 2 on Fabric B with fabric failover to A (external data ingress/egress)
• No OS bonding required
• VNIC 0 (management) wiring not shown for clarity (primary on Fabric B, failover to A)
Note: cluster traffic will flow northbound, through the upstream L2/L3 switching, in the event of a VNIC 1 failover. Ensure appropriate bandwidth/topology.
[Diagram: Data Node 1 and Data Node 2 each dual-homed to fabric interconnects 6200 A and 6200 B in end-host mode (EHM), with data ingress/egress via upstream L2/L3 switching]
60. Create QoS Policies
Leverage the simplicity of UCS Service Profiles:
• Best Effort policy for the management VLAN
• Platinum policy for the cluster VLAN
61. Enable Jumbo Frames for the Cluster VLAN
1. Select the LAN tab in the left pane of the UCSM GUI.
2. Select LAN Cloud > QoS System Class.
3. In the right pane, select the General tab.
4. In the Platinum row, enter 9000 for MTU.
5. Check the Enabled check box next to Platinum.
6. Click Save Changes.
7. Click OK.
63. Cluster Scalability
A general characteristic of an optimally configured cluster is a linear relationship between data set sizes and job completion times.
64. Sizing
Part science, part art
• Start with the current storage requirement:
– Factor in replication (typically 3x) and compression (varies by data set)
– Factor in 20-30% free space for temp (Hadoop) or up to 50% for some NoSQL systems
– Factor in the average daily/weekly data ingest rate
– Factor in the expected growth rate (i.e. increase in ingest rate over time)
• If the I/O requirement is known, use the next table for guidance
• Most big data architectures are very linear, so more nodes = more capacity and better performance
• Strike a balance between the price/performance of individual nodes vs. the total number of nodes
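The checklist above reduces to simple arithmetic. The sketch below is one way to combine the factors; the function name, parameter defaults, and example numbers are all illustrative assumptions, not Cisco sizing guidance:

```python
def raw_capacity_needed(usable_tb, replication=3, compression_ratio=1.0,
                        temp_fraction=0.25, yearly_growth=0.5, years=2):
    """Rough raw-capacity estimate (TB) from the sizing checklist.
    usable_tb         -- current data set size, in TB
    compression_ratio -- e.g. 2.0 means data compresses to half its size
    temp_fraction     -- scratch-space reserve (20-30% typical for Hadoop)
    yearly_growth     -- expected ingest growth rate per year
    years             -- planning horizon"""
    future = usable_tb * (1 + yearly_growth) ** years   # grow the data set
    after_compression = future / compression_ratio      # shrink by compression
    replicated = after_compression * replication        # 3 copies by default
    return replicated * (1 + temp_fraction)             # add temp headroom

# e.g. 100 TB today, 2:1 compression, 3x replication, 25% temp, 50%/yr for 2 yrs
tb = raw_capacity_needed(100, replication=3, compression_ratio=2.0)
```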
66. CPA Sizing and Application Guidelines

Server:                  Best Performance <--------------------> Best Price/TB
CPU                      2 x E5-2680v3    2 x E5-2680v3       2 x E5-2620v3
Memory (GB)              256              256                 128
Disk Drives              6 x 400GB SSD    24 x 1.2TB 10K SFF  12 x 4TB 7.2K LFF
IO Bandwidth (GB/Sec)    2.6              2.6                 1.1

Rack-Level (32 x C220 or 16 x C240):
Cores                    768              384                 192
Memory (TB)              8                4                   2
Capacity (TB)            64               460                 768
IO Bandwidth (GB/Sec)    192              42                  16
Applications             MPP DB, NoSQL    Hadoop, NoSQL       Hadoop
67. Scaling the CPA
• Single Rack: 16 servers
• Single Domain: up to 10 racks, 160 servers
• Multiple Domains: interconnected via L2/L3 switching
68. Scaling via Nexus 9K Validated Design
• Use Nexus 9000 with ACI to scale out multiple UCS CPA domains (1000’s of nodes) and/or to connect them to other application systems
• Enable ACI’s Dynamic Packet Prioritisation and Dynamic Load Balancing to optimise multi-workload traffic flows
69. Scaling the Common Platform Architecture
Consider intra- and inter-domain bandwidth:

Servers per domain       Available north-   Southbound         Northbound         Intra-domain         Inter-domain
(pair of Fabric          bound 10GE ports   oversubscription   oversubscription   server-to-server     server-to-server
Interconnects)           (per fabric)       (per fabric)       (per fabric)       bandwidth (Gbit/s)   bandwidth (Gbit/s)
160                      16                 2:1 (FEX)          5:1                5                    1
128                      32                 2:1 (FEX)          2:1                5                    2.5
80                       16                 1:1 (no FEX)       5:1                10                   2
64                       32                 1:1 (no FEX)       2:1                10                   5
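The per-server bandwidth columns in the table follow directly from the oversubscription ratios: each server has one 10GE link per fabric, divided by the southbound ratio for intra-domain traffic and by both ratios for inter-domain traffic. A small sketch (function name is illustrative):

```python
LINK_GBPS = 10  # one 10GE server link per fabric

def per_server_bandwidth(southbound_oversub, northbound_oversub):
    """Per-fabric server-to-server bandwidth (Gbit/s) implied by the
    oversubscription ratios: southbound alone for intra-domain traffic,
    southbound * northbound for traffic leaving the domain."""
    intra = LINK_GBPS / southbound_oversub
    inter = LINK_GBPS / (southbound_oversub * northbound_oversub)
    return intra, inter

# 160-server domain: 2:1 FEX southbound, 5:1 northbound
intra, inter = per_server_bandwidth(2, 5)  # reproduces the table's 5 and 1
```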
70. Rack Awareness
• Rack Awareness provides Hadoop the optional ability to group nodes together in logical “racks”
• Logical “racks” may or may not correspond to physical data centre racks
• Distributes blocks across different “racks” to avoid the failure domain of a single “rack”
• It can also lessen block movement between “racks”
• Can be useful to control block placement and movement in UCSM-integrated environments
[Diagram: blocks 1–4 replicated across data nodes 1–15 grouped into logical “Rack” 1, “Rack” 2, and “Rack” 3]
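Hadoop learns which logical “rack” a node belongs to from a user-supplied topology script (configured via `net.topology.script.file.name` in Hadoop 2; Hadoop invokes it with one or more host IPs and expects one rack path per input on stdout). The subnet-to-rack mapping below is hypothetical, simply to show the shape of such a script:

```python
#!/usr/bin/env python
# Hypothetical Hadoop rack-topology script: prints one rack path
# per host IP passed on the command line.
import sys

# Illustrative mapping: one logical "rack" per UCS domain
RACK_BY_SUBNET = {
    "10.1.1": "/ucs-domain1",
    "10.1.2": "/ucs-domain2",
    "10.1.3": "/ucs-domain3",
}

def rack_for(host_ip):
    """Map a host IP to its logical rack by /24 subnet prefix."""
    subnet = ".".join(host_ip.split(".")[:3])
    return RACK_BY_SUBNET.get(subnet, "/default-rack")

if __name__ == "__main__":
    for host in sys.argv[1:]:
        print(rack_for(host))
```

Grouping by subnet works when each UCS domain sits on its own subnet; any deterministic host-to-rack mapping will do.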
71. Recommendations: UCS Domains and Racks
Single domain – turn off, or enable at the physical rack level:
• For simplicity and ease of use, leave Rack Awareness off
• Consider turning it on to limit the physical-rack-level fault domain (e.g. localised failures due to physical data centre issues – water, power, cooling, etc.)
Multi domain – create one Hadoop rack per UCS Domain:
• With multiple domains, enable Rack Awareness such that each UCS Domain is its own Hadoop rack
• Provides HDFS data protection across domains
• Helps minimise cross-domain traffic
72. “The future is here, it’s just not evenly distributed.”
-William Gibson, author
73. Summary
Leverage UCS and Nexus to integrate big data into your data centre operations:
• Think of big data clusters as a single “supercomputer”
• Think of the network as the “system bus” of the supercomputer
• Strive for consistency in your deployments
• The goal is an even distribution of load – distribute fairly
• Cisco Nexus and UCS Common Platform Architecture for Big Data can help!
74. dCloud
Customers now get the full dCloud experience!
§ Cisco dCloud is a self-service platform that can be accessed via a browser, a high-speed Internet connection, and a cisco.com account
§ Customers will have direct access to a subset of dCloud demos and labs
§ Restricted content must be brokered by an authorised user (Cisco or Partner) and then shared with the customers (cisco.com user)
§ Go to dcloud.cisco.com, select the location closest to you, and log in with your cisco.com credentials
§ Review the getting started videos and try Cisco dCloud today: https://dcloud-cms.cisco.com/help