This document discusses configuring a secure, multitenant cluster for an enterprise. It covers setting up authentication using Kerberos and LDAP, authorization with HDFS permissions, Apache Sentry, and encryption. It also discusses auditing with Cloudera Navigator, resource isolation through static and dynamic partitioning of HDFS, HBase, Impala and YARN, and admission control for Impala. The goal is to enable multiple groups within an organization to securely share cluster resources.
This document discusses security features in Apache Kafka including SSL for encryption, SASL/Kerberos for authentication, authorization controls using an authorizer, and securing Zookeeper. It provides details on how these security components work, such as how SSL establishes an encrypted channel and SASL performs authentication. The authorizer implementation stores ACLs in Zookeeper and caches them for performance. Securing Zookeeper involves setting ACLs on Zookeeper nodes and migrating security configurations. Future plans include moving more functionality to the broker side and adding new authorization features.
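The SSL-plus-SASL setup described above boils down to a handful of client settings. The sketch below uses the parameter names of the third-party kafka-python client as an assumption (check your client's documentation for exact names); hostnames and file paths are placeholders.

```python
# Sketch of a secured Kafka consumer configuration. Parameter names
# follow the kafka-python client (an assumption); broker hostname and
# CA file path are placeholders.

def secure_consumer_config(bootstrap, cafile):
    """Return a config dict for SASL/Kerberos auth over an SSL channel."""
    return {
        "bootstrap_servers": bootstrap,
        # SASL_SSL = Kerberos (GSSAPI) authentication inside an
        # SSL-encrypted channel, matching the description above.
        "security_protocol": "SASL_SSL",
        "ssl_cafile": cafile,               # CA that signed the broker certs
        "ssl_check_hostname": True,         # verify the broker's identity
        "sasl_mechanism": "GSSAPI",
        "sasl_kerberos_service_name": "kafka",
    }

cfg = secure_consumer_config(["broker1.example.com:9093"], "/etc/pki/ca.pem")
```

In a real deployment this dict would be passed to the consumer constructor; the broker side needs matching listener and keystore settings.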
Unprotected data stores are prone to data breaches. In this talk, I'll explain how to implement security on Hadoop. The talk covers basic elements such as firewalls, high availability (HA), backups, Kerberos, and data encryption (both at rest and in transit).
I also shed light on how Cloudera handles security vulnerability reports, and a little on the partner product certification process.
Intel and Cloudera: Accelerating Enterprise Big Data Success (Cloudera, Inc.)
The data center has gone through several inflection points in the past decades: adoption of Linux, migration from physical infrastructure to virtualization and Cloud, and now large-scale data analytics with Big Data and Hadoop.
Please join us to learn about how Cloudera and Intel are jointly innovating through open source software to enable Hadoop to run best on IA (Intel Architecture) and to foster the evolution of a vibrant Big Data ecosystem.
Hadoop Security and Compliance - StampedeCon 2016 (StampedeCon)
As Hadoop becomes a mainstream data platform across organizations, securing a vast and growing volume of critical information, especially financial and healthcare data, is more essential than ever. In this presentation, Derek will explain how to leverage Big Data technologies without sacrificing security and compliance, focusing especially on the comprehensive security mechanisms that should be put in place to secure a production-ready Hadoop environment. The presentation will also highlight technologies such as encryption in motion and at rest for Hadoop services, as well as the compliance processes required to meet the strictest regulatory requirements and standards.
A deep dive into running data analytic workloads in the cloud (Cloudera, Inc.)
This document discusses running data analytic workloads in the cloud using Cloudera Altus. It introduces Altus, which provides a platform-as-a-service for analyzing and processing data at scale in public clouds. The document outlines Altus features like low cost per-hour pricing, end-user focus, and cloud-native deployment. It then describes hands-on examples using Altus Data Engineering for ETL and the Altus Analytic Database for exploration and analytics. Workload analytics capabilities are also introduced for troubleshooting and optimizing jobs.
The document discusses running Hadoop clusters in the cloud and the challenges that presents. It introduces CloudFarmer, a tool that allows defining roles for VMs and dynamically allocating VMs to roles. This allows building agile Hadoop clusters in the cloud that can adapt as needs change without static configurations. CloudFarmer provides a web UI to manage roles and hosts.
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin (Spark Summit)
This document discusses securing Spark applications. It covers encryption, authentication, and authorization. Encryption protects data in transit using SASL or SSL. Authentication uses Kerberos to identify users. Authorization controls data access using Apache Sentry and the Sentry HDFS plugin, which synchronizes HDFS permissions with higher-level abstractions like tables. A future RecordService aims to provide a unified authorization system at the record level for Spark SQL.
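The Sentry HDFS synchronization idea above can be illustrated with a toy sketch (this is not the actual plugin): a table-level grant is translated into an HDFS ACL entry on the table's warehouse directory, so file-level access matches the SQL-level policy. The warehouse path and permission mapping are illustrative assumptions.

```python
# Toy sketch of synchronizing a table-level privilege down to HDFS.
# The warehouse path convention and privilege->permission mapping are
# assumptions for illustration, not the real Sentry plugin logic.

def table_grant_to_hdfs_acl(role_groups, table, privilege):
    """Map a Sentry-style table privilege to POSIX-ACL-like entries."""
    perms = {"SELECT": "r-x", "ALL": "rwx"}[privilege]
    path = f"/user/hive/warehouse/{table}"       # assumed warehouse layout
    return [(path, f"group:{g}:{perms}") for g in role_groups]

acls = table_grant_to_hdfs_acl(["analysts"], "sales", "SELECT")
# [('/user/hive/warehouse/sales', 'group:analysts:r-x')]
```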
Data Protection in Hybrid Enterprise Data Lake Environment (DataWorks Summit)
This document discusses data protection in hybrid data lake environments using Cloudera's Data Lifecycle Manager (DLM) service. It provides an overview of DLM's capabilities for replicating data between on-premises and cloud environments, including HDFS, Hive, Ranger policies, and metadata. Key features highlighted are incremental replication of Hive metadata and data, HDFS snapshot-based replication between Hadoop clusters, and replication to cloud storage providers like AWS S3, Azure ADLS, and GCP. The document also demonstrates DLM's user interface and replication of data and security policies between on-prem and cloud clusters.
How to build leakproof stream processing pipelines with Apache Kafka and Apac... (Cloudera, Inc.)
This document discusses building leakproof stream processing pipelines with Apache Kafka and Apache Spark. It provides an overview of offset management in Spark Streaming from Kafka, including storing offsets in external data stores like ZooKeeper, Kafka, and HBase. The document also covers Spark Streaming Kafka consumer types and workflows, and addressing issues like maintaining offsets during planned and unplanned maintenance or application errors.
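The "store offsets in an external data store" pattern described above can be sketched in a few lines. Here a plain dict stands in for the external store (ZooKeeper, Kafka, or HBase in the talk); the point is that results and the next offset are committed together, so replaying a batch after a failure does not duplicate work.

```python
# Toy sketch of manual offset management for a streaming consumer.
# `external_store` stands in for ZooKeeper/HBase; records are
# (offset, value) pairs as they would arrive from a Kafka partition.

external_store = {"offsets": {}, "results": []}

def process_batch(partition, records, store):
    """Process a batch, then commit results and next-offset together."""
    start = store["offsets"].get(partition, 0)
    todo = [(o, v) for o, v in records if o >= start]   # skip replayed records
    store["results"].extend(v.upper() for _, v in todo) # "processing" step
    if todo:
        store["offsets"][partition] = todo[-1][0] + 1   # commit next offset

batch = [(0, "a"), (1, "b"), (2, "c")]
process_batch("topic-0", batch, external_store)
# replaying the same batch after a crash is a no-op:
process_batch("topic-0", batch, external_store)
```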
Spark is a fast and general engine for large-scale data processing. It provides APIs in Java, Scala, and Python and an interactive shell. Spark applications operate on resilient distributed datasets (RDDs) that can be cached in memory for faster performance. RDDs are immutable and fault-tolerant via lineage graphs. Transformations create new RDDs from existing ones while actions return values to the driver program. Spark's execution model involves a driver program that coordinates tasks on executor machines. RDD caching and lineage graphs allow Spark to efficiently run jobs across clusters.
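The lazy-transformation and lineage ideas above can be shown with a toy, pure-Python stand-in (this is not Spark itself): transformations only record lineage, and an action such as `collect()` walks the lineage graph to compute results.

```python
# Toy illustration (not Spark) of lazy transformations and lineage:
# map/filter build new datasets without computing anything; collect()
# is the action that triggers evaluation along the lineage chain.

class ToyRDD:
    def __init__(self, source, parent=None, fn=None):
        self.source, self.parent, self.fn = source, parent, fn

    def map(self, f):                         # transformation: records lineage
        return ToyRDD(None, parent=self, fn=lambda xs: [f(x) for x in xs])

    def filter(self, pred):                   # transformation: records lineage
        return ToyRDD(None, parent=self,
                      fn=lambda xs: [x for x in xs if pred(x)])

    def collect(self):                        # action: walks the lineage graph
        if self.parent is None:
            return list(self.source)
        return self.fn(self.parent.collect())

rdd = ToyRDD(range(5)).map(lambda x: x * 2).filter(lambda x: x > 2)
rdd.collect()  # [4, 6, 8]
```

Because lineage is retained, a lost partition could be recomputed by re-running the recorded functions, which is the essence of Spark's fault tolerance.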
This document discusses running Spark applications on YARN and managing Spark clusters. It covers challenges like predictable job execution times and optimal cluster utilization. Spark on YARN is introduced as a way to leverage YARN's resource management. Techniques like dynamic allocation, locality-aware scheduling, and resource queues help improve cluster sharing and utilization for multi-tenant workloads. Security considerations for shared clusters running sensitive data are also addressed.
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ... (DataWorks Summit)
The document discusses Hive LLAP (Live Long and Process) as a high performance and cost-effective alternative to traditional Massively Parallel Processing (MPP) databases for querying large datasets on Hadoop. It describes Walmart's implementation of Hive LLAP on their data lake to improve query performance for business users. A proof-of-concept found Hive LLAP queries were up to 50% faster when using 15 nodes instead of 10, and it performed comparably or better than two MPP databases with similar or larger infrastructures. Walmart plans to further evaluate Hive LLAP on newer Hadoop distributions and technologies to improve availability and workload management.
Deciding on a deployment model is critical when enterprises adopt Hadoop. Initially, the bare-metal model (an on-premise cluster with physical servers) was popular, to avoid I/O overhead in virtualized environments. These days, however, the cloud is also a contending option, with compelling cost savings and ease of operation. To aid in assessing the deployment options, Accenture Technology Labs developed the Accenture Data Platform Benchmark suite and a total cost of ownership (TCO) model, and tuned and compared the performance of bare-metal Hadoop clusters and a Hadoop cloud service. Interestingly, the study discovered that the price/performance ratio is not a critical factor in making a Hadoop deployment decision. Employing empirical and systematic analyses, the study found comparable price/performance ratios for bare-metal Hadoop clusters and Hadoop-as-a-service. Moreover, cheaper purchasing options (e.g., long-term contracts) provide a better ratio than bare metal in many cases. This result debunks the idea that the cloud is unsuitable for Hadoop MapReduce workloads because of their heavy I/O requirements. Furthermore, the study finds that the Hadoop default configuration leaves ample headroom for performance tuning, and that cloud infrastructure enables even further tuning opportunities.
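The price/performance metric used in such a comparison is simple arithmetic: cluster cost per unit of benchmark throughput. The figures below are hypothetical placeholders, not the study's numbers; they merely show how a discounted long-term contract can beat bare metal on this metric even when the on-demand cloud price matches it.

```python
# Illustrative price/performance arithmetic. All dollar and throughput
# figures are hypothetical, chosen only to demonstrate the metric.

def price_performance(cost_per_hour, jobs_per_hour):
    """Dollars per completed benchmark job (lower is better)."""
    return cost_per_hour / jobs_per_hour

bare_metal     = price_performance(cost_per_hour=40.0, jobs_per_hour=100)
cloud_ondemand = price_performance(cost_per_hour=52.0, jobs_per_hour=130)
cloud_reserved = price_performance(cost_per_hour=39.0, jobs_per_hour=130)
# bare metal and on-demand cloud come out comparable; the reserved
# (long-term contract) option comes out cheaper per job.
```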
The popular R package dplyr provides a consistent grammar for data manipulation that can abstract over diverse data sources. Ian Cook shows how you can use dplyr to query large-scale data using different processing engines including Spark and Impala. He demonstrates the R package sparklyr (from RStudio) and the new R package implyr (from Cloudera) and shares tips for making dplyr code portable.
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu (Jeremy Beard)
This document discusses building near-real-time analytics pipelines using Apache Spark Streaming and Apache Kudu on the Cloudera platform. It defines near-real-time analytics, describes the relevant components of the Cloudera stack (Kafka, Spark, Kudu, Impala), and how they can work together. The document then outlines the typical stages involved in implementing a Spark Streaming to Kudu pipeline, including sourcing from a queue, translating data, deriving storage records, planning mutations, and storing the data. It provides performance considerations and introduces Envelope, a Spark Streaming application on Cloudera Labs that implements these stages through configurable pipelines.
This document provides an introduction to Apache Kudu, a storage layer for Apache Hadoop designed for fast analytics on fast data. It discusses Kudu's motivations of filling gaps in HDFS and HBase capabilities, its design goals of high throughput scans and low latency reads/writes, and how its columnar storage and integration with tools like Spark and Impala enable it to meet these goals. Example use cases like time series and real-time analytics are presented. The document also covers Kudu's architecture of tables and tablets, its replication and fault tolerance model using Raft consensus, and performance comparisons that show it outperforming other storage systems.
This presentation is from BigData November Bangalore MeetUp by Varun Vasudev.
technology.inmobi.com/events/bigdata-meetup
Talk Outline:
- Overview of YARN
- New YARN Innovation in Hadoop 2.6
- Rolling upgrades
- Added fault tolerance
- CPU scheduling in Capacity Scheduler
- C-Group isolation
- Node labels
- Support for long running services
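Enabling the CPU-scheduling and cgroup-isolation items above typically comes down to a few YARN properties. The property names below are to the best of our knowledge for Hadoop 2.6; verify them against your distribution's documentation before use.

```python
# Hedged sketch of yarn-site / capacity-scheduler properties for the
# features listed above. Property names are believed correct for
# Hadoop 2.6 but should be checked against your distro's docs.

yarn_props = {
    # CPU scheduling in the Capacity Scheduler: consider both memory
    # and vcores via the dominant-resource calculator.
    "yarn.scheduler.capacity.resource-calculator":
        "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
    # Cgroup isolation: run containers under the Linux container
    # executor with the cgroups resource handler.
    "yarn.nodemanager.container-executor.class":
        "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor",
    "yarn.nodemanager.linux-container-executor.resources-handler.class":
        "org.apache.hadoop.yarn.server.nodemanager.util."
        "CgroupsLCEResourcesHandler",
}
```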
Cloudera Navigator provides integrated data governance and security for Hadoop. It includes features for metadata management, auditing, data lineage, encryption, and policy-based data governance. KeyTrustee is Cloudera's key management server that integrates with hardware security modules to securely manage encryption keys. Together, Navigator and KeyTrustee allow users to classify data, audit usage, and encrypt data at rest and in transit to meet security and compliance needs.
This document provides an overview of Cloudera's SQL on Hadoop technologies including Hive, Spark SQL, and Impala. It discusses the features and capabilities of each technology, how they differ, and when each would be best suited for different use cases. Key points covered include Hive being optimized for batch processing while Impala and Spark SQL enable lower latency queries. The document also reviews columnar data formats like Parquet that can improve performance.
Multi-Tenant Operations with Cloudera 5.7 & BT (Cloudera, Inc.)
One benefit of Apache Hadoop is the ability to power multiple workloads, across many different users and departments, all within a single, shared cluster. Hear how BT is doing this today and learn about new features in Cloudera Manager to provide better visibility for multi-tenant operations.
The document discusses managing a multi-tenant data lake at Comcast over time. It began as an experiment in 2013 with 10 nodes and has grown significantly to over 1500 nodes currently. Governance was instituted to manage the diverse user community and workloads. Tools like the Command Center were developed to provide monitoring, alerting and visualization of the large Hadoop environment. SLA management, support processes, and ongoing training are needed to effectively operate the multi-tenant data lake at scale.
This document discusses using event streams as the system of record for data, rather than traditional databases. It argues that streams can serve as the single source of truth for data, providing benefits like data lineage, auditing, and integrity. It also describes how healthcare company Liaison uses a streaming platform from MapR to power their data integration platform, gaining the advantages of streams while meeting various compliance requirements.
Treasure Data provides a big data analytics platform that runs on Hadoop in the cloud. It aims to simplify big data and make it accessible for more users ("Big Data for the Rest of Us"). Treasure Data collects and stores data from various sources in its cloud-based columnar datastore and allows querying and analysis of data through SQL, REST APIs and other tools. It handles all the operational complexities of Hadoop and provides a simple interface for users.
This document discusses how to build a successful data lake by focusing on the right data, platform, and interface. It emphasizes the importance of saving raw data to analyze later, organizing the data lake into zones with different governance levels, and providing self-service tools to find, understand, provision, prepare, and analyze data. It promotes the use of a smart data catalog like Waterline Data to automate metadata tagging, enable data discovery and collaboration, and maximize business value from the data lake.
The document summarizes new features in SQL Server 2016 SP1, organized into three categories: performance enhancements, security improvements, and hybrid data capabilities. It highlights key features such as in-memory technologies for faster queries, always encrypted for data security, and PolyBase for querying relational and non-relational data. New editions like Express and Standard provide more built-in capabilities. The document also reviews SQL Server 2016 SP1 features by edition, showing advanced features are now more accessible across more editions.
Hadoop in the Cloud: Common Architectural Patterns (DataWorks Summit)
The document discusses how companies are using Microsoft Azure services like HDInsight, Data Factory, Machine Learning, and others to gain insights from large volumes of data. Specifically, it provides examples of:
1) A large computer manufacturer/retailer analyzing clickstream data with HDInsight to understand customer behavior and provide real-time recommendations to increase online conversions.
2) An industrial automation company partnering with an oil company to use IoT sensors and analytics to monitor LNG fueling stations for proactive maintenance based on sensor data analyzed with HDInsight, Data Factory, and Machine Learning.
3) How data from various industries like retail, oil and gas, manufacturing, and others can be analyzed
This document discusses strategies for filling a data lake by improving the process of data onboarding. It advocates using a template-based approach to streamline data ingestion from various sources and reduce dependence on hardcoded procedures. The key aspects are managing ELT templates and metadata through automated metadata extraction. This allows generating integration jobs dynamically based on metadata passed at runtime, providing flexibility to handle different source data with one template. It emphasizes reducing the risks associated with large data onboarding projects by maintaining a standardized and organized data lake.
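The metadata-driven approach described above can be sketched in miniature: one ELT template plus a per-source metadata record yields a concrete ingestion job, instead of a hand-coded job per source. The template text, table names, and paths below are illustrative assumptions.

```python
# Toy sketch of template-based job generation from metadata. The SQL-ish
# template and all names/paths are illustrative placeholders.
from string import Template

job_template = Template(
    "LOAD DATA FROM '$source_path' "
    "INTO TABLE $target_table PARTITIONED BY ($partition_col);"
)

def generate_job(metadata):
    """Render one ingestion job from a metadata record."""
    return job_template.substitute(metadata)

job = generate_job({
    "source_path": "/landing/sales/2016-11-01",
    "target_table": "lake.sales_raw",
    "partition_col": "load_date",
})
```

Onboarding a new source then means adding a metadata record, not writing a new hardcoded procedure.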
Relational databases vs Non-relational databases (James Serra)
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
Should I move my database to the cloud? (James Serra)
So you have been running on-prem SQL Server for a while now. Maybe you have taken the step to move it from bare metal to a VM, and have seen some nice benefits. Ready to see a TON more benefits? If you said “YES!”, then this is the session for you, as I will go over the many benefits gained by moving your on-prem SQL Server to an Azure VM (IaaS). Then I will really blow your mind by showing you even more benefits of moving to Azure SQL Database (PaaS/DBaaS). And for those of you with a large data warehouse, I also have you covered with Azure SQL Data Warehouse. Along the way I will talk about the many hybrid approaches, so you can take a gradual approach to moving to the cloud. If you are interested in cost savings, additional features, ease of use, quick scaling, improved reliability, and ending the days of upgrading hardware, this is the session for you!
Machine learning allows us to build predictive analytics solutions of tomorrow - these solutions allow us to better diagnose and treat patients, correctly recommend interesting books or movies, and even make the self-driving car a reality. Microsoft Azure Machine Learning (Azure ML) is a fully-managed Platform-as-a-Service (PaaS) for building these predictive analytics solutions. It is very easy to build solutions with it, helping to overcome the challenges most businesses have in deploying and using machine learning. In this presentation, we will take a look at how to create ML models with Azure ML Studio and deploy those models to production in minutes.
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products from collecting data, transforming it, storing it, to visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your companies big data solution.
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat... (Hortonworks)
How do you turn data from many different sources into actionable insights and manufacture those insights into innovative information-based products and services?
Industry leaders are accomplishing this by adding Hadoop as a critical component in their modern data architecture to build a data lake. A data lake collects and stores data across a wide variety of channels including social media, clickstream data, server logs, customer transactions and interactions, videos, and sensor data from equipment in the field. A data lake cost-effectively scales to collect and retain massive amounts of data over time, and convert all this data into actionable information that can transform your business.
Join Hortonworks and Informatica as we discuss:
- What is a data lake?
- The modern data architecture for a data lake
- How Hadoop fits into the modern data architecture
- Innovative use-cases for a data lake
Big data architectures and the data lake (James Serra)
The document provides an overview of big data architectures and the data lake concept. It discusses why organizations are adopting data lakes to handle increasing data volumes and varieties. The key aspects covered include:
- Defining top-down and bottom-up approaches to data management
- Explaining what a data lake is and how Hadoop can function as the data lake
- Describing how a modern data warehouse combines features of a traditional data warehouse and data lake
- Discussing how federated querying allows data to be accessed across multiple sources
- Highlighting benefits of implementing big data solutions in the cloud
- Comparing shared-nothing, massively parallel processing (MPP) architectures to symmetric multi-processing (
A record 2016 for trade between Italy and Germany (Joerg Buck)
After the already positive results of 2015, the volume of trade between Italy and Germany rose again in 2016, reaching an all-time record of 112.1 billion euros (+3.5% over 2015). According to Istat data, Italian exports to Germany reached 52.7 billion euros last year (+3.9% over 2015), while imports totaled 59.4 billion euros (+3.3% over 2015).
Controlling Technical Debt with Continuous Delivery (walkmod)
This document summarizes a presentation about controlling technical debt with continuous delivery. It discusses using tools for continuous inspection of code to detect debt, automated code fixes to reduce debt incrementally, and integrating fixes into the continuous delivery pipeline to continuously pay down debt over time. Key aspects covered include metrics and tools to measure debt, automated fixes for common code issues, code transformation techniques to fix issues safely, and a WalkMod pipeline API to integrate fixes into the delivery process.
Highlighted articles in ACCIONA Reports 65 analyze our contract to provide clean energy to Google Chile, "the art of business" in the Middle East, and the "Luz en Casa" program run by ACCIONA Microenergía. Find out more at #ACCIONAReports
This document provides an overview of Apache Hadoop security, both historically and what is currently available and planned for the future. It discusses how Hadoop security is different due to benefits like combining previously siloed data and tools. The four areas of enterprise security - perimeter, access, visibility, and data protection - are reviewed. Specific security capabilities like Kerberos authentication, Apache Sentry role-based access control, Cloudera Navigator auditing and encryption, and HDFS encryption are summarized. Planned future enhancements are also mentioned like attribute-based access controls and improved encryption capabilities.
The fundamentals and best practices of securing your Hadoop cluster are top of mind today. In this session, we will examine and explain the components, tools, and frameworks used in Hadoop for authentication, authorization, audit, and encryption of data and processes. See how the latest innovations can let you securely connect more data to more users within your organization.
Comprehensive Security for the Enterprise II: Guarding the Perimeter and Cont... (Cloudera, Inc.)
One of the benefits of Hadoop is that it easily allows for multiple entry points both for data flow and user access. Here we discuss how Cloudera allows you to preserve the agility of having multiple entry points while also providing strong, easy to manage authentication. Additionally, we discuss how Cloudera provides unified authorization to easily control access for multiple data processing engines.
This document discusses the APIs and extensibility features of Cloudera Manager. It provides an overview of the Cloudera Manager API introduced in version 4.0, which allows programmatic access to cluster operations and monitoring data. It also discusses how the API has been used by various customers and partners for tasks like installation/deployment, monitoring, and alerting integration. The document outlines Cloudera Manager's monitoring capabilities using the tsquery language and provides examples. Finally, it covers new service extensibility features introduced in Cloudera Manager 5.
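A tsquery is submitted to Cloudera Manager's REST time-series endpoint as a URL query parameter. The sketch below only builds the request URL; the API version, port, and tsquery string are illustrative assumptions, so check the CM API documentation for your release before relying on them.

```python
# Sketch of building a Cloudera Manager time-series request URL from a
# tsquery. API version, TLS port (7183), and the sample query are
# assumptions for illustration; nothing is sent over the network here.
from urllib.parse import urlencode

def timeseries_url(cm_host, tsquery, api_version="v11"):
    """Build the REST URL for a tsquery against Cloudera Manager."""
    qs = urlencode({"query": tsquery})
    return f"https://{cm_host}:7183/api/{api_version}/timeseries?{qs}"

url = timeseries_url("cm.example.com",
                     "select cpu_user_rate where roleType = DATANODE")
```

In practice the URL would be fetched with an authenticated HTTP GET, returning JSON time-series points for charting or alerting integrations.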
Securing Data in Hybrid on-premise and Cloud Environments Using Apache Ranger (DataWorks Summit)
Companies are increasingly moving to the cloud to store and process data. One of the challenges companies have is in securing data across hybrid environments with easy way to centrally manage policies. In this session, we will talk through how companies can use Apache Ranger to protect access to data both in on-premise as well as in cloud environments. We will go into details into the challenges of hybrid environment and how Ranger can solve it. We will also talk through how companies can further enhance the security by leveraging Ranger to anonymize or tokenize data while moving into the cloud and de-anonymize dynamically using Apache Hive, Apache Spark or when accessing data from cloud storage systems. We will also deep dive into the Ranger’s integration with AWS S3, AWS Redshift and other cloud native systems. We will wrap it up with an end to end demo showing how policies can be created in Ranger and used to manage access to data in different systems, anonymize or de-anonymize data and track where data is flowing.
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o... (Big Data Spain)
This document discusses securing big data at rest using encryption for Hadoop, Cassandra, and MongoDB on Red Hat. It provides an overview of these NoSQL databases and Hadoop, describes common use cases for big data, and demonstrates how to use encryption solutions like dm-crypt, eCryptfs, and Cloudera Navigator Encrypt to encrypt data for these platforms. It includes steps for profiling processes, adding ACLs, and encrypting data directories for Hadoop, Cassandra, and MongoDB. Performance costs for encryption are typically around 5-10%.
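For the dm-crypt option mentioned above, preparing an encrypted data directory follows a standard sequence: format the block device with a LUKS header, open it to a device-mapper name, create a filesystem, and mount it. The sketch below only assembles those shell steps as strings (nothing is executed); the device path, mapper name, and mount point are placeholders.

```python
# Sketch of the dm-crypt (LUKS) steps for an encrypted data directory.
# Commands are assembled as strings for illustration, not executed;
# /dev/sdb1, hdfs_data, and /data/dfs are placeholder names.

def luks_setup_commands(device, mapper_name, mount_point):
    """Return the shell steps to format, open, and mount a LUKS volume."""
    return [
        f"cryptsetup luksFormat {device}",              # write LUKS header
        f"cryptsetup luksOpen {device} {mapper_name}",  # unlock to /dev/mapper
        f"mkfs.ext4 /dev/mapper/{mapper_name}",         # filesystem on mapping
        f"mount /dev/mapper/{mapper_name} {mount_point}",
    ]

steps = luks_setup_commands("/dev/sdb1", "hdfs_data", "/data/dfs")
```

The service's data directories (e.g., HDFS DataNode dirs) would then be configured to live under the encrypted mount, with the modest overhead the talk cites.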
The document discusses new features in Apache Hadoop 3, including HDFS erasure coding which reduces storage overhead, YARN federation which improves scalability, and the Application Timeline Server which provides improved visibility into application performance. It also covers HDFS multi standby NameNodes which enhances high availability, and the future directions of Hadoop including object storage with Ozone and running HDFS on cloud infrastructure.
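The storage savings behind erasure coding are quick arithmetic: 3x replication stores two extra full copies of the data (200% overhead), while a Reed-Solomon (6,3) layout stores three parity blocks per six data blocks (50% overhead).

```python
# Arithmetic behind the erasure-coding storage savings in Hadoop 3:
# extra storage expressed as a fraction of the raw data size.

def storage_overhead(data_units, redundant_units):
    """Extra storage as a fraction of the raw data size."""
    return redundant_units / data_units

replication_3x = storage_overhead(1, 2)   # 2 extra copies  -> 200% overhead
rs_6_3 = storage_overhead(6, 3)           # 3 parity per 6  -> 50% overhead
```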
The document discusses security implementation on Hadoop clusters. It outlines various security measures that can be implemented including identity management using Kerberos or LDAP, authorization controls using access control lists, auditing, encryption of data in transit and at rest, key management, and vulnerability response processes. The document provides system diagrams of example implementations and discusses performance considerations of encryption.
Risk Management for Data: Secured and Governed (Cloudera, Inc.)
Cloudera Tech Day Presentation by Eddie Garcia, Chief Security Architect, Cloudera. Protecting enterprise data is an increasingly complex challenge given the diversity and sophistication of threat actors and their cyber-tactics. In this session, participants will hear a comprehensive introduction to Hadoop Security, including the “three A’s” for secure operating environments: Authentication, Authorization, and Audit. In addition, the presenter will cover strategies to orchestrate data security, encryption, and compliance, and will explain the Cloudera Security Maturity Model for Hadoop. Attendees will leave with a greater understanding of how effective INFOSEC relies on an enterprise big data governance and risk management approach.
The document discusses Cloudera Manager's APIs and extensibility features. It describes how the CM API introduced in version 4 allows programmatic access to cluster operations and monitoring data. It provides examples of how the API has been used to integrate CM with installation/deployment tools and for monitoring and alerting. The document also discusses CM's support for custom metrics charts using tsquery and how service extensibility introduced in version 5 allows for non-CDH services and ISV applications to be managed through CM.
This deck covers key considerations and provides advice for enterprises looking to run production-scale Cloudera on AWS. We touch on everything from security to governance to selecting the right instance type for your Hadoop workload (Spark, Impala, Search, etc).
Big Data is an increasingly powerful enterprise asset, and this talk will explore the relationship between big data and cyber security: how we preserve privacy while exploiting the advantages of data collection and processing. Big Data technologies give both governments and corporations powerful tools to offer more efficient and personalized services, and their rapid adoption has created tremendous social benefits. Unfortunately, an unwanted side effect is the rich pickings available to those with malicious intentions. Increasingly, the sophisticated cyber attacker is able to exploit the rich array of public data to build detailed profiles of their adversaries in support of those intentions.
Cloudera GoDataFest Security and Governance (GoDataDriven)
The document discusses Cloudera's security and governance solutions for Hadoop. It describes how Cloudera provides comprehensive security through authentication, authorization, auditing, and compliance features. It also covers how Cloudera helps with data visibility and governance through tools that report on data usage and lineage. The overall goal is to help customers securely manage and govern their data on Hadoop clusters.
This presentation answers many of your questions about PostgreSQL and the Red Hat Cluster Suite.
It reviews how you can create failover/standby capabilities with the following activities:
- General PostgreSQL clustering options
- Overview of Red Hat Cluster Service
- Identification of candidate databases for clustering
- Identification of hardware for clustering
- Analysis of uptime requirements and data latency
- Implementation of clustering
- Testing of clustering
- PostgreSQL installation tips for RHCS
This document discusses the challenges of trust, visibility and governance in Apache Hadoop and how Cloudera Navigator addresses them. It describes how Navigator provides an integrated data management and governance platform for Hadoop by collecting and integrating technical metadata, business metadata, lineage, policies and audit logs. This platform enables self-service discovery and analytics for data scientists and BI users, usage-driven optimization for Hadoop administrators and compliance capabilities for security teams. The document provides examples of the types of metadata, lineage and audit logs collected in Hadoop and their limitations, and argues that Navigator is needed to make this information actionable through policies and a governance framework.
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on Azure. In this webinar, you'll see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
Cloudera User Group Chicago - Cloudera Manager: APIs & ExtensibilityClouderaUserGroups
This document provides an overview of Cloudera Manager APIs and extensibility. It discusses how the Cloudera Manager API, introduced in version 4.0, allows programmatic access to cluster operations and monitoring information. It provides examples of integration with the API for installation/deployment and monitoring/alerting. It also covers the tsquery language for custom metrics and monitoring, and new capabilities in Cloudera Manager 5 for user-defined triggers/alarms and service extensibility.
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera, Inc.
Cloud environments are increasingly becoming a popular deployment option for Hadoop. Enterprises can take advantage of the added flexibility and elasticity of the cloud for long-running clusters, temporary deployments, and spiky workloads. However, as more and more users choose cloud environments for critical Hadoop workloads, they are often forced to compromise on key aspects of their data platform.
Cloudera Director enables the full fidelity of the Enterprise Data Hub in the cloud, without compromises. Announced with the recent 5.2 release, Cloudera Director is the simple, reliable way to deploy and scale Hadoop in the cloud, while maintaining an open and neutral platform with enterprise-grade capabilities.
During this webinar, Tushar Shanbhag, Director of Product Management, will look at why Hadoop cloud environments are becoming so popular and some of the challenges around Hadoop in the cloud. He will then provide an in-depth overview of Cloudera Director, its key features, and how it alleviates these common challenges. Finally, he will discuss some key use cases and provide insight into what’s next for Cloudera and Hadoop in the cloud.
The Future of Hadoop Security - Hadoop Summit 2014Cloudera, Inc.
Hadoop deployments are rapidly moving from pilots to production, enabling unprecedented opportunity to build big data applications that deliver faster access to more information to more users than ever before possible. Yet without the ability to address data security and compliance regulations, Hadoop will be limited to another data silo.
In this talk, Matt Brandwein and David Tishgart discuss the requirements for securing Hadoop and how Cloudera (now with Gazzang) and Intel are collaborating in the open to deliver comprehensive, transparent, compliance-ready security to unlock the potential of the Hadoop ecosystem and enable innovation without compromise.
Similar to Configuring a Secure, Multitenant Cluster for the Enterprise (20)
The document discusses using Cloudera DataFlow to address challenges with collecting, processing, and analyzing log data across many systems and devices. It provides an example use case of logging modernization to reduce costs and enable security solutions by filtering noise from logs. The presentation shows how DataFlow can extract relevant events from large volumes of raw log data and normalize the data to make security threats and anomalies easier to detect across many machines.
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
The document outlines the 2021 finalists for the annual Data Impact Awards program, which recognizes organizations using Cloudera's platform and the impactful applications they have developed. It provides details on the challenges, solutions, and outcomes for each finalist project in the categories of Data Lifecycle Connection, Cloud Innovation, Data for Enterprise AI, Security & Governance Leadership, Industry Transformation, People First, and Data for Good. There are multiple finalists highlighted in each category demonstrating innovative uses of data and analytics.
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
Cloudera is proud to present the 2020 Data Impact Awards Finalists. This annual program recognizes organizations running the Cloudera platform for the applications they've built and the impact their data projects have on their organizations, their industries, and the world. Nominations were evaluated by a panel of independent thought-leaders and expert industry analysts, who then selected the finalists and winners. Winners exemplify the most cutting-edge data projects and represent innovation and leadership in their respective industries.
The document outlines the agenda for Cloudera's Enterprise Data Cloud event in Vienna. It includes welcome remarks, keynotes on Cloudera's vision and customer success stories. There will be presentations on the new Cloudera Data Platform and customer case studies, followed by closing remarks. The schedule includes sessions on Cloudera's approach to data warehousing, machine learning, streaming and multi-cloud capabilities.
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
Cloudera Fast Forward Labs’ latest research report and prototype explore learning with limited labeled data. This capability relaxes the stringent labeled data requirement in supervised machine learning and opens up new product possibilities. It is industry invariant, addresses the labeling pain point and enables applications to be built faster and more efficiently.
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
In this session, we will cover how to move beyond structured, curated reports based on known questions on known data, to ad-hoc exploration of all data that optimizes business processes, and on to unknown questions on unknown data, where machine learning and statistically motivated predictive analytics are shaping business strategy.
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
Watch this webinar to understand how Hortonworks DataFlow (HDF) has evolved into the new Cloudera DataFlow (CDF). Learn about key capabilities that CDF delivers, such as:
-Powerful data ingestion powered by Apache NiFi
-Edge data collection by Apache MiNiFi
-IoT-scale streaming data processing with Apache Kafka
-Enterprise services to offer unified security and governance from edge-to-enterprise
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
Cloudera’s Data Science Workbench (CDSW) is available for Hortonworks Data Platform (HDP) clusters for secure, collaborative data science at scale. During this webinar, we provide an introductory tour of CDSW and a demonstration of a machine learning workflow using CDSW on HDP.
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
Join Cloudera as we outline how we use Cloudera technology to strengthen sales engagement, minimize marketing waste, and empower line of business leaders to drive successful outcomes.
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
Join us to learn about the challenges of legacy data warehousing, the goals of modern data warehousing, and the design patterns and frameworks that help to accelerate modernization efforts.
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on AWS. In this webinar, you'll see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure for benefits to both traditional analytics practitioners and data scientists and engineers.
The document discusses the benefits and trends of modernizing a data warehouse. It outlines how a modern data warehouse can provide deeper business insights at extreme speed and scale while controlling resources and costs. Examples are provided of companies that have improved fraud detection, customer retention, and machine performance by implementing a modern data warehouse that can handle large volumes and varieties of data from many sources.
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
Cloudera SDX is by no means restricted to just the platform; it extends well beyond it. In this webinar, we show you how Bardess Group’s Zero2Hero solution leverages the shared data experience to coordinate Cloudera, Trifacta, and Qlik to deliver complete customer insight.
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
Join Cloudera Fast Forward Labs Research Engineer, Mike Lee Williams, to hear about their latest research report and prototype on Federated Learning. Learn more about what it is, when it’s applicable, how it works, and the current landscape of tools and libraries.
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
451 Research Analyst Sheryl Kingstone, and Cloudera’s Steve Totman recently discussed how a growing number of organizations are replacing legacy Customer 360 systems with Customer Insights Platforms.
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
In this webinar, you will learn how Cloudera and BAH riskCanvas can help you build a modern AML platform that reduces false positive rates, investigation costs, technology sprawl, and regulatory risk.
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
How can companies integrate data science into their businesses more effectively? Watch this recorded webinar and demonstration to hear more about operationalizing data science with Cloudera Data Science Workbench on Cazena’s fully-managed cloud platform.
In this webinar, we’ll show you how Cloudera SDX reduces the complexity in your data management environment and lets you deliver diverse analytics with consistent security, governance, and lifecycle management against a shared data catalog.
Sharing Data
Single repository for all data
Organisation-wide view of data gives better insight
Effective sharing of datasets when permitted, isolation of datasets when not
Sharing Compute
Allocation of resources is dynamic, optimised, and just-in-time
Leading to better utilisation of cluster resources and better performance for individual requests (bursting)
Across workloads (batch processing, interactive SQL, enterprise search, and advanced analytics)
Consolidated Operations
Amortise (spread) administrative overhead
Reduce cost and complexity
Multiple groups (departments, projects, users)
Common set of resources (storage and compute)
Security constraints (e.g. data protection policy)
LDAP Integration
Typically done at the OS level; HDFS uses shell-based group mapping
User accounts propagated to all hosts
PAM_LDAP
SSSD
Centrify
VAS/QAS
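Whichever integration route is used, the OS and the NameNode should resolve the same group list for a user. A quick consistency check (the user name `alice` is illustrative):

```shell
# Groups as the OS (PAM_LDAP / SSSD / Centrify / VAS) resolves them
id -Gn alice
# Groups as the NameNode resolves them via shell-based group mapping;
# the two lists should match if LDAP integration is healthy on all hosts
hdfs groups alice
```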
POSIX Access Control Lists (Hadoop 2.4)
An ACL provides a way to set different permissions for specific named users or groups, beyond the file's owner and owning group
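As a sketch with the HDFS CLI (the group and path names are illustrative):

```shell
# Grant a named group read+execute beyond the owner/group/other bits
hdfs dfs -setfacl -m group:analysts:r-x /data/sales
# Default ACL: entries inherited by new files and subdirectories
hdfs dfs -setfacl -m default:group:analysts:r-x /data/sales
# Inspect the effective ACL
hdfs dfs -getfacl /data/sales
```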
Encryption
Compliance regulations: EU Data Protection Directive
Gazzang
OS-level encryption
Enterprise-grade key management (Navigator Key Trustee)
Navigator Encrypt
Encrypts the DataNode (DN) data directory in the Linux file system
Provided by kernel module
Process-based ACLs (i.e. only DN can access encrypted directory)
Project Rhino
HDFS-level encryption (Encryption Zones)
Integrated with Navigator Key Trustee
Better suited to multitenant environments
Hardware-accelerated
Encryption is unusable if it carries a significant performance penalty
Uses AES instruction set available on Intel processors
HDFS will be able to provide access to encrypted data with minimal performance impact
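A sketch of setting up an HDFS encryption zone with the CLI (key and path names are illustrative; assumes a KMS, e.g. one backed by Navigator Key Trustee):

```shell
# Create a key in the KMS, then mark an (empty) directory as an encryption zone
hadoop key create salesKey
hdfs dfs -mkdir -p /secure/sales
hdfs crypto -createZone -keyName salesKey -path /secure/sales
# Files written under /secure/sales are now transparently encrypted at rest
hdfs crypto -listZones
```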
Restrict tenants' disk usage
Prevent users from accidentally or maliciously consuming too much disk space within the cluster
Disk space quotas: disk space limits on a per directory basis
Name quotas: limits the number of files and subdirectories within a particular directory. Helps administrators control the NN metadata
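Both quota types are set per directory with dfsadmin (limits and path are illustrative):

```shell
# Disk space quota: 10 TB of raw usage (replication counts against the limit)
hdfs dfsadmin -setSpaceQuota 10t /user/tenantA
# Name quota: at most 100,000 files and directories, protecting NN metadata
hdfs dfsadmin -setQuota 100000 /user/tenantA
# Show current usage against both quotas
hdfs dfs -count -q /user/tenantA
```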
Analogy
Messi, Neymar, and Suarez pop out for pizza after training;
They do not have enough money to buy a large pizza each, so they pool their money to buy one;
Once they have the pizza, they agree on a policy to share it;
Because the pizza has 10 slices, they agree that Messi can eat 4 slices, and Neymar and Suarez can eat 3 each;
They can eat in parallel, but each can only eat one slice at a time.
I.e. route tenants' users based on their AD group membership
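The pizza policy maps directly onto YARN Fair Scheduler queue weights, and AD-group routing onto a queue placement policy. A minimal sketch of an allocation file (the queue names and 4:3:3 split mirror the analogy and are illustrative):

```shell
# Illustrative fair-scheduler.xml: weighted sharing plus placement of users
# into the queue matching their (AD-synced) primary OS group
cat > fair-scheduler.xml <<'EOF'
<allocations>
  <queue name="messi"><weight>4.0</weight></queue>
  <queue name="neymar"><weight>3.0</weight></queue>
  <queue name="suarez"><weight>3.0</weight></queue>
  <queuePlacementPolicy>
    <!-- place the job in the queue named after the user's primary group;
         create="false" prevents undeclared queues from appearing -->
    <rule name="primaryGroup" create="false"/>
    <rule name="default"/>
  </queuePlacementPolicy>
</allocations>
EOF
```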
Impala
Incoming queries are executed, queued, or rejected
Queue if too many queries or not enough memory
Reject if queue is full
Disabling undeclared pools
Applies when the user does not specify a pool
When undeclared pools are allowed, a pool is created on the fly with the name of the user that submitted the request
When undeclared pools are disabled, the default pool is used instead
Enabling the default pool
Applies when the user specifies a pool that doesn't exist
When the default pool is disabled, the requested pool is created on the fly with the default settings
When the default pool is enabled, the default pool is used instead
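For a single (default) pool, this admission-control behaviour can be tuned via impalad startup flags. A sketch (values are illustrative; flag names are from CDH-era Impala, so verify them against your version):

```shell
# Illustrative impalad flags for admission control on the default pool:
# run at most 20 queries concurrently, queue up to 50 more (reject beyond
# that), cap the pool's aggregate memory at 64 GB, and fail queries that
# sit in the queue for more than 60 seconds.
impalad \
  --default_pool_max_requests=20 \
  --default_pool_max_queued=50 \
  --default_pool_mem_limit=64g \
  --queue_wait_timeout_ms=60000
```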