SlideShare a Scribd company logo
© Hortonworks Inc. 2014
Structor – Automated Building
of Virtual Hadoop Clusters
July 2014
Page 1
Owen O’Malley
owen@hortonworks.com
@owen_omalley
© Hortonworks Inc. 2014
Page 2
•Creating a virtual Hadoop cluster is hard
–Takes time to set and configure VM
–A “learning experience” for new engineers
–Each engineer has a different setup
–Experimenting is hazardous!
•Setting up security is even harder
–Most developers don’t test with security
•Need to test Ambari and manual installs
•Need to test various operating systems
What’s the Problem?
© Hortonworks Inc. 2014
Page 3
•Create scripts to create a working
Hadoop cluster
–Secure or Non-secure
–Multiple nodes
•Vagrant
–Used for creating and managing the VMs
–VM starts as a base box with no Hadoop
•Puppet
–Used for provisioning the Hadoop packages
Solution
© Hortonworks Inc. 2014
Page 4
•We’ve put everything for a development
box into the Vagrant base box (CentOS6)
–Build tools: ant, git, java, maven, protobuf, thrift
•Downloaded once and cached
•Setup
% vagrant init omalley/centos6_x64
% vagrant up
% vagrant ssh
•Less than a minute
Simplest Case – Development Box
© Hortonworks Inc. 2014
Page 5
•Ssh in with “vagrant ssh”
–Account: vagrant, Password: vagrant
–Become root with “sudo –i”
•Clone directory to make copies
•Other useful vagrant commands:
% vagrant status – list virtual machines
% vagrant suspend – suspend virtual machines
% vagrant resume – resume virtual machines
% vagrant destroy – destroy virtual machines
Using the Box
© Hortonworks Inc. 2014
Page 6
•Commands to start cluster
% git clone git@github.com:hortonworks/structor.git
% cd structor
% vagrant up
•Default profile has 3 machines
–gw – client gateway machine
–nn – master (NameNode, ResourceMgr)
–slave1 – slaves (DataNode, NodeManager)
•HDFS, Yarn, Hive, Pig, and Zookeeper
Setting up Non-Secure Cluster
© Hortonworks Inc. 2014
Page 7
•Add hostnames to /etc/hosts
240.0.0.10 gw.example.com
240.0.0.11 nn.example.com
240.0.0.12 slave1.example.com
240.0.0.13 slave2.example.com
240.0.0.14 slave3.example.com
•HDFS – http://nn.example.com:50070/
•Yarn – http://nn.example.com:8088/
•For security
–Modify /etc/krb5.conf as in README.md.
–Use Safari or Firefox (needs config change)
Setting up your Mac
© Hortonworks Inc. 2014
Page 8
•Commands to start cluster
% ln –s profiles/3node-secure.profile current.profile
% mkdir generated (bug workaround)
% vagrant up
•Brings up 3 machines with security
–Includes a kdc and principles
•Yarn Web UI - https://nn.exaple.com:8090
•“kinit vagrant” on your Mac for Web UI
•Ssh to gw and kinit for the CLI
Setting up Secure Cluster
© Hortonworks Inc. 2014
Page 9
•JSON files that control cluster
•3 node secure cluster:
{ "domain": "example.com”, "realm": "EXAMPLE.COM",
"security": true,
"vm_mem": 2048, "server_mem": 300, "client_mem": 200,
"clients" : [ "hdfs", "yarn", "pig", "hive", "zk" ],
"nodes": [
{ "hostname": "gw", "ip": "240.0.0.10", "roles": [ "client" ] },
{ "hostname": "nn", "ip": "240.0.0.11", "roles": [ "kdc", "nn", "yarn",
"hive-meta", "hive-db”, "zk" ]},
{ "hostname": "slave1", "ip": "240.0.0.12", "roles": [ "slave" ]}]}
Profiles
© Hortonworks Inc. 2014
Page 10
•Various profiles
–1node-nonsecure
–3node-secure
–5node-nonsecure
–ambari-nonsecure
–knox-nonsecure
•Great way to setup Ambari cluster
•Project owners should add their project
–Help other developers use your project
Additional Profiles
© Hortonworks Inc. 2014
Page 11
•The master branch is Hadoop 2.4
–There is also an Hadoop 1.1 (hdp-1.3) branch
•All packages are installed via Puppet
–Uses built in OS package tools
•Repo file is in files/repos/hdp.repo
–Can override source of packages
–Easy to change to download custom builds
Choosing HDP versions
© Hortonworks Inc. 2014
Page 12
•Each configuration file is templated
•HDFS configuration is in
–modules/hdfs_client/templates/*.erb
–Changes will apply to all nodes
•We use Ruby to find NameNode:
<% @namenode = eval(@nodes).select {|node|
node[:roles].include? 'nn’} [0][:hostname] + "." + @domain; %>
<property>
<name>fs.defaultFS</name>
<value>hdfs://<%= @namenode %>:8020</value>
</property>
Configuration Files (eg. core-site.xml)
© Hortonworks Inc. 2014
Page 13
•Actual work is done via Puppet
–Hides details of each OS
•Modularized
–Top level is manifests/default.pp
–Each module is in modules/*
•Top level looks like:
include selinux
include ntp
if $security == "true" and hasrole($roles, 'kdc') {
include kerberos_kdc
}
Puppet
© Hortonworks Inc. 2014
Page 14
•Add other Hadoop ecosystem tools
–Tez
–HBase
•Add other operating systems
–Ubuntu, Suse, CentOS 5
•Support other Vagrant providers
–Amazon EC2
–Docker
•Support for other backing RDBs
Future Directions
© Hortonworks Inc. 2013
Thank You!
Questions & Answers
Page 15

More Related Content

What's hot

Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
markgrover
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoop
markgrover
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014
Janos Matyas
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
BlueData, Inc.
 
Ansible + Hadoop
Ansible + HadoopAnsible + Hadoop
Ansible + Hadoop
Michael Young
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorial
markgrover
 
Kafka Security
Kafka SecurityKafka Security
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
Kathleen Ting
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesos
nelsonadpresent
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
Steve Loughran
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
Cloudera, Inc.
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
DataWorks Summit
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
markgrover
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
Uwe Printz
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
DataWorks Summit
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
Peter Clapham
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
Hortonworks
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
Adam Muise
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Steve Loughran
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise Hadoop
Yifeng Jiang
 

What's hot (20)

Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
 
NYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache HadoopNYC HUG - Application Architectures with Apache Hadoop
NYC HUG - Application Architectures with Apache Hadoop
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
 
Ansible + Hadoop
Ansible + HadoopAnsible + Hadoop
Ansible + Hadoop
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorial
 
Kafka Security
Kafka SecurityKafka Security
Kafka Security
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesos
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise Hadoop
 

Viewers also liked

Adding ACID Updates to Hive
Adding ACID Updates to HiveAdding ACID Updates to Hive
Adding ACID Updates to Hive
Owen O'Malley
 
Data protection2015
Data protection2015Data protection2015
Data protection2015
Owen O'Malley
 
Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache Hadoop
Owen O'Malley
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in Hadoop
Owen O'Malley
 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File Introduction
Owen O'Malley
 
Next Generation MapReduce
Next Generation MapReduceNext Generation MapReduce
Next Generation MapReduce
Owen O'Malley
 
Bay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroBay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 Intro
Owen O'Malley
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
Owen O'Malley
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
Owen O'Malley
 
ORC Files
ORC FilesORC Files
ORC Files
Owen O'Malley
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
Owen O'Malley
 
ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013
Owen O'Malley
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
Owen O'Malley
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
DataWorks Summit
 
ORC 2015
ORC 2015ORC 2015
ORC 2015
t3rmin4t0r
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
Julien Le Dem
 
Hive tuning
Hive tuningHive tuning
Hive tuning
Michael Zhang
 

Viewers also liked (17)

Adding ACID Updates to Hive
Adding ACID Updates to HiveAdding ACID Updates to Hive
Adding ACID Updates to Hive
 
Data protection2015
Data protection2015Data protection2015
Data protection2015
 
Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache Hadoop
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in Hadoop
 
ORC File Introduction
ORC File IntroductionORC File Introduction
ORC File Introduction
 
Next Generation MapReduce
Next Generation MapReduceNext Generation MapReduce
Next Generation MapReduce
 
Bay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 IntroBay Area HUG Feb 2011 Intro
Bay Area HUG Feb 2011 Intro
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
ORC Files
ORC FilesORC Files
ORC Files
 
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & ParquetFile Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet
 
ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
ORC 2015
ORC 2015ORC 2015
ORC 2015
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
 
Hive tuning
Hive tuningHive tuning
Hive tuning
 

Similar to Structor - Automated Building of Virtual Hadoop Clusters

Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoop
fann wu
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
Fabio Fumarola
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for Windows
Terry Padgett
 
Extending Build to the Client: A Maven User's Guide to Grunt.js
Extending Build to the Client: A Maven User's Guide to Grunt.jsExtending Build to the Client: A Maven User's Guide to Grunt.js
Extending Build to the Client: A Maven User's Guide to Grunt.js
Petr Jiricka
 
VMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put together
VMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put togetherVMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put together
VMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put together
Eduardo Patrocinio
 
Core os dna_oscon
Core os dna_osconCore os dna_oscon
Core os dna_oscon
Patrick Galbraith
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04
Mandakini Kumari
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
Sean Roberts
 
Deploying to Ubuntu on Linode
Deploying to Ubuntu on LinodeDeploying to Ubuntu on Linode
Deploying to Ubuntu on Linode
WO Community
 
Big Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VMBig Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VM
Jeffrey Breen
 
Core os dna_automacon
Core os dna_automaconCore os dna_automacon
Core os dna_automacon
Patrick Galbraith
 
Using puppet
Using puppetUsing puppet
Using puppet
Alex Su
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
DataWorks Summit
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Docker
nklmish
 
Exp-3.pptx
Exp-3.pptxExp-3.pptx
Exp-3.pptx
PraveenKumar581409
 
Unraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production CloudUnraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production Cloud
Salman Baset
 
Tokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker SecurityTokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker Security
Phil Estes
 
Undine: Turnkey Drupal Development Environments
Undine: Turnkey Drupal Development EnvironmentsUndine: Turnkey Drupal Development Environments
Undine: Turnkey Drupal Development Environments
David Watson
 
OpenStack Summit 2013 Hong Kong - OpenStack and Windows
OpenStack Summit 2013 Hong Kong - OpenStack and WindowsOpenStack Summit 2013 Hong Kong - OpenStack and Windows
OpenStack Summit 2013 Hong Kong - OpenStack and Windows
Alessandro Pilotti
 

Similar to Structor - Automated Building of Virtual Hadoop Clusters (20)

Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoop
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for Windows
 
Extending Build to the Client: A Maven User's Guide to Grunt.js
Extending Build to the Client: A Maven User's Guide to Grunt.jsExtending Build to the Client: A Maven User's Guide to Grunt.js
Extending Build to the Client: A Maven User's Guide to Grunt.js
 
VMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put together
VMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put togetherVMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put together
VMware, SoftLayer, OpenStack, Heat, Cloud Foundry and Docker put together
 
Core os dna_oscon
Core os dna_osconCore os dna_oscon
Core os dna_oscon
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
Deploying to Ubuntu on Linode
Deploying to Ubuntu on LinodeDeploying to Ubuntu on Linode
Deploying to Ubuntu on Linode
 
Big Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VMBig Data Step-by-Step: Infrastructure 1/3: Local VM
Big Data Step-by-Step: Infrastructure 1/3: Local VM
 
Core os dna_automacon
Core os dna_automaconCore os dna_automacon
Core os dna_automacon
 
Using puppet
Using puppetUsing puppet
Using puppet
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
 
Detailed Introduction To Docker
Detailed Introduction To DockerDetailed Introduction To Docker
Detailed Introduction To Docker
 
Exp-3.pptx
Exp-3.pptxExp-3.pptx
Exp-3.pptx
 
Unraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production CloudUnraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production Cloud
 
Tokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker SecurityTokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker Security
 
Undine: Turnkey Drupal Development Environments
Undine: Turnkey Drupal Development EnvironmentsUndine: Turnkey Drupal Development Environments
Undine: Turnkey Drupal Development Environments
 
OpenStack Summit 2013 Hong Kong - OpenStack and Windows
OpenStack Summit 2013 Hong Kong - OpenStack and WindowsOpenStack Summit 2013 Hong Kong - OpenStack and Windows
OpenStack Summit 2013 Hong Kong - OpenStack and Windows
 

More from Owen O'Malley

Running An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemRunning An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid Them
Owen O'Malley
 
Big Data's Journey to ACID
Big Data's Journey to ACIDBig Data's Journey to ACID
Big Data's Journey to ACID
Owen O'Malley
 
ORC Deep Dive 2020
ORC Deep Dive 2020ORC Deep Dive 2020
ORC Deep Dive 2020
Owen O'Malley
 
Protect your private data with ORC column encryption
Protect your private data with ORC column encryptionProtect your private data with ORC column encryption
Protect your private data with ORC column encryption
Owen O'Malley
 
Fine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionFine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column Encryption
Owen O'Malley
 
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetFast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
Owen O'Malley
 
Strata NYC 2018 Iceberg
Strata NYC 2018  IcebergStrata NYC 2018  Iceberg
Strata NYC 2018 Iceberg
Owen O'Malley
 
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetFast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Owen O'Malley
 
ORC Column Encryption
ORC Column EncryptionORC Column Encryption
ORC Column Encryption
Owen O'Malley
 

More from Owen O'Malley (9)

Running An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemRunning An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid Them
 
Big Data's Journey to ACID
Big Data's Journey to ACIDBig Data's Journey to ACID
Big Data's Journey to ACID
 
ORC Deep Dive 2020
ORC Deep Dive 2020ORC Deep Dive 2020
ORC Deep Dive 2020
 
Protect your private data with ORC column encryption
Protect your private data with ORC column encryptionProtect your private data with ORC column encryption
Protect your private data with ORC column encryption
 
Fine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionFine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column Encryption
 
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetFast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
 
Strata NYC 2018 Iceberg
Strata NYC 2018  IcebergStrata NYC 2018  Iceberg
Strata NYC 2018 Iceberg
 
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetFast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
 
ORC Column Encryption
ORC Column EncryptionORC Column Encryption
ORC Column Encryption
 

Recently uploaded

Indian Privacy law & Infosec for Startups
Indian Privacy law & Infosec for StartupsIndian Privacy law & Infosec for Startups
Indian Privacy law & Infosec for Startups
AMol NAik
 
Increase Quality with User Access Policies - July 2024
Increase Quality with User Access Policies - July 2024Increase Quality with User Access Policies - July 2024
Increase Quality with User Access Policies - July 2024
Peter Caitens
 
Generative AI Reasoning Tech Talk - July 2024
Generative AI Reasoning Tech Talk - July 2024Generative AI Reasoning Tech Talk - July 2024
Generative AI Reasoning Tech Talk - July 2024
siddu769252
 
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan..."Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...
Fwdays
 
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptxFIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Alliance
 
It's your unstructured data: How to get your GenAI app to production (and spe...
It's your unstructured data: How to get your GenAI app to production (and spe...It's your unstructured data: How to get your GenAI app to production (and spe...
It's your unstructured data: How to get your GenAI app to production (and spe...
Zilliz
 
Demystifying Neural Networks And Building Cybersecurity Applications
Demystifying Neural Networks And Building Cybersecurity ApplicationsDemystifying Neural Networks And Building Cybersecurity Applications
Demystifying Neural Networks And Building Cybersecurity Applications
Priyanka Aash
 
What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024
Stephanie Beckett
 
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
Fwdays
 
Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1
DianaGray10
 
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdfDefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
Yury Chemerkin
 
Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024
Michael Price
 
AMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech DayAMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech Day
Low Hong Chuan
 
Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17
Bhajan Mehta
 
FIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptxFIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Alliance
 
Redefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI CapabilitiesRedefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI Capabilities
Priyanka Aash
 
"Making .NET Application Even Faster", Sergey Teplyakov.pptx
"Making .NET Application Even Faster", Sergey Teplyakov.pptx"Making .NET Application Even Faster", Sergey Teplyakov.pptx
"Making .NET Application Even Faster", Sergey Teplyakov.pptx
Fwdays
 
FIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptx
FIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptxFIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptx
FIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptx
FIDO Alliance
 
FIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Munich Seminar In-Vehicle Payment Trends.pptxFIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Alliance
 
FIDO Munich Seminar: Securing Smart Car.pptx
FIDO Munich Seminar: Securing Smart Car.pptxFIDO Munich Seminar: Securing Smart Car.pptx
FIDO Munich Seminar: Securing Smart Car.pptx
FIDO Alliance
 

Recently uploaded (20)

Indian Privacy law & Infosec for Startups
Indian Privacy law & Infosec for StartupsIndian Privacy law & Infosec for Startups
Indian Privacy law & Infosec for Startups
 
Increase Quality with User Access Policies - July 2024
Increase Quality with User Access Policies - July 2024Increase Quality with User Access Policies - July 2024
Increase Quality with User Access Policies - July 2024
 
Generative AI Reasoning Tech Talk - July 2024
Generative AI Reasoning Tech Talk - July 2024Generative AI Reasoning Tech Talk - July 2024
Generative AI Reasoning Tech Talk - July 2024
 
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan..."Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...
 
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptxFIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
FIDO Munich Seminar: Strong Workforce Authn Push & Pull Factors.pptx
 
It's your unstructured data: How to get your GenAI app to production (and spe...
It's your unstructured data: How to get your GenAI app to production (and spe...It's your unstructured data: How to get your GenAI app to production (and spe...
It's your unstructured data: How to get your GenAI app to production (and spe...
 
Demystifying Neural Networks And Building Cybersecurity Applications
Demystifying Neural Networks And Building Cybersecurity ApplicationsDemystifying Neural Networks And Building Cybersecurity Applications
Demystifying Neural Networks And Building Cybersecurity Applications
 
What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024What's New in Teams Calling, Meetings, Devices June 2024
What's New in Teams Calling, Meetings, Devices June 2024
 
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx
 
Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1Discovery Series - Zero to Hero - Task Mining Session 1
Discovery Series - Zero to Hero - Task Mining Session 1
 
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdfDefCamp_2016_Chemerkin_Yury_--_publish.pdf
DefCamp_2016_Chemerkin_Yury_--_publish.pdf
 
Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024Perth MuleSoft Meetup July 2024
Perth MuleSoft Meetup July 2024
 
AMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech DayAMD Zen 5 Architecture Deep Dive from Tech Day
AMD Zen 5 Architecture Deep Dive from Tech Day
 
Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17
 
FIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptxFIDO Munich Seminar Workforce Authentication Case Study.pptx
FIDO Munich Seminar Workforce Authentication Case Study.pptx
 
Redefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI CapabilitiesRedefining Cybersecurity with AI Capabilities
Redefining Cybersecurity with AI Capabilities
 
"Making .NET Application Even Faster", Sergey Teplyakov.pptx
"Making .NET Application Even Faster", Sergey Teplyakov.pptx"Making .NET Application Even Faster", Sergey Teplyakov.pptx
"Making .NET Application Even Faster", Sergey Teplyakov.pptx
 
FIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptx
FIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptxFIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptx
FIDO Munich Seminar Blueprint for In-Vehicle Payment Standard.pptx
 
FIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Munich Seminar In-Vehicle Payment Trends.pptxFIDO Munich Seminar In-Vehicle Payment Trends.pptx
FIDO Munich Seminar In-Vehicle Payment Trends.pptx
 
FIDO Munich Seminar: Securing Smart Car.pptx
FIDO Munich Seminar: Securing Smart Car.pptxFIDO Munich Seminar: Securing Smart Car.pptx
FIDO Munich Seminar: Securing Smart Car.pptx
 

Structor - Automated Building of Virtual Hadoop Clusters

  • 1. © Hortonworks Inc. 2014 Structor – Automated Building of Virtual Hadoop Clusters July 2014 Page 1 Owen O’Malley owen@hortonworks.com @owen_omalley
  • 2. © Hortonworks Inc. 2014 Page 2 •Creating a virtual Hadoop cluster is hard –Takes time to set and configure VM –A “learning experience” for new engineers –Each engineer has a different setup –Experimenting is hazardous! •Setting up security is even harder –Most developers don’t test with security •Need to test Ambari and manual installs •Need to test various operating systems What’s the Problem?
  • 3. © Hortonworks Inc. 2014 Page 3 •Create scripts to create a working Hadoop cluster –Secure or Non-secure –Multiple nodes •Vagrant –Used for creating and managing the VMs –VM starts as a base box with no Hadoop •Puppet –Used for provisioning the Hadoop packages Solution
  • 4. © Hortonworks Inc. 2014 Page 4 •We’ve put everything for a development box into the Vagrant base box (CentOS6) –Build tools: ant, git, java, maven, protobuf, thrift •Downloaded once and cached •Setup % vagrant init omalley/centos6_x64 % vagrant up % vagrant ssh •Less than a minute Simplest Case – Development Box
  • 5. © Hortonworks Inc. 2014 Page 5 •Ssh in with “vagrant ssh” –Account: vagrant, Password: vagrant –Become root with “sudo –i” •Clone directory to make copies •Other useful vagrant commands: % vagrant status – list virtual machines % vagrant suspend – suspend virtual machines % vagrant resume – resume virtual machines % vagrant destroy – destroy virtual machines Using the Box
  • 6. © Hortonworks Inc. 2014 Page 6 •Commands to start cluster % git clone git@github.com:hortonworks/structor.git % cd structor % vagrant up •Default profile has 3 machines –gw – client gateway machine –nn – master (NameNode, ResourceMgr) –slave1 – slaves (DataNode, NodeManager) •HDFS, Yarn, Hive, Pig, and Zookeeper Setting up Non-Secure Cluster
  • 7. © Hortonworks Inc. 2014 Page 7 •Add hostnames to /etc/hosts 240.0.0.10 gw.example.com 240.0.0.11 nn.example.com 240.0.0.12 slave1.example.com 240.0.0.13 slave2.example.com 240.0.0.14 slave3.example.com •HDFS – http://nn.example.com:50070/ •Yarn – http://nn.example.com:8088/ •For security –Modify /etc/krb5.conf as in README.md. –Use Safari or Firefox (needs config change) Setting up your Mac
  • 8. © Hortonworks Inc. 2014 Page 8 •Commands to start cluster % ln –s profiles/3node-secure.profile current.profile % mkdir generated (bug workaround) % vagrant up •Brings up 3 machines with security –Includes a kdc and principles •Yarn Web UI - https://nn.exaple.com:8090 •“kinit vagrant” on your Mac for Web UI •Ssh to gw and kinit for the CLI Setting up Secure Cluster
  • 9. © Hortonworks Inc. 2014 Page 9 •JSON files that control cluster •3 node secure cluster: { "domain": "example.com”, "realm": "EXAMPLE.COM", "security": true, "vm_mem": 2048, "server_mem": 300, "client_mem": 200, "clients" : [ "hdfs", "yarn", "pig", "hive", "zk" ], "nodes": [ { "hostname": "gw", "ip": "240.0.0.10", "roles": [ "client" ] }, { "hostname": "nn", "ip": "240.0.0.11", "roles": [ "kdc", "nn", "yarn", "hive-meta", "hive-db”, "zk" ]}, { "hostname": "slave1", "ip": "240.0.0.12", "roles": [ "slave" ]}]} Profiles
  • 10. © Hortonworks Inc. 2014 Page 10 •Various profiles –1node-nonsecure –3node-secure –5node-nonsecure –ambari-nonsecure –knox-nonsecure •Great way to setup Ambari cluster •Project owners should add their project –Help other developers use your project Additional Profiles
  • 11. © Hortonworks Inc. 2014 Page 11 •The master branch is Hadoop 2.4 –There is also an Hadoop 1.1 (hdp-1.3) branch •All packages are installed via Puppet –Uses built in OS package tools •Repo file is in files/repos/hdp.repo –Can override source of packages –Easy to change to download custom builds Choosing HDP versions
  • 12. © Hortonworks Inc. 2014 Page 12 •Each configuration file is templated •HDFS configuration is in –modules/hdfs_client/templates/*.erb –Changes will apply to all nodes •We use Ruby to find NameNode: <% @namenode = eval(@nodes).select {|node| node[:roles].include? 'nn’} [0][:hostname] + "." + @domain; %> <property> <name>fs.defaultFS</name> <value>hdfs://<%= @namenode %>:8020</value> </property> Configuration Files (eg. core-site.xml)
  • 13. © Hortonworks Inc. 2014 Page 13 •Actual work is done via Puppet –Hides details of each OS •Modularized –Top level is manifests/default.pp –Each module is in modules/* •Top level looks like: include selinux include ntp if $security == "true" and hasrole($roles, 'kdc') { include kerberos_kdc } Puppet
  • 14. © Hortonworks Inc. 2014 Page 14 •Add other Hadoop ecosystem tools –Tez –HBase •Add other operating systems –Ubuntu, Suse, CentOS 5 •Support other Vagrant providers –Amazon EC2 –Docker •Support for other backing RDBs Future Directions
  • 15. © Hortonworks Inc. 2013 Thank You! Questions & Answers Page 15