SlideShare a Scribd company logo
info@rittmanmead.com www.rittmanmead.com @rittmanmead
SQL and Data Integration Futures on Hadoop :
A look into the future of analytics on Oracle BDA
Mark Rittman, CTO, Rittman Mead
Enkitec E4, Barcelona, April 2016
info@rittmanmead.com www.rittmanmead.com @rittmanmead 2
•Mark Rittman, Co-Founder of Rittman Mead

‣Oracle ACE Director, specialising in Oracle BI&DW

‣14 Years Experience with Oracle Technology

‣Regular columnist for Oracle Magazine

•Author of two Oracle Press Oracle BI books

‣Oracle Business Intelligence Developers Guide

‣Oracle Exalytics Revealed

‣Writer for Rittman Mead Blog :

http://www.rittmanmead.com/blog

•Email : mark.rittman@rittmanmead.com

•Twitter : @markrittman
About the Speaker
info@rittmanmead.com www.rittmanmead.com @rittmanmead 3
•Started back in 1997 on a bank Oracle DW project

•Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL 

and shell scripts

•Went on to use Oracle Developer/2000 and Designer/2000

•Our initial users queried the DW using SQL*Plus

•And later on, we rolled-out Discoverer/2000 to everyone else

•And life was fun…
15+ Years in Oracle BI and Data Warehousing
info@rittmanmead.com www.rittmanmead.com @rittmanmead 4
•Over time, this data warehouse architecture developed

•Added Oracle Warehouse Builder to 

automate and model the DW build

•Oracle 9i Application Server (yay!) 

to deliver reports and web portals

•Data Mining and OLAP in the database

•Oracle 9i for in-database ETL (and RAC)

•Data was typically loaded from 

Oracle RBDMS and EBS

•It was turtles Oracle all the way down…
The Oracle-Centric DW Architecture

Recommended for you

Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case StudyOracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study

Presentation from BIWA Summit 2016 on using Oracle Big Data Graph and Spatial to analyse the social network around rittmanmead.com

socialgraph analysishadoop
Deploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle CloudDeploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle Cloud

As presented at UKOUG Tech15 - covers deploying full OBIEE systems including DW and ETL to Oracle Public Cloud

Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...

Mark Rittman presented at Big Data World in London in March 2017 on data integration and data warehousing for cloud, big data, and IoT. He discussed the history of data warehousing and how it has evolved from traditional RDBMS implementations to embrace big data technologies like Hadoop. He described how cloud data warehouse offerings from Google BigQuery and Amazon Redshift combine the scalability of big data with the structure of data warehousing. Rittman also covered new approaches to ETL using data pipelines, schema discovery using machine learning, emerging open-source BI tools, and his current work in these areas.

bigquerybig datadata warehousing
info@rittmanmead.com www.rittmanmead.com @rittmanmead
And then came "Big Data"...
info@rittmanmead.com www.rittmanmead.com @rittmanmead 6
•Many customers and organisations are now running initiatives around “big data”

•Some are IT-led and are looking for cost-savings around data warehouse storage + ETL

•Others are “skunkworks” projects in the marketing department that are now scaling-up

•Projects now emerging from pilot exercises

•And design patterns starting to emerge
Many Organisations are Running Big Data Initiatives
info@rittmanmead.com www.rittmanmead.com @rittmanmead
Highly Scalable (and Affordable) Cluster Computing
•Enterprise High-End RDBMSs such as Oracle can scale into the petabytes, using clustering

‣Sharded databases (e.g. Netezza) can scale further but with complexity / single workload trade-offs

•Hadoop was designed from outside for massive horizontal scalability - using cheap hardware

•Anticipates hardware failure and makes multiple copies of data as protection

•More nodes you add, more stable it becomes

•And at a fraction of the cost of traditional

RDBMS platforms
info@rittmanmead.com www.rittmanmead.com @rittmanmead
•Store and analyze huge volumes of structured and unstructured data 

•In the past, we had to throw away the detail

•No need to define a data model during ingest

•Supports multiple, flexible schemas

•Separation of storage from compute engine

•Allows multiple query engines and frameworks

to work on the same raw datasets
Store Everything Forever - And Process in Many Ways
Hadoop	Data	Lake
Webserver

Log	Files	(txt)
Social	Media

Logs	(JSON)
DB	Archives

(CSV)
Sensor	Data

(XML)
`Spatial	&	Graph

(XML,	txt)
IoT	Logs

(JSON,	txt)
Chat	Transcripts

(Txt)
DB	Transactions

(CSV,	XML)
Blogs,	Articles

(TXT,	HTML)
Raw	Data Processed	Data
NoSQL	Key-Value

Store	DB Tabular	Data

(Hive	Tables)
Aggregates

(Impala	Tables) NoSQL	Document	

Store	DB

Recommended for you

Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story

The document discusses a company's migration from their in-house computation engine to Apache Spark. It describes five key issues encountered during the migration process: 1) difficulty adapting to Spark's low-level RDD API, 2) limitations of DataSource predicates, 3) incomplete Spark SQL functionality, 4) performance issues with round trips between Spark and other systems, and 5) OutOfMemory errors due to large result sizes. Lessons learned include being aware of new Spark features and data formats, and designing architectures and data structures to minimize data movement between systems.

parquetsparkdistributed processing
A Walk Through the Kimball ETL Subsystems with Oracle Data Integration - Coll...
A Walk Through the Kimball ETL Subsystems with Oracle Data Integration - Coll...A Walk Through the Kimball ETL Subsystems with Oracle Data Integration - Coll...
A Walk Through the Kimball ETL Subsystems with Oracle Data Integration - Coll...

Big Data integration is an excellent feature in the Oracle Data Integration product suite (Oracle Data Integrator, GoldenGate, & Enterprise Data Quality). But not all analytics require big data technologies, such as labor cost, revenue, or expense reporting. Ralph Kimball, an original architect of the dimensional model in data warehousing, spent much of his career working to build an enterprise data warehouse methodology that can meet these reporting needs. His book, "The Data Warehouse ETL Toolkit", is a guide for many ETL developers. This session will walk you through his ETL Subsystem categories; Extracting, Cleaning & Conforming, Delivering, and Managing, describing how the Oracle Data Integration products are perfectly suited for the Kimball approach. Presented at Collaborate16 in Las Vegas.

goldengateoracle data integratoredq
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS

Build a simple data lake on AWS using a combination of services, including AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, Amazon Relational Database Service (Amazon RDS), and Amazon S3. Link to the blog post and video: https://garystafford.medium.com/building-a-simple-data-lake-on-aws-df21ca092e32

awsanalyticsdata lake
info@rittmanmead.com www.rittmanmead.com @rittmanmead 9
•Typical implementation of Hadoop and big data in an analytic context is the “data lake”

•Additional data storage platform with cheap storage, flexible schema support + compute

•Data lands in the data lake or reservoir in raw form, then minimally processed

•Data then accessed directly by “data scientists”, or processed further into DW
In the Context of BI & Analytics : The Data Reservoir
Data	Transfer Data	Access
Data	Factory
Data	Reservoir
Business	
Intelligence	Tools
Hadoop	Platform
File	Based	
Integration
Stream	
Based	
Integration
Data	streams
Discovery	&	Development	Labs
Safe	&	secure	Discovery	and	Development	
environment
Data	sets	and	
samples
Models	and	
programs
Marketing	/
Sales	Applications
Models
Machine
Learning
Segments
Operational	Data
Transactions
Customer
Master	ata
Unstructured	Data
Voice	+	Chat	
Transcripts
ETL	Based
Integration
Raw	
Customer	Data
Data	stored	in	
the	original	
format	(usually	
files)		such	as	
SS7,	ASN.1,	
JSON	etc.
Mapped	
Customer	Data
Data	sets	
produced	by	
mapping	and	
transforming	
raw	data
info@rittmanmead.com www.rittmanmead.com @rittmanmead 10
•Oracle Engineered system for big data processing and analysis

•Start with Oracle Big Data Appliance Starter Rack - expand up to 18 nodes per rack

•Cluster racks together for horizontal scale-out using enterprise-quality infrastructure
Oracle	Big	Data	Appliance	
Starter	Rack	+	Expansion	
• Cloudera	CDH	+	Oracle	software		
• 18	High-spec	Hadoop	Nodes	with	
InfiniBand	switches	for	internal	Hadoop	
traffic,	optimised	for	network	throughput	
• 1	Cisco	Management	Switch	
• Single	place	for	support	for	H/W	+	S/W

Deployed on Oracle Big Data Appliance
Oracle	Big	Data	Appliance	
Starter	Rack	+	Expansion	
• Cloudera	CDH	+	Oracle	software		
• 18	High-spec	Hadoop	Nodes	with	
InfiniBand	switches	for	internal	Hadoop	
traffic,	optimised	for	network	throughput	
• 1	Cisco	Management	Switch	
• Single	place	for	support	for	H/W	+	S/W

Enriched	

Customer	Profile
Modeling
Scoring
Infiniband
info@rittmanmead.com www.rittmanmead.com @rittmanmead
•Programming model for processing large data sets in parallel on a cluster

•Not specific to a particular language, but usually written in Java

•Inspired by the map and reduce functions commonly used in functional programming

‣Map() performs filtering and sorting

‣Reduce() aggregates the output of mappers

‣and a Shuffle() step to redistribute output by keys

•Resolved several complications of distributed computing:

‣Allows unlimited computations on unlimited data

‣Map and reduce functions can be easily distributed

‣Originated at Google; Hadoop was Yahoo’s open-source

implementation of MapReduce, + two are synonymous
MapReduce - The Original Big Data Query Framework
Mapper
Filter, Project
Mapper
Filter, Project
Mapper
Filter, Project
Reducer
Aggregate
Reducer
Aggregate
Output

One HDFS file per reducer,

in a directory
info@rittmanmead.com www.rittmanmead.com @rittmanmead 12
•Original developed at Facebook, now foundational within the Hadoop project

•Allows users to query Hadoop data using SQL-like language

•Tabular metadata layer that overlays files, can interpret semi-structured data (e.g. JSON)

•Generates MapReduce code to return required data

•Extensible through SerDes and Storage Handlers

•JDBC and ODBC drivers for most platforms/tools

•Perfect for set-based access + batch ETL work
Apache Hive : SQL Metadata + Engine over Hadoop

Recommended for you

Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building Blocks

The document discusses building data lakes on AWS. It describes how data lakes extend the traditional data warehouse approach by allowing storage of both structured and unstructured data at massive scales. Amazon S3 provides durable, available, scalable, and easy-to-use storage for the data lake. AWS Glue crawls data to create a data catalog and can automate ETL processes. Amazon Athena and Amazon EMR enable interactive analysis and big data processing through SQL and Spark. The data lake architecture on AWS supports a variety of analytical use cases.

awscloudcloudcomputing
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse

Doug Bateman, a principal data engineering instructor at Databricks, presented on how to build a Lakehouse architecture. He began by introducing himself and his background. He then discussed the goals of describing key Lakehouse features, explaining how Delta Lake enables it, and developing a sample Lakehouse using Databricks. The key aspects of a Lakehouse are that it supports diverse data types and workloads while enabling using BI tools directly on source data. Delta Lake provides reliability, consistency, and performance through its ACID transactions, automatic file consolidation, and integration with Spark. Bateman concluded with a demo of creating a Lakehouse.

lakehousefree trainingdatabricks
KnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge baseKnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge base

knowIT is a collaborative semantic wiki used by Johnson & Johnson to map their IT systems, applications, servers and stakeholders. It aims to capture knowledge about these informatics systems, their relationships and components to answer questions, facilitate knowledge sharing and enable self-service. The wiki uses Semantic MediaWiki and has grown to include systems portfolio management, configuration management and other features to increase IT systems knowledge across the organization.

knowledge sharingsemantic mediawiki
info@rittmanmead.com www.rittmanmead.com @rittmanmead
•Data integration tools such as Oracle Data Integrator can load and process Hadoop data

•BI tools such as Oracle Business Intelligence 12c can report on Hadoop data

•Generally use MapReduce and Hive to access data

‣ODBC and JDBC access to Hive tabular data

‣Allows Hadoop unstructured/semi-structured

data on HDFS to be accessed like RDBMS
Hive Provides a SQL Interface for BI + ETL Tools
Access direct Hive or extract using ODI12c
for structured OBIEE dashboard analysis
What pages are people visiting?
Who is referring to us on Twitter?
What content has the most reach?
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
?
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :

Recommended for you

Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House

Data Con LA 2020 Description In this session, I introduce the Amazon Redshift lake house architecture which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake. Speaker Antje Barth, Amazon Web Services, Sr. Developer Advocate, AI and Machine Learning

data con ladata con la 2020dcla
Does it only have to be ML + AI?
Does it only have to be ML + AI?Does it only have to be ML + AI?
Does it only have to be ML + AI?

The document discusses machine learning and artificial intelligence applications inside and outside of Snowflake's cloud data warehouse. It provides an overview of Snowflake and its architecture. It then discusses how machine learning can be implemented directly in the database using SQL, user-defined functions, and stored procedures. However, it notes that pure coding is not suitable for all users and that automated machine learning outside the database may be preferable to enable more business analysts and power users. It provides an example of using Amazon Forecast for time series forecasting and integrating it with Snowflake.

snowflakeclouddata warehouse
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra Solutions

INFORMATICA ONLINE TRAINING BY QUONTRA SOLUTIONS WITH PLACEMENT ASSISTANCE We offer online IT training with placements, project assistance in different platforms with real time industry consultants to provide quality training for all it professionals, corporate clients and students etc. Special features by Quontra Solutions are Extensive Training will be in both Informatica Online Training and Placement. We help you in resume preparation and conducting Mock Interviews. Emphasis is given on important topics which are essential and mostly used in real time projects. Quontra Solutions is an Online Training Leader when it comes to high-end effective and efficient I.T Training. We have always been and still are focusing on the key aspects which are providing utmost effective and competent training to both students and professionals who are eager to enrich their technical skills. Training Features at Quontra Solutions: We believe that online training has to be measured by three major aspects viz., Quality, Content and Relationship with the Trainer and Student. Not only our online training classes are important but apart from that the material which we provide are in tune with the latest IT training standards, so a student has not to worry at all whether the training imparted is outdated or latest. Course content: • Basics of data warehousing concepts • Power center components • Informatica concepts and overview • Sources • Targets • Transformations • Advanced Informatica concepts Please Visit us for the Demo Classes, we have regular batches and weekend batches. QUONTRASOLUTIONS 204-226 Imperial Drive,Rayners Lane, Harrow-HA2 7HH Phone : +44 (0)20 3734 1498 / 99 Email: info@quontrasolutions.co.uk

online trainingdatawarehouse introductiontraining classes
MapReduce is for losers
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Hive is Slow
Hive is slow

Recommended for you

How To Leverage OBIEE Within A Big Data Architecture
How To Leverage OBIEE Within A Big Data ArchitectureHow To Leverage OBIEE Within A Big Data Architecture
How To Leverage OBIEE Within A Big Data Architecture

If you've invested in OBIEE and want to start exploring the use of Big Data technology, this presentation talks about how and why you might want to use OBIEE as the common visualization layer across both.

big dataoracleobiee
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution

This session will cover building the modern Data Warehouse by migration from the traditional DW platform into the cloud, using Amazon Redshift and Cloud ETL Matillion in order to provide Self-Service BI for the business audience. This topic will cover the technical migration path of DW with PL/SQL ETL to the Amazon Redshift via Matillion ETL, with a detailed comparison of modern ETL tools. Moreover, this talk will be focusing on working backward through the process, i.e. starting from the business audience and their needs that drive changes in the old DW. Finally, this talk will cover the idea of self-service BI, and the author will share a step-by-step plan for building an efficient self-service environment using modern BI platform Tableau.

cloud analyticsawsmatillion
Talend Big Data Capabilities - 2014
Talend Big Data Capabilities - 2014Talend Big Data Capabilities - 2014
Talend Big Data Capabilities - 2014

The document discusses Talend's big data solutions and sandbox. It introduces Rajan Kanitkar as a senior solutions engineer at Talend with 15 years of experience in data integration. It then summarizes Talend's big data platform and ecosystem including Hadoop, MapReduce, HDFS, Hive and more. The rest of the document describes Talend's sandbox, which provides a pre-configured virtual image with Hadoop distributions, Talend software, and data scenarios to demonstrate ingesting, transforming and delivering big data.

very slow
too slow

for ad-hoc querying
But the future is fast
And pretty damn exciting

Recommended for you

Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution

As a follow-on to the presentation "Building an Effective Data Warehouse Architecture", this presentation will explain exactly what Big Data is and its benefits, including use cases. We will discuss how Hadoop, the cloud and massively parallel processing (MPP) is changing the way data warehouses are being built. We will talk about hybrid architectures that combine on-premise data with data in the cloud as well as relational data and non-relational (unstructured) data. We will look at the benefits of MPP over SMP and how to integrate data from Internet of Things (IoT) devices. You will learn what a modern data warehouse should look like and how the role of a Data Lake and Hadoop fit in. In the end you will have guidance on the best solution for your data warehouse going forward.

data warehousebig data
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"

AI&BigData Conference 2017 Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"

ai&bigdataconfai&bigdataconf 2017ai&bigdataconf 2017a
A New Day for Oracle Analytics
A New Day for Oracle AnalyticsA New Day for Oracle Analytics
A New Day for Oracle Analytics

The document discusses Oracle's new approach to business analytics and visualization. It notes that traditional corporate BI systems are viewed as inflexible and analytics are only for a privileged few. However, it argues there is still hope as analytics can provide a 10x ROI. The new approach involves visual analytics embedded in every Oracle solution across mobile, cloud, on-premises and big data to provide a single, integrated platform that allows business users to easily access, blend and scale insights from various data sources.

#analytics #visualization #oracle
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
info@rittmanmead.com www.rittmanmead.com @rittmanmead 26
Hadoop 2.0 Processing Frameworks + Tools
info@rittmanmead.com www.rittmanmead.com @rittmanmead 27
•MapReduce’s great innovation was to break processing down into distributed jobs

•Jobs that have no functional dependency on each other, only upstream tasks

•Provides a framework that is infinitely scalable and very fault tolerant

•Hadoop handled job scheduling and resource management

‣All MapReduce code had to do was provide the “map” and “reduce” functions

‣Automatic distributed processing

‣Slow but extremely powerful
Hadoop 1.0 and MapReduce
info@rittmanmead.com www.rittmanmead.com @rittmanmead 28
•A typical Hive or Pig script compiles down into multiple MapReduce jobs

•Each job stages its intermediate results to disk

•Safe, but slow - write to disk, spin-up separate JVMs for each job
MapReduce - Scales By Writing Intermediate Results to Disk
SELECT
LOWER(hashtags.text),
COUNT(*) AS total_count
FROM (
SELECT * FROM tweets WHERE regexp_extract(created_at,"(2015)*",1) = "2015"
) tweets
LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags
GROUP BY LOWER(hashtags.text)
ORDER BY total_count DESC
LIMIT 15
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.34 sec HDFS Read: 10952994 HDFS Write: 5239 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 2.1 sec HDFS Read: 9983 HDFS Write: 164 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 440 msec
OK
1
2

Recommended for you

Why a yorkie is right for me
Why a yorkie is right for meWhy a yorkie is right for me
Why a yorkie is right for me

The Yorkshire Terrier, commonly known as the Yorkie, is descended from the Clydesdale Terrier and Paisley Terrier, which are now extinct. It also has ancestry from the Skye Terrier and Airdale Terrier. Yorkies are small dogs with black fur and a brown underside. They are intelligent but can be snappy. This document provides a list of supplies needed to care for a Yorkie and background on the breed. It encourages adopting from rescues, sharing details about a Yorkie named Magellan available for adoption.

Ppt ta deal
Ppt ta dealPpt ta deal
Ppt ta deal

TUGAS AKHIR PPT

info@rittmanmead.com www.rittmanmead.com @rittmanmead 29
•MapReduce 2 (MR2) splits the functionality of the JobTracker

by separating resource management and job scheduling/monitoring

•Introduces YARN (Yet Another Resource Manager)

•Permits other processing frameworks to MR

‣For example, Apache Spark

•Maintains backwards compatibility with MR1

•Introduced with CDH5+
MapReduce 2 and YARN
Node

Manager
Node

Manager
Node

Manager
Resource

Manager
Client
Client
info@rittmanmead.com www.rittmanmead.com @rittmanmead 30
•Runs on top of YARN, provides a faster execution engine than MapReduce for Hive, Pig etc

•Models processing as an entire data flow graph (DAG), rather than separate job steps

‣DAG (Directed Acyclic Graph) is a new programming style for distributed systems

‣Dataflow steps pass data between them as streams, rather than writing/reading from disk

•Supports in-memory computation, enables Hive on Tez (Stinger) and Pig on Tez

•Favoured In-memory / Hive v2 

route by Hortonworks
Apache Tez
InputData
TEZ DAG
Map()
Map()
Map()
Reduce()
OutputData
Reduce()
Reduce()
Reduce()
InputData
Map()
Map()
Reduce()
Reduce()
info@rittmanmead.com www.rittmanmead.com @rittmanmead 31
Tez Advantage - Drop-In Replacement for MR with Hive, Pig
set hive.execution.engine=mr
set hive.execution.engine=tez
4m 17s
2m 25s
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :

Recommended for you

Aws seminar-tokyo ken-final-publish
Aws seminar-tokyo ken-final-publishAws seminar-tokyo ken-final-publish
Aws seminar-tokyo ken-final-publish
Excelentes dibujos de julian beever
Excelentes dibujos de julian beeverExcelentes dibujos de julian beever
Excelentes dibujos de julian beever

El documento habla sobre Julian Beever, un artista que ha estado creando ilusiones ópticas en aceras de Europa, Estados Unidos y Australia durante los últimos 10 años. Sus creaciones se ven tan realistas que parecen tridimensionales aunque en realidad son dibujos en el pavimento, lo que sorprende a quienes las ven.

lockheed martin 2006 Annual Report
lockheed martin 2006 Annual Reportlockheed martin 2006 Annual Report
lockheed martin 2006 Annual Report
financeearningsbusiness
info@rittmanmead.com www.rittmanmead.com @rittmanmead 33
•Cloudera’s answer to Hive query response time issues

•MPP SQL query engine running on Hadoop, bypasses MapReduce for
direct data access

•Mostly in-memory, but spills to disk if required

•Uses Hive metastore to access Hive table metadata

•Similar SQL dialect to Hive - not as rich though and no support for Hive
SerDes, storage handlers etc
Cloudera Impala - Fast, MPP-style Access to Hadoop Data
info@rittmanmead.com www.rittmanmead.com @rittmanmead 34
•A replacement for Hive, but uses Hive concepts and

data dictionary (metastore)

•MPP (Massively Parallel Processing) query engine

that runs within Hadoop

‣Uses same file formats, security,

resource management as Hadoop

•Processes queries in-memory

•Accesses standard HDFS file data

•Option to use Apache AVRO, RCFile,

LZO or Parquet (column-store)

•Designed for interactive, real-time

SQL-like access to Hadoop
How Impala Works
Impala
Hadoop
HDFS etc
BI Server
Presentation Svr
Cloudera Impala

ODBC Driver
Impala
Hadoop
HDFS etc
Impala
Hadoop
HDFS etc
Impala
Hadoop
HDFS etc
Impala
Hadoop
HDFS etc
Multi-Node

Hadoop Cluster
info@rittmanmead.com www.rittmanmead.com @rittmanmead 35
•Log into Impala Shell, run INVALIDATE METADATA command to refresh Impala table list

•Run SHOW TABLES Impala SQL command to view tables available

•Run COUNT(*) on main ACCESS_PER_POST table to see typical response time
Enabling Hive Tables for Impala
[oracle@bigdatalite ~]$ impala-shell
Starting Impala Shell without Kerberos authentication
[bigdatalite.localdomain:21000] > invalidate metadata;
Query: invalidate metadata
Fetched 0 row(s) in 2.18s
[bigdatalite.localdomain:21000] > show tables;
Query: show tables
+-----------------------------------+
| name |
+-----------------------------------+
| access_per_post |
| access_per_post_cat_author |
| … |
| posts |
|——————————————————————————————————-+
Fetched 45 row(s) in 0.15s
[bigdatalite.localdomain:21000] > select count(*) 

from access_per_post;
Query: select count(*) from access_per_post
+----------+
| count(*) |
+----------+
| 343 |
+----------+
Fetched 1 row(s) in 2.76s
info@rittmanmead.com www.rittmanmead.com @rittmanmead 36
•Significant improvement over Hive response time

•Now makes Hadoop suitable for ad-hoc querying
Significantly-Improved Ad-Hoc Query Response Time vs Hive
|
Logical Query Summary Stats: Elapsed time 2, Response time 1, Compilation time 0 (seconds)
Logical Query Summary Stats: Elapsed time 50, Response time 49, Compilation time 0 (seconds)
Simple Two-Table Join against Hive Data Only
Simple Two-Table Join against Impala Data Only
vs

Recommended for you

Thuyết trình nhóm 6
Thuyết trình nhóm 6Thuyết trình nhóm 6
Thuyết trình nhóm 6

The document discusses important conditions for starting new businesses and defines economic terms. The three most important conditions are low taxes, skilled staff, and low interest rates. It also provides definitions for various economic terms like interest rate, exchange rate, inflation rate, labor force, tax incentives, and balance of trade.

Infographics_book_final
Infographics_book_finalInfographics_book_final
Infographics_book_final

This infographic document provides information on what infographics are and why they are effective communication tools. It discusses that infographics tell visual stories using images and graphics to engage audiences better than plain text. Effective infographics are simple, visually pleasing, and help explain complex topics. They improve comprehension and retention of information by leveraging human visual processing abilities.

Ashutosh rubber-pvt-ltd
Ashutosh rubber-pvt-ltdAshutosh rubber-pvt-ltd
Ashutosh rubber-pvt-ltd

Ashutosh Rubber Pvt Ltd is an Indian manufacturer and exporter of rubber products established in 2007. It produces over 9000 rubber products for industries like automotive, generators, and oil. The company exports to countries including Australia, Singapore, and Denmark. It has 26-50 employees and certifications for quality standards.

info@rittmanmead.com www.rittmanmead.com @rittmanmead 37
•Beginners usually store data in HDFS using text file formats (CSV) but these have limitations

•Apache AVRO often used for general-purpose processing

‣Splitability, schema evolution, in-built metadata, support for block compression

•Parquet now commonly used with Impala due to column-orientated storage

‣Mirrors work in RDBMS world around column-store

‣Only return (project) the columns you require across a wide table
Apache Parquet - Column-Orientated Storage for Analytics
info@rittmanmead.com www.rittmanmead.com @rittmanmead 38
•But Parquet (and HDFS) have significant limitation for real-time analytics applications

‣Append-only orientation, focus on column-store 

makes streaming ingestion harder

•Cloudera Kudu aims to combine best of HDFS + HBase

‣Real-time analytics-optimised 

‣Supports updates to data

‣Fast ingestion of data

‣Accessed using SQL-style tables

and get/put/update/delete API
Cloudera Kudu - Combining Best of HBase and Column-Store
info@rittmanmead.com www.rittmanmead.com @rittmanmead 39
•Part of Oracle Big Data 4.0 (BDA-only)

‣Also requires Oracle Database 12c, Oracle Exadata Database Machine

•Extends Oracle Data Dictionary to cover Hive

•Extends Oracle SQL and SmartScan to Hadoop

•Extends Oracle Security Model over Hadoop

‣Fine-grained access control

‣Data redaction, data masking

‣Uses fast c-based readers where possible

(vs. Hive MapReduce generation)

‣Map Hadoop parallelism to Oracle PQ

‣Big Data SQL engine works on top of YARN

‣Like Spark, Tez, MR2
Oracle Big Data SQL
Exadata

Storage Servers
Hadoop

Cluster
Exadata Database

Server
Oracle Big

Data SQL
SQL Queries
SmartScan SmartScan
hold on…

Recommended for you

Building A City
Building A CityBuilding A City
Building A City

This document discusses building a city as a hands-on project to develop fine motor, visual motor, problem solving, imagination, and creativity skills in children. The project uses construction materials to build a cityscape that allows children to use their hands to physically construct buildings and structures while also using their visual skills to envision what they are creating, their problem solving to determine how pieces fit together, and their imagination to design the city layout.

Power point hbsc3103 faudzi
Power point hbsc3103 faudziPower point hbsc3103 faudzi
Power point hbsc3103 faudzi

Dokumen tersebut membahas mengenai kejuruteraan genetik yang merupakan teknik modifikasi genetika untuk meningkatkan kualitas tanaman dan hewan. Teknologi ini telah banyak digunakan dalam industri pertanian, peternakan, dan kedokteran untuk menghasilkan produk yang lebih baik dan bermanfaat bagi masyarakat. Contoh penerapannya adalah tanaman yang tahan penyakit dan cuaca ekstrem, hewan ternak

Tm31
Tm31Tm31
Tm31

Dokumen ini membahas tentang sumberdaya alam, sumberdaya manusia, dan klasifikasi sumberdaya alam. Sumberdaya alam dibedakan menjadi sumberdaya terbarukan dan tidak terbarukan, sedangkan sumberdaya manusia diukur berdasarkan jumlah penduduk yang bekerja di sektor pertanian.

Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
“where we’re going

we don’t need roads”
“where we’re going

we don’t need roads”
info@rittmanmead.com www.rittmanmead.com @rittmanmead
•Apache Drill is another SQL-on-Hadoop project that focus on schema-free data discovery

•Inspired by Google Dremel, innovation is querying raw data with schema optional

•Automatically infers and detects schema from semi-structured datasets and NoSQL DBs

•Join across different silos of data e.g. JSON records, Hive tables and HBase database

•Aimed at different use-cases than Hive - 

low-latency queries, discovery 

(think Endeca vs OBIEE)
Introducing Apache Drill - “We Don’t Need No Roads”

Recommended for you

React Lightning Design System
React Lightning Design SystemReact Lightning Design System
React Lightning Design System

Salesforce もくもく会 #10

#salesforcedevjp
1 a pengertian-dasar-statistika
1 a pengertian-dasar-statistika1 a pengertian-dasar-statistika
1 a pengertian-dasar-statistika
statistika
LATIHAN BAB 7
LATIHAN BAB 7LATIHAN BAB 7
LATIHAN BAB 7

Teks tersebut berisi soalan-soalan latihan semantik dan peribahasa. Teks tersebut memberikan contoh-contoh kata dan ungkapan yang memiliki makna ganda atau kiasan, serta meminta pembaca untuk mengidentifikasi makna sebenarnya atau ungkapan yang tepat. Teks tersebut juga memberikan jawaban untuk soalan-soalan tersebut.

info@rittmanmead.com www.rittmanmead.com @rittmanmead
•Most modern datasource formats embed their schema in the data (“schema-on-read”)

•Apache Drill makes these as easy to join to traditional datasets as “point me at the data”

•Cuts out unnecessary work in defining Hive schemas for data that’s self-describing

•Supports joining across files,

databases, NoSQL etc
Self-Describing Data - Parquet, AVRO, JSON etc
info@rittmanmead.com www.rittmanmead.com @rittmanmead
•Files can exist either on the local filesystem, or on HDFS

•Connection to directory or file defined in storage configuration

•Can work with CSV, TXT, TSV etc

•First row of file can provide schema (column names)
Apache Drill and Text Files
SELECT * FROM dfs.`/tmp/csv_with_header.csv2`;
+-------+------+------+------+
| name | num1 | num2 | num3 |
+-------+------+------+------+
| hello | 1 | 2 | 3 |
| hello | 1 | 2 | 3 |
| hello | 1 | 2 | 3 |
| hello | 1 | 2 | 3 |
| hello | 1 | 2 | 3 |
| hello | 1 | 2 | 3 |
| hello | 1 | 2 | 3 |
+-------+------+------+------+
7 rows selected (0.12 seconds)
SELECT * FROM dfs.`/tmp/csv_no_header.csv`;
+------------------------+
| columns |
+------------------------+
| ["hello","1","2","3"] |
| ["hello","1","2","3"] |
| ["hello","1","2","3"] |
| ["hello","1","2","3"] |
| ["hello","1","2","3"] |
| ["hello","1","2","3"] |
| ["hello","1","2","3"] |
+------------------------+
7 rows selected (0.112 seconds)
info@rittmanmead.com www.rittmanmead.com @rittmanmead
•JSON (Javascript Object Notation) documents are
often used for data interchange

•Exports from Twitter and other consumer services

•Web service responses and other B2B interfaces

•A more lightweight form of XML that is “self-
describing”

•Handles evolving schemas, and optional attributes

•Drill treats each document as a row, and has features
to

•Flatten nested data (extract elements from arrays)

•Generate key/value pairs for loosely structured data
Apache Drill and JSON Documents
use dfs.iot;
show files;
select in_reply_to_user_id, text from `all_tweets.json`
limit 5;
+---------------------+------+
| in_reply_to_user_id | text |
+---------------------+------+
| null | BI Forum 2013 in Brighton has now sold-out |
| null | "Football has become a numbers game |
| null | Just bought Lyndsay Wise’s Book |
| null | An Oracle BI "Blast from the Past" |
| 14716125 | Dilbert on Agile Programming |
+---------------------+------+
5 rows selected (0.229 seconds)
select name, flatten(fillings) as f 

from dfs.users.`/donuts.json` 

where f.cal < 300;
info@rittmanmead.com www.rittmanmead.com @rittmanmead
•Drill can connect to Hive to make use of metastore (incl. multiple Hive metastores)

•NoSQL databases (HBase etc)

•Parquet files (native storage format - columnar + self describing)
Apache Drill and Hive, HBase, Parquet Sources etc
USE hbase;
SELECT * FROM students;
+-------------+-----------------------+-----------------------------------------------------+
| row_key | account | address |
+-------------+-----------------------+------------------------------------------------------+
| [B@e6d9eb7 | {"name":"QWxpY2U="} | {"state":"Q0E=","street":"MTIzIEJhbGxtZXIgQXY="} |
| [B@2823a2b4 | {"name":"Qm9i"} | {"state":"Q0E=","street":"MSBJbmZpbml0ZSBMb29w"} |
| [B@3b8eec02 | {"name":"RnJhbms="} | {"state":"Q0E=","street":"NDM1IFdhbGtlciBDdA=="} |
| [B@242895da | {"name":"TWFyeQ=="} | {"state":"Q0E=","street":"NTYgU291dGhlcm4gUGt3eQ=="} |
+-------------+-----------------------+----------------------------------------------------------------------+
SELECT firstname,lastname FROM 

hiveremote.`customers` limit 10;`

+------------+------------+
| firstname | lastname |
+------------+------------+
| Essie | Vaill |
| Cruz | Roudabush |
| Billie | Tinnes |
| Zackary | Mockus |
| Rosemarie | Fifield |
| Bernard | Laboy |
| Marianne | Earman |
+------------+------------+
SELECT * FROM dfs.`iot_demo/geodata/region.parquet`;
+--------------+--------------+-----------------------+
| R_REGIONKEY | R_NAME | R_COMMENT |
+--------------+--------------+-----------------------+
| 0 | AFRICA | lar deposits. blithe |
| 1 | AMERICA | hs use ironic, even |
| 2 | ASIA | ges. thinly even pin |
| 3 | EUROPE | ly final courts cajo |
| 4 | MIDDLE EAST | uickly special accou |
+--------------+--------------+-----------------------+

Recommended for you

Company Profile- CFMS.-1
Company Profile- CFMS.-1Company Profile- CFMS.-1
Company Profile- CFMS.-1

CFMS provides facility management services including housekeeping, maintenance, and production support staff out of its headquarters in Pune and offices in Ahmedabad. It has a highly trained team and uses quality equipment and chemicals. CFMS is committed to understanding client needs and creating customized service programs. It has an in-house training center to deploy trained staff from day one and implements consistent management systems. Some of CFMS' clients include restaurants, hotels, and automation companies in Pune.

Латеральный маркетинг
Латеральный маркетингЛатеральный маркетинг
Латеральный маркетинг
marketing
環境依存しないSalesforce組織の作り方
環境依存しないSalesforce組織の作り方環境依存しないSalesforce組織の作り方
環境依存しないSalesforce組織の作り方

Salesforce Developers Life

salesforce.com
info@rittmanmead.com www.rittmanmead.com @rittmanmead
•Drill developed for real-time, ad-hoc data exploration with schema discovery on-the-fly
•Individual analysts exploring new datasets, leveraging corporate metadata/data to help

•Hive is more about large-scale, centrally curated set-based big data access

•Drill models conceptually as JSON, vs. Hive’s tabular approach

•Drill introspects schema from whatever it connects to, vs. formal modeling in Hive
Apache Drill vs. Apache Hive
Interactive Queries

(Data Discovery, Tableau/VA)
Reporting Queries

(Canned Reports, OBIEE)
ETL

(ODI, Scripting, Informatica)
Apache Drill Apache Hive
Interactive Queries
100ms - 3mins
Reporting Queries
3mins - 20mins
ETL & Batch Queries
20mins - hours
but…
what’s all this about “Spark”?
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :

Recommended for you

Lorenzoysucazo
LorenzoysucazoLorenzoysucazo
Lorenzoysucazo

Este documento habla sobre una persona extraordinaria que ayudó a Lorenzo a superar sus miedos y sacar la cabeza de su cazo. Esta persona le mostró sus puntos fuertes y le ayudó a expresar sus temores. Ahora Lorenzo puede jugar con los demás y los demás ven sus cualidades, aunque él sigue siendo el mismo.

integracióndiscapacidad
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...

Mark Rittman, CTO of Rittman Mead, gave a keynote presentation on big data for Oracle developers and DBAs with a focus on Apache Spark, real-time analytics, and predictive analytics. He discussed how Hadoop can provide flexible, cheap storage for logs, feeds, and social data. He also explained several Hadoop processing frameworks like Apache Spark, Apache Tez, Cloudera Impala, and Apache Drill that provide faster alternatives to traditional MapReduce processing.

apache drillimpalakudu
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...

Hadoop and NoSQL platforms initially focused on Java developers and slow but massively-scalable MapReduce jobs as an alternative to high-end but limited-scale analytics RDBMS engines. Apache Hive opened-up Hadoop to non-programmers by adding a SQL query engine and relational-style metadata layered over raw HDFS storage, and since then open-source initiatives such as Hive Stinger, Cloudera Impala and Apache Drill along with proprietary solutions from closed-source vendors have extended SQL-on-Hadoop’s capabilities into areas such as low-latency ad-hoc queries, ACID-compliant transactions and schema-less data discovery – at massive scale and with compelling economics. In this session we’ll focus on technical foundations around SQL-on-Hadoop, first reviewing the basic platform Apache Hive provides and then looking in more detail at how ad-hoc querying, ACID-compliant transactions and data discovery engines work along with more specialised underlying storage that each now work best with – and we’ll take a look to the future to see how SQL querying, data integration and analytics are likely to come together in the next five years to make Hadoop the default platform running mixed old-world/new-world analytics workloads.

hivedrillinghadoop
info@rittmanmead.com www.rittmanmead.com @rittmanmead 53
•Another DAG execution engine running on YARN

•More mature than TEZ, with richer API and more vendor support

•Uses concept of an RDD (Resilient Distributed Dataset)

‣RDDs like tables or Pig relations, but can be cached in-memory

‣Great for in-memory transformations, or iterative/cyclic processes

•Spark jobs comprise of a DAG of tasks operating on RDDs

•Access through Scala, Python or Java APIs

•Related projects include

‣Spark SQL

‣Spark Streaming
Apache Spark
info@rittmanmead.com www.rittmanmead.com @rittmanmead 54
•Native support for multiple languages 

with identical APIs

‣Python - prototyping, data wrangling

‣Scala - functional programming features

‣Java - lower-level, application integration

•Use of closures, iterations, and other 

common language constructs to minimize code

•Integrated support for distributed +

functional programming

•Unified API for batch and streaming
Rich Developer Support + Wide Developer Ecosystem
scala> val logfile = sc.textFile("logs/access_log")
14/05/12 21:18:59 INFO MemoryStore: ensureFreeSpace(77353) 

called with curMem=234759, maxMem=309225062
14/05/12 21:18:59 INFO MemoryStore: Block broadcast_2 

stored as values to memory (estimated size 75.5 KB, free 294.6 MB)
logfile: org.apache.spark.rdd.RDD[String] = 

MappedRDD[31] at textFile at <console>:15
scala> logfile.count()
14/05/12 21:19:06 INFO FileInputFormat: Total input paths to process : 1
14/05/12 21:19:06 INFO SparkContext: Starting job: count at <console>:1
...
14/05/12 21:19:06 INFO SparkContext: Job finished: 

count at <console>:18, took 0.192536694 s
res7: Long = 154563
scala> val logfile = sc.textFile("logs/access_log").cache
scala> val biapps11g = logfile.filter(line => line.contains("/biapps11g/"))
biapps11g: org.apache.spark.rdd.RDD[String] = FilteredRDD[34] at filter at <console>:17
scala> biapps11g.count()
...
14/05/12 21:28:28 INFO SparkContext: Job finished: count at <console>:20, took 0.387960876 s
res9: Long = 403
info@rittmanmead.com www.rittmanmead.com @rittmanmead 55
•Spark SQL, and Data Frames, allow RDDs in Spark to be processed using SQL queries

•Bring in and federate additional data from JDBC sources

•Load, read and save data in Hive, Parquet and other structured tabular formats
Spark SQL - Adding SQL Processing to Apache Spark
val accessLogsFilteredDF = accessLogs
.filter( r => ! r.agent.matches(".*(spider|robot|bot|slurp).*"))
.filter( r => ! r.endpoint.matches(".*(wp-content|wp-admin).*")).toDF()
.registerTempTable("accessLogsFiltered")
val topTenPostsLast24Hour = sqlContext.sql("SELECT p.POST_TITLE, p.POST_AUTHOR, COUNT(*) 

as total 

FROM accessLogsFiltered a 

JOIN posts p ON a.endpoint = p.POST_SLUG 

GROUP BY p.POST_TITLE, p.POST_AUTHOR 

ORDER BY total DESC LIMIT 10 ")
// Persist top ten table for this window to HDFS as parquet file
topTenPostsLast24Hour.save("/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.parquet"

, "parquet", SaveMode.Overwrite)
info@rittmanmead.com www.rittmanmead.com @rittmanmead 56
Choosing the Appropriate SQL-on-Hadoop Engine
Large-Scale Batch Fast and Ad-hoc?
or…

Recommended for you

New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...

Most DBAs are aware something interesting is going on with big data and the Hadoop product ecosystem that underpins it, but aren't so clear about what each component in the stack does, what problem each part solves and why those problems couldn't be solved using the old approach. We'll look at where it's all going with the advent of Spark and machine learning, what's happening with ETL, metadata and analytics on this platform ... why IaaS and datawarehousing-as-a-service will have such a big impact, sooner than you think

hadoopanalyticsgoogle
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012

This document discusses Microsoft's efforts to make big data technologies like Hadoop more accessible through its products. It describes Hadoop, MapReduce, HDFS, and other big data concepts. It then outlines Microsoft's project to create a Hadoop distribution that runs on Windows Server and Windows Azure, including building an ODBC driver to allow tools like Excel to query Hadoop. This will help bring big data to more business users and integrate it with Microsoft's existing BI technologies.

Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data

A primer on Big Data, Hadoop and Microsoft's implementation of Hadoop for Windows Server and Windows Azure.

isotopemsbimicrosoft
info@rittmanmead.com www.rittmanmead.com @rittmanmead 57
Choosing the Appropriate SQL-on-Hadoop Engine
Centralised &

Ordered Metadata
Crazy, Complex and

Changing Schema?
or…
info@rittmanmead.com www.rittmanmead.com @rittmanmead 58
Choosing the Appropriate SQL-on-Hadoop Engine
Integrating with

Existing Corporate DW
Looking to Escape

Legacy DWs to Hadoop?
or…
Coming soon (*)
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :

Recommended for you

Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop

This document summarizes a meetup about Big Data and SQL on Hadoop. The meetup included discussions on what Hadoop is, why SQL on Hadoop is useful, what Hive is, and introduced IBM's BigInsights software for running SQL on Hadoop with improved performance over other solutions. Key topics included HDFS file storage, MapReduce processing, Hive tables and metadata storage, and how BigInsights provides a massively parallel SQL engine instead of relying on MapReduce.

Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)

This document discusses big data analytics platforms and techniques. It describes various open-source projects like Hadoop, Spark, and Mahout that can perform analytics on large datasets. It also discusses commercial analytics platforms from vendors like SAS, Alpine, and Revolution Analytics. Spark is highlighted as gaining rapid adoption for its speed and expanding machine learning capabilities. Key questions are raised about which open-source projects and commercial offerings will emerge as leaders in their categories.

apache sparkapache giraphhadoop
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...

Presentation given at the Danish BI Meetup in September 2016, on the future of BI and how Hadoop, and Big Data, will revolutionise the industry.

hadoopbig databusiness intelligence
(*) possibly
Check-out these Developer VMs:
http://www.cloudera.com/documentation/enterprise/5-3-x/topics/
cloudera_quickstart_vm.html



http://hortonworks.com/products/sandbox/
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-
bigdatalite-2104726.html
https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill
http://www.rittmanmead.com
info@rittmanmead.com www.rittmanmead.com @rittmanmead
SQL and Data Integration Futures on Hadoop :
A look into the future of analytics on Oracle BDA
Mark Rittman, CTO, Rittman Mead
Enkitec E4, Barcelona, April 2016

Recommended for you

Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech

The document discusses cloud computing systems and MapReduce. It provides background on MapReduce, describing how it works and how it was inspired by functional programming concepts like map and reduce. It also discusses some limitations of MapReduce, noting that it is not designed for general-purpose parallel processing and can be inefficient for certain types of workloads. Alternative approaches like MRlite and DCell are proposed to provide more flexible and efficient distributed processing frameworks.

Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing

This document provides an overview of big data processing techniques including batch processing using MapReduce and Hive, iterative batch processing using Spark, stream processing using Apache Storm, and OLAP over big data using Dremel and Druid. It discusses techniques such as MapReduce, Hive, Spark RDDs, and Storm tuples for processing large datasets and compares small versus big data approaches. Example usages and technologies for different processing types are also outlined.

big datamachine learningsoftware development
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros

This document provides an agenda and overview for a conference session on Big Data and NoSQL for database and BI professionals held from April 10-12 in Chicago, IL. The session will include an overview of big data and NoSQL technologies, then deeper dives into Hadoop, NoSQL databases like HBase, and tools like Hive, Pig, and Sqoop. There will also be demos of technologies like HDInsight, Elastic MapReduce, Impala, and running MapReduce jobs.

#nosql #bi #businessintelligence#bigdata

More Related Content

What's hot

Using Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's ToolkitUsing Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's Toolkit
Mark Rittman
 
Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...
Mark Rittman
 
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
Mark Rittman
 
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case StudyOracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Mark Rittman
 
Deploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle CloudDeploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle Cloud
Mark Rittman
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Rittman Analytics
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story
Roman Chukh
 
A Walk Through the Kimball ETL Subsystems with Oracle Data Integration - Coll...
A Walk Through the Kimball ETL Subsystems with Oracle Data Integration - Coll...A Walk Through the Kimball ETL Subsystems with Oracle Data Integration - Coll...
A Walk Through the Kimball ETL Subsystems with Oracle Data Integration - Coll...
Michael Rainey
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
Gary Stafford
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building Blocks
Amazon Web Services
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
Databricks
 
KnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge baseKnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge base
Laurent Alquier
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
Data Con LA
 
Does it only have to be ML + AI?
Does it only have to be ML + AI?Does it only have to be ML + AI?
Does it only have to be ML + AI?
Harald Erb
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra Solutions
Quontra Solutions
 
How To Leverage OBIEE Within A Big Data Architecture
How To Leverage OBIEE Within A Big Data ArchitectureHow To Leverage OBIEE Within A Big Data Architecture
How To Leverage OBIEE Within A Big Data Architecture
Kevin McGinley
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Dmitry Anoshin
 
Talend Big Data Capabilities - 2014
Talend Big Data Capabilities - 2014Talend Big Data Capabilities - 2014
Talend Big Data Capabilities - 2014
Rajan Kanitkar
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
James Serra
 
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
DataConf
 

What's hot (20)

Using Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's ToolkitUsing Oracle Big Data Discovey as a Data Scientist's Toolkit
Using Oracle Big Data Discovey as a Data Scientist's Toolkit
 
Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...
 
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...
 
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case StudyOracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
 
Deploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle CloudDeploying Full BI Platforms to Oracle Cloud
Deploying Full BI Platforms to Oracle Cloud
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
 
Spark - Migration Story
Spark - Migration Story Spark - Migration Story
Spark - Migration Story
 
A Walk Through the Kimball ETL Subsystems with Oracle Data Integration - Coll...
A Walk Through the Kimball ETL Subsystems with Oracle Data Integration - Coll...A Walk Through the Kimball ETL Subsystems with Oracle Data Integration - Coll...
A Walk Through the Kimball ETL Subsystems with Oracle Data Integration - Coll...
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building Blocks
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
KnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge baseKnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge base
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Does it only have to be ML + AI?
Does it only have to be ML + AI?Does it only have to be ML + AI?
Does it only have to be ML + AI?
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra Solutions
 
How To Leverage OBIEE Within A Big Data Architecture
How To Leverage OBIEE Within A Big Data ArchitectureHow To Leverage OBIEE Within A Big Data Architecture
How To Leverage OBIEE Within A Big Data Architecture
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
 
Talend Big Data Capabilities - 2014
Talend Big Data Capabilities - 2014Talend Big Data Capabilities - 2014
Talend Big Data Capabilities - 2014
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
Eugene Polonichko "Azure Data Lake: what is it? why is it? where is it?"
 

Viewers also liked

A New Day for Oracle Analytics
A New Day for Oracle AnalyticsA New Day for Oracle Analytics
A New Day for Oracle Analytics
Rich Clayton
 
Why a yorkie is right for me
Why a yorkie is right for meWhy a yorkie is right for me
Why a yorkie is right for me
gmancino
 
Ppt ta deal
Ppt ta dealPpt ta deal
Ppt ta deal
Minie Belle
 
Aws seminar-tokyo ken-final-publish
Aws seminar-tokyo ken-final-publishAws seminar-tokyo ken-final-publish
Aws seminar-tokyo ken-final-publish
awsadovantageseminar
 
Excelentes dibujos de julian beever
Excelentes dibujos de julian beeverExcelentes dibujos de julian beever
Excelentes dibujos de julian beever
Salvador Mata Sosa
 
lockheed martin 2006 Annual Report
lockheed martin 2006 Annual Reportlockheed martin 2006 Annual Report
lockheed martin 2006 Annual Report
finance6
 
Thuyết trình nhóm 6
Thuyết trình nhóm 6Thuyết trình nhóm 6
Thuyết trình nhóm 6
Minh Thanh Thanh
 
Infographics_book_final
Infographics_book_finalInfographics_book_final
Infographics_book_final
Vuyokazi Sodo
 
Ashutosh rubber-pvt-ltd
Ashutosh rubber-pvt-ltdAshutosh rubber-pvt-ltd
Ashutosh rubber-pvt-ltd
Mr. Daxesh Kothari
 
Building A City
Building A CityBuilding A City
Building A City
Christie Lee
 
Power point hbsc3103 faudzi
Power point hbsc3103 faudziPower point hbsc3103 faudzi
Power point hbsc3103 faudzi
Zauhari Hussein
 
Tm31
Tm31Tm31
React Lightning Design System
React Lightning Design SystemReact Lightning Design System
React Lightning Design System
Taiki Yoshikawa
 
1 a pengertian-dasar-statistika
1 a pengertian-dasar-statistika1 a pengertian-dasar-statistika
1 a pengertian-dasar-statistika
Salma Van Licht
 
LATIHAN BAB 7
LATIHAN BAB 7LATIHAN BAB 7
LATIHAN BAB 7
Zakira Hafizah
 
Company Profile- CFMS.-1
Company Profile- CFMS.-1Company Profile- CFMS.-1
Company Profile- CFMS.-1
Shashi Singh
 
Латеральный маркетинг
Латеральный маркетингЛатеральный маркетинг
Латеральный маркетинг
Ermolina Lera
 
環境依存しないSalesforce組織の作り方
環境依存しないSalesforce組織の作り方環境依存しないSalesforce組織の作り方
環境依存しないSalesforce組織の作り方
Taiki Yoshikawa
 
Lorenzoysucazo
LorenzoysucazoLorenzoysucazo
Lorenzoysucazo
Bibiana Del Bianco
 

Viewers also liked (20)

A New Day for Oracle Analytics
A New Day for Oracle AnalyticsA New Day for Oracle Analytics
A New Day for Oracle Analytics
 
Why a yorkie is right for me
Why a yorkie is right for meWhy a yorkie is right for me
Why a yorkie is right for me
 
Ppt ta deal
Ppt ta dealPpt ta deal
Ppt ta deal
 
LATIHAN BAB 2
LATIHAN BAB 2LATIHAN BAB 2
LATIHAN BAB 2
 
Aws seminar-tokyo ken-final-publish
Aws seminar-tokyo ken-final-publishAws seminar-tokyo ken-final-publish
Aws seminar-tokyo ken-final-publish
 
Excelentes dibujos de julian beever
Excelentes dibujos de julian beeverExcelentes dibujos de julian beever
Excelentes dibujos de julian beever
 
lockheed martin 2006 Annual Report
lockheed martin 2006 Annual Reportlockheed martin 2006 Annual Report
lockheed martin 2006 Annual Report
 
Thuyết trình nhóm 6
Thuyết trình nhóm 6Thuyết trình nhóm 6
Thuyết trình nhóm 6
 
Infographics_book_final
Infographics_book_finalInfographics_book_final
Infographics_book_final
 
Ashutosh rubber-pvt-ltd
Ashutosh rubber-pvt-ltdAshutosh rubber-pvt-ltd
Ashutosh rubber-pvt-ltd
 
Building A City
Building A CityBuilding A City
Building A City
 
Power point hbsc3103 faudzi
Power point hbsc3103 faudziPower point hbsc3103 faudzi
Power point hbsc3103 faudzi
 
Tm31
Tm31Tm31
Tm31
 
React Lightning Design System
React Lightning Design SystemReact Lightning Design System
React Lightning Design System
 
1 a pengertian-dasar-statistika
1 a pengertian-dasar-statistika1 a pengertian-dasar-statistika
1 a pengertian-dasar-statistika
 
LATIHAN BAB 7
LATIHAN BAB 7LATIHAN BAB 7
LATIHAN BAB 7
 
Company Profile- CFMS.-1
Company Profile- CFMS.-1Company Profile- CFMS.-1
Company Profile- CFMS.-1
 
Латеральный маркетинг
Латеральный маркетингЛатеральный маркетинг
Латеральный маркетинг
 
環境依存しないSalesforce組織の作り方
環境依存しないSalesforce組織の作り方環境依存しないSalesforce組織の作り方
環境依存しないSalesforce組織の作り方
 
Lorenzoysucazo
LorenzoysucazoLorenzoysucazo
Lorenzoysucazo
 

Similar to Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :

IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
Mark Rittman
 
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Mark Rittman
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
Rittman Analytics
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Andrew Brust
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
Andrew Brust
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
bddmoscow
 
Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)
Thomas W. Dinsmore
 
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
Mark Rittman
 
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
Jakir Hossain
 
Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing
Michael Ming Lei
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
Andrew Brust
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
Zahra Eskandari
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
N Masahiro
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
Mark Rittman
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data Platforms
Mark Rittman
 
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerPhilly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
Mark Kromer
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
Adaryl "Bob" Wakefield, MBA
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
Andrew Brust
 
Big Data training
Big Data trainingBig Data training
Big Data training
vishal192091
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelin
prajods
 

Similar to Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : (20)

IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
 
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 
Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)Advanced Analytics and Big Data (August 2014)
Advanced Analytics and Big Data (August 2014)
 
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
 
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
 
Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data Platforms
 
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerPhilly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
Big Data training
Big Data trainingBig Data training
Big Data training
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelin
 

More from Mark Rittman

Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
Mark Rittman
 
What is Big Data Discovery, and how it complements traditional business anal...
What is Big Data Discovery, and how it complements  traditional business anal...What is Big Data Discovery, and how it complements  traditional business anal...
What is Big Data Discovery, and how it complements traditional business anal...
Mark Rittman
 
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Mark Rittman
 
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
Mark Rittman
 
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
Mark Rittman
 
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
Mark Rittman
 
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODIBIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
Mark Rittman
 
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI ProjectsOGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
Mark Rittman
 
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cUKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
Mark Rittman
 
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Mark Rittman
 
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11gPart 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
Mark Rittman
 
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cPart 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Mark Rittman
 

More from Mark Rittman (12)

Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...
 
What is Big Data Discovery, and how it complements traditional business anal...
What is Big Data Discovery, and how it complements  traditional business anal...What is Big Data Discovery, and how it complements  traditional business anal...
What is Big Data Discovery, and how it complements traditional business anal...
 
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015
 
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...
 
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...
 
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015
 
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODIBIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI
 
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI ProjectsOGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects
 
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cUKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c
 
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...
 
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11gPart 4 - Hadoop Data Output and Reporting using OBIEE11g
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
 
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cPart 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
 

Recently uploaded

Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
1258strict
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
kihus38
 
@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here
@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here
@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here
SARITA PANDEY
 
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model SafeSaket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
shruti singh$A17
 
@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...
@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...
@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...
ritu36392
 
Bangalore @Call @Girls 0000000000 Riya Khan Beautiful And Cute Girl any Time
Bangalore @Call @Girls 0000000000 Riya Khan Beautiful And Cute Girl any TimeBangalore @Call @Girls 0000000000 Riya Khan Beautiful And Cute Girl any Time
Bangalore @Call @Girls 0000000000 Riya Khan Beautiful And Cute Girl any Time
adityaroy0215
 
BIGPPTTTTTTTTtttttttttttttttttttttt.pptx
BIGPPTTTTTTTTtttttttttttttttttttttt.pptxBIGPPTTTTTTTTtttttttttttttttttttttt.pptx
BIGPPTTTTTTTTtttttttttttttttttttttt.pptx
RajdeepPaul47
 
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
Donghwan Lee
 
[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction
Amazon Web Services Korea
 
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
qemnpg
 
Streamlining Legacy Complexity Through Modernization
Streamlining Legacy Complexity Through ModernizationStreamlining Legacy Complexity Through Modernization
Streamlining Legacy Complexity Through Modernization
sanjay singh
 
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
Delhi Call Girls
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
#kalyanmatkaresult #dpboss #kalyanmatka #satta #matka #sattamatka
 
@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...
@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...
@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...
shivvichadda
 
Simon Fraser University degree offer diploma Transcript
Simon Fraser University  degree offer diploma TranscriptSimon Fraser University  degree offer diploma Transcript
Simon Fraser University degree offer diploma Transcript
taqyea
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
#kalyanmatkaresult #dpboss #kalyanmatka #satta #matka #sattamatka
 
Seamlessly Pay Online, Pay In Stores or Send Money
Seamlessly Pay Online, Pay In Stores or Send MoneySeamlessly Pay Online, Pay In Stores or Send Money
Seamlessly Pay Online, Pay In Stores or Send Money
gargtinna79
 
AIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on AzureAIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on Azure
SanelaNikodinoska1
 
Orange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdf
Orange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdfOrange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdf
Orange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdf
RealDarrah
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
���❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
#kalyanmatkaresult #dpboss #kalyanmatka #satta #matka #sattamatka
 

Recently uploaded (20)

Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
 
Introduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdfIntroduction to the Red Hat Portfolio.pdf
Introduction to the Red Hat Portfolio.pdf
 
@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here
@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here
@Call @Girls Bandra phone 9920874524 You Are Serach A Beautyfull Dolle come here
 
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model SafeSaket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
Saket @ℂall @Girls ꧁❤ 9873777170 ❤꧂VIP Neha Singla Top Model Safe
 
@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...
@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...
@Call @Girls in Bangalore 🚒 0000000000 🚒 Tanu Sharma Best High Class Bangalor...
 
Bangalore @Call @Girls 0000000000 Riya Khan Beautiful And Cute Girl any Time
Bangalore @Call @Girls 0000000000 Riya Khan Beautiful And Cute Girl any TimeBangalore @Call @Girls 0000000000 Riya Khan Beautiful And Cute Girl any Time
Bangalore @Call @Girls 0000000000 Riya Khan Beautiful And Cute Girl any Time
 
BIGPPTTTTTTTTtttttttttttttttttttttt.pptx
BIGPPTTTTTTTTtttttttttttttttttttttt.pptxBIGPPTTTTTTTTtttttttttttttttttttttt.pptx
BIGPPTTTTTTTTtttttttttttttttttttttt.pptx
 
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
[D2T2S04] SageMaker를 활용한 Generative AI Foundation Model Training and Tuning
 
[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction[D3T1S02] Aurora Limitless Database Introduction
[D3T1S02] Aurora Limitless Database Introduction
 
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
一比一原版英国埃塞克斯大学毕业证(essex毕业证书)如何办理
 
Streamlining Legacy Complexity Through Modernization
Streamlining Legacy Complexity Through ModernizationStreamlining Legacy Complexity Through Modernization
Streamlining Legacy Complexity Through Modernization
 
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
@Call @Girls in Kolkata 💋😂 XXXXXXXX 👄👄 Hello My name Is Kamli I am Here meet you
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
 
@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...
@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...
@Call @Girls Coimbatore 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl a...
 
Simon Fraser University degree offer diploma Transcript
Simon Fraser University  degree offer diploma TranscriptSimon Fraser University  degree offer diploma Transcript
Simon Fraser University degree offer diploma Transcript
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN MATKA RESULTS KALYAN CHART KALYAN MATKA ...
 
Seamlessly Pay Online, Pay In Stores or Send Money
Seamlessly Pay Online, Pay In Stores or Send MoneySeamlessly Pay Online, Pay In Stores or Send Money
Seamlessly Pay Online, Pay In Stores or Send Money
 
AIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on AzureAIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on Azure
 
Orange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdf
Orange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdfOrange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdf
Orange Yellow Gradient Aesthetic Y2K Creative Portfolio Presentation -3.pdf
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
 

Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :

  • 1. info@rittmanmead.com www.rittmanmead.com @rittmanmead SQL and Data Integration Futures on Hadoop : A look into the future of analytics on Oracle BDA Mark Rittman, CTO, Rittman Mead Enkitec E4, Barcelona, April 2016
  • 2. info@rittmanmead.com www.rittmanmead.com @rittmanmead 2 •Mark Rittman, Co-Founder of Rittman Mead ‣Oracle ACE Director, specialising in Oracle BI&DW ‣14 Years Experience with Oracle Technology ‣Regular columnist for Oracle Magazine •Author of two Oracle Press Oracle BI books ‣Oracle Business Intelligence Developers Guide ‣Oracle Exalytics Revealed ‣Writer for Rittman Mead Blog :
 http://www.rittmanmead.com/blog •Email : mark.rittman@rittmanmead.com •Twitter : @markrittman About the Speaker
  • 3. info@rittmanmead.com www.rittmanmead.com @rittmanmead 3 •Started back in 1997 on a bank Oracle DW project •Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL 
 and shell scripts •Went on to use Oracle Developer/2000 and Designer/2000 •Our initial users queried the DW using SQL*Plus •And later on, we rolled-out Discoverer/2000 to everyone else •And life was fun… 15+ Years in Oracle BI and Data Warehousing
  • 4. info@rittmanmead.com www.rittmanmead.com @rittmanmead 4 •Over time, this data warehouse architecture developed •Added Oracle Warehouse Builder to 
 automate and model the DW build •Oracle 9i Application Server (yay!) 
 to deliver reports and web portals •Data Mining and OLAP in the database •Oracle 9i for in-database ETL (and RAC) •Data was typically loaded from 
 Oracle RBDMS and EBS •It was turtles Oracle all the way down… The Oracle-Centric DW Architecture
  • 6. info@rittmanmead.com www.rittmanmead.com @rittmanmead 6 •Many customers and organisations are now running initiatives around “big data” •Some are IT-led and are looking for cost-savings around data warehouse storage + ETL •Others are “skunkworks” projects in the marketing department that are now scaling-up •Projects now emerging from pilot exercises •And design patterns starting to emerge Many Organisations are Running Big Data Initiatives
  • 7. info@rittmanmead.com www.rittmanmead.com @rittmanmead Highly Scalable (and Affordable) Cluster Computing •Enterprise High-End RDBMSs such as Oracle can scale into the petabytes, using clustering ‣Sharded databases (e.g. Netezza) can scale further but with complexity / single workload trade-offs •Hadoop was designed from outside for massive horizontal scalability - using cheap hardware •Anticipates hardware failure and makes multiple copies of data as protection •More nodes you add, more stable it becomes •And at a fraction of the cost of traditional
 RDBMS platforms
  • 8. info@rittmanmead.com www.rittmanmead.com @rittmanmead •Store and analyze huge volumes of structured and unstructured data •In the past, we had to throw away the detail •No need to define a data model during ingest •Supports multiple, flexible schemas •Separation of storage from compute engine •Allows multiple query engines and frameworks
 to work on the same raw datasets Store Everything Forever - And Process in Many Ways Hadoop Data Lake Webserver
 Log Files (txt) Social Media
 Logs (JSON) DB Archives
 (CSV) Sensor Data
 (XML) `Spatial & Graph
 (XML, txt) IoT Logs
 (JSON, txt) Chat Transcripts
 (Txt) DB Transactions
 (CSV, XML) Blogs, Articles
 (TXT, HTML) Raw Data Processed Data NoSQL Key-Value
 Store DB Tabular Data
 (Hive Tables) Aggregates
 (Impala Tables) NoSQL Document 
 Store DB
  • 9. info@rittmanmead.com www.rittmanmead.com @rittmanmead 9 •Typical implementation of Hadoop and big data in an analytic context is the “data lake” •Additional data storage platform with cheap storage, flexible schema support + compute •Data lands in the data lake or reservoir in raw form, then minimally processed •Data then accessed directly by “data scientists”, or processed further into DW In the Context of BI & Analytics : The Data Reservoir Data Transfer Data Access Data Factory Data Reservoir Business Intelligence Tools Hadoop Platform File Based Integration Stream Based Integration Data streams Discovery & Development Labs Safe & secure Discovery and Development environment Data sets and samples Models and programs Marketing / Sales Applications Models Machine Learning Segments Operational Data Transactions Customer Master ata Unstructured Data Voice + Chat Transcripts ETL Based Integration Raw Customer Data Data stored in the original format (usually files) such as SS7, ASN.1, JSON etc. Mapped Customer Data Data sets produced by mapping and transforming raw data
  • 10. info@rittmanmead.com www.rittmanmead.com @rittmanmead 10 •Oracle Engineered system for big data processing and analysis •Start with Oracle Big Data Appliance Starter Rack - expand up to 18 nodes per rack •Cluster racks together for horizontal scale-out using enterprise-quality infrastructure Oracle Big Data Appliance Starter Rack + Expansion • Cloudera CDH + Oracle software • 18 High-spec Hadoop Nodes with InfiniBand switches for internal Hadoop traffic, optimised for network throughput • 1 Cisco Management Switch • Single place for support for H/W + S/W
 Deployed on Oracle Big Data Appliance Oracle Big Data Appliance Starter Rack + Expansion • Cloudera CDH + Oracle software • 18 High-spec Hadoop Nodes with InfiniBand switches for internal Hadoop traffic, optimised for network throughput • 1 Cisco Management Switch • Single place for support for H/W + S/W
 Enriched 
 Customer Profile Modeling Scoring Infiniband
  • 11. info@rittmanmead.com www.rittmanmead.com @rittmanmead •Programming model for processing large data sets in parallel on a cluster •Not specific to a particular language, but usually written in Java •Inspired by the map and reduce functions commonly used in functional programming ‣Map() performs filtering and sorting ‣Reduce() aggregates the output of mappers ‣and a Shuffle() step to redistribute output by keys •Resolved several complications of distributed computing: ‣Allows unlimited computations on unlimited data ‣Map and reduce functions can be easily distributed ‣Originated at Google; Hadoop was Yahoo’s open-source
 implementation of MapReduce, + two are synonymous MapReduce - The Original Big Data Query Framework Mapper Filter, Project Mapper Filter, Project Mapper Filter, Project Reducer Aggregate Reducer Aggregate Output
 One HDFS file per reducer,
 in a directory
  • 12. info@rittmanmead.com www.rittmanmead.com @rittmanmead 12 •Original developed at Facebook, now foundational within the Hadoop project •Allows users to query Hadoop data using SQL-like language •Tabular metadata layer that overlays files, can interpret semi-structured data (e.g. JSON) •Generates MapReduce code to return required data •Extensible through SerDes and Storage Handlers •JDBC and ODBC drivers for most platforms/tools •Perfect for set-based access + batch ETL work Apache Hive : SQL Metadata + Engine over Hadoop
  • 13. info@rittmanmead.com www.rittmanmead.com @rittmanmead •Data integration tools such as Oracle Data Integrator can load and process Hadoop data •BI tools such as Oracle Business Intelligence 12c can report on Hadoop data •Generally use MapReduce and Hive to access data ‣ODBC and JDBC access to Hive tabular data ‣Allows Hadoop unstructured/semi-structured
 data on HDFS to be accessed like RDBMS Hive Provides a SQL Interface for BI + ETL Tools Access direct Hive or extract using ODI12c for structured OBIEE dashboard analysis What pages are people visiting? Who is referring to us on Twitter? What content has the most reach?
  • 15. ?
  • 23. But the future is fast
  • 24. And pretty damn exciting
  • 26. info@rittmanmead.com www.rittmanmead.com @rittmanmead 26 Hadoop 2.0 Processing Frameworks + Tools
  • 27. info@rittmanmead.com www.rittmanmead.com @rittmanmead 27 •MapReduce’s great innovation was to break processing down into distributed jobs •Jobs that have no functional dependency on each other, only upstream tasks •Provides a framework that is infinitely scalable and very fault tolerant •Hadoop handled job scheduling and resource management ‣All MapReduce code had to do was provide the “map” and “reduce” functions ‣Automatic distributed processing ‣Slow but extremely powerful Hadoop 1.0 and MapReduce
  • 28. info@rittmanmead.com www.rittmanmead.com @rittmanmead 28 •A typical Hive or Pig script compiles down into multiple MapReduce jobs •Each job stages its intermediate results to disk •Safe, but slow - write to disk, spin-up separate JVMs for each job MapReduce - Scales By Writing Intermediate Results to Disk SELECT LOWER(hashtags.text), COUNT(*) AS total_count FROM ( SELECT * FROM tweets WHERE regexp_extract(created_at,"(2015)*",1) = "2015" ) tweets LATERAL VIEW EXPLODE(entities.hashtags) t1 AS hashtags GROUP BY LOWER(hashtags.text) ORDER BY total_count DESC LIMIT 15 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.34 sec HDFS Read: 10952994 HDFS Write: 5239 SUCCESS Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 2.1 sec HDFS Read: 9983 HDFS Write: 164 SUCCESS Total MapReduce CPU Time Spent: 7 seconds 440 msec OK 1 2
  • 29. info@rittmanmead.com www.rittmanmead.com @rittmanmead 29 •MapReduce 2 (MR2) splits the functionality of the JobTracker
 by separating resource management and job scheduling/monitoring •Introduces YARN (Yet Another Resource Manager) •Permits other processing frameworks to MR ‣For example, Apache Spark •Maintains backwards compatibility with MR1 •Introduced with CDH5+ MapReduce 2 and YARN Node
 Manager Node
 Manager Node
 Manager Resource
 Manager Client Client
  • 30. info@rittmanmead.com www.rittmanmead.com @rittmanmead 30 •Runs on top of YARN, provides a faster execution engine than MapReduce for Hive, Pig etc •Models processing as an entire data flow graph (DAG), rather than separate job steps ‣DAG (Directed Acyclic Graph) is a new programming style for distributed systems ‣Dataflow steps pass data between them as streams, rather than writing/reading from disk •Supports in-memory computation, enables Hive on Tez (Stinger) and Pig on Tez •Favoured In-memory / Hive v2 
 route by Hortonworks Apache Tez InputData TEZ DAG Map() Map() Map() Reduce() OutputData Reduce() Reduce() Reduce() InputData Map() Map() Reduce() Reduce()
  • 31. info@rittmanmead.com www.rittmanmead.com @rittmanmead 31 Tez Advantage - Drop-In Replacement for MR with Hive, Pig set hive.execution.engine=mr set hive.execution.engine=tez 4m 17s 2m 25s
  • 33. info@rittmanmead.com www.rittmanmead.com @rittmanmead 33 •Cloudera’s answer to Hive query response time issues •MPP SQL query engine running on Hadoop, bypasses MapReduce for direct data access •Mostly in-memory, but spills to disk if required •Uses Hive metastore to access Hive table metadata •Similar SQL dialect to Hive - not as rich though and no support for Hive SerDes, storage handlers etc Cloudera Impala - Fast, MPP-style Access to Hadoop Data
  • 34. info@rittmanmead.com www.rittmanmead.com @rittmanmead 34 •A replacement for Hive, but uses Hive concepts and
 data dictionary (metastore) •MPP (Massively Parallel Processing) query engine
 that runs within Hadoop ‣Uses same file formats, security,
 resource management as Hadoop •Processes queries in-memory •Accesses standard HDFS file data •Option to use Apache AVRO, RCFile,
 LZO or Parquet (column-store) •Designed for interactive, real-time
 SQL-like access to Hadoop How Impala Works Impala Hadoop HDFS etc BI Server Presentation Svr Cloudera Impala
 ODBC Driver Impala Hadoop HDFS etc Impala Hadoop HDFS etc Impala Hadoop HDFS etc Impala Hadoop HDFS etc Multi-Node
 Hadoop Cluster
  • 35. info@rittmanmead.com www.rittmanmead.com @rittmanmead 35 •Log into Impala Shell, run INVALIDATE METADATA command to refresh Impala table list •Run SHOW TABLES Impala SQL command to view tables available •Run COUNT(*) on main ACCESS_PER_POST table to see typical response time Enabling Hive Tables for Impala [oracle@bigdatalite ~]$ impala-shell Starting Impala Shell without Kerberos authentication [bigdatalite.localdomain:21000] > invalidate metadata; Query: invalidate metadata Fetched 0 row(s) in 2.18s [bigdatalite.localdomain:21000] > show tables; Query: show tables +-----------------------------------+ | name | +-----------------------------------+ | access_per_post | | access_per_post_cat_author | | … | | posts | |——————————————————————————————————-+ Fetched 45 row(s) in 0.15s [bigdatalite.localdomain:21000] > select count(*) 
 from access_per_post; Query: select count(*) from access_per_post +----------+ | count(*) | +----------+ | 343 | +----------+ Fetched 1 row(s) in 2.76s
  • 36. info@rittmanmead.com www.rittmanmead.com @rittmanmead 36 •Significant improvement over Hive response time •Now makes Hadoop suitable for ad-hoc querying Significantly-Improved Ad-Hoc Query Response Time vs Hive | Logical Query Summary Stats: Elapsed time 2, Response time 1, Compilation time 0 (seconds) Logical Query Summary Stats: Elapsed time 50, Response time 49, Compilation time 0 (seconds) Simple Two-Table Join against Hive Data Only Simple Two-Table Join against Impala Data Only vs
  • 37. info@rittmanmead.com www.rittmanmead.com @rittmanmead 37 •Beginners usually store data in HDFS using text file formats (CSV) but these have limitations •Apache AVRO often used for general-purpose processing ‣Splitability, schema evolution, in-built metadata, support for block compression •Parquet now commonly used with Impala due to column-orientated storage ‣Mirrors work in RDBMS world around column-store ‣Only return (project) the columns you require across a wide table Apache Parquet - Column-Orientated Storage for Analytics
  • 38. info@rittmanmead.com www.rittmanmead.com @rittmanmead 38 •But Parquet (and HDFS) have significant limitation for real-time analytics applications ‣Append-only orientation, focus on column-store 
 makes streaming ingestion harder •Cloudera Kudu aims to combine best of HDFS + HBase ‣Real-time analytics-optimised ‣Supports updates to data ‣Fast ingestion of data ‣Accessed using SQL-style tables
 and get/put/update/delete API Cloudera Kudu - Combining Best of HBase and Column-Store
  • 39. info@rittmanmead.com www.rittmanmead.com @rittmanmead 39 •Part of Oracle Big Data 4.0 (BDA-only) ‣Also requires Oracle Database 12c, Oracle Exadata Database Machine •Extends Oracle Data Dictionary to cover Hive •Extends Oracle SQL and SmartScan to Hadoop •Extends Oracle Security Model over Hadoop ‣Fine-grained access control ‣Data redaction, data masking ‣Uses fast c-based readers where possible
 (vs. Hive MapReduce generation) ‣Map Hadoop parallelism to Oracle PQ ‣Big Data SQL engine works on top of YARN ‣Like Spark, Tez, MR2 Oracle Big Data SQL Exadata
 Storage Servers Hadoop
 Cluster Exadata Database
 Server Oracle Big
 Data SQL SQL Queries SmartScan SmartScan
  • 42. “where we’re going
 we don’t need roads”
  • 43. “where we’re going
 we don’t need roads”
  • 44. info@rittmanmead.com www.rittmanmead.com @rittmanmead •Apache Drill is another SQL-on-Hadoop project that focus on schema-free data discovery •Inspired by Google Dremel, innovation is querying raw data with schema optional •Automatically infers and detects schema from semi-structured datasets and NoSQL DBs •Join across different silos of data e.g. JSON records, Hive tables and HBase database •Aimed at different use-cases than Hive - 
 low-latency queries, discovery 
 (think Endeca vs OBIEE) Introducing Apache Drill - “We Don’t Need No Roads”
  • 45. info@rittmanmead.com www.rittmanmead.com @rittmanmead •Most modern datasource formats embed their schema in the data (“schema-on-read”) •Apache Drill makes these as easy to join to traditional datasets as “point me at the data” •Cuts out unnecessary work in defining Hive schemas for data that’s self-describing •Supports joining across files,
 databases, NoSQL etc Self-Describing Data - Parquet, AVRO, JSON etc
  • 46. info@rittmanmead.com www.rittmanmead.com @rittmanmead •Files can exist either on the local filesystem, or on HDFS •Connection to directory or file defined in storage configuration •Can work with CSV, TXT, TSV etc •First row of file can provide schema (column names) Apache Drill and Text Files SELECT * FROM dfs.`/tmp/csv_with_header.csv2`; +-------+------+------+------+ | name | num1 | num2 | num3 | +-------+------+------+------+ | hello | 1 | 2 | 3 | | hello | 1 | 2 | 3 | | hello | 1 | 2 | 3 | | hello | 1 | 2 | 3 | | hello | 1 | 2 | 3 | | hello | 1 | 2 | 3 | | hello | 1 | 2 | 3 | +-------+------+------+------+ 7 rows selected (0.12 seconds) SELECT * FROM dfs.`/tmp/csv_no_header.csv`; +------------------------+ | columns | +------------------------+ | ["hello","1","2","3"] | | ["hello","1","2","3"] | | ["hello","1","2","3"] | | ["hello","1","2","3"] | | ["hello","1","2","3"] | | ["hello","1","2","3"] | | ["hello","1","2","3"] | +------------------------+ 7 rows selected (0.112 seconds)
  • 47. info@rittmanmead.com www.rittmanmead.com @rittmanmead •JSON (Javascript Object Notation) documents are often used for data interchange •Exports from Twitter and other consumer services •Web service responses and other B2B interfaces •A more lightweight form of XML that is “self- describing” •Handles evolving schemas, and optional attributes •Drill treats each document as a row, and has features to •Flatten nested data (extract elements from arrays) •Generate key/value pairs for loosely structured data Apache Drill and JSON Documents use dfs.iot; show files; select in_reply_to_user_id, text from `all_tweets.json` limit 5; +---------------------+------+ | in_reply_to_user_id | text | +---------------------+------+ | null | BI Forum 2013 in Brighton has now sold-out | | null | "Football has become a numbers game | | null | Just bought Lyndsay Wise’s Book | | null | An Oracle BI "Blast from the Past" | | 14716125 | Dilbert on Agile Programming | +---------------------+------+ 5 rows selected (0.229 seconds) select name, flatten(fillings) as f 
 from dfs.users.`/donuts.json` 
 where f.cal < 300;
  • 48. info@rittmanmead.com www.rittmanmead.com @rittmanmead •Drill can connect to Hive to make use of metastore (incl. multiple Hive metastores) •NoSQL databases (HBase etc) •Parquet files (native storage format - columnar + self describing) Apache Drill and Hive, HBase, Parquet Sources etc USE hbase; SELECT * FROM students; +-------------+-----------------------+-----------------------------------------------------+ | row_key | account | address | +-------------+-----------------------+------------------------------------------------------+ | [B@e6d9eb7 | {"name":"QWxpY2U="} | {"state":"Q0E=","street":"MTIzIEJhbGxtZXIgQXY="} | | [B@2823a2b4 | {"name":"Qm9i"} | {"state":"Q0E=","street":"MSBJbmZpbml0ZSBMb29w"} | | [B@3b8eec02 | {"name":"RnJhbms="} | {"state":"Q0E=","street":"NDM1IFdhbGtlciBDdA=="} | | [B@242895da | {"name":"TWFyeQ=="} | {"state":"Q0E=","street":"NTYgU291dGhlcm4gUGt3eQ=="} | +-------------+-----------------------+----------------------------------------------------------------------+ SELECT firstname,lastname FROM 
 hiveremote.`customers` limit 10;`
 +------------+------------+ | firstname | lastname | +------------+------------+ | Essie | Vaill | | Cruz | Roudabush | | Billie | Tinnes | | Zackary | Mockus | | Rosemarie | Fifield | | Bernard | Laboy | | Marianne | Earman | +------------+------------+ SELECT * FROM dfs.`iot_demo/geodata/region.parquet`; +--------------+--------------+-----------------------+ | R_REGIONKEY | R_NAME | R_COMMENT | +--------------+--------------+-----------------------+ | 0 | AFRICA | lar deposits. blithe | | 1 | AMERICA | hs use ironic, even | | 2 | ASIA | ges. thinly even pin | | 3 | EUROPE | ly final courts cajo | | 4 | MIDDLE EAST | uickly special accou | +--------------+--------------+-----------------------+
  • 49. info@rittmanmead.com www.rittmanmead.com @rittmanmead •Drill developed for real-time, ad-hoc data exploration with schema discovery on-the-fly •Individual analysts exploring new datasets, leveraging corporate metadata/data to help •Hive is more about large-scale, centrally curated set-based big data access •Drill models conceptually as JSON, vs. Hive’s tabular approach •Drill introspects schema from whatever it connects to, vs. formal modeling in Hive Apache Drill vs. Apache Hive Interactive Queries
 (Data Discovery, Tableau/VA) Reporting Queries
 (Canned Reports, OBIEE) ETL
 (ODI, Scripting, Informatica) Apache Drill Apache Hive Interactive Queries 100ms - 3mins Reporting Queries 3mins - 20mins ETL & Batch Queries 20mins - hours
  • 51. what’s all this about “Spark”?
  • 53. info@rittmanmead.com www.rittmanmead.com @rittmanmead 53 •Another DAG execution engine running on YARN •More mature than TEZ, with richer API and more vendor support •Uses concept of an RDD (Resilient Distributed Dataset) ‣RDDs like tables or Pig relations, but can be cached in-memory ‣Great for in-memory transformations, or iterative/cyclic processes •Spark jobs comprise of a DAG of tasks operating on RDDs •Access through Scala, Python or Java APIs •Related projects include ‣Spark SQL ‣Spark Streaming Apache Spark
  • 54. info@rittmanmead.com www.rittmanmead.com @rittmanmead 54 •Native support for multiple languages 
 with identical APIs ‣Python - prototyping, data wrangling ‣Scala - functional programming features ‣Java - lower-level, application integration •Use of closures, iterations, and other 
 common language constructs to minimize code •Integrated support for distributed +
 functional programming •Unified API for batch and streaming Rich Developer Support + Wide Developer Ecosystem scala> val logfile = sc.textFile("logs/access_log") 14/05/12 21:18:59 INFO MemoryStore: ensureFreeSpace(77353) 
 called with curMem=234759, maxMem=309225062 14/05/12 21:18:59 INFO MemoryStore: Block broadcast_2 
 stored as values to memory (estimated size 75.5 KB, free 294.6 MB) logfile: org.apache.spark.rdd.RDD[String] = 
 MappedRDD[31] at textFile at <console>:15 scala> logfile.count() 14/05/12 21:19:06 INFO FileInputFormat: Total input paths to process : 1 14/05/12 21:19:06 INFO SparkContext: Starting job: count at <console>:1 ... 14/05/12 21:19:06 INFO SparkContext: Job finished: 
 count at <console>:18, took 0.192536694 s res7: Long = 154563 scala> val logfile = sc.textFile("logs/access_log").cache scala> val biapps11g = logfile.filter(line => line.contains("/biapps11g/")) biapps11g: org.apache.spark.rdd.RDD[String] = FilteredRDD[34] at filter at <console>:17 scala> biapps11g.count() ... 14/05/12 21:28:28 INFO SparkContext: Job finished: count at <console>:20, took 0.387960876 s res9: Long = 403
  • 55. info@rittmanmead.com www.rittmanmead.com @rittmanmead 55 •Spark SQL, and Data Frames, allow RDDs in Spark to be processed using SQL queries •Bring in and federate additional data from JDBC sources •Load, read and save data in Hive, Parquet and other structured tabular formats Spark SQL - Adding SQL Processing to Apache Spark val accessLogsFilteredDF = accessLogs .filter( r => ! r.agent.matches(".*(spider|robot|bot|slurp).*")) .filter( r => ! r.endpoint.matches(".*(wp-content|wp-admin).*")).toDF() .registerTempTable("accessLogsFiltered") val topTenPostsLast24Hour = sqlContext.sql("SELECT p.POST_TITLE, p.POST_AUTHOR, COUNT(*) 
 as total 
 FROM accessLogsFiltered a 
 JOIN posts p ON a.endpoint = p.POST_SLUG 
 GROUP BY p.POST_TITLE, p.POST_AUTHOR 
 ORDER BY total DESC LIMIT 10 ") // Persist top ten table for this window to HDFS as parquet file topTenPostsLast24Hour.save("/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.parquet"
 , "parquet", SaveMode.Overwrite)
  • 56. info@rittmanmead.com www.rittmanmead.com @rittmanmead 56 Choosing the Appropriate SQL-on-Hadoop Engine Large-Scale Batch Fast and Ad-hoc? or…
  • 57. info@rittmanmead.com www.rittmanmead.com @rittmanmead 57 Choosing the Appropriate SQL-on-Hadoop Engine Centralised &
 Ordered Metadata Crazy, Complex and
 Changing Schema? or…
  • 58. info@rittmanmead.com www.rittmanmead.com @rittmanmead 58 Choosing the Appropriate SQL-on-Hadoop Engine Integrating with
 Existing Corporate DW Looking to Escape
 Legacy DWs to Hadoop? or…
  • 62. Check-out these Developer VMs: http://www.cloudera.com/documentation/enterprise/5-3-x/topics/ cloudera_quickstart_vm.html
 
 http://hortonworks.com/products/sandbox/ http://www.oracle.com/technetwork/database/bigdata-appliance/oracle- bigdatalite-2104726.html https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill
  • 64. info@rittmanmead.com www.rittmanmead.com @rittmanmead SQL and Data Integration Futures on Hadoop : A look into the future of analytics on Oracle BDA Mark Rittman, CTO, Rittman Mead Enkitec E4, Barcelona, April 2016