BUILD A SUPERCHARGED DATAMART
WITH SOLR
Elliott Cordo
Chief Architect, Caserta Concepts
What is Solr
• Solr is an open source search platform
• Based on Core Lucene search technology
• Bundled up with an API, Management Tools, UI, scalability
But isn’t it just a search engine?
• Although Solr was primarily architected as a search
engine, there is no reason you can’t use it as a database
• The search-based application movement promotes the search
engine as a data store
• Search has long been a “cheap” option for fast and
interactive queries on NoSQL and Hadoop datastores
So why would we use it?
• Solr is fast –
• expect low ms response times on simple lookups
• properly tuned even complex queries will take less than 100ms
• Solr scales
• High concurrency
• Scales horizontally and vertically (larger hardware)

And it has the best query flexibility of any
NOSQL store!
..and in many cases RDBMS
• Grouping and Aggregation via Facets
• Fuzzy Search
• Equality and Range queries
• Geospatial capabilities
• HIGHLY extensible!
Another Datastore to Manage??
• Polyglot persistence/polyglot programming
• Feature/function will drive which technology should be
used
• Use the right tool for the job: Relational, MPP, Hadoop,
Graph, KV, NOSQL
Thankfully Solr is pretty easy to learn and manage!
When it works well
• Search is front and center -> end users need to fuzzy
search dimensional attributes
• Flexible/sparse schema
• Need for speed -> faster queries for more user
engagement
• Concurrency -> fast queries on client facing or open web
Use Cases
• Real time analytics -> ingest incoming events from
Flume/Logstash/Custom app
• Supplement NOSQL, MPP, or Hadoop analytics
• Web facing analytics DB

What it doesn’t do well
• Joins across collections/cores (tables)
• Complex arbitrary queries
• Limited integration to standard ETL and BI frameworks
How do you get data in?
• A robust API
• Modules and libraries for just about any programming
language
• Index data in any DB via JDBC
• Pull in XML and Delimited files with Simple Posting Tool
• Flume/Logstash
NOTE: like its NoSQL cousins, data needs to be
flattened!
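A minimal sketch of what "flattened" can mean in practice: nested records are collapsed into one level of field names before they are sent to Solr. The nested record shape and the underscore naming are assumptions made for illustration, not the schema used in the talk.

```python
def flatten(record, parent_key="", sep="_"):
    """Collapse nested dicts into a single-level dict of field -> value."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse, carrying the parent name as a prefix.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical nested review record (shape assumed for illustration).
review = {
    "review_id": "JhUliQTD9iyGWov2nv-ZJA",
    "stars": 2,
    "business": {"id": "gKRUdbTPBZ7kwBRCeZDDWA", "name": "Wingate By Wyndham"},
    "user": {"id": "AqlZdDD7NK1fpQi9ltqIXQ", "name": "Studl"},
}

doc = flatten(review)
print(sorted(doc))
```

The result is one flat document per review, with the business and user attributes repeated on every row, much like a denormalized fact table.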
How do you interact with Solr?
HTTP, Nice concise query language
http://localhost:8983/solr/collection1/select?q=city:Yuma&wt=json&indent=true
And what does the response look like:
"responseHeader":{
"status":0,
"QTime":1,
"params":{
"indent":"true",
"q":"city:Yuma",
"wt":"json"}},
"response":{"numFound":6,"start":0,"docs":[
{
"review_id":"JhUliQTD9iyGWov2nv-ZJA",
"stars":2,
"review_date":"2009-09-23T00:00:00Z",
"business_id":"gKRUdbTPBZ7kwBRCeZDDWA",
"business_name":"Wingate By Wyndham",
"city":"Yuma",
"state":"AZ",
"longitude":"-112.09343969999999",
"latitude":"33.434925100000001",
"user_id":"AqlZdDD7NK1fpQi9ltqIXQ",
"user_name":"Studl",
"_version_":1475098783569149953},
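The same lookup can be sketched from Python using only the standard library. The helper names `select_url` and `docs_from` are made up for this example, and the sample body is a trimmed, hand-written stand-in for a real response.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the example URL; adjust host and collection as needed.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def select_url(query, **params):
    """Build a /select URL; extra keyword arguments become query parameters."""
    all_params = {"q": query, "wt": "json", "indent": "true"}
    all_params.update(params)
    return SOLR_SELECT + "?" + urlencode(all_params)


def docs_from(body):
    """Return the list of matching documents from a JSON response body."""
    return json.loads(body)["response"]["docs"]


url = select_url("city:Yuma")

# Hand-written stand-in for a response body (not real output).
sample_body = (
    '{"response": {"numFound": 1, "start": 0, '
    '"docs": [{"city": "Yuma", "state": "AZ", "stars": 2}]}}'
)
print(url)
print(docs_from(sample_body))
```

Note that `urlencode` percent-encodes the colon in `city:Yuma`; the server decodes it back, so the query is the same one shown above.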
So what about analytic queries?
select city, count(1)
from reviews
where state='AZ'
group by city
http://localhost:8983/solr/collection1/select?q=state:AZ&wt=json&indent=true&facet=true&facet.field=city&rows=0&facet.mincount=1&facet.limit=-1
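With the default JSON writer settings, field facets come back as a flat list that alternates value and count. A small sketch of turning that list into the "city, count" rows the SQL would return; the counts below are invented for illustration.

```python
def facet_counts(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written stand-in for a facet response (counts are made up).
sample = {
    "facet_counts": {
        "facet_fields": {
            "city": ["Phoenix", 120, "Scottsdale", 45, "Yuma", 6],
        }
    }
}

for city, count in facet_counts(sample, "city").items():
    print(city, count)
```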

More query fun
select city, count(1)
from reviews
where state='AZ'
and review_date between '2012-03-01' and '2012-03-06'
group by city
having count(1)>=20
http://localhost:8983/solr/collection1/select?q=state:AZ+review_date:%5B2012-03-01T23:59:59.999Z%20TO%202012-03-06T00:00:00Z%5D&wt=json&indent=true&facet=true&facet.field=city&rows=0&facet.mincount=20&facet.limit=-1
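A hedged sketch of building this kind of query from Python. It rests on two assumptions that differ from the URL above: the SQL "between" is read as whole days, inclusive at both ends, and the two conditions are joined with an explicit AND, since what a bare space means depends on the default operator configured for the collection. The function names are hypothetical.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def day_range(first_day, last_day):
    """Inclusive range from the start of first_day to the end of last_day (UTC)."""
    return f"[{first_day}T00:00:00Z TO {last_day}T23:59:59.999Z]"


def having_count_url(state, first_day, last_day, min_count):
    """Facet on city, keeping only cities with at least min_count matches."""
    # Explicit AND is an assumption; see the note above.
    q = f"state:{state} AND review_date:{day_range(first_day, last_day)}"
    params = {
        "q": q,
        "wt": "json",
        "rows": 0,                    # only the facet counts are needed
        "facet": "true",
        "facet.field": "city",
        "facet.mincount": min_count,  # plays the role of HAVING count(1) >= n
        "facet.limit": -1,            # no cap on the number of cities returned
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = having_count_url("AZ", "2012-03-01", "2012-03-06", 20)
print(url)
```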
Facet stats give you aggregation
http://localhost:8983/solr/collection1/select?q=state:AZ+review_date:%5B2012-03-01T23:59:59.999Z%20TO%202012-03-06T00:00:00Z%5D&wt=json&indent=true&facet=true&rows=0&facet.mincount=1&facet.limit=-1&stats=true&stats.field=stars&stats.facet=city
"stats":{
"stats_fields":{
"stars":{
"min":1.0,
"max":5.0,
"count":991,
"missing":0,
"sum":3685.0,
"sumOfSquares":15313.0,
"mean":3.7184661957618568,
"stddev":1.2754290498612053,
"facets":{
"city":{
"Peoria":{
"min":1.0,
"max":5.0,
"count":14,
"missing":0,
"sum":54.0,
"sumOfSquares":234.0,
"mean":3.857142857142857,
"stddev":1.4064216928154862,
"facets":{}},
"Goodyear":{
"min":2.0,
"max":5.0,
"count":7,
….
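The derived figures in this response can be reproduced from count, sum and sumOfSquares alone. The sketch below uses the sample (n - 1) standard deviation, which is the formula the numbers above are consistent with; the function name is made up for the example.

```python
import math


def mean_and_stddev(count, total, sum_of_squares):
    """Recompute mean and sample standard deviation from the raw stats."""
    mean = total / count
    # Sample variance: (sum of squares - n * mean^2) / (n - 1)
    variance = (sum_of_squares - total * total / count) / (count - 1)
    return mean, math.sqrt(variance)


# Values copied from the response above.
overall = mean_and_stddev(991, 3685.0, 15313.0)
peoria = mean_and_stddev(14, 54.0, 234.0)
print(overall)
print(peoria)
```

Because sum and sumOfSquares are additive, per-city stats like these can also be combined client-side into coarser groupings without going back to the raw documents.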
Facet pivots too!
http://localhost:8983/solr/collection1/select?q=state:AZ+review_date:%5B2012-03-01T23:59:59.999Z%20TO%202012-03-06T00:00:00Z%5D&wt=json&indent=true&facet=true&rows=0&facet.mincount=1&facet.limit=-1&facet.pivot=city,business_name
"facet_pivot":{
"city,business_name":[{
"field":"city",
"value":"Anthem",
"count":3,
"pivot":[{
"field":"business_name",
"value":"Outlets At Anthem",
"count":1},
{
"field":"business_name",
"value":"Q to U BBQ",
"count":1},
{
"field":"business_name",
"value":"Shanghai Club",
"count":1}]},
{
"field":"city",
"value":"Apache Junction",
"count":1,
"pivot":[{
"field":"business_name",
"value":"Lost Dutchman State Park",
"count":1}]},
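The nested pivot structure maps naturally onto the rows of a two-column group by. A minimal sketch of walking it into (city, business_name, count) tuples, using the two cities shown above as input; `pivot_rows` is a name invented for this example.

```python
def pivot_rows(nodes, prefix=()):
    """Walk a facet_pivot list and emit one tuple per leaf: (values..., count)."""
    rows = []
    for node in nodes:
        path = prefix + (node["value"],)
        children = node.get("pivot")
        if children:
            rows.extend(pivot_rows(children, path))
        else:
            rows.append(path + (node["count"],))
    return rows


# The portion of the response shown above, as a Python structure.
pivot = [
    {"field": "city", "value": "Anthem", "count": 3, "pivot": [
        {"field": "business_name", "value": "Outlets At Anthem", "count": 1},
        {"field": "business_name", "value": "Q to U BBQ", "count": 1},
        {"field": "business_name", "value": "Shanghai Club", "count": 1},
    ]},
    {"field": "city", "value": "Apache Junction", "count": 1, "pivot": [
        {"field": "business_name", "value": "Lost Dutchman State Park", "count": 1},
    ]},
]

for row in pivot_rows(pivot):
    print(row)
```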
UI please!
• Roll your own -> it’s not that hard
• How about using Python Flask to render a Solr Response
to D3 or Google Charts
• Sometimes a custom solution is the best option
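As a rough sketch of the "roll your own" route: the server-side piece mostly reshapes counts into whatever the charting library wants. The function below builds the header-plus-rows array that Google Charts' arrayToDataTable accepts; the Flask view that would return it is left out so the example stays standard-library only, and the counts and column names are invented.

```python
import json


def chart_table(counts, label_header="City", value_header="Reviews"):
    """Build a header row plus [label, value] rows, largest value first."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [[label_header, value_header]] + [[label, value] for label, value in ordered]


# Made-up facet counts for illustration.
counts = {"Yuma": 6, "Phoenix": 120, "Scottsdale": 45}

payload = json.dumps(chart_table(counts))
print(payload)
```

A web handler would return `payload` as its JSON body, and the page script would pass the parsed array straight to the chart.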

And the really easy way
Banana – A Solr port of Kibana!
Why should Elasticsearch fans have all the fun?
And it’s open source!
https://github.com/LucidWorks/banana
Banana
• An AngularJS app (pure JavaScript, runs in any
browser)
• Make a pretty dashboard with no development in a
couple of minutes
• Very user friendly, users can create their own
content
https://github.com/Caserta-Concepts/Solr-Datamart
elliott@casertaconcepts.com

More Related Content

What's hot

Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksSearching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
Lucidworks
 
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
DataWorks Summit
 
Dictionary Based Annotation at Scale with Spark by Sujit Pal
Dictionary Based Annotation at Scale with Spark by Sujit PalDictionary Based Annotation at Scale with Spark by Sujit Pal
Dictionary Based Annotation at Scale with Spark by Sujit Pal
Spark Summit
 
Big Data Pipeline and Analytics Platform
Big Data Pipeline and Analytics PlatformBig Data Pipeline and Analytics Platform
Big Data Pipeline and Analytics Platform
Sudhir Tonse
 
Solr for Data Science
Solr for Data ScienceSolr for Data Science
Solr for Data Science
Grant Ingersoll
 
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lucidworks
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
SATOSHI TAGOMORI
 
Cascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User GroupCascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User Group
nathanmarz
 
Managed Search: Presented by Jacob Graves, Getty Images
Managed Search: Presented by Jacob Graves, Getty ImagesManaged Search: Presented by Jacob Graves, Getty Images
Managed Search: Presented by Jacob Graves, Getty Images
Lucidworks
 
Get involved with the Apache Software Foundation
Get involved with the Apache Software FoundationGet involved with the Apache Software Foundation
Get involved with the Apache Software Foundation
Shalin Shekhar Mangar
 
Real time analytics using Hadoop and Elasticsearch
Real time analytics using Hadoop and ElasticsearchReal time analytics using Hadoop and Elasticsearch
Real time analytics using Hadoop and Elasticsearch
Abhishek Andhavarapu
 
Real Time search using Spark and Elasticsearch
Real Time search using Spark and ElasticsearchReal Time search using Spark and Elasticsearch
Real Time search using Spark and Elasticsearch
Sigmoid
 
Rental Cars and Industrialized Learning to Rank with Sean Downes
Rental Cars and Industrialized Learning to Rank with Sean DownesRental Cars and Industrialized Learning to Rank with Sean Downes
Rental Cars and Industrialized Learning to Rank with Sean Downes
Databricks
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
Lucidworks (Archived)
 
Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineering
nathanmarz
 
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
Lucidworks
 
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, TargetJourney of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Lucidworks
 
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Lucidworks
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
Lucidworks
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )
Rahul Jain
 

What's hot (20)

Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, LucidworksSearching for Better Code: Presented by Grant Ingersoll, Lucidworks
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks
 
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
 
Dictionary Based Annotation at Scale with Spark by Sujit Pal
Dictionary Based Annotation at Scale with Spark by Sujit PalDictionary Based Annotation at Scale with Spark by Sujit Pal
Dictionary Based Annotation at Scale with Spark by Sujit Pal
 
Big Data Pipeline and Analytics Platform
Big Data Pipeline and Analytics PlatformBig Data Pipeline and Analytics Platform
Big Data Pipeline and Analytics Platform
 
Solr for Data Science
Solr for Data ScienceSolr for Data Science
Solr for Data Science
 
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Cascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User GroupCascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User Group
 
Managed Search: Presented by Jacob Graves, Getty Images
Managed Search: Presented by Jacob Graves, Getty ImagesManaged Search: Presented by Jacob Graves, Getty Images
Managed Search: Presented by Jacob Graves, Getty Images
 
Get involved with the Apache Software Foundation
Get involved with the Apache Software FoundationGet involved with the Apache Software Foundation
Get involved with the Apache Software Foundation
 
Real time analytics using Hadoop and Elasticsearch
Real time analytics using Hadoop and ElasticsearchReal time analytics using Hadoop and Elasticsearch
Real time analytics using Hadoop and Elasticsearch
 
Real Time search using Spark and Elasticsearch
Real Time search using Spark and ElasticsearchReal Time search using Spark and Elasticsearch
Real Time search using Spark and Elasticsearch
 
Rental Cars and Industrialized Learning to Rank with Sean Downes
Rental Cars and Industrialized Learning to Rank with Sean DownesRental Cars and Industrialized Learning to Rank with Sean Downes
Rental Cars and Industrialized Learning to Rank with Sean Downes
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
 
Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineering
 
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
 
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, TargetJourney of Implementing Solr at Target: Presented by Raja Ramachandran, Target
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target
 
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal...
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )
 

Similar to Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using Solr

Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoop
gregchanan
 
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersSQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
Lucidworks
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
Getting value from IoT, Integration and Data Analytics
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Lucidworks
 
Search On Hadoop Frontier Meetup
Search On Hadoop Frontier MeetupSearch On Hadoop Frontier Meetup
Search On Hadoop Frontier Meetup
gregchanan
 
Search On Hadoop
Search On HadoopSearch On Hadoop
Search On Hadoop
bigdatagurus_meetup
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data Search
Mark Miller
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Lucas Jellema
 
Apache drill
Apache drillApache drill
Apache drill
MapR Technologies
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
bddmoscow
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
ssuserd3a367
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
N Masahiro
 
Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & Solr
Lucidworks (Archived)
 
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL DatabasesDropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
Kyle Banerjee
 
Jethro for tableau webinar (11 15)
Jethro for tableau webinar (11 15)Jethro for tableau webinar (11 15)
Jethro for tableau webinar (11 15)
Remy Rosenbaum
 
No sql and sql - open analytics summit
No sql and sql - open analytics summitNo sql and sql - open analytics summit
No sql and sql - open analytics summit
Open Analytics
 
Open Source SQL Databases
Open Source SQL DatabasesOpen Source SQL Databases
Open Source SQL Databases
Emanuel Calvo
 
Practical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrPractical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+Solr
Jake Mannix
 
Practical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkPractical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and Spark
Jake Mannix
 
Search onhadoopsfhug081413
Search onhadoopsfhug081413Search onhadoopsfhug081413
Search onhadoopsfhug081413
gregchanan
 

Similar to Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using Solr (20)

Solr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for HadoopSolr + Hadoop: Interactive Search for Hadoop
Solr + Hadoop: Interactive Search for Hadoop
 
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, LucidworksngineersSQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
SQL Analytics for Search Engineers - Timothy Potter, Lucidworksngineers
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
 
Search On Hadoop Frontier Meetup
Search On Hadoop Frontier MeetupSearch On Hadoop Frontier Meetup
Search On Hadoop Frontier Meetup
 
Search On Hadoop
Search On HadoopSearch On Hadoop
Search On Hadoop
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data Search
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
 
Apache drill
Apache drillApache drill
Apache drill
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & Solr
 
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL DatabasesDropping ACID: Wrapping Your Mind Around NoSQL Databases
Dropping ACID: Wrapping Your Mind Around NoSQL Databases
 
Jethro for tableau webinar (11 15)
Jethro for tableau webinar (11 15)Jethro for tableau webinar (11 15)
Jethro for tableau webinar (11 15)
 
No sql and sql - open analytics summit
No sql and sql - open analytics summitNo sql and sql - open analytics summit
No sql and sql - open analytics summit
 
Open Source SQL Databases
Open Source SQL DatabasesOpen Source SQL Databases
Open Source SQL Databases
 
Practical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+SolrPractical Machine Learning for Smarter Search with Spark+Solr
Practical Machine Learning for Smarter Search with Spark+Solr
 
Practical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkPractical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and Spark
 
Search onhadoopsfhug081413
Search onhadoopsfhug081413Search onhadoopsfhug081413
Search onhadoopsfhug081413
 

Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using Solr

  • 1. BUILD A SUPERCHARGED DATAMART WITH SOLR
      Elliott Cordo
      Chief Architect, Caserta Concepts
  • 2. What is Solr
      • Solr is an open source search platform
      • Based on core Lucene search technology
      • Bundled up with an API, management tools, a UI, and scalability
  • 3. But isn’t it just a search engine?
      • Although Solr was primarily architected as a search engine, there is no reason you can’t use it as a database
      • The search-based application movement promotes the search engine as a data store
      • Search has long been a “cheap” option for fast and interactive queries on NoSQL and Hadoop datastores
  • 4. So why would we use it?
      • Solr is fast
          • Expect low-millisecond response times on simple lookups
          • Properly tuned, even complex queries will take less than 100 ms
      • Solr scales
          • High concurrency
          • Scales horizontally and vertically (larger hardware)
  • 5. And it has the best query flexibility of any NoSQL store! ...and in many cases RDBMS
      • Grouping and aggregation via facets
      • Fuzzy search
      • Equality and range queries
      • Geospatial capabilities
      • HIGHLY extensible!
  • 6. Another datastore to manage??
      • Polyglot persistence/polyglot programming
      • Feature/function will drive which technology should be used
      • Use the right tool for the job: relational, MPP, Hadoop, graph, KV, NoSQL
      Thankfully Solr is pretty easy to learn and manage!
  • 7. When it works well
      • Search is front and center -> end users need to fuzzy search dimensional attributes
      • Flexible/sparse schema
      • Need for speed -> faster queries for more user engagement
      • Concurrency -> fast queries on client-facing or open web applications
  • 8. Use Cases
      • Real-time analytics -> ingest incoming events from Flume/Logstash/custom app
      • Supplement NoSQL, MPP, or Hadoop analytics
      • Web-facing analytics DB
  • 9. What it doesn’t do well
      • Joins across collections/cores (tables)
      • Complex arbitrary queries
      • Limited integration with standard ETL and BI frameworks
  • 10. How do you get data in?
      • A robust API
      • Modules and libraries for just about any programming language
      • Index data in any DB via JDBC
      • Pull in XML and delimited files with the Simple Posting Tool
      • Flume/Logstash
      NOTE: like its NoSQL cousins, Solr needs data to be flattened!
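The flattening note on slide 10 can be sketched in a few lines of Python. This is only an illustration: the nested record shape, its values, and the `flatten` helper are assumptions made for the example, not part of the deck. The idea is that nested attributes become top-level fields before the document is sent to Solr.

```python
import json


def flatten(record, parent_key="", sep="_"):
    """Collapse nested dicts into one level, joining key names with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and lift their fields to the top level.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical nested source record (shape and values invented for illustration).
nested_review = {
    "review_id": "r-001",
    "stars": 2,
    "business": {"id": "b-42", "name": "Example Hotel", "city": "Yuma", "state": "AZ"},
    "user": {"id": "u-7", "name": "Sam"},
}

solr_doc = flatten(nested_review)

# A JSON array of flat documents, the kind of body that can be posted to Solr's
# JSON update handler (the posting step itself is not shown here).
payload = json.dumps([solr_doc])
```

The resulting field names (`business_name`, `user_name`, and so on) follow from the helper's naming rule; a real schema would choose its own names.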
  • 11. How do you interact with Solr?
      HTTP, with a nice concise query language:
      http://localhost:8983/solr/collection1/select?q=city:Yuma&wt=json&indent=true
      And what does the response look like:
      "responseHeader":{
        "status":0,
        "QTime":1,
        "params":{
          "indent":"true",
          "q":"city:Yuma",
          "wt":"json"}},
      "response":{"numFound":6,"start":0,"docs":[
        {
          "review_id":"JhUliQTD9iyGWov2nv-ZJA",
          "stars":2,
          "review_date":"2009-09-23T00:00:00Z",
          "business_id":"gKRUdbTPBZ7kwBRCeZDDWA",
          "business_name":"Wingate By Wyndham",
          "city":"Yuma",
          "state":"AZ",
          "longitude":"-112.09343969999999",
          "latitude":"33.434925100000001",
          "user_id":"AqlZdDD7NK1fpQi9ltqIXQ",
          "user_name":"Studl",
          "_version_":1475098783569149953},
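As a rough sketch of the interaction on slide 11, the same select URL can be built and a response parsed with nothing but the Python standard library. The `select_url` helper is a hypothetical name, and `sample_body` is a trimmed copy of the response shown on the slide so the example runs without a live Solr instance.

```python
import json
from urllib.parse import urlencode


def select_url(base, collection, **params):
    """Build a Solr /select URL; urlencode escapes the parameter values."""
    return f"{base}/solr/{collection}/select?{urlencode(params)}"


url = select_url("http://localhost:8983", "collection1",
                 q="city:Yuma", wt="json", indent="true")

# Trimmed copy of the slide's response, used instead of a network call.
sample_body = """
{"responseHeader": {"status": 0, "QTime": 1},
 "response": {"numFound": 6, "start": 0, "docs": [
   {"business_name": "Wingate By Wyndham", "city": "Yuma", "state": "AZ", "stars": 2}
 ]}}
"""

result = json.loads(sample_body)
names = [doc["business_name"] for doc in result["response"]["docs"]]
```

Against a running server, the body would come from something like `urllib.request.urlopen(url)` rather than the hard-coded string. Note that `urlencode` percent-encodes the colon, so the generated URL contains `q=city%3AYuma`, which decodes to the same query as the slide's URL.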
  • 12. So what about analytic queries?
      select city, count(1) from reviews where state='AZ' group by city
      http://localhost:8983/solr/collection1/select?q=state:AZ&wt=json&indent=true&facet=true&facet.field=city&rows=0&facet.mincount=1&facet.limit=-1
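The facet query on slide 12 returns its counts under `facet_counts.facet_fields`, and with the default JSON layout each field is a flat list that alternates value and count. A minimal sketch of turning that into rows resembling the SQL result follows; the `facet_rows` helper and the sample counts are invented for illustration.

```python
def facet_rows(response, field):
    """Pair up the flat [value, count, value, count, ...] facet list."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical response fragment; the counts are made up for the example.
sample_response = {
    "facet_counts": {
        "facet_fields": {
            "city": ["Phoenix", 120, "Scottsdale", 45, "Yuma", 6],
        }
    }
}

rows = facet_rows(sample_response, "city")
for city, count in rows:
    print(f"{city}\t{count}")
```

Each `(city, count)` pair corresponds to one row of the `group by city` result.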
  • 13. More query fun
      select city, count(1) from reviews where state='AZ' and review_date between '2012-03-01' and '2012-03-06' group by city having count(1)>=20
      http://localhost:8983/solr/collection1/select?q=state:AZ+review_date:%5B2012-03-01T23:59:59.999Z%20TO%202012-03-06T00:00:00Z%5D&wt=json&indent=true&facet=true&facet.field=city&rows=0&facet.mincount=20&facet.limit=-1
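The mapping on slide 13 (WHERE clauses in `q`, HAVING via `facet.mincount`) can be sketched as a small parameter builder. The function name is hypothetical, and one choice here is an assumption rather than the deck's method: the clauses are joined with an explicit `AND`, because a `+` in a URL decodes to a space and the two clauses would then be combined by whatever default operator the query parser is configured with. The date bounds are copied verbatim from the slide's URL.

```python
from urllib.parse import parse_qs, urlencode


def having_count_params(state, date_field, start, end, group_by, min_count):
    """Map 'where ... group by ... having count >= n' onto Solr facet parameters."""
    # Explicit AND (an assumption for this sketch) so both clauses must match.
    q = f"state:{state} AND {date_field}:[{start} TO {end}]"
    return {
        "q": q,
        "wt": "json",
        "rows": 0,                    # no documents, only facet counts
        "facet": "true",
        "facet.field": group_by,      # GROUP BY
        "facet.mincount": min_count,  # HAVING count(1) >= min_count
        "facet.limit": -1,            # return every bucket
    }


params = having_count_params(
    "AZ", "review_date",
    "2012-03-01T23:59:59.999Z", "2012-03-06T00:00:00Z",  # bounds as on the slide
    "city", 20,
)
query_string = urlencode(params)
```

`urlencode` takes care of escaping the brackets and spaces, which avoids hand-writing `%5B`, `%20`, and `%5D` as in the slide's URL.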
  • 14. Facet stats give you aggregation
      http://localhost:8983/solr/collection1/select?q=state:AZ+review_date:%5B2012-03-01T23:59:59.999Z%20TO%202012-03-06T00:00:00Z%5D&wt=json&indent=true&facet=true&rows=0&facet.mincount=1&facet.limit=-1&stats=true&stats.field=stars&stats.facet=city
      "stats":{
        "stats_fields":{
          "stars":{
            "min":1.0,
            "max":5.0,
            "count":991,
            "missing":0,
            "sum":3685.0,
            "sumOfSquares":15313.0,
            "mean":3.7184661957618568,
            "stddev":1.2754290498612053,
            "facets":{
              "city":{
                "Peoria":{
                  "min":1.0,
                  "max":5.0,
                  "count":14,
                  "missing":0,
                  "sum":54.0,
                  "sumOfSquares":234.0,
                  "mean":3.857142857142857,
                  "stddev":1.4064216928154862,
                  "facets":{}},
                "Goodyear":{
                  "min":2.0,
                  "max":5.0,
                  "count":7,
                  …
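The numbers in the slide 14 response are internally consistent, which is a useful check when reading stats output: the mean is `sum / count`, and the reported `stddev` matches the sample standard deviation derived from `count`, `sum`, and `sumOfSquares`. The sketch below recomputes both for the overall figures and for the Peoria bucket; the helper name is hypothetical and the input values are copied from the slide.

```python
import math


def mean_and_stddev(stats):
    """Recompute mean and sample standard deviation from count, sum, sumOfSquares."""
    n = stats["count"]
    mean = stats["sum"] / n
    # Sample variance: (sum of squares - n * mean^2) / (n - 1)
    variance = (stats["sumOfSquares"] - n * mean * mean) / (n - 1)
    return mean, math.sqrt(variance)


# Values copied from the response on the slide.
overall = {"count": 991, "sum": 3685.0, "sumOfSquares": 15313.0}
peoria = {"count": 14, "sum": 54.0, "sumOfSquares": 234.0}

overall_mean, overall_stddev = mean_and_stddev(overall)
peoria_mean, peoria_stddev = mean_and_stddev(peoria)
```

In SQL terms, the per-city buckets play the role of `select city, avg(stars), stddev(stars) ... group by city`.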
  • 15. Facet pivots too!
      http://localhost:8983/solr/collection1/select?q=state:AZ+review_date:%5B2012-03-01T23:59:59.999Z%20TO%202012-03-06T00:00:00Z%5D&wt=json&indent=true&facet=true&rows=0&facet.mincount=1&facet.limit=-1&facet.pivot=city,business_name
      "facet_pivot":{
        "city,business_name":[{
            "field":"city",
            "value":"Anthem",
            "count":3,
            "pivot":[{
                "field":"business_name",
                "value":"Outlets At Anthem",
                "count":1},
              {
                "field":"business_name",
                "value":"Q to U BBQ",
                "count":1},
              {
                "field":"business_name",
                "value":"Shanghai Club",
                "count":1}]},
          {
            "field":"city",
            "value":"Apache Junction",
            "count":1,
            "pivot":[{
                "field":"business_name",
                "value":"Lost Dutchman State Park",
                "count":1}]},
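The nested pivot structure on slide 15 is effectively a multi-column GROUP BY. A small recursive walk, sketched below under the assumption that each node carries `value`, `count`, and an optional `pivot` list as in the slide's response, flattens it into rows of `(city, business_name, count)`. The `pivot_rows` helper is a hypothetical name.

```python
def pivot_rows(nodes, prefix=()):
    """Flatten nested facet.pivot nodes into (value, ..., count) tuples."""
    rows = []
    for node in nodes:
        path = prefix + (node["value"],)
        children = node.get("pivot")
        if children:
            # Descend into the next pivot level, carrying the values seen so far.
            rows.extend(pivot_rows(children, path))
        else:
            rows.append(path + (node["count"],))
    return rows


# The two city entries shown on the slide.
pivot = [
    {"field": "city", "value": "Anthem", "count": 3, "pivot": [
        {"field": "business_name", "value": "Outlets At Anthem", "count": 1},
        {"field": "business_name", "value": "Q to U BBQ", "count": 1},
        {"field": "business_name", "value": "Shanghai Club", "count": 1},
    ]},
    {"field": "city", "value": "Apache Junction", "count": 1, "pivot": [
        {"field": "business_name", "value": "Lost Dutchman State Park", "count": 1},
    ]},
]

rows = pivot_rows(pivot)
```

Rows in this shape are convenient to hand to a table or charting layer.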
  • 16. UI please!
      • Roll your own -> it’s not that hard
      • How about using Python Flask to render a Solr response to D3 or Google Charts?
      • Sometimes a custom solution is the best option
  • 17. And the really easy way
      Banana – a Solr port of Kibana!
      Why should Elasticsearch fans have all the fun?
      And it’s open source! https://github.com/LucidWorks/banana
  • 18. Banana
      • An AngularJS app (pure JavaScript, runs in any browser)
      • Make a pretty dashboard with no development in a couple of minutes
      • Very user friendly; users can create their own content