This document discusses how search has evolved beyond simple text matching to include features like faceting, aggregations, and spatial search. It summarizes new capabilities in Apache Lucene and Solr like reduced memory usage, pluggable codecs, and distributed capabilities. The document also describes how LucidWorks provides tools to integrate search with Hadoop, including connectors, ingestion helpers, and open source projects like Logstash for Solr. Finally, it advertises LucidWorks products and services that leverage signals from user interactions to power recommendations, analytics, and discovery.
Splunk's Hunk: A Powerful Way to Visualize Your Data Stored in MongoDB (MongoDB)
This document discusses Splunk Hunk, which enables users to combine time series event data stored in MongoDB with Splunk's data visualization and search capabilities. It provides an overview of Splunk Hunk's components and architecture, describes how to install and configure the MongoDB virtual index app to integrate MongoDB data with Splunk, and demonstrates how to query and analyze MongoDB data using Splunk.
Webinar: Rapid Solr Development with Fusion (Lucidworks)
The document discusses Lucidworks Fusion, a platform that enables rapid development of search applications using Apache Solr. It provides concise summaries of key points about Lucidworks' contributions to Solr, the features and support levels of Fusion and Solr Enterprise, the architecture of Fusion, new connectors in version 1.3 of Fusion, and instructions for downloading and starting a demo of Fusion.
BlueData Hunk Integration: Splunk Analytics for Hadoop (BlueData, Inc.)
Hunk is a Splunk analytics tool that allows users to explore, analyze, and visualize raw big data stored in Hadoop and NoSQL data stores. It can interactively query raw data, accelerate reporting, create charts and dashboards, and archive historical data to HDFS. BlueData's EPIC platform enables running Hunk jobs on Hadoop clusters while accessing data from any storage system, such as HDFS, NFS, Gluster, and others. Hunk supports ingesting large amounts of data and provides pre-packaged analytics functions and intuitive visualization of results.
Valentyn Kropov, Big Data Solutions Architect, recently attended "Hadoop World / Strata" – the biggest and coolest Big Data conference in the world – and he can't wait to share fresh trends and topics straight from New York. Come and learn how a Hadoop cluster will help NASA explore Mars, how Netflix built a 10PB platform, what the latest trends in Spark are, hear about Kudu, the newly announced storage engine from Cloudera, and much more.
Michael Cutler (CTO cofounder of TUMRA) provides a high-level introduction to Apache Spark in a presentation given at ‘Big Data Week 2014’ #BDW14 held at University College London.
TUMRA were early adopters of Spark after a brief PoC in Dec ‘12 and took it to production just a few months later. The main motivation to do so was the inflexibility and high-latency of Hadoop Map/Reduce jobs and the knock-on effect for technology that utilises it (Mahout machine learning, Hive data warehousing, Cascading).
With two primary use cases, 'Ecommerce Personalisation' and 'Marketing Automation', TUMRA are currently flowing around 29 million 'user engagement events' (JSON) each day through Apache Kafka and Spark Streaming, at peak rates of up to 800 events per second.
TUMRA use Apache Spark on Amazon Web Services (EC2) in production for a mix of machine learning model building, graph analytics and near-real-time reporting.
To learn more about how we use Spark and the services we can deliver through our Platform please contact: hello@tumra.com
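As a rough illustration of the event-rate accounting described above, here is a minimal stdlib-Python sketch that counts JSON events per one-second window and reports the peak rate. TUMRA's actual pipeline runs on Kafka and Spark Streaming; the `ts` field name and the windowing here are illustrative assumptions, not their implementation.

```python
import json
from collections import Counter

def peak_events_per_second(raw_events):
    """Count JSON events per one-second window and return the peak rate.

    Each raw event is a JSON string with a 'ts' field holding a UNIX
    timestamp in seconds ('ts' is a hypothetical field name).
    """
    per_second = Counter()
    for raw in raw_events:
        event = json.loads(raw)
        per_second[int(event["ts"])] += 1
    return max(per_second.values()) if per_second else 0

# Three events land in second 100 and one in second 101, so the peak is 3.
events = [json.dumps({"ts": t}) for t in (100, 100, 100, 101)]
print(peak_events_per_second(events))  # 3
```

In a real streaming job the same per-window counting would be expressed as a windowed aggregation over the Kafka stream rather than an in-memory Counter.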
Hunk - Unlocking The Power of Big Data Breakout Session (Splunk)
This document discusses Splunk's Hunk product and how it allows users to analyze data stored in Hadoop using Splunk. Hunk runs natively in Hadoop using MapReduce, supports mixed mode searching that allows previewing data, and auto-deploys Splunk components to Hadoop data nodes for real-time indexing. It also provides role-based security and supports connecting to data in NoSQL databases and SQL databases through Splunk's DB Connect product.
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal... (Lucidworks)
This document discusses how Walmart uses Apache Solr as a "not-so-evil twin" to complement their source-of-truth database and help scale their data infrastructure. It describes how Walmart abstracts the complexity of managing databases, caches, search queries, and messaging to provide scalable querying across database shards. The use of Solr has allowed Walmart to offload queries, recurring reads, and analytics.
4Developers 2018: Big Data Processing Based on the Lambda Architecture on ... (PROIDEA)
According to estimates, by 2020 we will generate 40 zettabytes of data, and by 2025 as much as 163 zettabytes of various kinds of data, and its careful analysis will allow us to discover new phenomena, optimize processes, and support decision-making. To process such large data sets effectively we need new data analysis techniques and innovative technological solutions. The Azure cloud plays an important role here, offering a range of services with which we can build Big Data processing solutions in both batch and near-real-time modes. During the session we will build a sample Big Data processing solution based on the Lambda architecture, using Azure platform services such as Azure Data Factory, Azure Stream Analytics, Azure HDInsight, Azure Event (IoT) Hub, and Azure Data Lake.
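The Lambda architecture the session builds can be reduced to one core idea: serve queries by merging a precomputed batch view with a near-real-time speed view. A minimal sketch of that merge step, with hypothetical sensor counts standing in for the Azure services involved:

```python
def merge_views(batch_view, speed_view):
    """Serve a query by combining the batch layer's precomputed counts
    with the speed layer's counts for data not yet absorbed by batch."""
    merged = dict(batch_view)
    for key, count in speed_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged

batch = {"sensor-a": 1000, "sensor-b": 250}   # recomputed periodically
speed = {"sensor-a": 7, "sensor-c": 3}        # near-real-time increments
print(merge_views(batch, speed))
```

In the Azure mapping, the batch view would come from HDInsight jobs orchestrated by Data Factory and the speed view from Stream Analytics over Event Hub data.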
Big data for bay area big data developer (19scottmiller)
The document announces a 3-day Big Data Developer Conference to take place from July 15-17 at the Santa Clara Convention Center. The conference will provide extensive workshops and technical talks on various Big Data technologies like Spark, Hadoop, MongoDB, Neo4J, Cassandra, and data analytics tools. Engineers, developers, managers and students involved in Big Data are encouraged to register and attend the conference.
Big Data is a recent phenomenon. Everyone talks about it, but do you really know what Big Data is? Join our four-part series about Big Data and you will get answers to your questions!
We will cover an introduction to Big Data and the platforms available for dealing with it. In the end, we will give you an insight into the possible future of dealing with Big Data.
Spark, Flink, Presto, and many others: this is just a sample of the frameworks used in real companies, and we will talk about some of them.
In the previous episode of this Big Data series, we talked about the basic information concerning Big Data. This presentation, however, will be much more technical as we will be covering the most popular platforms you can use to deal with Big Data 2.0 Systems and learn about the key differences between these platforms. Let’s go!
#CHEDTEB
www.chedteb.eu
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica (Databricks)
This document discusses Databricks Cloud, a platform for running Apache Spark workloads that aims to accelerate time-to-results from months to days. It provides a unified platform with notebooks, dashboards, and jobs running on Spark clusters managed by Databricks. Key benefits include zero management of clusters, interactive queries and streaming for real-time insights, and the ability to develop models and visualizations in notebooks and deploy them as production jobs or dashboards without code changes. The platform is open source with no vendor lock-in and supports various data sources and third party applications. It is being used by over 3,500 organizations for applications like data preparation, analytics, and machine learning.
The document discusses Big Data on Azure and provides an overview of HDInsight, Microsoft's Apache Hadoop-based data platform on Azure. It describes HDInsight cluster types for Hadoop, HBase, Storm and Spark and how clusters can be automatically provisioned on Azure. Example applications and demos of Storm, HBase, Hive and Spark are also presented. The document highlights key aspects of using HDInsight including storage integration and tools for interactive analysis.
Uber has created a Data Science Workbench to improve the productivity of its data scientists by providing scalable tools, customization, and support. The Workbench provides Jupyter notebooks for interactive coding and visualization, RStudio for rapid prototyping, and Apache Spark for distributed processing. It aims to centralize infrastructure provisioning, leverage Uber's distributed backend, enable knowledge sharing and search, and integrate with Uber's data ecosystem tools. The Workbench manages Docker containers of tools like Jupyter and RStudio running on a Mesos cluster, with files stored in a shared file system. It addresses the problems of wasted time from separate infrastructures and lack of tool standardization across Uber's data science teams.
This document summarizes Sarah Guido's talk on using Apache Spark for data science at Bitly. She discusses how Bitly uses Spark to extract, explore, and model subsets of their data including decoding Bitly links, performing topic modeling using LDA, and trend detection. While Spark provides performance benefits over MapReduce for these tasks, she notes issues with Hadoop servers, JVM, and lack of documentation that must be addressed for full production usage at Bitly.
This document provides an overview of real-time big data processing using Apache Kafka, Spark Streaming, Scala, and Elastic Search. It defines key concepts like big data, real-time big data, and describes technologies like Hadoop, Apache Kafka, Spark Streaming, Scala, and Elastic Search and how they can be used together for real-time big data processing. The document also provides details about each technology and how they fit into an overall real-time big data architecture.
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin... (Spark Summit)
Moving at the speed of a startup often means rapid iterative development, which can lead to a patchwork of systems and processes. In the early days at Kik (one of the most popular chat apps among U.S. teens), the data team was able to move extremely quickly but often at the expense of scalable data engineering. In this session, Kik’s head of data will share the eight things they did to save time and money. The team took their data stack from a complex combination of systems and processes to a scalable, simple, and robust platform leveraging Apache Spark and Databricks to make data super easy for everyone in the company to use.
http://sigir2013.ie/industry_track.html#GrantIngersoll
Abstract: Apache Lucene and Solr are the most widely deployed search technology on the planet, powering sites like Twitter, Wikipedia, Zappos and countless applications across a large array of domains. They are also free, open source, extensible and extremely scalable. Lucene and Solr also contain a large number of features for solving common information retrieval problems ranging from pluggable posting list compression and scoring algorithms to faceting and spell checking. Increasingly, Lucene and Solr also are being (ab)used to power applications going way beyond the search box. In this talk, we'll explore the features and capabilities of Lucene and Solr 4.x, as well as look at how to (ab)use your search engine technology for fun and profit.
Building Data Pipelines with Spark and StreamSets (Pat Patterson)
Big data tools such as Hadoop and Spark allow you to process data at unprecedented scale, but keeping your processing engine fed can be a challenge. Metadata in upstream sources can ‘drift’ due to infrastructure, OS and application changes, causing ETL tools and hand-coded solutions to fail. StreamSets Data Collector (SDC) is an Apache 2.0 licensed open source platform for building big data ingest pipelines that allows you to design, execute and monitor robust data flows. In this session we’ll look at how SDC’s “intent-driven” approach keeps the data flowing, with a particular focus on clustered deployment with Spark and other exciting Spark integrations in the works.
Apache Zeppelin is an emerging open-source tool for data visualization that allows for interactive data analytics. It provides a web-based notebook interface that allows users to write and execute code in languages like SQL and Scala. The tool offers features like built-in visualization capabilities, pivot tables, dynamic forms, and collaboration tools. Zeppelin works with backends like Apache Spark and uses interpreters to connect to different data processing systems. It is predicted to influence big data visualization in the coming years.
Box + Solr = Content Search for Business (Lucidworks)
This document discusses how Box uses Apache Solr for content search capabilities. It summarizes that Box has over 25 million users and indexes over 10 trillion documents totaling over 10 terabytes in its Solr index. It also discusses how Box shards its Solr index across multiple servers for high availability and scalability, and how it handles search scope and permissions across shared folders.
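The per-owner sharding approach described can be sketched as a simple hash route: keep all of a user's documents on one shard so that a search scoped to that user touches a single shard. The shard count and the `md5` choice here are illustrative assumptions, not Box's actual scheme.

```python
import hashlib

NUM_SHARDS = 16  # illustrative; Box's real shard count is not given in the talk

def shard_for(doc_owner_id: str) -> int:
    """Route all of a user's documents to one shard so that a search
    scoped to that user can be served by a single shard."""
    digest = hashlib.md5(doc_owner_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same owner always maps to the same shard.
assert shard_for("user-42") == shard_for("user-42")
print(shard_for("user-42"))
```

High availability then comes from replicating each shard across servers, so routing stays the same while any replica can answer.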
Solr Anti-Patterns: Presented by Rafał Kuć, Sematext (Lucidworks)
This document discusses various anti-patterns and best practices for optimizing Solr configurations and performance. It describes issues that can occur such as faulty indexing, deadlocks, and out of memory errors. It provides recommendations for updating configurations like solrconfig.xml, schema.xml, thread pools, caching, commit settings, and using bulk updates to improve indexing throughput and query performance.
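One recommendation above, bulk updates, boils down to sending many documents in a single `/update` request instead of one HTTP round trip per document. A hedged sketch of building such a request body (the collection name and commit strategy in the comment are illustrative):

```python
import json

def bulk_update_body(docs):
    """Build the JSON body for one Solr /update request carrying many
    documents, rather than issuing one HTTP request per document."""
    return json.dumps(docs)

docs = [
    {"id": "1", "title": "first"},
    {"id": "2", "title": "second"},
]
body = bulk_update_body(docs)
print(body)
# POST this to http://localhost:8983/solr/<collection>/update with
# Content-Type: application/json, and prefer autoCommit over an explicit
# commit on every request -- per-request commits are one of the anti-patterns.
```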
Reading Metadata Between the Lines - Searching for Stories, People, Places an... (Lucidworks)
The document discusses making television news metadata searchable to allow users to search for stories, people, places and other elements within news programs. It involves defining a metadata structure with attributes and tags, mapping metadata to documents and fields, interpreting search queries, and filtering results based on program metadata to provide more meaningful and powerful searches across television news content.
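The mapping of metadata to documents and fields might look like the following sketch; all field and tag names here are hypothetical, not the talk's actual schema.

```python
def to_search_doc(segment):
    """Flatten a news-segment metadata record into a flat field->value
    document suitable for a search index (field names are illustrative)."""
    return {
        "id": segment["id"],
        "program": segment["program"],
        "people": [t["value"] for t in segment["tags"] if t["type"] == "person"],
        "places": [t["value"] for t in segment["tags"] if t["type"] == "place"],
    }

segment = {
    "id": "seg-1",
    "program": "Evening News",
    "tags": [
        {"type": "person", "value": "Jane Doe"},
        {"type": "place", "value": "Chicago"},
    ],
}
print(to_search_doc(segment))
```

Once the tags are lifted into dedicated fields, a query can filter on `program` or `places` directly instead of matching free text.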
The Latest in Spatial & Temporal Search: Presented by David Smiley (Lucidworks)
David Smiley presented on the latest developments in spatial and temporal search in Lucene and Solr. He discussed strategies for indexing and searching spatial data like polygons using approaches like RecursivePrefixTreeStrategy and SerializedDVStrategy. He also covered temporal search using approaches like date range fields and the upcoming DateRangePrefixTree. Recent contributions from students were highlighted and future work like spatial heatmaps was discussed.
This document discusses integrating Hadoop and Solr. Hadoop is useful for storing and processing large amounts of data, while Solr enables fast search across structured and unstructured data. The document outlines how Hadoop can store documents and Solr can index them for search, as well as how technologies like Flume can process streaming data and index it in real-time in Solr.
Interactively Search and Visualize Your Data: Presented by Romain Rigaux, Clo... (Lucidworks)
The document describes Hue, a web application that allows users to quickly explore and visualize data stored in Apache Solr or Hadoop. It discusses Hue's architecture, which consists of a front-end that interacts with Solr through its standard REST API. The document outlines Hue's features for interactively searching, building dashboards with different field facets, and its support for enterprise configurations like LDAP authentication and security integration with Kerberos and Sentry. It concludes with a demo of using Hue to index and visualize New York City taxi trip data.
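A dashboard front end like Hue's talks to Solr through its standard REST API; a faceted search request can be sketched as a URL build. The collection and field names below are made up for illustration, but `facet=true` and repeated `facet.field` parameters are standard Solr query syntax.

```python
from urllib.parse import urlencode

def facet_query_url(base, q, facet_fields):
    """Build a Solr select URL requesting facet counts on the given
    fields -- the kind of request a dashboard issues per widget."""
    params = [("q", q), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", f) for f in facet_fields]
    return base + "/select?" + urlencode(params)

url = facet_query_url("http://localhost:8983/solr/trips", "*:*",
                      ["pickup_borough", "payment_type"])
print(url)
```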
Optimizing Multilingual Search: Presented by David Troiano, Basis Technology (Lucidworks)
This document discusses approaches to building a multilingual search engine where documents and queries can span multiple languages. It describes using natural language processing (NLP) pipelines to tokenize, normalize, and index documents and queries in different languages. Several approaches within Apache Solr are presented, including using separate fields or cores per language. A newer approach of applying NLP within a single multilingual field is also described. Enhancing Solr's NLP capabilities with an external tool like Rosette is suggested to improve precision, recall and performance for challenging languages like Chinese.
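The separate-fields-per-language approach can be sketched as a small routing step at index time, so each field can use an analyzer tuned for its language. The field-name convention (`title_en`, `title_fr`, ...) is an illustrative assumption, not Rosette's or Solr's mandated naming.

```python
def index_fields(text, lang):
    """Place text into a language-specific field so each field can be
    analyzed with a tokenizer/stemmer tuned for that language."""
    allowed = {"en", "fr", "de", "zh"}
    suffix = lang if lang in allowed else "general"
    return {"title_%s" % suffix: text}

print(index_fields("la recherche multilingue", "fr"))  # {'title_fr': ...}
```

At query time the same routing is applied to the query string, or the query is expanded across all language fields when the query language is unknown.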
“N1QL” a Rich Query Language for Couchbase: Presented by Don Pinto, Couchbase (Lucidworks)
The document discusses N1QL, Couchbase's query language for working with rich data. It provides an overview of N1QL's features for querying, indexing, and working with distributed and nested data models. These features allow developers to easily work with complex, real-world data while leveraging the power of Couchbase at scale.
Building a Solr-Driven Web Portal: Presented by Katia Muser & Ravi Mynampaty,... (Lucidworks)
This document outlines a presentation about building a Solr-driven web portal. The agenda includes discussing the roadmap and architecture, and a demo. It describes how data was initially ingested through crawling pages, exporting to XML, and using a Java loader. Over time the system dealt with dirty data through business rules, added geolocation capabilities, and powered searches for more sites. Infrastructure was improved and Ultraseek was retired in favor of site-wide search using Solr.
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm... (Lucidworks)
Solr Compute Cloud (SC2) is an elastic Solr infrastructure that allows for dynamic provisioning of Solr clusters on demand. This allows each search pipeline or job to have its own isolated cluster, improving stability, throughput, and cost optimization. The key benefits of SC2 are pipeline isolation, dynamic scaling, production cluster safeguards, and built-in high availability and disaster recovery features through technologies like the Solr HAFT service.
The document discusses benchmarking the performance of Apache Solr. It describes testing the indexing performance of SolrCloud clusters of varying sizes. The results show that indexing performance scales nearly linearly as nodes are added. It also discusses using the Solr Scale Toolkit, which is a set of tools for deploying, managing, and benchmarking SolrCloud clusters. Future work mentioned includes benchmarking mixed workloads and integrating chaos monkey tests.
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah... (Lucidworks)
The document discusses building a large scale SEO/SEM application using Apache Solr. It describes some of the key challenges faced in indexing and searching over 40 billion records in the application's database each month. It discusses techniques used to optimize the data import process, create a distributed index across multiple tables, address out of memory errors, and improve search performance through partitioning, index optimization, and external caching.
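The external-caching technique mentioned can be illustrated with an in-process cache standing in for the external one; `run_backend_query` is a hypothetical stand-in for the real Solr call, and in the talk's setting the cache would live outside the JVM as a shared service.

```python
from functools import lru_cache

@lru_cache(maxsize=4096)
def search(query: str):
    """Stand-in for an expensive search query; the cache absorbs repeats
    so only the first occurrence of a query hits the backend."""
    return run_backend_query(query)  # hypothetical backend call

calls = []
def run_backend_query(query):
    calls.append(query)
    return ["doc-1", "doc-2"]

search("keyword rank report")
search("keyword rank report")  # served from cache
print(len(calls))  # backend was hit only once
```

The design choice is the usual one: repeated SEO/SEM reports issue the same heavy queries, so caching keyed on the query string trades memory for backend load.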
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target (Lucidworks)
This document summarizes Target's implementation of Solr as its search platform. It discusses how Target transitioned from Oracle-Endeca to Solr to handle its large scale data and enable more flexible relevancy controls. It describes how Target tested Solr through handling live guest traffic in two sprints and moving its typeahead functionality to the public cloud. Finally, it outlines how Target leverages key Solr capabilities like collection aliases, atomic updates, and configurable facets to synchronize designer and product launches.
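The collection-alias technique works because clients query a stable alias while the alias is re-pointed at a freshly built collection, letting a product launch cut over atomically. A sketch of building the Collections API call (the host, alias, and collection names are illustrative; `action=CREATEALIAS` is the real Solr Collections API action):

```python
from urllib.parse import urlencode

def create_alias_url(solr_base, alias, collection):
    """Build the Collections API request that points an alias at a new
    collection, so queries cut over without clients changing the name
    they query."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return solr_base + "/admin/collections?" + urlencode(params)

url = create_alias_url("http://localhost:8983/solr", "products", "products_v2")
print(url)
```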
10 Keys to Solr's Future: Presented by Grant Ingersoll, Lucidworks (Lucidworks)
This document outlines 10 keys to the future of Solr, an open source search platform. It discusses improving ease of use, modularity, pluggability, APIs, scale, and being more open for development. It also announces new features for Lucidworks Fusion 1.1, including additional connectors for sources like Google Drive and Couchbase. The document promotes using Solr for a variety of use cases and integrating it with other technologies for big data, distributed computing, and security.
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend... (Lucidworks)
This document discusses Pearson's use of Apache Blur for distributed search and indexing of data from Kafka streams into Blur. It provides an overview of Pearson's learning platform and data architecture, describes the benefits of using Blur including its scalability, fault tolerance and query support. It also outlines the challenges of integrating Kafka streams with Blur using Spark and the solution developed to provide a reliable, low-level Kafka consumer within Spark that indexes messages from Kafka into Blur in near real-time.
Search at Twitter: Presented by Michael Busch, Twitter (Lucidworks)
Twitter processes over 500 million tweets per day and more than 2 billion search queries per day. The company uses a search architecture based on Lucene with custom extensions. This includes an in-memory real-time index optimized for concurrency without locks, and a schema-based document factory. Future work includes support for parallel index segments and additional Lucene features.
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah... (Lucidworks)
MapQuest developed a search ahead feature for their mobile app to enable auto-complete searching across their large dataset. They used Solr and implemented various techniques to optimize performance, including custom routing, analysis during ETL, and extensive JVM tuning. Their architecture included multiple Solr clusters with different configurations. Through testing and monitoring, they were able to meet their sub-140ms response time requirement for queries.
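Search-ahead reduces to fast prefix lookup. A minimal stand-in for the index-side suggester, using a sorted term list and binary search; MapQuest's real implementation is Solr-based, so this only illustrates the lookup shape, not their system.

```python
import bisect

class SearchAhead:
    """Minimal prefix auto-complete over a sorted term list."""

    def __init__(self, terms):
        self.terms = sorted(t.lower() for t in terms)

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        # Binary search finds the first term >= prefix; matches are contiguous.
        i = bisect.bisect_left(self.terms, prefix)
        out = []
        while (i < len(self.terms) and self.terms[i].startswith(prefix)
               and len(out) < limit):
            out.append(self.terms[i])
            i += 1
        return out

sa = SearchAhead(["Denver", "Denton", "Detroit", "Dallas"])
print(sa.suggest("de"))  # ['denton', 'denver', 'detroit']
```

Because prefix matches are contiguous in sorted order, each lookup is O(log n) to find the start plus O(limit) to collect results, which is what makes sub-140ms budgets feasible.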
Deduplication Using Solr: Presented by Neeraj Jain, Stubhub (Lucidworks)
The document discusses StubHub's use of Solr for deduplication. It describes the challenges of deduplicating a large event catalog in real time. The legacy solution involved iterating over each field and document. The new approach uses Solr for text similarity comparisons, extends its default behavior, and provides a REST interface. Sample output showing matched venues and their scores is also shown.
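The text-similarity comparison at the heart of the dedup approach can be sketched with token-set Jaccard similarity. The threshold and tokenization below are illustrative assumptions; the talk's actual scoring runs inside Solr.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: |A ∩ B| / |A ∪ B|, case-folded."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def is_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag two venue strings as duplicates above a similarity threshold."""
    return jaccard(a, b) >= threshold

# 3 shared tokens out of 4 distinct tokens -> 0.75
print(jaccard("Madison Square Garden", "madison square garden NY"))  # 0.75
```

In production the candidate pairs would come from a Solr query (cheap recall), with the similarity score deciding the final match (precision), rather than comparing every pair.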
Grant Ingersoll, CTO of LucidWorks, presented on new features and capabilities in Lucene 4 and Solr 4. Key highlights include major performance improvements in Lucene through optimizations like DocValues and native Near Real Time support. Solr 4 features faster indexing and querying, improved geospatial support, and enhancements to SolrCloud including transaction logging for reliability. LucidWorks is continuing to advance Lucene and Solr to provide more flexible, scalable, and robust open source search capabilities.
Big Data Retrospective - STL Big Data IDEA Jan 2019 (Adam Doyle)
Slides from the STL Big Data IDEA meeting from January 2019. The presenters discussed technologies to continue using, stop using, and start using in 2019.
The document discusses Oracle Big Data Discovery, a product for exploring and analyzing big data stored in Hadoop. It allows users to find, explore, transform, discover and share insights from big data in a visual interface. Key features include an interactive data catalog, visualizing and exploring data attributes, powerful transformations and enrichments, composing data visualizations and projects, and collaboration tools. It aims to make data preparation only 20% of analytics projects so users can focus on analysis. The product runs natively on Hadoop clusters for scalability and integrates with the Hadoop ecosystem.
Slides from May 2018 St. Louis Big Data Innovations, Data Engineering, and Analytics User Group meeting. The presentation focused on Data Modeling in Hive.
Part 3 - Modern Data Warehouse with Azure Synapse (Nilesh Gule)
Slide deck of the third part of building a Modern Data Warehouse using Azure. This session covered Azure Synapse, formerly SQL Data Warehouse. We look at the Azure Synapse architecture, external files, and integration with Azure Data Factory.
The recording of the session is available on YouTube
https://www.youtube.com/watch?v=LZlu6_rFzm8&WT.mc_id=DP-MVP-5003170
Search Engines use web search queries to collect information and present it to the user. How do you go about building a search engine in the first place?
This document discusses building a data-driven log analysis application using LucidWorks SILK. It begins with an introduction to LucidWorks and discusses the continuum of search capabilities from enterprise search to big data search. It then describes how SILK can enable big data search across structured and unstructured data at massive scale. The solution components involve collecting log data from various sources using connectors, ingesting it into Solr, and building visualizations for analysis. It concludes with a demo and contact information.
Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing, and big data analytics. It provides the freedom to query data at scale using either serverless or dedicated options. Azure HDInsight allows the use of open source frameworks like Hadoop, Spark, Hive, and Kafka for processing large volumes of data. Azure Databricks offers environments for SQL, data science/engineering, and machine learning. The Azure IoT Hub enables scalable IoT solutions by allowing bidirectional communication between IoT applications and connected devices.
Recently, there's been discussion, even some confusion, around the relationship between Hadoop and Spark. Although they're both big data frameworks with many similarities, they are not one and the same - and are in fact complementary in an enterprise environment.
View the webinar replay here: http://info.zaloni.com/spark-hadoops-friend-or-foe
Coping Strategies for the Death of Unlimited Storage (Globus)
Presented at GlobusWorld 2022 by a set of panelists moderated by Bob Flynn from Internet2. Panelists offer their perspectives on migrating between cloud storage providers.
10 Things Learned Releasing Databricks Enterprise Wide (Databricks)
Implementing tools, let alone an entire Unified Data Platform, like Databricks, can be quite the undertaking. Implementing a tool which you have not yet learned all the ins and outs of can be even more frustrating. Have you ever wished that you could take some of that uncertainty away? Four years ago, Western Governors University (WGU) took on the task of rewriting all of our ETL pipelines in Scala/Python, as well as migrating our Enterprise Data Warehouse into Delta, all on the Databricks platform. Starting with 4 users and rapidly growing to over 120 users across 8 business units, our Databricks environment turned into an entire unified platform, being used by individuals of all skill levels, data requirements, and internal security requirements.
Through this process, our team has had the chance and opportunity to learn while making a lot of mistakes. Taking a look back at those mistakes, there are a lot of things we wish we had known before opening the platform to our enterprise.
We would like to share with you 10 things we wish we had known before WGU started operating in our Databricks environment. Covering topics surrounding user management from both an AWS and Databricks perspective, understanding and managing costs, creating custom pipelines for efficient code management, learning about new Apache Spark snippets that helped save us a fortune, and more. We would like to provide our recommendations on how one can overcome these pitfalls to help new, current and prospective users to make their environments easier, safer, and more reliable to work in.
Teradata Loom is a software that helps users realize the full potential of their Hadoop data lakes. It provides data cataloging, profiling, and lineage tracking to help users find, understand, and prepare their data. Loom's active scanning capabilities automatically discover and profile new data. Its interactive Weaver tool allows self-service data wrangling. Loom is integrated with Hadoop and simplifies data lake management to increase analyst productivity.
Cloudera Search Webinar: Big Data Search, Bigger Insights (Cloudera, Inc.)
Cloudera Search brings full-text, interactive search and scalable indexing to data in HDFS and Apache HBase. Powered by and adding to Apache Solr, Cloudera Search fully integrates with CDH to bring scale and reliability for next-generation open source search -- Big Data search.
Objectivity/DB: A Multipurpose NoSQL Database (InfiniteGraph)
The speakers will describe the flexible configuration possibilities that Objectivity/DB provides, with an emphasis on how best to distribute data across multiple storage nodes. The session will start by describing the distributed processing architecture of Objectivity/DB before covering the new Placement Manager features. The speakers will also describe how Objectivity/DB compares and contrasts with other NoSQL solutions.
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S... (Cloudian)
This document discusses implementing Hadoop and Elastic MapReduce on Cloudian's scale-out object storage platform. It describes Cloudian's hybrid cloud storage capabilities and how their approach reduces costs and provides faster analytics by analyzing log and event data directly on their storage platform without needing to transform the data for HDFS. Key benefits highlighted include no redundant storage, scaling analytics with storage capacity by adding nodes, and taking advantage of multi-core CPUs for MapReduce tasks.
The document discusses how Sparklyr allows data scientists to access and work with data stored in Cloudera Enterprise using the popular RStudio IDE. It describes the challenges data scientists face in accessing secured Hadoop clusters and limitations of notebook environments. Sparklyr integration with RStudio provides a familiar environment for data scientists to access Hadoop data and compute using Spark, enabling distributed data science workflows directly in R. The presentation demonstrates how to analyze over a billion records using Spark and R through Sparklyr.
This outlines a 24-hour hackathon project at Acquia that addresses combining generated API documentation and docs from GitHub-hosted resources into a single indexable interface managed by Solr and Drupal.
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks (Lucidworks)
The document discusses Lucidworks' Fusion product, which is a search platform that enhances Apache Solr. It provides connectors to various data sources, integrated ETL pipelines, built-in recommendations, and security features. The document outlines Fusion's architecture, demo use cases for basic and code search, and next steps for integrating additional analysis tools like OpenGrok.
Similar to This Ain't Your Parents' Search Engine (20)
Search is the Tip of the Spear for Your B2B eCommerce Strategy (Lucidworks)
With ecommerce experiencing explosive growth, it seems intuitive that the B2B segment of that ecosystem is mirroring the same trajectory. That said, B2B has very different needs when it comes to transacting with the same style of experiences that we see in B2C. For instance, B2B ecommerce is about precision findability, whereas B2C customers can convert at higher rates when they’re just browsing online. In order for the B2B buying experience to be successful, search needs to be tuned to meet the unique needs of the segment.
In this webinar with Forrester senior analyst Joe Cicman, you’ll learn:
-Which verticals in B2B will drive the most growth, and how machine-learning powered personalization tactics can be deployed to support those specific verticals
-Why an omnichannel selling approach must be deployed in order to see success in B2B
-How deploying content search capabilities will support a longer sales cycle at scale
-What the next steps are to support a robust B2B commerce strategy supported by new technology
Speakers
Joe Cicman, Senior Analyst, Forrester
Jenny Gomez, VP of Marketing, Lucidworks
Customer loyalty starts with quickly responding to your customer’s needs. When it comes to resolving open support cases, time is of the essence. Time spent searching for answers adds up and creates inefficiencies in resolving cases at scale. Relevant answers need to be a few clicks away and easily accessible for agents directly from their service console.
We will explore how Lucidworks’ Agent Insights application automatically connects agents with the correct answers and resources. You’ll learn how to:
-Configure a proactive widget in an agent’s case view page to access resources across third-party systems (such as Sharepoint, Confluence, JIRA, Zendesk, and ServiceNow).
-Easily set up query pipelines to autonomously route assets and resources that are relevant to the case-at-hand—directly to the right agent.
-Identify subject matter experts within your support data and access tribal knowledge with lightning-fast speed.
How Crate & Barrel Connects Shoppers with Relevant Products (Lucidworks)
Lunch and Learn during Retail TouchPoints #RIC21 virtual event.
***
Crate & Barrel’s previous search solution couldn’t provide its shoppers with an online search and browse experience consistent with the customer-centric Crate & Barrel brand. Meanwhile, Crate & Barrel merchandisers spent the bulk of their time manually creating and maintaining search rules. The search experience impacted customer retention, loyalty, and revenue growth.
Join this lunch & learn for an interactive chat on how Crate & Barrel partnered with Lucidworks to:
-Improve search and browse by modernizing the technology stack with ML-based personalization and merchandising solutions
-Enhance the experience for both shoppers and merchandisers
-Explore signals to transform the omnichannel shopping experience
Questions? Visit https://lucidworks.com/contact/
Learn how to guide customers to relevant products using eCommerce search, hyper-personalisation, and recommendations in our ‘Best-In-Class Retail Product Discovery’ webinar.
Nowadays, shoppers want their online experience to be engaging, inspirational and fulfilling. They want to find what they’re looking for quickly and easily. If the sought-after item isn’t available, they want the next best product or content surfaced to them. They want a website to understand their goals as though they were talking to a sales assistant in person, in-store.
In this webinar, we explore IMRG industry data insights and a best-in-class example of retail product discovery. You’ll learn:
- How AI can drive increased revenue through hyper-personalised experiences
- How user intent can be easily understood and results displayed immediately
- How merchandisers can be empowered to curate results and product placement – all without having to rely on IT.
Presented by:
Dave Hawkins, Principal Sales Engineer - Lucidworks
Matthew Walsh, Director of Data & Retail - IMRG
Connected Experiences Are Personalized Experiences (Lucidworks)
Many companies claim personalization and omnichannel capabilities are top priorities. Few are able to deliver on those experiences.
For a recent Lucidworks-commissioned study, Forrester Consulting surveyed 350+ global business decision-makers to see what gets in the way of achieving these goals. They discovered that inefficient technology, lack of behavioral insights, and failure to tie initiatives to enterprise-wide goals are some of the most frequent blockers to personalization success.
Join guest speaker, Forrester VP and Principal Analyst, Brendan Witcher, and Lucidworks CEO, Will Hayes, to hear the results of the Forrester Consulting study, how to avoid “digital blindness,” and how to apply VoC data in real-time to delight customers with personalized experiences connected across every touchpoint.
In this webinar, you’ll learn:
- Why companies who utilize real-time customer signals report more effective personalization
- How to connect employees and customers in a shared experience through search and browse
- How Lucidworks clients Lenovo, Morgan Stanley and Red Hat fast-tracked improvements in conversion, engagement and customer satisfaction
Featuring
- Will Hayes, CEO, Lucidworks
- Brendan Witcher, VP, Principal Analyst, Forrester
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc... (Lucidworks)
Intelligent Policing. Leveraging Data to more effectively Serve Communities.
Policing in the next decade is anticipated to be very different from historical methods: more data driven, more focused on the intricacies of the communities they serve, and more open and collaborative, to make informed recommendations a reality. Whether it’s social populations, NIBRS, or organizational improvement that’s the driver, the IT requirement is largely the same: provide 360-degree access to large volumes of siloed data to gain a full understanding of existing connections and patterns for improved insight and recommendation.
Join us for a round table discussion of how the Toronto Police Service is better serving their community through deploying a unified intelligent data platform.
Data innovation improves officers' engagement with existing data and streamlines investigation workflows by enhancing collaboration. This improved visibility into existing police data allows for a more intelligent and responsive police force.
In this webinar, we'll cover:
-The technology needs of an intelligent police force.
-How a Global Search improves an officer's interaction with existing data.
Featuring:
-Simon Taylor, VP, Worldwide Channels & Alliances, Lucidworks
-Michael Cizmar, Managing Director, MC+A
-Ian Williams, Manager of Analytics & Innovation, Toronto Police Service
Preparing for Peak in Ecommerce | eTail Asia 2020 (Lucidworks)
This document provides a framework for prioritizing onsite search problems and key performance indicators (KPIs) to measure for e-commerce search optimization. It recommends prioritizing fixing searches that yield no results, improving relevance of results, and reducing false positives. The most essential KPIs to measure include query latency, throughput, result relevance through click-through rates and NDCG scores. The document also provides tips for self-benchmarking search performance and examples of search performance benchmarks across nine e-commerce sites from various industries.
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C... (Lucidworks)
Wish your conversion rates were higher? Can’t figure out how to efficiently and effectively serve all the visitors on your site? Embarrassed by the quality of your product discovery experience? The bar is high and the influx of online shopping over recent months has reminded us that the opportunities are real. We’re all deep in holiday prep, but let’s take a few minutes to think about January 2021 and beyond. How can we position ourselves for success with our customers and against our competition?
Grab your lunch and let’s dive into three strategies that need to be part of your 2021 roadmap. You don’t need an army to get there. But you do need to take action and capitalize on the shoppers abandoning the product discovery journey on your site.
In this session, attendees will find out how to:
-Take control of merchandising at scale;
-Implement hands-free search relevancy; and
-Address personalization challenges.
AI-Powered Linguistics and Search with Fusion and Rosette (Lucidworks)
For a personalized search experience, search curation requires robust text interpretation, data enrichment, relevancy tuning and recommendations. In order to achieve this, language and entity identification are crucial.
For teams working on search applications, advanced language packages allow them to achieve greater recall without sacrificing precision.
Join us for a guided tour of our new Advanced Linguistics packages, available in Fusion, thanks to the technology partnership between Lucidworks and Basistech.
We’ll explore the application of language identification and entity extraction in the context of search, along with practical examples of personalizing search and enhancing entity extraction.
In this webinar, we’ll cover:
-How Fusion uses the Rosette Basic Linguistics and Entity Extraction packages
-Tips for improving language identification and treatment as well as data enrichment for personalization
-Speech2 demo modeling Active Recommendation
-Use Rosette’s packages with Fusion Pipelines to build custom entities for specific domain use cases
Featuring:
-Radu Miclaus, Director of Product, AI and Cloud, Lucidworks
-Robert Lucarini, Senior Software Engineer, Lucidworks
-Nick Belanger, Solutions Engineer, Basis Technology
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment (Lucidworks)
Before COVID-19, almost 80% of the US workforce worked in service jobs that involve in-person interaction with strangers. Now, leaders of service organizations must reshape their offerings during the pandemic and prepare for whatever the new normal turns out to be. Our three panelists will share ideas for adapting their service businesses, now that closer-than-six-feet isn’t an option.
Join Lucidworks as we talk shop with 3 service business leaders, covering:
-Common impacts of the pandemic on service businesses (and what to do about them),
-How service teams can maintain a human touch across virtual channels, and
-Plans for the future, before and after the pandemic subsides.
Featuring
-Sara Nathan, President & CEO, AMIGOS
-Anthony Carruesco, Founder, AC Fly Fishing
-Sara Bradley, Chef and Proprietor, Freight House
-Justin Sears, VP Product Marketing, Lucidworks
Webinar: Smart Answers for Employee and Customer Support After COVID-19 - Europe (Lucidworks)
The COVID-19 pandemic has forced companies to support far more customers and employees through digital channels than ever before. Many are turning to chatbots to help meet increasing demand, but traditional rules-based approaches can’t keep up. Our new Smart Answers add-on to Lucidworks Fusion makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
Smart Answers for Employee and Customer Support After COVID-19 (Lucidworks)
Watch our on-demand webinar showcasing Smart Answers on Lucidworks Fusion. This technology makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
In this webinar, we’ll cover:
-How search and deep learning extend conversational frameworks for improved experiences
-How Smart Answers improves customer care, call deflection, and employee self-service
-A live demo of Smart Answers for multi-channel self-service support
Applying AI & Search in Europe - featuring 451 Research (Lucidworks)
In the current climate, it’s now more important than ever to digitally enable your workforce and customers.
Hear from Simon Taylor, VP Global Partners & Alliances, Lucidworks and Matt Aslett, Research Vice President, 451 Research to get the inside scoop on how industry leaders in Europe are developing and executing their digital transformation strategies.
In this webinar, we’ll discuss:
-The top challenges and aspirations European business and technology leaders are solving using AI and search technology
-Which search and AI use cases are making the biggest impact in industries such as finance, healthcare, retail and energy in Europe
-What technology buyers should look for when evaluating AI and search solutions
Webinar: Accelerate Data Science with Fusion 5.1 (Lucidworks)
This document introduces Fusion 5.1 and its new capabilities for integrating with data science tools like TensorFlow, scikit-learn, and spaCy.
It provides an overview of Fusion's capabilities for understanding content, users, and delivering insights at scale. The document then demonstrates Fusion's Jupyter Notebook integration for reading and writing data and running SQL queries.
Finally, it shows how Fusion integrates with Seldon Core to easily deploy machine learning models with tools like TensorFlow and scikit-learn. A live demo is provided of deploying a custom model and using it in Fusion's query and indexing pipelines.
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy (Lucidworks)
In this webinar with 451 Research, you'll understand how retailers are using AI to predict customer intent and learn which key performance metrics are used by more than 120 online retailers in Lucidworks’ 2019 Retail Benchmark Survey.
In this webinar, you’ll learn:
● What trends and opportunities are facing the ecommerce industry in 2020
● Why search is the universal path to understanding customer intent
● How large online retailers apply AI to maximize the effectiveness of their personalization efforts
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ... (Lucidworks)
Nordstrom Rack | Hautelook curates and serves customers a wide selection of on-trend apparel, accessories, and shoes at an everyday savings of up to 75 percent off regular prices. With over a million visitors shopping across different platforms every day, and a realization that customers have become accustomed to robust and personalized search interactions, Nordstrom Rack | Hautelook launched an initiative over a year ago to provide data science-driven digital experiences to their customers.
In this session, we’ll discuss Nordstrom Rack | Hautelook’s journey of operationalizing a hefty strategy, optimizing a fickle infrastructure, and rallying troops around a single vision of building an expansible machine-learning driven product discovery engine.
The audience will learn about:
-The key technical challenges and outcomes that come with onboarding a solution
-The lessons learned of creating and executing operational design
-The use of Lucidworks Fusion to plug custom data science models into search and browse applications to understand user intent and deliver personalized experiences
Apply Knowledge Graphs and Search for Real-World Decision Intelligence (Lucidworks)
Knowledge graphs and machine learning are on the rise as enterprises hunt for more effective ways to connect the dots between the data and the business world. With newer technologies, the digital workplace can dramatically improve employee engagement, data-driven decisions, and actions that serve tangible business objectives.
In this webinar, you will learn
-- Introduction to knowledge graphs and where they fit in the ML landscape
-- How breakthroughs in search affect your business
-- The key features to consider when choosing a data discovery platform
-- Best practices for adopting AI-powered search, with real-world examples
Webinar: Building a Business Case for Enterprise Search (Lucidworks)
The document discusses building a business case for enterprise search. It notes that 85% of information is unstructured data locked in various locations and applications. Many knowledge workers spend a significant portion of their day searching across multiple systems for information. The rise of unstructured data and AI capabilities can help organizations unlock value from their information assets. Effective enterprise search powered by AI can provide real-time intelligence, personalized information, and more efficient research to help knowledge workers.
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan... (Fwdays)
.NET 8 brought a lot of improvements for developers and maturity to the Azure serverless container ecosystem, so this talk will cover these changes and explain how you can apply them to your projects. Another reason for this talk is the reinvention of serverless from a DevOps perspective as a platform engineering trend, with Backstage and the recent Radius project from Microsoft. Now is the perfect time to look at developer productivity tooling and serverless apps from Microsoft's perspective.
How UiPath Discovery Suite supports identification of Agentic Process Automat... (DianaGray10)
📚 Understand the basics of the new persona-based, LLM-powered Agentic Process Automation (APA) and discover how existing UiPath Discovery Suite products like Communication Mining, Process Mining, and Task Mining can be leveraged to identify APA candidates.
Topics Covered:
💡 Idea Behind APA: Explore the innovative concept of Agentic Process Automation and its significance in modern workflows.
🔄 How APA is Different from RPA: Learn the key differences between Agentic Process Automation and Robotic Process Automation.
🚀 Discover the Advantages of APA: Uncover the unique benefits of implementing APA in your organization.
🔍 Identifying APA Candidates with UiPath Discovery Products: See how UiPath's Communication Mining, Process Mining, and Task Mining tools can help pinpoint potential APA candidates.
🔮 Discussion on Expected Future Impacts: Engage in a discussion on the potential future impacts of APA on various industries and business processes.
Enhance your knowledge on the forefront of automation technology and stay ahead with Agentic Process Automation. 🧠💼✨
Speakers:
Arun Kumar Asokan, Delivery Director (US) @ qBotica and UiPath MVP
Naveen Chatlapalli, Solution Architect @ Ashling Partners and UiPath MVP
Top 12 AI Technology Trends For 2024.pdf (Marrie Morris)
Technology has become an irreplaceable component of our daily lives, and AI is revolutionizing it for the betterment of our future. In this article, we will look at the top 12 AI technology trends for 2024.
The Challenge of Interpretability in Generative AI Models.pdf (Sara Kroft)
Navigating the intricacies of generative AI models reveals a pressing challenge: interpretability. Our blog delves into the complexities of understanding how these advanced models make decisions, shedding light on the mechanisms behind their outputs. Explore the latest research, practical implications, and ethical considerations, as we unravel the opaque processes that drive generative AI. Join us in this insightful journey to demystify the black box of artificial intelligence.
Dive into the complexities of generative AI with our blog on interpretability. Find out why making AI models understandable is key to trust and ethical use and discover current efforts to tackle this big challenge.
Generative AI technology is a fascinating field that focuses on creating comp... (Nohoax Kanont)
Generative AI technology is a fascinating field that focuses on creating computer models capable of generating new, original content. It leverages the power of large language models, neural networks, and machine learning to produce content that can mimic human creativity. This technology has seen a surge in innovation and adoption since the introduction of ChatGPT in 2022, leading to significant productivity benefits across various industries. With its ability to generate text, images, video, and audio, generative AI is transforming how we interact with technology and the types of tasks that can be automated.
UiPath Community Day Amsterdam: Code, Collaborate, Connect (UiPathCommunity)
Welcome to our third live UiPath Community Day Amsterdam! Come join us for a half-day of networking and UiPath Platform deep-dives, for devs and non-devs alike, in the middle of summer ☀.
📕 Agenda:
12:30 Welcome Coffee/Light Lunch ☕
13:00 Event opening speech
Ebert Knol, Managing Partner, Tacstone Technology
Jonathan Smith, UiPath MVP, RPA Lead, Ciphix
Cristina Vidu, Senior Marketing Manager, UiPath Community EMEA
Dion Mes, Principal Sales Engineer, UiPath
13:15 ASML: RPA as Tactical Automation
Tactical robotic process automation for solving short-term challenges, while establishing standard and re-usable interfaces that fit IT's long-term goals and objectives.
Yannic Suurmeijer, System Architect, ASML
13:30 PostNL: an insight into RPA at PostNL
Showcasing the solutions our automations have provided, the challenges we’ve faced, and the best practices we’ve developed to support our logistics operations.
Leonard Renne, RPA Developer, PostNL
13:45 Break (30')
14:15 Breakout Sessions: Round 1
Modern Document Understanding in the cloud platform: AI-driven UiPath Document Understanding
Mike Bos, Senior Automation Developer, Tacstone Technology
Process Orchestration: scale up and have your Robots work in harmony
Jon Smith, UiPath MVP, RPA Lead, Ciphix
UiPath Integration Service: connect applications, leverage prebuilt connectors, and set up customer connectors
Johans Brink, CTO, MvR digital workforce
15:00 Breakout Sessions: Round 2
Automation, and GenAI: practical use cases for value generation
Thomas Janssen, UiPath MVP, Senior Automation Developer, Automation Heroes
Human in the Loop/Action Center
Dion Mes, Principal Sales Engineer, UiPath
Improving development with coded workflows
Idris Janszen, Technical Consultant, Ilionx
15:45 End remarks
16:00 Community fun games, sharing knowledge, drinks, and bites 🍻
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an... (Zilliz)
Enterprises have traditionally prioritized data quantity, assuming more is better for AI performance. However, a new reality is setting in: high-quality data, not just volume, is the key. This shift exposes a critical gap – many organizations struggle to understand their existing data and lack effective curation strategies and tools. This talk dives into these data challenges and explores the methods of automating data curation.
DefCamp_2016_Chemerkin_Yury-publish.pdf - Presentation by Yury Chemerkin at DefCamp 2016 discussing mobile app vulnerabilities, data protection issues, and analysis of security levels across different types of mobile applications.
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx (Fwdays)
I will share my personal experience of full-time development on wasm Blazor:
-The difficulties our team faced: life hacks with Blazor app routing, whether it is necessary to write JavaScript, and which technology stack and architectural patterns we chose
-The conclusions we reached and the mistakes we made
I chose LogStash for data transformation and import for two reasons:
-It provides a powerful framework for extracting, grokking and transforming log data into a structured format that Solr can consume and that SILK can use for dashboards.
-LucidWorks’ Hadoop Connectors have a GrokIngestMapper that allows me to reuse the same LogStash Filters to work with larger volumes of files on HDFS (more details on this in a future article).
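A minimal Logstash pipeline along these lines might look as follows. This is a sketch, not the presenter's actual configuration: the log path and grok pattern are illustrative assumptions, and the `solr_http` output plugin must be installed separately.

```conf
input {
  file {
    path => "/var/log/app/access.log"   # hypothetical log source
    start_position => "beginning"
  }
}

filter {
  # Grok each raw line into structured fields (client IP, timestamp, request, status, bytes)
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  # Ship the structured events to a Solr collection for faceting and SILK dashboards
  solr_http {
    solr_url => "http://localhost:8983/solr/logs"
  }
}
```

The same grok pattern can then be reused by the GrokIngestMapper when the raw files live on HDFS instead of a local disk.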
Highlights: Joins, stats, pivot faceting
http://localhost:3334/#/dashboard/solr/Trading
Time series, joins
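As a sketch of what those highlights look like at the query level, Solr can compute stats and pivot facets in a single request. The collection name matches the dashboard URL above, but the field names (`exchange`, `symbol`, `price`) are illustrative assumptions; `facet.pivot` and `stats.field` are standard Solr parameters.

```text
# Count documents per symbol nested under exchange, plus min/max/mean price stats
http://localhost:8983/solr/Trading/select?q=*:*&rows=0
    &facet=true&facet.pivot=exchange,symbol
    &stats=true&stats.field=price
```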
TARDIS: http://2.bp.blogspot.com/-ysN8JskY4WM/UEZNhBywQKI/AAAAAAAABdg/gXE0A9OO6Mk/s1600/13881_doctor_who.jpg
Work under way to formalize
but not as a search engine for content
more like a search engine for behavior