This document discusses how search has evolved beyond simple text matching to include features like faceting, aggregations, and spatial search. It summarizes new capabilities in Apache Lucene and Solr like reduced memory usage, pluggable codecs, and distributed capabilities. The document also describes how LucidWorks provides tools to integrate search with Hadoop, including connectors, ingestion helpers, and open source projects like Logstash for Solr. Finally, it advertises LucidWorks products and services that leverage signals from user interactions to power recommendations, analytics, and discovery.
Splunk's Hunk: A Powerful Way to Visualize Your Data Stored in MongoDB (MongoDB)
This document discusses Splunk Hunk, which enables users to combine time series event data stored in MongoDB with Splunk's data visualization and search capabilities. It provides an overview of Splunk Hunk's components and architecture, describes how to install and configure the MongoDB virtual index app to integrate MongoDB data with Splunk, and demonstrates how to query and analyze MongoDB data using Splunk.
Webinar: Rapid Solr Development with Fusion (Lucidworks)
The document discusses Lucidworks Fusion, a platform that enables rapid development of search applications using Apache Solr. It provides concise summaries of key points about Lucidworks' contributions to Solr, the features and support levels of Fusion and Solr Enterprise, the architecture of Fusion, new connectors in version 1.3 of Fusion, and instructions for downloading and starting a demo of Fusion.
BlueData Hunk Integration: Splunk Analytics for Hadoop (BlueData, Inc.)
Hunk is a Splunk analytics tool that allows users to explore, analyze, and visualize raw big data stored in Hadoop and NoSQL data stores. It can interactively query raw data, accelerate reporting, create charts and dashboards, and archive historical data to HDFS. BlueData's EPIC platform enables running Hunk jobs on Hadoop clusters while accessing data from any storage system, such as HDFS, NFS, Gluster, and others. Hunk supports ingesting large amounts of data and provides pre-packaged analytics functions and intuitive visualization of results.
Valentyn Kropov, Big Data Solutions Architect, recently attended "Hadoop World / Strata" – the biggest and coolest Big Data conference in the world – and he can't wait to share fresh trends and topics straight from New York. Come and learn how a Hadoop cluster will help NASA explore Mars, how Netflix built a 10PB platform, what the latest trends in Spark are, hear about Kudu, the newly announced storage engine from Cloudera, and much more.
Michael Cutler (CTO cofounder of TUMRA) provides a high-level introduction to Apache Spark in a presentation given at ‘Big Data Week 2014’ #BDW14 held at University College London.
TUMRA were early adopters of Spark after a brief PoC in Dec ‘12 and took it to production just a few months later. The main motivation to do so was the inflexibility and high-latency of Hadoop Map/Reduce jobs and the knock-on effect for technology that utilises it (Mahout machine learning, Hive data warehousing, Cascading).
With two primary use cases, 'Ecommerce Personalisation' and 'Marketing Automation', TUMRA are currently flowing around 29 million 'user engagement events' (JSON) each day through Apache Kafka and Spark Streaming, at peak rates of up to 800 events per second.
TUMRA use Apache Spark on Amazon Web Services (EC2) in production for a mix of machine learning model building, graph analytics and near-real-time reporting.
To learn more about how we use Spark and the services we can deliver through our Platform please contact: hello@tumra.com
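As a rough illustration of the event-rate accounting described above, here is a minimal stdlib-Python sketch that counts JSON events per one-second window and reports the peak rate. TUMRA's actual pipeline runs on Kafka and Spark Streaming; the `ts` field name and the windowing here are illustrative assumptions, not their implementation.

```python
import json
from collections import Counter

def peak_events_per_second(raw_events):
    """Count JSON events per one-second window and return the peak rate.

    Each raw event is a JSON string with a 'ts' field holding a UNIX
    timestamp in seconds ('ts' is a hypothetical field name).
    """
    per_second = Counter()
    for raw in raw_events:
        event = json.loads(raw)
        per_second[int(event["ts"])] += 1
    return max(per_second.values()) if per_second else 0

# Three events land in second 100 and one in second 101, so the peak is 3.
events = [json.dumps({"ts": t}) for t in (100, 100, 100, 101)]
print(peak_events_per_second(events))  # 3
```

In a real streaming job the same per-window counting would be expressed as a windowed aggregation over the Kafka stream rather than an in-memory Counter.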
Hunk - Unlocking The Power of Big Data Breakout Session (Splunk)
This document discusses Splunk's Hunk product and how it allows users to analyze data stored in Hadoop using Splunk. Hunk runs natively in Hadoop using MapReduce, supports mixed mode searching that allows previewing data, and auto-deploys Splunk components to Hadoop data nodes for real-time indexing. It also provides role-based security and supports connecting to data in NoSQL databases and SQL databases through Splunk's DB Connect product.
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, Wal... (Lucidworks)
This document discusses how Walmart uses Apache Solr as a "not-so-evil twin" to complement their source-of-truth database and help scale their data infrastructure. It describes how Walmart abstracts the complexity of managing databases, caches, search queries, and messaging to provide scalable querying across database shards. The use of Solr has allowed Walmart to offload queries, recurring reads, and analytics.
4Developers 2018: Big Data Processing Based on the Lambda Architecture on ... (PROIDEA)
According to estimates, by 2020 we will generate 40 zettabytes of data, and by 2025 as much as 163 zettabytes of various kinds of data, and its careful analysis will allow us to discover new phenomena, optimize processes, and support decision-making. To process such large data sets effectively we need new data analysis techniques and innovative technological solutions. The Azure cloud plays an important role here, offering a range of services with which we can build Big Data processing solutions in both batch and near-real-time modes. During the session we will build a sample Big Data processing solution based on the Lambda architecture, using Azure platform services such as Azure Data Factory, Azure Stream Analytics, Azure HDInsight, Azure Event (IoT) Hub, and Azure Data Lake.
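The Lambda architecture the session builds can be reduced to one core idea: serve queries by merging a precomputed batch view with a near-real-time speed view. A minimal sketch of that merge step, with hypothetical sensor counts standing in for the Azure services involved:

```python
def merge_views(batch_view, speed_view):
    """Serve a query by combining the batch layer's precomputed counts
    with the speed layer's counts for data not yet absorbed by batch."""
    merged = dict(batch_view)
    for key, count in speed_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged

batch = {"sensor-a": 1000, "sensor-b": 250}   # recomputed periodically
speed = {"sensor-a": 7, "sensor-c": 3}        # near-real-time increments
print(merge_views(batch, speed))
```

In the Azure mapping, the batch view would come from HDInsight jobs orchestrated by Data Factory and the speed view from Stream Analytics over Event Hub data.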
Big data for bay area big data developer (19scottmiller)
The document announces a 3-day Big Data Developer Conference to take place from July 15-17 at the Santa Clara Convention Center. The conference will provide extensive workshops and technical talks on various Big Data technologies like Spark, Hadoop, MongoDB, Neo4J, Cassandra, and data analytics tools. Engineers, developers, managers and students involved in Big Data are encouraged to register and attend the conference.
Big Data is a recent phenomenon. Everyone talks about it, but do you really know what Big Data is? Join our four-part series about Big Data and you will get answers to your questions!
We will cover an introduction to Big Data and the platforms available for dealing with it. In the end, we will give you an insight into the possible future of dealing with Big Data.
Spark, Flink, Presto, and many others: this is just a sample of the frameworks used in real companies, and we will talk about some of them.
In the previous episode of this Big Data series, we talked about the basic information concerning Big Data. This presentation, however, will be much more technical as we will be covering the most popular platforms you can use to deal with Big Data 2.0 Systems and learn about the key differences between these platforms. Let’s go!
#CHEDTEB
www.chedteb.eu
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica (Databricks)
This document discusses Databricks Cloud, a platform for running Apache Spark workloads that aims to accelerate time-to-results from months to days. It provides a unified platform with notebooks, dashboards, and jobs running on Spark clusters managed by Databricks. Key benefits include zero management of clusters, interactive queries and streaming for real-time insights, and the ability to develop models and visualizations in notebooks and deploy them as production jobs or dashboards without code changes. The platform is open source with no vendor lock-in and supports various data sources and third party applications. It is being used by over 3,500 organizations for applications like data preparation, analytics, and machine learning.
The document discusses Big Data on Azure and provides an overview of HDInsight, Microsoft's Apache Hadoop-based data platform on Azure. It describes HDInsight cluster types for Hadoop, HBase, Storm and Spark and how clusters can be automatically provisioned on Azure. Example applications and demos of Storm, HBase, Hive and Spark are also presented. The document highlights key aspects of using HDInsight including storage integration and tools for interactive analysis.
Uber has created a Data Science Workbench to improve the productivity of its data scientists by providing scalable tools, customization, and support. The Workbench provides Jupyter notebooks for interactive coding and visualization, RStudio for rapid prototyping, and Apache Spark for distributed processing. It aims to centralize infrastructure provisioning, leverage Uber's distributed backend, enable knowledge sharing and search, and integrate with Uber's data ecosystem tools. The Workbench manages Docker containers of tools like Jupyter and RStudio running on a Mesos cluster, with files stored in a shared file system. It addresses the problems of wasted time from separate infrastructures and lack of tool standardization across Uber's data science teams.
This document summarizes Sarah Guido's talk on using Apache Spark for data science at Bitly. She discusses how Bitly uses Spark to extract, explore, and model subsets of their data including decoding Bitly links, performing topic modeling using LDA, and trend detection. While Spark provides performance benefits over MapReduce for these tasks, she notes issues with Hadoop servers, JVM, and lack of documentation that must be addressed for full production usage at Bitly.
This document provides an overview of real-time big data processing using Apache Kafka, Spark Streaming, Scala, and Elastic Search. It defines key concepts like big data, real-time big data, and describes technologies like Hadoop, Apache Kafka, Spark Streaming, Scala, and Elastic Search and how they can be used together for real-time big data processing. The document also provides details about each technology and how they fit into an overall real-time big data architecture.
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin... (Spark Summit)
Moving at the speed of a startup often means rapid iterative development, which can lead to a patchwork of systems and processes. In the early days at Kik (one of the most popular chat apps among U.S. teens), the data team was able to move extremely quickly but often at the expense of scalable data engineering. In this session, Kik’s head of data will share the eight things they did to save time and money. The team took their data stack from a complex combination of systems and processes to a scalable, simple, and robust platform leveraging Apache Spark and Databricks to make data super easy for everyone in the company to use.
http://sigir2013.ie/industry_track.html#GrantIngersoll
Abstract: Apache Lucene and Solr are the most widely deployed search technology on the planet, powering sites like Twitter, Wikipedia, Zappos and countless applications across a large array of domains. They are also free, open source, extensible and extremely scalable. Lucene and Solr also contain a large number of features for solving common information retrieval problems ranging from pluggable posting list compression and scoring algorithms to faceting and spell checking. Increasingly, Lucene and Solr also are being (ab)used to power applications going way beyond the search box. In this talk, we'll explore the features and capabilities of Lucene and Solr 4.x, as well as look at how to (ab)use your search engine technology for fun and profit.
Building Data Pipelines with Spark and StreamSets (Pat Patterson)
Big data tools such as Hadoop and Spark allow you to process data at unprecedented scale, but keeping your processing engine fed can be a challenge. Metadata in upstream sources can ‘drift’ due to infrastructure, OS and application changes, causing ETL tools and hand-coded solutions to fail. StreamSets Data Collector (SDC) is an Apache 2.0 licensed open source platform for building big data ingest pipelines that allows you to design, execute and monitor robust data flows. In this session we’ll look at how SDC’s “intent-driven” approach keeps the data flowing, with a particular focus on clustered deployment with Spark and other exciting Spark integrations in the works.
Apache Zeppelin is an emerging open-source tool for data visualization that allows for interactive data analytics. It provides a web-based notebook interface that allows users to write and execute code in languages like SQL and Scala. The tool offers features like built-in visualization capabilities, pivot tables, dynamic forms, and collaboration tools. Zeppelin works with backends like Apache Spark and uses interpreters to connect to different data processing systems. It is predicted to influence big data visualization in the coming years.
Box + Solr = Content Search for Business (Lucidworks)
This document discusses how Box uses Apache Solr for content search capabilities. It summarizes that Box has over 25 million users and indexes over 10 trillion documents totaling over 10 terabytes in its Solr index. It also discusses how Box shards its Solr index across multiple servers for high availability and scalability, and how it handles search scope and permissions across shared folders.
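The per-owner sharding approach described can be sketched as a simple hash route: keep all of a user's documents on one shard so that a search scoped to that user touches a single shard. The shard count and the `md5` choice here are illustrative assumptions, not Box's actual scheme.

```python
import hashlib

NUM_SHARDS = 16  # illustrative; Box's real shard count is not given in the talk

def shard_for(doc_owner_id: str) -> int:
    """Route all of a user's documents to one shard so that a search
    scoped to that user can be served by a single shard."""
    digest = hashlib.md5(doc_owner_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same owner always maps to the same shard.
assert shard_for("user-42") == shard_for("user-42")
print(shard_for("user-42"))
```

High availability then comes from replicating each shard across servers, so routing stays the same while any replica can answer.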
Solr Anti-Patterns: Presented by Rafał Kuć, Sematext (Lucidworks)
This document discusses various anti-patterns and best practices for optimizing Solr configurations and performance. It describes issues that can occur such as faulty indexing, deadlocks, and out of memory errors. It provides recommendations for updating configurations like solrconfig.xml, schema.xml, thread pools, caching, commit settings, and using bulk updates to improve indexing throughput and query performance.
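One recommendation above, bulk updates, boils down to sending many documents in a single `/update` request instead of one HTTP round trip per document. A hedged sketch of building such a request body (the collection name and commit strategy in the comment are illustrative):

```python
import json

def bulk_update_body(docs):
    """Build the JSON body for one Solr /update request carrying many
    documents, rather than issuing one HTTP request per document."""
    return json.dumps(docs)

docs = [
    {"id": "1", "title": "first"},
    {"id": "2", "title": "second"},
]
body = bulk_update_body(docs)
print(body)
# POST this to http://localhost:8983/solr/<collection>/update with
# Content-Type: application/json, and prefer autoCommit over an explicit
# commit on every request -- per-request commits are one of the anti-patterns.
```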
Reading Metadata Between the Lines - Searching for Stories, People, Places an... (Lucidworks)
The document discusses making television news metadata searchable to allow users to search for stories, people, places and other elements within news programs. It involves defining a metadata structure with attributes and tags, mapping metadata to documents and fields, interpreting search queries, and filtering results based on program metadata to provide more meaningful and powerful searches across television news content.
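The mapping of metadata to documents and fields might look like the following sketch; all field and tag names here are hypothetical, not the talk's actual schema.

```python
def to_search_doc(segment):
    """Flatten a news-segment metadata record into a flat field->value
    document suitable for a search index (field names are illustrative)."""
    return {
        "id": segment["id"],
        "program": segment["program"],
        "people": [t["value"] for t in segment["tags"] if t["type"] == "person"],
        "places": [t["value"] for t in segment["tags"] if t["type"] == "place"],
    }

segment = {
    "id": "seg-1",
    "program": "Evening News",
    "tags": [
        {"type": "person", "value": "Jane Doe"},
        {"type": "place", "value": "Chicago"},
    ],
}
print(to_search_doc(segment))
```

Once the tags are lifted into dedicated fields, a query can filter on `program` or `places` directly instead of matching free text.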
The Latest in Spatial & Temporal Search: Presented by David Smiley (Lucidworks)
David Smiley presented on the latest developments in spatial and temporal search in Lucene and Solr. He discussed strategies for indexing and searching spatial data like polygons using approaches like RecursivePrefixTreeStrategy and SerializedDVStrategy. He also covered temporal search using approaches like date range fields and the upcoming DateRangePrefixTree. Recent contributions from students were highlighted and future work like spatial heatmaps was discussed.
This document discusses integrating Hadoop and Solr. Hadoop is useful for storing and processing large amounts of data, while Solr enables fast search across structured and unstructured data. The document outlines how Hadoop can store documents and Solr can index them for search, as well as how technologies like Flume can process streaming data and index it in real-time in Solr.
Interactively Search and Visualize Your Data: Presented by Romain Rigaux, Clo... (Lucidworks)
The document describes Hue, a web application that allows users to quickly explore and visualize data stored in Apache Solr or Hadoop. It discusses Hue's architecture, which consists of a front-end that interacts with Solr through its standard REST API. The document outlines Hue's features for interactively searching, building dashboards with different field facets, and its support for enterprise configurations like LDAP authentication and security integration with Kerberos and Sentry. It concludes with a demo of using Hue to index and visualize New York City taxi trip data.
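A dashboard front end like Hue's talks to Solr through its standard REST API; a faceted search request can be sketched as a URL build. The collection and field names below are made up for illustration, but `facet=true` and repeated `facet.field` parameters are standard Solr query syntax.

```python
from urllib.parse import urlencode

def facet_query_url(base, q, facet_fields):
    """Build a Solr select URL requesting facet counts on the given
    fields -- the kind of request a dashboard issues per widget."""
    params = [("q", q), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", f) for f in facet_fields]
    return base + "/select?" + urlencode(params)

url = facet_query_url("http://localhost:8983/solr/trips", "*:*",
                      ["pickup_borough", "payment_type"])
print(url)
```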
Optimizing Multilingual Search: Presented by David Troiano, Basis Technology (Lucidworks)
This document discusses approaches to building a multilingual search engine where documents and queries can span multiple languages. It describes using natural language processing (NLP) pipelines to tokenize, normalize, and index documents and queries in different languages. Several approaches within Apache Solr are presented, including using separate fields or cores per language. A newer approach of applying NLP within a single multilingual field is also described. Enhancing Solr's NLP capabilities with an external tool like Rosette is suggested to improve precision, recall and performance for challenging languages like Chinese.
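The separate-fields-per-language approach can be sketched as a small routing step at index time, so each field can use an analyzer tuned for its language. The field-name convention (`title_en`, `title_fr`, ...) is an illustrative assumption, not Rosette's or Solr's mandated naming.

```python
def index_fields(text, lang):
    """Place text into a language-specific field so each field can be
    analyzed with a tokenizer/stemmer tuned for that language."""
    allowed = {"en", "fr", "de", "zh"}
    suffix = lang if lang in allowed else "general"
    return {"title_%s" % suffix: text}

print(index_fields("la recherche multilingue", "fr"))  # {'title_fr': ...}
```

At query time the same routing is applied to the query string, or the query is expanded across all language fields when the query language is unknown.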
“N1QL” a Rich Query Language for Couchbase: Presented by Don Pinto, Couchbase (Lucidworks)
The document discusses N1QL, Couchbase's query language for working with rich data. It provides an overview of N1QL's features for querying, indexing, and working with distributed and nested data models. These features allow developers to easily work with complex, real-world data while leveraging the power of Couchbase at scale.
Building a Solr-Driven Web Portal: Presented by Katia Muser & Ravi Mynampaty,... (Lucidworks)
This document outlines a presentation about building a Solr-driven web portal. The agenda includes discussing the roadmap and architecture, and a demo. It describes how data was initially ingested through crawling pages, exporting to XML, and using a Java loader. Over time the system dealt with dirty data through business rules, added geolocation capabilities, and powered searches for more sites. Infrastructure was improved and Ultraseek was retired in favor of site-wide search using Solr.
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm... (Lucidworks)
Solr Compute Cloud (SC2) is an elastic Solr infrastructure that allows for dynamic provisioning of Solr clusters on demand. This allows each search pipeline or job to have its own isolated cluster, improving stability, throughput, and cost optimization. The key benefits of SC2 are pipeline isolation, dynamic scaling, production cluster safeguards, and built-in high availability and disaster recovery features through technologies like the Solr HAFT service.
The document discusses benchmarking the performance of Apache Solr. It describes testing the indexing performance of SolrCloud clusters of varying sizes. The results show that indexing performance scales nearly linearly as nodes are added. It also discusses using the Solr Scale Toolkit, which is a set of tools for deploying, managing, and benchmarking SolrCloud clusters. Future work mentioned includes benchmarking mixed workloads and integrating chaos monkey tests.
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah... (Lucidworks)
The document discusses building a large scale SEO/SEM application using Apache Solr. It describes some of the key challenges faced in indexing and searching over 40 billion records in the application's database each month. It discusses techniques used to optimize the data import process, create a distributed index across multiple tables, address out of memory errors, and improve search performance through partitioning, index optimization, and external caching.
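The external-caching technique mentioned can be illustrated with an in-process cache standing in for the external one; `run_backend_query` is a hypothetical stand-in for the real Solr call, and in the talk's setting the cache would live outside the JVM as a shared service.

```python
from functools import lru_cache

@lru_cache(maxsize=4096)
def search(query: str):
    """Stand-in for an expensive search query; the cache absorbs repeats
    so only the first occurrence of a query hits the backend."""
    return run_backend_query(query)  # hypothetical backend call

calls = []
def run_backend_query(query):
    calls.append(query)
    return ["doc-1", "doc-2"]

search("keyword rank report")
search("keyword rank report")  # served from cache
print(len(calls))  # backend was hit only once
```

The design choice is the usual one: repeated SEO/SEM reports issue the same heavy queries, so caching keyed on the query string trades memory for backend load.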
Journey of Implementing Solr at Target: Presented by Raja Ramachandran, Target (Lucidworks)
This document summarizes Target's implementation of Solr as its search platform. It discusses how Target transitioned from Oracle-Endeca to Solr to handle its large scale data and enable more flexible relevancy controls. It describes how Target tested Solr through handling live guest traffic in two sprints and moving its typeahead functionality to the public cloud. Finally, it outlines how Target leverages key Solr capabilities like collection aliases, atomic updates, and configurable facets to synchronize designer and product launches.
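The collection-alias technique works because clients query a stable alias while the alias is re-pointed at a freshly built collection, letting a product launch cut over atomically. A sketch of building the Collections API call (the host, alias, and collection names are illustrative; `action=CREATEALIAS` is the real Solr Collections API action):

```python
from urllib.parse import urlencode

def create_alias_url(solr_base, alias, collection):
    """Build the Collections API request that points an alias at a new
    collection, so queries cut over without clients changing the name
    they query."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return solr_base + "/admin/collections?" + urlencode(params)

url = create_alias_url("http://localhost:8983/solr", "products", "products_v2")
print(url)
```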
10 Keys to Solr's Future: Presented by Grant Ingersoll, Lucidworks (Lucidworks)
This document outlines 10 keys to the future of Solr, an open source search platform. It discusses improving ease of use, modularity, pluggability, APIs, scale, and being more open for development. It also announces new features for Lucidworks Fusion 1.1, including additional connectors for sources like Google Drive and Couchbase. The document promotes using Solr for a variety of use cases and integrating it with other technologies for big data, distributed computing, and security.
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend... (Lucidworks)
This document discusses Pearson's use of Apache Blur for distributed search and indexing of data from Kafka streams into Blur. It provides an overview of Pearson's learning platform and data architecture, describes the benefits of using Blur including its scalability, fault tolerance and query support. It also outlines the challenges of integrating Kafka streams with Blur using Spark and the solution developed to provide a reliable, low-level Kafka consumer within Spark that indexes messages from Kafka into Blur in near real-time.
Search at Twitter: Presented by Michael Busch, Twitter (Lucidworks)
Twitter processes over 500 million tweets per day and more than 2 billion search queries per day. The company uses a search architecture based on Lucene with custom extensions. This includes an in-memory real-time index optimized for concurrency without locks, and a schema-based document factory. Future work includes support for parallel index segments and additional Lucene features.
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah... (Lucidworks)
MapQuest developed a search ahead feature for their mobile app to enable auto-complete searching across their large dataset. They used Solr and implemented various techniques to optimize performance, including custom routing, analysis during ETL, and extensive JVM tuning. Their architecture included multiple Solr clusters with different configurations. Through testing and monitoring, they were able to meet their sub-140ms response time requirement for queries.
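Search-ahead reduces to fast prefix lookup. A minimal stand-in for the index-side suggester, using a sorted term list and binary search; MapQuest's real implementation is Solr-based, so this only illustrates the lookup shape, not their system.

```python
import bisect

class SearchAhead:
    """Minimal prefix auto-complete over a sorted term list."""

    def __init__(self, terms):
        self.terms = sorted(t.lower() for t in terms)

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        # Binary search finds the first term >= prefix; matches are contiguous.
        i = bisect.bisect_left(self.terms, prefix)
        out = []
        while (i < len(self.terms) and self.terms[i].startswith(prefix)
               and len(out) < limit):
            out.append(self.terms[i])
            i += 1
        return out

sa = SearchAhead(["Denver", "Denton", "Detroit", "Dallas"])
print(sa.suggest("de"))  # ['denton', 'denver', 'detroit']
```

Because prefix matches are contiguous in sorted order, each lookup is O(log n) to find the start plus O(limit) to collect results, which is what makes sub-140ms budgets feasible.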
Deduplication Using Solr: Presented by Neeraj Jain, Stubhub (Lucidworks)
The document discusses StubHub's use of Solr for deduplication. It describes the challenges of deduplicating a large event catalog in real time. The legacy solution involved iterating over each field and document. The new approach uses Solr for text similarity comparisons, extends its default behavior, and provides a REST interface. Sample output showing matched venues and their scores is also shown.
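The text-similarity comparison at the heart of the dedup approach can be sketched with token-set Jaccard similarity. The threshold and tokenization below are illustrative assumptions; the talk's actual scoring runs inside Solr.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: |A ∩ B| / |A ∪ B|, case-folded."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def is_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag two venue strings as duplicates above a similarity threshold."""
    return jaccard(a, b) >= threshold

# 3 shared tokens out of 4 distinct tokens -> 0.75
print(jaccard("Madison Square Garden", "madison square garden NY"))  # 0.75
```

In production the candidate pairs would come from a Solr query (cheap recall), with the similarity score deciding the final match (precision), rather than comparing every pair.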
Grant Ingersoll, CTO of LucidWorks, presented on new features and capabilities in Lucene 4 and Solr 4. Key highlights include major performance improvements in Lucene through optimizations like DocValues and native Near Real Time support. Solr 4 features faster indexing and querying, improved geospatial support, and enhancements to SolrCloud including transaction logging for reliability. LucidWorks is continuing to advance Lucene and Solr to provide more flexible, scalable, and robust open source search capabilities.
Big Data Retrospective - STL Big Data IDEA Jan 2019 (Adam Doyle)
Slides from the STL Big Data IDEA meeting from January 2019. The presenters discussed technologies to continue using, stop using, and start using in 2019.
The document discusses Oracle Big Data Discovery, a product for exploring and analyzing big data stored in Hadoop. It allows users to find, explore, transform, discover and share insights from big data in a visual interface. Key features include an interactive data catalog, visualizing and exploring data attributes, powerful transformations and enrichments, composing data visualizations and projects, and collaboration tools. It aims to make data preparation only 20% of analytics projects so users can focus on analysis. The product runs natively on Hadoop clusters for scalability and integrates with the Hadoop ecosystem.
Slides from May 2018 St. Louis Big Data Innovations, Data Engineering, and Analytics User Group meeting. The presentation focused on Data Modeling in Hive.
Part 3 - Modern Data Warehouse with Azure Synapse (Nilesh Gule)
Slide deck of the third part of building a Modern Data Warehouse using Azure. This session covered Azure Synapse, formerly SQL Data Warehouse. We look at the Azure Synapse architecture, external files, and integration with Azure Data Factory.
The recording of the session is available on YouTube
https://www.youtube.com/watch?v=LZlu6_rFzm8&WT.mc_id=DP-MVP-5003170
Search Engines use web search queries to collect information and present it to the user. How do you go about building a search engine in the first place?
This document discusses building a data-driven log analysis application using LucidWorks SILK. It begins with an introduction to LucidWorks and discusses the continuum of search capabilities from enterprise search to big data search. It then describes how SILK can enable big data search across structured and unstructured data at massive scale. The solution components involve collecting log data from various sources using connectors, ingesting it into Solr, and building visualizations for analysis. It concludes with a demo and contact information.
Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing, and big data analytics. It provides the freedom to query data at scale using either serverless or dedicated options. Azure HDInsight allows the use of open source frameworks like Hadoop, Spark, Hive, and Kafka for processing large volumes of data. Azure Databricks offers environments for SQL, data science/engineering, and machine learning. The Azure IoT Hub enables scalable IoT solutions by allowing bidirectional communication between IoT applications and connected devices.
Recently, there's been discussion, even some confusion, around the relationship between Hadoop and Spark. Although they're both big data frameworks with many similarities, they are not one and the same - and are in fact complementary in an enterprise environment.
View the webinar replay here: http://info.zaloni.com/spark-hadoops-friend-or-foe
Coping Strategies for the Death of Unlimited Storage (Globus)
Presented at GlobusWorld 2022 by a set of panelists moderated by Bob Flynn from Internet2. Panelists offer their perspectives on migrating between cloud storage providers.
10 Things Learned Releasing Databricks Enterprise Wide (Databricks)
Implementing tools, let alone an entire Unified Data Platform, like Databricks, can be quite the undertaking. Implementing a tool which you have not yet learned all the ins and outs of can be even more frustrating. Have you ever wished that you could take some of that uncertainty away? Four years ago, Western Governors University (WGU) took on the task of rewriting all of our ETL pipelines in Scala/Python, as well as migrating our Enterprise Data Warehouse into Delta, all on the Databricks platform. Starting with 4 users and rapidly growing to over 120 users across 8 business units, our Databricks environment turned into an entire unified platform, being used by individuals of all skill levels, data requirements, and internal security requirements.
Through this process, our team has had the chance and opportunity to learn while making a lot of mistakes. Taking a look back at those mistakes, there are a lot of things we wish we had known before opening the platform to our enterprise.
We would like to share with you 10 things we wish we had known before WGU started operating in our Databricks environment. Covering topics surrounding user management from both an AWS and Databricks perspective, understanding and managing costs, creating custom pipelines for efficient code management, learning about new Apache Spark snippets that helped save us a fortune, and more. We would like to provide our recommendations on how one can overcome these pitfalls to help new, current and prospective users to make their environments easier, safer, and more reliable to work in.
Teradata Loom is a software that helps users realize the full potential of their Hadoop data lakes. It provides data cataloging, profiling, and lineage tracking to help users find, understand, and prepare their data. Loom's active scanning capabilities automatically discover and profile new data. Its interactive Weaver tool allows self-service data wrangling. Loom is integrated with Hadoop and simplifies data lake management to increase analyst productivity.
Cloudera Search Webinar: Big Data Search, Bigger Insights (Cloudera, Inc.)
Cloudera Search brings full-text, interactive search and scalable indexing to data in HDFS and Apache HBase. Powered by and adding to Apache Solr, Cloudera Search fully integrates with CDH to bring scale and reliability for next-generation open source search -- Big Data search.
Objectivity/DB: A Multipurpose NoSQL Database (InfiniteGraph)
The speakers will describe the flexible configuration possibilities that Objectivity/DB provides, with an emphasis on how best to distribute data across multiple storage nodes. The session will start by describing the distributed processing architecture of Objectivity/DB before covering the new Placement Manager features. The speakers will also describe how Objectivity/DB compares and contrasts with other NoSQL solutions.
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S... (Cloudian)
This document discusses implementing Hadoop and Elastic MapReduce on Cloudian's scale-out object storage platform. It describes Cloudian's hybrid cloud storage capabilities and how their approach reduces costs and provides faster analytics by analyzing log and event data directly on their storage platform without needing to transform the data for HDFS. Key benefits highlighted include no redundant storage, scaling analytics with storage capacity by adding nodes, and taking advantage of multi-core CPUs for MapReduce tasks.
The document discusses how Sparklyr allows data scientists to access and work with data stored in Cloudera Enterprise using the popular RStudio IDE. It describes the challenges data scientists face in accessing secured Hadoop clusters and limitations of notebook environments. Sparklyr integration with RStudio provides a familiar environment for data scientists to access Hadoop data and compute using Spark, enabling distributed data science workflows directly in R. The presentation demonstrates how to analyze over a billion records using Spark and R through Sparklyr.
This outlines a 24-hour hackathon project at Acquia that addresses combining generated API documentation and docs from GitHub-hosted resources into a single indexable interface managed by Solr and Drupal.
Searching for Better Code: Presented by Grant Ingersoll, Lucidworks (Lucidworks)
The document discusses Lucidworks' Fusion product, which is a search platform that enhances Apache Solr. It provides connectors to various data sources, integrated ETL pipelines, built-in recommendations, and security features. The document outlines Fusion's architecture, demo use cases for basic and code search, and next steps for integrating additional analysis tools like OpenGrok.
Similar to This Ain't Your Parents' Search Engine (20)
Search is the Tip of the Spear for Your B2B eCommerce Strategy (Lucidworks)
With ecommerce experiencing explosive growth, it seems intuitive that the B2B segment of that ecosystem is mirroring the same trajectory. That said, B2B has very different needs when it comes to transacting with the same style of experiences that we see in B2C. For instance, B2B ecommerce is about precision findability, whereas B2C customers can convert at higher rates when they’re just browsing online. In order for the B2B buying experience to be successful, search needs to be tuned to meet the unique needs of the segment.
In this webinar with Forrester senior analyst Joe Cicman, you’ll learn:
-Which verticals in B2B will drive the most growth, and how machine-learning powered personalization tactics can be deployed to support those specific verticals
-Why an omnichannel selling approach must be deployed in order to see success in B2B
-How deploying content search capabilities will support a longer sales cycle at scale
-What the next steps are to support a robust B2B commerce strategy supported by new technology
Speakers
Joe Cicman, Senior Analyst, Forrester
Jenny Gomez, VP of Marketing, Lucidworks
Customer loyalty starts with quickly responding to your customer’s needs. When it comes to resolving open support cases, time is of the essence. Time spent searching for answers adds up and creates inefficiencies in resolving cases at scale. Relevant answers need to be a few clicks away and easily accessible for agents directly from their service console.
We will explore how Lucidworks’ Agent Insights application automatically connects agents with the correct answers and resources. You’ll learn how to:
-Configure a proactive widget in an agent’s case view page to access resources across third-party systems (such as Sharepoint, Confluence, JIRA, Zendesk, and ServiceNow).
-Easily set up query pipelines to autonomously route assets and resources that are relevant to the case-at-hand—directly to the right agent.
-Identify subject matter experts within your support data and access tribal knowledge with lightning-fast speed.
How Crate & Barrel Connects Shoppers with Relevant Products (Lucidworks)
Lunch and Learn during Retail TouchPoints #RIC21 virtual event.
***
Crate & Barrel’s previous search solution couldn’t provide its shoppers with an online search and browse experience consistent with the customer-centric Crate & Barrel brand. Meanwhile, Crate & Barrel merchandisers spent the bulk of their time manually creating and maintaining search rules. The search experience impacted customer retention, loyalty, and revenue growth.
Join this lunch & learn for an interactive chat on how Crate & Barrel partnered with Lucidworks to:
-Improve search and browse by modernizing the technology stack with ML-based personalization and merchandising solutions
-Enhance the experience for both shoppers and merchandisers
-Explore signals to transform the omnichannel shopping experience
Questions? Visit https://lucidworks.com/contact/
Learn how to guide customers to relevant products using eCommerce search, hyper-personalisation, and recommendations in our ‘Best-In-Class Retail Product Discovery’ webinar.
Nowadays, shoppers want their online experience to be engaging, inspirational and fulfilling. They want to find what they’re looking for quickly and easily. If the sought-after item isn’t available, they want the next best product or content surfaced to them. They want a website to understand their goals as though they were talking to a sales assistant in person, in-store.
In this webinar, we explore IMRG industry data insights and a best-in-class example of retail product discovery. You’ll learn:
- How AI can drive increased revenue through hyper-personalised experiences
- How user intent can be easily understood and results displayed immediately
- How merchandisers can be empowered to curate results and product placement – all without having to rely on IT.
Presented by:
Dave Hawkins, Principal Sales Engineer - Lucidworks
Matthew Walsh, Director of Data & Retail - IMRG
Connected Experiences Are Personalized Experiences (Lucidworks)
Many companies claim personalization and omnichannel capabilities are top priorities. Few are able to deliver on those experiences.
For a recent Lucidworks-commissioned study, Forrester Consulting surveyed 350+ global business decision-makers to see what gets in the way of achieving these goals. They discovered that inefficient technology, lack of behavioral insights, and failure to tie initiatives to enterprise-wide goals are some of the most frequent blockers to personalization success.
Join guest speaker, Forrester VP and Principal Analyst, Brendan Witcher, and Lucidworks CEO, Will Hayes, to hear the results of the Forrester Consulting study, how to avoid “digital blindness,” and how to apply VoC data in real-time to delight customers with personalized experiences connected across every touchpoint.
In this webinar, you’ll learn:
- Why companies who utilize real-time customer signals report more effective personalization
- How to connect employees and customers in a shared experience through search and browse
- How Lucidworks clients Lenovo, Morgan Stanley and Red Hat fast-tracked improvements in conversion, engagement and customer satisfaction
Featuring
- Will Hayes, CEO, Lucidworks
- Brendan Witcher, VP, Principal Analyst, Forrester
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc... (Lucidworks)
Intelligent Policing. Leveraging Data to more effectively Serve Communities.
Policing in the next decade is anticipated to be very different from historical methods: more data driven, more focused on the intricacies of the communities they serve, and more open and collaborative, to make informed recommendations a reality. Whether it’s social populations, NIBRS, or organizational improvement that’s the driver, the IT requirement is largely the same: provide 360-degree access to large volumes of siloed data to gain a full understanding of existing connections and patterns for improved insight and recommendation.
Join us for a round table discussion of how the Toronto Police Service is better serving their community through deploying a unified intelligent data platform.
Data innovation improves officers' engagement with existing data and streamlines investigation workflows by enhancing collaboration. This improved visibility into existing police data allows for a more intelligent and responsive police force.
In this webinar, we'll cover:
-The technology needs of an intelligent police force.
-How a Global Search improves an officer's interaction with existing data.
Featuring:
-Simon Taylor, VP, Worldwide Channels & Alliances, Lucidworks
-Michael Cizmar, Managing Director, MC+A
-Ian Williams, Manager of Analytics & Innovation, Toronto Police Service
Preparing for Peak in Ecommerce | eTail Asia 2020 (Lucidworks)
This document provides a framework for prioritizing onsite search problems and key performance indicators (KPIs) to measure for e-commerce search optimization. It recommends prioritizing fixing searches that yield no results, improving relevance of results, and reducing false positives. The most essential KPIs to measure include query latency, throughput, result relevance through click-through rates and NDCG scores. The document also provides tips for self-benchmarking search performance and examples of search performance benchmarks across nine e-commerce sites from various industries.
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C... (Lucidworks)
Wish your conversion rates were higher? Can’t figure out how to efficiently and effectively serve all the visitors on your site? Embarrassed by the quality of your product discovery experience? The bar is high and the influx of online shopping over recent months has reminded us that the opportunities are real. We’re all deep in holiday prep, but let’s take a few minutes to think about January 2021 and beyond. How can we position ourselves for success with our customers and against our competition?
Grab your lunch and let’s dive into three strategies that need to be part of your 2021 roadmap. You don’t need an army to get there. But you do need to take action and capitalize on the shoppers abandoning the product discovery journey on your site.
In this session, attendees will find out how to:
-Take control of merchandising at scale;
-Implement hands-free search relevancy; and
-Address personalization challenges.
AI-Powered Linguistics and Search with Fusion and Rosette (Lucidworks)
For a personalized search experience, search curation requires robust text interpretation, data enrichment, relevancy tuning and recommendations. In order to achieve this, language and entity identification are crucial.
For teams working on search applications, advanced language packages allow them to achieve greater recall without sacrificing precision.
Join us for a guided tour of our new Advanced Linguistics packages, available in Fusion, thanks to the technology partnership between Lucidworks and Basistech.
We’ll explore the application of language identification and entity extraction in the context of search, along with practical examples of personalizing search and enhancing entity extraction.
In this webinar, we’ll cover:
-How Fusion uses the Rosette Basic Linguistics and Entity Extraction packages
-Tips for improving language identification and treatment as well as data enrichment for personalization
-Speech2 demo modeling Active Recommendation
-Use Rosette’s packages with Fusion Pipelines to build custom entities for specific domain use cases
Featuring:
-Radu Miclaus, Director of Product, AI and Cloud, Lucidworks
-Robert Lucarini, Senior Software Engineer, Lucidworks
-Nick Belanger, Solutions Engineer, Basis Technology
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment (Lucidworks)
Before COVID-19, almost 80% of the US workforce worked in service jobs that involve in-person interaction with strangers. Now, leaders of service organizations must reshape their offerings during the pandemic and prepare for whatever the new normal turns out to be. Our three panelists will share ideas for adapting their service businesses, now that closer-than-six-feet isn’t an option.
Join Lucidworks as we talk shop with 3 service business leaders, covering:
-Common impacts of the pandemic on service businesses (and what to do about them),
-How service teams can maintain a human touch across virtual channels, and
-Plans for the future, before and after the pandemic subsides.
Featuring
-Sara Nathan, President & CEO, AMIGOS
-Anthony Carruesco, Founder, AC Fly Fishing
-Sara Bradley, Chef and Proprietor, Freight House
-Justin Sears, VP Product Marketing, Lucidworks
Webinar: Smart Answers for Employee and Customer Support After COVID-19 - Europe (Lucidworks)
The COVID-19 pandemic has forced companies to support far more customers and employees through digital channels than ever before. Many are turning to chatbots to help meet increasing demand, but traditional rules-based approaches can’t keep up. Our new Smart Answers add-on to Lucidworks Fusion makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
Smart Answers for Employee and Customer Support After COVID-19 (Lucidworks)
Watch our on-demand webinar showcasing Smart Answers on Lucidworks Fusion. This technology makes existing chatbots and virtual assistants more intelligent and more valuable to the people you serve.
In this webinar, we’ll cover:
-How search and deep learning extend conversational frameworks for improved experiences
-How Smart Answers improves customer care, call deflection, and employee self-service
-A live demo of Smart Answers for multi-channel self-service support
Applying AI & Search in Europe - featuring 451 Research (Lucidworks)
In the current climate, it’s now more important than ever to digitally enable your workforce and customers.
Hear from Simon Taylor, VP Global Partners & Alliances, Lucidworks and Matt Aslett, Research Vice President, 451 Research to get the inside scoop on how industry leaders in Europe are developing and executing their digital transformation strategies.
In this webinar, we’ll discuss:
-The top challenges and aspirations European business and technology leaders are solving using AI and search technology
-Which search and AI use cases are making the biggest impact in industries such as finance, healthcare, retail and energy in Europe
-What technology buyers should look for when evaluating AI and search solutions
Webinar: Accelerate Data Science with Fusion 5.1 (Lucidworks)
This document introduces Fusion 5.1 and its new capabilities for integrating with data science tools like TensorFlow, scikit-learn, and spaCy.
It provides an overview of Fusion's capabilities for understanding content, users, and delivering insights at scale. The document then demonstrates Fusion's Jupyter Notebook integration for reading and writing data and running SQL queries.
Finally, it shows how Fusion integrates with Seldon Core to easily deploy machine learning models with tools like TensorFlow and scikit-learn. A live demo is provided of deploying a custom model and using it in Fusion's query and indexing pipelines.
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy (Lucidworks)
In this webinar with 451 Research, you'll understand how retailers are using AI to predict customer intent and learn which key performance metrics are used by more than 120 online retailers in Lucidworks’ 2019 Retail Benchmark Survey.
In this webinar, you’ll learn:
● What trends and opportunities are facing the ecommerce industry in 2020
● Why search is the universal path to understanding customer intent
● How large online retailers apply AI to maximize the effectiveness of their personalization efforts
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ... (Lucidworks)
Nordstrom Rack | Hautelook curates and serves customers a wide selection of on-trend apparel, accessories, and shoes at an everyday savings of up to 75 percent off regular prices. With over a million visitors shopping across different platforms every day, and a realization that customers have become accustomed to robust and personalized search interactions, Nordstrom Rack | Hautelook launched an initiative over a year ago to provide data science-driven digital experiences to their customers.
In this session, we’ll discuss Nordstrom Rack | Hautelook’s journey of operationalizing a hefty strategy, optimizing a fickle infrastructure, and rallying troops around a single vision of building an expansible machine-learning driven product discovery engine.
The audience will learn about:
-The key technical challenges and outcomes that come with onboarding a solution
-The lessons learned of creating and executing operational design
-The use of Lucidworks Fusion to plug custom data science models into search and browse applications to understand user intent and deliver personalized experiences
Apply Knowledge Graphs and Search for Real-World Decision Intelligence (Lucidworks)
Knowledge graphs and machine learning are on the rise as enterprises hunt for more effective ways to connect the dots between the data and the business world. With newer technologies, the digital workplace can dramatically improve employee engagement, data-driven decisions, and actions that serve tangible business objectives.
In this webinar, you will learn
-- Introduction to knowledge graphs and where they fit in the ML landscape
-- How breakthroughs in search affect your business
-- The key features to consider when choosing a data discovery platform
-- Best practices for adopting AI-powered search, with real-world examples
Webinar: Building a Business Case for Enterprise Search (Lucidworks)
The document discusses building a business case for enterprise search. It notes that 85% of information is unstructured data locked in various locations and applications. Many knowledge workers spend a significant portion of their day searching across multiple systems for information. The rise of unstructured data and AI capabilities can help organizations unlock value from their information assets. Effective enterprise search powered by AI can provide real-time intelligence, personalized information, and more efficient research to help knowledge workers.
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan... (Fwdays)
.NET 8 brought a lot of improvements for developers and maturity to the Azure serverless container ecosystem, so this talk will cover these changes and explain how you can apply them to your projects. Another reason for this talk is the reinvention of serverless from a DevOps perspective as a platform engineering trend, with Backstage and the recent Radius project from Microsoft. Now is the perfect time to look at developer productivity tooling and serverless apps from Microsoft's perspective.
How UiPath Discovery Suite supports identification of Agentic Process Automat... (DianaGray10)
📚 Understand the basics of the new persona-based, LLM-powered Agentic Process Automation (APA) and discover how existing UiPath Discovery Suite products like Communication Mining, Process Mining, and Task Mining can be leveraged to identify APA candidates.
Topics Covered:
💡 Idea Behind APA: Explore the innovative concept of Agentic Process Automation and its significance in modern workflows.
🔄 How APA is Different from RPA: Learn the key differences between Agentic Process Automation and Robotic Process Automation.
🚀 Discover the Advantages of APA: Uncover the unique benefits of implementing APA in your organization.
🔍 Identifying APA Candidates with UiPath Discovery Products: See how UiPath's Communication Mining, Process Mining, and Task Mining tools can help pinpoint potential APA candidates.
🔮 Discussion on Expected Future Impacts: Engage in a discussion on the potential future impacts of APA on various industries and business processes.
Enhance your knowledge on the forefront of automation technology and stay ahead with Agentic Process Automation. 🧠💼✨
Speakers:
Arun Kumar Asokan, Delivery Director (US) @ qBotica and UiPath MVP
Naveen Chatlapalli, Solution Architect @ Ashling Partners and UiPath MVP
Top 12 AI Technology Trends For 2024.pdf (Marrie Morris)
Technology has become an irreplaceable component of our daily lives, and AI is revolutionizing it for the betterment of our future. In this article, we will look at the top 12 AI technology trends for 2024.
The Challenge of Interpretability in Generative AI Models.pdf (Sara Kroft)
Navigating the intricacies of generative AI models reveals a pressing challenge: interpretability. Our blog delves into the complexities of understanding how these advanced models make decisions, shedding light on the mechanisms behind their outputs. Explore the latest research, practical implications, and ethical considerations, as we unravel the opaque processes that drive generative AI. Join us in this insightful journey to demystify the black box of artificial intelligence.
Dive into the complexities of generative AI with our blog on interpretability. Find out why making AI models understandable is key to trust and ethical use and discover current efforts to tackle this big challenge.
Generative AI technology is a fascinating field that focuses on creating comp... (Nohoax Kanont)
Generative AI technology is a fascinating field that focuses on creating computer models capable of generating new, original content. It leverages the power of large language models, neural networks, and machine learning to produce content that can mimic human creativity. This technology has seen a surge in innovation and adoption since the introduction of ChatGPT in 2022, leading to significant productivity benefits across various industries. With its ability to generate text, images, video, and audio, generative AI is transforming how we interact with technology and the types of tasks that can be automated.
UiPath Community Day Amsterdam: Code, Collaborate, Connect (UiPathCommunity)
Welcome to our third live UiPath Community Day Amsterdam! Come join us for a half-day of networking and UiPath Platform deep-dives, for devs and non-devs alike, in the middle of summer ☀.
📕 Agenda:
12:30 Welcome Coffee/Light Lunch ☕
13:00 Event opening speech
Ebert Knol, Managing Partner, Tacstone Technology
Jonathan Smith, UiPath MVP, RPA Lead, Ciphix
Cristina Vidu, Senior Marketing Manager, UiPath Community EMEA
Dion Mes, Principal Sales Engineer, UiPath
13:15 ASML: RPA as Tactical Automation
Tactical robotic process automation for solving short-term challenges, while establishing standard and re-usable interfaces that fit IT's long-term goals and objectives.
Yannic Suurmeijer, System Architect, ASML
13:30 PostNL: an insight into RPA at PostNL
Showcasing the solutions our automations have provided, the challenges we’ve faced, and the best practices we’ve developed to support our logistics operations.
Leonard Renne, RPA Developer, PostNL
13:45 Break (30')
14:15 Breakout Sessions: Round 1
Modern Document Understanding in the cloud platform: AI-driven UiPath Document Understanding
Mike Bos, Senior Automation Developer, Tacstone Technology
Process Orchestration: scale up and have your Robots work in harmony
Jon Smith, UiPath MVP, RPA Lead, Ciphix
UiPath Integration Service: connect applications, leverage prebuilt connectors, and set up customer connectors
Johans Brink, CTO, MvR digital workforce
15:00 Breakout Sessions: Round 2
Automation, and GenAI: practical use cases for value generation
Thomas Janssen, UiPath MVP, Senior Automation Developer, Automation Heroes
Human in the Loop/Action Center
Dion Mes, Principal Sales Engineer, UiPath
Improving development with coded workflows
Idris Janszen, Technical Consultant, Ilionx
15:45 End remarks
16:00 Community fun games, sharing knowledge, drinks, and bites 🍻
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an... (Zilliz)
Enterprises have traditionally prioritized data quantity, assuming more is better for AI performance. However, a new reality is setting in: high-quality data, not just volume, is the key. This shift exposes a critical gap – many organizations struggle to understand their existing data and lack effective curation strategies and tools. This talk dives into these data challenges and explores the methods of automating data curation.
DefCamp_2016_Chemerkin_Yury-publish.pdf - Presentation by Yury Chemerkin at DefCamp 2016 discussing mobile app vulnerabilities, data protection issues, and analysis of security levels across different types of mobile applications.
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptx (Fwdays)
I will share my personal experience of full-time development on wasm Blazor:
-The difficulties our team faced: life hacks with Blazor app routing, whether it is necessary to write JavaScript, and which technology stack and architectural patterns we chose
-The conclusions we reached and the mistakes we made
I chose LogStash for data transformation and import for two reasons:
-It provides a powerful framework for extracting, grokking and transforming log data into a structured format that Solr can consume and that SILK can use for dashboards.
-LucidWorks’ Hadoop Connectors have a GrokIngestMapper that allows me to reuse the same LogStash Filters to work with larger volumes of files on HDFS (more details on this in a future article).
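A minimal Logstash pipeline along these lines might look as follows. This is a sketch, not the presenter's actual configuration: the log path and grok pattern are illustrative assumptions, and the `solr_http` output plugin must be installed separately.

```conf
input {
  file {
    path => "/var/log/app/access.log"   # hypothetical log source
    start_position => "beginning"
  }
}

filter {
  # Grok each raw line into structured fields (client IP, timestamp, request, status, bytes)
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  # Ship the structured events to a Solr collection for faceting and SILK dashboards
  solr_http {
    solr_url => "http://localhost:8983/solr/logs"
  }
}
```

The same grok pattern can then be reused by the GrokIngestMapper when the raw files live on HDFS instead of a local disk.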
Highlights: Joins, stats, pivot faceting
http://localhost:3334/#/dashboard/solr/Trading
Time series, joins
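As a sketch of what those highlights look like at the query level, Solr can compute stats and pivot facets in a single request. The collection name matches the dashboard URL above, but the field names (`exchange`, `symbol`, `price`) are illustrative assumptions; `facet.pivot` and `stats.field` are standard Solr parameters.

```text
# Count documents per symbol nested under exchange, plus min/max/mean price stats
http://localhost:8983/solr/Trading/select?q=*:*&rows=0
    &facet=true&facet.pivot=exchange,symbol
    &stats=true&stats.field=price
```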
TARDIS: http://2.bp.blogspot.com/-ysN8JskY4WM/UEZNhBywQKI/AAAAAAAABdg/gXE0A9OO6Mk/s1600/13881_doctor_who.jpg
Work under way to formalize
but not as a search engine for content
more like a search engine for behavior