Apache Solr is a powerful search and analytics engine with features such as full-text search, faceting, joins, and sorting, and it is capable of handling large amounts of data across a large number of servers. However, with all that power and scalability comes complexity. Solr 6 supports a Parallel SQL feature which provides a simplified, well-known interface to your data in Solr, performs key operations such as sorts and shuffles inside Solr for massive speedups, provides best-practices-based query optimization and, by leveraging the scalability of SolrCloud and a clever implementation, allows you to throw massive amounts of computation power behind analytical queries.
In this talk, we will explore the why, what and how of Parallel SQL and its building block Streaming Expressions in Solr 6 with a hint of the exciting new developments around this feature.
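As a hedged sketch of what a Parallel SQL call looks like in practice: Solr 6+ exposes a /sql request handler that accepts a SQL statement via the stmt parameter, with aggregationMode choosing between facet-based and shuffle-based (map_reduce) execution. The collection and field names below are invented for illustration.

```python
# Sketch: building a request to Solr's Parallel SQL endpoint.
# The collection ("sales") and fields ("region", "amount") are
# hypothetical; the /sql handler and its "stmt"/"aggregationMode"
# parameters exist in Solr 6 and later.
def build_sql_request(collection, stmt, aggregation_mode="facet"):
    """Return the URL path and form parameters for a /sql query."""
    return (
        f"/solr/{collection}/sql",
        {"stmt": stmt, "aggregationMode": aggregation_mode},
    )

path, params = build_sql_request(
    "sales",
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total DESC LIMIT 10",
    aggregation_mode="map_reduce",  # shuffle-based aggregation for high cardinality
)
```

Posting these parameters to a running SolrCloud cluster would stream back tuples; map_reduce mode shuffles records to worker nodes instead of relying on the faceting engine.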
Building real-time analytics applications using Pinot: A LinkedIn case study - Kishore Gopalakrishna
This document discusses using real-time analytics applications with LinkedIn activity data and Apache Pinot. It provides three examples of use cases: 1) article analytics to understand reader demographics, 2) feed ranking to improve relevance, and 3) anomaly detection for monitoring metrics and detecting issues. It compares performance of Pinot to other real-time analytics databases and processing engines. Finally, it outlines an architecture for building analytics applications and dashboards using Pinot to enable real-time insights from large-scale activity data.
Geo-Enablement of Supply Chain Analytics - Nishant Sinha
Supply chain analytics using GIS and location based services.
These slides were presented to a team of Indian Armed Forces in a short term course on Supply Chain Management at IIFT
Learning to Rank was the first integration of machine learning techniques with Apache Solr, allowing you to improve the ranking of your search results using training data.
One limitation is that documents have to contain the keywords that the user typed in the search box in order to be retrieved (and then reranked). For example, the query “jaguar” won’t retrieve documents containing only the terms “panthera onca”. This is called the vocabulary mismatch problem.
Neural search is an Artificial Intelligence technique that allows a search engine to reach documents that are semantically similar to the user’s information need without necessarily containing the query terms; it learns the similarity of terms and sentences in your collection through deep neural networks and numerical vector representations (so no manual synonyms are needed!).
This talk explores the first Apache Solr official contribution about this topic, available from Apache Solr 9.0.
We start with an overview of neural search (Don’t worry - we keep it simple!): we describe vector representations for queries and documents, and how Approximate K-Nearest Neighbor (KNN) vector search works. We show how neural search can be used along with deep learning techniques (e.g., BERT) or directly on vector data, and how we implemented this feature in Apache Solr, giving usage examples!
Join us as we explore this new exciting Apache Solr feature and learn how you can leverage it to improve your search experience!
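To make the idea concrete, here is a minimal, self-contained sketch of exact K-nearest-neighbour retrieval over dense vectors. The tiny 3-dimensional vectors are invented stand-ins for real model embeddings, and Solr in production uses approximate (HNSW-based) KNN rather than this linear scan:

```python
# Toy brute-force KNN over dense vectors, ranking documents by cosine
# similarity to a query vector. Vectors and document IDs are made up.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

docs = {
    "doc1": [0.9, 0.1, 0.0],   # "jaguar"
    "doc2": [0.8, 0.2, 0.1],   # "panthera onca" - near "jaguar" in vector space
    "doc3": [0.0, 0.1, 0.9],   # "sports car" - far away semantically
}

def knn(query_vec, k=2):
    """Return the k document IDs closest to the query vector."""
    return sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)[:k]

print(knn([0.9, 0.1, 0.0]))  # doc2 ranks above doc3 despite sharing no terms
```

This is exactly the mismatch-beating behaviour described above: doc2 is retrieved for the "jaguar" query even though (conceptually) it contains none of the query's keywords.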
Using Graph and Transformer Embeddings for Vector Based Retrieval - Sujit Pal
For the longest time, term-based vector representations based on whole-document statistics, such as TF-IDF, have been the staple of efficient and effective information retrieval. The popularity of Deep Learning over the past decade has resulted in the development of many interesting embedding schemes. Like term-based vector representations, these embeddings depend on structure implicit in language and user behavior. Unlike them, they leverage the distributional hypothesis, which states that the meaning of a word is determined by the context in which it appears. These embeddings have been found to better encode the semantics of the word, compared to term-based representations. Despite this, it has only recently become practical to use embeddings in Information Retrieval at scale.
In this presentation, we will describe how we applied two new embedding schemes to Scopus, Elsevier’s broad coverage database of scientific, technical, and medical literature. Both schemes are based on the distributional hypothesis but come from very different backgrounds. The first embedding is a graph embedding called node2vec, that encodes papers using citation relationships between them as specified by their authors. The second embedding leverages Transformers, a recent innovation in the area of Deep Learning, that are essentially language models trained on large bodies of text. These two embeddings exploit the signal implicit in these data sources and produce semantically rich user and content-based vector representations respectively. We will evaluate these embedding schemes and describe how we used the Vespa search engine to search these embeddings for similar documents within the Scopus dataset. Finally, we will describe how RELX staff can access these embeddings for their own data science needs, independent of the search application.
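For contrast with the embedding schemes above, a minimal sketch of the classic term-based representation (TF-IDF computed from whole-document statistics) might look like this; the three toy documents are invented:

```python
# Minimal TF-IDF sketch: the term-based, whole-document-statistics
# representation that embedding schemes such as node2vec and
# Transformer models aim to improve upon. Documents are invented.
import math
from collections import Counter

docs = [
    "citation graph of scientific papers",
    "transformer language models for text",
    "graph embeddings for citation networks",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)
# document frequency: how many documents each term appears in
df = Counter(t for toks in tokenized for t in set(toks))

def tfidf(tokens):
    """Sparse TF-IDF vector for one tokenized document."""
    tf = Counter(tokens)
    return {t: (tf[t] / len(tokens)) * math.log(N / df[t]) for t in tf}

vec = tfidf(tokenized[0])
# "citation" appears in two documents, so its IDF is log(3/2);
# a term unique to one document gets the larger weight log(3/1).
```

Note the limitation this talk addresses: "citation" and "references" would be orthogonal dimensions here, whereas graph and Transformer embeddings place related terms near each other.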
Doing Synonyms Right - John Marquiss, Wolters Kluwer (Lucidworks)
The document discusses different options for implementing synonyms in Solr, including the SynonymFilter and SynonymGraphFilter. It explains that the SynonymGraphFilter handles multi-term synonyms correctly at query time, while the SynonymFilter does not without extra work. It also describes using synonym UIDs with the "expand=false" parameter as an alternative method for correct multi-term synonym handling in older Solr versions. The document analyzes examples and compares the performance of indexing and querying with each method. Based on the analysis, the document recommends using the SynonymGraphFilter at query time as the best option for providing accurate positional queries while maintaining good performance.
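A hedged sketch of the recommended setup: a Solr field type that applies SynonymGraphFilterFactory only in the query-time analyzer (the filter's graph output is not safe to index). The field type name and the synonyms.txt file are assumptions for illustration:

```xml
<!-- Query-time-only synonym graph handling, per the analysis above.
     "text_syn" and synonyms.txt are illustrative names. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a multi-term entry such as `usa, united states of america` in synonyms.txt, the graph filter keeps token positions intact so phrase queries still match correctly.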
This document provides an overview of reading the source code of Presto, an open source distributed SQL query engine. It recommends starting on GitHub at prestosql/presto and exploring areas of interest like SQL interfaces to different data sources via connectors, the query engine core, distributed systems implementation, or extending Presto. Useful techniques for navigating the code with IntelliJ IDEA are presented. Specific code locations and concepts are highlighted for connectors, the query execution flow, parsing and analyzing SQL, and Presto's implementation as a distributed REST service. The document aims to help readers find their own interests to learn from Presto's codebase.
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn - Confluent
(Celia Kung, LinkedIn) Kafka Summit SF 2018
For several years, LinkedIn has been using Kafka MirrorMaker as the mirroring solution for copying data between Kafka clusters across data centers. However, as LinkedIn data continued to grow, mirroring trillions of Kafka messages per day across data centers uncovered the scale limitations and operability challenges of Kafka MirrorMaker. To address such issues, we have developed a new mirroring solution, built on top of our stream ingestion service, Brooklin. Brooklin MirrorMaker aims to provide improved performance and stability, while facilitating better management through finer control of data pipelines. Through flushless Kafka produce, dynamic management of data pipelines, and per-partition error handling and flow control, we are able to increase throughput, better withstand consume and produce failures, and reduce overall operating costs. As a result, we have eliminated the major pain points of Kafka MirrorMaker. In this talk, we will dive deeper into the challenges LinkedIn has faced with Kafka MirrorMaker, how we tackled them with Brooklin MirrorMaker, and our plans for iterating further on this new mirroring solution.
Alkin Tezuysal discusses his first 90 days working at ChistaDATA Inc. as EVP of Global Services. He has experience working with databases like MySQL, Oracle, and ClickHouse. ChistaDATA focuses on providing ClickHouse infrastructure operations through managed services, support, and consulting. ClickHouse is an open source columnar database that uses a shared-nothing architecture for high performance analytics workloads.
Data Privacy with Apache Spark: Defensive and Offensive Approaches - Databricks
In this talk, we’ll compare different data privacy techniques for protecting personally identifiable information, and their effects on statistical usefulness, re-identification risk, data schema, format preservation, and read & write performance.
We’ll cover different offense and defense techniques. You’ll learn what k-anonymity and quasi-identifiers are. Discover the world of suppression, perturbation, obfuscation, encryption, tokenization, and watermarking, with elementary code examples for cases where third-party products cannot be used. We’ll see what approaches might be adopted to minimize the risks of data exfiltration.
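As a taste of the elementary code examples mentioned, here is a minimal k-anonymity check: a table is k-anonymous over its quasi-identifiers when every combination of quasi-identifier values occurs at least k times. The records and column names below are invented:

```python
# Minimal k-anonymity check over a toy table. Quasi-identifiers are
# the columns that could re-identify someone when combined (here the
# generalized "zip" and "age" buckets); "diagnosis" is the sensitive value.
from collections import Counter

records = [
    {"zip": "021*", "age": "20-29", "diagnosis": "flu"},
    {"zip": "021*", "age": "20-29", "diagnosis": "cold"},
    {"zip": "021*", "age": "30-39", "diagnosis": "flu"},
]
quasi_identifiers = ("zip", "age")

def k_anonymity(rows, qi):
    """Return the table's k: the size of its smallest quasi-identifier group."""
    groups = Counter(tuple(r[c] for c in qi) for r in rows)
    return min(groups.values())

k = k_anonymity(records, quasi_identifiers)  # the lone 30-39 record gives k = 1
```

With k = 1, the third person is uniquely identifiable from (zip, age) alone; suppression or further generalization of the age bucket would be needed to raise k.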
Using Geospatial to Innovate in Last-Mile Logistics - CARTO
The document discusses how geospatial technologies like what3words and CARTO can help optimize last-mile logistics and delivery. what3words assigns a unique 3-word address to every 3m x 3m area in the world to simplify addressing. CARTO provides spatial data and analytics to optimize routing and delivery. Examples are given of companies using these technologies together to improve delivery by integrating 3-word addresses, optimizing routes using real-time traffic and weather data, and analyzing spatial data from CRM systems. This can lead to improvements like reduced costs, increased coverage, and fewer failed or late deliveries.
With what3words, anywhere and anything on the planet can now be identified with a simple 3 word address - from water points and street lights to entrances to hospital labs. With 3 word addresses, location data can be communicated and managed efficiently and accurately.
This document provides an overview of data science work at Zillow. It discusses Zillow's use of machine learning models like the Zestimate and Rent Zestimate to analyze housing data. It describes Zillow's technology stack, which heavily leverages Python, R, and SQL. Specific examples are provided on automated waterfront determination using GIS data and discovering home street features. The document also discusses how tools like Dato and Scikit-Learn are used for tasks like fraud detection, property matching, and data modeling. In closing, current job openings at Zillow are listed.
Analyzing 1.2 Million Network Packets per Second in Real-time - DataWorks Summit
The document describes Cisco's OpenSOC, an open source security operations center that can analyze 1.2 million network packets per second in real time. It discusses the business need for such a solution given how breaches often go undetected for months. The solution architecture utilizes big data technologies like Hadoop, Kafka and Storm to enable real-time processing of streaming data at large scale. It also provides lessons learned around optimizing the performance of components like Kafka, HBase and Storm topologies.
Question Answering - Application and Challenges - Jens Lehmann
This document provides an overview of question answering applications and challenges. It defines question answering as receiving natural language questions and providing concise answers. Recent developments in question answering systems are discussed, including IBM Watson. Challenges for question answering over semantic data are explored, such as lexical gaps, ambiguity, granularity, and alternative resources. Large-scale linguistic resources and machine learning approaches for question answering are also covered. Applications of question answering technologies are examined.
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train... - Edureka!
( ELK Stack Training - https://www.edureka.co/elk-stack-trai... )
This Edureka Elasticsearch Tutorial will help you in understanding the fundamentals of Elasticsearch along with its practical usage and help you in building a strong foundation in ELK Stack. This video helps you to learn following topics:
1. What Is Elasticsearch?
2. Why Elasticsearch?
3. Elasticsearch Advantages
4. Elasticsearch Installation
5. API Conventions
6. Elasticsearch Query DSL
7. Mapping
8. Analysis
9. Modules
1. Kusto (Azure Data Explorer) is a fast and flexible data exploration service for analyzing security and application logs, performance counters, and other streaming data.
2. A Data Engineer's role is evolving to focus more on real-time analysis using Kusto as opposed to traditional SQL. Understanding how to use Kusto's query engine and data ingestion capabilities is key.
3. Techniques like using materialized views, partitioning data, and leader-follower databases can help distribute workloads and improve query performance at scale in Kusto. However, Kusto has limitations around concurrency, memory usage, and result set sizes that need to be considered.
OSA Con 2022 - Arrow in Flight: New Developments in Data Connectivity - David... - Altinity Ltd
The document discusses the history and development of Apache Arrow, an open-source cross-language development platform for in-memory data. Some key points:
- Arrow started in 2016 to optimize data transfer between systems using a standardized columnar memory format.
- It has since expanded to include libraries in many languages, file formats like Arrow and Parquet, and distributed computing capabilities like Arrow Flight for RPC.
- Over time, more projects have adopted Arrow as an internal data structure for improved performance, including Spark, DuckDB, and Streamlit.
- Today Arrow is an ecosystem of interoperable components, with continued work on higher-level tools around databases, machine learning, and geospatial data.
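A conceptual sketch (not the real Arrow implementation, which adds validity bitmaps, chunking and rich metadata) of why the columnar layout helps: each column becomes one contiguous, typed buffer that can be scanned or handed to another system without per-row conversion:

```python
# Row-oriented vs. columnar layout, illustrated with Python's stdlib
# typed arrays. This only sketches the idea behind Arrow's format;
# the field names and values are invented.
from array import array

rows = [(1, 10.5), (2, 20.0), (3, 30.25)]       # row-oriented: values interleaved

ids = array("q", (r[0] for r in rows))          # columnar: one contiguous
prices = array("d", (r[1] for r in rows))       # typed buffer per column

# A whole column can be shared as a single buffer with no per-row
# marshalling - the intuition behind Arrow's zero-copy data sharing -
# and scans over one column touch only that column's memory.
total = sum(prices)
```

In real Arrow, such buffers use a standardized memory layout, so two processes (or two languages) can exchange them over Arrow Flight without serialization.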
SDM (Standardized Data Management) - A Dynamic Adaptive Ingestion Frameworks ... - DataWorks Summit
SDM is a distributed, reliable and highly available data lake ingestion framework that handles data processing, archival and reconciliation, with effective change-based history management for batch and streaming data. It is metadata-driven and provides automated schema evolution. The SDM platform is built completely on open source software/platforms, making it both extensible and robust. Data management, schema evolution and archival are achieved through Apache NiFi’s built-in capabilities and extensions via custom processors and controller services. The end-of-day construct is generated through an Apache Spark job.
Types of Data :
1. Batch
a. Full dump
b. Incremental
c. Hybrid (Daily incremental + Weekly/Monthly full dump)
2. Near Real time
a. CDC-Kafka
b. JMS-Kafka
3. Extractions
a. Incremental based on Change Data Capture tool (IBM Infosphere CDC)
b. Sqoop
c. JDBC/ODBC
4. Manual File Upload
a. Excel
Types of Process:
1. File validation
a. File integrity (header, trailer, data checksum)
b. File de-duplication
c. New line and non-printable control characters handling
2. Structural validation (Row validation)
a. Fixed width
b. Delimited
c. XML
d. JSON
e. Excel (Single/Multi tab)
f. Datatype validation
g. Constraint validation – Null, primary key and full row de-duplication
3. Defaulting
a. Condition based
b. Special data-type handling (mainframe systems)
4. Operational assurance
a. Row count logging
b. Reconciliation with source
c. File/Record rejections with reasons
5. Lineage tracking
a. Row-id for every single record is generated and referenced against the source file until the processed layer.
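As an illustration of the file-integrity step listed above, here is a hedged sketch of header/trailer validation; the trailer format `T|<count>|<md5>` is invented for illustration, not SDM's actual layout:

```python
# Sketch of batch-file integrity validation: check that the trailer
# record's row count and checksum match the data actually received.
# The "T|<count>|<md5>" trailer convention is hypothetical.
import hashlib

def validate_file(lines):
    """lines: [header, *data rows, trailer]. Returns (ok, reason)."""
    header, *data, trailer = lines
    _, expected_count, expected_md5 = trailer.split("|")
    if int(expected_count) != len(data):
        return False, "row count mismatch"
    digest = hashlib.md5("\n".join(data).encode()).hexdigest()
    if digest != expected_md5:
        return False, "checksum mismatch"
    return True, "ok"
```

A file failing either check would be rejected with a reason, feeding the reconciliation and rejection-logging steps described under operational assurance.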
Storage formats :
1. Raw - Archival
2. Avro – Staged
3. ORC – Processed
Benefits
1. Metadata driven
2. Extensible
3. Scalable
4. Flexible
Plans
Current State :
1. Custom built ingestion framework leveraging upon standard open source software from Apache.
2. Data of 100+ source systems are ingested into the Hadoop data lake using the ingestion framework.
Plan :
1. Open sourcing the framework for general consumption.
2. Metadata management UI/API which would serve as a glossary of data available in the data lake with search capabilities.
3. Operational and Exception reporting.
4. Centralized data retention within the framework.
5. Health monitoring and alerting.
6. Provenance data maintenance in Atlas.
Speaker
Arun Manivannan, Senior Data Engineer, Standard Chartered Bank
AI from your data lake: Using Solr for analytics - DataWorks Summit
Introductory technical session on Apache Solr's (HDP Search) artificial intelligence and machine learning features to discover relationships and insights across big data in the enterprise. Discussions will include how Solr performs graph traversal, anomaly detection, NLP and time-series analysis, and how you can display this data to users with easy-to-create dashboards.
This technical session will review Apache Solr’s streaming expressions, which were introduced in Solr 6.5. With over 100 expressions and evaluators, conditional logic, variables, and data structures, these functions form the basis of a new paradigm that brings many of the features from the relational world into search. These new capabilities form the basis of a powerful functional programming language that enables the implementation of many parallel computing use cases such as anomaly detection, streaming NLP, graph traversal and time-series analysis.
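For flavour, a small example of the kind of expression involved: a rollup aggregation pushed across worker nodes with the parallel decorator. The `logs` collection, its fields, and the `workers` collection are hypothetical; `search`, `rollup` and `parallel` are real streaming expressions:

```
parallel(workers,
         rollup(search(logs, q="level:ERROR",
                       fl="host,count_i",
                       sort="host asc",
                       qt="/export",
                       partitionKeys="host"),
                over="host",
                sum(count_i)),
         workers=4,
         sort="host asc")
```

The inner search streams sorted tuples from the /export handler, partitioned by host so each of the four workers aggregates a disjoint slice in parallel.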
In order to discover and analyze big data, third party tools such as Jupyter, Tableau, and Lucidworks Insights will be reviewed.
Speaker
Cassandra Targett, Lucidworks, Director of Engineering
Marcelline Saunders, Lucidworks, Director, Global Partner Enablement
Webinar: Solr 6 Deep Dive - SQL and Graph - Lucidworks
This document provides an agenda and overview for a conference session on Solr 6 and its new capabilities for parallel SQL and graph queries. The session will cover motivations for adding these features to Solr, how streaming expressions enable parallel SQL, graph capabilities through the new graph query parser and streaming expressions, and comparisons to other technologies. The document includes examples of SQL queries and graph streaming expressions in Solr.
The next major release of Solr is right around the corner! Join Solr Committer Cassandra Targett and Lucidworks SVP of Engineering Trey Grainger for a first look into what’s included in the upcoming release.
Grant Ingersoll presented on using Apache Solr and Apache Spark for data engineering. He discussed how Solr can be used for indexing and searching large amounts of data, while Spark enables large-scale processing on the indexed data. Lucidworks' Fusion product combines Solr and Spark capabilities to allow search-driven applications and machine learning on indexed content.
This document summarizes Joel Bernstein's presentation on parallel SQL in Solr. The key points are:
1. SQL provides an easier way for users to query Solr compared to its other complex APIs, and SQL queries can be optimized.
2. Solr 6.0 introduces a SQL interface that supports high-cardinality aggregations, distributed joins, and Solr search predicates.
3. Under the hood, SQL queries are compiled into TupleStreams and executed in parallel across worker collections using a streaming API and expressions. This allows massive throughput for queries.
Parallel Computing with SolrCloud: Presented by Joel Bernstein, Alfresco - Lucidworks
This document summarizes Joel Bernstein's presentation on parallel SQL in Solr 6.0. The key points are:
1. SQL provides an optimizer to choose the best query plan for complex queries in Solr, avoiding the need for users to determine optimal faceting APIs or parameters.
2. SQL queries in Solr 6.0 can perform distributed joins, aggregations, sorting, and filtering using Solr search predicates. Aggregations can be performed using either map-reduce or facets.
3. Under the hood, SQL queries are compiled to TupleStreams which are serialized to Streaming Expressions and executed in parallel across worker collections using Solr's streaming API framework.
This document provides an introduction to Apache Solr, an open-source enterprise search platform built on Apache Lucene. It discusses how Solr indexes content, processes search queries, and returns results with features like faceting, spellchecking, and scaling. The document also outlines how Solr works, how to configure and use it, and examples of large companies that employ Solr for search.
This document provides a summary of the Solr search platform. It begins with introductions from the presenter and about Lucid Imagination. It then discusses what Solr is, how it works, who uses it, and its main features. The rest of the document dives deeper into topics like how Solr is configured, how to index and search data, and how to debug and customize Solr implementations. It promotes downloading and experimenting with Solr to learn more.
Impala Architecture Presentation at Toronto Hadoop User Group, in January 2014 by Mark Grover.
Event details:
http://www.meetup.com/TorontoHUG/events/150328602/
The Pushdown of Everything by Stephan Kessler and Santiago Mola - Spark Summit
Stephan Kessler and Santiago Mola presented SAP HANA Vora, which extends Spark SQL's data sources API to allow "pushing down" more of a SQL query's logical plan to the data source for execution. This "Pushdown of Everything" approach leverages data sources' capabilities to process less data and optimize query execution. They described how data sources can implement interfaces like TableScan, PrunedScan, and the new CatalystSource interface to support pushing down projections, filters, and more complex queries respectively. While this approach has advantages in performance, challenges include the complexity of implementing CatalystSource and ensuring compatibility across Spark versions. Future work aims to improve the API and provide utilities to simplify implementation.
Solr at Zvents 6 years later & still going strong - lucenerevolution
Presented by Amit Nithianandan, Lead Engineer Search/Analytics New Platforms, Zvents/Stubhub
Zvents has been a user of Apache Solr since 2007 when it was very early. Since then, the team has made extensive use of the various features and most recently completed an overhaul of the search engine to Solr 4.0. We'll touch on a variety of development/operational topics including how we manage the build lifecycle of the search application using Maven, release the deployment package using Capistrano and monitor using NewRelic as well as the extensive use of virtual machines to simplify node management. Also, we’ll talk about application level details such as our unique federated search product, and the integration of technologies such as Hypertable, RabbitMQ, and EHCache to power more real-time ranking and filtering based on traffic statistics and ticket inventory.
This document discusses deploying and managing Apache Solr at scale. It introduces the Solr Scale Toolkit, an open source tool for deploying and managing SolrCloud clusters in cloud environments like AWS. The toolkit uses Python tools like Fabric to provision machines, deploy ZooKeeper ensembles, configure and start SolrCloud clusters. It also supports benchmark testing and system monitoring. The document demonstrates using the toolkit and discusses lessons learned around indexing and query performance at scale.
Luis Majano discusses ORM and his CBORM module. He argues that ORM is not a silver bullet and presents 10 keys to ORM success: 1) OO modeling is key; 2) avoid bad engine defaults; 3) understand the Hibernate session; 4) use transaction demarcation; 5) use laziness; 6) avoid bi-directional relationships; 7) do not store entities in scopes; 8) use database indexes; 9) cache for performance boosts; and 10) use HQL maps. CBORM provides base ORM services, virtual ORM services, active entities, entity populators, validation, and event handlers to improve on ORM.
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers that are used to build php/mysql apps to broaden their horizon when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
Solr Recipes provides quick and easy steps for common use cases with Apache Solr. Bite-sized recipes will be presented for data ingestion, textual analysis, client integration, and each of Solr’s features including faceting, more-like-this, spell checking/suggest, and others.
Solr search engine with multiple table relation - Jay Bharat
Here you can learn how to use the Solr search engine and integrate it into your application, for example one built on PHP/MySQL.
I introduce how to handle data from multiple tables in Solr.
Similar to Parallel SQL and Streaming Expressions in Apache Solr 6
Solr is an enterprise search platform used by 90% of Fortune 500 companies. The presentation discusses Solr use cases, lessons learned, areas for improvement, and future work including HTTP/2 support, asynchronous clients, resource throttling, metrics storage, quality of service features, automatic shard splitting, and a cluster suggestions API. It also provides information for the Bangalore Apache Solr/Lucene Meetup Group.
Cross Datacenter Replication aka CDCR has been a long requested feature in Apache Solr. In this talk, we will discuss CDCR as released in Apache Solr 6.0 and beyond to understand its use-cases, limitations, setup and performance. We will also take a quick look at the future enhancements that can further simplify and scale this feature.
In the big data world, our data stores communicate over an asynchronous, unreliable network to provide a facade of consistency. However, to really understand the guarantees of these systems, we must understand the realities of networks and test our data stores against them.
Jepsen is a tool which simulates network partitions in data stores and helps us understand the guarantees of our systems and their failure modes. In this talk, I will help you understand why you should care about network partitions and how we can test datastores against partitions using Jepsen. I will explain what Jepsen is, how it works, and the kinds of tests it lets you create. We will try to understand the subtleties of distributed consensus and the CAP theorem, and demonstrate how different data stores such as MongoDB, Cassandra, Elasticsearch and Solr behave under network partitions. Finally, I will describe the results of the tests I wrote using Jepsen for Apache Solr and discuss the kinds of rare failures which were found by this excellent tool.
The document summarizes new features in Apache Solr 5 including improved JSON support, faceted search enhancements, scaling improvements, and stability enhancements. It also previews upcoming features like improved analytics capabilities and first class support for additional languages.
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014 - Shalin Shekhar Mangar
This document discusses scaling SolrCloud to support large numbers of document collections. It begins by introducing SolrCloud and some of its key capabilities and terminology. It then describes four problems that can arise at large scale: high cluster state load, overseer performance issues, inflexible data management, and limitations with data export. For each problem, solutions are proposed that were implemented in Apache Solr to improve scalability, such as splitting the cluster state, optimizing the overseer, enabling more flexible data splitting and migration, and allowing distributed deep paging exports. The document concludes by describing efforts to test SolrCloud at massive scale through automated tools and cloud infrastructure.
This document provides tips for tuning Solr for high performance. It discusses optimizing queries and facets for CPU usage, tuning memory usage such as using docValues, optimizing disk usage through merge policies and commit settings, reducing network overhead through batching and caching, and techniques like deep paging to improve performance for large result sets. The document emphasizes only indexing and retrieving necessary fields to reduce resource usage and tuning garbage collection to avoid pauses.
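The docValues memory tip can be sketched as a schema field definition; the field name and type below are illustrative:

```xml
<!-- docValues stores sort/facet data in column-oriented on-disk
     structures instead of building an in-heap fieldCache, reducing
     JVM memory pressure for sorting, faceting and exporting. -->
<field name="price" type="plong" indexed="true" stored="false" docValues="true"/>
```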
These slides were presented at the Great Indian Developer Summit 2014 at Bangalore. See http://www.developermarch.com/developersummit/session.html?insert=ShalinMangar2
"SolrCloud" is the name given to Apache Solr's feature set for fault tolerant, highly available, and massively scalable capabilities. SolrCloud has enabled organizations to scale, impressively, into the billions of documents with sub-second search!
Shalin Shekhar Mangar gave an introduction to Apache Lucene and Solr at the 4th Bangalore Lucene/Solr Meetup on April 19th, 2014. He provided an overview of Lucene as a Java-based search library for adding search and indexing to applications. He then discussed Solr, which is based on Lucene and allows accessing Lucene via HTTP with features like faceting, replication, and distributed search. He also demonstrated indexing and searching with Solr using its Java client SolrJ.
This document summarizes a presentation about SolrCloud shard splitting. It introduces the presenter and his background with Apache Lucene and Solr. The presentation covers an overview of SolrCloud, how documents are routed to shards in SolrCloud, the SolrCloud collections API, and the new functionality for splitting shards in Solr 4.3 to allow dynamic resharding of collections without downtime. It provides details on the shard splitting mechanism and tips for using the new functionality.
Presented at Indian Institute of Information Technology (IIIT) Allahabad on 21 Oct 2009 to students about the Apache Software Foundation, Lucene, Solr, Hadoop and on the benefits of contributing to open source projects. The target audience was sophomore, junior and senior B.Tech students.
Waze vs. Google Maps vs. Apple Maps, Who Else.pdf - Ben Ramedani
Let’s face it, getting lost isn’t really part of the adventure anymore (unless you’re into that sort of thing!). Nowadays, a good navigation app is like your trusty compass, guiding you through busy city streets and winding country roads. But with so many options out there—from big names like Waze, Google Maps, and Apple Maps to some lesser-known contenders—choosing the right one can feel a bit overwhelming.
Think about it: you're about to head out on a road trip, and the last thing you want is to end up in the middle of nowhere because you took a wrong turn. Or maybe you're just trying to navigate your daily commute without hitting every single red light. That's where a solid navigation app comes in handy.
Google Maps is like the old reliable friend who knows every shortcut and scenic route. It's packed with features, from real-time traffic updates to detailed directions, making it a top choice for many. But then there's Waze, the social butterfly of navigation apps. It's all about community, with drivers sharing real-time updates on traffic, accidents, and even speed traps. It’s perfect if you want to feel like you’re part of a huge driving club, all working together to get everyone to their destination faster.
And let’s not forget Apple Maps, which has come a long way since its rocky start. If you're deep into the Apple ecosystem, it's a seamless choice, integrating smoothly with all your devices and offering some pretty neat features like Flyover for 3D city views.
But wait, there are also some underdog apps worth considering! Have you heard of MapQuest? It's still around and offers some great features, especially for planning long trips with multiple stops. Then there's HERE WeGo, which is fantastic for offline navigation—a real lifesaver if you're heading somewhere with spotty cell service.
So, whether you're planning a cross-country adventure or just trying to find the quickest route to work, we’ll help you sift through these options. We’ll dive into what makes each app unique, their pros and cons, and ultimately, guide you to the perfect navigation app for your needs. Buckle up and get ready for a smooth ride!
AI is revolutionizing DevOps by advancing algorithmic optimizations in pipelines, elevating efficiency levels, and introducing predictive functionalities. This article examines how AI is reshaping continuous integration, deployment strategies, monitoring practices, and incident management within DevOps ecosystems, ultimately amplifying efficiency and dependability.
BDRSuite - #1 Cost effective Data Backup and Recovery Solutionpraveene26
BDRSuite and BDRCloud by Vembu are comprehensive and cost-effective backup and disaster recovery solutions designed to meet the diverse data protection requirements of Businesses and Service Providers.
With BDRSuite & BDRCloud, you can backup diverse IT workloads from any location, including VMs (VMware, Hyper-V, KVM, Proxmox VE, oVirt), Servers & Endpoints (Windows, Linux, Mac), SaaS Applications (Microsoft 365, Google Workspace), Cloud VMs (AWS, Azure), NAS/File Shares and Databases & Applications (Microsoft Exchange Server, SQL Server, SharePoint Server, PostgreSQL, MySQL).
You can store backups anywhere: On-Premise/Remote storage, Private/Public Cloud, or BDRCloud.
You can centrally manage the entire backup infrastructure with BDRSuite’s self-hosted centralized management console or the BDRCloud-hosted centralized management console.
You can quickly recover from data loss or ransomware attacks—all at an affordable price.
To learn more, visit our websites:
https://www.bdrsuite.com/
https://www.bdrcloud.com/
Fix Production Bugs Quickly - The Power of Structured Logging in Ruby on Rail...John Gallagher
Rails apps can be a black box. Have you ever tried to fix a bug where you just can’t understand what’s going on? This talk will give you practical steps to improve the observability of your Rails app, cutting the time to understand and fix defects from hours or days down to minutes. Rails 8 will bring an exciting new feature: built-in structured logging. This talk will delve into the transformative impact of structured logging on fixing bugs and saving engineers time. Structured logging, as a cornerstone of observability, offers a powerful way to handle logs compared to traditional text-based logs. This session will guide you through the nuances of structured logging in Rails, demonstrating how it can be used to gain better insights into your application’s behavior. This talk will be a practical, technical deep dive into how to make structured logging work with an existing Rails app.
I talk about the Steps to Observable Software - a practical five-step process for improving the observability of your Rails app.
Old Tools, New Tricks: Unleashing the Power of Time-Tested Testing ToolsBenjamin Bischoff
In the rapidly evolving landscape of software development and testing, it is tempting to chase the latest tools and technologies. However, some of the most effective solutions have been in existence for decades. In this talk, we’ll delve into the enduring value of these timeless testing tools.
We’ll explore how established tools like Selenium, GNU Make, Maven, and Bash remain vital in today’s software development and testing toolkit even though they have been around for a long time (some were even invented before I was born). I’ll share examples of how these tools have addressed our testing and automation challenges, showcasing their adaptability, versatility, and reliability in various scenarios. I aim to demonstrate that sometimes, the “old” ways can indeed be the best ways.
Literals - A Machine Independent Feature21h16charis
Introduction to Literals, a machine-independent feature. The presentation is based on the prescribed textbook for System Software and Compiler Design, Computer Science and Engineering: System Software by Leland L. Beck and D. Manjula.
Bring Strategic Portfolio Management to Monday.com using OnePlan - Webinar 18...OnePlan Solutions
Unlock the full potential of your projects with OnePlan’s seamless integration with monday.com. Join us to discover how OnePlan enhances monday.com by aligning your portfolio of projects with your organization’s strategic goals, optimizing resource allocation, and streamlining performance tracking. Learn how this powerful combination can drive efficiency, cost savings, and strategic success within your organization.
The code is written and the tests pass. I just have to commit this last round of changes to my branch. Wait, why does that say committed to main? Did I commit all those changes to main? Arghh! I can’t redo all of this!
Committing changes to the wrong branch, forgetting files, misspelling the commit message, and needing to undo commits are some of the “advanced” features of Git that we normal people run into way too often and need help with. The fixes are often easy – once you know what they are. But in the heat of the moment, with the deadline (or Friday afternoon) approaching, it isn’t always easy to figure out what magic spell to cast to get Git to do what you need.
We’ll spend some time looking at typical Git situations people get themselves into, and then we’ll demonstrate how to get out of them. This isn’t about Git internals or a Git master’s class – this is real-world Git for when things aren’t going right. And there will be plenty of time for questions, so bring your “best” Git nightmare scenarios so we can figure out how to recover.
Unlocking the Future of Artificial IntelligencedorinIonescu
Unlock the Future: Dive into AI Today! Videnda AI specializes in developing advanced artificial intelligence solutions, including visual dictionaries and language learning tools that leverage immersive virtual travel experiences. Stay Ahead of the Curve: Master AI Now! Our AI technology integrates machine learning and neural networks to enhance education and business applications. AI: The Next Frontier. Are You Ready to Explore? With a focus on real-time AI solutions and deep learning models, Videnda AI provides innovative tools for multilingual communication and immersive learning.
In this course, you'll find a series of engaging videos packed with vibrant animations that break down complex AI concepts into digestible pieces. Our curriculum covers AI models such as Convolutional Neural Networks (CNN), Multi-Layer Perceptrons (MLP), Generative Adversarial Networks (GAN), and Transformers, providing a solid understanding of these models and their real-world applications. We also offer hands-on experience with Generative AI tools like ChatGPT and Midjourney, and Python programming tutorials to help you implement AI algorithms and build your own AI applications.
We are proud participants in the Nvidia Inception Program, driving AI innovation across various industries. By the end of our course, you'll have a strong understanding of AI principles, enhanced Python programming skills, and practical experience with state-of-the-art Generative AI tools. Whether you're looking to kickstart a career in AI or simply curious about this revolutionary technology, Videnda AI is your partner in mastering the future of artificial intelligence.
The SQDC (Safety, Quality, Delivery, Cost) process enhances manufacturing performance through daily safety meetings, defect tracking, and waste reduction. Orcalean’s FactoryKPI digital dashboard streamlines this process, providing real-time data and AI-powered analytics for continuous improvement.
Get to know Autonomous 2.0, the latest innovation from Applitools, in this sneak peek session showcasing how our AI-powered testing solutions revolutionize how you create, debug, and manage test scripts. See more and sign up for a free trial at https://applitools.info/ml6
Three available editions of Windows Servers crucial to your organization’s op...Q-Advise
Windows Server, Microsoft’s robust operating system, is the cornerstone of enterprise IT infrastructure, tailored for mission-critical operations. It helps in managing enterprise-level tasks, including data storage, applications, and communication.
Proper licensing of Windows Server is essential for both legal compliance and optimal functionality within business environments.
Windows Server comes in various editions, and before any edition is used in your organization, it must be licensed appropriately. Licensing can be complex and capital-intensive when you don’t know what you need or don’t understand the licensing requirements.
Even if successfully licensed, there are various practices you can adopt as an organization to make sure your Server is operating optimally and delivering real value for money. This requires a deeper understanding of best practices, and our team of cloud and licensing experts can be of support.
Send the team an email at info@q-advise.com. Together we will look at your needs, decide which licensing model will work best in your case, assist you with savings options, and share with you how pre-owned licensing can also help you get licensed adequately.
Test Polarity: Detecting Positive and Negative Tests (FSE 2024)Andre Hora
Positive tests (aka, happy path tests) cover the expected behavior of the program, while negative tests (aka, unhappy path tests) check the unexpected behavior. Ideally, test suites should have both positive and negative tests to better protect against regressions. In practice, unfortunately, we cannot easily identify whether a test is positive or negative. A better understanding of whether a test suite is more positive or negative is fundamental to assessing the overall test suite capability in testing expected and unexpected behaviors. In this paper, we propose test polarity, an automated approach to detect positive and negative tests. Our approach runs/monitors the test suite and collects runtime data about the application execution to classify the test methods as positive or negative. In a first evaluation, test polarity correctly classified 117 tests as positive or negative. Finally, we provide a preliminary empirical study to analyze the test polarity of 2,054 test methods from 12 real-world test suites of the Python Standard Library. We find that most of the analyzed test methods are negative (88%) and a minority is positive (12%). However, there is a large variation per project: while some libraries have an equivalent number of positive and negative tests, others have mostly negative ones.
5. Solr Key Features
• Full text search (Info Retr.)
• Facets/Guided Nav galore!
• Lots of data types
• Spelling, auto-complete, highlighting
• Cursors
• More Like This
• De-duplication
• Apache Lucene
• Grouping and Joins
• Stats, expressions, transformations and more
• Lang. Detection
• Extensible
• Massive Scale/Fault tolerance
6. Why SQL
• Simple, well-known interface to data inside Solr
• Hides the complexity of Solr and its various features
• Possible to automatically optimise the query plan according to best practices
• Distributed Joins done simply and well
7. Solr 6: Parallel SQL
• Parallel execution of SQL across SolrCloud collections
• Compiled to the SolrJ Streaming API (TupleStream), a general-purpose parallel computing framework for Solr
• Executed in parallel over SolrCloud worker nodes
• SolrCloud collections are relational ‘tables’
• JDBC thin client as a SolrJ client
9. SQL Interface at a glance
• SQL over Map/Reduce — for high cardinality aggregations and distributed joins
• SQL over Facets — high performance, moderate cardinality aggregations
• SQL with Solr powered search queries
• Fully integrated with SolrCloud
• SQL over JDBC or HTTP — http://host:port/solr/collection1/sql
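The HTTP endpoint takes the SQL statement as a `stmt` request parameter. A minimal sketch of building such a request URL; the host, port and collection name are placeholders:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SqlRequest {
    // Builds a GET-style URL for the /sql endpoint; in practice the
    // statement is usually sent as a POST body with the same parameters.
    static String sqlUrl(String base, String stmt, String aggregationMode) {
        return base + "?stmt=" + URLEncoder.encode(stmt, StandardCharsets.UTF_8)
             + "&aggregationMode=" + aggregationMode;
    }

    public static void main(String[] args) {
        System.out.println(sqlUrl("http://localhost:8983/solr/collection1/sql",
                "select movie, director from IMDB limit 100", "facet"));
    }
}
```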
10. Limited vs Unlimited SELECT
• select movie, director from IMDB
Returns the entire result set! Return fields must be DocValues
• select movie, director from IMDB limit 100
Returns the specified number of records. It can sort by score and retrieve any stored field
• select movie, director from IMDB order by rating desc, num_voters desc
11. Search predicates
• select movie, director from IMDB where actor = ‘bruce’
• select movie, director from IMDB where actor = ‘(bruce tom)’
• select movie, director from IMDB where rating = ‘[8 TO *]’
• select movie, director from IMDB where (actor = ‘(bruce tom)’ AND rating = ‘[8 TO *]’)
Search predicates are Solr queries specified inside single quotes
Can specify arbitrary boolean clauses
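Conceptually, each quoted value is passed through as a raw Solr query against the field. A toy sketch of that mapping (illustrative only, not Solr’s actual SQL parser):

```java
public class PredicateSketch {
    // Illustrative only: the quoted SQL value becomes a Solr query
    // on the field, roughly field:value.
    static String toSolrQuery(String field, String value) {
        return field + ":" + value;
    }

    public static void main(String[] args) {
        System.out.println(toSolrQuery("actor", "(bruce tom)")); // actor:(bruce tom)
        System.out.println(toSolrQuery("rating", "[8 TO *]"));   // rating:[8 TO *]
    }
}
```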
12. Select DISTINCT
• select distinct actor_name from IMDB
• Map/Reduce implementation — tuples are shuffled to worker nodes and the operation is performed by the workers
• JSON Facet implementation — the operation is ‘pushed down’ to Solr
13. Stats aggregations
• select count(*), sum(num_voters) from IMDB
• Computed using Solr’s StatsComponent under the hood
• count, sum, avg, min, max are the supported aggregations
• Always pushed down into the search engine
14. GROUP BY Aggregations
• select actor_name, director, count(*), sum(num_voters) from IMDB group by actor_name, director having count(*) > 5 and sum(num_voters) > 1000 order by sum(num_voters) desc
• Has a map/reduce implementation (shuffle) and a JSON Facet implementation (push down)
• Multi-dimensional, high cardinality aggregations are possible with the map/reduce implementation
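The map/reduce path relies on tuples arriving at a worker already sorted by the group-by keys, so the rollup is a single streaming pass. A simplified sketch, assuming that ordering (field names follow the example above):

```java
import java.util.ArrayList;
import java.util.List;

public class RollupSketch {
    // A tuple of (actor, director, numVoters); the input list is assumed
    // to be pre-sorted by the group-by keys, as the shuffle guarantees.
    record Tuple(String actor, String director, long numVoters) {}

    static List<String> rollup(List<Tuple> sorted) {
        List<String> out = new ArrayList<>();
        String key = null;
        long count = 0, sum = 0;
        for (Tuple t : sorted) {
            String k = t.actor() + "/" + t.director();
            if (!k.equals(key)) {
                // New group starts: emit the finished one, if any.
                if (key != null) out.add(key + " count=" + count + " sum=" + sum);
                key = k; count = 0; sum = 0;
            }
            count++;
            sum += t.numVoters();
        }
        if (key != null) out.add(key + " count=" + count + " sum=" + sum);
        return out;
    }
}
```

Because the stream is sorted, each group is completed and emitted as soon as the key changes, so memory use is constant regardless of cardinality.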
16. JDBC
• Part of SolrJ
• SolrCloud Aware Load Balancing
• Connection has an ‘aggregationMode’ parameter that can switch between map_reduce and facet
• jdbc:solr://SOLR_ZK_CONNECTION_STRING?collection=COLLECTION_NAME&aggregationMode=facet
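A sketch of what the thin client looks like in use; the ZooKeeper hosts and collection name are placeholders, and the connection code is left in comments since it needs a running SolrCloud cluster and the SolrJ driver on the classpath:

```java
public class SolrJdbcUrl {
    // Builds the thin-client JDBC URL from its parts.
    static String jdbcUrl(String zkHost, String collection, String mode) {
        return "jdbc:solr://" + zkHost + "?collection=" + collection
             + "&aggregationMode=" + mode;
    }

    public static void main(String[] args) {
        String url = jdbcUrl("zk1:2181,zk2:2181/solr", "IMDB", "map_reduce");
        System.out.println(url);
        // With the SolrJ driver available, usage is plain JDBC:
        // try (Connection con = DriverManager.getConnection(url);
        //      Statement st = con.createStatement();
        //      ResultSet rs = st.executeQuery(
        //              "select movie, director from IMDB limit 10")) {
        //     while (rs.next()) { System.out.println(rs.getString("movie")); }
        // }
    }
}
```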
19. Streaming API
• Java API for parallel computation
• Real-time Map/Reduce and Parallel Relational Algebra
• Search results are streams of tuples (TupleStream)
• Transformed in parallel by Decorator streams
• Transformations include group by, rollup, union, intersection, complement, joins
• org.apache.solr.client.solrj.io.*
20. Streaming API
• Streaming Transformation
Operations that transform the underlying streams, e.g. unique, group by, rollup, union, intersection, complement, join, etc.
• Streaming Aggregation
Operations that gather metrics and compute aggregates, e.g. sum, count, average, min, max, etc.
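In spirit, a decorator stream wraps another stream’s read() and transforms tuples as they flow through. A minimal sketch of a unique-style decorator over an already-sorted stream (these are simplified stand-ins, not the real SolrJ classes):

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class UniqueDecorator {
    // A stream of tuples; read() returns null at end-of-stream.
    interface TupleStream { Map<String, String> read(); }

    // A source stream backed by an in-memory list of tuples.
    static TupleStream listStream(List<Map<String, String>> tuples) {
        Iterator<Map<String, String>> it = tuples.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Decorator: drops consecutive duplicates of the 'over' field,
    // assuming the underlying stream is sorted on that field.
    static TupleStream unique(TupleStream in, String over) {
        return new TupleStream() {
            Object last = new Object(); // sentinel that matches nothing
            public Map<String, String> read() {
                Map<String, String> t;
                while ((t = in.read()) != null) {
                    if (!Objects.equals(t.get(over), last)) {
                        last = t.get(over);
                        return t;
                    }
                }
                return null;
            }
        };
    }
}
```

Decorators compose: a join or rollup stream would wrap other streams the same way, pulling tuples on demand rather than materialising results.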
21. Streaming Expressions
• String query language and serialisation format for the Streaming API
• Streaming Expressions compile to TupleStreams
• TupleStreams serialise to Streaming Expressions
• Human-friendly syntax for the Streaming API, accessible to non-Java folks as well
• Can be used directly via HTTP or through SolrJ
23. Streaming Expressions
• Stream Sources
The origin of a TupleStream: search, jdbc, facet, stats, topic
• Stream Decorators
Wrap other stream functions and perform operations on the stream: complement, hashJoin, innerJoin, merge, intersect, top, unique
• Many streams can be parallelised across worker collections
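Putting sources and decorators together, a typical expression dedupes a sorted /export stream and runs it across a worker collection. The collection, field and ZooKeeper names below are placeholders, so treat this as a sketch of the syntax rather than a ready-to-run query:

```
parallel(workerCollection,
         unique(search(IMDB, q="*:*", fl="actor_name",
                       sort="actor_name asc", qt="/export"),
                over="actor_name"),
         workers="4", zkHost="zk1:2181/solr", sort="actor_name asc")
```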
24. Shuffling
• Shuffling is pushed down to Solr
• Sorting is done by /export handler which stream-sorts entire result sets
• Partitioning is done by the HashQParserPlugin, a filter that partitions on arbitrary fields
• Tuples (search results) start streaming instantly to worker nodes, never requiring a spill to disk.
• All replicas shuffle in parallel for the same query, which allows for massively parallel IO and huge throughput.
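The effect of hash partitioning can be sketched as follows: each worker keeps only the tuples whose partition-key hash lands in its slot, so every tuple goes to exactly one worker. This is illustrative only, not the HashQParserPlugin’s actual hash function:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    // Maps a partition-key value to one of 'workers' slots.
    static int partition(String key, int workers) {
        return Math.floorMod(key.hashCode(), workers);
    }

    public static void main(String[] args) {
        int workers = 4;
        // Each worker's filter would keep only its own slot; here we
        // just show how the keys spread across the slots.
        Map<Integer, List<String>> slots = new TreeMap<>();
        for (String actor : List.of("bruce", "tom", "uma", "kate")) {
            slots.computeIfAbsent(partition(actor, workers),
                    k -> new ArrayList<>()).add(actor);
        }
        System.out.println(slots);
    }
}
```

Because the hash is deterministic, all replicas agree on the assignment, which is what lets them shuffle in parallel for the same query.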
25. Worker collections
• Regular SolrCloud collections
• Perform streaming aggregations using the Streaming API
• Receive shuffled streams from the replicas
• Over an HTTP endpoint: /stream
• May be empty, created just-in-time for specific analytical queries, or hold data like any regular SolrCloud collection
• The goal is to separate processing from data if necessary
26. Parallel SQL
• The Presto parser compiles SQL to a TupleStream
• The TupleStream is serialised to a Streaming Expression and sent over the wire to worker nodes
• Worker nodes convert the Streaming Expression back into a TupleStream
• Worker nodes open() and read() the TupleStream in parallel