The document provides an overview of MariaDB ColumnStore, including its history, components, disk storage architecture, and the processes for writing and querying data. It was presented by Andrew Hutchings, the lead software engineer for MariaDB ColumnStore, who previously worked at MySQL, HP, and other companies. The presentation covers the technical use cases for ColumnStore, how it differs from row-oriented databases, and optimizations specific to ColumnStore.
This document provides an overview of MariaDB Galera Cluster and discusses key features of Galera Cluster 4, including support for huge transactions through streaming replication and improved handling of inconsistencies that avoids unnecessary cluster-wide shutdowns. It summarizes Seppo Jaakola's presentation on the state of Galera Cluster and the roadmap for future releases.
What to expect from MariaDB Platform X5, part 2 (MariaDB plc)
This document summarizes new features and enhancements in MariaDB MaxScale 2.5 and MariaDB ColumnStore 1.5. Some key points include:
- MaxScale 2.5 includes a new graphical user interface, improved binlog router, capability to stream binlogs to Kafka as JSON, and distributed caching between MaxScale servers.
- ColumnStore 1.5 features a new API, PowerBI direct query connector, improved replication from InnoDB, and multinode support in SkySQL.
- Configuration and installation of ColumnStore has been simplified, including using a new ColumnStore.xml utility and S3 storage manager for redundant file storage in object storage.
M|18 Why Abstract Away the Underlying Database Infrastructure (MariaDB plc)
MariaDB MaxScale is a database proxy that abstracts away the underlying database infrastructure. It provides a single logical view of the database even if it is physically distributed as a cluster. This simplifies application development and management. It also provides high availability through failure tolerance and load balancing for better performance. MaxScale uses monitors to detect the cluster topology and status, classifiers to understand queries, and routers to direct traffic, hiding the physical infrastructure and enabling horizontal scalability. Filters can further extend its functionality such as for caching, analytics, or patching SQL. Overall, MaxScale abstracts database clusters to make them easier to use while preserving high performance and availability.
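A toy model can make the monitor/classifier/router split more concrete. The sketch below is not MaxScale's implementation or configuration: `Server`, `classify` and `ReadWriteRouter` are hypothetical names, the `role`/`healthy` fields stand in for what a monitor would report, the classifier is a naive keyword check, and round-robin over replicas is an assumed load-balancing policy.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Server:
    name: str
    role: str            # "primary" or "replica", as a monitor would report it
    healthy: bool = True


def classify(sql: str) -> str:
    """Naive classifier: plain SELECTs are reads, everything else is a write."""
    stmt = sql.lstrip().upper()
    if stmt.startswith("SELECT") and "FOR UPDATE" not in stmt:
        return "read"
    return "write"


class ReadWriteRouter:
    """Sends writes to the primary and spreads reads across healthy replicas."""

    def __init__(self, servers):
        self.servers = servers
        self._rr = count()   # round-robin counter for replica selection

    def route(self, sql: str) -> Server:
        healthy = [s for s in self.servers if s.healthy]
        primaries = [s for s in healthy if s.role == "primary"]
        replicas = [s for s in healthy if s.role == "replica"]
        if classify(sql) == "read" and replicas:
            return replicas[next(self._rr) % len(replicas)]
        if primaries:
            return primaries[0]   # writes, or reads when no replica is available
        raise RuntimeError("no healthy server can serve this query")
```

Marking a replica unhealthy, as a monitor would after a failure, removes it from read routing without the application noticing.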
MariaDB ColumnStore is a high performance columnar storage engine for MariaDB that supports analytical workloads on large datasets. It uses a distributed, massively parallel architecture to provide faster and more efficient queries. Data is stored column-wise which improves compression and enables fast loading and filtering of large datasets. The cpimport tool allows loading data into MariaDB ColumnStore in bulk from CSV files or other sources, with options for centralized or distributed parallel loading. Proper sizing of ColumnStore deployments depends on factors like data size, workload, and hardware specifications.
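As a rough illustration of why column-wise storage helps compression and scanning, the Python sketch below pivots rows into columns and run-length encodes one of them. The helper names are hypothetical, and run-length encoding is only an example of a column-friendly scheme, not necessarily what ColumnStore uses on disk.

```python
from itertools import groupby


def to_columns(rows, column_names):
    """Pivot row tuples into one list per column (column-wise layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def rle_encode(values):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]
```

A query that sums `amount` only has to touch that one list, and low-cardinality columns such as `country` tend to form long runs that compress well, especially when the data is sorted.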
MaxScale uses an asynchronous and multi-threaded architecture to route client queries to backend database servers. Each thread creates its own epoll instance to monitor file descriptors for I/O events, avoiding locking between threads. Listening sockets are added to a global epoll file descriptor that notifies threads when clients connect, allowing connections to be distributed evenly across threads. This architecture improves performance over the previous single epoll instance approach.
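The per-thread epoll design can be sketched in Python on Linux (`select.epoll` is Linux-only). This single-worker illustration rests on an assumption about the wiring: the shared epoll descriptor that holds the listening sockets is itself registered in each worker's private epoll set, so a worker wakes when a client connects, accepts the connection, and from then on owns it in its own epoll instance. `Worker` and `make_listener` are hypothetical names; MaxScale itself is written in C++.

```python
import select
import socket


def make_listener():
    """Non-blocking TCP listener on an ephemeral localhost port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(16)
    srv.setblocking(False)
    return srv


class Worker:
    """State of one worker thread: a private epoll set and the clients it owns."""

    def __init__(self, shared_ep):
        self.ep = select.epoll()
        self.shared_ep = shared_ep
        # An epoll fd is itself pollable: it reports EPOLLIN when any listener
        # registered in it has a pending connection.
        self.ep.register(shared_ep.fileno(), select.EPOLLIN)
        self.clients = {}    # fd -> socket, touched only by this worker
        self.received = {}   # fd -> bytes read so far

    def poll_once(self, listeners, timeout=1.0):
        by_fd = {s.fileno(): s for s in listeners}
        for fd, _ in self.ep.poll(timeout):
            if fd == self.shared_ep.fileno():
                # Find out which listener is ready and take the connection.
                for lfd, _ in self.shared_ep.poll(0):
                    try:
                        conn, _ = by_fd[lfd].accept()
                    except BlockingIOError:
                        continue  # another worker accepted it first
                    conn.setblocking(False)
                    self.ep.register(conn.fileno(), select.EPOLLIN)
                    self.clients[conn.fileno()] = conn
                    self.received[conn.fileno()] = b""
            elif fd in self.clients:
                data = self.clients[fd].recv(4096)
                if data:
                    self.received[fd] += data
                else:  # peer closed: stop watching and release the socket
                    self.ep.unregister(fd)
                    self.clients.pop(fd).close()
```

If several workers register the same shared descriptor, more than one may wake for a single connection; only one `accept()` succeeds, which is why the sketch tolerates `BlockingIOError`.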
How to migrate from Oracle Database with ease (MariaDB plc)
MariaDB introduced Oracle Database compatibility last May with support for Oracle Database data types, sequences, stored procedures (PL/SQL) and more, making it easier than ever to migrate to MariaDB. In this session, MariaDB's Alexander Bienemann and Wagner Bianchi share best practices and lessons learned from our experiences helping customers migrate from Oracle Database. They explain how MariaDB approaches migrations, what’s needed to complete a successful migration and the tools used to determine the level of effort required.
MariaDB Platform for hybrid transactional/analytical workloads (MariaDB plc)
OpenWorks 2019 Session
In order to provide data-driven customers with more historical data and real-time analytics, MariaDB Platform can be configured for hybrid transactional/analytical workloads by leveraging row storage for current data transactions and columnar storage for historical data and analytics. In this session Shane Johnson, Senior Director of Product Marketing at MariaDB, shows how change-data-capture and query routing, both available out of the box, can be used to bring scalable analytics to customer-facing applications without changing their code – and without depending on a separate data warehouse.
How QBerg scaled to store data longer, query it faster (MariaDB plc)
The continuous growth in the number of services QBerg offers and countries it serves requires ever more resources. Over the last year QBerg reached a critical point: it was storing so much transactional data that standard relational databases could no longer meet the SLAs, or support the features, its customers required. For example, web analytics had to be capped at a maximum of four months of history. Introducing MariaDB ColumnStore alongside the existing MariaDB Server databases will not only allow QBerg to store multiple years’ worth of historical data for analytics, it also cut overall processing time by an order of magnitude right away. The move to a unified platform was incremental, using MariaDB MaxScale as both a router and a replicator. QBerg can now replicate full InnoDB schemas to MariaDB ColumnStore and incrementally update big tables without affecting the performance of ongoing transactions.
M|18 Analyzing Data with the MariaDB AX Platform (MariaDB plc)
The document summarizes new features in MariaDB AX, an open-source analytics platform. Key updates include: improved high availability and disaster recovery with GlusterFS support and parallel backup/restore; enhanced analytics capabilities like user-defined aggregate and window functions; and streamlined data ingestion with streaming and bulk data adapters for loading data from sources like Kafka and applications in real-time or batch. The platform provides scalable analytics on MariaDB ColumnStore through features like distributed storage, parallel queries, and automatic partitioning.
Faster, better, stronger: The new InnoDB (MariaDB plc)
For MariaDB Enterprise Server 10.5, the default transactional storage engine, InnoDB, has been significantly rewritten to improve the performance of writes and backups. We also removed a number of parameters to reduce unnecessary complexity, not only in configuration but in the code itself. Finally, we improved crash recovery with better consistency checks, and reduced memory consumption and file I/O with an all-new log record format.
In this session, we’ll walk through all of the improvements to InnoDB, and dive deep into the implementation to explain how these improvements help everything from configuration and performance to reliability and recovery.
Scylla Summit 2022: Making Schema Changes Safe with Raft (ScyllaDB)
ScyllaDB adopted Raft as a consensus protocol to dramatically improve operations and to provide strong consistency to end users. This talk explains how Raft behaves in Scylla Open Source 5.0 and introduces the first major improvement visible to end users: schema changes. Learn how cluster configuration resides in Raft, providing consistent cluster assembly and configuration management. This makes bootstrapping safer and provides reliable disaster recovery when you lose the majority of the cluster.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
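In Raft generally, a schema change is safe to apply once its log entry is committed, meaning it is stored on a majority of nodes. The sketch below shows the generic leader-side commit rule from the Raft paper in Python; it is not Scylla's code, and `majority_commit_index`, `apply_committed` and the `(table, columns)` log-entry shape are hypothetical.

```python
def majority_commit_index(match_index, log_terms, current_term, commit_index):
    """Return a Raft leader's new commit index.

    match_index:  highest log index known to be stored on each server,
                  including the leader itself.
    log_terms:    log_terms[i] is the term of log entry i (index 0 is unused).
    current_term: the leader's current term.
    commit_index: the leader's current commit index.
    """
    ordered = sorted(match_index, reverse=True)
    # At least len//2 + 1 servers hold every index <= ordered[len // 2].
    candidate = ordered[len(ordered) // 2]
    # Raft only commits by replica counting for entries of the current term;
    # earlier entries then become committed implicitly.
    if candidate > commit_index and log_terms[candidate] == current_term:
        return candidate
    return commit_index


def apply_committed(log, last_applied, commit_index, schema):
    """Apply committed schema-change entries, in log order, to a local schema dict."""
    for i in range(last_applied + 1, commit_index + 1):
        table, columns = log[i]
        schema[table] = columns
    return commit_index
```

Every node applies the same committed entries in the same order, so all nodes converge on the same schema even if some were briefly unreachable.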
No matter how resilient your database infrastructure is, backups are still needed to defend against catastrophic failures, be it the unlikely hardware failure of all data centers or the far more likely, all-too-human user error. Acknowledging the importance of good backup procedures, Scylla Manager now natively supports backup and restore operations. In this talk, we will learn how that works and what guarantees it provides, as well as how to set it up for maximum resiliency of your cluster.
In this session Max Mether, VP of Product Management at MariaDB, provides an introduction to MariaDB Platform X3 and the new features in MariaDB Server 10.3 and MariaDB MaxScale 2.3. He then turns his focus to what’s coming in MariaDB Server 10.4, including instant DROP COLUMN, the INTERVAL data type and advanced security features like account locking.
Postgres-XC is a shared-nothing PostgreSQL cluster that scales horizontally by distributing data across multiple nodes. It supports both replicated and distributed tables. Replicated tables store each row on all nodes, while distributed tables store each row on a single node according to the distribution strategy. The document discusses Postgres-XC's architecture, data distribution techniques, query processing, and provides an example of how to distribute the tables in the TPC-B benchmark schema for optimal performance.
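The difference between replicated and distributed tables can be sketched as follows. This is a toy model, not Postgres-XC code: `Cluster` is hypothetical, CRC32 modulo the node count stands in for the real distribution function, and distributing both `branches` and `accounts` by `bid` (so that one branch's rows land on the same node) is an assumption for illustration.

```python
import zlib


class Cluster:
    """Toy shared-nothing cluster: each node holds its own tables."""

    def __init__(self, n_nodes):
        self.nodes = [{} for _ in range(n_nodes)]   # node -> {table: [rows]}
        self.tables = {}                            # table -> distribution column or None

    def create_table(self, name, distribute_by=None):
        # distribute_by=None means the table is replicated on every node.
        self.tables[name] = distribute_by

    def node_for(self, value):
        # Deterministic hash of the distribution column value.
        return zlib.crc32(str(value).encode()) % len(self.nodes)

    def insert(self, table, row):
        column = self.tables[table]
        if column is None:
            targets = range(len(self.nodes))          # replicated: every node
        else:
            targets = [self.node_for(row[column])]    # distributed: one node
        for n in targets:
            self.nodes[n].setdefault(table, []).append(row)
```

Because `branches` and `accounts` share the distribution column, a transaction touching one branch and its accounts can stay on a single node, while a small replicated table can be joined locally on any node.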
M|18 How DBAs at TradingScreen Make Life Easier With Automation (MariaDB plc)
This document discusses how TradingScreen automates tasks related to managing their MariaDB databases across multiple environments. Some key points:
- TradingScreen has over 100 database servers across different regions to support their financial services clients.
- They developed tools like DBABot, RosBot, and various scripts to automate backups, replication monitoring, user access removal, and schema deployments across their environments in order to reduce errors and make processes more efficient.
- The tools leverage technologies like Percona Toolkit, XtraBackup, Git, and APIs to perform tasks like backups, replication monitoring, schema changes, and more. This allows a smaller team of DBAs to manage a large, globally distributed database infrastructure.
How Scylla Makes Adding and Removing Nodes Faster and Safer (ScyllaDB)
When a node is added or removed, Scylla has to transfer part of the existing data from some nodes to their neighbors. When a node fails, Scylla has to repopulate its data from the surviving replicas. These operations are collectively referred to as "streaming" operations, since they simply stream data from one node to another without using the opportunity to also fix discrepancies in the data. This contrasts with the repair operation, which examines all existing replicas and reconciles their contents. Scylla is moving towards unifying the two operations. In this talk we will discuss why this is considered beneficial and what other possibilities it opens up for users.
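To illustrate the difference between streaming and repair, the sketch below contrasts a plain copy with a last-write-wins reconciliation across replicas. It is a simplification under stated assumptions: each replica is a dict of `key -> (timestamp, value)`, `None` marks a deletion, and every cell is compared directly, whereas real repair implementations compare hashed ranges of data rather than every cell. `stream` and `repair` are hypothetical names.

```python
def stream(source, target):
    """Streaming: copy one replica's cells as-is, with no cross-replica comparison."""
    target.update(source)


def repair(replicas):
    """Reconcile replicas in place with last-write-wins per key.

    Each replica maps key -> (timestamp, value); value None marks a deletion.
    Returns the number of cells that had to be fixed.
    """
    merged = {}
    for replica in replicas:
        for key, cell in replica.items():
            # Keep the cell with the highest write timestamp.
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    fixed = 0
    for replica in replicas:
        for key, cell in merged.items():
            if replica.get(key) != cell:
                replica[key] = cell
                fixed += 1
    return fixed
```

Streaming from a stale source faithfully copies its stale cells to the new node; only the repair pass brings every replica to the newest version of each key.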
This document summarizes the challenges and solutions for maintaining large PostgreSQL databases at Emma, including:
- Maintaining terabytes of data across multiple clusters up to version 9.0
- Facing performance issues when the hardware load was pushed to its limits
- Dealing with huge catalogs containing millions of data points that caused slow performance
- Addressing problems like bloat, backups that took hours, system resource exhaustion, and transaction wraparound issues
- Implementing solutions such as scripts to clean up bloat, sharding to a Linux filesystem, and increasing autovacuum thresholds
In this session Satoru Goto, Solutions Engineer at MariaDB, shows how the Pentaho connector for MariaDB ColumnStore can be used for both BI/reporting on MariaDB ColumnStore as well as loading data into MariaDB ColumnStore.
A Brief Introduction of TiDB (Percona Live) (PingCAP)
TiDB is an open-source distributed SQL database that supports high availability, horizontal scalability, and consistent distributed transactions. It provides a MySQL-compatible API and seamless online expansion. TiDB uses Raft for consensus and implements the MVCC model to support high concurrency. It also provides distributed transactions through a two-phase commit protocol. The architecture consists of a stateless SQL layer (TiDB) and a distributed transactional key-value storage (TiKV).
This is the speech Max Liu gave at Percona Live Open Source Database Conference 2016.
Max Liu: Co-founder and CEO, a hacker with a free soul
The slides covered the following topics:
- Why another database?
- What kind of database do we want to build?
- How to design such a database, including the principles, the architecture, and design decisions?
- How to develop such a database, including the architecture and the core technologies for TiKV and TiDB?
- How to test the database to ensure the quality and stability?
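The MVCC model mentioned in the TiDB summary lets a reader see a consistent snapshot: a read at start timestamp `ts` returns the newest version committed at or before `ts`. The sketch below shows only that read rule; `MVCCStore` is hypothetical, and the locking and two-phase commit that TiKV layers on top are omitted.

```python
import bisect


class MVCCStore:
    """Toy multi-version store: every committed write keeps a new version."""

    def __init__(self):
        self._ts = {}     # key -> sorted list of commit timestamps
        self._vals = {}   # key -> values aligned with self._ts[key]

    def write(self, key, value, commit_ts):
        """Record a committed version; value None represents a delete."""
        ts_list = self._ts.setdefault(key, [])
        vals = self._vals.setdefault(key, [])
        i = bisect.bisect_right(ts_list, commit_ts)
        ts_list.insert(i, commit_ts)
        vals.insert(i, value)

    def read(self, key, start_ts):
        """Snapshot read: newest version committed at or before start_ts."""
        ts_list = self._ts.get(key, [])
        i = bisect.bisect_right(ts_list, start_ts)
        return self._vals[key][i - 1] if i else None
```

Readers never block writers here: a later write only appends a version that older snapshots simply do not see.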
Best Practices for Migrating your Data Warehouse to Amazon Redshift (Amazon Web Services)
You can gain substantially more business insights and save costs by migrating your existing data warehouse to Amazon Redshift. This session will cover the key benefits of migrating to Amazon Redshift, migration strategies, and tools and resources that can help you in the process. We’ll learn about AWS Database Migration Service and AWS Schema Conversion Tool, which were recently enhanced to import data from six common data warehouse platforms.
Amazon Redshift is a fully managed data warehouse service that allows for petabyte-scale analytics on data stored in columns. It uses a massively parallel processing architecture and columnar data storage to improve query performance. Defining sort keys and distribution keys appropriately is crucial to influence how data is stored and queries are processed in parallel across nodes. Automatic features like concurrency scaling, resize operations, and backups help ensure the warehouse scales and remains available as data and usage grow over time.
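Sort keys pay off because each block carries min/max metadata (zone maps), letting a range-restricted scan skip blocks that cannot match. The Python sketch below models that effect under simplifying assumptions (tiny blocks, integer values, hypothetical helper names); it does not model distribution keys and is not Redshift's implementation.

```python
from dataclasses import dataclass


@dataclass
class Block:
    min_val: int
    max_val: int
    values: list


def build_blocks(column, block_size):
    """Split a column into fixed-size blocks, recording each block's min/max."""
    blocks = []
    for i in range(0, len(column), block_size):
        chunk = column[i:i + block_size]
        blocks.append(Block(min(chunk), max(chunk), chunk))
    return blocks


def range_scan(blocks, lo, hi):
    """Return values in [lo, hi] and the number of blocks actually read."""
    out, read = [], 0
    for b in blocks:
        if b.max_val < lo or b.min_val > hi:
            continue  # the zone map proves no row in this block can match
        read += 1
        out.extend(v for v in b.values if lo <= v <= hi)
    return out, read
```

With unsorted data every block's min/max range tends to be wide, so few blocks can be skipped; sorting on the filtered column makes the ranges narrow and disjoint.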
Demystifying MS17-010: Reverse Engineering the ETERNAL Exploits (Priyanka Aash)
"MS17-010 is the most important patch in the history of operating systems, fixing remote code execution vulnerabilities in the world of modern Windows. The ETERNAL exploits, written by the Equation Group and dumped by the Shadow Brokers, have been used in the most damaging cyber attacks in computing history: WannaCry, NotPetya, Olympic Destroyer, and many others.
Yet, how these complicated exploits work has not been made clear to most. This is due to the ETERNAL exploits taking advantage of undocumented features of the Windows kernel and the esoteric SMBv1 protocol.
This talk will condense years of research into Windows internals and the SMBv1 protocol driver. It will provide a full reverse engineering of the relevant internal structures, along with all the historical background needed to understand how the exploit chains for ETERNALBLUE, ETERNALCHAMPION, ETERNALROMANCE, and ETERNALSYNERGY work.
This talk will also describe how the MS17-010 patch fixed the vulnerabilities, and identify additional vulnerabilities that were patched around the same time."
Best Practices for Migrating your Data Warehouse to Amazon Redshift (Amazon Web Services)
This document provides best practices for migrating a data warehouse to Amazon Redshift. It discusses why companies migrate to Redshift due to its scalability, performance and cost advantages. Example migration stories are provided from companies that achieved significant improvements after migrating large datasets from Oracle, Greenplum and SQL on Hadoop to Redshift. The document also outlines the Redshift cluster architecture, data loading best practices including file splitting and column encoding, schema design considerations and available migration tools.
TiDB and Amazon Aurora can be combined to provide analytics on transactional data without a separate data warehouse. The TiDB Data Migration (DM) tool migrates and replicates data from Aurora into TiDB for analytical queries. DM provides full data migration plus incremental replication of binlog events from Aurora into TiDB. This makes it possible to run transactional and analytical workloads on the same dataset without ETL pipelines.
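The full-load-then-incremental pattern can be sketched as replaying change events on top of a snapshot, with a checkpoint so that replays are idempotent. This is not DM's API: `Event` and `Replicator` are hypothetical, and binlog positions are simplified to integers (real MySQL binlog positions are a file name plus offset, or GTIDs).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    pos: int                    # position in the source binlog (simplified)
    op: str                     # "insert", "update" or "delete"
    key: int
    row: Optional[dict] = None


class Replicator:
    def __init__(self, snapshot, snapshot_pos):
        # Full-load phase: start from a consistent dump taken at snapshot_pos.
        self.table = dict(snapshot)
        self.checkpoint = snapshot_pos

    def apply(self, events):
        """Incremental phase: apply events newer than the checkpoint, in order."""
        for ev in events:
            if ev.pos <= self.checkpoint:
                continue  # already in the dump or applied before
            if ev.op in ("insert", "update"):
                self.table[ev.key] = ev.row
            elif ev.op == "delete":
                self.table.pop(ev.key, None)
            else:
                raise ValueError(f"unknown op {ev.op!r}")
            self.checkpoint = ev.pos
```

Persisting the checkpoint together with the applied changes is what lets a restarted replicator resume without double-applying events.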
Big data challenges are common: we are all doing aggregations, machine learning, anomaly detection, OLAP...
This presentation describes how InnerActive meets those requirements.
This document discusses solutions for generating unique IDs in distributed systems. It describes existing solutions such as auto-incrementing database IDs, ticket servers, and UUIDs, along with their pros and cons. It then explains in detail Twitter's Snowflake algorithm, which generates compact, sortable, unique IDs across distributed nodes at high speed without coordination. Finally, it introduces SepTech's Snowflake4S library, which is inspired by Twitter's Snowflake and makes unique ID generation easy to embed in applications.
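Snowflake IDs are commonly described as 64-bit integers laid out as a 41-bit millisecond timestamp (relative to a custom epoch), a 10-bit worker ID and a 12-bit per-millisecond sequence. The Python sketch below follows that layout; it is not the Snowflake4S API, the epoch constant is the one usually cited for Twitter's original implementation, and the injectable `clock` parameter is a hypothetical convenience for testing.

```python
import threading
import time

TWITTER_EPOCH_MS = 1288834974657  # epoch commonly cited for Twitter's Snowflake
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id, epoch_ms=TWITTER_EPOCH_MS, clock=None):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        # Hypothetical injectable clock (milliseconds) to make testing easy.
        self.clock = clock or (lambda: time.time_ns() // 1_000_000)
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & SEQUENCE_MASK
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - self.epoch_ms) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)


def decode(snowflake_id, epoch_ms=TWITTER_EPOCH_MS):
    """Split an ID back into its timestamp, worker and sequence fields."""
    return {
        "timestamp_ms": (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + epoch_ms,
        "worker_id": (snowflake_id >> SEQUENCE_BITS) & MAX_WORKER_ID,
        "sequence": snowflake_id & SEQUENCE_MASK,
    }
```

Because the timestamp occupies the high bits, IDs from one generator sort by creation time; uniqueness across nodes relies on each node having a distinct worker ID.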
Technical session: Big Data Analytics with MariaDB ColumnStore (MariaDB plc)
MariaDB ColumnStore 1.0 is a columnar database that provides:
1) High performance analytics through massively parallel and distributed query processing on commodity servers.
2) High speed parallel data loading and extraction without blocking reads.
3) In-database analytics with complex joins, windowing functions, and extensible user defined functions.
Argus Production Monitoring at Salesforce (HBaseCon)
Tom Valine and Bhinav Sura (Salesforce)
We’ll present details about Argus, a time-series monitoring and alerting platform developed at Salesforce to provide insight into the health of infrastructure as an alternative to systems such as Graphite and Seyren.
Building an Amazon Datawarehouse and Using Business Intelligence Analytics Tools (Amazon Web Services)
It has never been easier or more affordable to use AWS to solve business problems and uncover new opportunities with data. Businesses of all sizes and across all industries can now take advantage of big data technologies to easily collect, store, process, analyze, and share their data. Gain a thorough understanding of what AWS offers across the big data lifecycle and learn architectural best practices for applying these technologies to your projects. We will also dive deep into how to use AWS services such as Kinesis, DynamoDB, Redshift, and QuickSight to optimize logging, build real-time applications, and analyze and visualize data at any scale.
Scylla Summit 2017: Scylla for Mass Simultaneous Sensor Data Processing of ME... (ScyllaDB)
We will share our practices for adopting Scylla to manage equipment sensor data in MES, along with data modeling tips, a data architecture using Scylla, and configuration and tuning advice.
The Parquet Format and Performance Optimization Opportunities (Databricks)
The Parquet format is one of the most widely used columnar storage formats in the Spark ecosystem. Given that I/O is expensive and that the storage layer is the entry point for any query execution, understanding the intricacies of your storage format is important for optimizing your workloads.
As an introduction, we will provide context around the format, covering the basics of structured data formats and the underlying physical data storage model alternatives (row-wise, columnar and hybrid). Given this context, we will dive deeper into specifics of the Parquet format: representation on disk, physical data organization (row-groups, column-chunks and pages) and encoding schemes. Now equipped with sufficient background knowledge, we will discuss several performance optimization opportunities with respect to the format: dictionary encoding, page compression, predicate pushdown (min/max skipping), dictionary filtering and partitioning schemes. We will learn how to combat the evil that is ‘many small files’, and will discuss the open-source Delta Lake format in relation to this and Parquet in general.
This talk serves both as an approachable refresher on columnar storage as well as a guide on how to leverage the Parquet format for speeding up analytical workloads in Spark using tangible tips and tricks.
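Dictionary encoding and dictionary filtering, as described for the Parquet format, can be illustrated with plain Python lists. In this sketch each column chunk stores a dictionary of distinct values plus integer indices, and an equality filter skips any chunk whose dictionary lacks the value. Real Parquet files keep a dictionary page per column chunk and compress the indices with a run-length/bit-packing hybrid; the helper names here are hypothetical.

```python
def dictionary_encode(values):
    """Return (dictionary, indices) such that dictionary[indices[i]] == values[i]."""
    dictionary, index_of, indices = [], {}, []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices


def filter_eq(chunks, wanted):
    """Find (chunk_no, row_no) pairs equal to `wanted`, skipping chunks via their dictionary.

    Returns the matches and the number of chunks whose indices were scanned.
    """
    hits, scanned = [], 0
    for c, (dictionary, indices) in enumerate(chunks):
        if wanted not in dictionary:
            continue  # dictionary filtering: no row in this chunk can match
        scanned += 1
        code = dictionary.index(wanted)
        # Compare small integer codes instead of the original values.
        hits.extend((c, r) for r, i in enumerate(indices) if i == code)
    return hits, scanned
```

Dictionary filtering complements min/max skipping: a value can fall inside a chunk's min/max range yet still be absent from its dictionary.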
OSMC 2018 | Learnings, patterns and Uber’s metrics platform M3, open sourced ... (NETWAYS)
At Uber we use high-cardinality monitoring to observe and detect issues with our 4,000 microservices running on Mesos and across our infrastructure systems and servers. We’ll cover how we put the more than 6 billion resulting time series to work in a variety of ways: auto-discovering services and their usage of other systems at Uber, automatically setting up and tearing down alerts for services, sending smart alert notifications that roll up different failures into individual high-level contextual alerts, and more. We’ll also talk about how we accomplish all this with a global view of our systems through M3, our open source metrics platform. We’ll take a deep dive into how we use M3DB, now available as an open source long-term storage backend for Prometheus, to horizontally scale our metrics platform cost-efficiently with a system that is still sane to operate with petabytes of metrics data.
Mirko Damiani - An Embedded soft real time distributed system in Go (linuxlab_conf)
An embedded system usually involves low-level languages like C and highly customized hardware. In this talk we will look at a soft real-time system that was developed with a very different approach: it was written in Go. We will see the advantages of this choice, along with its limits.
Similar to M|18 Understanding the Architecture of MariaDB ColumnStore
MariaDB Paris Workshop 2023 - Newpharma (MariaDB plc)
This document summarizes Newpharma's transition from a standalone database server to an enterprise MariaDB Galera cluster between 2018 and 2023. It discusses the business needs that drove the change, including increased traffic and access to multiple data sources. Key benefits of the Galera cluster are highlighted, such as synchronous replication, read/write access from any node, and automatic node joining. Challenges of the migration, such as converting table types and splitting large transactions, are also outlined. The transition has supported Newpharma's growth to over 100 million euros in turnover.
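Accepting writes on any Galera node relies on certification: write-sets are delivered to every node in the same global order, and each node independently checks whether a write-set conflicts with one ordered after the transaction's snapshot. The Python sketch below is a simplified model of that idea, not Galera's implementation; `WriteSet`, `Certifier` and the `last_seen` field are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WriteSet:
    trx: str
    keys: frozenset
    last_seen: int   # highest global seqno committed when the transaction started


class Certifier:
    """Runs identically on every node over the same totally ordered stream."""

    def __init__(self):
        self.seqno = 0
        self.key_index = {}   # key -> seqno of the last certified write to it

    def certify(self, ws):
        """Assign the next global seqno; return (seqno, True) to commit or (seqno, False) to abort."""
        self.seqno += 1
        # Conflict: a key was written by a transaction ordered after our snapshot.
        conflict = any(self.key_index.get(k, 0) > ws.last_seen for k in ws.keys)
        if not conflict:
            for k in ws.keys:
                self.key_index[k] = self.seqno
        return self.seqno, not conflict
```

Because every node processes the same ordered stream with the same deterministic rule, all nodes reach the same commit or abort decision without exchanging votes; the transaction ordered first wins.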
MariaDB Paris Workshop 2023 - Performance Optimization (MariaDB plc)
MariaDB is an open-source database that is highly tunable and modular. It allows for various storage engines, plugins, and configurations to optimize performance depending on usage. Key aspects that impact performance include memory allocation, disk access, query optimization, and architecture choices like replication, sharding, or using ColumnStore for analytics. Solutions like MyRocks, Spider, MaxScale can improve performance for transactional or large scale workloads by optimizing resources, adding high availability, and distributing load.
MariaDB Paris Workshop 2023 - MaxScale (MariaDB plc)
The document outlines requirements and criteria for a database solution spanning two buildings 30 km apart, connected by a WAN link. The chosen solution was MariaDB with Galera Cluster for high availability and synchronous replication across sites, along with MaxScale for read/write splitting and failover. MaxScale instances at each site allow zero-downtime database patching and upgrades one site at a time, while the Galera cluster provides structure-independent synchronous replication between sites.
MariaDB Tech and Business Update Hamburg 2023 - MariaDB Enterprise Server (MariaDB plc)
MariaDB Enterprise Server 10.6 includes the following key features:
- New JSON functions, plus new data types such as UUID and INET4.
- Improved Oracle compatibility with function parameters.
- Enhanced partitioning capabilities like converting partitions.
- Optimistic ALTER TABLE for replicas to reduce downtime.
- Online schema changes without locking tables for improved performance.
- Security enhancements including password policies and privilege changes.
MariaDB SkySQL is a cloud database service that provides autonomous scaling, observability, and cloud backup capabilities. It offers multi-cloud and hybrid operations across AWS, Google Cloud, and on-premises databases. The service includes features like the Remote Observability Service (ROS) for monitoring across environments, and a Cloud Backup Service. It aims to provide a simple yet advanced service for scaling databases from small to extreme sizes with tools for automation, self-service, and unified operations.
The document discusses high availability solutions for MariaDB databases. It begins by defining high availability and concepts like Recovery Time Objective (RTO) and Recovery Point Objective (RPO). It then presents different MariaDB and MaxScale architectures that provide high availability, including single node, primary-replica, Galera cluster, and SkySQL solutions. Key aspects covered are automatic failover, load balancing, data filtering, and service level agreements.
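An availability SLA translates directly into a downtime budget, which can then be compared against the RTO of a chosen architecture. The helper names below are hypothetical, a year is taken as 365.25 days, and the incident estimate assumes every outage costs exactly one RTO and that nothing else causes downtime.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60   # 525,960 minutes


def downtime_budget_minutes(availability_pct, period_minutes=MINUTES_PER_YEAR):
    """Maximum downtime an availability target allows over a period."""
    if not 0 < availability_pct <= 100:
        raise ValueError("availability_pct must be in (0, 100]")
    return (1 - availability_pct / 100) * period_minutes


def max_incidents(availability_pct, rto_minutes):
    """How many full-RTO outages fit in the yearly budget (rough heuristic)."""
    return int(downtime_budget_minutes(availability_pct) // rto_minutes)
```

For example, 99.99% availability leaves roughly 52.6 minutes of downtime per year, so an architecture with a 30-second automatic failover can absorb about a hundred failovers, while one that needs manual intervention may exhaust the budget with a single incident.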
What's New in MariaDB Enterprise Server (MariaDB plc)
This document summarizes new features in MariaDB Enterprise Server. Key points include:
- MariaDB Enterprise Server is geared toward enterprise customers and focuses on stability, robustness, and predictability.
- It has a longer release cycle than Community Server, with new versions every 2 years and long maintenance cycles. New features from Community Server are backported.
- Recent additions include analytics functions, JSON support, bi-temporal modeling, schema changes, database compatibility features, and security enhancements.
- The upcoming 23.x release will include new JSON functions, data types like UUID and INET4, Oracle compatibility features, partitioning improvements, and Galera enhancements.
Global Data Replication with Galera for Ansell Guardian® (MariaDB plc)
Ansell Guardian® faced challenges with their previous database replication solution as their data and usage grew globally. They evaluated MariaDB/Galera and implemented it to replace their legacy solution. The implementation was smooth using automation scripts. MariaDB/Galera provided increased performance, faster deployment times, and more reliable data synchronization across their 3 data centers compared to their previous solution. It helped resolve a critical data divergence issue and improved the user experience. They plan to further enhance their database infrastructure using MaxScale in the future.
SkySQL is the first and only database-as-a-service (DBaaS) to perform workload analysis with advanced deep learning models, identifying and classifying discrete workload patterns so DBAs can better understand database workloads, identify anomalies and predict changes.
In this session, we’ll explain the concepts behind workload analysis and show how it can be used in the real world (and with sample real-world data) to improve database performance and efficiency by identifying key metrics and changes to cyclical patterns.
SkySQL uses best-of-breed software, and when it comes to metrics and monitoring that means Prometheus and Grafana. SkySQL Monitor is built on both, and provides customers with interactive dashboards for both real-time and historic metrics monitoring. In addition, it meets the same high availability and security requirements as other SkySQL components, ensuring metrics are always available and always secure.
In this session, we’ll explain how SkySQL Monitor works, walk through its dashboards and show how to monitor key metrics for performance and replication.
Introducing the R2DBC async Java connector (MariaDB plc)
Not too long ago, a reactive alternative to JDBC was released, known as Reactive Relational Database Connectivity (R2DBC for short). While R2DBC started as an experiment to enable integration of SQL databases into systems that use reactive programming models, it now specifies a full-fledged service-provider interface that can be used to retrieve data from a target data source.
In this session, we’ll take a look at the new MariaDB R2DBC connector and examine the advantages of fully reactive, non-blocking development with MariaDB. And, of course, we’ll dive in and get a first-hand look at what it’s like to use the new connector with some live coding!
The capabilities and features of MariaDB Platform continue to expand, resulting in larger and more sophisticated production deployments – and the need for better tools. To provide DBAs with comprehensive, consolidating tooling, we created MariaDB Enterprise Tools: an easy-to-use, modular command-line interface for interacting with any part of MariaDB Platform.
In this session, we will provide a preview of the MariaDB Enterprise Client, walk through current and planned modules and discuss future plans for MariaDB Enterprise Tools – including SkySQL modules and the ability to create custom modules.
SkySQL implements a groundbreaking, state-of-the-art architecture based on Kubernetes and ServiceNow, and with a strong emphasis on cloud security – using compartmentalization and indirect access to secure and protect customer databases.
In this session, we’ll walk through the architecture of SkySQL and discuss how MariaDB leverages an advanced Kubernetes operator and powerful ServiceNow configuration/workflow management to deploy and manage databases on cloud infrastructure.
What to expect from MariaDB Platform X5, part 1 (MariaDB plc)
MariaDB Platform X5 will be based on MariaDB Enterprise Server 10.5. This release includes Xpand, a fully distributed storage engine for scaling out, as well as many new features and improvements for DBAs and developers alike, including enhancements to temporal tables, additional JSON functions, a new performance schema, non-blocking schema changes with clustering, and a HashiCorp Vault plugin for key management.
In this session, we’ll walk through all of the new features and enhancements available in MariaDB Enterprise Server 10.5. In addition, we will highlight those being backported to maintenance releases of MariaDB Enterprise Server 10.2, 10.3 and 10.4.
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...Fwdays
.NET 8 brought many improvements for developers and added maturity to the Azure serverless container ecosystem. This talk covers these changes and explains how you can apply them to your projects. Another reason for this talk is the reinvention of serverless from a DevOps perspective as a Platform Engineering trend, with Backstage and Microsoft's recent Radius project. Now is the perfect time to look at developer productivity tooling and serverless apps from Microsoft's perspective.
How UiPath Discovery Suite supports identification of Agentic Process Automat... (DianaGray10)
📚 Understand the basics of the new persona-based, LLM-powered Agentic Process Automation (APA) and discover how existing UiPath Discovery Suite products like Communication Mining, Process Mining, and Task Mining can be leveraged to identify APA candidates.
Topics Covered:
💡 Idea Behind APA: Explore the innovative concept of Agentic Process Automation and its significance in modern workflows.
🔄 How APA is Different from RPA: Learn the key differences between Agentic Process Automation and Robotic Process Automation.
🚀 Discover the Advantages of APA: Uncover the unique benefits of implementing APA in your organization.
🔍 Identifying APA Candidates with UiPath Discovery Products: See how UiPath's Communication Mining, Process Mining, and Task Mining tools can help pinpoint potential APA candidates.
🔮 Discussion on Expected Future Impacts: Engage in a discussion on the potential future impacts of APA on various industries and business processes.
Enhance your knowledge on the forefront of automation technology and stay ahead with Agentic Process Automation. 🧠💼✨
Speakers:
Arun Kumar Asokan, Delivery Director (US) @ qBotica and UiPath MVP
Naveen Chatlapalli, Solution Architect @ Ashling Partners and UiPath MVP
Redefining Cybersecurity with AI CapabilitiesPriyanka Aash
In this comprehensive overview of Cisco's latest innovations in cybersecurity, the focus is squarely on resilience and adaptation in the face of evolving threats. The discussion covers the imperative of tackling Mal information, the increasing sophistication of insider attacks, and the expanding attack surfaces in a hybrid work environment. Emphasizing a shift towards integrated platforms over fragmented tools, Cisco introduces its Security Cloud, designed to provide end-to-end visibility and robust protection across user interactions, cloud environments, and breaches. AI emerges as a pivotal tool, from enhancing user experiences to predicting and defending against cyber threats. The blog underscores Cisco's commitment to simplifying security stacks while ensuring efficacy and economic feasibility, making a compelling case for their platform approach in safeguarding digital landscapes.
Demystifying Neural Networks And Building Cybersecurity ApplicationsPriyanka Aash
In today's rapidly evolving technological landscape, Artificial Neural Networks (ANNs) have emerged as a cornerstone of artificial intelligence, revolutionizing various fields including cybersecurity. Inspired by the intricacies of the human brain, ANNs have a rich history and a complex structure that enables them to learn and make decisions. This blog aims to unravel the mysteries of neural networks, explore their mathematical foundations, and demonstrate their practical applications, particularly in building robust malware detection systems using Convolutional Neural Networks (CNNs).
UiPath Community Day Amsterdam: Code, Collaborate, ConnectUiPathCommunity
Welcome to our third live UiPath Community Day Amsterdam! Come join us for a half-day of networking and UiPath Platform deep-dives, for devs and non-devs alike, in the middle of summer ☀.
📕 Agenda:
12:30 Welcome Coffee/Light Lunch ☕
13:00 Event opening speech
Ebert Knol, Managing Partner, Tacstone Technology
Jonathan Smith, UiPath MVP, RPA Lead, Ciphix
Cristina Vidu, Senior Marketing Manager, UiPath Community EMEA
Dion Mes, Principal Sales Engineer, UiPath
13:15 ASML: RPA as Tactical Automation
Tactical robotic process automation for solving short-term challenges, while establishing standard and re-usable interfaces that fit IT's long-term goals and objectives.
Yannic Suurmeijer, System Architect, ASML
13:30 PostNL: an insight into RPA at PostNL
Showcasing the solutions our automations have provided, the challenges we’ve faced, and the best practices we’ve developed to support our logistics operations.
Leonard Renne, RPA Developer, PostNL
13:45 Break (30')
14:15 Breakout Sessions: Round 1
Modern Document Understanding in the cloud platform: AI-driven UiPath Document Understanding
Mike Bos, Senior Automation Developer, Tacstone Technology
Process Orchestration: scale up and have your Robots work in harmony
Jon Smith, UiPath MVP, RPA Lead, Ciphix
UiPath Integration Service: connect applications, leverage prebuilt connectors, and set up customer connectors
Johans Brink, CTO, MvR digital workforce
15:00 Breakout Sessions: Round 2
Automation, and GenAI: practical use cases for value generation
Thomas Janssen, UiPath MVP, Senior Automation Developer, Automation Heroes
Human in the Loop/Action Center
Dion Mes, Principal Sales Engineer @UiPath
Improving development with coded workflows
Idris Janszen, Technical Consultant, Ilionx
15:45 End remarks
16:00 Community fun games, sharing knowledge, drinks, and bites 🍻
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...Zilliz
Enterprises have traditionally prioritized data quantity, assuming more is better for AI performance. However, a new reality is setting in: high-quality data, not just volume, is the key. This shift exposes a critical gap – many organizations struggle to understand their existing data and lack effective curation strategies and tools. This talk dives into these data challenges and explores the methods of automating data curation.
Discovery Series - Zero to Hero - Task Mining Session 1DianaGray10
This session is focused on providing you with an introduction to task mining. We will go over different types of task mining and provide you with a real-world demo on each type of task mining in detail.
The History of Embeddings & Multimodal EmbeddingsZilliz
Frank Liu will walk through the history of embeddings and how we got to the cool embedding models used today. He'll end with a demo on how multimodal RAG is used.
Keynote : Presentation on SASE TechnologyPriyanka Aash
Secure Access Service Edge (SASE) solutions are revolutionizing enterprise networks by integrating SD-WAN with comprehensive security services. Traditionally, enterprises managed multiple point solutions for network and security needs, leading to complexity and resource-intensive operations. SASE, as defined by Gartner, consolidates these functions into a unified cloud-based service, offering SD-WAN capabilities alongside advanced security features like secure web gateways, CASB, and remote browser isolation. This convergence not only simplifies management but also enhances security posture and application performance across global networks and cloud environments. Discover how adopting SASE can streamline operations and fortify your enterprise's digital transformation strategy.
2. Who Am I?
● Andrew Hutchings, aka “LinuxJedi”
● Lead Software Engineer for MariaDB’s ColumnStore
● Previously worked for:
○ NGINX - Senior Developer Advocate / Technical Product Manager
○ HP - Principal Software Engineer (HP Cloud / ATG)
○ SkySQL - Senior Sustaining Engineer
○ Rackspace - Senior Software Engineer
○ Sun/Oracle - MySQL Senior Support Engineer
● Co-author of MySQL 5.1 Plugin Development
● IRC/Twitter: LinuxJedi
● Email: linuxjedi@mariadb.com
3. Overview
● History of MariaDB ColumnStore
● Technical Use Case
● Components of MariaDB ColumnStore
● Disk Storage
● Writing Data
● Querying Data
● Optimizing for MariaDB ColumnStore
● Closing Notes
● Questions
4. History of MariaDB ColumnStore
● March 2010 - Calpont launches InfiniDB
● September 2014 - Calpont (by then itself renamed InfiniDB) closes down
○ MariaDB (then SkySQL) supports InfiniDB customers
● April 2016 - MariaDB announces development of MariaDB ColumnStore
● August 2016 - I joined MariaDB and jumped straight into ColumnStore
● December 2016 - MariaDB ColumnStore 1.0 GA
○ InfiniDB + MariaDB 10.1 + Many fixes and improvements
● November 2017 - MariaDB ColumnStore 1.1 GA
○ MariaDB 10.2 + APIs + Even more improvements
6. Technical Use Case
MariaDB ColumnStore
● Very large data sets
○ Many columns
○ Many millions of rows
● Complex joins and aggregates
● Rapid bulk data insertion
○ The larger the batch the better
Traditional OLTP Engines
● Smaller data sets
● Basic queries
● Lots of DML queries
● Complex data types
7. Data Types
● INT types - range is reduced by 2 at the top (unsigned) or bottom (signed) end
● CHAR† - max 255 bytes
● VARCHAR† - max 8000 bytes
● DECIMAL - max 18 digits
● DOUBLE/FLOAT
● DATETIME - no sub-seconds (coming in 1.2)
● DATE
● BLOB/TEXT†
† Empty string is the same as NULL
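The reduced INT range can be made concrete. A minimal sketch, assuming (as the slide states) that exactly two values per integer type are set aside as internal markers, taken from the top of the unsigned range and from the bottom of the signed range. The function name is hypothetical.

```python
def columnstore_int_range(width_bytes, signed):
    """Usable (min, max) for a ColumnStore integer column of this byte width.

    Assumption: two values are reserved internally, the two highest for
    unsigned types and the two lowest for signed types.
    """
    bits = 8 * width_bytes
    if signed:
        # Plain two's complement is [-2**(bits-1), 2**(bits-1) - 1];
        # the two lowest values are reserved.
        return -(2 ** (bits - 1)) + 2, 2 ** (bits - 1) - 1
    # Plain unsigned is [0, 2**bits - 1]; the two highest values are reserved.
    return 0, 2 ** bits - 1 - 2


for name, width in [("TINYINT", 1), ("SMALLINT", 2), ("INT", 4), ("BIGINT", 8)]:
    print(name, "signed:", columnstore_int_range(width, True),
          "unsigned:", columnstore_int_range(width, False))
```

For example, a signed TINYINT comes out as -126 to 127 and an unsigned one as 0 to 253.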
8. Other DDL Differences
● No indexes
○ Columns are somewhat self-indexing
● Auto increment is handled differently (a table comment)
● No constraints
● PARTITION syntax not supported
○ Columns are partitioned automatically
9. Row-oriented vs. Column-oriented Format
Row-oriented:
ID  Fname     Lname  State  Zip    Phone           Age  Sex
1   Bugs      Bunny  NY     11217  (718) 938-3235  34   M
2   Yosemite  Sam    CA     95389  (209) 375-6572  52   M
3   Daffy     Duck   NY     10013  (212) 227-1810  35   M
4   Elmer     Fudd   ME     04578  (207) 882-7323  43   M
5   Witch     Hazel  MA     01970  (978) 744-0991  57   F
Column-oriented (each column stored separately):
ID:    1 | 2 | 3 | 4 | 5
Fname: Bugs | Yosemite | Daffy | Elmer | Witch
Lname: Bunny | Sam | Duck | Fudd | Hazel
State: NY | CA | NY | ME | MA
Zip:   11217 | 95389 | 10013 | 04578 | 01970
Phone: (718) 938-3235 | (209) 375-6572 | (212) 227-1810 | (207) 882-7323 | (978) 744-0991
Age:   34 | 52 | 35 | 43 | 57
Sex:   M | M | M | M | F
SELECT Fname FROM People WHERE State = 'NY'
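The benefit of the column layout for this query can be sketched in a few lines. This is a toy in-memory model, not ColumnStore code (the function name is hypothetical): the People table is held column by column, and evaluating the query reads only the State column (to filter) and the Fname column (to project), leaving the other six columns untouched.

```python
# The sample People table, stored column by column.
people_columns = {
    "ID": [1, 2, 3, 4, 5],
    "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
    "Lname": ["Bunny", "Sam", "Duck", "Fudd", "Hazel"],
    "State": ["NY", "CA", "NY", "ME", "MA"],
    "Zip": ["11217", "95389", "10013", "04578", "01970"],
    "Phone": ["(718) 938-3235", "(209) 375-6572", "(212) 227-1810",
              "(207) 882-7323", "(978) 744-0991"],
    "Age": [34, 52, 35, 43, 57],
    "Sex": ["M", "M", "M", "M", "F"],
}


def select_where_eq(columns, project, filter_col, value):
    """Evaluate SELECT <project> FROM t WHERE <filter_col> = value.

    Returns the result values and the set of columns that were read.
    """
    touched = {filter_col}
    # Scan only the filter column to find matching row positions.
    positions = [i for i, v in enumerate(columns[filter_col]) if v == value]
    # Then fetch the projected column at those positions only.
    touched.add(project)
    result = [columns[project][i] for i in positions]
    return result, touched


names, touched = select_where_eq(people_columns, "Fname", "State", "NY")
print(names, sorted(touched))  # ['Bugs', 'Daffy'] ['Fname', 'State']
```

A row store would have to read every full row (all eight fields) to answer the same query.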
12. Query Processing
[Diagram: Shared Nothing Distributed Data Storage]
● SQL enters the User Module (UM)
● The UM sends column primitives down to the Performance Modules (PM)
● Each PM works on its own data and returns intermediate results up to the UM
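The UM/PM split is a scatter-gather pattern, which can be sketched as follows. This is a toy model under stated assumptions, not ColumnStore's implementation: the function names are hypothetical, each PM owns a disjoint slice of rows (shared nothing), runs a filter-plus-partial-aggregate primitive locally, and the UM merges the intermediate results. The rows are a few sample orders.

```python
from collections import Counter


def pm_primitive(rows, predicate, group_col, sum_col):
    """Runs on one PM: filter its local rows and pre-aggregate them."""
    partial = Counter()
    for row in rows:
        if predicate(row):
            partial[row[group_col]] += row[sum_col]
    return partial


def um_query(pm_slices, predicate, group_col, sum_col):
    """Runs on the UM: send the primitive to each PM and merge the results."""
    merged = Counter()
    for rows in pm_slices:  # a real cluster would run these in parallel
        merged.update(pm_primitive(rows, predicate, group_col, sum_col))
    return dict(merged)


# Each PM owns a disjoint slice of the rows (shared nothing).
pm1 = [{"Item": "Laptop", "Quantity": 5, "ShipDate": "2016-01-12"},
       {"Item": "Monitor", "Quantity": 5, "ShipDate": "2016-01-13"}]
pm2 = [{"Item": "Mouse", "Quantity": 1, "ShipDate": "2016-02-05"},
       {"Item": "Laptop", "Quantity": 3, "ShipDate": "2016-01-31"}]


def in_january(row):
    # ISO date strings compare correctly as plain strings.
    return "2016-01-01" <= row["ShipDate"] <= "2016-01-31"


result = um_query([pm1, pm2], in_january, "Item", "Quantity")
print(result)  # {'Laptop': 8, 'Monitor': 5}
```

Only the small per-PM partial aggregates travel back to the UM, not the raw rows.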
13. Hardware Requirements
● Lots of RAM
○ minimum 32GB for UM, 16GB for PM
○ minimum 4GB for trying single server out on a VM
● Optimised for HDD spindles, but will still work with SSDs
○ We are looking into SSD optimisation soon
● More cores typically better
○ 8 core minimum recommendation
● For AWS, m4.4xlarge is the recommended minimum
15. Column Types
● Fixed-width columns stored in 8KB blocks:
○ 1-byte field with 8192 values per 8KB block
○ 2-byte field with 4096 values per 8KB block
○ 4-byte field with 2048 values per 8KB block
○ 8-byte field with 1024 values per 8KB block
● Dictionary structure made up of 2 files/extents with:
○ An 8-byte fixed-length token (pointer)
○ A variable-length value stored at the location identified by the pointer
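Both storage shapes can be sketched briefly. The values-per-block figures follow directly from dividing the 8KB block by the field width. The dictionary part is a toy model with hypothetical names and layout, not ColumnStore's on-disk format: a fixed-width token column (one token per row, standing in for the 8-byte pointer) plus a separate heap holding length-prefixed variable-length values.

```python
BLOCK_SIZE = 8192  # bytes per block (8KB)


def values_per_block(width_bytes):
    """Number of fixed-width values that fit in one 8KB block."""
    return BLOCK_SIZE // width_bytes


class DictionaryColumn:
    """Toy dictionary column: a token column plus a separate value heap.

    Hypothetical layout: each token is simply the byte offset of the value
    in `heap`, and each value is stored with a 2-byte length prefix.
    """

    def __init__(self):
        self.tokens = []          # fixed-width part: one token per row
        self.heap = bytearray()   # variable-length part

    def append(self, value):
        data = value.encode("utf-8")
        self.tokens.append(len(self.heap))  # token points at the value
        self.heap += len(data).to_bytes(2, "little") + data

    def get(self, row):
        offset = self.tokens[row]
        length = int.from_bytes(self.heap[offset:offset + 2], "little")
        return self.heap[offset + 2:offset + 2 + length].decode("utf-8")


print([values_per_block(w) for w in (1, 2, 4, 8)])  # [8192, 4096, 2048, 1024]

phones = DictionaryColumn()
for p in ["(718) 938-3235", "(209) 375-6572"]:
    phones.append(p)
print(phones.tokens)   # [0, 16]
print(phones.get(1))   # (209) 375-6572
```

Reading a dictionary value costs two lookups (token, then heap), which is the overhead the later tuning advice suggests avoiding where a fixed-width type will do.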
16. Extent Map
● Object ID - The ID for the column (or dictionary)
● Object Type - Column or Dictionary
● LBID Start / End - Start / End Logical Block Pointer
● Minimum Value - Lowest value in the extent
● Maximum Value - Highest value in the extent
● Width - Column Width
● DBRoot - DBRoot (disk partition) number
● Partition ID / Segment ID / Block Offset - The extent number
● High Water Mark - Atomic last block pointer
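An extent map entry can be modelled as a small record with the fields listed above. This is a minimal sketch with hypothetical names and sample values. It assumes an extent holds 8M (8 × 1024 × 1024) rows, so the number of 8KB blocks per extent scales with column width, and it treats the HWM as the offset of the last block written within the extent (an assumption; the Partition ID / Segment ID / Block Offset fields are omitted).

```python
from dataclasses import dataclass

BLOCK_SIZE = 8192                  # 8KB blocks
ROWS_PER_EXTENT = 8 * 1024 * 1024  # assumed 8M rows per extent


def blocks_per_extent(width):
    """8KB blocks needed to hold one extent of a column of this byte width."""
    return ROWS_PER_EXTENT * width // BLOCK_SIZE


@dataclass
class ExtentMapEntry:
    object_id: int     # column (or dictionary) ID
    object_type: str   # "column" or "dictionary"
    lbid_start: int    # first logical block of the extent
    lbid_end: int      # last logical block of the extent
    min_value: object  # lowest value in the extent
    max_value: object  # highest value in the extent
    width: int         # column width in bytes
    dbroot: int        # DBRoot (disk partition) number
    hwm: int           # assumed: offset of last block written in the extent


def find_extent(extent_map, lbid):
    """Return the entry whose LBID range contains `lbid`, or None."""
    for entry in extent_map:
        if entry.lbid_start <= lbid <= entry.lbid_end:
            return entry
    return None


n = blocks_per_extent(4)  # a 4-byte INT column needs 4096 blocks per extent
emap = [
    ExtentMapEntry(3001, "column", 0, n - 1, 1, 8_388_608, 4, 1, n - 1),
    ExtentMapEntry(3001, "column", n, 2 * n - 1, 8_388_609, 9_000_000, 4, 1, 500),
]
print(find_extent(emap, 5000).min_value)  # 8388609
```

Under the same assumption, a 1-byte column needs 1024 blocks (8MB) per extent and an 8-byte column 8192 blocks (64MB).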
19. Inserting Data
● Multiple methods
○ Single INSERTs
○ INSERT...SELECT
○ LOAD DATA INFILE
○ cpimport
○ Bulk Write API
● Designed for large bulk inserts
● Inserts are appended at the end of extents (or new extents created)
○ This means reads are not affected
○ A High Water Mark pointing to the last block is moved at the end of the insert
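The append-then-move-HWM pattern can be sketched as a toy model (hypothetical class name, not ColumnStore code). For simplicity the HWM here is a count of visible blocks rather than a pointer to the last block. New blocks are written past the HWM, so readers keep seeing the previously committed data until a single pointer update publishes the whole insert.

```python
class AppendOnlyColumn:
    """Toy column file that only grows, guarded by a high water mark."""

    def __init__(self):
        self.blocks = []  # everything physically written
        self.hwm = 0      # readers see blocks[:hwm] only

    def read(self):
        return self.blocks[: self.hwm]

    def append_blocks(self, new_blocks):
        # New data lands after the HWM, so concurrent readers never see it.
        self.blocks.extend(new_blocks)

    def commit(self):
        # One pointer update publishes the whole insert at once.
        self.hwm = len(self.blocks)


col = AppendOnlyColumn()
col.append_blocks(["b0", "b1"])
col.commit()
col.append_blocks(["b2"])  # insert in progress
print(col.read())          # ['b0', 'b1']
col.commit()
print(col.read())          # ['b0', 'b1', 'b2']
```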
20. cpimport
● Uses CSV files or piped CSV data
● Fastest way to get data into ColumnStore
● Does minimal data conversion and pipes it straight into the PMs
○ Works by appending new blocks to the table and moving an atomic block pointer (HWM)
○ No UNDO log needed (atomic pointer not moved on rollback)
○ Therefore can cause a gap of 0-64KB in a column
● Can load multiple tables simultaneously
● Can load into multiple PMs for the same table simultaneously
● Can load into specific PMs for physical partitioning by PM
21. Bulk Write API
● A simple C++ API to inject data into the PMs
○ Bindings in Python and Java available
● Works in a similar way to cpimport
○ Appends new blocks and moves an atomic block pointer (HWM)
● LGPL licensed
22. DML Writes
● Regular INSERT / UPDATE / DELETE
○ Also INSERT...SELECT and LOAD DATA INFILE when autocommit is off
● Slow compared to other engines
○ INSERT is very slow compared to cpimport
● Requires the use of a version buffer for an undo log
○ But INSERT appends to data blocks so no wasted storage
● Data sent to DMLProc to process
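The version buffer's undo role can be sketched with a toy model (hypothetical names; it shows only the undo behaviour the slide mentions, with each "block" reduced to a single value). Before a block is modified in place, its pre-image is copied into the version buffer; rollback restores those copies and commit discards them. That extra copy per touched block is one plausible reason DML is slower than the append-only bulk path.

```python
class VersionBufferedColumn:
    """Toy in-place DML with a version buffer acting as the undo log."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.version_buffer = {}  # block index -> contents before the txn

    def update_block(self, index, new_contents):
        # Save the pre-image only the first time a block is touched.
        self.version_buffer.setdefault(index, self.blocks[index])
        self.blocks[index] = new_contents

    def commit(self):
        self.version_buffer.clear()

    def rollback(self):
        for index, original in self.version_buffer.items():
            self.blocks[index] = original
        self.version_buffer.clear()


col = VersionBufferedColumn(["NY", "CA", "NY"])
col.update_block(1, "WA")
col.update_block(1, "OR")
col.rollback()
print(col.blocks)  # ['NY', 'CA', 'NY']
```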
23. A Note About DELETE
● Need to touch every column and the undo log
○ So very slow
● Also leaves a gap in the column that won’t be filled
● Marking rows as deleted in a flag column with an UPDATE query is faster
● Dropping entire partitions is instantaneous
○ Partitions can be disabled first
24. INSERT...SELECT / LOAD DATA INFILE
● Injects the binary row data from MariaDB into cpimport
● Good for backwards compatibility with tools and remote loading
● cpimport then injects this data into the column extent files
○ In 1.2 it will use the write API instead
● If autocommit is turned off this will behave like regular DML instead (slow)
27. Extent Elimination
Storage Architecture reduces I/O
● Only touch column files that are in filter, projection, group by, and join conditions
● Eliminate disk block touches to partitions outside filter and join conditions
[Diagram: the Orders table split into three horizontal partitions (extents) of 8 million rows each]
Extent 1: ShipDate: 2016-01-12 - 2016-03-05
Extent 2: ShipDate: 2016-03-05 - 2016-09-23
Extent 3: ShipDate: 2016-09-24 - 2017-01-06
SELECT Item, sum(Quantity) FROM Orders
WHERE ShipDate between '2016-01-01' and '2016-01-31'
GROUP BY Item
Id OrderId Line Item Quantity Price Supplier ShipDate ShipMode
1 1 1 Laptop 5 1000 Dell 2016-01-12 G
2 1 2 Monitor 5 200 LG 2016-01-13 G
3 2 1 Mouse 1 20 Logitech 2016-02-05 M
4 3 1 Laptop 3 1600 Apple 2016-01-31 P
... ... ... ... ... ... ... ... ...
8M 2016-03-05
8M+1 2016-03-05
... ... ... ... ... ... ... ... ...
16M 2016-09-23
16M+1 2016-09-24
... ... ... ... ... ... ... ... ...
24M 2017-01-06
Extent 2: ELIMINATED PARTITION
Extent 3: ELIMINATED PARTITION
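The elimination decision for this query can be reproduced from the min/max ShipDate values in the extent map. A minimal sketch (hypothetical function name): an extent must be scanned only if its [min, max] range overlaps the BETWEEN range; every other extent is skipped without reading any of its blocks.

```python
from datetime import date

# Min/max ShipDate per extent, as listed for the Orders table.
extents = {
    1: (date(2016, 1, 12), date(2016, 3, 5)),
    2: (date(2016, 3, 5), date(2016, 9, 23)),
    3: (date(2016, 9, 24), date(2017, 1, 6)),
}


def extents_to_scan(extent_map, lo, hi):
    """Return IDs of extents whose [min, max] range overlaps [lo, hi]."""
    return [eid for eid, (mn, mx) in extent_map.items()
            if mx >= lo and mn <= hi]


scan = extents_to_scan(extents, date(2016, 1, 1), date(2016, 1, 31))
print(scan)  # [1]
```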
28. Query Analysis
MariaDB [tpch1]> select calsettrace(1);
...
MariaDB [tpch1]> select c_count, count(*) as custdist
-> from ( select c_custkey, count(o_orderkey) c_count
-> from v_customer left outer join v_orders on c_custkey = o_custkey
-> and o_comment not like '%special%requests%'
-> group by c_custkey ) c_orders
-> group by c_count
-> order by custdist desc, c_count desc;
...
42 rows in set, 1 warning (9.07 sec)
MariaDB [tpch1]> select calgetstats()\G
*************************** 1. row ***************************
calgetstats(): Query Stats: MaxMemPct-4; NumTempFiles-0; TempFileSpace-0B; ApproxPhyI/O-0; CacheI/O-12503;
BlocksTouched-12503; PartitionBlocksEliminated-812; MsgBytesIn-102MB; MsgBytesOut-3KB; Mode-Distributed
1 row in set (0.00 sec)
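When collecting these numbers across many runs it can help to turn the calgetstats() text into a dictionary. A minimal sketch (hypothetical helper), assuming only the "Query Stats:" prefix and the "Name-Value;" pairs visible in the output above:

```python
def parse_calgetstats(text):
    """Split a calgetstats() result string into a {name: value} dict."""
    body = text.split("Query Stats:", 1)[1]
    stats = {}
    for item in body.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("-")  # names contain no '-'
        stats[name] = value
    return stats


SAMPLE = ("calgetstats(): Query Stats: MaxMemPct-4; NumTempFiles-0; "
          "TempFileSpace-0B; ApproxPhyI/O-0; CacheI/O-12503; "
          "BlocksTouched-12503; PartitionBlocksEliminated-812; "
          "MsgBytesIn-102MB; MsgBytesOut-3KB; Mode-Distributed")

stats = parse_calgetstats(SAMPLE)
print(stats["BlocksTouched"], stats["PartitionBlocksEliminated"])  # 12503 812
```

Values are kept as strings because some carry units (0B, 102MB, 3KB).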
29. Query Analysis
MariaDB [tpch1]> select calgettrace()\G
*************************** 1. row ***************************
calgettrace():
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
BPS PM customer 7254 (c_custkey) 0 75 0 0.032 150000
TNS UM - - - - - - 0.045 150000
BPS PM customer 7254 (c_custkey) 0 0 75 0.000 0
TNS UM - - - - - - 0.000 0
TUS UM - - - - - - 0.303 150000
BPS PM orders 7268 (o_comment,o_custkey,o_orderkey) 0 12428 0 2.293 1500000
TNS UM - - - - - - 2.967 1500000
BPS PM orders 7268 (o_comment,o_custkey,o_orderkey) 0 0 737 0.000 0
TNS UM - - - - - - 0.000 0
TUS UM - - - - - - 3.796 1500000
HJS UM v_customer-v_orders - - - - - ----- -
TAS UM - - - - - - 1.658 150000
TNS UM - - - - - - 0.044 150000
TAS UM - - - - - - 0.050 42
1 row in set (0.01 sec)
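The trace and the stats are consistent: summing the LIO and PBE columns of the BPS rows (the steps that run on the PMs) gives exactly the BlocksTouched (12503) and PartitionBlocksEliminated (812) figures reported by calgetstats(). A minimal sketch (hypothetical helper) that checks this, assuming the column order of the trace header above:

```python
COLUMNS = ["Desc", "Mode", "Table", "TableOID", "ReferencedColumns",
           "PIO", "LIO", "PBE", "Elapsed", "Rows"]

TRACE = """\
BPS PM customer 7254 (c_custkey) 0 75 0 0.032 150000
TNS UM - - - - - - 0.045 150000
BPS PM customer 7254 (c_custkey) 0 0 75 0.000 0
TNS UM - - - - - - 0.000 0
TUS UM - - - - - - 0.303 150000
BPS PM orders 7268 (o_comment,o_custkey,o_orderkey) 0 12428 0 2.293 1500000
TNS UM - - - - - - 2.967 1500000
BPS PM orders 7268 (o_comment,o_custkey,o_orderkey) 0 0 737 0.000 0
TNS UM - - - - - - 0.000 0
TUS UM - - - - - - 3.796 1500000
HJS UM v_customer-v_orders - - - - - ----- -
TAS UM - - - - - - 1.658 150000
TNS UM - - - - - - 0.044 150000
TAS UM - - - - - - 0.050 42
"""


def scan_steps(trace):
    """Return the BPS rows of a calgettrace() table as dicts."""
    steps = []
    for line in trace.splitlines():
        fields = line.split()
        if fields and fields[0] == "BPS":
            steps.append(dict(zip(COLUMNS, fields)))
    return steps


steps = scan_steps(TRACE)
lio = sum(int(s["LIO"]) for s in steps)
pbe = sum(int(s["PBE"]) for s in steps)
print(lio, pbe)  # 12503 812
```

Most of the eliminated blocks (737) come from the orders table, which is also where almost all of the logical I/O happens.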
30. Cross Engine Joins
● Allows non-ColumnStore tables to join with ColumnStore
● The whole query is processed by ColumnStore
● Cross Engine makes new MariaDB connections to retrieve data from non-ColumnStore tables
[Diagram: the original query passes from MariaDB Server to ExeMgr; ExeMgr sends a non-ColumnStore query (Cross Engine) back to MariaDB Server]
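The flow can be sketched as a toy join. Everything here is hypothetical: `fetch_supplier_ratings` stands in for the rows Cross Engine would fetch over its extra MariaDB connection from a non-ColumnStore table, the ratings are made-up sample data, and the join is modelled as a simple in-memory hash join, which is an assumption about the strategy rather than a description of ExeMgr internals.

```python
def fetch_supplier_ratings():
    """Hypothetical stand-in for Cross Engine's extra MariaDB connection.

    Returns (supplier, rating) rows from a non-ColumnStore table.
    """
    return [("Dell", 4), ("LG", 5), ("Logitech", 4), ("Apple", 5)]


def cross_engine_join(columnstore_rows, fetch_other):
    """Inner-join ColumnStore (item, supplier) rows with another engine's rows."""
    # Build side: the (usually small) non-ColumnStore table.
    rating_by_supplier = dict(fetch_other())
    # Probe side: stream the ColumnStore rows through the hash table.
    return [(item, supplier, rating_by_supplier[supplier])
            for item, supplier in columnstore_rows
            if supplier in rating_by_supplier]


orders = [("Laptop", "Dell"), ("Monitor", "LG"), ("Mouse", "Logitech"),
          ("Laptop", "Apple"), ("Desk", "Ikea")]
joined = cross_engine_join(orders, fetch_supplier_ratings)
print(joined)
```

The row whose supplier has no match ("Ikea") is dropped, as in any inner join.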
32. Data Modeling
● Star-schema optimizations are generally a good idea
● Conservative data typing is very important
○ Especially around fixed-length vs. dictionary boundary (8 bytes)
○ IP Address vs. IP Number
● Break down compound fields into individual fields:
○ Trivializes searching for sub-fields
○ Can avoid dictionary overhead
○ Cost to re-assemble is generally small
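Both data-modeling tips can be illustrated with small helpers (hypothetical names). A dotted IPv4 string can be up to 15 bytes, past the 8-byte fixed-length boundary, whereas the same address as a number fits a fixed-width integer column. One caveat, if the reduced INT range from the Data Types slide applies: an unsigned INT would top out at 4294967293 while IPv4 numbers reach 4294967295, so a BIGINT would be needed to hold every address. The phone helpers split the compound "(718) 938-3235" format from the sample table into separate numeric fields and re-assemble it cheaply with zero padding.

```python
import ipaddress
import re


def ip_to_number(ip):
    """'192.168.0.1' -> 3232235521, storable in a fixed-width integer column."""
    return int(ipaddress.IPv4Address(ip))


def split_phone(phone):
    """Split a compound '(718) 938-3235' value into separately searchable parts."""
    m = re.fullmatch(r"\((\d{3})\) (\d{3})-(\d{4})", phone)
    if m is None:
        raise ValueError(f"unexpected phone format: {phone!r}")
    area, exchange, line = (int(g) for g in m.groups())
    return {"area_code": area, "exchange": exchange, "line": line}


def join_phone(parts):
    """Re-assemble the original format; zero padding restores e.g. '0991'."""
    return (f"({parts['area_code']:03d}) "
            f"{parts['exchange']:03d}-{parts['line']:04d}")


print(ip_to_number("192.168.0.1"))           # 3232235521
print(split_phone("(718) 938-3235"))         # {'area_code': 718, 'exchange': 938, 'line': 3235}
print(join_phone(split_phone("(978) 744-0991")))  # (978) 744-0991
```

With the area code in its own column, a filter such as "all 718 numbers" becomes an integer comparison that can also benefit from extent elimination.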
33. Data Insertion
● Order data as best you can before inserting
○ Helps extent elimination when min/max range for an extent is small
● Insert in large batches using cpimport or bulk write API
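Why pre-sorting helps can be shown with a small simulation (hypothetical names; 1000 rows per extent stands in for the real 8M so it runs quickly). The same day-of-year values are loaded once in sorted order and once shuffled; with sorted data each extent's min/max range is narrow, so a one-month filter overlaps only a few extents, while with shuffled data almost every extent spans the whole year and none can be eliminated.

```python
import random


def extent_ranges(values, rows_per_extent):
    """(min, max) for each consecutive chunk of rows, like extent map entries."""
    return [(min(values[i:i + rows_per_extent]), max(values[i:i + rows_per_extent]))
            for i in range(0, len(values), rows_per_extent)]


def extents_scanned(ranges, lo, hi):
    """Count extents whose min/max range overlaps [lo, hi]."""
    return sum(1 for mn, mx in ranges if mx >= lo and mn <= hi)


ROWS_PER_EXTENT = 1000  # stand-in for the real 8M rows per extent

day_of_year = [d for d in range(365) for _ in range(100)]  # 36,500 sorted rows
shuffled = day_of_year[:]
random.Random(42).shuffle(shuffled)

sorted_hits = extents_scanned(extent_ranges(day_of_year, ROWS_PER_EXTENT), 0, 30)
unsorted_hits = extents_scanned(extent_ranges(shuffled, ROWS_PER_EXTENT), 0, 30)
print("sorted:", sorted_hits)  # sorted: 4
print("unsorted:", unsorted_hits, "of", len(extent_ranges(shuffled, ROWS_PER_EXTENT)))
```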
34. Improving Your Queries
● Avoid filtering on VARCHAR/CHAR columns of 8 bytes or more where possible
○ Two extents need to be read per column, no extent elimination
● Use extent map elimination where possible
● Don’t use a function to filter
○ Extent elimination won’t happen
● Only reference required columns, avoid “SELECT *”
● Use the smallest possible data type for your data
● Avoid large ORDER BY
● Read https://mariadb.com/kb/en/mariadb/columnstore-performance-tuning/
35. Tuning
● Generally self-tuning
○ Uses as much RAM as possible automatically
○ Uses all CPU cores
● More RAM in PMs = more LRU data cache
● More RAM in UMs = ability to process aggregates / joins on bigger data sets
○ Disk joins are possible
37. MariaDB ColumnStore 1.2 (later in 2018)
● MariaDB 10.3 base
● TIME datatype
● Microsecond support
● Improvements to LOAD DATA INFILE and INSERT...SELECT
● Phase 1 of MariaDB ColumnStore Storage Engine Convergence project
● Many other cool things