This document discusses open source relational databases. It begins by introducing the presenter and topic, which is the current state of components in open source SQL databases. It then covers key components such as the storage engine, query planner, protocols, transaction model, and others. For each component, it discusses the approaches taken by different databases like PostgreSQL, MySQL, CockroachDB, and ClickHouse. It also addresses topics like horizontal scalability and replication strategies. Overall, the document provides a detailed overview and comparison of the architectural components and capabilities across major open source relational database management systems.
Spotify: Horizontal Scalability for Great Success - Nick Barkas
The document discusses Spotify's use of horizontal scalability to handle its large user and music catalog sizes. It describes how Spotify scales out by distributing work across separate services and handling shared data through techniques like sharding and eventual consistency. Key approaches Spotify uses include running multiple instances of each service, using load balancers to distribute requests, storing only necessary data in globally consistent databases, and implementing distributed hash tables for service discovery.
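The distributed hash tables mentioned above map keys (such as user IDs) onto service instances so that requests and data spread evenly and remain stable as instances come and go. As a minimal illustration, here is a consistent-hash ring sketched in Python; the node names and virtual-node count are hypothetical, and this is not Spotify's actual implementation:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: each key is owned by the first
    node hash at or after the key's own hash, wrapping around the ring.
    Adding or removing a node only remaps the keys adjacent to it."""

    def __init__(self, nodes, vnodes=64):
        self._hashes = []          # sorted virtual-node hashes
        self._owner = {}           # virtual-node hash -> physical node
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#{i}")
                bisect.insort(self._hashes, h)
                self._owner[h] = node

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key):
        # first virtual node clockwise from the key's hash (with wrap-around)
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owner[self._hashes[idx]]

ring = HashRing(["storage-1", "storage-2", "storage-3"])
node = ring.lookup("user:4711")   # same key always maps to the same node
```

Virtual nodes smooth out the key distribution; with plain modulo hashing, adding one instance would remap almost every key.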
Serialization and performance by Sergey Morenets - Alex Tumanoff
The document discusses serialization frameworks in Java and compares their performance. It provides an overview of popular serialization frameworks like Java serialization, Kryo, Protocol Buffers, Jackson, Google GSON, and others. Benchmark tests were conducted to compare the frameworks' speed of serialization and deserialization, as well as the size of serialized objects. Kryo with optimizations was generally the fastest, while Protocol Buffers was very fast for simple objects. The document concludes with recommendations on when to use different frameworks.
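The benchmarks described measure two things per framework: serialized size and round-trip speed. As a rough sketch of that methodology, here is a Python harness using the stdlib `pickle` and `json` modules as stand-ins for the Java frameworks (Kryo, Protocol Buffers, etc.); the record shape and round counts are arbitrary:

```python
import json
import pickle
import time

record = {"id": 42, "name": "Alice", "tags": ["a", "b", "c"], "active": True}

def bench(dumps, loads, rounds=1000):
    """Return (serialized size in bytes, seconds for `rounds` round-trips)."""
    blob = dumps(record)
    assert loads(blob) == record          # round-trip must be lossless
    start = time.perf_counter()
    for _ in range(rounds):
        loads(dumps(record))
    return len(blob), time.perf_counter() - start

pickle_size, pickle_time = bench(pickle.dumps, pickle.loads)
json_size, json_time = bench(
    lambda o: json.dumps(o).encode(), lambda b: json.loads(b.decode())
)
```

Measuring size and speed separately matters because the two often trade off: compact binary formats tend to win on size, while speed depends heavily on how much reflection or schema work each encode/decode incurs.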
Ballerina is a new programming language that is designed and optimized for integration. Ballerina revolutionizes the way you model integration scenarios with its graphical and textual syntax, which is built on top of the sequence diagram metaphor. It is fully container native and 100% open source technology.
This document discusses combining Apache Hive, HBase, Phoenix, and Calcite to build a single data store that can be used for both analytics and transaction processing. It describes improvements being made to Hive such as LLAP to enable sub-second query performance, and using HBase as the Hive metastore for better performance and scalability. It also discusses improvements to Phoenix such as adding more analytics functions and transactions. The document proposes sharing components between Hive and Phoenix such as a single JDBC driver, SQL dialect, and analytics functions to allow transactional and analytical data to be accessed together with a single SQL interface.
An overview of all the different content related technologies at the Apache Software Foundation
Talk from ApacheCon NA 2010 in Atlanta in November 2010
Pulsar Summit Asia - Structured Data Stream with Apache Pulsar - Shivji Kumar Jha
This document discusses Apache Pulsar schemas. It begins with background on Pulsar, serialization, and schema evolution. It then discusses the benefits of using schemas with Pulsar, including different schema types like primitive, JSON, and Avro schemas. It describes how Pulsar uses a schema registry to store schemas on the server side rather than client side. Key learnings are to use structured schemas like Avro to model domain objects, consider compatibility and ordering when designing topics, and manage schemas through a code review process. The document provides references for further reading on Pulsar schemas and schema evolution.
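A central point above is schema compatibility during evolution: a consumer on a newer schema must still decode records written with the old one. The following is a conceptual sketch of a backward-compatibility rule in the spirit of Avro's (new fields need defaults; shared fields must keep their type), not the actual Pulsar schema-registry API; the schema encoding is invented for illustration:

```python
def backward_compatible(old_schema, new_schema):
    """Can a reader using new_schema decode records written with old_schema?
    Schemas are modeled as {field_name: (type_name, default_or_None)}."""
    for name, (ftype, default) in new_schema.items():
        if name not in old_schema:
            if default is None:
                return False   # new field without a default: old records lack it
        elif old_schema[name][0] != ftype:
            return False       # type change: old encoding is unreadable
    return True

v1 = {"id": ("long", None), "name": ("string", None)}
v2_ok = dict(v1, email=("string", ""))     # added field carries a default
v2_bad = dict(v1, email=("string", None))  # added required field: breaks readers
```

This is why the deck's advice to manage schemas through code review pays off: an incompatible change like `v2_bad` is cheap to catch in review and expensive to catch in production.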
Providing true interactive and scalable BI on Hadoop has proven to be one of the biggest challenges preventing legacy EDW OLAP systems from completing their transition to Hadoop. While we have all seen many benchmarks that run consecutive queries and claim success, having thousands of concurrent business users sending complicated generated queries from their dashboards over billions of records while delivering interactive speed is yet to be seen.
In this session we will discuss how an architecture that replaces full-scan brute-force approach with adaptive indexing and auto-generated cubes can dramatically reduce the resources and effort per query, resulting in interactive performance for high concurrency workloads and explain how this is achieved with minimum data engineering efforts. We will also discuss how this architecture can be seamlessly integrated with Hive to provide a complete OLAP-on-Hadoop solution.
Session will include live demo of complex business dashboards connected to Hive and accessing billions of rows at interactive speed.
Speaker
Boaz Raufman, CTO and Co-Founder, JethroData
Having used Apache Pulsar in production for a year for pub/sub use cases such as stream analytics and event sourcing, this slide deck presents the lessons learned: understanding the architecture, tuning the cluster, keeping it highly available and fault tolerant, and much more.
While the slides are presented in terms of Apache Pulsar, many of the concepts extend readily to other distributed systems.
The views here are my own and do not represent the views of Nutanix Corporation.
This document provides an introduction to HBase, including its definition, storage model, use cases, and basic data access. HBase is a distributed, scalable NoSQL database built on Hadoop that allows for high-performance read/write operations on large datasets. It provides a distributed, multidimensional sorted map and supports operations like get, scan, put, and delete. The document demonstrates how to access HBase using its Java API for DDL and DML operations like creating/altering tables, putting/getting/scanning data. It also discusses how HBase is used at scale by Facebook for messaging and insights data.
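The "distributed, multidimensional sorted map" model and the four basic operations can be illustrated with a toy in-memory table. This is a Python sketch of the data model only, not the HBase Java API the document demonstrates; real HBase also versions each cell by timestamp and splits sorted row ranges (regions) across servers:

```python
class MiniTable:
    """Toy model of HBase's sorted map: row key -> {column: value},
    with the four basic operations put, get, scan, and delete."""

    def __init__(self):
        self._rows = {}

    def put(self, row, column, value):
        self._rows.setdefault(row, {})[column] = value

    def get(self, row):
        return self._rows.get(row, {})

    def delete(self, row):
        self._rows.pop(row, None)

    def scan(self, start, stop):
        # rows come back in sorted row-key order, like an HBase scan
        for row in sorted(self._rows):
            if start <= row < stop:
                yield row, self._rows[row]

t = MiniTable()
t.put("user#001", "info:name", "Alice")
t.put("user#002", "info:name", "Bob")
rows = list(t.scan("user#000", "user#999"))
```

The sorted-by-row-key property is what makes range scans cheap, and it is also why row-key design (covered later in this listing) matters so much.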
Serialization is the process of converting an object into a byte stream to store or transmit the object. The document discusses three serialization frameworks: Avro, MessagePack, and Kryo. Avro uses a JSON-defined schema and was created by the creator of Hadoop. MessagePack supports rich data structures like JSON and has interfaces for RPC. Kryo makes serialization easy by registering serializers per class and supports compression.
RR Embedded Systems - Linux system programming and kernel internals - Shailaja Gadagoju
Embedded systems training in Hyderabad with Linux system programming and kernel internals at RR Embedded Trainings, Hyderabad, which also provides embedded placements in India. RR EmbedLabs is well known as the best embedded systems training in India.
This document provides an overview of large scale data ingestion using Apache Flume. It discusses why event streaming with Flume is useful, including its scalability, event routing capabilities, and declarative configuration. It also covers Flume concepts like sources, channels, sinks, and how they connect agents together reliably in a topology. The document dives into specific source, channel, and sink types including examples and configuration details. It also discusses interceptors, channel selectors, sink processors, and ways to integrate Flume into applications using client SDKs and embedded agents.
This document discusses how coordinating the many tools of big data has become more complex with the rise of cloud computing and large datasets. It argues that while having many tools provides flexibility, it also leads to inefficiencies as tools do not integrate well and developers end up duplicating work. The document proposes that Hadoop can help address these issues by providing shared services that tools can leverage, such as common table management, metadata access, and a new execution engine called Tez that allows for more efficient pipelining of jobs compared to the traditional MapReduce approach. Coordinating tools through shared services allows users to focus on selecting the right tool for a task while reducing redundant development work.
Kerberos is the system which underpins the vast majority of strong authentication across the Apache HBase/Hadoop application stack. Kerberos errors have brought many to their knees and it is often referred to as “black magic” or “the dark arts”; a long-standing joke that there are so few who understand how it works. This talk will cover the types of problems that Kerberos solves and doesn’t solve for HBase, decrypt some jargon on related libraries and technology that enable Kerberos authentication in HBase and Hadoop, and distill some basic takeaways designed to ease users in developing an application that can securely communicate with a “kerberized” HBase installation.
HBaseCon 2012 | HBase Filtering - Lars George, Cloudera - Cloudera, Inc.
This talk will run through the list of filters that are shipped with HBase and show how they are used from a client application. Filters expose varying feature sets, but also exhibit an equally varying impact on read performance – but neither are directly intuitive. A skilled HBase practitioner should know how to select the proper filter for a given use-case, or how to combine sets of filters to achieve what is needed. The talk will conclude with an example for a custom filter and explain how to deploy it on a cluster.
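Filters are predicates evaluated per row (or cell), and HBase lets you combine them, for example with AND semantics via `FilterList` and `MUST_PASS_ALL`. Here is a conceptual Python sketch of that combination logic; the data and filter names are invented, and real HBase filters run server-side so rejected rows never cross the network:

```python
def prefix_filter(prefix):
    """Keep rows whose key starts with the prefix (like PrefixFilter)."""
    return lambda row, cols: row.startswith(prefix)

def value_filter(column, expected):
    """Keep rows where a column has the expected value."""
    return lambda row, cols: cols.get(column) == expected

def filter_list_all(*filters):
    """AND combination, in the spirit of FilterList with MUST_PASS_ALL."""
    return lambda row, cols: all(f(row, cols) for f in filters)

data = {
    "user#001": {"info:city": "Berlin"},
    "user#002": {"info:city": "Tokyo"},
    "admin#001": {"info:city": "Berlin"},
}
f = filter_list_all(prefix_filter("user#"), value_filter("info:city", "Berlin"))
matches = [row for row, cols in sorted(data.items()) if f(row, cols)]
```

The performance caveat from the talk still applies in the real system: a prefix filter can skip whole key ranges cheaply, while a value filter must examine every candidate cell, so filter choice and ordering matter.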
Distributed Logging Architecture in Container Era - Satoshi Tagomori
The document discusses distributed logging architecture in the container era. It covers: 1) The difficulties of logging with microservices and containers due to their ephemeral and distributed nature, 2) The need to redesign logging to push logs from containers to destinations quickly without fixed addresses or mappings; 3) Common patterns for distributed logging architectures including source aggregation, destination aggregation, and scaling; and 4) A case study using Docker and Fluentd to implement source aggregation and scaling for logging. Open source solutions are important to keep the logging layer transparent, interoperable, and able to scale independently of applications and infrastructure.
Distributed Logging Architecture in the Container Era - Glenn Davis
Presentation given at LinuxCon Japan 2016 by Satoshi "Moris" Tagomori (@tagomoris), Treasure Data. Describes various strategies for aggregating log data in a microservices architecture using containers, e.g. Docker.
This document provides an overview of patterns for scalability, availability, and stability in distributed systems. It discusses general recommendations like immutability and referential transparency. It covers scalability trade-offs around performance vs scalability, latency vs throughput, and availability vs consistency. It then describes various patterns for scalability including managing state through partitioning, caching, sharding databases, and using distributed caching. It also covers patterns for managing behavior through event-driven architecture, compute grids, load balancing, and parallel computing. Availability patterns like fail-over, replication, and fault tolerance are discussed. The document provides examples of popular technologies that implement many of these patterns.
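Of the state-management patterns listed, caching is the easiest to make concrete. Below is a minimal read-through cache sketch in Python; the loader function and names are hypothetical, and production caches add eviction (LRU/TTL) and invalidation, which this deliberately omits:

```python
class ReadThroughCache:
    """Read-through cache: on a miss, load from the backing store,
    remember the value, and serve later reads from memory."""

    def __init__(self, loader):
        self.loader = loader
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = self.loader(key)   # only misses reach the backing store
        self.store[key] = value
        return value

db_calls = []

def slow_lookup(key):
    db_calls.append(key)           # stands in for a database round-trip
    return key.upper()

cache = ReadThroughCache(slow_lookup)
```

The availability-vs-consistency trade-off from the document shows up even here: a cached read may be stale relative to the backing store, which is exactly the price paid for the latency win.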
Presto is an open source distributed SQL query engine that was originally developed by Facebook. It allows for fast SQL queries on large datasets across multiple data sources. Presto uses various optimizations like code generation, predicate pushdown, and data layout awareness to improve query performance. It is used at Facebook and other companies for interactive analytics, batch ETL, A/B testing, and app analytics where low latency and high concurrency are important.
ROS - an open-source Robot Operating System - abirpahlwan
A presentation based on
"ROS - an open-source Robot Operating System" by Willow Garage.
Available at http://www.willowgarage.com/sites/default/files/icraoss09-ROS.pdf
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347 - Manik Surtani
Manik Surtani is the founder and project lead of Infinispan, an open source data grid platform. He discussed data grids, NoSQL, and their role in cloud storage. Data grids evolved from distributed caches to provide features like querying, task execution, and co-location control. NoSQL systems are alternative data storage that is scalable and distributed but lacks relational structure. JSR 347 aims to standardize data grid APIs for the Java platform. Infinispan implements JSR 107 and will support JSR 347, acting as the reference backend for Hibernate OGM.
Modern software architectures - PHP UK Conference 2015 - Ricard Clau
The web has changed. Users demand responsive, real-time interactive applications and companies need to store and analyze tons of data. Some years ago, monolithic code bases with a basic LAMP stack, some caching and perhaps a search engine were enough. These days everybody is talking about micro-services architectures, SOA, Erlang, Golang, message passing, queue systems and many more. PHP seems to not be cool anymore but... is this true? Should we all forget everything we know and just learn these new technologies? Do we really need all these things?
This document summarizes a presentation about DocumentDB on Azure. It discusses what DocumentDB is, how it works as a fully managed NoSQL database, and some key features for developers. DocumentDB allows storing and querying JSON documents, offers tunable consistency levels, and exposes APIs for common languages like .NET, Node.js, and Python. The presentation provides an overview of DocumentDB's capabilities and when it would be a good fit compared to relational databases or other document stores.
Apache Geode is an open source in-memory data grid that provides data distribution, replication and high availability. It can be used for caching, messaging and interactive queries. The presentation discusses Geode concepts like cache, region and member. It provides examples of how large companies use Geode for applications requiring real-time response, high concurrency and global data visibility. Geode's performance comes from minimizing data copying and contention through flexible consistency and partitioning. The project is now hosted by Apache and the community is encouraged to get involved through mailing lists, code contributions and example applications.
HBase is a distributed, column-oriented database that stores data in tables divided into rows and columns. It is optimized for random, real-time read/write access to big data. The document discusses HBase's key concepts like tables, regions, and column families. It also covers performance tuning aspects like cluster configuration, compaction strategies, and intelligent key design to spread load evenly. Different use cases are suitable for HBase depending on access patterns, such as time series data, messages, or serving random lookups and short scans from large datasets. Proper data modeling and tuning are necessary to maximize HBase's performance.
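One concrete instance of the "intelligent key design to spread load evenly" point is key salting: prefixing each row key with a stable hash bucket so that monotonically increasing keys do not all land in the last region. A minimal Python sketch, with a hypothetical bucket count and separator:

```python
import zlib

NUM_BUCKETS = 8   # assumed; would be tuned to the number of regions

def salted_key(row_key):
    """Prefix the row key with a stable hash bucket so monotonically
    increasing keys (e.g. timestamps) spread across all buckets
    instead of hammering the region holding the highest keys."""
    bucket = zlib.crc32(row_key.encode()) % NUM_BUCKETS
    return f"{bucket:02d}|{row_key}"

def original_key(salted):
    """Recover the logical key from a salted one."""
    return salted.split("|", 1)[1]

keys = [salted_key(f"2017-04-06T12:00:{s:02d}") for s in range(10)]
```

The trade-off is that a time-range scan must now issue one scan per bucket and merge the results, so salting suits write-heavy time-series workloads better than scan-heavy ones.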
From: DataWorks Summit 2017 - Munich - 20170406
HBase has established itself as the backend for many operational and interactive use-cases, powering well-known services that support millions of users and thousands of concurrent requests. In terms of features HBase has come a long way, offering advanced options such as multi-level caching on- and off-heap, pluggable request handling, fast recovery options such as region replicas, table snapshots for data governance, tuneable write-ahead logging, and so on. This talk is based on the research for the upcoming second edition of the speaker's HBase book, correlated with practical experience in medium to large HBase projects around the world. You will learn how to plan for HBase, starting with the selection of matching use-cases, through determining the number of servers needed, and leading into performance tuning options. There is no reason to be afraid of using HBase, but knowing its basic premises and technical choices will make using it much more successful. You will also learn about many of the new features of HBase up to version 1.3, and where they are applicable.
Pincaster is an in-memory database that stores data in layers, with features like key-value storage, hashes, points with spatial indexing and queries, link relations between records, and expiration. It speaks HTTP/JSON for simple usage in any language. Data is stored in memory for speed but also durably logged to a human-readable journal file, with crash recovery and optional journal rewriting to improve startup times. Future plans include replication, clustering of spatial results, and client libraries.
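The "durably logged to a journal, with crash recovery" design can be sketched compactly: log every write before applying it, and rebuild state on startup by replaying the log. This Python version uses JSON lines for the journal purely for illustration (Pincaster's actual journal format is its own); the keys and values are invented:

```python
import json
import os
import tempfile

class JournaledStore:
    """In-memory key-value store with an append-only journal: every
    write is logged before it is applied, so state can be rebuilt
    after a crash by replaying the journal file."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path) as f:
                for line in f:            # crash recovery: replay the journal
                    op, key, value = json.loads(line)
                    if op == "put":
                        self.data[key] = value
                    else:
                        self.data.pop(key, None)
        self._log = open(path, "a")

    def put(self, key, value):
        self._log.write(json.dumps(["put", key, value]) + "\n")
        self._log.flush()                 # durable before acknowledging
        self.data[key] = value

    def delete(self, key):
        self._log.write(json.dumps(["del", key, None]) + "\n")
        self._log.flush()
        self.data.pop(key, None)

    def close(self):
        self._log.close()

path = os.path.join(tempfile.mkdtemp(), "journal.log")
store = JournaledStore(path)
store.put("point:home", {"lat": 48.85, "lon": 2.35})
store.put("point:work", {"lat": 48.86, "lon": 2.29})
store.delete("point:home")
store.close()

recovered = JournaledStore(path)  # simulated restart: state rebuilt from the log
```

The journal-rewriting feature mentioned above exists precisely because this log grows without bound: periodically compacting it to one `put` per live key keeps replay (and thus startup) fast.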
Apache Drill is a data analytics system with a flexible architecture that allows for pluggable components. It includes a driver, parser, compiler/optimizer, execution engine, and storage handlers. The parser converts queries to an intermediate representation, which is optimized and then executed across a cluster by the execution engine. Drill supports various data formats and sources through its extensible storage interfaces and scanner operators. Its design focuses on flexibility, ease of use, dependability, and high performance.
Demystifying Postgres logical replication - Percona Live SC - Emanuel Calvo
This document provides an overview of logical replication in PostgreSQL, including:
- The different types of replication in PostgreSQL and how logical replication works
- How logical replication compares to MySQL replication and the elements involved
- What logical replication can be used for and some limitations
- Key concepts like publications, subscriptions, replication slots, and conflict handling
- Monitoring and configuration options for logical replication
The document discusses PostgreSQL full-text search (FTS). It covers FTS concepts like parsers, tokens, lexemes, and dictionaries. It also discusses native PostgreSQL FTS support and external solutions like Sphinx and Solr. The document provides examples of using FTS indexes and queries, and tips on preprocessing, ranking, and automating updates of FTS vectors.
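The parser/token/lexeme pipeline can be made concrete with a toy model of `to_tsvector`: tokenize, drop stop words, normalize tokens to lexemes, and record positions. This Python sketch uses a deliberately crude plural-stripping rule as a stand-in for PostgreSQL's dictionary-based stemming; the stop-word list is abbreviated:

```python
import re

STOP_WORDS = {"the", "a", "and", "of"}

def to_tsvector(text):
    """Rough model of PostgreSQL's to_tsvector(): tokenize, drop stop
    words, normalize tokens to lexemes, and keep each lexeme's positions."""
    vector = {}
    for pos, token in enumerate(re.findall(r"[a-zA-Z]+", text), start=1):
        lexeme = token.lower()
        if lexeme in STOP_WORDS:
            continue
        if lexeme.endswith("s") and len(lexeme) > 3:
            lexeme = lexeme[:-1]   # crude stand-in for dictionary stemming
        vector.setdefault(lexeme, []).append(pos)
    return vector

def matches(vector, query_lexemes):
    """Rough model of a tsquery AND match."""
    return all(lex in vector for lex in query_lexemes)

v = to_tsvector("The parsers produce tokens and dictionaries produce lexemes")
```

The positions kept per lexeme are what ranking functions exploit (proximity, frequency), which is why the document's tip to precompute and store tsvectors, rather than recompute them per query, pays off on large tables.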
The document discusses various PostgreSQL database hosting options on Amazon Web Services (AWS). It describes services like EC2 that allow running a customized PostgreSQL database on the cloud. It provides tips for setting up PostgreSQL replication, scaling the database vertically and horizontally, backups, monitoring with CloudWatch, and reducing costs. Other AWS services mentioned include S3, EBS, Redshift and tools for managing PostgreSQL on AWS.
This document summarizes a presentation about using PostgreSQL's native full text search capabilities and the Sphinx search engine. It discusses when each option may be preferable, how to configure and use Sphinx to index PostgreSQL data, and some key Sphinx features like distributed searching, misspelling corrections, and autocompletion. Sphinx can be used to offload text searches for improved performance and scalability compared to native PostgreSQL searching.
This document summarizes PalominoDB's service offerings and provides an agenda for a presentation on full-text search solutions in PostgreSQL. PalominoDB offers monthly support plans with discounts based on monthly spend. They are adding annual support contracts with consultation hours and emergency support. The presentation agenda covers goals of full-text search, native PostgreSQL support, external solutions like Sphinx and Solr, and tips for implementing full-text search.
This document presents the new features of PostgreSQL 9.1. The speaker, Emanuel Calvo, is a DBA with expertise in PostgreSQL, MySQL, and Oracle. The presentation covers topics such as improved synchronous replication, foreign data support, per-column collation, serializable snapshot isolation, unlogged tables, and more. The document also mentions minor features such as SE-Linux support and updates to the PL/pgSQL language.
This document presents a basic PostgreSQL 9.0 administration course. It covers server installation and configuration, administration tools, database maintenance, backups, replication, security, and query optimization. The goal is for attendees to gain the knowledge needed to administer, monitor, and understand the structure of PostgreSQL.
This document discusses PostgreSQL and Solaris as a low-cost platform for medium to large scale critical scenarios. It provides an overview of PostgreSQL, highlighting features like MVCC, PITR, and ACID compliance. It describes how Solaris and PostgreSQL integrate well, with benefits like DTrace support, scalability on multicore/multiprocessor systems, and Solaris Cluster support. Examples are given for installing PostgreSQL on Solaris using different methods, configuring zones for isolation, using ZFS for storage, and monitoring performance with DTrace scripts.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/07/intels-approach-to-operationalizing-ai-in-the-manufacturing-sector-a-presentation-from-intel/
Tara Thimmanaik, AI Systems and Solutions Architect at Intel, presents the “Intel’s Approach to Operationalizing AI in the Manufacturing Sector,” tutorial at the May 2024 Embedded Vision Summit.
AI at the edge is powering a revolution in industrial IoT, from real-time processing and analytics that drive greater efficiency and learning to predictive maintenance. Intel is focused on developing tools and assets to help domain experts operationalize AI-based solutions in their fields of expertise.
In this talk, Thimmanaik explains how Intel’s software platforms simplify labor-intensive data upload, labeling, training, model optimization and retraining tasks. She shows how domain experts can quickly build vision models for a wide range of processes—detecting defective parts on a production line, reducing downtime on the factory floor, automating inventory management and other digitization and automation projects. And she introduces Intel-provided edge computing assets that empower faster localized insights and decisions, improving labor productivity through easy-to-use AI tools that democratize AI.
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf - jackson110191
These fighter aircraft have uses outside of traditional combat situations. They are essential in defending India's territorial integrity, averting dangers, and delivering aid to those in need during natural calamities. Additionally, the IAF improves its interoperability and fortifies international military alliances by working together and conducting joint exercises with other air forces.
Video traffic on the Internet is constantly growing; networked multimedia applications consume a predominant share of the available Internet bandwidth. A major technical breakthrough and enabler in multimedia systems research and of industrial networked multimedia services certainly was the HTTP Adaptive Streaming (HAS) technique. This resulted in the standardization of MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH) which, together with HTTP Live Streaming (HLS), is widely used for multimedia delivery in today’s networks. Existing challenges in multimedia systems research deal with the trade-off between (i) the ever-increasing content complexity, (ii) various requirements with respect to time (most importantly, latency), and (iii) quality of experience (QoE). Optimizing towards one aspect usually negatively impacts at least one of the other two aspects if not both. This situation sets the stage for our research work in the ATHENA Christian Doppler (CD) Laboratory (Adaptive Streaming over HTTP and Emerging Networked Multimedia Services; https://athena.itec.aau.at/), jointly funded by public sources and industry. In this talk, we will present selected novel approaches and research results of the first year of the ATHENA CD Lab’s operation. We will highlight HAS-related research on (i) multimedia content provisioning (machine learning for video encoding); (ii) multimedia content delivery (support of edge processing and virtualized network functions for video networking); (iii) multimedia content consumption and end-to-end aspects (player-triggered segment retransmissions to improve video playout quality); and (iv) novel QoE investigations (adaptive point cloud streaming). We will also put the work into the context of international multimedia systems research.
In this follow-up session on knowledge and prompt engineering, we will explore structured prompting, chain of thought prompting, iterative prompting, prompt optimization, emotional language prompts, and the inclusion of user signals and industry-specific data to enhance LLM performance.
Join EIS Founder & CEO Seth Earley and special guest Nick Usborne, Copywriter, Trainer, and Speaker, as they delve into these methodologies to improve AI-driven knowledge processes for employees and customers alike.
Are you interested in learning about creating an attractive website? Here it is! Take part in the challenge that will broaden your knowledge about creating cool websites! Don't miss this opportunity, only in "Redesign Challenge"!
How Social Media Hackers Help You to See Your Wife's Message.pdf - HackersList
In the modern digital era, social media platforms have become integral to our daily lives. These platforms, including Facebook, Instagram, WhatsApp, and Snapchat, offer countless ways to connect, share, and communicate.
GDG Cloud Southlake #34: Neatsun Ziv: Automating AppSec - James Anderson
The lecture titled "Automating AppSec" delves into the critical challenges associated with manual application security (AppSec) processes and outlines strategic approaches for incorporating automation to enhance efficiency, accuracy, and scalability. The lecture is structured to highlight the inherent difficulties in traditional AppSec practices, emphasizing the labor-intensive triage of issues, the complexity of identifying responsible owners for security flaws, and the challenges of implementing security checks within CI/CD pipelines. Furthermore, it provides actionable insights on automating these processes to not only mitigate these pains but also to enable a more proactive and scalable security posture within development cycles.
The Pains of Manual AppSec:
This section will explore the time-consuming and error-prone nature of manually triaging security issues, including the difficulty of prioritizing vulnerabilities based on their actual risk to the organization. It will also discuss the challenges in determining ownership for remediation tasks, a process often complicated by cross-functional teams and microservices architectures. Additionally, the inefficiencies of manual checks within CI/CD gates will be examined, highlighting how they can delay deployments and introduce security risks.
Automating CI/CD Gates:
Here, the focus shifts to the automation of security within the CI/CD pipelines. The lecture will cover methods to seamlessly integrate security tools that automatically scan for vulnerabilities as part of the build process, thereby ensuring that security is a core component of the development lifecycle. Strategies for configuring automated gates that can block or flag builds based on the severity of detected issues will be discussed, ensuring that only secure code progresses through the pipeline.
Triaging Issues with Automation:
This segment addresses how automation can be leveraged to intelligently triage and prioritize security issues. It will cover technologies and methodologies for automatically assessing the context and potential impact of vulnerabilities, facilitating quicker and more accurate decision-making. The use of automated alerting and reporting mechanisms to ensure the right stakeholders are informed in a timely manner will also be discussed.
Identifying Ownership Automatically:
Automating the process of identifying who owns the responsibility for fixing specific security issues is critical for efficient remediation. This part of the lecture will explore tools and practices for mapping vulnerabilities to code owners, leveraging version control and project management tools.
Three Tips to Scale the Shift Left Program:
Finally, the lecture will offer three practical tips for organizations looking to scale their Shift Left security programs. These will include recommendations on fostering a security culture within development teams, employing DevSecOps principles to integrate security throughout the development
Kief Morris rethinks the infrastructure code delivery lifecycle, advocating for a shift towards composable infrastructure systems. We should shift to designing around deployable components rather than code modules, use more useful levels of abstraction, and drive design and deployment from applications rather than bottom-up, monolithic architecture and delivery.
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo... - Chris Swan
Have you noticed the OpenSSF Scorecard badges on the official Dart and Flutter repos? It's Google's way of showing that they care about security. Practices such as pinning dependencies, branch protection, required reviews, continuous integration tests etc. are measured to provide a score and accompanying badge.
You can do the same for your projects, and this presentation will show you how, with an emphasis on the unique challenges that come up when working with Dart and Flutter.
The session will provide a walkthrough of the steps involved in securing a first repository, and then what it takes to repeat that process across an organization with multiple repos. It will also look at the ongoing maintenance involved once scorecards have been implemented, and how aspects of that maintenance can be better automated to minimize toil.
Implementations of Fused Deposition Modeling in real world - Emerging Tech
The presentation showcases the diverse real-world applications of Fused Deposition Modeling (FDM) across multiple industries:
1. **Manufacturing**: FDM is utilized in manufacturing for rapid prototyping, creating custom tools and fixtures, and producing functional end-use parts. Companies leverage its cost-effectiveness and flexibility to streamline production processes.
2. **Medical**: In the medical field, FDM is used to create patient-specific anatomical models, surgical guides, and prosthetics. Its ability to produce precise and biocompatible parts supports advancements in personalized healthcare solutions.
3. **Education**: FDM plays a crucial role in education by enabling students to learn about design and engineering through hands-on 3D printing projects. It promotes innovation and practical skill development in STEM disciplines.
4. **Science**: Researchers use FDM to prototype equipment for scientific experiments, build custom laboratory tools, and create models for visualization and testing purposes. It facilitates rapid iteration and customization in scientific endeavors.
5. **Automotive**: Automotive manufacturers employ FDM for prototyping vehicle components, tooling for assembly lines, and customized parts. It speeds up the design validation process and enhances efficiency in automotive engineering.
6. **Consumer Electronics**: FDM is utilized in consumer electronics for designing and prototyping product enclosures, casings, and internal components. It enables rapid iteration and customization to meet evolving consumer demands.
7. **Robotics**: Robotics engineers leverage FDM to prototype robot parts, create lightweight and durable components, and customize robot designs for specific applications. It supports innovation and optimization in robotic systems.
8. **Aerospace**: In aerospace, FDM is used to manufacture lightweight parts, complex geometries, and prototypes of aircraft components. It contributes to cost reduction, faster production cycles, and weight savings in aerospace engineering.
9. **Architecture**: Architects utilize FDM for creating detailed architectural models, prototypes of building components, and intricate designs. It aids in visualizing concepts, testing structural integrity, and communicating design ideas effectively.
Each industry example demonstrates how FDM enhances innovation, accelerates product development, and addresses specific challenges through advanced manufacturing capabilities.
Transcript: Details of description part II: Describing images in practice - T... (BookNet Canada)
This presentation explores the practical application of image description techniques. Familiar guidelines will be demonstrated in practice, and descriptions will be developed “live”! If you have learned a lot about the theory of image description techniques but want to feel more confident putting them into practice, this is the presentation for you. There will be useful, actionable information for everyone, whether you are working with authors, colleagues, alone, or leveraging AI as a collaborator.
Link to presentation recording and slides: https://bnctechforum.ca/sessions/details-of-description-part-ii-describing-images-in-practice/
Presented by BookNet Canada on June 25, 2024, with support from the Department of Canadian Heritage.
Blockchain technology is transforming industries and reshaping the way we conduct business, manage data, and secure transactions. Whether you're new to blockchain or looking to deepen your knowledge, our guidebook, "Blockchain for Dummies", is your ultimate resource.
Quantum Communications Q&A with Gemini LLM. These are based on Shannon's noisy-channel theorem and explore how the classical theory applies to the quantum world.
Scaling Connections in PostgreSQL - Postgres Bangalore (PGBLR) Meetup-2 (Mydbops)
This presentation, delivered at the Postgres Bangalore (PGBLR) Meetup-2 on June 29th, 2024, dives deep into connection pooling for PostgreSQL databases. Aakash M, a PostgreSQL Tech Lead at Mydbops, explores the challenges of managing numerous connections and explains how connection pooling optimizes performance and resource utilization.
Key Takeaways:
* Understand why connection pooling is essential for high-traffic applications
* Explore various connection poolers available for PostgreSQL, including pgbouncer
* Learn the configuration options and functionalities of pgbouncer
* Discover best practices for monitoring and troubleshooting connection pooling setups
* Gain insights into real-world use cases and considerations for production environments
This presentation is ideal for:
* Database administrators (DBAs)
* Developers working with PostgreSQL
* DevOps engineers
* Anyone interested in optimizing PostgreSQL performance
Contact info@mydbops.com for PostgreSQL Managed, Consulting and Remote DBA Services
7 Most Powerful Solar Storms in the History of Earth (Enterprise Wired)
Solar storms (geomagnetic storms) are caused by accelerated charged particles moving at high velocities in the solar environment due to coronal mass ejections (CMEs).
4. Who and what is this about?
• Emanuel Calvo, currently at OnGres as a PostgreSQL Consultant and at ayres.io as _root_.
• Working on modern techniques for DBRE.
• What is the current status of the Open Source SQL databases, per component?
• What is the good, the bad, and the ugly in the market?
6. The ER Map
• Needs a first-order logic language for retrieving data.
• Relational Algebra
• Tuple and Domain Relational Calculus.
7. The model example
• Obscures everything behind the complexity of the storage.
• It is represented as relational algebra, but that is hidden from you.
• How do you select the names of the people on the "Black" team?
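The slide's question can be answered with a plain declarative query. A minimal sketch using Python's bundled sqlite3 engine and an assumed two-table schema (the `person`/`team` tables and their rows are made up for illustration):

```python
import sqlite3

# Hypothetical schema for the slide's question: people belong to teams,
# and we want the names of everyone on the "Black" team.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                         team_id INTEGER REFERENCES team(id));
    INSERT INTO team   VALUES (1, 'Black'), (2, 'White');
    INSERT INTO person VALUES (1, 'Ada', 1), (2, 'Bob', 2), (3, 'Eve', 1);
""")

# The declarative SQL hides the relational-algebra plan (a join followed
# by a projection) that the engine actually executes.
rows = conn.execute("""
    SELECT p.name
    FROM person p
    JOIN team t ON t.id = p.team_id
    WHERE t.name = 'Black'
    ORDER BY p.name
""").fetchall()
print([name for (name,) in rows])  # ['Ada', 'Eve']
```

The point of the slide holds here too: nothing in the query says *how* the join is performed; that choice belongs to the engine.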
8. Some SQL:2011 tangent distinctions
• Supports NULLs
• Supports subqueries
• Column precedence (horizontal alignment) affects results depending on the engine
• SQL/MED
• Is a declarative language
• Hides all the complexity of the execution from the end user
• Planners were already very advanced.
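Two of the items above, NULL handling and subqueries, can be seen in a few lines. A small demonstration using sqlite3 (the toy table `t` is our own):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (x INTEGER);
    INSERT INTO t VALUES (1), (2), (NULL);
""")

# NULL = NULL is not true: the comparison yields NULL (unknown),
# so the row where x IS NULL is filtered out even by "x = x".
eq = conn.execute("SELECT COUNT(*) FROM t WHERE x = x").fetchone()[0]
isnull = conn.execute("SELECT COUNT(*) FROM t WHERE x IS NULL").fetchone()[0]

# A scalar subquery, one of the features the slide lists.
# MIN ignores NULLs, and the NULL row never satisfies the comparison.
above_min = conn.execute(
    "SELECT COUNT(*) FROM t WHERE x > (SELECT MIN(x) FROM t)"
).fetchone()[0]

print(eq, isnull, above_min)  # 2 1 1
```

This three-valued logic around NULL is one of the places where SQL visibly departs from the pure relational model.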
10. The Entity Consistency
• CAP Theorem (Consistency, Availability, and Partition Tolerance). PACELC adds the choice between [L]atency and [C]onsistency.
• ACID (Atomicity, Consistency, Isolation, and Durability)
• BASE (Basically Available, Soft state, Eventual consistency)
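The "A" in ACID is easy to show concretely. A sketch of an atomic transfer using sqlite3 (the `account` table and the balance check are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 0)")
conn.commit()

# A transfer: both updates commit together or not at all (Atomicity).
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance - 150 WHERE id = 1")
        conn.execute("UPDATE account SET balance = balance + 150 WHERE id = 2")
        # Enforce the invariant in the app; a real schema might use CHECK.
        (bal,) = conn.execute(
            "SELECT balance FROM account WHERE id = 1").fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")
except ValueError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM account"))
print(balances)  # {1: 100, 2: 0} -- the failed transfer left no trace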
11. The chosen
We grab them by the storage and use them wisely, without paying money to Oracle.
14. The components
• Storage Engine
• Planner
• Protocol
• Language
• Ecosystem
• Framework
• WAL
• Transaction Manager
• Source code availability; documentation, both user and internal; community; etc.

The Storage Engine:
• Buffer Management
• IO method (Direct I/O, fsync)
• Transaction Management (storage layer)
• Point-in-Time Recovery and Undo Log
• For distributed engines you want to read the Jepsen tests.
• It is the sauce.
16. Quick cherry pick
Column-based storage:
• Fast for aggregations
• Easy to parallelize
• Better compression, due to the column-based layout
• Better for scaling massive amounts of data
• Bloom filters
• Sparse indexes by design
• Avoids write amplification

Index-based storage:
• More disk efficient, more CPU
• Better for concurrency
• Hard to scale
• Better when manipulating entities atomically
• A balance between performance and concurrency.
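The aggregation advantage of the columnar layout comes from touching only the data you need. A toy illustration in plain Python (not a benchmark; the field names are made up):

```python
# Toy contrast between row-oriented and column-oriented layouts.
rows = [{"id": i, "price": i * 2, "qty": 1} for i in range(1000)]

# Row store: every whole row is touched even though we only need one field.
total_row = sum(r["price"] for r in rows)

# Column store: the same data kept as one contiguous array per column;
# aggregating a single column scans only that array, and a homogeneous
# array of one type also compresses far better.
columns = {"id": [r["id"] for r in rows],
           "price": [r["price"] for r in rows],
           "qty": [r["qty"] for r in rows]}
total_col = sum(columns["price"])

assert total_row == total_col
print(total_col)
```

The same trade-off cuts the other way for point updates: rewriting one entity touches every column array, which is why row stores remain better for manipulating entities atomically.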
17. – Jorge de Lanús Oeste (drives an Uber, but knows a lot about databases)
"Relational databases require a Query Optimizer / Query Planner for translating the first-order logic language to relational algebra, along with other optimizations. The result is called an Execution Plan."
18. The Planner
Plan: {
…
• Heuristic
• Cost-based {Parametric, MO, MOP}
• Mixed
• Planner, Resolver, Optimizer, Executor
19. The Planner (cont.)
• MySQL also has Condition Pushdown
• PostgreSQL has a rich planner
• MySQL plan output lacks information
• PostgreSQL does not provide additional tools for plan reading.
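Every engine exposes its execution plan somehow (EXPLAIN in PostgreSQL and MySQL). A quick way to see a planner's choice from Python, using the bundled sqlite3 engine as a stand-in (the table and index names are our own):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE INDEX person_name ON person (name);
""")

# EXPLAIN QUERY PLAN exposes the planner's chosen access path,
# analogous to EXPLAIN output in PostgreSQL or MySQL.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM person WHERE name = ?", ("Ada",)
).fetchall()
for row in plan:
    # The last column is the human-readable detail, e.g. a SEARCH
    # step that uses the person_name index rather than a full scan.
    print(row[-1])
```

Reading plans like this is the main feedback loop for query tuning, which is why the richness (or poverty) of an engine's plan output matters so much in practice.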
20. The Protocol
• Client Protocol
• Replication Protocol
• Logical/Binary
• Coordination Protocol
• HA protocol
• Gossip
• Consensus {Raft, Paxos}
• …
21. The Protocol (cont.)
• No standard
• JSON is becoming more present (thankfully)
• Absence of internal consensus
22. The Language
• Abstracts all the relational algebra
• SQL != Relational
• NULLs
• Column Alignment
• Subqueries
• Mixed implementations
• The relational model is conceptually unable to return more than one result set.
23. The Language (cont.)
What do we want?
• Standard
• Backward Compatibility
• Modern
24. Postgres95 -> PostgreSQL
"Postgres' original implementation was in QUEL, and its organization resembles many of the concepts of the original ER model. COPY is an inherited piece from this prior implementation."
25. Source code and community
• Single provider, or fake open source
• Community contribution, or a Social Entropy Experiment
• Satellite companies building tools
• Satellite companies building forks
• Satellite coders copy-pasting
• Tons of under-proven libraries
26. The Ecosystem
• Multi-database tools tend to fail awesomely
• Choose tools that are integrated with the core and that have frequent updates
• Bug fixing is tied to community response times
• bugs.mysql.com
• Postgres uses mailing lists
• ClickHouse/CockroachDB use GitHub
27. The Framework
• Core extensibility: plugins or extensions
• Customizing the planner
• Managing the protocol
• Creating workers
• Creating your own types
28. The Framework (cont.)
• Complex, generally in C.
• Multi-provider packages.
29. The WAL
• WAL, or Redo log
• MySQL has an undo log, but only for rollback space.
• Postgres has tooling for rewind (pg_rewind)
• It can reside in the Storage Engine or in higher layers
• It is local, and provides consistency and durability
• Distributed WALs or certification logs could be in this group, although there will always be a WAL.
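The core WAL invariant, log first, fsync, then apply, fits in a few lines. A deliberately minimal sketch (illustrative only; no real engine lays out its WAL like this, and `TinyWAL` is an invented name):

```python
import os
import tempfile

# Minimal write-ahead logging sketch: every change is appended and
# fsync'ed to the log *before* it is applied to the in-memory state,
# so the state can always be rebuilt after a crash by replaying the log.
class TinyWAL:
    def __init__(self, path):
        self.path = path
        self.state = {}
        if os.path.exists(path):  # recovery: replay the whole log
            with open(path) as f:
                for line in f:
                    key, value = line.rstrip("\n").split("=", 1)
                    self.state[key] = value
        self.log = open(path, "a")

    def set(self, key, value):
        self.log.write(f"{key}={value}\n")
        self.log.flush()
        os.fsync(self.log.fileno())  # the durability point
        self.state[key] = value      # apply only once the record is safe

path = os.path.join(tempfile.mkdtemp(), "tiny.wal")
wal = TinyWAL(path)
wal.set("answer", "42")

# A second instance recovers the state purely by replaying the log.
recovered = TinyWAL(path)
print(recovered.state["answer"])  # 42
```

Point-in-time recovery is the same idea with a cut-off: replay the log only up to a chosen record.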
30. The Transaction Manager
• It can be at node level or cluster level
• Concepts of source and origin
• Group Replication
• Logical Replication
• Concept of a Global ID
• Centralized commits are possible through Kafka brokers
• Functional sharding must rely on the node level
• Serializable isolation is only supported by Postgres
• Read Uncommitted is only supported by InnoDB
31. Other components or capabilities
• Access Methods (B-Tree, L-Tree, Reverse, Hash)
• FTS (Full Text Search) and advanced search
• Geo capabilities
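The practical difference between the hash and B-Tree access methods above is range support. A toy contrast in plain Python, using a dict as a stand-in for a hash index and a sorted list for an ordered (B-Tree-like) index:

```python
import bisect

# Hash index (dict): O(1) equality lookups, but no ordering, so it
# cannot serve range predicates. Ordered index (sorted list + bisect):
# serves both equality and range scans, like a B-Tree.
keys = [5, 1, 9, 3, 7]
hash_index = {k: f"row{k}" for k in keys}
ordered_index = sorted(keys)

# Equality: both access methods work.
assert hash_index[7] == "row7"

# Range scan (3 <= k < 8): only the ordered index supports it directly.
lo = bisect.bisect_left(ordered_index, 3)
hi = bisect.bisect_left(ordered_index, 8)
print(ordered_index[lo:hi])  # [3, 5, 7]
```

This is why engines expose several access methods and let the planner pick per predicate.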
33. What is in the land of single-leader engines?
• Async replication
• Semi-synchronous replication
• First node response, as in MySQL.
• Simple synchronous replication
• Quorum synchronous
• Postgres
34. What is in the land of distributed/multi-leader[-less] engines?
• Asynchronous multi-leader replication
• BDR
• Snapshot Isolation
• Galera (a MySQL layer on top of InnoDB)
• Serializability
• CockroachDB (2PC to a consensus group, with Hybrid Logical Clocks; not strictly serial)
• VoltDB
• External consistency
• Google Spanner (through TrueTime clocks).
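The Hybrid Logical Clock mentioned for CockroachDB is small enough to sketch. A simplified version of the HLC algorithm (Kulkarni et al.); the class and field names here are our own, not CockroachDB's:

```python
import time

# An HLC timestamp is a pair (wall, logical): it tracks physical time
# closely, but the logical counter still orders causally related events
# even when the nodes' physical clocks disagree.
class HLC:
    def __init__(self, now=time.time):
        self.now = now
        self.wall = 0
        self.logical = 0

    def tick(self):
        """Local event or message send."""
        phys = int(self.now())
        if phys > self.wall:
            self.wall, self.logical = phys, 0
        else:
            self.logical += 1
        return (self.wall, self.logical)

    def update(self, remote):
        """Message receive: merge the sender's timestamp to keep causality."""
        rwall, rlogical = remote
        phys = int(self.now())
        old = self.wall
        self.wall = max(old, rwall, phys)
        if self.wall == old == rwall:
            self.logical = max(self.logical, rlogical) + 1
        elif self.wall == old:
            self.logical += 1
        elif self.wall == rwall:
            self.logical = rlogical + 1
        else:
            self.logical = 0
        return (self.wall, self.logical)

a, b = HLC(), HLC()
t1 = a.tick()      # event on node a
t2 = b.update(t1)  # node b receives a's message
assert t2 > t1     # the receive is ordered after the send
```

Timestamps like these let a distributed engine order transactions without the specialized hardware Spanner's TrueTime relies on, at the cost of weaker guarantees (hence "not strictly serial" above).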
35. The [full] architecture
Diagram components: Service Check (HTTP); Replication Worker / Certification / Tx Coordination; Client Worker; Internal Pooling / Thread Management / Process per worker; External Pooling; Executor.
• Write Quorum
• Single Leader
• Multi Leader
• Group Replication
• Inter-node coordination
• Distributed Transactions
• Conflict-Free Replicated Datatypes (LWW, 2P-set, etc.)
• Consensus for HA
• Also at the entry points, if external
• Centralized Commit
36. The status of horizontal scalability in OSDBs
• No native support for distributed consensus.
• Only MySQL has global identifiers, and it only recently gained Group Replication.
• There are extensions/forks that provide sharding for Postgres and MySQL.