1) Cassandra is a distributed database management system that provides high availability with no single point of failure.
2) It is well suited for applications that need to store large amounts of structured data and can handle very high write throughput.
3) Cassandra offers easy setup, maintenance, and scalability but requires careful data modeling to achieve high performance.
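The data-modeling requirement follows from how Cassandra places data: each row's partition key is hashed onto a token ring, and the ring determines which nodes hold replicas. A minimal sketch of that placement, with hypothetical node names and MD5 standing in for Cassandra's Murmur3 partitioner:

```python
import hashlib

def token(partition_key: str) -> int:
    # Hash the partition key onto the ring (Cassandra uses Murmur3; MD5 here for brevity).
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16)

def replicas(partition_key: str, ring: list, rf: int = 3) -> list:
    # Walk the sorted ring clockwise from the key's token and take the
    # next `rf` distinct nodes as replicas (SimpleStrategy-style placement).
    t = token(partition_key)
    ordered = sorted(ring)
    start = next((i for i, (tok, _) in enumerate(ordered) if tok >= t), 0)
    out = []
    for i in range(len(ordered)):
        node = ordered[(start + i) % len(ordered)][1]
        if node not in out:
            out.append(node)
        if len(out) == rf:
            break
    return out

# Hypothetical 4-node ring with evenly spaced tokens.
ring = [(i * (2**128 // 4), f"node{i}") for i in range(4)]
print(replicas("user:42", ring, rf=3))  # three distinct nodes, always the same three
```

The same key always lands on the same replicas, which is why queries that omit the full partition key cannot be routed efficiently and why partition-key choice dominates Cassandra performance.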
DataStax: Extreme Cassandra Optimization: The Sequel (DataStax Academy)
Al has been using Cassandra since version 0.6 and has spent the last few months doing little else but tuning Cassandra clusters. In this talk, Al will show how to tune Cassandra for efficient operation using multiple views into system metrics, including OS stats, GC logs, JMX, and cassandra-stress.
This document discusses scaling Cassandra for big data applications. It describes how Ooyala uses Cassandra for fast access to data generated by MapReduce, high availability key-value storage from Storm, and playhead tracking for cross-device resume. It outlines Ooyala's experience migrating to newer Cassandra versions as data doubled yearly, including removing expired tombstones, schema changes, and Linux performance tuning.
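The tombstone cleanup mentioned above hinges on one rule: a delete marker may only be purged at compaction once it is older than gc_grace_seconds, so every replica has had a chance to see the delete. A toy version of that check (the constant matches Cassandra's documented default; everything else is illustrative):

```python
import time

GC_GRACE_SECONDS = 864000  # Cassandra's default: 10 days

def is_droppable(tombstone_ts: float, now: float, gc_grace: int = GC_GRACE_SECONDS) -> bool:
    # A tombstone can be purged at compaction only after gc_grace_seconds,
    # giving replicas that missed the delete time to receive it via repair.
    return now - tombstone_ts > gc_grace

now = time.time()
print(is_droppable(now - 11 * 86400, now))  # 11 days old -> True
print(is_droppable(now - 1 * 86400, now))   # 1 day old  -> False
```

Purging a tombstone too early risks deleted data "resurrecting" from a replica that never saw the delete, which is why expired-tombstone removal is a deliberate operational step rather than automatic housekeeping.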
The document provides guidance on tuning Apache Spark jobs. It discusses tuning memory and garbage collection, optimizing shuffle operations, increasing parallelism through partitioning, monitoring jobs, and testing Spark applications.
Elastic HBase on Mesos aims to improve resource utilization of HBase clusters by running HBase in Docker containers managed by Mesos and Marathon. This allows HBase clusters to dynamically scale based on varying workload demands, increases utilization by running mixed workloads on shared resources, and simplifies operations through standard containerization. Key benefits include easier management, higher efficiency through elastic scaling and resource sharing, and improved cluster tunability.
(SDD403) Amazon RDS for MySQL Deep Dive | AWS re:Invent 2014 (Amazon Web Services)
Learn about architecting a highly available RDS MySQL implementation to support your high-performance applications and production workloads. We will also talk about best practices in the areas of security, storage, compute configurations, and management that will contribute to your success with Amazon RDS for MySQL. In addition, you will learn about how to effectively move data between Amazon RDS and on-premises instances.
Petabyte search at scale: understand how DataStax Enterprise search enables complex real-time multi-dimensional queries on massive datasets. This talk will cover when and why to use DSE search, best practices, data modeling and performance tuning/optimization. Also covered will be a deep dive into how DSE Search operates, and the fundamentals of bitmap indexing.
This document discusses managing Apache Cassandra at scale. It provides an overview of Cassandra's history and evolution from Dynamo and BigTable. It also discusses Cassandra's data model and how it handles operations like reads, writes and updates in a distributed system without relying on read-modify-writes. The document also covers Cassandra best practices like using collections, lightweight transactions and time series data modeling to optimize for scalability.
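For the time-series modeling mentioned above, a common pattern is to fold a time bucket into the partition key so no single partition grows without bound. A sketch under an assumed schema like PRIMARY KEY ((sensor_id, day), ts):

```python
from datetime import datetime, timezone

def partition_key(sensor_id: str, ts: datetime) -> tuple:
    # Composite partition key (sensor_id, day bucket) caps any one partition
    # at a day of samples; the hypothetical table would declare
    # PRIMARY KEY ((sensor_id, day), ts).
    return (sensor_id, ts.strftime("%Y-%m-%d"))

t = datetime(2014, 9, 10, 13, 45, tzinfo=timezone.utc)
print(partition_key("sensor-7", t))  # ('sensor-7', '2014-09-10')
```

Reads for a time range then touch a predictable, small set of partitions instead of one ever-growing row.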
C* Summit 2013: Cassandra at Instagram by Rick Branson (DataStax Academy)
Speaker: Rick Branson, Infrastructure Engineer at Instagram
Cassandra is a critical part of Instagram's large scale site infrastructure that supports more than 100 million active users. This talk is a practical deep dive into data models, systems architecture, and challenges encountered during the implementation process.
This document provides tips and best practices for debugging and tuning Spark applications. It discusses Spark concepts like RDDs, transformations, actions, and the DAG execution model. It then gives recommendations for improving correctness, reducing overhead from parallelism, avoiding data skew, and tuning configurations like storage level, number of partitions, executor resources and joins. Common failures are analyzed along with their causes and fixes. Overall it emphasizes the importance of tuning partitioning, avoiding shuffles when possible, and using the right configurations to optimize Spark jobs.
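One standard fix for the data skew the talk covers is key salting: split a hot key across several synthetic keys, aggregate per salted key, then merge in a cheap second pass. A pure-Python sketch of the two-stage aggregation (a Spark job would do the same thing with two reduceByKey passes):

```python
import random
from collections import Counter

def salt(key: str, buckets: int = 8) -> str:
    # Spread one hot key over `buckets` synthetic keys so no single
    # partition receives all of its records.
    return f"{key}#{random.randrange(buckets)}"

random.seed(0)
events = ["hot_user"] * 1000 + ["cold_user"] * 10

partials = Counter(salt(k) for k in events)   # stage 1: aggregate per salted key
totals = Counter()
for salted, n in partials.items():            # stage 2: strip the salt and merge
    totals[salted.split("#")[0]] += n

print(totals["hot_user"], totals["cold_user"])  # 1000 10
```

The trade-off is one extra (small) shuffle in exchange for removing the straggler task that otherwise processes the entire hot key alone.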
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentric) (DataStax)
We built an application based on the principles of CQRS and Event Sourcing using Cassandra and Spark. During the project we encountered a number of challenges and problems with Cassandra and the Spark Connector.
In this talk we want to outline a few of those problems and our actions to solve them. While some problems are specific to CQRS and Event Sourcing applications most of them are use case independent.
About the Speakers
Matthias Niehoff IT-Consultant, codecentric AG
works as an IT consultant at codecentric AG in Germany. His focus is on big data & streaming applications with Apache Cassandra & Apache Spark, yet he does not lose track of other tools in the big data space. Matthias shares his experiences at conferences, meetups, and user groups.
Stephan Kepser Senior IT Consultant and Data Architect, codecentric AG
Dr. Stephan Kepser is an expert on cloud computing and big data. He has written a number of journal articles and blog posts on both fields. His interests range from legal questions to the architecture and design of cloud computing and big data systems, down to technical details of NoSQL databases.
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution (Karan Singh)
In this presentation, I explain how Ceph object storage performance can be improved drastically, along with some object storage best practices, recommendations, and tips. I have also covered the Ceph shared data lake, which is becoming very popular.
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data Analysis (DataStax Academy)
Presenter: Ben Vanberg, Senior Software Engineer at FullContact
Here at FullContact we have lots and lots of contact data. In particular we have more than a billion profiles over which we would like to perform ad hoc data analysis. Much of this data resides in Cassandra, and we have many analytics MapReduce jobs that require us to iterate across terabytes of Cassandra data. To solve this problem we've implemented our own splittable input format which allows us to quickly process large SSTables for downstream analytics.
Cassandra Summit 2014: Performance Tuning Cassandra in AWS (DataStax Academy)
Presenter: Michael Nelson, Development Manager at FamilySearch
A recent research project at FamilySearch.org pushed Cassandra to very high scale and performance limits in AWS using a real application. Come see how we achieved 250K reads/sec with latencies under 5 milliseconds on a 400-core cluster holding 6 TB of data while maintaining transactional consistency for users. We'll cover tuning of Cassandra's caches, other server-side settings, client driver, AWS cluster placement and instance types, and the tradeoffs between regular & SSD storage.
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What! (ScyllaDB)
For a real-time big data database, few things are more important than keeping latencies low and bounded. Scylla has been delivering great tail latencies from day one, but the job of making them better never ends and there is always more to do. In this talk we will explore some of the changes made to Scylla in the past few releases to help keep latencies down.
(SDD409) Amazon RDS for PostgreSQL Deep Dive | AWS re:Invent 2014 (Amazon Web Services)
Learn the specifics of Amazon RDS for PostgreSQL's capabilities and extensions that make it powerful. This session covers database data import, performance tuning and monitoring, troubleshooting, security, and leveraging open source solutions with RDS. Throughout, this session focuses on capabilities particular to RDS for PostgreSQL.
AWS Redshift Introduction - Big Data Analytics (Keeyong Han)
Redshift is a scalable SQL database in AWS that can store up to 1.6PB of data across multiple servers. It uses a columnar data storage model that makes adding or removing columns fast. Data is uploaded from S3 using SQL COPY commands and queried using standard SQL. The document provides recommendations for getting started with Redshift, such as performing daily full refreshes initially and then implementing incremental update mechanisms to enable more frequent updates.
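The columnar model the summary refers to can be pictured as pivoting rows into one array per column, so an aggregate touches only the columns it needs. A toy illustration (not Redshift's actual storage format):

```python
def to_columnar(rows: list) -> dict:
    # Pivot row-oriented records into one array per column; an analytic
    # query then scans only the columns it actually references.
    cols = {}
    for row in rows:
        for name, value in row.items():
            cols.setdefault(name, []).append(value)
    return cols

rows = [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 3.0}]
cols = to_columnar(rows)
print(sum(cols["amount"]))  # 12.5 -- reads the 'amount' column only
```

This is also why column-oriented stores compress well: each array holds values of a single type, often with long runs of similar data.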
Scylla Summit 2018: What's New in Scylla Manager? (ScyllaDB)
Scylla Manager is a centralized cluster administration and task automation tool; it can automate repairs, and new features are coming. I'll demo the new 1.3 version and talk about the future of the project.
Slides from my talk at Cassandra Summit 2016 on troubleshooting Cassandra. This is a reprise of my popular talk from last summit, reorganized, expanded, and updated for Cassandra 3.0. In it I share the secrets I've learned in four years of supporting hundreds of customers using Apache Cassandra and DataStax Enterprise. Be sure to check out presenter notes for additional tips and links to further resources.
Postgres & Redis Sitting in a Tree - Rimas Silkaitis, Heroku (Redis Labs)
Postgres and Redis Sitting in a Tree | In today's world of polyglot persistence, it's likely that companies will be using multiple data stores for storing and working with data based on the use case. Typically a company will start with a relational database like Postgres and then add Redis for more high-velocity use cases. What if you could tie the two systems together to enable so much more?
Automation of Hadoop cluster operations in Arm Treasure Data (Yan Wang)
This talk focuses on the journey the Arm Treasure Data Hadoop team is on to simplify and automate how we deploy Hadoop. Until recently, we were running Hadoop clusters in two clouds. With the rapid increase of deployments into more sites, the overhead of manual operations started to strain us, so last year we began a project to automate and simplify our deployments using tools like AWS Auto Scaling groups. Steps taken so far include modernizing and standardizing instance types, moving from manually executed deployment scripts to API-triggered workflows, and actively working to deprecate Chef in favor of Debian packages and AWS CodeDeploy. We have also begun automating operations that until recently were manual, such as scaling clusters in and out and routing traffic between clusters, and we have started simplifying health checks and node snapshotting. Our goal for the year is close-to-fully-automated cluster operations.
This document discusses using Apache Cassandra for business intelligence, reporting and analytics. It covers:
- Data modeling and querying Cassandra data using CQL
- Accessing Cassandra data through drivers, ODBC/JDBC, and analytics frameworks like Spark and Hadoop
- Doing reporting, dashboards, and analytics on Cassandra data using CQL, Solr, Spark, and BI tools
- Capabilities of DataStax Enterprise for integrated search, batch analytics, and real-time analytics on Cassandra
- Example architectures that isolate workloads and handle hot vs cold data
Spark + Cassandra = Real Time Analytics on Operational Data (Victor Coustenoble)
This document discusses using Apache Spark and Cassandra together for real-time analytics on transactional data. It provides an overview of Cassandra and how it can be used for operational applications like recommendations, fraud detection, and messaging. It then discusses how the Spark Cassandra Connector allows reading and writing Cassandra data from Spark, enabling real-time analytics on streaming and batch data using Spark SQL, MLlib, and Spark Streaming. It also covers some DataStax Enterprise features for high availability and integration of Spark and Cassandra.
This document discusses Apache Cassandra, a distributed database management system. It provides an overview of Cassandra's features such as linear scalability, high performance and availability. The document also discusses how Cassandra addresses big data challenges through its integration of analytics and real-time capabilities. Several companies that use Cassandra share how it meets their needs for scalability, high performance and lower total cost of ownership compared to alternative solutions.
Cassandra DataTables Using RESTful API (Simran Kedia)
This project exposes Cassandra data tables through a REST API for querying large volumes of data. It builds a web interface to access the API and enables paginated results for user convenience. The interface automatically organizes data into Cassandra tables, handles REST queries to retrieve and display paginated results, and provides APIs for keyspace and column family management. It was implemented using Flask for the REST API, Cassandra's Python driver, and Jinja2/HTML for the user interface.
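The paginated-results part can be sketched as a function that returns one page plus an opaque state for the next request, loosely mirroring the fetch-size/paging-state idea in the Cassandra drivers (the names here are illustrative, not the driver API):

```python
def paginate(rows: list, page_size: int, page_state: int = 0):
    # Return one page of results plus an opaque state token the client
    # sends back to fetch the next page; None means no more pages.
    page = rows[page_state:page_state + page_size]
    next_state = page_state + page_size if page_state + page_size < len(rows) else None
    return page, next_state

rows = list(range(7))
page, state = paginate(rows, 3)
print(page, state)        # [0, 1, 2] 3
page, state = paginate(rows, 3, state)
print(page, state)        # [3, 4, 5] 6
```

A REST endpoint would echo the state token in the response so the web interface can request subsequent pages statelessly.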
This document provides an overview of using Cassandra in web applications. It discusses why developers may consider using a NoSQL solution like Cassandra over traditional SQL databases. It then covers topics like Cassandra's architecture, data modeling, configuration options, APIs, development tools, and examples of companies using Cassandra in production systems. Key points emphasized are that Cassandra offers high performance but requires rewriting code and developing new processes and tools to support its flexible schema and data model.
This document provides an overview of the NodeJS Cassandra driver. It begins with a brief introduction of Cassandra and then discusses the driver's architecture, streaming capabilities, and API. Key aspects covered include connection pooling, request pipelining, load balancing policies, automatic failover, and data type mappings. The presentation concludes with a code example and demonstration of the driver.
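A round-robin load-balancing policy like the one such drivers ship can be reduced to a few lines: rotate through the contact points and skip hosts marked down. A simplified Python stand-in (the real driver's policy interface is richer):

```python
import itertools

class RoundRobinPolicy:
    # Hand out contact points in rotation, skipping hosts marked down;
    # a deliberately simplified take on a driver load-balancing policy.
    def __init__(self, hosts):
        self.hosts = hosts
        self.down = set()
        self._cycle = itertools.cycle(hosts)

    def next_host(self):
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            if host not in self.down:
                return host
        raise RuntimeError("no hosts available")

policy = RoundRobinPolicy(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
policy.down.add("10.0.0.2")
print([policy.next_host() for _ in range(4)])
# ['10.0.0.1', '10.0.0.3', '10.0.0.1', '10.0.0.3']
```

Automatic failover is essentially this skip step plus a health check that moves hosts in and out of the down set.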
Cassandra Java APIs Old and New – A Comparison (shsedghi)
The document compares old Java APIs for Cassandra like Thrift, Hector and JDBC to the new DataStax Java driver. It provides an overview of each API, including how each interacts with Cassandra (e.g. via Thrift), examples of basic operations like reading rows, and references for more information. It also briefly introduces Cassandra's data model and the binary protocol which the new driver uses.
Application Development with Apache Cassandra as a Service (WSO2)
WSO2 is an open source software company, founded in 2005, that produces an entire middleware platform under the Apache license. Its business model involves selling comprehensive support and maintenance for its products. It has over 150 employees with offices globally. The document discusses using Apache Cassandra as a NoSQL database with WSO2's Column Store Service, including how to install the Cassandra feature, manage keyspaces and column families, and develop applications using the Hector Java API.
This document discusses using Node.js and Cassandra for highly concurrent systems. It explains that Node.js is well suited for I/O-bound applications with low CPU usage that require high concurrency, because its event-driven, non-blocking model handles many connections efficiently in a single thread without much overhead. The document also introduces the Cassandra driver for Node.js, which features connection pooling, load balancing, retry functions, and row/field streaming for efficiently accessing Cassandra from Node.js applications. Examples show how to perform queries and stream rows and fields into responses.
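Python's asyncio makes a convenient stand-in for Node's event loop when illustrating this model: one thread, many in-flight I/O waits, no locks. A minimal sketch:

```python
import asyncio

async def handle(request_id: int) -> str:
    # Simulate an I/O-bound request: the await yields the single event-loop
    # thread to other connections while this one waits on "the network".
    await asyncio.sleep(0.01)
    return f"response-{request_id}"

async def main():
    # 100 concurrent "connections" on one thread, no thread pool, no locks.
    return await asyncio.gather(*(handle(i) for i in range(100)))

results = asyncio.run(main())
print(len(results), results[0])  # 100 response-0
```

All 100 requests overlap their waits, so the whole batch finishes in roughly one sleep interval rather than 100 of them, which is the essence of why an event loop suits I/O-bound, low-CPU workloads.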
NodeJS: Communication and Round Robin Way (Edureka!)
The document provides an overview of the Mastering Node.js course offered by Edureka. It outlines the course objectives which include introducing Node.js, NPM, use cases, network communication, two-way communication using Socket.io, and cluster round robin load balancing. It also lists topics that will be covered in the course modules and highlights features like live online classes, class recordings, 24/7 support, quizzes, projects, and a verifiable certificate.
Pollfish is a survey platform which provides access to millions of targeted users. Pollfish allows easy distribution and targeting of surveys through existing mobile apps (https://www.pollfish.com/). At Pollfish we use Cassandra for different use cases, e.g. as an application data store to maximize write throughput when appropriate, and for our analytics project to find insights in application-generated data. As a medium to accomplish our success so far, we use DataStax's DSE 4.6 environment, which integrates Apache Cassandra, Spark and a Hadoop-compatible file system (CFS). We will discuss how we started, how the journey has been, and the impressions gained so far, along with some tips learned the hard way. This is the result of joint work by an excellent team here at Pollfish.
High concurrency, Low latency analytics using Spark/Kudu (Chris George)
With the right combination of open source projects, you can have high-concurrency, low-latency Spark jobs for data analysis. We'll show both REST and JDBC access to data from a persistent Spark context, then show how the combination of Spark Job Server, Spark Thrift Server and Apache Kudu can create a scalable backend for low-latency analytics.
1) The document proposes making a key-value storage system (CDP KVS) 10 times more scalable to support real-time data delivery.
2) Three ideas are presented: using an alternative distributed KVS, implementing a storage hierarchy on the existing KVS, and shipping edit logs to indexed archives.
3) The storage hierarchy approach of partitioning, compressing, and writing data to DynamoDB in batches is selected as it improves write performance and reduces storage costs while remaining stateless.
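The selected storage-hierarchy idea (partition, compress, write in batches) can be sketched in a few lines; the batch size of 25 below matches DynamoDB's BatchWriteItem limit, and everything else is illustrative:

```python
import json
import zlib

def pack_batch(records: list, batch_size: int = 25) -> list:
    # Group records into batches and compress each batch before writing,
    # trading a little CPU for fewer and smaller KVS writes.
    batches = [records[i:i + batch_size] for i in range(0, len(records), batch_size)]
    return [zlib.compress(json.dumps(b).encode()) for b in batches]

records = [{"id": i, "payload": "x" * 50} for i in range(60)]
blobs = pack_batch(records)
print(len(blobs))  # 3 batches: 25 + 25 + 10
print(sum(len(b) for b in blobs) < len(json.dumps(records)))  # True: smaller on the wire
```

Because each blob is derived purely from its input records, the writer stays stateless, which is one of the reasons this option was preferred in the document.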
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All (ScyllaDB)
The idea of implementing a brand-new Rust driver for ScyllaDB emerged from an internal hackathon in 2020. The initial goal was to provide a native implementation of a CQL driver, fully compatible with Apache Cassandra™, while also containing a variety of Scylla-specific optimizations. Development was later continued as a Warsaw University project led by ScyllaDB.
Now it's an officially supported driver with excellent performance and a wide range of features. This session shares the design decisions taken in implementing the driver and its roadmap. It also presents a forward-thinking plan to unify other Scylla-specific drivers by translating them to bindings to our Rust driver, using work on our C++ driver as an example.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Quick trip around the Cosmos - Things every astronaut is supposed to know (Rafał Hryniewski)
Slides for my talk, which gives an overview of Microsoft's new(ish) multi-model cloud database, Cosmos DB.
Recorded talk (in Polish) is available here: https://youtu.be/ZWpJne0kcds?t=1h52m45s
Scylla is a new open source NoSQL database that is compatible with Apache Cassandra but provides significantly higher performance through a redesign that takes advantage of modern hardware. Scylla is capable of over 1.8 million operations per second per node with predictable low latencies. It uses an architecture with shard-per-core and reactor programming that avoids locks and threads for near-linear scaling. Scylla also has its own efficient unified cache and I/O scheduler that maximize throughput and allow it to outperform Cassandra on benchmarks by an order of magnitude. Scylla is fully compatible with Cassandra and aims to build an open source community around ongoing core database improvements.
Introduction to Sqoop - Aaron Kimball, Cloudera - Hadoop User Group UK (Skills Matter)
In this talk of Hadoop User Group UK meeting, Aaron Kimball from Cloudera introduces Sqoop, the open source SQL-to-Hadoop tool. Sqoop helps users perform efficient imports of data from RDBMS sources to Hadoop's distributed file system, where it can be processed in concert with other data sources. Sqoop also allows users to export Hadoop-generated results back to an RDBMS for use with other data pipelines.
After this session, users will understand how databases and Hadoop fit together, and how to use Sqoop to move data between these systems. The talk will provide suggestions for best practices when integrating Sqoop and Hadoop in your data processing pipelines. We'll also cover some deeper technical details of Sqoop's architecture, and take a look at some upcoming aspects of Sqoop's development roadmap.
This document summarizes new features and upcoming releases for Ceph. In the Jewel release in April 2016, CephFS became more stable with improvements to repair and disaster recovery tools. The BlueStore backend was introduced experimentally to replace Filestore. Future releases Kraken and Luminous will include multi-active MDS support for CephFS, erasure code overwrites for RBD, management tools, and continued optimizations for performance and scalability.
The document discusses data partitioning and distribution across multiple machines in a cluster. It explains that data replication does not scale well, but data partitioning, where each record exists on only one machine, allows write latency to scale with the number of machines in the cluster. Coherence provides a distributed cache that partitions data and offers functions for server-side processing near the data through tools like entry processors.
Cassandra Summit 2014: Down with Tweaking! Removing Tunable Complexity for Cassandra (DataStax Academy)
Presenters: Don Marti, Glauber Costa, and Dor Laor of Cloudius Systems
The need for performance tuning of the JVM and OS is making administrators the bottleneck for Cassandra deployments--especially in virtual environments. Over the past two years, the OSv project has profiled tuning-sensitive applications with a special focus on Cassandra. Today, many of the important bottlenecks for NoSQL applications are tunable on a conventional OS, but do not require tuning in the OSv environment. OSv gives Cassandra a simpler environment, set up to run one application in a single address space. This talk will cover how to use OSv to improve performance in key areas such as JVM memory allocation and network throughput--without loading up your to-do list with difficult tuning tasks.
"Puppet and Apache CloudStack" by David Nalley, Citrix, at Puppet Camp San Francisco 2013. Find a Puppet Camp near you: puppetlabs.com/community/puppet-camp/
Infrastructure as code with Puppet and Apache CloudStack (ke4qqq)
Puppet can now be used to define not only the configuration of machines, but also the machines themselves and entire collections of machines when using CloudStack. New Puppet types and providers allow defining CloudStack instances, groups of instances, and entire application stacks that can then be deployed on CloudStack. This brings infrastructure as code to a new level by allowing Puppet to define and manage the entire CloudStack infrastructure.
Presentation at March 2019 Dutch Postgres User Group Meetup on lessons learnt while migrating from Oracle to Postgres, demo'ed via vagrant test environments and using generic pgbench datasets.
OSv is a new, high-performance OS for virtual machines in the cloud. Designed to run one application per guest with minimal overhead, OSv eliminates important bottlenecks for NoSQL applications through improvements in memory management, network I/O, and scheduling. And many important bottlenecks for NoSQL applications are tunable on a conventional OS, but do not require tuning in the OSv environment.
OSv is fully stateless and can be configured at runtime with cloud-init or through a REST API, with zero configuration files. OSv offers unified tracing from the application layer through the JVM and the OS kernel. Attendees will learn how to boot Cassandra in one second, and create a simple cluster in a minute.
Listen up, developers. You are not special. Your infrastructure is not a beautiful and unique snowflake. You have the same tech debt as everyone else. This is a talk about a better way to build and manage infrastructure: Terraform modules. It goes over how to build infrastructure as code, package that code into reusable modules, design clean and flexible APIs for those modules, write automated tests for the modules, and combine multiple modules into an end-to-end tech stack in minutes.
You can find the video here: https://www.youtube.com/watch?v=LVgP63BkhKQ
Under The Hood Of A Shard-Per-Core Database Architecture (ScyllaDB)
This document summarizes the key design decisions behind ScyllaDB's shard-per-core database architecture. It discusses how ScyllaDB addresses the challenges of scaling databases across hundreds of CPU cores by utilizing an asynchronous task model with one thread and one data shard per CPU core. This allows for linear scalability. It also overhauls the I/O scheduling to prioritize workloads and maximize throughput from SSDs under mixed read/write workloads. Benchmark results show ScyllaDB's architecture can handle petabyte-scale databases with high performance and low latency even on commodity hardware.
OCF.tw's talk "Introduction to Spark" (Giivee The)
A talk on Spark, shared at the invitation of OCF and OSSF.
If you are interested in the Open Culture Foundation (OCF) or the Open Source Software Foundry (OSSF),
please check http://ocf.tw/ or http://www.openfoundry.org/
Thanks also to CLBC for providing the venue.
If you would like to work in a great working environment,
feel free to get in touch with CLBC: http://clbc.tw/
High-Performance Storage Services with HailDB and Java (sunnygleason)
This document summarizes an approach to providing high-performance storage services using Java and HailDB. It discusses using the optimized "guts" of MySQL without needing to go through JDBC and SQL. It presents HailDB as a storage engine alternative to NoSQL options like Voldemort. It describes integrating HailDB with Java using JNA, building a REST API on top called St8, and examples of nifty applications like graph stores and counters. It concludes with discussing future work like improving packaging, online backup, and exploring JNI bindings.
This document discusses using Apache Spark to perform analytics on Cassandra data. It provides an overview of Spark and how it can be used to query and aggregate Cassandra data through transformations and actions on resilient distributed datasets (RDDs). It also describes how to use the Spark Cassandra connector to load data from Cassandra into Spark and write data from Spark back to Cassandra.
Gameplay concept and architecture in Creach: The Depleted World (Sperasoft)
Presentation by Evgeniy Muralev (Sperasoft) and Konstantin Muralev (Trace Studio) during the Unreal Engine 4 meetup at the Sperasoft office in St. Petersburg
April 8th, 2017
This document discusses code and memory optimization techniques for software engineers developing AAA game titles. It begins with an introduction to the speaker and provides an overview of hardware architecture including CPU registers, caches, and memory access times. The bulk of the document focuses on optimizing for data caches through techniques like improving data layout, prefetching, and utilizing cache lines efficiently. It also discusses optimizing branches through removing branches, computing both paths, and splitting data to avoid branches. Resources for further reading are provided.
The document discusses key concepts in relational database models including:
- Data is stored in tables called relations with rows and columns where rows represent records and columns represent attributes.
- Relations can be normalized to eliminate redundant data and optimize storage.
- Database normalization involves organizing data into tables through a multi-step process to remove anomalies.
- SQL is a programming language used to interact with relational databases through operations like joins, transactions, and indexing/hashing techniques.
Automated layout testing using Galen Framework (Sperasoft)
The Galen Framework tests page layouts using Selenium by verifying elements' positions relative to each other. It uses .gspec files to describe layouts with objects, groups, sections and tags. Verifications include checking widths, heights, alignments, text values, and relative positions using keywords like "near", "inside" and ranges. Results can be saved to HTML reports.
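The relative-position checks Galen performs ("near", "inside") boil down to simple rectangle arithmetic. A toy re-implementation of two of them, with boxes as x/y/w/h dicts (not Galen's actual code):

```python
def near(a: dict, b: dict, max_gap: int = 10) -> bool:
    # Is box `a` horizontally adjacent to box `b`, within max_gap pixels?
    gap = b["x"] - (a["x"] + a["w"])
    return 0 <= gap <= max_gap

def inside(a: dict, b: dict) -> bool:
    # Is box `a` fully contained within box `b`?
    return (a["x"] >= b["x"] and a["y"] >= b["y"]
            and a["x"] + a["w"] <= b["x"] + b["w"]
            and a["y"] + a["h"] <= b["y"] + b["h"])

logo = {"x": 10, "y": 10, "w": 50, "h": 20}
menu = {"x": 65, "y": 10, "w": 200, "h": 20}
header = {"x": 0, "y": 0, "w": 300, "h": 40}
print(near(logo, menu), inside(logo, header))  # True True
```

A .gspec line like "logo: near menu 0 to 10px left of" compiles down to exactly this kind of predicate evaluated against element rectangles reported by Selenium.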
The document discusses various security threats related to Android applications. It begins by introducing the OWASP Mobile Top 10 risks framework for categorizing common mobile vulnerabilities. It then provides more details on each of the top 10 risk categories, including examples, impacts, and tips for prevention. It also discusses techniques for protecting Android apps from reverse engineering and tampering, such as code obfuscation, anti-debugging, and license verification.
Sperasoft Talks: RxJava Functional Reactive Programming on Android (Sperasoft)
RxJava is a library for composing asynchronous and event-based programs by using observable sequences. It provides APIs for asynchronous programming using observable streams and the observer pattern to allow publishing and subscribing to multiple streams of events. Some key features include transformations on observable streams, combining multiple observables, filtering streams, and handling asynchronous operations without callbacks using reactive extensions. The document provides examples of creating observables from various sources, transforming streams through mapping and filtering, and combining multiple observables. It also discusses subjects, schedulers, and how RxJava can help eliminate AsyncTasks for asynchronous operations on Android.
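The core idea, an observable stream that pushes values through composable map/filter stages to a subscriber, fits in a few lines of any language. A miniature Python stand-in for RxJava's Observable (not the RxJava API):

```python
class Observable:
    # A push-based stream: subscribing runs the source, which feeds values
    # to the subscriber through any map/filter stages composed on top.
    def __init__(self, source):
        self._subscribe = source

    def subscribe(self, on_next):
        self._subscribe(on_next)

    def map(self, fn):
        return Observable(lambda on_next: self.subscribe(lambda v: on_next(fn(v))))

    def filter(self, pred):
        return Observable(lambda on_next: pred_chain(self, pred, on_next))

def pred_chain(obs, pred, on_next):
    # Forward only the values that satisfy the predicate.
    obs.subscribe(lambda v: on_next(v) if pred(v) else None)

def from_iterable(items):
    return Observable(lambda on_next: [on_next(i) for i in items])

seen = []
(from_iterable(range(6))
    .filter(lambda v: v % 2 == 0)
    .map(lambda v: v * 10)
    .subscribe(seen.append))
print(seen)  # [0, 20, 40]
```

Nothing runs until subscribe is called, which mirrors how Rx pipelines stay lazy until a subscriber attaches.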
2. Storage options along the "3 V" axes (Velocity, Volume, Variety):
• In-memory Key-Value: Redis, CouchBase
• File system: BerkeleyDB
• Distributed file system: HDFS
• In-memory NewSQL / domain specific: VoltDB
• Traditional [SQL] RDBMS
• Document store: MongoDB
• Distributed DBMS: Cassandra, HBase
CAP theorem:
• Consistency
• Availability
• Partition tolerance
- pick two
Transactional? • No
=> eventual consistency is acceptable
Choose the DB
3. Why select Cassandra?
• [relatively] easy to set up
• [relatively] easy to use
• ~zero routine ops
• it works (!!) as promised:
o real-time replication
o node/site failure recovery
o zero load writes
o double the nodes = double the speed
4. Because Cassandra is Fast!
But needs some time to deliver
• 12,000 WPS on a laptop
• ~0.1 ms / ~1 ms constant latency for writes / reads
6. Good for:
• log-like data
(TTL helps)
• massive writes
(is 1M WPS enough?)
• simple real-time analytics
Not so good for:
• a dump of junk
(consider HDFS)
• OLAP
(depends on the "O")
Good and Not So Good
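The "TTL helps" point for log-like data can be sketched with a tiny in-memory model (plain illustrative Python, not driver code; the class and key names are invented): each cell carries an expiry time, and reads treat expired cells as deleted, which is roughly how Cassandra's per-cell TTL removes old log entries without explicit cleanup.

```python
import time

class TTLStore:
    """Toy model of per-cell TTL: old entries expire instead of needing deletes."""
    def __init__(self):
        self._cells = {}  # key -> (value, expires_at or None)

    def put(self, key, value, ttl=None):
        expires_at = time.time() + ttl if ttl is not None else None
        self._cells[key] = (value, expires_at)

    def get(self, key):
        cell = self._cells.get(key)
        if cell is None:
            return None
        value, expires_at = cell
        if expires_at is not None and time.time() >= expires_at:
            return None  # expired: behaves as if deleted
        return value

store = TTLStore()
store.put("log:1", "event", ttl=0.05)  # short TTL, just for the demo
store.put("log:2", "kept")             # no TTL
assert store.get("log:1") == "event"   # still visible before expiry
time.sleep(0.1)
print(store.get("log:1"))  # None: expired without any explicit delete
print(store.get("log:2"))  # kept
```

In real CQL the same idea is a per-write option (`INSERT ... USING TTL <seconds>`), so log tables stay bounded by themselves.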
7. Distributed DBMS
Just a DBMS: a closed, monolithic solution
o not a platform to run custom code (unlike MongoDB);
o not an extension on top of another store (unlike HBase);
o highly optimized
Masterless ("no-master"), eventually consistent
NoSQL
Data model: Key-Value
http://cassandra.apache.org
Apache Cassandra
8. Developed at Facebook for Inbox search
Released to open source in 2008
In use:
• Netflix: main non-content data store, ~500 Cassandra nodes (2012)
• eBay: recommendation system, "dozens of nodes", 200 TB of storage (2012)
• Twitter: tweet analysis, 100+ TB of data
• More users: http://www.datastax.com/cassandrausers
History
9. 1.0 - October 2011
1.1 - April 2012
1.2 - January 2013
2.0 - expected this summer (2013)
As of June 26, 2013: 158 open bugs, 89 of them worth noticing
Sperasoft experience:
• hit 1 bug in production (a stability issue)
• hit 1 bug in QA (in a crafted corner case)
Mature & Agile
10. Apache .tar.gz and Debian packages
http://cassandra.apache.org/download/
DataStax DSC: Cassandra + OpsCenter
http://planetcassandra.org/Download/DataStaxCommunityEdition
Embedded: for functional tests of Java apps, via Maven
Documentation:
http://wiki.apache.org/cassandra/
http://www.datastax.com/docs
Distributions
12. Bare metal
CPU: 8 cores (4 works too)
RAM: 16 - 64 GB (min 8 GB)
Storage: rotating disks, 3 - 5 TB total (SSD is better)
VMs work too, but...
Storage: local disks, avoid NAS
More on Hardware
16. [Diagram: a client performs parallel reads and writes against Node 1, Node 2, and Node 3; each node holds a replica of the column family (CF), and replicas 1, 2, 3 end up as data on the nodes' disks.]
Data on Discs
17. https://github.com/datastax/java-driver
Client API Options

Thrift RPC           | Native protocol + CQL3
---------------------|-------------------------
Apache Thrift        | Custom protocol
Synchronous          | Asynchronous
Schema-less          | Static schema
Store & Forward      | Cursors promised in 2.0
API for any language | Java; Python, C# coming
Cryptic API          | JDBC-like API
Still supported      | Going forward
18. • Forget RDB design principles
• Forget the abstract data model
- shape data for your queries
• No joins - use materialized views
• Data duplication is OK
• Remember eventual consistency
• Queries are precious
• Use the right data types - timestamp, uuid
Why? Because NoSQL is a low-level tool for heavy optimization.
Data Modeling for NoSQL
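The "shape data for queries" and "data duplication is OK" rules above can be sketched with an in-memory model (illustrative Python, not CQL or driver code; the table and field names are invented): instead of joining at read time, every write duplicates the row into one structure per query, so each query reads exactly one "table".

```python
# Two query-shaped "tables", both written on every insert:
# one to look up a song by id, one to list songs in a playlist.
songs_by_id = {}        # song_id -> song row
songs_by_playlist = {}  # playlist -> list of duplicated song rows

def add_song(song_id, title, playlist):
    """A single logical write lands in every table that serves a query."""
    row = {"song_id": song_id, "title": title, "playlist": playlist}
    songs_by_id[song_id] = row                                     # serves "get song by id"
    songs_by_playlist.setdefault(playlist, []).append(dict(row))   # serves "list a playlist"

add_song(1, "Intro", "morning")
add_song(2, "Outro", "morning")

# Each query is a single lookup; no join is ever needed.
print(songs_by_id[2]["title"])                              # Outro
print([r["title"] for r in songs_by_playlist["morning"]])   # ['Intro', 'Outro']
```

This is the hand-rolled "materialized view" the slide mentions: writes get more expensive (one per view), but every precious query stays a single-partition read.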
22. Insert = Update = Delete: every write is an append
[Diagram: row id = 1 starts as columns (a, b, c, d) = (A, B, C, D); each later statement appends a new fragment - b = 'Y', then c = 'Z', then a tombstone for d - and a read merges them into (A, Y, Z).]
UPDATE ... SET b = 'Y' WHERE id = 1
INSERT INTO ... (id, c) VALUES (1, 'Z')
DELETE d FROM ... WHERE id = 1
SELECT * FROM ... WHERE id = 1
has to fetch 4 row fragments
slo-o-ow
Plan Data Immutable
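The read cost described above can be modeled in a few lines (a sketch of the idea, not Cassandra internals): each statement appends a timestamped fragment, a delete appends a tombstone, and a read must visit and merge every fragment of the row, with the newest write winning per column.

```python
# Each write appends a (timestamp, columns) fragment; value None is a tombstone.
fragments = [
    (1, {"a": "A", "b": "B", "c": "C", "d": "D"}),  # original insert
    (2, {"b": "Y"}),                                # UPDATE ... SET b = 'Y'
    (3, {"c": "Z"}),                                # INSERT INTO ... (id, c) VALUES (1, 'Z')
    (4, {"d": None}),                               # DELETE d FROM ...
]

def read_row(fragments):
    """Merge all fragments, newest timestamp wins; tombstoned columns disappear."""
    row = {}
    for _, cols in sorted(fragments):  # every fragment must be visited: the slow part
        row.update(cols)
    return {col: val for col, val in row.items() if val is not None}

print(read_row(fragments))  # {'a': 'A', 'b': 'Y', 'c': 'Z'}
```

Four fragments merged for one logical row is exactly why the SELECT on the slide is "slo-o-ow", and why planning data to be written once (immutable) keeps reads cheap.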
24. Remember: eventual consistency.
SELECT count(*)
FROM ... WHERE ... ;
Full scan over the selection
=> slo-o-ow
Default 10,000 rows limit
=> wrong count
Have an integer column and increment it?
Concurrent updates
=> wrong count, a mess
Use a counter column family instead:
CREATE TABLE count_table (
  id uuid,
  value counter,
  PRIMARY KEY (id)
);
...
UPDATE count_table
SET value = value + 1
WHERE id = ... ;
http://www.datastax.com/documentation/cassandra/1.2/cassandra/cql_using/use_counter_t.html
How Many?
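Why the plain integer column turns into a mess can be shown deterministically (a toy interleaving in Python, not real cluster behavior): two clients read the same value and both write back value + 1, so one increment is lost, whereas a counter column applies commutative deltas that all survive.

```python
# Naive read-modify-write: two clients interleave and one increment is lost.
value = 0
read_by_a = value        # client A reads 0
read_by_b = value        # client B also reads 0, before A writes
value = read_by_a + 1    # A writes 1
value = read_by_b + 1    # B overwrites with 1: A's increment is lost
print(value)             # 1, although two increments happened

# Counter-style: each client submits a delta; deltas commute, so none is lost.
deltas = []
deltas.append(1)         # client A: value = value + 1
deltas.append(1)         # client B: value = value + 1
print(sum(deltas))       # 2
```

Cassandra's counter column works like the second half: `SET value = value + 1` ships the delta, not a freshly read total, so concurrent increments do not overwrite each other.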
25. Storing a whole blob in one cell (columns: id, data) risks OutOfMemory:
CREATE TABLE blob (
  id uuid,
  data blob,
  PRIMARY KEY (id)
);
Split large blobs into chunks instead (columns: id, chunk_no, data):
CREATE TABLE blob (
  id uuid,
  chunk_no int,
  data blob,
  PRIMARY KEY (id, chunk_no)
);
http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage
http://wiki.apache.org/cassandra/CassandraLimitations
Blobs
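Client-side chunking for the second schema might look like this (a Python sketch under assumptions: the chunk size and helper names are invented, and real code would write each (id, chunk_no, data) row through the driver rather than keep them in a list):

```python
CHUNK_SIZE = 4  # tiny for the demo; in practice something like 1 MB per chunk

def to_chunks(blob_id, data, chunk_size=CHUNK_SIZE):
    """Split a blob into (id, chunk_no, data) rows matching PRIMARY KEY (id, chunk_no)."""
    return [
        (blob_id, chunk_no, data[offset:offset + chunk_size])
        for chunk_no, offset in enumerate(range(0, len(data), chunk_size))
    ]

def from_chunks(rows):
    """Reassemble one blob: read all chunks of an id and join them by chunk_no."""
    return b"".join(data for _, _, data in sorted(rows, key=lambda row: row[1]))

rows = to_chunks("blob-1", b"0123456789")
print([chunk_no for _, chunk_no, _ in rows])  # [0, 1, 2]
print(from_chunks(rows))                      # b'0123456789'
```

Because chunk_no is the clustering column, reading all chunks of one id is a single ordered partition scan, and no single cell is ever large enough to blow up node memory.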