HBase is an open-source, distributed, column-oriented database modeled after Google's Bigtable that runs on top of Hadoop. The presenter discusses HBase's architecture, performance improvements in version 0.20 (including major gains from a new file format and compression), and StumbleUpon's extensive use of HBase, including a single table of over 9 billion rows with high import and read speeds.
3. Now
• Personally rewritten large portions of HBase for 0.20
– Code easy to work with, understand, modify
• Recently voted to committer status (thanks!)
• Now giving presentations (hi!)
4. Four Point Agenda
• What is HBase?
• Why HBase?
• HBase 0.20
• HBase at StumbleUpon
5. What is HBase?
• Clone of Bigtable:
http://labs.google.com/papers/bigtable.html
• Created originally at Powerset in 2007
• Hadoop subproject
– The usual ASF things apply (license, JIRA, etc.)
7. Table & Regions
• Rows stored in byte-lexicographic sorted order
• Table dynamically split into “regions”
• Each region contains values [startKey, endKey)
• Regions hosted on a regionserver
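
To make the region picture concrete, here is a minimal standalone sketch (not HBase code; the start keys are made up, and it needs Java 9+ for Arrays.compareUnsigned) of how a sorted set of half-open [startKey, endKey) regions partitions the row space, using a binary search much like the client's region lookup:

  import java.util.Arrays;

  public class RegionLookup {
    // Sorted region start keys; region i covers [startKeys[i], startKeys[i+1]),
    // and the first region starts at the empty key.
    static final byte[][] startKeys = {
      new byte[0], "g".getBytes(), "n".getBytes(), "t".getBytes()
    };

    // Find the last region whose startKey <= row (byte-lexicographic order).
    static int regionFor(byte[] row) {
      int lo = 0, hi = startKeys.length - 1;
      while (lo < hi) {
        int mid = (lo + hi + 1) >>> 1;
        if (Arrays.compareUnsigned(startKeys[mid], row) <= 0) lo = mid;
        else hi = mid - 1;
      }
      return lo;
    }

    public static void main(String[] args) {
      System.out.println(regionFor("apple".getBytes()));    // 0
      System.out.println(regionFor("stumble".getBytes()));  // 2
    }
  }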
11. Column Families
• Table consists of 1+ “column families”
• Column family is unit of performance tuning
• Stored in separate set of files
• Column names scoped like so:
– “Family:qualifier”
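
As an illustration (not from the slides), writing and reading one cell scoped by family and qualifier with the 0.20-era client API might look like the following; the table name, row key, and column names are made up:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.*;
  import org.apache.hadoop.hbase.util.Bytes;

  HTable table = new HTable(new HBaseConfiguration(), "mytable");

  // Write one cell into family "default", qualifier "0"
  Put put = new Put(Bytes.toBytes("row1"));
  put.add(Bytes.toBytes("default"), Bytes.toBytes("0"), Bytes.toBytes("value"));
  table.put(put);

  // Read back only that family:qualifier
  Get get = new Get(Bytes.toBytes("row1"));
  get.addColumn(Bytes.toBytes("default"), Bytes.toBytes("0"));
  byte[] value = table.get(get).getValue(Bytes.toBytes("default"), Bytes.toBytes("0"));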
12. Sorting
• Rows stored in byte-lexicographic order (row keys are raw bytes, not just strings)
• Furthermore, within a row, columns are stored in sorted order
• Fast, cheap, and easy to scan adjacent rows & columns
13. Sorting (but there's more!)
• Not just scanning, but can do partial-key lookups
• When combined with compound keys, has the same properties as leading left-edge indexes in a standard RDBMS
– (Except your index is distributed, of course)
• Can use a second table to index a primary table.
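
A sketch of such a partial-key lookup (not from the slides): with a compound row key that leads with a user id, every row for that user comes back from one range scan, exactly like hitting the leading edge of a compound index. The key layout here is hypothetical, and the stop-row trick assumes the last prefix byte is not 0xFF:

  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  // Compound key: "<userId>/<timestamp>", so one user's rows sort together.
  byte[] prefix = Bytes.toBytes("user42/");

  // Increment the last byte of the prefix to get the half-open range
  // [prefix, prefix') covering every key that starts with the prefix.
  byte[] stopRow = prefix.clone();
  stopRow[stopRow.length - 1]++;

  Scan scan = new Scan(prefix, stopRow).addFamily(Bytes.toBytes("family"));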
16. API Example
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

// 0.20-era client API; Entity and dser are the presenter's own
// application classes (a Thrift-style entity and deserializer).
Scan scan = new Scan(startRow, endRow).addFamily(Bytes.toBytes("family"));
ResultScanner scanner = table.getScanner(scan);
Result result;
while ((result = scanner.next()) != null) {
  Entity e = new Entity();
  dser.deserialize(e, result.getValue(Bytes.toBytes("default"), Bytes.toBytes("0")));
}
scanner.close();
17. Why HBase?
• Community is highly active, diverse, helpful
• User list email activity for May: 78 threads
• IRC channel #hbase highly active
• Helpful people in multiple timezones; email answered all hours of the day/night/weekend
18. Why HBase?
• Committer & contributor base is broad:
– PSet, Streamy, SU, Trend Micro, Openplaces, and more!
• No monopoly on experts – deep knowledge at these companies and more!
• (We're really friendly… honest!)
24. HBase 0.20 vs 0.19
• Master – 0.19: single master; if it fails, so does the cluster. 0.20: master election and membership via ZK
• Compression – 0.19: not really. 0.20: GZ, LZO
• Memory usage – 0.19: small values cause big indexes and OOM. 0.20: new file format limits index size (800kB for 10m entries)
• Scan speed – 0.19: 300–600ms per 500 rows. 0.20: 20–30ms per 500 rows
27. Performance
• Significant performance gains in 0.20
• New file format with zero-copy infrastructure
• Scan and get improvements
• LZO compression
• Block caching
• Speed increases as much as 30x!
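
A minimal sketch (not from the deck) of opting a column family into block caching with the 0.20-era admin API; the table and family names are made up, and exact method names varied slightly across releases:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.util.Bytes;

  HTableDescriptor desc = new HTableDescriptor("mytable");
  HColumnDescriptor family = new HColumnDescriptor(Bytes.toBytes("default"));
  family.setBlockCacheEnabled(true);  // serve hot HFile blocks from the block cache
  desc.addFamily(family);
  new HBaseAdmin(new HBaseConfiguration()).createTable(desc);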
29. Performance Numbers
• 1m rows, 1 column per row (~16 bytes)
– Sequential insert: 24s, 0.024ms/row
– Random read: 1.42ms/row (avg)
– Full scan: 11s (117ms per 10k rows)
• Performance from cache is very high:
– 1ms to get a single row
– 20ms to read 550 rows
– 75ms to get 5500 rows
31. Big accomplishments @ SU
• Over 9b small rows in a single table
– Sustained import performance: 3–4 days to import 9b rows (the MySQL source was the limiting factor)
• 1.2m row reads/sec on 19 nodes (!!)
– That is 60–100k reads/sec/node, sustained over 2hrs
– Scalable with more nodes
– HBase has been improved since then
34. HBase deployment trivia
• Nodes are 8x16 w/2TB (best price point)
– Don't use RAID1; use RAID0 or JBOD
• Ganglia allows overall cluster performance monitoring
• Clusters won't span datacenters
– We want fully duplicated data for DR anyway
• Update master with code & config
– rsync to other nodes (1 dir, very easy)
– Controlled restart for rolling upgrade
35. HBase deployment trivia
• HDFS: set the xciever limit (dfs.datanode.max.xcievers) to 2048, Xmx2000m
– Never get HDFS problems, even under heavy load
• For the 9b-row import, randomized key insert order gives a substantial speedup
• Give HBase enough RAM – you wouldn't starve MySQL!
• Import speeds of 200k ops/sec on 19 machines are possible!
– Hard to provide a SQL-based source fast enough
– 100k ops/sec typical for sustained imports
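
One minimal way to get that randomized insert order (a sketch, not StumbleUpon's importer; rowKeys and fetchValue are hypothetical stand-ins for the SQL source): shuffle the keys up front so writes spread across many regions instead of hammering whichever region hosts the range currently being imported:

  import java.util.Collections;
  import java.util.List;

  Collections.shuffle(rowKeys);  // rowKeys: List<byte[]> of keys from the source
  for (byte[] key : rowKeys) {
    Put put = new Put(key);
    // fetchValue is a hypothetical lookup against the SQL source
    put.add(Bytes.toBytes("default"), Bytes.toBytes("0"), fetchValue(key));
    table.put(put);
  }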
37. HBase future @ SU
• Latency-sensitive cluster
• Batch/analytics cluster
• Use replication to keep the latter up to date
• Allows batch jobs to go full throttle against reasonably up-to-date data without risking the website