This document discusses various computing concepts related to resources, data storage, and performance. It covers topics like hard disk drives, solid state drives, areal storage density, streams, filters, memory management, CPU performance, networking, and best practices for handling large amounts of data and potential failures. The key ideas are to use appropriate data structures, iterate/process data lazily, offload work to queues when possible, and design systems with failure in mind.
3. Data Storage
• Hard Disk Drive - HDD
• Magnetizes a thin film of ferromagnetic material on a disk
• Reads it with a magnetic head on an actuator arm
• Solid State Drive – SSD
• Uses integrated circuit assemblies as memory to store data persistently
• No moving parts
4. Areal Storage Density
• SSD
• 2.8 Tbit/in2
• HDD
• 1.5 Tbit/in2
Terabits per square inch – figures as of 2016 (see Wikipedia; storage materials keep improving)
6. Streams: Computing Concept
Definitions
• Idea originating in the 1950s
• Standard way to get input and output
• A source or sink of data
Who uses them
• C – stdin, stderr, stdout
• C++ – iostream
• Perl – IO
• Python – io
• Java
• C#
7. What is a Stream?
• Access input and output generically
• Can write and read linearly
• May or may not be seekable
• Comes in chunks of data
8. Why do I care about streams?
• They are created to handle massive amounts of data
• Assume all files are too large to load into memory
• If this means checking size before load, do it
• If this means always treating a file as very large, do it
• PHP streams were meant for this!
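A minimal sketch of that chunked approach, assuming a hypothetical huge.csv: read one row at a time with fgetcsv() instead of slurping the whole file with file_get_contents().

```php
<?php
// Sketch: stream a large CSV row by row instead of loading it whole.
// "huge.csv" and the summed column are assumptions for illustration.
$fh = fopen('huge.csv', 'rb');
if ($fh === false) {
    exit("Could not open file\n");
}

$total = 0;
while (($row = fgetcsv($fh)) !== false) {
    $total += (int) ($row[2] ?? 0);  // e.g. sum the third column
}
fclose($fh);

echo "Total: $total\n";
```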
15. What are Filters?
• Performs operations on stream data
• Can be prepended or appended (even on the fly)
• Can be attached to read or write
• When a filter is added for read and write, two instances of the filter are created.
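A minimal sketch of attaching one of PHP's built-in filters (string.toupper) to the read side of a stream; the data is transformed chunk by chunk as it is read.

```php
<?php
// Sketch: transform data on the fly with a built-in stream filter.
$fh = fopen('php://temp', 'r+');
fwrite($fh, "streams are neat\n");
rewind($fh);

// Filters can be appended (or prepended) at any time, even mid-read.
stream_filter_append($fh, 'string.toupper', STREAM_FILTER_READ);

echo fread($fh, 8192);  // STREAMS ARE NEAT
fclose($fh);
```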
17. Things to watch for!
• Data has an input and output state
• When reading in chunks, you may need to cache in between reads to make filters useful
• Use the right tool for the job
18. Throw away your assumptions, except for this one:
There will be terabytes of cat GIFs!!
20. Random Access Memory (RAM)
• The CPU uses RAM to work
• It randomly shoves data inside and pulls data back out
• RAM is faster than SSD and HDD
• It’s also more expensive
22. There are two reasons you’ll see that error
• Recursion recursion recursion recursion
• Solution: install Xdebug and get your stack trace
• Loading too much data into memory
• Solution: manage your memory
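A minimal sketch of "manage your memory": the same million values walked once as a materialized array and once through a generator, with memory_get_usage() showing the difference.

```php
<?php
// Sketch: generators keep the footprint flat; arrays materialize it all.
function lazyRange(int $n): Generator
{
    for ($i = 0; $i < $n; $i++) {
        yield $i;
    }
}

$before = memory_get_usage();
$all = range(0, 999999);               // one million ints, all in memory
printf("array:     %d bytes\n", memory_get_usage() - $before);
unset($all);

$before = memory_get_usage();
$sum = 0;
foreach (lazyRange(1000000) as $i) {   // one value at a time
    $sum += $i;
}
printf("generator: %d bytes\n", memory_get_usage() - $before);
```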
23. PHP inherently hides this problem
• Share nothing architecture
• Extensions with C libraries that hide memory consumption
• FastCGI/CGI blows away processes, restoring memory
• Max child and other Apache settings blow away children, restoring memory
26. Arrays are evil
• There are other ways to store data that are more efficient
• They should be used for small amounts of data
• No matter how hard you try, there is C overhead
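One of those "other ways", as a minimal sketch: SplFixedArray gives up the flexibility of a PHP array in exchange for a smaller C-level footprint on large integer-indexed sets.

```php
<?php
// Sketch: compare the footprint of a plain array vs SplFixedArray.
$n = 100000;

$before = memory_get_usage();
$plain = [];
for ($i = 0; $i < $n; $i++) {
    $plain[$i] = $i;
}
printf("array:         %d bytes\n", memory_get_usage() - $before);
unset($plain);

$before = memory_get_usage();
$fixed = new SplFixedArray($n);
for ($i = 0; $i < $n; $i++) {
    $fixed[$i] = $i;
}
printf("SplFixedArray: %d bytes\n", memory_get_usage() - $before);
```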
27. Process with the appropriate tools
• Load data into the appropriate place for processing
• Hint – arrays are IN MEMORY – that is generally not an appropriate place for processing
• Datastores are meant for storing and retrieving data, use them
29. Use the iteration, Luke
• Lazy fetching exists for database results – use it!
• Always page (window) your result sets from the database – ALWAYS
• Use filters or generators to format or alter results on the fly
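A minimal sketch of paged, lazy fetching with PDO; the DSN, credentials, table name and process() handler are assumptions for illustration.

```php
<?php
// Sketch: page ("window") results and iterate the statement lazily
// instead of fetchAll()-ing everything into one giant array.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');

$pageSize = 500;
for ($offset = 0; ; $offset += $pageSize) {
    $stmt = $pdo->prepare('SELECT id, payload FROM items LIMIT :lim OFFSET :off');
    $stmt->bindValue(':lim', $pageSize, PDO::PARAM_INT);
    $stmt->bindValue(':off', $offset, PDO::PARAM_INT);
    $stmt->execute();

    $count = 0;
    foreach ($stmt as $row) {  // one row at a time, not fetchAll()
        process($row);         // hypothetical row handler
        $count++;
    }
    if ($count < $pageSize) {
        break;                 // last page reached
    }
}
```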
30. The N+1 problem
• In simple terms, nested loops
• Don’t distance yourself too much from your datastore
• Collapse into one or two queries instead
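A minimal sketch of that collapse, with assumed posts/users tables: gather the foreign keys from the first result and resolve them in a single IN() query instead of one query per row.

```php
<?php
// Sketch: collapse an N+1 loop (1 posts query + N author queries)
// into exactly two queries. Table and column names are assumptions,
// and we assume at least one post exists.
$pdo   = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
$posts = $pdo->query('SELECT id, user_id, title FROM posts')->fetchAll();

$ids  = array_values(array_unique(array_column($posts, 'user_id')));
$in   = implode(',', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare("SELECT id, name FROM users WHERE id IN ($in)");
$stmt->execute($ids);

$authors = [];
foreach ($stmt as $user) {
    $authors[$user['id']] = $user;        // index by id for O(1) lookup
}
foreach ($posts as &$post) {
    $post['author'] = $authors[$post['user_id']];
}
unset($post);                             // break the last reference
```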
36. What does this have to do with PHP?
• You are limited by the CPU your site is deployed upon.
• Yes even in a cloud – there are still physical systems running your stuff
• Yes even in a VM – there are still physical systems running your stuff
• Follow good programming habits
• PROFILE
37. Good programming habits
• Turn on opcache in production!
• Keep your code error AND WARNING free
• Watch complex logic in loops
• Short circuit the loop
• Rewrite to do the logic on the entire set in one step
• Calculate values only once
• On small arrays use array_walk
• On large arrays use generators/iterators
• Use isset instead of in_array if possible
• Profile to find the place to rewrite for slow code issues
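A minimal sketch of two of these habits together: compute loop invariants once, and swap in_array() for an isset() lookup on a flipped map.

```php
<?php
// Sketch: hoist invariant work out of the loop; use a hash lookup.
$words   = ['alpha', 'beta', 'gamma', 'delta'];
$allowed = ['beta', 'delta'];

// Slow: count() and a linear in_array() scan on every iteration.
for ($i = 0; $i < count($words); $i++) {
    if (in_array($words[$i], $allowed)) { /* ... */ }
}

// Faster: count once, flip the list so membership is an O(1) isset().
$n   = count($words);
$map = array_flip($allowed);
for ($i = 0; $i < $n; $i++) {
    if (isset($map[$words[$i]])) { /* ... */ }
}
```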
39. Distribute the load
• Perfect for heavy processing of some types of data
• Queue code that requires heavy processing but not immediate viewing
• Design your UX so you can inform users of completed jobs
• Cache complex work items
40. Pick your system
• php-resque
• Gearman
• Beanstalkd
• IronMQ
• RabbitMQ
• ZeroMQ
• Amazon SQS
• Just visit http://queues.io
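To show how little ceremony a queue needs, here is a minimal sketch that speaks Beanstalkd's plain-text protocol over a raw PHP stream (host, tube name and payload are assumptions; in practice you would use a client library):

```php
<?php
// Sketch: enqueue a job on Beanstalkd with nothing but a stream.
$conn = stream_socket_client('tcp://127.0.0.1:11300', $errno, $errstr, 5);
if ($conn === false) {
    exit("Queue unreachable: $errstr\n");
}

$payload = json_encode(['task' => 'resize', 'image' => 42]);

fwrite($conn, "use images\r\n");                      // pick a tube
fgets($conn);                                         // "USING images"
fwrite($conn, sprintf("put 1024 0 60 %d\r\n%s\r\n",   // pri delay ttr bytes
    strlen($payload), $payload));
echo fgets($conn);                                    // "INSERTED <id>"
fclose($conn);
```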
44. Networking 101
• IP – forwards packets of data based on a destination address
• TCP – verifies the correct delivery of data from client to server, with correction for errors and lost data
• Network Sockets – subroutines that provide TCP/IP (and UDP, and some other support) on most systems
46. Speed in the series of tubes
• Bandwidth – size of your pipe
• Latency – length of your pipe including size changes
• Jitter – air bubbles in your pipe
48. Definitions
• Socket
• Bidirectional network stream that speaks a protocol
• Transport
• Tells a network stream how to communicate
• Wrapper
• Tells a stream how to handle specific protocols and encodings
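A minimal sketch that makes the distinction concrete: the tcp:// transport gives you a raw socket where you speak the protocol yourself, while the http:// wrapper speaks it for you.

```php
<?php
// Sketch: transport vs wrapper.
// Transport: a raw bidirectional socket; we speak HTTP by hand.
$sock = stream_socket_client('tcp://example.com:80', $errno, $errstr, 5);
fwrite($sock, "HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n");
echo fgets($sock);  // status line, e.g. "HTTP/1.0 200 OK"
fclose($sock);

// Wrapper: the http:// wrapper handles the protocol for us.
echo file_get_contents('http://example.com/');
```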
50. What does this have to do with PHP?
• APIs fail
• APIs go bye-bye
• AWS goes down
• Or loses network connection to a specific area
• Or otherwise fails
52. Prepare for failure
• Handle timeouts
• Handle failures
• Abstract enough to replace systems if necessary, but only as much as necessary
• If you’re not paying for it, don’t base your business model on it
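A minimal sketch of handling a timeout plus a fallback; the API URL and cache path are placeholders, not real endpoints.

```php
<?php
// Sketch: a remote call with a timeout and a cached fallback,
// so a dead API degrades one feature instead of the whole page.
$ctx = stream_context_create([
    'http' => ['method' => 'GET', 'timeout' => 3.0],
]);

$body = @file_get_contents('http://api.example.com/v1/rates', false, $ctx);
if ($body !== false) {
    file_put_contents('/tmp/rates.cache.json', $body);  // refresh cache
} else {
    $body = @file_get_contents('/tmp/rates.cache.json') // stale copy
         ?: '{"rates":{}}';                             // last resort
}
$rates = json_decode($body, true);
```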
53. Checklist
• Cultivate good coding habits
• Try not to loop logic or processing
• Don’t be afraid to offload work to other systems or services
• Assume every file is huge
• Assume there are 1 million rows in your DB table
• Assume that every network request is slow or going to fail
• Profile to find code bottlenecks, DON’T assume you know the bottleneck
• Wrap 3rd-party tools enough to deal with downtime or retirement of APIs
No matter how many virtual machines you throw at a problem, you always have the physical limitations of hardware. Memory, CPU, and even your NIC's throughput have finite limits. Are you trying to load that 5 GB CSV into memory to process it? No, really, you shouldn't! PHP has many built-in features for dealing with data in more efficient ways than pumping everything into an array or object. Using PHP's stream and stream-filtering mechanisms you can work with chunked data in an efficient manner, and with sockets and processes you can farm out work efficiently while still keeping track of what your application is doing. These features can help with memory, CPU, and other physical system limitations, helping you scale without the giant AWS bill.
The first physical concept we'll talk about is mass: how much matter is in a thing.
In physics, mass is a property of a physical body. It is the measure of an object's resistance to acceleration (a change in its state of motion) when a force is applied. It also determines the strength of its mutual gravitational attraction to other bodies.
Mass is not the same as weight, even though we often calculate an object's mass by measuring its weight with a spring scale, rather than comparing it directly with known masses. An object on the Moon would weigh less than it does on Earth because of the lower gravity, but it would still have the same mass. This is because weight is a force, while mass is the property that (along with gravity) determines the strength of this force.
NAND sacrifices the random-access and execute-in-place advantages of NOR. NAND is best suited to systems requiring high capacity data storage. It offers higher densities, larger capacities, and lower cost. It has faster erases, sequential writes, and sequential reads.
HDDs
• Enthusiast multimedia users and heavy downloaders: video collectors need space, and you can only get to 4 TB of space cheaply with hard drives.
• Budget buyers: ditto. Plenty of cheap space. SSDs are too expensive for $500 PC buyers.
• Graphic arts and engineering professionals: video and photo editors wear out storage by overuse. Replacing a 1 TB hard drive will be cheaper than replacing a 500 GB SSD.
The maximum areal storage density for flash memory used in SSDs is 2.8 Tbit/in2 in laboratory demonstrations as of 2016, and the maximum for HDDs is 1.5 Tbit/in2. The areal density of flash memory is doubling every two years, similar to Moore's law (40% per year) and faster than the 10–20% per year for HDDs. As of 2016, maximum capacity was 10 terabytes for an HDD and 15 terabytes for an SSD. HDDs were used in 70% of the desktop and notebook computers produced in 2016, and SSDs were used in 30%. The usage share of HDDs is declining and could drop below 50% in 2018–2019 according to one forecast, because SSDs are replacing smaller-capacity (less than one-terabyte) HDDs in desktop and notebook computers and MP3 players.
Areal density is a measure of the quantity of information bits that can be stored on a given length of track, area of surface, or in a given volume of a computer storage medium. Generally, higher density is more desirable, for it allows greater volumes of data to be stored in the same physical space. Density therefore has a direct relationship to storage capacity of a given medium. Density also generally has a fairly direct effect on the performance within a particular medium, as well as price.
Story about me, my VM drive, and the bad blocks going bad
Quick computer science lesson
Originally done with magic numbers in Fortran; C and Unix standardized the way it worked
On Unix and related systems based on the C programming language, a stream is a source or sink of data, usually individual bytes or characters. Streams are an abstraction used when reading or writing files, or communicating over network sockets. The standard streams are three streams made available to all programs.
Who else uses them? Most languages descended from C have the “files as streams” concept and ways to extend the I/O functionality beyond merely files, which allows them all to be merged together
Great way to standardize the way data is grabbed and used
Questions on who has used streams in other languages
Streams are a huge underlying component of PHP
Streams were introduced with PHP 4.3.0 – they are old, but underuse means they can have rough edges… so TEST TEST TEST
But they are more powerful than almost anything else you can use
Why is this better ?
Lots and lots of data in small chunks lets you do large volumes without maxing out memory and CPU
So this is a very common problem in PHP scripts: PHP bombing out because a file_get_contents call loaded something too big into memory
Although file_get_contents is pretty great, using it without a size check is deadly (well, unless you totally control the file) – see the sketch below
Treat file contents as user data, just like that POST data you just got from a user
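To make that concrete, here’s a minimal sketch of guarding a file_get_contents call with a size check (the path and the 10 MB cap are made up for illustration):

<?php
// A minimal sketch: refuse to slurp a file over an arbitrary
// size threshold before handing it to file_get_contents.
$path     = 'upload.dat';          // illustrative path
$maxBytes = 10 * 1024 * 1024;      // arbitrary 10 MB cap

if (!is_file($path) || filesize($path) > $maxBytes) {
    throw new RuntimeException('File missing or too large to load into memory');
}
$data = file_get_contents($path);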
Let’s talk about how space on disk is important
Talk about a very early experiment writing a chat room for my personal website… using a text file that I concatenated and read
Hey, this was 1998 and it was running php-nuke ;)
But that rapidly changed to a gigabyte file when my friends tested it all night long
Any good extension will use the underlying streams API to let you use any kind of stream
for example, cairo does this
Stuff to work with PHP streams is spread across at least two portions of the manual, plus appendices for the built-in transports/filters/context options. It’s very poorly arranged, so be sure to take the time to learn where to look in the manual – there should be three main places
What doesn’t use streams? chmod, touch and some other very file-specific functionality, lazy/bad extensions, and extensions with issues in the libraries they wrap around
All input and output comes into PHP
It gets pushed through a streams filter
Then through the streams wrapper
During this point the stream context is available for the filter and wrapper to use
Streams themselves are the “objects” coming in
Wrappers are the “classes” defining how to deal with the stream
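To make the stream/wrapper/context relationship concrete, here’s a minimal sketch (the URL and options are illustrative, not from the slides): the http:// wrapper defines how to fetch, and the context carries per-stream options for it.

<?php
// The context holds options the wrapper consults when opening the stream.
$context = stream_context_create([
    'http' => [
        'method'  => 'GET',
        'header'  => "Accept: text/html\r\n",
        'timeout' => 5,
    ],
]);

// The http:// wrapper interprets the context while producing the stream.
$stream = fopen('http://example.com/', 'r', false, $context);
if ($stream !== false) {
    // Read in chunks rather than slurping everything at once.
    while (!feof($stream)) {
        echo fread($stream, 8192);
    }
    fclose($stream);
}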
Some notes – file_get_contents and its cousin stream_get_contents are your fastest, most efficient way if you need the whole file
file() is going to be the best way to get the whole file split by lines
Both are going to stick the whole file into memory at some point.
For very large files and to help with memory consumption, the use of fgets and fread will help
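For example, a minimal sketch of the fgets approach (the filename is illustrative) – only one line lives in memory at a time, no matter how big the file is:

<?php
$fh = fopen('huge.csv', 'r');
if ($fh === false) {
    die("Could not open file\n");
}

$lineCount = 0;
while (($line = fgets($fh)) !== false) {
    // Only the current line is in memory.
    $fields = str_getcsv($line);
    $lineCount++;
}
fclose($fh);
echo "Processed $lineCount lines\n";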
You don’t even have to load all the data in to work on it with PHP! You can do everything on the fly in chunks
That’s the magic of filtering
A filter is a final piece of code which may perform operations on data as it is being read from or written to a stream. Any number of filters may be stacked onto a stream. Custom filters can be defined in a PHP script using stream_filter_register() or in an extension using the API Reference in Working with streams. To access the list of currently registered filters, use stream_get_filters().
Stream data is read from resources (both local and remote) in chunks, with any unconsumed data kept in internal buffers. When a new filter is prepended to a stream, data in the internal buffers, which has already been processed through other filters will not be reprocessed through the new filter at that time. This differs from the behavior of stream_filter_append().
Filters are nice for manipulating data on the fly – but remember you’ll be getting data in chunks, so your filter needs to be smart enough to handle that
Filters can be appended or prepended – and attached to READ or WRITE
Notice that stream_filter_prepend and append are smart – if you opened with the r flag, by default it’ll attach to read, if you opened with the w flag, it will attach to write
Note: When a filter is added for read and write, two instances of the filter are created. stream_filter_prepend() must be called twice with STREAM_FILTER_READ and STREAM_FILTER_WRITE to get both filter resources.
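Here’s a minimal sketch of a custom filter (the class and filter names are made up for illustration): it uppercases whatever flows through the stream, bucket by bucket.

<?php
class UppercaseFilter extends php_user_filter
{
    public function filter($in, $out, &$consumed, $closing): int
    {
        // Data arrives in buckets (chunks), never all at once.
        while ($bucket = stream_bucket_make_writeable($in)) {
            $bucket->data = strtoupper($bucket->data);
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}

stream_filter_register('demo.toupper', 'UppercaseFilter');

$fh = fopen('php://temp', 'r+');
stream_filter_append($fh, 'demo.toupper', STREAM_FILTER_WRITE);
fwrite($fh, "hello streams\n");   // stored as "HELLO STREAMS\n"
rewind($fh);
echo stream_get_contents($fh);
fclose($fh);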
Well, it may look like manipulating data in a variable is preferable to the above, but the above is just a simple example. Once you add a filter to a stream, it basically hides all the implementation details from the user. You will be unaware of the data being manipulated in the stream.
And also the same filter can be used with any stream (files, urls, various protocols etc.) without any changes to the underlying code.
Also multiple filters can be chained together, so that the output of one can be the input of another.
The filters need an input state and an output state. And they need to respect the fact that the number of requested bytes does not necessarily mean reading the same amount of data on the other end. In fact, the output side generally does not know whether less, the same amount, or more input is to be read. But this can be dealt with inside the filter. However, the filters should always report the number of input bytes versus the number of output bytes independently. Regarding states, we would be interested in whether reaching EOD on the input side means reaching EOD on the output side prior to the requested amount, at the requested amount, or not at all yet (more data available).
Throw away all your old assumptions and make a new one
Trust no one with your file/stream manipulations
Assume that file is a terabyte zip file of cat gifs
Nope, not kidding
Dimension is a neat word because we overload it, like filter
In physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it.
The "memory wall" is the growing disparity of speed between CPU and memory outside the CPU chip. An important reason for this disparity is the limited communication bandwidth beyond chip boundaries, which is also referred to as bandwidth wall. From 1986 to 2000, CPU speed improved at an annual rate of 55% while memory speed only improved at 10%. Given these trends, it was expected that memory latency would become an overwhelming bottleneck in computer performance
Uploading items kept failing
Realized the issue was the sheer amount of data being synced, because the system had waited all evening while the wifi was out
A lot of people don’t realize that you as a developer are responsible for managing the amount of memory consumed by PHP
No one wants to hear that but it’s true
PHP’s inherent characteristics are hiding this issue with memory
It’s very hard to duplicate very large scale issues in testing, which is often why this stuff isn’t caught until it’s time to deploy
SO, I had a system that worked locally on a client and then had a nightly upload
The upload itself was working properly – saving the data and processing it appropriately
But the RETURN was failing and screwing up the clients, because the return package was simply too large to send down
Two changes were made to the system to allow syncing already saved entries, and we no longer passed back the changed data
Frankly because it wasn’t important
But also because we added windowing to the paging system
In PHP 5.x a whopping 144 bytes per element were required. In PHP 7 the value is down to 36 bytes, or 32 bytes for the packed case, but it’s STILL not the best
So think about 30K items in an array
Story about a cron job adding new items to our database, and no windowing functionality in sight
Nested loops are of the devil
Assume the data is always big
Because if it’s not now, some day it will be
Tests usually miss this, so just make a good habit
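One way to build that habit is to window your data with a generator, so only one batch is in memory at a time. This is a minimal sketch, assuming a PDO connection and an items table with id/payload columns (all illustrative):

<?php
function fetchInWindows(PDO $db, int $window = 500): Generator
{
    $offset = 0;
    do {
        $stmt = $db->prepare(
            'SELECT id, payload FROM items ORDER BY id LIMIT :lim OFFSET :off'
        );
        $stmt->bindValue(':lim', $window, PDO::PARAM_INT);
        $stmt->bindValue(':off', $offset, PDO::PARAM_INT);
        $stmt->execute();
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
        foreach ($rows as $row) {
            yield $row; // the caller sees one row at a time
        }
        $offset += $window;
    } while (count($rows) === $window);
}

$db = new PDO('sqlite:items.db'); // illustrative DSN
foreach (fetchInWindows($db) as $row) {
    // process one row; memory stays flat no matter the table size
}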
So the first thing you always want to think about in your PHP application is speed right? After all PHP is soooo slow, computers are so slow…
In everyday use and in kinematics, the speed of an object is the magnitude of its velocity (the rate of change of its position); it is thus a scalar quantity.
The fastest possible speed at which energy or information can travel, according to special relativity, is the speed of light in a vacuum c = 299,792,458 metres per second (approximately 1,079,000,000 km/h or 671,000,000 mph). Matter cannot quite reach the speed of light, as this would require an infinite amount of energy. In relativity physics, the concept of rapidity replaces the classical idea of speed.
A microprocessor -- also known as a CPU or central processing unit -- is a complete computation engine that is fabricated on a single chip.
Using its ALU (Arithmetic/Logic Unit), a microprocessor can perform mathematical operations like addition, subtraction, multiplication and division. Modern microprocessors contain complete floating point processors that can perform extremely sophisticated operations on large floating point numbers.
A microprocessor can move data from one memory location to another.
A microprocessor can make decisions and jump to a new set of instructions based on those decisions.
Transmission delays occur in the wires that connect things together on a chip. The "wires" on a chip are incredibly small aluminum or copper strips etched onto the silicon. A chip is nothing more than a collection of transistors and wires that hook them together, and a transistor is nothing but an on/off switch. When a switch changes its state from on to off or off to on, it has to either charge up or drain the wire that connects the transistor to the next transistor down the line. Imagine that a transistor is currently "on." The wire it is driving is filled with electrons. When the switch changes to "off," it has to drain off those electrons, and that takes time. The bigger the wire, the longer it takes.
As the size of the wires has gotten smaller over the years, the time required to change states has gotten smaller, too. But there is some limit -- charging and draining the wires takes time. That limit imposes a speed limit on the chip.
There is also a minimum amount of time that a transistor takes to flip states. Transistors are chained together in strings, so the transistor delays add up. On a complex chip like the G5, there are likely to be longer chains, and the length of the longest chain limits the maximum speed of the entire chip.
Finally, there is heat. Every time the transistors in a gate change state, they leak a little electricity. This electricity creates heat. As transistor sizes shrink, the amount of wasted current (and therefore heat) has declined, but there is still heat being created. The faster a chip goes, the more heat it generates. Heat build-up puts another limit on speed.
Processor speeds, or overall processing power for computers, will double every two years (the popular phrasing of Moore’s law)
Overclocking and burning a chip to death story
Talk about my experiment with overclocking my Athlon
Was my first custom built computer, I thought I was cool beans because I figured out how to flash it, Athlons were the new shiny
I overclocked the crap out of it
And… it caught on fire (partially because I spread the thermal paste too thin, but also because I overclocked it too much)
The smell of a burning processor is something I will not forget, and will never do again!
Profile (don’t guess) if you need to improve speed beyond good habits
These are just a few “good habits” to cultivate when coding
None of them SHOULD be new, and these are often considered micro-optimizations
It’s probably not worth rewriting your code just for these, but it IS worth cultivating them as a natural part of your coding style
It just takes practice
I’m sure there are more!
There is nothing wrong with offloading work
PHP scales VERY well horizontally, and often pretty cheaply as well
Spin up a dedicated box for jobs
If you have scaling in place, you can spin up two during heavy load times!
Often it’s reports, or generating files or images
It’s not realistic to expect complex reports to be done in seconds – physics applies here too – and good UX will mask your offloading
is a good way to balance offloaded work with immediate results
I’m not going to go into a lot of detail here, because what you eventually pick for jobs/queuing is going to be specific to your needs
Queues.io is actually a really nice resource with lots of different queue types for many different languages
Story about the render system, and how good choices here (queueing, triggering one job from another, etc) made even huge file generation just work
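What the queue looks like will vary, but the shape is always the same: the web request drops a small job description somewhere and returns immediately, and a worker picks jobs up later. Here is a minimal, dependency-free sketch using a spool directory (paths and payload are illustrative; a real system would use one of the queues from queues.io):

<?php
// Producer side: enqueue a job and return to the user right away.
function enqueue(array $job, string $spool = '/tmp/jobs'): void
{
    if (!is_dir($spool)) {
        mkdir($spool, 0777, true);
    }
    file_put_contents(
        $spool . '/' . uniqid('job_', true) . '.json',
        json_encode($job)
    );
}

// Worker side: run from cron or a long-lived process on a dedicated box.
function work(string $spool = '/tmp/jobs'): void
{
    foreach (glob($spool . '/*.json') ?: [] as $file) {
        $job = json_decode(file_get_contents($file), true);
        // ... generate the report/file/image here ...
        unlink($file); // remove the job once it's done
    }
}

enqueue(['type' => 'report', 'user' => 42]);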
Velocity is a physical vector quantity; both magnitude and direction are needed to define it. The scalar absolute value (magnitude) of velocity is called "speed", a coherent derived unit measured in the SI (metric) system as metres per second (m/s or m⋅s⁻¹). For example, "5 metres per second" is a scalar, whereas "5 metres per second east" is a vector.
Speed describes only how fast an object is moving, whereas velocity gives both how fast and in what direction the object is moving.
As with all other communications protocols, TCP/IP is composed of layers:
IP - is responsible for moving packets of data from node to node. IP forwards each packet based on a four-byte destination address (the IP number). The Internet authorities assign ranges of numbers to different organizations. The organizations assign groups of their numbers to departments. IP operates on gateway machines that move data from department to organization to region and then around the world.
TCP - is responsible for verifying the correct delivery of data from client to server. Data can be lost in the intermediate network. TCP adds support to detect errors or lost data and to trigger retransmission until the data is correctly and completely received.
Sockets - is a name given to the package of subroutines that provide access to TCP/IP on most systems.
So your application level is the basic data you want to send
in most HTTP applications this is your HTTP page INCLUDING the headers section
the transport is how you’re sending it – UDP and TCP are the most popular
the Internet layer is the “IP” layer – with the header telling the system which address (IP) to send the data to and which port to talk to
then you get a frame header and footer around the actual piece of data – the packet – being sent
This is a VERY simplified analogy, but for the basic idea – think of the internet as water flowing through pipes at a constant pressure (data is electricity, so it travels close to the speed of light)
bigger and better pipes can handle more, you can get air bubbles in the pipes, and no matter what you did, if the pipe is longer it will take longer
There are different socket types you can use; a lot of people use TCP and HTTP because they’re known protocols
What is streamable behavior? We’ll get to that in a bit
Protocol: a set of rules used by computers to communicate with each other across a network
Resource: A resource is a special variable, holding a reference to an external resource
Talk about resources in PHP and talk about general protocols; get a list from the audience of protocols they can name (yes, HTTP is a protocol)
A socket is a special type of stream – pound this into their heads
A socket is an endpoint of communication to which a name can be bound. A socket has a type and one associated process. Sockets were designed to implement the client-server model for interprocess communication where:
In PHP, a wrapper ties the stream to the transport – so your http wrapper ties your PHP data to the HTTP transport and tells it how to behave when reading and writing data
By default sockets are going to assume TCP, since that’s a pretty standard way of doing things. Notice that we have to do things the old-fashioned way just for this simple HTTP request – sticking our headers together, making sure stuff gets closed (a sketch follows below). However, if you can’t use allow_url_fopen, this is a way around it
a dirty dirty way but – there you have it
remember allow_url_fopen only stops “drive-by” hacking
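Here’s a minimal sketch of that old-fashioned way – a raw HTTP GET over a TCP socket (the host and path are illustrative):

<?php
$fp = stream_socket_client('tcp://example.com:80', $errno, $errstr, 30);
if ($fp === false) {
    die("Connection failed: $errstr ($errno)\n");
}

// Stick the request headers together by hand.
$request = "GET / HTTP/1.1\r\n"
         . "Host: example.com\r\n"
         . "Connection: close\r\n"
         . "\r\n";
fwrite($fp, $request);

// Read the response in chunks instead of all at once.
while (!feof($fp)) {
    echo fread($fp, 8192);
}
fclose($fp); // make sure stuff gets closed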
Docker and s3 and how abstracting stuff out kept me sane
Also talk about how error handling fits in
Your checklist for not running out of PHP memory when your code runs
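While working through that checklist, it helps to actually watch the numbers; a tiny sketch (not from the slides) comparing usage against the configured limit:

<?php
printf("limit:   %s\n", ini_get('memory_limit'));
printf("current: %.2f MB\n", memory_get_usage(true) / 1048576);
printf("peak:    %.2f MB\n", memory_get_peak_usage(true) / 1048576);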
There is SOOO much more you can do from hooking objects to hooking the engine!