To set up a Django virtual environment on Windows, install pip and virtualenv, create and activate a virtualenv, install Django with pip, and optionally install database drivers such as MySQL-python or psycopg (for example with easy_install, pointing it at the package URL). Once complete, the base environment is ready for Django development.
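A rough sketch of those steps in a Windows shell (assuming Python and pip are already on PATH; the environment name and driver URL are placeholders):
pip install virtualenv
virtualenv myenv
myenv\Scripts\activate
# Install Django inside the isolated environment
pip install Django
# Optionally add a database driver from a package URL
easy_install <driver-package-url>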
This document provides instructions for getting started with Docker and Docker Compose. It explains how to install Docker and Docker Compose, basic Docker commands like running containers and viewing logs, mapping ports, and using Docker Compose to define and run multi-container applications.
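A few representative commands for those basics (the image, container name, and ports are illustrative):
# Run a container in the background, mapping host port 8080 to container port 80
docker run -d -p 8080:80 --name web nginx
# Follow that container's logs
docker logs -f web
# Bring up a multi-container app defined in docker-compose.yml
docker-compose up -d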
uWSGI - Swiss army knife for your Python web apps (Tomislav Raseta)
uWSGI is a full-stack tool for building hosting services that acts as an application server for Python web apps using the WSGI specification. It provides a pluggable architecture, versatility, high performance, low resource usage, and reliability. Configuration options are extensive and allow for processes, threads, reloading, monitoring and more. Proper configuration and testing is required to optimize performance for production deployments.
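For example, one way to exercise a few of those options from the command line (values are illustrative, not tuned recommendations):
# Master process supervising 4 workers with 2 threads each,
# plus a stats socket for monitoring
uwsgi --http :8080 --wsgi-file app.py --master --processes 4 --threads 2 --stats :9191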
CIbox - OpenSource solution for making your #devops better (Andrii Podanenko)
This document describes an old and new development workflow for code reviews and continuous integration. The old workflow involved directly committing code to a shared master branch and deploying to a development server, while the new workflow uses feature branches, pull requests, and local virtual environments for development. It also introduces CIBox, an open source project that provides tools and automation to implement the new workflow, including provisioning a CI server and setting up initial project files.
JS Lab 2017, March 25, Odessa
Andrey Kucherenko (Lead Software Engineer at EPAM Systems)
Developing multi-package applications: reasons, approaches, risks
The talk covers multi-package application development in JavaScript, based on real-world experience introducing this kind of development on a project.
All materials: http://jslab.in.ua/
Organizers: http://geekslab.org.ua/
Continuous integration (CI) involves developers integrating code into a shared repository multiple times per day, with each check-in being verified by automated builds to detect problems early. Continuous delivery (CD) means software can be released to production at any time. Continuous deployment means every change automatically gets deployed to production, resulting in many releases per day. Apiary uses continuous integration engines (CIEs) that support job definitions in repositories, autoscaling, pipelines, caching, Docker, matrix builds, and distributing jobs across multiple nodes to achieve CI and CD.
The document summarizes the steps taken to install and configure NGINX, PostgreSQL, Python, and uWSGI on a CentOS 7 server. Key steps include:
1) Installing NGINX using yum and enabling it to start automatically at boot;
2) Installing and initializing PostgreSQL, configuring it to allow local connections, and creating a database user;
3) Installing Python 3.7 using pyenv and setting it as the global version;
4) Installing uWSGI and using it to run a simple Python application served over HTTP on port 9090.
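Hedged one-liners for each of those steps (package names, the database user, and the exact Python version are illustrative):
# 1) Install NGINX and enable it at boot
sudo yum install -y epel-release nginx
sudo systemctl enable nginx && sudo systemctl start nginx
# 2) Install and initialize PostgreSQL, then create a database user
sudo yum install -y postgresql-server
sudo postgresql-setup initdb
sudo systemctl enable postgresql && sudo systemctl start postgresql
sudo -u postgres createuser --pwprompt appuser
# 3) Install Python 3.7 with pyenv and make it the global version
pyenv install 3.7.0 && pyenv global 3.7.0
# 4) Install uWSGI and serve a simple app over HTTP on port 9090
pip install uwsgi
uwsgi --http :9090 --wsgi-file app.py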
[Js hcm] Deploying node.js with Forever.js and nginx (Nicolas Embleton)
You have a project in Node.js and you wonder how to make it run on your server? A real "production" server?
You have an application and you want to ensure that it runs without downtime, that you can update it easily (again without downtime), and that you can scale it across multiple web services behind a load balancer?
We will cover that by using:
- Nginx
- Forever.js
- Node.js
The document discusses Python virtual environments (virtualenv) and the pip package manager. It introduces virtualenv and pip, explains why they are useful tools for isolating Python environments and managing packages, and provides exercises for creating virtual environments, using pip to install/uninstall packages, creating your own pip packages, and sharing packages on PyPI. The goal is to help users understand and learn to use these tools in 90 minutes.
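The commands those exercises revolve around (a minimal sketch; package names are placeholders):
# Create and activate an isolated environment
virtualenv venv
source venv/bin/activate
# Install, freeze, and uninstall packages
pip install requests
pip freeze > requirements.txt
pip uninstall requests
# Build a source distribution of your own package for upload to PyPI
python setup.py sdist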
This document outlines the steps to configure multiple PostgreSQL 9.3 instances on the same server. It describes creating copies of the init files for each instance, configuring the data directories and ports, initializing the data directories, starting the services, and verifying the multiple instances are running on different ports.
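A sketch of that approach (paths, service names, and the second port are illustrative):
# Copy the init script so the second instance gets its own service
sudo cp /etc/init.d/postgresql-9.3 /etc/init.d/postgresql-9.3-b
# Edit PGDATA and PGPORT in the copy, then initialize and start it
sudo service postgresql-9.3-b initdb
sudo service postgresql-9.3-b start
# Verify both instances listen on their own ports (5432 and 5433 here)
ss -ltn | grep -E ':(5432|5433)'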
The document provides step-by-step instructions for installing Postgresql server 9.1 on CentOS 6.5, including: updating repositories, creating a sudo user group, installing Postgresql, starting the server, editing configuration files, connecting locally, configuring the firewall to allow remote connections, and testing connections from other computers.
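Condensed to commands, the flow looks roughly like this (the PGDG package and service names are assumptions for CentOS 6):
sudo yum install -y postgresql91-server
sudo service postgresql-9.1 initdb
sudo service postgresql-9.1 start
# Allow remote connections: set listen_addresses in postgresql.conf,
# add a host line to pg_hba.conf, then open the port in the firewall
sudo iptables -I INPUT -p tcp --dport 5432 -j ACCEPT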
EuroPython 2014 - How we switched our 800+ projects from Apache to uWSGI (Max Tepkeev)
During the last 7 years the company I work for has developed more than 800 projects in PHP and Python. All this time we used Apache+nginx to host these projects. In this talk I will explain why we decided to switch all our projects from Apache+nginx to uWSGI+nginx and how we did it.
This document provides an overview of Ansible, an IT automation tool. It discusses Ansible's features such as being agentless, using SSH, and being idempotent. It also covers installing Ansible, using Ansible modules, writing playbooks in YAML format, managing inventory, and using ad-hoc commands and roles for automation.
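A small sketch of those pieces (the inventory file, host group, and module arguments are illustrative):
# Ad-hoc command: ping every host in the inventory over SSH, no agent needed
ansible all -i hosts -m ping
# Minimal playbook (site.yml) applying a module idempotently
- hosts: webservers
  become: yes
  tasks:
    - name: Ensure nginx is installed
      yum:
        name: nginx
        state: present
# Run it
ansible-playbook -i hosts site.yml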
Slides for the Docker for Java Developers workshop at JavaLand in 2017. It covers building and running containers. It also covers running GUI applications in Docker and using the Docker registry.
Ondřej Procházka - Deployment the Devel.cz Way (Develcz)
This document describes a deployment pipeline for software projects. It lists several project names and their owners. It outlines the steps in the pipeline including using git for version control, installing dependencies with tools like NPM and Composer, building code with tools like Grunt and Gulp, running tests, tagging releases, and deploying code to servers for hosting. The final stages involve deploying the code to production servers, configuring reverse proxies and caching for performance.
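In shell terms, the middle of such a pipeline often reduces to steps like these (tools vary per project; names are placeholders):
git clone <repo-url> app && cd app
# Install dependencies, build, test, then tag the release
npm install && composer install
grunt build
npm test
git tag v1.0.0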
This document summarizes a presentation about the Conan package manager for C and C++ applications. It discusses popular C++ libraries that are commonly needed like Boost and Poco. It shows how Conan can be used to add these libraries as dependencies to a project without needing to build them manually. The document provides an example of using Conan to add Boost and Poco to a demo application that calculates an MD5 hash and validates an email address. It also gives an overview of how Conan works, including its package naming scheme, local caching of packages, and community around sharing packages on Bintray.
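A hedged sketch of what that looks like (package references and versions are illustrative, and Conan's package naming scheme has changed over the years):
# conanfile.txt
[requires]
boost/1.60.0
poco/1.7.3
[generators]
cmake
# Resolve the dependencies into the local cache and generate build info
conan install .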
The Secrets of The FullStack Ninja - Part A - Session I (Oded Sagir)
The document discusses setting up a web development environment. It will cover tools like Git, Node, NPM, Grunt, Bower and how to use them to setup a fullstack development environment for building single page applications. An agenda is provided that will go over these tools in detail over the course of a workshop, providing exercises to help attendees work with each tool hands-on.
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T... (VMware Tanzu)
Pivotal HAWQ, one of the world’s most advanced enterprise SQL on Hadoop technology, coupled with the Hortonworks Data Platform, the only 100% open source Apache Hadoop data platform, can turbocharge your analytic efforts. The slides from this technical webinar present a deep dive on this powerful modern data architecture for analytics and data science.
Learn more here: http://pivotal.io/big-data/pivotal-hawq
This document discusses how Hortonworks Data Platform (HDP) can enable enterprises to build a modern data architecture centered around Hadoop. It describes how HDP provides a centralized platform for managing all types of data at scale using technologies like YARN. Case studies are presented showing how companies have used HDP to optimize costs, develop new analytics applications, and work towards creating a unified "data lake". The document outlines the key components of HDP including its support for any application, any data, and deployment anywhere. It also highlights how partners extend HDP's capabilities and how Hortonworks provides enterprise-grade support.
I gave this talk at the Highload++ 2015 conference in Moscow. The slides have been translated into English. They cover the Apache HAWQ components, its architecture, query processing logic, and competitive information.
1. The document discusses Project Geode, an open source distributed in-memory database for big data applications. It provides scale-out performance, consistent operations across nodes, high availability, powerful developer features, and easy administration of distributed nodes.
2. The document outlines Geode's architecture and roadmap. It also discusses why the project is being open sourced under Apache and describes some key use cases and customers of Geode.
3. The presentation includes a demo of Geode's capabilities including partitioning, queries, indexing, colocation, and transactions.
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake... (NoSQLmatters)
Come to this deep dive on how Pivotal's Data Lake Vision is evolving by embracing next generation in-memory data exchange and compute technologies around Spark and Tachyon. Did we say Hadoop, SQL, and what's the shortest path to get from past to future state? The next generation of data lake technology will leverage the availability of in-memory processing, with an architecture that supports multiple data analytics workloads within a single environment: SQL, R, Spark, batch and transactional.
Here are the slides for Greenplum Chat #8. You can view the replay here: https://www.youtube.com/watch?v=FKFiyJDgdQk
The increased frequency and sophistication of high-profile data breaches and malicious hacking is putting organizations at continued risk of data theft and significant business disruption. Complicating this scenario is the unbounded growth of Big Data and petabyte-scale data storage, new open source database and distribution schemes, and the continued adoption of cloud services by enterprises.
Pivotal Greenplum customers often look for additional encryption of data-at-rest and data-in-motion. The massively parallel processing (MPP) architecture of Pivotal Greenplum provides an architecture that is unlike traditional OLAP on RDBMS for data warehousing, and encryption capabilities must address the scale-out architecture.
The Zettaset Big Data Encryption Suite has been designed for optimal performance and scalability in distributed Big Data systems like Greenplum Database and Apache HAWQ.
Here is a replay of our recent Greenplum Chat with Zettaset:
00:59 What is Greenplum’s approach for encryption and why Zettaset?
02:17 Results of field testing Zettaset with Greenplum
03:50 Introduction to Zettaset, the security company
05:36 Overview of Zettaset and their solutions
14:51 Different layers for encrypting data at rest
16:50 Encryption key management for big data
20:51 Zettaset BD Encrypt for data at rest and data in motion
22:19 How to mitigate encryption overhead with an MPP scale-out system
24:12 How to deploy BD Encrypt
25:50 Deep dive on data at rest encryption
30:44 Deep dive on data in motion encryption
36:72 Q: How does Zettaset deal with encrypting Greenplums multiple interfaces?
38:08 Q: Can I encrypt data for a particular column?
40:26 How Zettaset fits into a security strategy
41:21 Q: What is the performance impact on queries by encrypting the entire database?
43:28 How Zettaset helps Greenplum meet IT compliance requirements
45:12 Q: How authentication for keys is obtained
48:50 Q: How can Greenplum users try out Zettaset?
50:53 Q: What is a ‘Zettaset Security Coach’?
Hadoop World 2011: Replacing RDB/DW with Hadoop and Hive for Telco Big Data -... (Cloudera, Inc.)
The document discusses migrating KT's CDR analysis system from a relational database to NexR's Hadoop-based Data Analytics Platform (NDAP). NDAP provides tools to help with the migration, including converting Oracle data and SQL queries to the Hive query language. The conversion process involves mapping data types, functions, and SQL syntax between Oracle and Hive. NDAP also includes performance monitoring and query optimization tools to help enterprise data engineers adapt to the new system.
This certificate of appreciation was presented to Shivram Mani from the Apache Software Foundation for serving as a mentor during Google Summer of Code 2016 from April 22 to August 23, 2016. Jason Titus, VP of Engineering, recognized Shivram Mani's contributions as a mentor during the summer program.
PXF is a unified access framework that provides a uniform SQL interface for heterogeneous data sources on HDFS. It exploits parallelism to efficiently access data across various storage formats and data sources. PXF uses a pluggable architecture with built-in connectors that allow it to access data in HDFS files, Hive tables, HBase tables, and other data sources. It provides a common developer view and allows writing queries against external data using various profile definitions and plugins.
This document summarizes a presentation about managing Apache HAWQ, an open source massively parallel processing (MPP) database, using Apache Ambari. It discusses how Ambari integrates with HAWQ for installation, configuration, topology recommendations, high availability, alerts and more. Challenges in the integration are addressed as HAWQ is not part of the Hortonworks Data Platform stack. The presentation recommends future work for Ambari like supporting automated HAWQ upgrades and enabling dynamic configuration reloads without requiring a service restart.
1. HCatalog is a table and storage management layer for Hadoop that provides a relational view of data in HDFS and abstracts data formats and locations from users.
2. Previously, HAWQ accessed Hive tables through PXF using external tables, but this required specifying the schema, location, and format which was error prone and wouldn't detect metadata changes.
3. The new integration retrieves metadata from HCatalog and parses it into in-memory catalog tables to provide dynamic access to Hive tables from HAWQ without needing to specify schemas.
Zeppelin Interpreters
PSQL (to become JDBC in 0.6.x)
Geode
SpringXD
Apache Ambari
Zeppelin Service
Geode, HAWQ and Spring XD services
Webpage Embedder View
HAWQ is an in-memory, distributed SQL query engine that runs as a Hadoop service. It provides two-way integration with HDFS, Hive, and HBase. HAWQ supports SQL transactions through commands like BEGIN, COMMIT, and ROLLBACK. External tables in HAWQ can be used to query data stored in HDFS files, Hive tables, and HBase tables.
HAWQ: a massively parallel processing SQL engine in hadoop (BigData Research)
HAWQ, developed at Pivotal, is a massively parallel processing SQL engine sitting on top of HDFS. As a hybrid of an MPP database and Hadoop, it inherits merits from both. It adopts a layered architecture and relies on the distributed file system for data replication and fault tolerance. In addition, it is standard SQL compliant, and unlike other SQL engines on Hadoop, it is fully transactional. This paper presents the novel design of HAWQ, including query processing, the scalable software interconnect based on the UDP protocol, transaction management, fault tolerance, read-optimized storage, the extensible framework for supporting various popular Hadoop-based data stores and formats, and the various optimization choices we considered to enhance query performance. The extensive performance study shows that HAWQ is about 40x faster than Stinger, which is reported to be 35x-45x faster than the original Hive.
Pivotal is a trusted partner for IT innovation and transformation. From the technology, to the people, to the way people interact with technology, Pivotal is transforming how the world builds software.
At Strata NYC 2015, Pivotal announced it will supercharge the Hadoop ecosystem by contributing the HAWQ advanced SQL on Hadoop analytics and MADlib machine learning technologies to The Apache Software Foundation.
The document discusses the new features in Pivotal HD 1.1, including improved high availability for HAWQ and Namenode, new UDF and diagnostic tools for HAWQ, upgraded Apache Hadoop components to version 2.0.5 and 2.0.6, improved Hive, HBase, and Oozie, Kerberos support for security, and new tools like the Unified Storage Service, Data Loader, and Command Center for easier administration.
This document provides an overview of modern big data analytics tools. It begins with background on the author and a brief history of Hadoop. It then discusses the growth of the Hadoop ecosystem from early projects like HDFS and MapReduce to a large number of Apache projects and commercial tools. It provides examples of companies and organizations using Hadoop. It also outlines concepts like SQL on Hadoop, in-database analytics using MADLib, and the evolution of Hadoop beyond MapReduce with the introduction of YARN. Finally, it discusses new frameworks being built on top of YARN for interactive, streaming, graph and other types of processing.
This document provides instructions for installing OpenStack with Calico network integration using Chef. It describes using Chef to install a single control node and at least two compute nodes connected in a BGP mesh. The instructions require at least four Ubuntu 14.04 servers, one for the control node, two for compute nodes, and one as the Chef bootstrap server. The document outlines preparing the OpenStack nodes, setting up Chef, bootstrapping the nodes with Chef roles, and running chef-client on each node to configure the BGP mesh.
This document summarizes how to install Cloud Foundry jobs using Nise BOSH, a lightweight BOSH emulator. It discusses:
- What cf-release and BOSH are and how Nise BOSH works
- Installing the dea_next job through 5 steps: initializing Nise BOSH, getting cf-release and deploy files, installing packages, configuring jobs, and starting processes
- How DEA/NG and CCv1 are compatible through reverting buildpack support in DEA/NG
Ansible is the simplest way to automate. SymfonyCafe, 2015 (Alex S)
Ansible is a radically simple IT automation engine that is clear, fast, complete, efficient, and secure. It can be used for configuration management and infrastructure orchestration, deployments and builds, and provisioning for Vagrant. Ansible uses YAML files and templates to define automation tasks and plays. It provides advantages over shell scripts such as organization, reusability, and parallelization.
This document provides instructions for installing a single-node Hadoop cluster on Ubuntu. It outlines downloading and configuring Java, installing Hadoop, configuring SSH access to localhost, editing Hadoop configuration files, and formatting the HDFS filesystem via the namenode. Key steps include adding a dedicated Hadoop user, generating SSH keys, setting properties in core-site.xml, hdfs-site.xml and mapred-site.xml, and running 'hadoop namenode -format' to initialize the filesystem.
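Representative commands for those steps (the user name and namenode port follow common single-node tutorials and are illustrative):
# Dedicated Hadoop user and passwordless SSH to localhost
sudo adduser hduser
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# core-site.xml needs the filesystem name, e.g.:
#   <property><name>fs.default.name</name><value>hdfs://localhost:54310</value></property>
# Format HDFS via the namenode, then start the daemons
hadoop namenode -format
start-all.sh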
Devops Boise - Israel Shirk - Pragmatic Migration to Infrastructure As Code (Israel Shirk)
The original powerpoint and repos are available here: https://drive.google.com/open?id=0B-25O6PIpLzCNXBOakNEdVdGM2c
I'll be following this up with a series of blog posts in the next few weeks as I have time! If you'd like me to notify you as they come out, please e-mail me at israel at zerrtech.com. Thanks!
The Docker "Gauntlet" - Introduction, Ecosystem, Deployment, OrchestrationErica Windisch
This document summarizes Docker's growth over 15 months, including its community size, downloads, projects on GitHub, enterprise support offerings, and the Docker platform which includes the Docker Engine, Docker Hub, and partnerships. It also provides overviews of key Docker technologies like libcontainer, libchan, libswarm, and how images work in Docker.
This article will help you with the details of deploying on an Ubuntu-based AWS EC2 instance. You need to deploy Python (2.7) based REST services in the Apache web server. The core of the application is the Python Django framework, which uses a custom virtual environment (virtualenv). Apache uses mod_wsgi to connect the WSGI application and mod_security for security purposes.
Deploying Django with Apache and mod_wsgi is a proven way to get Django into production. mod_wsgi is an Apache module that can host any Python WSGI application, including Django. Django works with any version of Apache that supports mod_wsgi.
Read on to understand the step-by-step deployment process.
Ubuntu Server is lean, fast, and powerful. Its services are reliable, predictable, and economical. It is the perfect base on which to build your instances. Django is a web framework written in Python, so everything in Django is done in Python as well. Django was developed to simplify the creation of database-driven sites, and its best feature is that it is probably the fastest framework for building a fully functioning website.
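A hedged sketch of the Apache side on Ubuntu (paths, the project name, and the virtualenv location are illustrative):
# Install mod_wsgi and wire Apache to the Django project's wsgi.py
sudo apt-get install libapache2-mod-wsgi
# In the virtual host configuration:
#   WSGIDaemonProcess myproject python-home=/srv/myproject/venv python-path=/srv/myproject
#   WSGIProcessGroup myproject
#   WSGIScriptAlias / /srv/myproject/myproject/wsgi.py
sudo service apache2 restart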
This document discusses automated web acceptance testing using Behat and Mink. It provides an overview of Behat, a behavior-driven development framework for PHP, and Mink, a web acceptance testing framework. It then covers setting up a Behat project with Mink, writing feature files, implementing step definitions, running tests locally and on Sauce Labs. It also discusses using Relish for living documentation and integrating tests with Jenkins.
This document discusses software quality assurance tooling, focusing on pre-commit. It introduces pre-commit as a tool for running code quality checks before code is committed. Pre-commit allows configuring hooks that run checks and fixers on files matching certain patterns. Hooks can be installed from repositories and support many languages including Python. The document provides examples of pre-commit checks such as disallowing improper capitalization in code comments and files. It also discusses how to configure, run, update and install pre-commit hooks.
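A minimal configuration in that style (the hook repository and ids are standard pre-commit examples; the rev is an assumption):
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
# Install the git hook and run every check once
pre-commit install
pre-commit run --all-files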
Chef is an open-source automation platform that treats infrastructure as code. It allows users to automate how infrastructure is configured, deployed and managed across any environment using a powerful DSL written in Ruby. Key features of Chef include server provisioning, automation of infrastructure changes, and management of configurations through recipes and cookbooks which are shared through an online community. Linecook is presented as an alternative to Chef for server automation that uses shell scripts instead of Ruby code and relies on established tools like SSH, VirtualBox, and bash instead of requiring installation of the Chef platform.
Usage Note of Apache Thrift for C++ Java PHP Languages (William Lee)
Thrift is used to define interfaces and generate code to build RPC clients and servers. The document discusses installing tools and libraries needed for Thrift including GCC, Boost, Java, Ant, Autoconf and others. It then covers generating code for C++, Java and PHP from a Thrift IDL file and running a sample Thrift server and clients in C++ and Java.
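A sketch of that generate-and-build flow (the IDL below is illustrative, not from the document):
# hello.thrift -- a minimal service definition
#   service Hello {
#     string greet(1: string name)
#   }
# Generate client/server stubs for each target language
thrift --gen cpp hello.thrift
thrift --gen java hello.thrift
thrift --gen php hello.thrift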
Continuous Testing and New Tools for Automation - Presentation from StarWest ... (Sauce Labs)
Learn how you can create a full continuous integration solution entirely in the cloud using GitHub, Selenium, Sauce Labs, and Travis CI. Michael will show you how to take advantage of these hosted development resources to improve the velocity of your releases and provide the application quality your users demand. He will demonstrate how Sauce Labs can securely execute your Selenium tests in parallel and dramatically reduce the time required to run your critical integration and acceptance tests—so you can finally realize the promise of continuous delivery. www.saucelabs.com/signup/trial
Adopt DevOps philosophy on your Symfony projects (Symfony Live 2011) (Fabrice Bernhard)
This is the presentation given at the Symfony Live 2011 conference. It is an introduction to the new agile movement spreading in the technical operations community called DevOps and how to adopt it on web development projects, in particular Symfony projects.
Plan of the slides :
- Configuration Management
- Development VM
- Scripted deployment
- Continuous deployment
Tools presented in the slides:
- Puppet
- Vagrant
- Fabric
- Jenkins / Hudson
Preparation study for Docker Event
Mulodo Open Study Group (MOSG) @Ho chi minh, Vietnam
http://www.meetup.com/Open-Study-Group-Saigon/events/229781420/
WhiteHedge is a New Jersey, US based company that is a Docker certified consulting and training partner. WhiteHedge has also partnered with Chef and contributes to open source Chef.
Docker containers and Chef are very popular tools. Learn how to use Chef and Docker together for effective DevOps.
This document discusses using AutomatedLab, an open-source lab automation framework, to help set up complex testing environments for continuous integration and continuous delivery (CI/CD) pipelines. AutomatedLab allows infrastructure to be defined as code and deployed idempotently to various environments like Hyper-V and Azure. It can deploy many common roles including Active Directory, SQL Server, and web servers. The document demonstrates how to define a simple test environment in AutomatedLab and integrate it into CI/CD pipelines to enable automated validation testing. AutomatedLab is well-suited for running on both Linux and Windows build workers and supports various deployment modes for testing purposes.
The document discusses the modern developer toolbox and outlines various tools that developers can use for development environments, testing, debugging, profiling, deployment, logging, and monitoring of applications. It provides recommendations for setting up development environments on different operating systems and with tools like Vagrant, Docker, Ansible, and Homebrew. It also discusses PHP installation and editors/IDEs to use. Testing with PHPUnit, Behat, and Jenkins is covered as well as debugging with XDebug, profiling with XHProf, and deployment with Ansible, Capistrano and other options. Logging with Monolog, Logstash and Kibana is also summarized along with monitoring metrics with StatsD, Graphite and Grafana.
How to use the WAN Gateway feature of Apache Geode to implement multi-site and active-active failover, disaster recovery, and global scale applications.
#GeodeSummit: Easy Ways to Become a Contributor to Apache Geode (PivotalOpenSourceHub)
The document provides steps for becoming a contributor to the Apache Geode project, beginning with joining online conversations about the project, then test-driving it by building and running examples, and finally improving the project by reporting findings, fixing bugs, or adding new features through submitting code. The key steps are to join mailing lists or chat forums to participate in discussions, quickly get started with the project by building and testing examples in 5 minutes, and then test release candidates and report any issues found on the project's issue tracker or documentation pages. Contributions to the codebase are also welcomed by forking the GitHub repository and submitting pull requests with bug fixes or new features.
#GeodeSummit Keynote: Creating the Future of Big Data Through "The Apache Way" (PivotalOpenSourceHub)
Keynote at Geode Summit 2016 by Dr. Justin Erenkrantz, Bloolmberg LP. Creating the Future of Big Data Through "The Apache Way" and why this matters to the community
#GeodeSummit: Combining Stream Processing and In-Memory Data Grids for Near-R... (PivotalOpenSourceHub)
This document discusses combining stream processing and in-memory data grids for near-real-time aggregation and notifications. It describes storing immutable event data and filtering and aggregating events in real-time based on requested perspectives. Perspectives can be requested at any time for historical or real-time event data. The solution aims to be scalable, resilient, and low latency using Apache Storm for stream processing, Apache Geode for the event log and storage, and deployment patterns to collocate them for better performance.
In this session we review the design of the newly released off heap storage feature in Apache Geode, and discuss use cases and potential direction for additional capabilities of this feature.
This document discusses implementing a Redis adaptor using Apache Geode. It provides an overview of Redis data structures and commands, describes how Geode partitioned regions and indexes can be used to store and access Redis data, outlines advantages like scalability and high availability, and presents a roadmap for further development including supporting additional commands and performance optimization.
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode (PivotalOpenSourceHub)
In this session we review the design of the current state of support for Apache Geode by Spring Cloud Data Flow, and explore additional use cases and future direction that Spring Cloud Data Flow and Apache Geode might evolve.
In this session we review the design of the current capabilities of the Spring Data GemFire API that supports Geode, and explore additional use cases and future direction that the Spring API and underlying Geode support might evolve.
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode (PivotalOpenSourceHub)
This document summarizes a presentation about how TEKsystems Global Services helps modern manufacturing industries address challenges through big data solutions. It outlines TEKsystems' services and capabilities, as well as real-world applications for manufacturing, financial services, and life sciences. The presentation describes reference architectures and customer success stories in marine seismic data and gaming industries. It positions TEKsystems as having expertise, proven track records, and packaged offerings to provide big data solutions from pilot to production.
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ... (PivotalOpenSourceHub)
One of the largest retailers in North America is considering Apache Geode for their new mobile loyalty application, to support their digital transformation effort. They would use Geode to provide operational data services for their mobile cloud service. This retailer needs to replace sluggish response times with sub-second responses, which will improve conversion rates. They also want to be able to close the loop between data science findings and the app experience. This way the right customer interaction is suggested when it is needed, such as when customers are looking at their mobile app while walking in the store, or sending notifications at an individual's most likely shopping times. The final benefits of using Geode will include faster development cycles, increased customer loyalty, and higher revenue.
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree... (PivotalOpenSourceHub)
In this session we explore a case study of a large-scale government fraud detection program that prevents billions of dollars in fraudulent payments each year leveraging the beta release of the GemFire+Greenplum Connector, which is planned for release in GemFire 9. Topics will include an overview of the system architecture and a review of the new GemFire+Greenplum Connector features that simplify use cases requiring a blend of massively parallel database capabilities and accelerated in-memory data processing.
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode) (PivotalOpenSourceHub)
Today, if events change the decision model, we wait until the next batch model build for new insights. By extending fast “time-to-decisions” into the world of Big Data Analytics to get fast “time-to-insights”, apps will get what used to be batch insights in near real time. The technology enabling this includes smart in-memory data storage, new storage class memory, and products designed to do one or more parts of an analysis pipeline very well. In this talk we describe how Ampool is building on Apache Geode to allow Big Data analysis solutions to work together with a scalable smart storage class memory layer to allow fast and complex end-to-end pipelines to be built -- closing the loop and providing dramatically lower time to critical insights.
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T... (PivotalOpenSourceHub)
This talk introduces an open-source solution that integrates cloud native apps running on Cloud Foundry with an open-source hybrid transactions + analytics real-time solution. The architecture is based on the fastest scalable, highly available and fully consistent In-Memory Data Grid (Apache Geode / GemFire), natively integrated to the first open-source massive parallel data warehouse (Greenplum Database) in a hybrid transactional and analytical architecture that is extremely fast, horizontally scalable, highly resilient and open source. This session also features a live demo running on Cloud Foundry, showing a real case of real-time closed-loop analytics and machine learning using the featured solution.
Apache Apex and Apache Geode are two of the most promising incubating open source projects. Combined, they promise to fill gaps of existing big data analytics platforms. Apache Apex is an enterprise grade native YARN big data-in-motion platform that unifies stream and batch processing. Apex is highly scalable, performant, fault tolerant, and strong in operability. Apache Geode provides a database-like consistency model, reliable transaction processing and a shared-nothing architecture to maintain very low latency performance with high concurrency processing. We will also look at some use cases where how these two projects can be used together to form distributed, fault tolerant, reliable in memory data processing layer.
#GeodeSummit - Where Does Geode Fit in Modern System Architectures (PivotalOpenSourceHub)
The document discusses how Apache Geode fits into modern system architectures using the Command Query Responsibility Segregation (CQRS) pattern. CQRS separates reads and writes so that each can be optimized independently. Geode is well-suited as the read store in a CQRS system due to its ability to efficiently handle queries and cache data through regions. The document provides references on CQRS and related patterns to help understand how they can be applied with Geode.
How Southwest Airlines Uses Geode
Distributed systems and fast data require new software patterns and implementation skills. Learn how Southwest Airlines uses Apache Geode, organizes team responsibilities, and approaches design tradeoffs. Drawing inspiration from real whiteboard conversations, we’ll explore: common development pitfalls, environment capacity planning, streaming data patterns like consumer checkpointing, support roles, and production lessons learned.
Every day, Apache Geode improves how Southwest Airlines schedules nearly 4,000 flights and serves over 500,000 passengers. It’s an essential component of Southwest’s ability to reduce flight delays and support future growth.
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode (PivotalOpenSourceHub)
In this talk, Andre Langevin discusses how Geode forms the core of many Wall Street derivative risk solutions. By externalizing risk from trading systems, Geode-based solutions provide cross-product risk management at speeds suitable for automated hedging, while simultaneously eliminating the back office costs associated with traditional trading system based solutions.
Building Apps with Distributed In-Memory Computing Using Apache Geode (PivotalOpenSourceHub)
Slides from the Meetup Monday March 7, 2016 just before the beginning of #GeodeSummit, where we cover an introduction of the technology and community that is Apache Geode, the in-memory data grid.
GPORCA is a newly open-sourced advanced query optimizer that is a subproject of the Greenplum Database open source project. GPORCA is the query optimizer used in commercial distributions of both Greenplum and HAWQ. In these distributions GPORCA has achieved 1000x performance improvements across TPC-DS queries by focusing on three distinct areas: Dynamic Partition Elimination, SubQuery Unnesting, and Common Table Expressions.
Now that GPORCA is open source, we are looking for collaborators to help us realize the ultimate dream for GPORCA - to work with any database.
The new breed of data management systems in Big Data have to process so much data that optimization mistakes are magnified in traditional optimizers. Furthermore, coding and manual optimization of complex queries has proven to be hard.
In this session, Venkatesh will discuss:
- Overview of GPORCA
- How to add GPORCA to HAWQ with a build option
- How GPORCA could be made to work with any database
- Future vision for GPORCA and more immediate plans
- How to work with GPORCA, and how to contribute to GPORCA
How AI is Revolutionizing Data Collection.pdf (PromptCloud)
Artificial Intelligence (AI) is transforming the landscape of data collection, making it more efficient, accurate, and insightful than ever before. With AI, businesses can automate the extraction of vast amounts of data from diverse sources, analyze patterns in real-time, and gain deeper insights with minimal human intervention. This revolution in data collection enables companies to make faster, data-driven decisions, enhance their competitive edge, and unlock new opportunities for growth.
AI-powered tools can handle complex and dynamic web content, adapt to changes in website structures, and even understand the context of data through natural language processing. This means that data collection is not only faster but also more precise, reducing the time and effort required for manual data extraction. Furthermore, AI can process unstructured data, such as social media posts and customer reviews, providing valuable insights into customer sentiment and market trends.
Embrace the future of data collection with AI and stay ahead of the curve. Learn more about how PromptCloud’s AI-driven web scraping solutions can transform your data strategy. https://www.promptcloud.com/contact/
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion data (Samuel Jackson)
We present our work to improve data accessibility and performance for data-intensive tasks within the fusion research community. Our primary goal is to develop services that facilitate efficient access for data-intensive applications while ensuring compliance with FAIR principles [1], as well as adoption of interoperable tools, methods and standards.
The major outcome of our work is the successful creation and deployment of a data service for the MAST (Mega Ampere Spherical Tokamak) experiment [2], leading to substantial enhancements in data discoverability, accessibility, and overall data retrieval performance, particularly in scenarios involving large-scale data access. Our work follows the principles of Analysis-Ready, Cloud Optimised (ARCO) data [3] by using cloud optimised data formats for fusion data.
Our system consists of a query-able metadata catalogue, complemented with an object storage system for publicly serving data from the MAST experiment. We will show how our solution integrates with the Pandata stack [4] to enable data analysis and processing at scales that would have previously been intractable, paving the way for data-intensive workflows running routinely with minimal pre-processing on the part of the researcher. By using a cloud-optimised file format such as zarr [5] we can enable interactive data analysis and visualisation while avoiding large data transfers. Our solution integrates with common Python data analysis libraries for large, complex scientific data, such as xarray [6] for complex data structures and dask [7] for parallel computation and for lazily working with larger-than-memory datasets.
The incorporation of these technologies is vital for advancing simulation, design, and enabling emerging technologies like machine learning and foundation models, all of which rely on efficient access to extensive repositories of high-quality data. Relying on the FAIR guiding principles for data stewardship not only enhances data findability, accessibility, and reusability, but also fosters international cooperation on the interoperability of data and tools, driving fusion research into new realms and ensuring its relevance in an era characterised by advanced technologies in data science.
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016) https://doi.org/10.1038/sdata.2016.18
[2] M Cox, The Mega Amp Spherical Tokamak, Fusion Engineering and Design, Volume 46, Issues 2–4, 1999, Pages 397-404, ISSN 0920-3796, https://doi.org/10.1016/S0920-3796(99)00031-9
[3] Stern, Charles, et al. "Pangeo forge: crowdsourcing analysis-ready, cloud optimized data production." Frontiers in Climate 3 (2022): 782909.
[4] Bednar, James A., and Martin Durant. "The Pandata Scalable Open-Source Analysis Stack." (2023).
[5] Alistair Miles (2024) ‘zarr-developers/zarr-python: v2.17.1’. Zenodo. doi: 10.5281/zenodo.10790679
[6] Hoyer, S. & Hamman, J., (20
1. Overview of statistical software such as ODK, SurveyCTO, and CSPro
2. Software installation(for computer, and tablet or mobile devices)
3. Create a data entry application
4. Create the data dictionary
5. Create the data entry forms
6. Enter data
7. Add Edits to the Data Entry Application
8. CAPI questions and texts
Annex K RBF's The World Game pdf document (Steven McGee)
Signals & Telemetry Annex K for RBF's The World Game / Trade Federations / USPTO 13/573,002 Heart Beacon Cycle Time - Space Time Chain meters, metrics, standards. Adaptive Procedural template framework structured data derived from DoD / NATO's system of systems engineering tech framework
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ... (weiwchu)
We recently discovered that models trained with large-scale speech datasets sourced from the web could achieve superior accuracy and potentially lower cost than traditionally human-labeled or simulated speech datasets. We developed a customizable AI-driven data labeling system. It infers word-level transcriptions with confidence scores, enabling supervised ASR training. It also robustly generates phone-level timestamps even in the presence of transcription or recognition errors, facilitating the training of TTS models. Moreover, It automatically assigns labels such as scenario, accent, language, and topic tags to the data, enabling the selection of task-specific data for training a model tailored to that particular task. We assessed the effectiveness of the datasets by fine-tuning open-source large speech models such as Whisper and SeamlessM4T and analyzing the resulting metrics. In addition to openly-available data, our data handling system can also be tailored to provide reliable labels for proprietary data from certain vertical domains. This customization enables supervised training of domain-specific models without the need for human labelers, eliminating data breach risks and significantly reducing data labeling cost.
2. About Me
Apache HAWQ committer
linkedin: https://cn.linkedin.com/in/wangzw
github: https://github.com/wangzw
page: http://www.wangzw.org
3. Outline
Requirements to build HAWQ
Setup build & test environment
Build and Test
Do everything with Docker
Demo & QA
4. Requirements
Build
OS: CentOS 7
Dependency packages (libraries and headers)
Java Development Kit (7+)
Python development package (optional, 2.7)
GCC and other build utilities
Apache Maven (3.0)
Test
OS: CentOS 7
Dependency packages (libraries)
Java Runtime Environment (7+)
Python 2.7
GCC and other build utilities
Apache HDFS
5. Setup Build Env On CentOS 7
We can install everything with Yum.
Extra yum repositories are required:
epel for libgsasl ...
bintray-wangzw-rpm for libhdfs3
JAVA_HOME should be set correctly
Find script on github:
https://gist.github.com/wangzw/26accf185caa081ae069
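The gist automates roughly this kind of setup (a sketch; the exact package names from the extra repositories are assumptions):
# Enable EPEL, then install build dependencies with yum
sudo yum install -y epel-release
sudo yum install -y gcc gcc-c++ maven libgsasl-devel libhdfs3-devel
# Make sure JAVA_HOME points at the JDK (path is illustrative)
export JAVA_HOME=/usr/lib/jvm/java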
6. Setup Test Env On CentOS 7
Non-root user is required
SSH connection should work without password
A working Apache HDFS cluster with at least 3 datanodes
HDFS should support “append” and “truncate” features
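For the passwordless SSH requirement, a typical setup as the non-root user looks like this:
# Generate a key and authorize it for password-less logins
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
ssh-copy-id localhost
# Verify: this should not prompt for a password
ssh localhost true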
7. Build Apache HAWQ
Get source code
git clone https://github.com/apache/incubator-hawq.git /path/hawq_src
Build and install libyarn
mkdir -p /path/hawq_src/depends/libyarn/build
cd /path/hawq_src/depends/libyarn/build && ../bootstrap --prefix=/usr/
make && sudo make install && sudo ldconfig
8. Build Apache HAWQ (cont.)
Build and install Apache HAWQ
cd /path/hawq_src
./configure --prefix=/path/to/install
make && make install
9. Test Apache HAWQ
Initialize an Apache HAWQ single node cluster
modify /path/to/install/etc/hawq-site.xml with HDFS information (see the sketch after these steps)
source /path/to/install/greenplum_path.sh
hawq init cluster
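For the hawq-site.xml step above, the key setting is the HDFS URL (a sketch; the host, port, and directory are illustrative):
# /path/to/install/etc/hawq-site.xml -- point HAWQ at the HDFS namenode
#   <property>
#     <name>hawq_dfs_url</name>
#     <value>localhost:8020/hawq_default</value>
#   </property>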
10. Test Apache HAWQ (cont.)
Run install check with schedule named GOOD
cd /path/hawq_src
source /path/to/install/greenplum_path.sh
make installcheck-good
11. Build & Test with Docker
Follow the instructions to set up the build & test environment with Docker:
https://github.com/wangzw/hawq-devel-env
Docker images can be found at
https://hub.docker.com/r/mayjojo/hawq-devel/
https://hub.docker.com/r/mayjojo/hawq-test/
12. Demo & QA
More Information
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61320026