This document provides a beginner's guide to contributing to open source projects. It discusses why people contribute (e.g. to expand knowledge), what organizations gain from contributions (e.g. business enablement), and how to get started. The guide recommends starting with documentation, answering questions, reporting bugs precisely, and eventually writing code as skills are built. Contributing helps individuals and moves projects forward for the benefit of all.
The document discusses the video streaming industry and analyzes competitors in the space such as Hulu, Crackle, YouTube, and Netflix. It examines factors such as pricing, content quality, and delivery methods for each platform. It also provides an overview of Hulu's business strategies around multi-platform expansion, leveraging data, and negotiating new partnerships to maintain independence in a changing industry.
Hero Hub Help - YouTube Content Strategy For Brands, by Brendan Gahan
Full blog post here - http://brendangahan.com/youtube-channel-strategy-hero-hub-help/
A brief overview of the Hero, Hub, Help YouTube content strategy.
Encore is a new software development platform that aims to radically improve developer productivity and experience when building modern cloud software. It provides unique end-to-end insights through a framework that enforces constraints and uses static analysis to understand applications. This allows Encore to automatically set up infrastructure, instrument tracing, and more. Early users report fantastic experiences with no boilerplate and being able to focus on their product instead of infrastructure. The founders aim to grow through word-of-mouth by nurturing a community of builders and engaging developers where they gather.
Advanced Content Creation, SEO & Storytelling, by Casey Armstrong
Here are the slides for the content marketing, SEO, storytelling, and growth hacking events that Patrick Vlaskovits and I hosted with our partners below across Europe in 2016:
- Beta-i (Lisbon)
- Porto Design Factory (Porto)
- Founders (Copenhagen)
- EIT Digital (Helsinki)
- Mosaik (Budapest)
We broke the events into the following 5 sections:
- What really is growth hacking?
- State Of The Industry
- Advanced Content Marketing & Creation
- Advanced SEO
- Storytelling
All of our examples came from personal experiences that we had not previously written or spoken about.
If you have any questions, please contact me (@CaseyA - casey@fullstackmarketer.com) or Patrick (@Pv).
Farcana is developing an AAA metaverse game with a battle royale mode and BTC prizes. The deck argues that most current NFT games are pyramid schemes that will fail once the hype fades, while Farcana has a stable economic model backed by real BTC mining assets. The game will have an NFT marketplace and characters/items that can be used to earn rewards by playing. Farcana aims to introduce a new benchmark for metaverse gaming with its play-to-hash model, where the prize fund is backed by public Bitcoin mining facilities.
LinkedIn started in 2003 with 2700 members in the first week and has since grown to over 400 million members globally. It initially had a monolithic Java application called Leo serving all pages and using a single SQL database. To scale, LinkedIn developed its first dedicated service for the member connection graph and another for search. It later introduced replica databases, extracted services from Leo using a service-oriented architecture, added caching, and implemented Kafka as a universal data pipeline. LinkedIn now has over 750 services, uses multiple data centers, and relies on frameworks such as Rest.li and various data infrastructure solutions to continue scaling as a global site with massive traffic and data volumes.
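The key idea behind Kafka as a "universal data pipeline" is that producers append events to a log and many independent consumers read that log at their own pace, so the producer never needs to know who is downstream. The sketch below is a toy in-memory stand-in for that pattern (the class, topic names, and payloads are illustrative assumptions, not LinkedIn's actual code or the Kafka API):

```python
from collections import defaultdict

class MiniBroker:
    """Toy in-memory stand-in for a Kafka-style log: producers append,
    and each consumer reads from its own offset independently."""
    def __init__(self):
        self.topics = defaultdict(list)    # topic -> append-only log
        self.offsets = defaultdict(int)    # (consumer, topic) -> next index

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def poll(self, consumer, topic):
        """Return all messages this consumer hasn't seen yet."""
        log = self.topics[topic]
        start = self.offsets[(consumer, topic)]
        self.offsets[(consumer, topic)] = len(log)
        return log[start:]

broker = MiniBroker()
broker.publish("profile-updates", {"member": 42, "field": "title"})
broker.publish("profile-updates", {"member": 7, "field": "skills"})

# Two downstream systems (say, search indexing and analytics) each see
# the full stream without the producer knowing about either of them.
print(broker.poll("search-indexer", "profile-updates"))
print(broker.poll("analytics", "profile-updates"))
```

This decoupling is what lets a site add a 751st service without rewiring the previous 750.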
Gamestate is a megaverse nexus, uniting gamers, fans, developers, creators, and merchants in a place of fun, discovery, and learning.
An open world, offering sales channels for games, apps, advertising, gaming equipment, music, media, and general merchandise as well as a Rocket Launchpad accelerator for indie game startups.
Unified profiles solve the problem of fragmented gaming accounts and achievements; allowing gamers to create and import their existing game profiles and leaderboard ranks, collated into a single portable, immutable, privacy-centric, achievements-based blockchain digital identity profile for ultimate flexing and bragging rights!
The document discusses Netflix and its rise as an online streaming platform. It provides details on Netflix's history starting as a DVD rental service and its transition to online streaming. It highlights Netflix's large library of content, affordable subscription costs, and availability across devices as factors in its success. The document also discusses Netflix's competitors, its strategy of producing original content, and its growth in subscribers globally. Netflix is presented as revolutionizing the entertainment industry and becoming the dominant force in online streaming.
SolChicks, a play-to-earn fantasy game built on Solana, made headlines after being backed by over 113 different venture capital funds.
The SolChicks game demo attracted over 50,000 players in only the first week of its release. Their parent company Catheon also owns other games such as Seoul Stars, a “sing-to-earn” game endorsed by K-pop stars, and Angrymals, a player-versus-player fortress defence strategy mobile game inspired by Angry Birds and Worms.
In a press release published after its successful IDO, SolChicks said that it has raised over $20 million from more than 300 private investors. The game’s IDO is set to be conducted on 38 launchpads at a public price of $0.05 per token, implying a fully diluted market capitalization of $500 million for the $CHICKS token.
A Git Workflow Model or Branching Strategy, by Vivek Parihar
Vivek Parihar is a serial entrepreneur and polyglot engineer who currently serves as VP of Engineering at XOXODay. He has co-founded two startups and previously served as Head of Engineering for Mobile at Yatra. When not working, he enjoys extreme thrill-seeking adventures like trekking and boxing. The document then outlines Gitflow, a branching model for managing code development, including feature branches for new features, release branches to prepare releases, and hotfix branches for urgent bug fixes in production.
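The Gitflow rules outlined above are mechanical enough to state as data: each branch type has a fixed starting point and fixed merge-back targets. This small sketch models the standard Gitflow convention (the function and branch names are illustrative, not part of any Gitflow tool):

```python
# Standard Gitflow convention: where each branch type branches from
# and where it must be merged back when finished.
GITFLOW = {
    "feature": {"from": "develop", "merge_into": ["develop"]},
    "release": {"from": "develop", "merge_into": ["master", "develop"]},
    "hotfix":  {"from": "master",  "merge_into": ["master", "develop"]},
}

def merge_targets(branch_name):
    """Given a branch like 'feature/login', return where it merges back."""
    kind = branch_name.split("/", 1)[0]
    if kind not in GITFLOW:
        raise ValueError(f"not a Gitflow branch type: {kind}")
    return GITFLOW[kind]["merge_into"]

# A hotfix must land in both master (to ship the fix) and develop
# (so the fix isn't lost in the next release).
print(merge_targets("hotfix/payment-crash"))
```

The double merge target for release and hotfix branches is the heart of the model: production and ongoing development stay consistent.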
This presentation provides information on Inflectra, our product suite and our partnership programs, including solution partners, implementation partners, and technology partners.
SolChicks pitch deck: $77M for blockchain gaming, by Pitch Decks
SolChicks is a play-to-earn fantasy game built on Solana. The SolChicks game demo attracted over 50,000 players in only the first week of its release.
The startup gained over 350,000 followers on Twitter and claims its community consists of 700,000 across 20 countries, amassed in just four short months.
The Australian-founded play-to-earn blockchain gaming company, started just last September, has already raised $77 million from venture capital and institutional investment funds and grown to more than 100 staff. SolChicks is scheduled for a mini-game release at the end of March 2022, an "alpha" release in April, and an official release in September.
PDT 79 - $10 million - Seed - Qortex.pdf, by HajeJanKamps
Qortex provides intelligent video analytics to help businesses deeply understand video using AI. Their solution allows for video categorization, correlation of audience actions to video context, and content curation. Their first product, On-Stream, places smart video overlays for connected TV, web video, and games. On-Stream delivers more relevant, less interruptive ads and has shown engagement improvements up to 6.8x and CTR improvements up to 2.3x compared to traditional video ads. Qortex has now expanded into the growing connected TV market and their marketplace connects major brands, agencies, demand side platforms and publishers.
The document discusses strategies for event marketing before, during, and after an industry event. It summarizes research from a survey of 403 technology professionals about their preferences and behaviors related to industry events. Some key findings include: 50% of attendees schedule meetings with vendors before events; 64% are more likely to visit booths with subject matter experts; and 59% would engage post-event with vendors that demonstrated how their technology could help the attendee's business. The document advocates for an integrated marketing approach using events as part of an overall strategy.
The document discusses trends in the US music streaming market and provides recommendations for consumer marketing. It notes that streaming grew 32% in 2013 to 118.1 billion streams. Most streaming occurs on mobile devices, with 96% of Pandora's 80 million users accessing it via mobile. It identifies four target listener groups and recommends focusing messaging on switching and new streaming users, simplifying the core messages. The document outlines a strategic direction, positioning and messaging framework, and discusses goals for a go-to-market approach and ongoing customer engagement model.
The document provides an introduction to Git and GitHub. It explains that Git is an open-source version control system created by Linus Torvalds, while GitHub is a hosting service for software development projects that uses Git for version control. The document outlines the agenda which includes explaining what a version control system is, demonstrating GitHub, and reviewing important Git commands.
Bitspawn Esports Software Investment Deck, by Eric Godwin
Bitspawn is developing an esports platform to address problems in the growing industry such as unpaid player winnings and a lack of monetization opportunities. Their platform will provide automated tournaments, match reporting, payments, and connections to sponsors. This will help more players to be discovered and make a living from esports while giving advertisers access to the large esports audience. Bitspawn aims to become the leading esports platform through partnerships, promoted events, and supporting multiple game titles.
Oh sweet! The Sugar learning environment, by Julie Pichon
Learn about Sugar, the learning platform for children. Sugar offers an innovative desktop environment designed to encourage collaboration and critical thinking through Activities.
Presented at Irish Hackerspace Week in August 2010.
Making Your First Open-Source Contribution, by Julie Pichon
A lightning talk aiming to make the path a bit clearer for open-source enthusiasts who would like to make their first contribution but are not sure where to start, using my first patch as an example.
Transcript: http://www.jpichon.net/blog/2013/02/talk-transcript-first-open-source-contribution/
Presented at Irish Hackerspace Week in August 2011.
Presented at Ireland Girl Geek Dinners in January 2013.
Presented at theDoctConf in June 2013.
Making your first OpenStack contribution (EuroPython), by Julie Pichon
The document discusses contributing to OpenStack and provides guidance on setting up accounts, choosing a first task, fixing bugs, writing code and documentation, and submitting patches for review. Key steps include setting up Launchpad, Gerrit, and OpenStack Foundation accounts, using DevStack for testing, addressing bugs marked as low-hanging fruit, writing unit tests, fixing documentation typos, and ensuring code/documentation style guidelines are followed before submitting patches for review on Gerrit.
Making Your First Open-Source Contribution (EuroPython), by Julie Pichon
Do you like open-source? Would you like to give back somehow but are not sure what to do or where to start? In this presentation we look at the usual workflow for making any kind of contribution, using a real patch as an example. I'm using my first contribution to OpenStack as an example, as this seems fitting for a Python conference!
Hand-out: http://tinyurl.com/ep-open-source
Video: https://www.youtube.com/watch?v=U7HJuC84Lpw
Transcript from a previous, quite similar version of the talk: http://www.jpichon.net/blog/2013/02/talk-transcript-first-open-source-contribution/
Presented in Berlin for EuroPython, July 23rd 2014.
This document discusses how to contribute to open source projects. It recommends starting small: work on your own projects first, then move on to contributing to smaller open-source projects, picking something simple such as a website to begin with. It also offers tips on communicating as an open-source contributor, such as being polite, using bug databases and documentation, and asking others for help. The overall message is that small contributions, combined with good communication and documentation habits, will help new contributors get involved in open source.
Gnunify 2016 | Open Source Contributions | Drupal | Purushotam, by Purushotam Rai
This document discusses Drupal, an open-source content management system: why Drupal is open source, and how it has built a large worldwide user base and community. It also outlines ways to contribute to Drupal, such as documentation, translations, development, and forums, and provides an example of solving a problem by developing a custom module for a job portal site.
The document summarizes how the BI team at Big Fish Games pitched their initial investment, structured their team, and approached their initial build out of BI capabilities. To pitch the initial investment, they focused on compelling business deliverables and iterating over key business problems. For their team structure, they brought in experienced engineers, paired people, and learned through real projects. In their initial build out, they focused on incremental delivery through business projects, gradually transitioned users, and leveraged their vendor(s).
This document discusses improving the reliability and availability of Hadoop clusters. It notes that while Hadoop is taking on more database-like features, the uptime of many Hadoop clusters and lack of SLAs is still an afterthought. It proposes separating computing and storage to improve availability like cloud Hadoop offerings do. It also suggests building KPIs and monitoring around Hadoop clusters similar to how many companies monitor data warehouses. Centralizing Hadoop infrastructure management into a "Big Data as a Service" model is presented as another way to improve reliability.
Presentation from physical to virtual to cloud emc, by xKinAnx
The document discusses three paradigm shifts in information technology: 1) From physical to virtual computing as virtualization becomes mainstream, 2) The network becoming the computer through network-centric architectures, 3) Storage evolving from a server-centric to a virtual, flexible model. These shifts are creating an industrialized "cloud computing" platform for intelligent, on-demand delivery of IT services.
The document summarizes recommendations for efficiently and effectively managing Apache Hadoop based on observations from analyzing over 1,000 customer bundles. It covers common operational mistakes like inconsistent operating system configurations involving locale, transparent huge pages, NTP, and legacy kernel issues. It also provides recommendations for optimizing configurations involving HDFS name node and data node settings, YARN resource manager and node manager memory settings, and YARN ATS timeline storage. The presentation encourages adopting recommendations built into the SmartSense analytics product to improve cluster operations and prevent issues.
This document summarizes improvements made to HDFS to optimize performance, stabilize operations, and improve supportability. Key areas discussed include logging enhancements, metrics and tools for troubleshooting, load management through RPC improvements, and changes to reduce garbage collection overhead and improve liveness detection. Specific optimizations covered range from code changes to reduce logging verbosity to adding batch processing of block reports.
This document provides an overview of the past, present, and future of Apache Hadoop YARN. It discusses how YARN has evolved from Apache Hadoop 2.6/2.7 to now support 2.8 with features like dynamic resource configuration, container resizing, and Docker support. Upcoming work includes support for arbitrary resource types, federation of multiple YARN clusters, and a new ResourceManager UI. The future of YARN scheduling may include distributed scheduling, intra-queue preemption, and scheduling based on actual resource usage.
Centrica implemented a Hadoop data platform to gain insights from large and diverse data sources. This provided a single customer view and enabled new applications and dashboards to improve customer service. The previous data infrastructure was complicated and could not scale to handle growing IoT and smart meter data. The Hadoop implementation followed agile and DevOps practices and has been successful, winning industry awards. Centrica aims to further collaboration and leverage cloud to reduce costs as big data adoption continues.
This document discusses best practices for running Spark in production. It begins with introductions from the presenters and an overview of Spark deployment modes on YARN. The main topics covered are Spark security using Kerberos authentication and authorization, communication channels and encryption in YARN cluster mode, common issues, and performance tuning. For performance, it recommends choosing executor and task sizes to balance efficiency and overhead, and increasing task parallelism to mitigate data skew problems. The goal is to understand workload patterns and monitor behavior to effectively tune Spark for different situations.
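The executor-sizing advice above is essentially arithmetic: reserve some cores and memory for the OS and daemons, cap cores per executor (around five is a common rule of thumb), and leave headroom for off-heap overhead. The sketch below works through that arithmetic; the node specs, reserved amounts, and overhead fraction are illustrative assumptions, not fixed Spark rules:

```python
# Back-of-envelope Spark executor sizing. The rule of thumb (reserve a
# core and some memory for OS/daemons, ~5 cores per executor, ~10%
# off-heap overhead) is a common starting point, not a hard rule.
def size_executors(node_cores, node_mem_gb, cores_per_executor=5,
                   reserved_cores=1, reserved_mem_gb=8, overhead_frac=0.10):
    usable_cores = node_cores - reserved_cores
    executors = usable_cores // cores_per_executor
    mem_per_executor = (node_mem_gb - reserved_mem_gb) / executors
    # The heap (spark.executor.memory) must leave room for off-heap overhead.
    heap = mem_per_executor * (1 - overhead_frac)
    return executors, round(heap)

execs, heap_gb = size_executors(node_cores=32, node_mem_gb=128)
print(f"{execs} executors/node, --executor-memory {heap_gb}g --executor-cores 5")
```

For a 32-core, 128 GB node this yields 6 executors per node with an 18 GB heap each: large enough to be efficient, small enough to avoid long GC pauses. The same per-node parallelism is also what mitigates skew, since more tasks run concurrently.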
This document discusses Symantec's journey towards enabling self-service analytics clusters using Cloudbreak and Ambari. It describes how Symantec built a self-service analytics platform using Ambari to automate the deployment of Hadoop clusters on their private OpenStack cloud. However, they later needed a solution that could deploy clusters across different cloud providers. They adopted Cloudbreak to deploy clusters on AWS and contributed extensions like Keystone v3 support to enable Cloudbreak to work with their OpenStack cloud as well. This allows them to deploy analytics clusters across different clouds through a single tool and interface.
Impetus provides expert consulting services around Hadoop implementations, including R&D, assessment, deployment (on private and public clouds), and performance optimization.
This presentation speaks about Advanced Hadoop Tuning and Optimisation.
This document introduces BugBase, a simple PHP bug tracking system. It provides an overview of BugBase's features such as login/registration, a dashboard, bug reporting, and an administration system. The document also discusses why bug tracking is useful for sharing information, helping newcomers, and keeping records. It notes that the current version has room for improvement: 98 commits in 7 days left quality issues and pending work. Developers are invited to fork and contribute to BugBase to help it evolve.
It shows the main Bugzilla functionality useful for a tester: how to log in, how to generate different types of reports, and how to submit a bug.
LF_APIStrat17_Pain-Free Microservices Integration Using Contract Tests, by LF_APIStrat
Service integration can be a pain when providers of APIs don't have visibility into how they're being consumed, and evolving those APIs can be a slow, painful process. When a contract-breaking change is released, API consumers may not find out until production which integration point failed and why.
Contract Tests are lightweight, easy to maintain, and quick at detecting breaking changes in API contracts. Even better are Consumer-Driven Contract Tests (CDCTs), which let consumers build their integrations around the provider contracts they expect. Sharing these contracts with providers lets them evolve their APIs while staying compliant with consumers' specifications.
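The core mechanic of a consumer-driven contract is small: the consumer publishes the fields and types it depends on, and the provider verifies its responses against that expectation in CI. A minimal, library-free sketch of that check is below (real projects typically use a tool such as Pact; the contract shape, field names, and function here are illustrative assumptions):

```python
# The consumer declares exactly what it relies on: endpoint shape and
# the fields (with types) it reads from the response.
consumer_contract = {
    "endpoint": "/users/{id}",
    "required_fields": {"id": int, "name": str, "email": str},
}

def verify_contract(contract, provider_response):
    """Return a list of contract violations (empty list = compatible)."""
    violations = []
    for field, expected_type in contract["required_fields"].items():
        if field not in provider_response:
            violations.append(f"missing field: {field}")
        elif not isinstance(provider_response[field], expected_type):
            violations.append(f"wrong type for {field}")
    return violations

# A provider change that renames 'email' is caught here, in the
# provider's build, long before any consumer hits it in production.
response = {"id": 1, "name": "Ada", "email_address": "ada@example.com"}
print(verify_contract(consumer_contract, response))
```

The direction matters: because the contract comes from the consumer, the provider learns which parts of its response are actually load-bearing and which it is free to change.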
This document discusses open source software and how to get involved in open source projects. It covers basics of Git and GitHub, how to contribute to open source by forking projects, making changes, and submitting pull requests. It encourages learning skills, finding projects to contribute to, and provides tips for getting started with open source contributions. The presenter shares their own open source journey and invites questions from the audience.
OpenAmplify is proud to release Version 2.0 of the world's first comprehensive semantic platform. Full of new features, but still compatible with V1.1, OpenAmplify 2.0 reflects our commitment to delivering real-world, groundbreaking advances to the Semantic Web community.
As part of our latest release, OpenAmplify version 2.0, we offered a live webinar on January 21, 2010. OpenAmplify CIO Mike Petit led an informative short session about this new release and answered community questions.
This document provides guidance on technical speaking, including tips for preparing and delivering local meetup talks and conference presentations. It discusses overcoming the fear of having nothing to say, finding local meetups to speak at, creating effective content and slides, practicing delivery, handling questions, applying to conferences, increasing odds of acceptance, preparing for a first conference talk, and following up after speaking. The document emphasizes preparing thoroughly, keeping slides concise, practicing delivery, and networking with attendees.
This presentation shows you how to create a smoke testing process for your website or mobile app. A smoke test allows you to test your UI and make sure that everything functions how you imagined it to function.
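A smoke-test process boils down to a short list of fast, shallow checks over the critical paths, run before anything deeper. Here is a minimal sketch of such a runner; the check names and lambdas are hypothetical placeholders for real UI or API probes (an HTTP 200 on the homepage, an element present on the login page, and so on):

```python
# A minimal smoke-test runner: run each named check, treat any
# exception or falsy result as a failure, and report both lists.
def run_smoke_tests(checks):
    """Run each (name, check) pair; return (passed, failed) name lists."""
    passed, failed = [], []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:
            ok = False        # a crashing check is a failing check
        (passed if ok else failed).append(name)
    return passed, failed

checks = [
    ("homepage loads",   lambda: True),         # e.g. HTTP 200 on /
    ("login form shown", lambda: True),         # e.g. element is present
    ("search returns",   lambda: len([]) > 0),  # fails: empty result set
]
passed, failed = run_smoke_tests(checks)
print("PASS:", passed)
print("FAIL:", failed)
```

If any check in this list fails, there is no point running the full regression suite; that early exit is the whole value of a smoke test.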
WordCamp Columbus 2011 - What's Next for WordPress, by andrewnacin
This document summarizes WordPress updates from version 3.0 to the current version and discusses what may be coming next. It also outlines ways for users to get involved, such as testing beta releases, contributing to documentation, reviewing themes, joining core development, and spreading the word about WordPress. The document encourages users to help improve WordPress by taking advantage of its open governance model and freedoms guaranteed by the GPL license.
GNUnify 2017 - Working on my first BUG, by Aastha Vijay
This document provides an overview of how to report and fix bugs using Bugzilla. It discusses what a bug is, how Bugzilla helps track bugs, the basic steps for creating an account and filing a new bug, including providing a descriptive summary, reproduction steps, and answering questions. Tips are provided for searching bugs, using bugmail filtering, tagging bugs, and using keywords to find relevant bugs.
Hooks are experience designs used by products to form habits in users by connecting a user's problem to a company's solution. There are four parts to a hook: a trigger, an action, a reward, and investment. Triggers can be external like notifications or internal like emotions. The action is a simple behavior done in anticipation of a reward. Rewards activate the brain's reward system and can come from social interaction, problem-solving, or a sense of mastery. With repeated use forming an investment, hooks change occasional behaviors into habits that benefit the company. While hooks can increase user retention, product designers have a responsibility to use them ethically to build good habits.
Oak Systems Pvt Ltd is a specialist independent Testing and V&V organization based in Bangalore with offices in Pune and other places. Oak Systems is celebrating its 10th anniversary during 2008. As part of these celebrations, we have launched Oak~TQ Seminars 2008, a free technical seminar Series throughout the year. This is our way of saying ‘Big Thank You’ to the Indian Software Industry.
------
VALENTINE’S DAY SPECIAL: A FREE I.T. TECHNICAL SEMINAR ON “FROM TESTER WITH LOVE … THE ART OF DEFECT REPORTING” BY MR. MOHAN KUMAR K L, INFOSYS
SESSION CHAIR: MR SANKET ATAL, DIRECTOR, ORACLE
VENUE: INDIRANAGAR CLUB, BANGALORE
Speaker: Mohan Kumar KL, fondly known as KLM, is a Product Line Manager in the Product Management & Development group of Finacle, Infosys.
He is responsible for testing, validation, and quality certification of the Finacle(tm) product. He has rich experience in banking and banking technology spanning over two decades.
Session Chair - SANKET ATAL
Mr. Atal is Director of R&D with Oracle's Fusion Middleware Group, heading the SOA development organization in India. Sanket has been with Oracle for 11 years and was one of the founders of Oracle's R&D centre in Portland, Oregon, USA.
This document summarizes lessons learned from moving the game Bejeweled Blitz from a premium paid model to a freemium model on iOS. Some key points:
1) The switch was very successful, driving a 9x increase in daily downloads, 5x increase in daily/weekly active users, and 5x increase in daily revenue.
2) iOS user metrics like engagement, retention, and monetization significantly outperformed the Facebook version of the game in many cases by 2x.
3) A balanced freemium model using boosts, rare gems, and daily spins was effective at driving revenue while keeping the game fun to play without paying.
4) Promptly responding
There’s a huge disconnect between the business world and the engineering world that drives our software projects into the ground. We rewrite our software over and over again, not because we lack the engineering skills to build great software, but because we fail to communicate, make decisions in ignorance, and don’t adapt when our current strategy is obviously failing.
What if we could measure the indirect costs of pain building up on a software project? What if we could measure the loss of productivity, the escalating costs and risks, and could steer our projects with a data-driven feedback loop?
Visibility changes everything. With visibility, we can bridge the gap between the business world and the engineering world, and get everyone pulling the same direction.
Find out how you can:
1. Identify the biggest causes of productivity loss on your software project
2. Translate the world of developer pain into explicit costs and risks
3. Collaborate with other industry professionals in the art of data-driven software mastery
Let's break down the challenges and learn our way to success, one small victory at a time.
Speaker: Janelle Klein
Janelle is an NFJS Tour Speaker and author of the book Idea Flow: How to Measure the PAIN in Software Development, a modern strategy for systematically optimizing software productivity with a data-driven feedback loop.
This document discusses how social networks have changed the rules of collaboration and communication. It analyzes the factors that have made social networks like Facebook successful, such as being community-oriented, addictive through constant updates, having a low barrier to entry, requiring little resources, and being mobile-friendly. These same factors can be applied to project management collaboration through technologies that emphasize people over resources, are loosely coupled, self-organizing, have open APIs, and keep things simple.
The document discusses improving bug tracking systems. It describes the current process of reporting bugs which involves users providing detailed steps to reproduce the issue. It envisions a future where conversational agents assist users in reporting bugs by asking targeted questions to gather key details. This helps identify the likely cause of the bug and location to fix it. The document also discusses building models to predict bug fixes using decision trees trained on historical bug report data.
Showing How Security Has (And Hasn't) Improved, After Ten Years Of Trying, by Dan Kaminsky
The document discusses the results of fuzz testing software from 2000-2010 to analyze how software security has improved over the last decade. The testing involved fuzzing four file formats (Office, PDF, etc.) across 18 programs from different years. This resulted in over 175,000 crashes. Analysis found over 900 unique bugs. Later versions had fewer exploitable bugs, indicating improving code quality. The results provide a potential "fuzzmark" metric for software security improvements, though comparisons across formats require more controls. The testing process and challenges ensuring data integrity are also outlined.
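The mutation-fuzzing technique behind those 175,000 crashes is conceptually simple: take a valid input, flip random bytes, and count how many mutants crash the parser. The sketch below shows the loop at toy scale; the "parser" is a deliberately fragile stand-in invented for illustration, not one of the real formats or programs from the study:

```python
import random

def fragile_parse(data: bytes):
    """Deliberately brittle toy parser: magic bytes, a length byte,
    then an ASCII payload (non-ASCII bytes make decode raise)."""
    if data[:2] != b"MZ":
        raise ValueError("bad magic")
    length = data[2]
    return data[3:3 + length].decode("ascii")

def fuzz(seed_input: bytes, trials: int, rng: random.Random):
    """Flip one random byte per trial; count how many mutants crash."""
    crashes = 0
    for _ in range(trials):
        mutant = bytearray(seed_input)
        pos = rng.randrange(len(mutant))
        mutant[pos] = rng.randrange(256)
        try:
            fragile_parse(bytes(mutant))
        except Exception:
            crashes += 1
    return crashes

seed = b"MZ\x05hello"   # a known-good input to mutate
crashes = fuzz(seed, trials=1000, rng=random.Random(0))
print(f"{crashes}/1000 mutants crashed")
```

A real campaign adds crash deduplication (the study's "unique bugs" step, typically by bucketing stack traces) so that thousands of crashes collapse into a few hundred distinct defects.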
This document provides tips and shortcuts for using common computer programs and technologies more efficiently. It discusses keyboard shortcuts for Windows, browsers, Microsoft Office, searching, and other tasks. The tips are intended to help users feel more in control of their technology and spend less time on menial tasks. Discussion is encouraged about barriers to improving skills and how people can help each other.
This document discusses running Apache Spark and Apache Zeppelin in production. It begins by introducing the author and their background. It then covers security best practices for Spark deployments, including authentication using Kerberos, authorization using Ranger/Sentry, encryption, and audit logging. Different Spark deployment modes like Spark on YARN are explained. The document also discusses optimizing Spark performance by tuning executor size and multi-tenancy. Finally, it covers security features for Apache Zeppelin like authentication, authorization, and credential management.
This document discusses Spark security and provides an overview of authentication, authorization, encryption, and auditing in Spark. It describes how Spark leverages Kerberos for authentication and uses services like Ranger and Sentry for authorization. It also outlines how communication channels in Spark are encrypted and some common issues to watch out for related to Spark security.
The document discusses the Virtual Data Connector project which aims to leverage Apache Atlas and Apache Ranger to provide unified metadata and access governance across data sources. Key points include:
- The project aims to address challenges of understanding, governing, and controlling access to distributed data through a centralized metadata catalog and policies.
- Apache Atlas provides a scalable metadata repository while Apache Ranger enables centralized access governance. The project will integrate these using a virtualization layer.
- Enhancements to Atlas and Ranger are proposed to better support the project's goals around a unified open metadata platform and metadata-driven governance.
- An initial minimum viable product will be built this year with the goal of an open, collaborative ecosystem around shared
This document discusses using a data science platform to enable digital diagnostics in healthcare. It provides an overview of healthcare data sources and Yale/YNHH's data science platform. It then describes the data science journey process using a clinical laboratory use case as an example. The goal is to use big data and machine learning to improve diagnostic reproducibility, throughput, turnaround time, and accuracy for laboratory testing by developing a machine learning algorithm and real-time data processing pipeline.
This document discusses using Apache Spark and MLlib for text mining on big data. It outlines common text mining applications, describes how Spark and MLlib enable scalable machine learning on large datasets, and provides examples of text mining workflows and pipelines that can be built with Spark MLlib algorithms and components like tokenization, feature extraction, and modeling. It also discusses customizing ML pipelines and the Zeppelin notebook platform for collaborative data science work.
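In Spark MLlib this workflow is expressed as a Pipeline of stages such as Tokenizer, HashingTF, and IDF. The stdlib sketch below walks the same tokenize-then-TF-IDF steps at toy scale so the math is visible; it omits the distributed execution and hashing trick that make the Spark version scale, and the corpus is invented for illustration:

```python
import math
from collections import Counter

docs = [
    "spark makes big data processing simple",
    "text mining with spark mllib",
    "big data text mining at scale",
]

# Tokenizer stage: whitespace split.
tokenized = [d.split() for d in docs]

# Document frequency: in how many documents each term appears.
df = Counter(term for doc in tokenized for term in set(doc))
n = len(docs)

def tfidf(doc_tokens):
    """TF-IDF: term frequency weighted by log inverse document frequency,
    so corpus-wide common terms are down-weighted."""
    tf = Counter(doc_tokens)
    return {t: tf[t] * math.log(n / df[t]) for t in tf}

weights = tfidf(tokenized[0])
# Terms unique to this document ('makes', 'processing', 'simple') get
# higher weights than terms shared with other documents ('spark', 'big').
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {w:.3f}")
```

The Spark pipeline does exactly this per partition, then feeds the resulting feature vectors into downstream modeling stages such as clustering or classification.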
This document compares the performance of Hive and Spark when running the BigBench benchmark. It outlines the structure and use cases of the BigBench benchmark, which aims to cover common Big Data analytical properties. It then describes sequential performance tests of Hive+Tez and Spark on queries from the benchmark using a HDInsight PaaS cluster, finding variations in performance between the systems. Concurrency tests are also run by executing multiple query streams in parallel to analyze throughput.
The document discusses modern data applications and architectures. It introduces Apache Hadoop, an open-source software framework for distributed storage and processing of large datasets across clusters of commodity hardware. Hadoop provides massive scalability and easy data access for applications. The document outlines the key components of Hadoop, including its distributed storage, processing framework, and ecosystem of tools for data access, management, analytics and more. It argues that Hadoop enables organizations to innovate with all types and sources of data at lower costs.
This document provides an overview of data science and machine learning. It discusses what data science and machine learning are, including extracting insights from data and computers learning without being explicitly programmed. It also covers Apache Spark, which is an open source framework for large-scale data processing. Finally, it discusses common machine learning algorithms like regression, classification, clustering, and dimensionality reduction.
This document provides an overview of Apache Spark, including its capabilities and components. Spark is an open-source cluster computing framework that allows distributed processing of large datasets across clusters of machines. It supports various data processing workloads including streaming, SQL, machine learning and graph analytics. The document discusses Spark's APIs like DataFrames and its libraries like Spark SQL, Spark Streaming, MLlib and GraphX. It also provides examples of using Spark for tasks like linear regression modeling.
This document provides an overview of Apache NiFi and dataflow. It begins with an introduction to the challenges of moving data effectively within and between systems. It then discusses Apache NiFi's key features for addressing these challenges, including guaranteed delivery, data buffering, prioritized queuing, and data provenance. The document outlines NiFi's architecture and components like repositories and extension points. It also previews a live demo and invites attendees to further discuss Apache NiFi at a Birds of a Feather session.
Many organizations currently process various types of data in different formats, and most often this data is free-form. As the number of consumers of this data grows, it is imperative that this free-flowing data adhere to a schema: it gives data consumers an expectation about the type of data they are getting, and shields them from immediate impact if the upstream source changes its format. Having a uniform schema representation also gives the data pipeline an easy way to integrate and support various systems that use different data formats.
SchemaRegistry is a central repository for storing and evolving schemas. It provides an API and tooling to help developers and users register a schema and consume it without any impact if the schema changes. Users can tag different schemas and versions, register for notifications of schema changes with versions, and more.
In this talk, we will go through the need for a schema registry and schema evolution, and showcase the integration with Apache NiFi, Apache Kafka, and Apache Storm.
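As a rough sketch of the registry behaviour described above (register, version, fetch), here is a minimal in-memory stand-in. The class and method names and the "truck-events" subject are illustrative, not the actual Schema Registry API:

```python
# Minimal in-memory stand-in for a schema registry: register schema
# versions under a name, and fetch either the latest or a pinned version.
class SchemaRegistry:
    def __init__(self):
        self._schemas = {}  # subject name -> list of versions, oldest first

    def register(self, name, schema):
        """Store a new version of a schema; returns its version number (1-based)."""
        versions = self._schemas.setdefault(name, [])
        versions.append(schema)
        return len(versions)

    def get(self, name, version=None):
        """Fetch a schema by name: the latest version, or a pinned one."""
        versions = self._schemas[name]
        return versions[-1] if version is None else versions[version - 1]

registry = SchemaRegistry()
v1 = registry.register("truck-events", {"fields": ["id", "speed"]})
v2 = registry.register("truck-events", {"fields": ["id", "speed", "lat", "lon"]})

# A consumer pinned to v1 keeps working even after producers move to v2.
assert registry.get("truck-events", v1) == {"fields": ["id", "speed"]}
assert registry.get("truck-events") == {"fields": ["id", "speed", "lat", "lon"]}
```

A real registry adds compatibility checks on registration, which is what lets downstream consumers avoid breakage when upstream formats evolve.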
There is increasing need for large-scale recommendation systems. Typical solutions rely on periodically retrained batch algorithms, but for massive amounts of data, training a new model could take hours. This is a problem when the model needs to be more up-to-date. For example, when recommending TV programs while they are being transmitted the model should take into consideration users who watch a program at that time.
The promise of online recommendation systems is fast adaptation to changes, but methods of online machine learning from streams is commonly believed to be more restricted and hence less accurate than batch trained models. Combining batch and online learning could lead to a quickly adapting recommendation system with increased accuracy. However, designing a scalable data system for uniting batch and online recommendation algorithms is a challenging task. In this talk we present our experiences in creating such a recommendation engine with Apache Flink and Apache Spark.
Deep learning is not just a hype: it outperforms state-of-the-art ML algorithms, one by one. In this talk we will show how deep learning can be used for detecting anomalies on IoT sensor data streams at high speed, using DeepLearning4J on top of different big data engines like Apache Spark and Apache Flink. Key in this talk is the absence of any large training corpus, since we are using unsupervised machine learning, a domain that current DL research treats step-motherly. As we can see in this demo, LSTM networks can learn very complex system behaviour, in this case data coming from a physical model simulating bearing vibration data. One drawback of deep learning is that normally a very large labeled training data set is required. This is particularly interesting since we can show how unsupervised machine learning can be used in conjunction with deep learning: no labeled data set is necessary. We are able to detect anomalies and predict breaking bearings with 10-fold confidence. All examples and all code will be made publicly available and open-sourced; only open source components are used.
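The unsupervised idea can be sketched without the deep learning stack: learn "normal" from the stream itself and flag large deviations. This is a deliberately simplified stand-in (a running z-score detector, not the DeepLearning4J LSTM from the talk), and the simulated vibration data is invented for illustration:

```python
# Unsupervised streaming anomaly detection sketch: no labels, the model of
# "normal" is learned from a sliding window of the recent stream itself.
from math import sqrt

def detect_anomalies(stream, window=50, threshold=4.0):
    """Flag readings more than `threshold` std-devs from the recent window."""
    anomalies = []
    history = []
    for i, x in enumerate(stream):
        if len(history) >= window:
            mean = sum(history) / len(history)
            var = sum((v - mean) ** 2 for v in history) / len(history)
            std = sqrt(var) or 1e-9  # guard against a zero-variance window
            if abs(x - mean) / std > threshold:
                anomalies.append(i)
        history.append(x)
        history[:] = history[-window:]  # keep only the recent window

    return anomalies

# Simulated vibration-like data: small periodic noise with one large spike.
data = [0.1 * ((2 * i) % 13 - 6) for i in range(200)]
data[150] = 25.0
print(detect_anomalies(data))  # → [150]
```

An LSTM replaces the window statistics with a learned model of the sequence, so it can catch anomalies in complex temporal patterns that a simple z-score misses.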
QE automation for large systems is a great step forward in increasing system reliability. In the big data world, multiple components have to come together to provide end users with business outcomes. This means that QE automation scenarios need to be detailed around actual use cases, cutting across components. The system tests potentially generate large amounts of data on a recurring basis, and verifying it is a tedious job. Given the multiple levels of indirection, the rate of false positives relative to actual defects is higher, and chasing them is generally wasteful.
At Hortonworks, we’ve designed and implemented an automated log analysis system, Mool, using statistical data science and ML. The current work in progress has a batch data pipeline followed by an ensemble ML pipeline that feeds into the recommendation engine. The system identifies the root cause of test failures by correlating the failing test cases with current and historical error records, across multiple components. The system works in unsupervised mode, with no perfect model, stable build, or source-code version to refer to. In addition, the system provides limited recommendations to file or reopen past tickets, and compares run profiles with past runs.
Improving business performance is never easy! The Natixis Pack is like rugby: working together is key to scrum success. Our data journey would undoubtedly have been much more difficult if we had not made the move together.
This session is the story of how ‘The Natixis Pack’ has driven change in its current IT architecture so that legacy systems can leverage some of the many components in Hortonworks Data Platform in order to improve the performance of business applications. During this session, you will hear:
• How and why the business and IT requirements originated
• How we leverage the platform to fulfill security and production requirements
• How we organize a community to:
o Guard all the players, no one gets left on the ground!
o Use the platform appropriately (not every problem calls for Big Data, and standard databases are not dead)
• What are the most usable, the most interesting and the most promising technologies in the Apache Hadoop community
We will finish the story of a successful rugby team with insight into the special skills needed from each player to win the match!
DETAILS
This session is part business, part technical. We will talk about infrastructure, security and project management as well as the industrial usage of Hive, HBase, Kafka, and Spark within an industrial Corporate and Investment Bank environment, framed by regulatory constraints.
HBase is a distributed, column-oriented database that stores data in tables divided into rows and columns. It is optimized for random, real-time read/write access to big data. The document discusses HBase's key concepts like tables, regions, and column families. It also covers performance tuning aspects like cluster configuration, compaction strategies, and intelligent key design to spread load evenly. Different use cases are suitable for HBase depending on access patterns, such as time series data, messages, or serving random lookups and short scans from large datasets. Proper data modeling and tuning are necessary to maximize HBase's performance.
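The "intelligent key design" point deserves a concrete sketch. A common HBase pattern is salting: prefixing a monotonically increasing row key (such as a timestamp) with a hash-derived bucket, so sequential writes spread across regions instead of hotspotting one. A minimal sketch, with an illustrative bucket count:

```python
# Salting sketch: derive a bucket from the key itself so the same key always
# maps to the same prefix, while sequential keys scatter across buckets.
NUM_BUCKETS = 8  # illustrative; often chosen near the number of region servers

def salted_key(row_key: str) -> str:
    bucket = sum(row_key.encode()) % NUM_BUCKETS  # cheap deterministic hash
    return f"{bucket}-{row_key}"

# Four consecutive timestamps land in four different buckets:
keys = [salted_key(f"2017-06-01T12:00:{s:02d}") for s in range(4)]
print(keys)
```

The trade-off is on the read side: a range scan must now fan out one scan per bucket prefix and merge the results, which is why key design has to start from the access patterns.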
There has been an explosion of data digitising our physical world – from cameras, environmental sensors and embedded devices, right down to the phones in our pockets. Which means that, now, companies have new ways to transform their businesses – both operationally, and through their products and services – by leveraging this data and applying fresh analytical techniques to make sense of it. But are they ready? The answer is “no” in most cases.
In this session, we’ll be discussing the challenges facing companies trying to embrace the Analytics of Things, and how Teradata has helped customers work through and turn those challenges to their advantage.
In this talk, we will present a new distribution of Hadoop, Hops, that can scale the Hadoop Filesystem (HDFS) by 16X, from 70K ops/s to 1.2 million ops/s on Spotify's industrial Hadoop workload. Hops is an open-source distribution of Apache Hadoop that supports distributed metadata for HDFS (HopsFS) and for the ResourceManager in Apache YARN. HopsFS is the first production-grade distributed hierarchical filesystem to store its metadata normalized in an in-memory, shared-nothing database. For YARN, we will discuss optimizations that enable 2X throughput increases for the Capacity scheduler, enabling scalability to clusters with >20K nodes. We will discuss the journey of how we reached this milestone, including some of the challenges involved in efficiently and safely mapping hierarchical filesystem metadata state and operations onto a shared-nothing, in-memory database. We will also discuss the key database features needed for extreme scaling, such as multi-partition transactions, partition-pruned index scans, distribution-aware transactions, and the streaming changelog API. Hops (www.hops.io) is Apache-licensed open source and supports a pluggable database backend for distributed metadata, although it currently only supports MySQL Cluster as a backend. Hops opens up new directions for Hadoop when metadata is available for tinkering in a mature relational database.
In high-risk manufacturing industries, regulatory bodies stipulate continuous monitoring and documentation of critical product attributes and process parameters. On the other hand, sensor data coming from production processes can be used to gain deeper insights into optimization potentials. By establishing a central production data lake based on Hadoop and using Talend Data Fabric as a basis for a unified architecture, the German pharmaceutical company HERMES Arzneimittel was able to cater to compliance requirements as well as unlock new business opportunities, enabling use cases like predictive maintenance, predictive quality assurance or open world analytics. Learn how the Talend Data Fabric enabled HERMES Arzneimittel to become data-driven and transform Big Data projects from challenging, hard to maintain hand-coding jobs to repeatable, future-proof integration designs.
Talend Data Fabric combines Talend products into a common set of powerful, easy-to-use tools for any integration style: real-time or batch, big data or master data management, on-premises or in the cloud.
While you could be tempted to assume data is already safe in a single Hadoop cluster, in practice you have to plan for more. Questions like "What happens if the entire datacenter fails?" or "How do I recover into a consistent state of data, so that applications can continue to run?" are not at all trivial to answer for Hadoop. Did you know that HDFS snapshots do not treat open files as immutable? Or that HBase snapshots are executed asynchronously across servers and therefore cannot guarantee atomicity for cross-region updates (which includes tables)? There is no unified and coherent data backup strategy, nor is there tooling available for many of the included components to build such a strategy. The Hadoop distributions largely avoid this topic, as most customers are still in the "single use case" or PoC phase, where data governance, as far as backup and disaster recovery (BDR) is concerned, is not (yet) important. This talk first introduces the overarching issues and difficulties of backup and data safety, then looks at each of the many components in Hadoop, including HDFS, HBase, YARN, Oozie, and the management components, to finally show you a viable approach using built-in tools. You will also learn not to take this topic lightly, and what is needed to implement and guarantee continuous operation of Hadoop-cluster-based solutions.
6. The Legalese
Apache License 2.0
BSD 3-Clause "New" or "Revised" license
BSD 2-Clause "Simplified" or "FreeBSD" license
GNU General Public License (GPL)
GNU Library or "Lesser" General Public License (LGPL)
MIT license
Mozilla Public License 2.0
Common Development and Distribution License
Eclipse Public License
7. In a Nutshell
Grossly Oversimplified Explanation
Free As in Beer, Free As in Speech
9. The How – Absolute First Step
Documentation
Edit Wiki
Contribute Example
Take a screenshot
10. The How – The replier
Answer
Mailing List
IRC Channel
Write your own experiences
Attend user groups
Arrange user groups
11. The How – The Bug Finder
Bug Finder
Report a Bug
Reporting a bug is harder than actually solving it.
12. The Apprentice Bug Finder
Precise and informative bug report
A Bad report:
“FooBar Doesn’t work”
A Slightly Better report:
“FooBar Doesn’t work when I press Key K”
A Good report:
“FooBar Broken: Using version 10.5, on OS Version 200.3
when I press K, exception ArrowMissing raised. Note this only
happens when K is pressed after J and O. Tried it with Version 199.7
and this behaviour did not happen. I recently updated directly from
199.7 and did not apply 199.8”
13. The Master Bug Finder
A Good report:
“FooBar Broken: Using version 10.5, on OS Version 200.3 when I
press K, exception ArrowMissing raised. Note this only happens when K is
pressed after J and O. Tried it with Version 199.7 and this behaviour did not
happen. I recently updated directly from 199.7 and did not apply 199.8”
Code where this behaviour is seen
b = x + 25;
Code solution
b = x;
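A report like the one above is even stronger when it ships as a runnable reproduction. Sticking with the slide's fictional FooBar example (the function names are invented for illustration), a minimal repro might contrast observed and expected behaviour in one self-contained script:

```python
# The slide's fictional FooBar bug as a self-contained reproduction script:
# it states both the observed and the expected behaviour concretely.
def foobar_buggy(x):
    return x + 25  # reported behaviour: a spurious +25 offset

def foobar_fixed(x):
    return x  # the fix proposed on the slide

print("observed:", foobar_buggy(7))  # → observed: 32
print("expected:", foobar_fixed(7))  # → expected: 7
```

A maintainer who can run this and see the failure for themselves has most of the diagnosis done already.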
Welcome to my presentation.
My session is an account of my personal “contribution” journey into the often contentious and confusing Open Source World.
As part of my journey I hope to shed light on:
The Why, What and How
Why do people contribute? What is the economic incentive for people to contribute? Do folks just willy-nilly add "stuff", and is it all held together by a piece of string? How does one communicate with the open source community? What happens if I don't know how to code, or English isn't my first language; can I still contribute? Do I need to ask permission before I contribute?
What does an organization get out of open source contributions? Not all organizations are like Hortonworks or Red Hat or one of the other open-source-heavy companies. Most organizations are technology consumers; what happens if you work in one of those organizations? Does it make sense for your organization to be an active participant in the open source community, and if yes, what is the advantage to the organization?
If the Why and the What are decided, discussed and understood, it then becomes a question of the How. How does one actually go about contributing to open source? What are the skills, the steps, and the pitfalls to avoid?
As a starter, I want to answer the question everyone is dying to ask: is the open source world made up of wizards with tall pointy grey hats, long grey cloaks and silver scarves, and do they know magic?
And the answer is, of course, yes.
So, why do I contribute?
Personal background :
- Not paid to do development
a) working on solving analytical business problems that involve large amounts of data. Business is the focus, and tech is at most a side concern.
b) every month there is a new release from a vendor
c)
2. Ease of understanding for newer technology
3. Doesn’t do what it says on the tin
4. Getting better.
How do I choose a project to contribute to:
I use the “scratch my own itch” technique.
-> Choose a problem you are interested in.
If you are interested in Machine Learning then choose a project relevant to your interests. If you are interested in SQL on Hadoop – then that’s the way to go etc.
-> Even better choose a problem your organization is interested in resolving.
Why is this important
- This is important because this is a long race.
Time
Motivation
Shallow learning curve
Pavlovian response
So we know the why and where to contribute:
I want to talk about what's in it for your organization to contribute.
1. Open source is a practical way to create and nurture good-quality software, which then enables the business.
Most code is infrastructure – no material value
2. Increase knowledge value of developers
3. Use the community as a partner to sync with other projects solving the same problems.
4. Complaint box – given enough eyeballs, all bugs are shallow.
5. Leverage – hire better developers
What it is not: it is not about karma, morality or any sort of good feelings. It's an economic necessity.
One question I get asked is
How do I protect my IP without being held liable?
Not an expert
Get legal team involved
An oversimplified explanation
Free as in beer
Free as in speech
This is where the Apache Software Foundation comes in
What does it bring to the table
- A clean, well-defined legal framework for contribution
A strong community
Lots of pre-defined and clear grunt work that has already been sorted out
Elders in the community
So now how do you actually start contributing
My first contribution was
How to enforce coding standards in IntelliJ for NiFi.
1. Dip
Doesn’t have to be about the project alone.
It could be about
Version control
Tips and techniques learnt
Your experiences
Finding the problem is harder than actually solving it.
The first aim of a bug report is to let the programmer see the failure with their own eyes. If you can't be with them to make it fail in front of them, give them detailed instructions so that they can make it fail for themselves.
In case the first aim doesn't succeed, and the programmer can't see it failing themselves, the second aim of a bug report is to describe what went wrong. Describe everything in detail. State what you saw, and also state what you expected to see. Write down the error messages, especially if they have numbers in them.
By all means try to diagnose the fault yourself if you think you can, but if you do, you should still report the symptoms as well.
Be ready to provide extra information if the programmer needs it. If they didn't need it, they wouldn't be asking for it. They aren't being deliberately awkward. Have version numbers at your fingertips, because they will probably be needed.
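"Have version numbers at your fingertips" is easy to automate. A small sketch (the field names are illustrative) that gathers the environment details a maintainer will almost always ask for:

```python
# Collect the environment details a maintainer will almost always ask for,
# using only the standard library's platform module.
import platform

def environment_report():
    return {
        "os": platform.system(),          # e.g. "Linux", "Darwin", "Windows"
        "os_version": platform.release(), # kernel / OS release string
        "python": platform.python_version(),
        "machine": platform.machine(),    # e.g. "x86_64", "arm64"
    }

for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Pasting this output at the top of a bug report saves a round-trip of "which version are you on?" questions.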
Write clearly. Say what you mean, and make sure it can't be misinterpreted.
Above all, be precise. Programmers like precision.
So I found a bug in the way NiFi interacts with AWS.
So how do I think I can get better?
Learning from the elders in the group.
Better at teamwork
Be a better replier for newbies coming into the group
And of course writing more code.