The document announces a Cascading Meetup on March 5th, 2013 in Cupertino, CA to discuss enterprise data workflows, ANSI SQL support, and test-driven development. It provides examples of how large organizations use data workflows between front-end applications, back-office systems, logs, and Hadoop clusters to drive analytics and reporting. Main Street firms are also migrating workflows to Hadoop for cost savings and scalability.
My talk at the Long Now Foundation seminars on Long Term Thinking on September 5, 2012. Overlaps with a number of other talks, but contains material not found anywhere else. Audio and video are available at http://longnow.org/seminars/02012/sep/05/birth-global-mind/
Localized methods for diffusions in large graphs - David Gleich
I describe a few ongoing research projects on diffusions in large graphs and how we can use efficient matrix computations to evaluate them.
This document provides resources for internet safety for parents. It lists websites that provide information on internet safety laws, monitoring children's social media use, identifying personal information available online, reviewing virtual worlds and social networks, reports on cyberbullying, and additional resources from organizations like the FBI and PBS on parenting in the digital age.
What Android Can Learn from Steve Jobs - Tim O'Reilly
A meditation on Jobs' quote that design is an expression of the "fundamental soul" of a human creation. What is the fundamental soul of Google, and how should it be reflected in Android?
This document provides a summary of funding opportunities, events, and other resources for UK creative, digital and design businesses. It includes information on upcoming events, public funding calls from Innovate UK and other sources, private financing options, and support for launching or growing a business. The document acts as a monthly digest of useful information for UK creative businesses seeking funding and support.
The roadtrip that led to my first rails commit and how you could make yours too - Mohnish Jadwani
This document summarizes the author's experience making their first commit to the Rails codebase. It describes the requirement to add a custom rake task, researching the solution by reading blogs and Rails Guides. The author then forks the Rails repository, makes the code changes in a new branch, writes a detailed commit message, and submits a pull request. The pull request was quickly merged and the author shares lessons learned around approaching the community and starting small contributions.
This document discusses social networks and professionalism. It defines social networks and provides examples of popular social networks like Facebook, Twitter, LinkedIn and Flickr. It discusses privacy issues related to social networks and provides tips for using social networks professionally. Some key tips include maintaining separate private and professional profiles, avoiding posting unprofessional content, and using privacy settings to control who can view your information. The document also provides resources on privacy settings for different social networks.
A presentation by SMART Infrastructure Facility's Geomatics Research Fellow, Dr Tomas Holderness, and Vice Chancellor's Post Doctoral Research Fellow, Dr Etienne Turpin, to the International Symposium For Next Generation Infrastructure (ISNGI), Vienna September 2014.
Service-oriented architecture: principles and components
This topic is one of the technologies I present in the Information Technology Engineering 2 course for IT engineering students.
How picture marketing helps grow sales online and in b... - Emilie Marquois
With the explosion of mobile usage and applications, discover the key concepts and tools for standing out through visual communication.
The Clothesline Paradox and the Sharing Economy (Keynote file) - Tim O'Reilly
My keynote at OSCON 2012 in Portland, July 18, 2012. Focuses on the contribution of open source software to the economy, using the metaphor of "the clothesline paradox" first articulated by Steve Baer in CoEvolution Quarterly in 1975
My talk at the Stanford Technology Ventures Program on March 6, 2013. I talk about some technical and business lessons from Square, Uber, AirBnB, and the Google Autonomous Vehicle that are applicable to today's startups.
OSCON 2012 US Patriot Act Implications for Cloud Computing - Diane Mueller, A... - OSCON Byrum
This document discusses the implications of cloud computing and data privacy in light of the US Patriot Act. It notes that the Patriot Act allows US authorities broad surveillance powers over data, including that of foreign nationals and companies, even if the data resides outside the US. This creates risks for cloud computing as data may be located in multiple jurisdictions and users have little visibility into or control over data location. The document recommends that companies take a risk-based approach and consider a private cloud or hybrid cloud model to better protect sensitive data and maintain accountability over any legal requests for information.
This document discusses the importance of collaboration in companies and its impact on CRM. It explains that collaboration requires listening to customers, building communities, analyzing conversations, and managing campaigns collaboratively. It also covers trends such as the use of social networks, wikis, and mobile applications to improve collaboration. Finally, it provides steps for starting to collaborate internally, including educating yourself, validating the culture, and setting goals and plans.
Columbia Law School - Decentralized Ledgers Presentation on 4/7/2014 - Ldger, Inc
Principals from Tillit explore the business and legal implications of advanced blockchain technologies, including smart contracts, digital stored value and the concepts of "code as law"
This document discusses technical debt and strategies for managing and selling technical debt rearchitecture projects. It defines technical debt as work that is postponed to a later time, such as lack of testing or architecture planning. While some debt can be useful for time to market goals, ignoring debt accumulation can slow a project over time. The document provides examples of technical debt elements to examine in a codebase and recommends conducting due diligence to understand existing debt. It also presents stories and metaphors to help explain the risks of debt to business stakeholders and the value of rearchitecture projects when needed.
This document discusses the evolution of computing architectures and data processing techniques over time. As data grew larger than what could fit on a single computer, distributed systems and topologies like Hadoop emerged. This led to a shift from traditional data modeling to algorithmic modeling using machine learning. The rise of big data, IoT, and complex analytics is now disrupting businesses by enabling new, automated data products and feedback loops. This presents opportunities for companies in various industries to optimize operations using data science.
Mesosphere lightning talk presented at the first Mesos Townhall Meeting 2013-11-19 https://www.eventbrite.com/e/mesostownhall-meeting-1119-tickets-9104464699
Enterprise Data Workflows with Cascading - Paco Nathan
Cascading meetup held jointly with Enterprise Big Data meetup at Tata Consultancy Services in Santa Clara on 2012-12-17
http://www.meetup.com/cascading/events/94079162/
A Data Scientist And A Log File Walk Into A Bar... - Paco Nathan
Presented at Splunk .conf 2012 in Las Vegas. Includes an overview of the Cascading app based on City of Palo Alto open data. PS: email me if you need a different format than Keynote: @pacoid or pnathan AT concurrentinc DOT com
Functional programming for optimization problems in Big Data - Paco Nathan
Functional programming techniques can help optimize problems in big data. In 1997, four independent teams worked on horizontally scaling workflows using commodity hardware, enabling major internet successes. This led to the emergence of MapReduce and Apache Hadoop, which are still used today. Functional programming allows breaking problems into independent pieces that can run in parallel.
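The decomposition idea above can be sketched in a few lines of Python (an illustration of the map/reduce pattern, not code from the talk):

```python
from functools import reduce

def count_words(chunk):
    """Map step: processes one chunk with no shared state,
    so chunks can be handled in parallel."""
    counts = {}
    for word in chunk.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def merge(a, b):
    """Reduce step: merging partial counts is associative,
    so results can be combined in any order."""
    out = dict(a)
    for word, n in b.items():
        out[word] = out.get(word, 0) + n
    return out

chunks = ["to be or not to be", "that is the question"]
# Because count_words has no side effects, this map could be swapped
# for a parallel map (multiprocessing.Pool.map, or mappers in Hadoop).
partials = map(count_words, chunks)
totals = reduce(merge, partials, {})
print(totals["to"])  # 2
```

The same independence of map steps and associativity of reduce steps is what lets MapReduce and Hadoop scale the pattern across commodity hardware.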
Intro to Data Science for Enterprise Big Data - Paco Nathan
If you need a different format (PDF, PPT) instead of Keynote, please email me: pnathan AT concurrentinc DOT com
An overview of Data Science for Enterprise Big Data. In other words, how to combine structured and unstructured data, leveraging the tools of automation and mathematics, for highly scalable businesses. We discuss management strategy for building Data Science teams, basic requirements of the "science" in Data Science, and typical data access patterns for working with Big Data. We review some great algorithms, tools, and truisms for building a Data Science practice, plus some great references to read for further study.
Presented initially at the Enterprise Big Data meetup at Tata Consultancy Services, Santa Clara, 2012-08-20 http://www.meetup.com/Enterprise-Big-Data/events/77635202/
This document discusses the marketing funnel abstraction and workflows for processing large-scale clickstream data. It describes using a marketing funnel model to analyze customer behavior and calculate metrics like cost per acquisition. The document outlines some of the complexities in working with real-world clickstream data at large scales. It then provides a historical example of building a Hadoop application in 2008 to process billions of events for an online advertising company. This highlighted needs for improved workflow abstractions. The Cascading open source project is introduced as addressing some of these needs.
The document discusses 10 things learned from implementing OpenStack. It covers topics like cloud geography, industry and technology diversity in OpenStack implementations, hybrid cloud models focusing on storage, continuous vs staged integration, the duality of OpenStack storage, diversity of development and operations models, distributed vs centralized control, using erasure coding vs RAID for resiliency, and the idea of a shared service proposal.
1. The document discusses 10 things learned from implementing OpenStack including cloud geography, industry diversity, and technology diversity.
2. It explores the variety of consumption models for OpenStack including rack appliances, controllers, and software instances.
3. Integration approaches are discussed ranging from continuous integration to staged integration for different environments like surgery, air traffic control, or military systems.
The document describes how to perform various text analytics workflows, such as word count, stop word filtering, and TF-IDF, using Cascading, from ingesting documents to deployment on Amazon EMR. It shows the code required at each step and how adding features like testing and checkpoints adds only a few extra lines of code while allowing the workflow to run on datasets of any scale.
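For a rough sense of what the TF-IDF step computes, here is a plain-Python sketch (illustrative only, not the Cascading code from the talk): each term in each document is scored by term frequency times inverse document frequency.

```python
import math

def tf_idf(docs):
    """Toy TF-IDF: score terms by how often they appear in a document,
    discounted by how many documents contain them."""
    # Document frequency: number of docs containing each term
    df = {}
    for doc in docs:
        for term in set(doc.split()):
            df[term] = df.get(term, 0) + 1
    scores = []
    for doc in docs:
        words = doc.split()
        # Term frequency within this document
        tf = {}
        for w in words:
            tf[w] = tf.get(w, 0) + 1
        scores.append({
            w: (tf[w] / len(words)) * math.log(len(docs) / df[w])
            for w in tf
        })
    return scores
```

A term appearing in every document gets a score of zero (log of 1), which is exactly what stop word filtering exploits at scale.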
This document discusses web standards and the state of the mobile web in May 2012. It covers key technologies like HTML, CSS, JavaScript, graphics standards, offline access APIs and device access APIs. It also mentions trends in the mobile browser market in China, the role of audio/video and real-time communications. Overall trends covered include the growing importance of the mobile web, new APIs and what may come next in web standards development.
IT-as-a-Service: Cloud Computing and the Evolving Role of Enterprise IT - Bob Rhubart
The document discusses Oracle Enterprise Architecture and IT-as-a-Service. It provides the NIST definition of cloud computing which describes cloud models as enabling on-demand access to shared configurable computing resources that can be rapidly provisioned. The definition notes the essential characteristics of cloud computing include on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. It also lists the common cloud service models of SaaS, PaaS, and IaaS and deployment models of public cloud, private cloud, community cloud, and hybrid cloud. Finally, it outlines how cloud computing impacts different layers of IT architecture from a business, application, information, and technology perspective.
This corporate briefing document discusses Nuxeo, an open source enterprise content management software vendor. It provides key statistics on Nuxeo such as its founding in 2000, 35 employees worldwide, and annual turnover of €3.5 million in 2007. The document outlines Nuxeo's extensible and customizable platform, prestigious customers, and partners who provide joint offerings and integrate Nuxeo into mission-critical projects. It concludes by discussing Nuxeo's business model of subscription offerings, a homogeneous platform designed for integrators, and an ecosystem of users, partners, and ISVs who contribute to innovation and distribution.
This document discusses integrating operational technology (OT) systems with information technology (IT) systems. It presents messaging patterns for high-performance data exchange between OT and IT. These include publish-subscribe, request-reply, and guaranteed delivery patterns. It also discusses integration patterns like message translation, content filtering, splitting/aggregating, and choreography mediated by an integration bus. The bus supports various protocols and provides adaptation, transformation, and governance capabilities to connect heterogeneous systems in a system-of-systems. A July 2012 release of the integration platform will provide these communication and integration capabilities.
Hadoop is used at Salesforce for several big data use cases including product metrics, user behavior analysis, capacity planning, and collaborative filtering. For product metrics, Hadoop collects and analyzes log data from over 130,000 customers to track feature usage, standard metrics, and metrics across channels. It generates reports and dashboards to provide insights to executives and product managers.
Human in the loop: a design pattern for managing teams working with ML - Paco Nathan
Strata CA 2018-03-08
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/64223
Although it has long been used for use cases like simulation, training, and UX mockups, human-in-the-loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. One approach, active learning (a special case of semi-supervised learning), employs mostly automated processes based on machine learning models, but exceptions are referred to human experts, whose decisions help improve new iterations of the models.
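The exception-routing loop at the heart of active learning can be sketched in a few lines of Python (a minimal illustration; names like `ask_expert` are hypothetical, not from the talk):

```python
from typing import Callable, List, Tuple

CONF_THRESHOLD = 0.8  # assumption: a fixed confidence cutoff for automation

def handle_item(item: str,
                model: Callable[[str], Tuple[str, float]],
                ask_expert: Callable[[str], str],
                training_pool: List[Tuple[str, str]]) -> str:
    """Active-learning routing: confident predictions flow through
    automatically; low-confidence items are referred to a human expert,
    and the expert's label is banked to retrain the next model iteration."""
    label, confidence = model(item)
    if confidence >= CONF_THRESHOLD:
        return label                         # automated path
    expert_label = ask_expert(item)          # exception referred to a human
    training_pool.append((item, expert_label))
    return expert_label
```

The `training_pool` is what closes the loop: each expert judgement becomes a labeled example for the next round of model training.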
Human-in-the-loop: a design pattern for managing teams that leverage ML - Paco Nathan
Strata Singapore 2017 session talk 2017-12-06
https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/65611
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called active learning allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We’ll consider some of the technical aspects — including available open source projects — as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn’t it applicable?
* How do HITL approaches compare/contrast with more “typical” use of Big Data?
* What’s the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?
* In particular, we’ll examine use cases at O’Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
Human-in-a-loop: a design pattern for managing teams which leverage ML - Paco Nathan
Big Data Spain, 2017-11-16
https://www.bigdataspain.org/2017/talk/human-in-the-loop-a-design-pattern-for-managing-teams-which-leverage-ml
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called _active learning_ allows for mostly automated processes based on ML, where exceptions get referred to human experts. Those human judgements in turn help improve new iterations of the ML models.
This talk reviews key case studies about active learning, plus other approaches for human-in-the-loop which are emerging among AI applications. We'll consider some of the technical aspects -- including available open source projects -- as well as management perspectives for how to apply HITL:
* When is HITL indicated vs. when isn't it applicable?
* How do HITL approaches compare/contrast with more "typical" use of Big Data?
* What's the relationship between use of HITL and preparing an organization to leverage Deep Learning?
* Experiences training and managing a team which uses HITL at scale
* Caveats to know ahead of time
* In what ways do the humans involved learn from the machines?
In particular, we'll examine use cases at O'Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source [Project Jupyter](https://jupyter.org/) for implementation.
Humans in a loop: Jupyter notebooks as a front-end for AI - Paco Nathan
JupyterCon NY 2017-08-24
https://www.safaribooksonline.com/library/view/jupytercon-2017-/9781491985311/video313210.html
Paco Nathan reviews use cases where Jupyter provides a front-end to AI as the means for keeping "humans in the loop". This talk introduces *active learning* and the "human-in-the-loop" design pattern for managing how people and machines collaborate in AI workflows, including several case studies.
The talk also explores how O'Reilly Media leverages AI in Media, and in particular some of our use cases for active learning such as disambiguation in content discovery. We're using Jupyter as a way to manage active learning ML pipelines, where the machines generally run automated until they hit an edge case and refer the judgement back to human experts. In turn, the experts train the ML pipelines purely through examples, not feature engineering, model parameters, etc.
Jupyter notebooks serve as one part configuration file, one part data sample, one part structured log, one part data visualization tool. O'Reilly has released an open source project on GitHub called `nbtransom` which builds atop `nbformat` and `pandas` for our active learning use cases.
This work anticipates upcoming work on collaborative documents in JupyterLab, based on Google Drive. In other words, where the machines and people are collaborators on shared documents.
Humans in the loop: AI in open source and industry - Paco Nathan
Nike Tech Talk, Portland, 2017-08-10
https://niketechtalks-aug2017.splashthat.com/
O'Reilly Media gets to see the forefront of trends in artificial intelligence: what the leading teams are working on, which use cases are getting the most traction, previews of advances before they get announced on stage. Through conferences, publishing, and training programs, we've been assembling resources for anyone who wants to learn. An excellent recent example: Generative Adversarial Networks for Beginners, by Jon Bruner.
This talk covers current trends in AI, industry use cases, and recent highlights from the AI Conf series presented by O'Reilly and Intel, plus related materials from Safari learning platform, Strata Data, Data Show, and the upcoming JupyterCon.
Along with reporting, we're leveraging AI in Media. This talk dives into O'Reilly uses of deep learning -- combined with ontology, graph algorithms, probabilistic data structures, and even some evolutionary software -- to help editors and customers alike accomplish more of what they need to do.
In particular, we'll show two open source projects in Python from O'Reilly's AI team:
• pytextrank built atop spaCy, NetworkX, datasketch, providing graph algorithms for advanced NLP and text analytics
• nbtransom leveraging Project Jupyter for a human-in-the-loop design pattern approach to AI work: people and machines collaborating on content annotation
Lessons learned from 3 (going on 4) generations of Jupyter use cases at O'Reilly Media. In particular, about "Oriole" tutorials which combine video with Jupyter notebooks, Docker containers, backed by services managed on a cluster by Marathon, Mesos, Redis, and Nginx.
https://conferences.oreilly.com/fluent/fl-ca/public/schedule/detail/62859
https://conferences.oreilly.com/velocity/vl-ca/public/schedule/detail/62858
O'Reilly Media has experimented with different uses of Jupyter notebooks in their publications and learning platforms. Their latest approach embeds notebooks with video narratives in online "Oriole" tutorials, allowing authors to create interactive, computable content. This new medium blends code, data, text, and video into narrated learning experiences that run in isolated Docker containers for higher engagement. Some best practices for using notebooks in teaching include focusing on concise concepts, chunking content, and alternating between text, code, and outputs to keep explanations clear and linear.
See 2020 update: https://derwen.ai/s/h88s
SF Python Meetup, 2017-02-08
https://www.meetup.com/sfpython/events/237153246/
PyTextRank is a pure Python open source implementation of *TextRank*, based on the [Mihalcea 2004 paper](http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) -- a graph algorithm which produces ranked keyphrases from texts. Keyphrases are generally more useful than simple keyword extraction. PyTextRank integrates use of `TextBlob` and `SpaCy` for NLP analysis of texts, including full parse, named entity extraction, etc. It also produces auto-summarization of texts, making use of an approximation algorithm, `MinHash`, for better performance at scale. Overall, the package is intended to complement machine learning approaches -- specifically deep learning used for custom search and recommendations -- by developing better feature vectors from raw texts. This package is in production use at O'Reilly Media for text analytics.
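For intuition about the graph algorithm, here is a toy TextRank-style ranking in pure Python (a deliberate simplification; PyTextRank adds POS filtering, lemmatization, and keyphrase merging on top of this idea):

```python
import re

def textrank_keywords(text, window=2, damping=0.85, iters=50):
    """Toy TextRank: build a co-occurrence graph over words, then rank
    nodes with PageRank-style power iteration."""
    words = re.findall(r"[a-z]+", text.lower())
    # Edges connect words that co-occur within a sliding window
    neighbors = {w: set() for w in words}
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    # Power iteration: a word is important if important words co-occur with it
    rank = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        rank = {
            w: (1 - damping) + damping * sum(
                rank[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }
    return sorted(rank, key=rank.get, reverse=True)
```

Words that sit at well-connected positions in the co-occurrence graph rise to the top, which is why TextRank surfaces phrases a bag-of-words frequency count would miss.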
Use of standards and related issues in predictive analytics - Paco Nathan
My presentation at KDD 2016 in SF, in the "Special Session on Standards in Predictive Analytics In the Era of Big and Fast Data" morning track about PMML and PFA http://dmg.org/kdd2016.html
The document discusses how data science may reinvent learning and education. It begins with background on the author's experience in data teams and teaching. It then questions what an "Uber for education" may look like and discusses definitions of learning, education, and schools. The author argues interactive notebooks like Project Jupyter and flipped classrooms can improve learning at scale compared to traditional lectures or MOOCs. Content toolchains combining Jupyter, Thebe, Atlas and Docker are proposed for authoring and sharing computational narratives and code-as-media.
Jupyter for Education: Beyond Gutenberg and Erasmus - Paco Nathan
O'Reilly Learning is focusing on evolving learning experiences using Jupyter notebooks. Jupyter notebooks allow combining code, outputs, and explanations in a single document. O'Reilly is using Jupyter notebooks as a new authoring environment and is exploring features like computational narratives, code as a medium for teaching, and interactive online learning environments. The goal is to provide a better learning architecture and content workflow that leverages the capabilities of Jupyter notebooks.
GalvanizeU Seattle: Eleven Almost-Truisms About Data - Paco Nathan
http://www.meetup.com/Seattle-Data-Science/events/223445403/
Almost a dozen almost-truisms about Data that almost everyone should consider carefully as they embark on a journey into Data Science. There are a number of preconceptions about working with data at scale where the realities beg to differ. This talk estimates that number to be at least eleven, though probably much larger. At least that number has a great line from a movie. Let's consider some of the less-intuitive directions in which this field is heading, along with likely consequences and corollaries -- especially for those who are just now beginning to study about the technologies, the processes, and the people involved.
Microservices, containers, and machine learning - Paco Nathan
http://www.oscon.com/open-source-2015/public/schedule/detail/41579
In this presentation, an open source developer community considers itself algorithmically. This shows how to surface data insights from the developer email forums for just about any Apache open source project. It leverages advanced techniques for natural language processing, machine learning, graph algorithms, time series analysis, etc. As an example, we use data from the Apache Spark email list archives to help understand its community better; however, the code can be applied to many other communities.
Exsto is an open source project that demonstrates Apache Spark workflow examples for SQL-based ETL (Spark SQL), machine learning (MLlib), and graph algorithms (GraphX). It surfaces insights about developer communities from their email forums. Natural language processing services in Python (based on NLTK, TextBlob, WordNet, etc.) get containerized and used to crawl and parse email archives. These produce JSON data sets, then we run machine learning on a Spark cluster to surface insights such as:
* What are the trending topic summaries?
* Who are the leaders in the community for various topics?
* Who discusses most frequently with whom?
This talk shows how to use cloud-based notebooks for organizing and running the analytics and visualizations. It reviews the background for how and why the graph analytics and machine learning algorithms generalize patterns within the data — based on open source implementations for two advanced approaches, Word2Vec and TextRank. The talk also illustrates best practices for leveraging functional programming for big data.
GraphX: Graph analytics for insights about developer communities - Paco Nathan
The document provides an overview of Graph Analytics in Spark. It discusses Spark components and key distinctions from MapReduce. It also covers GraphX terminology and examples of composing node and edge RDDs into a graph. The document provides examples of simple traversals and routing problems on graphs. It discusses using GraphX for topic modeling with LDA and provides further reading resources on GraphX, algebraic graph theory, and graph analysis tools and frameworks.
Graph analytics can be used to analyze a social graph constructed from email messages on the Spark user mailing list. Key metrics like PageRank, in-degrees, and strongly connected components can be computed using the GraphX API in Spark. For example, PageRank was computed on the 4Q2014 email graph, identifying the top contributors to the mailing list.
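GraphX itself is Scala-based; as a language-neutral sketch of the same metrics, here is a tiny pure-Python computation of in-degrees and PageRank on a toy reply graph (illustrative only, with made-up senders):

```python
def in_degrees(edges):
    """Count incoming edges per node, e.g. replies received per sender."""
    deg = {}
    for src, dst in edges:
        deg[dst] = deg.get(dst, 0) + 1
    return deg

def pagerank(edges, damping=0.85, iters=50):
    """Basic PageRank by power iteration over a directed edge list."""
    nodes = {n for e in edges for n in e}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for m in out[n]:
                    new[m] += share
            else:
                # Dangling node: redistribute its rank evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
        rank = new
    return rank
```

On an email graph, each edge would be "src replied to dst"; the top-ranked nodes approximate the most influential contributors, which is the analysis the talk ran on the 4Q2014 Spark list.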
Apache Spark and the Emerging Technology Landscape for Big Data - Paco Nathan
The document discusses Apache Spark and its role in big data and emerging technologies for big data. It provides background on MapReduce and the emergence of specialized systems. It then discusses how Spark provides a unified engine for batch processing, iterative jobs, SQL queries, streaming, and more. It can simplify programming by using a functional approach. The document also discusses Spark's architecture and performance advantages over other frameworks.
QCon São Paulo: Real-Time Analytics with Spark Streaming - Paco Nathan
The document provides an overview of real-time analytics using Spark Streaming. It discusses Spark Streaming's micro-batch approach of treating streaming data as a series of small batch jobs. This allows for low-latency analysis while integrating streaming and batch processing. The document also covers Spark Streaming's fault tolerance mechanisms and provides several examples of companies like Pearson, Guavus, and Sharethrough using Spark Streaming for real-time analytics in production environments.
Strata 2015 Data Preview: Spark, Data Visualization, YARN, and More - Paco Nathan
Spark and Databricks component of the O'Reilly Media webcast "2015 Data Preview: Spark, Data Visualization, YARN, and More", as a preview of the 2015 Strata + Hadoop World conference in San Jose http://www.oreilly.com/pub/e/3289
A New Year in Data Science: ML Unpaused - Paco Nathan
This document summarizes Paco Nathan's presentation at Data Day Texas in 2015. Some key points:
- Paco Nathan discussed observations and trends from the past year in machine learning, data science, big data, and open source technologies.
- He argued that the definitions of data science and statistics are flawed and ignore important areas like development, visualization, and modeling real-world business problems.
- The presentation covered topics like functional programming approaches, streaming approximations, and the importance of an interdisciplinary approach combining computer science, statistics, and other fields like physics.
- Paco Nathan advocated for newer probabilistic techniques for analyzing large datasets that provide approximations using less resources compared to traditional batch processing approaches.
UiPath Community Day Amsterdam: Code, Collaborate, Connect - UiPathCommunity
Welcome to our third live UiPath Community Day Amsterdam! Come join us for a half-day of networking and UiPath Platform deep-dives, for devs and non-devs alike, in the middle of summer ☀.
📕 Agenda:
12:30 Welcome Coffee/Light Lunch ☕
13:00 Event opening speech
Ebert Knol, Managing Partner, Tacstone Technology
Jonathan Smith, UiPath MVP, RPA Lead, Ciphix
Cristina Vidu, Senior Marketing Manager, UiPath Community EMEA
Dion Mes, Principal Sales Engineer, UiPath
13:15 ASML: RPA as Tactical Automation
Tactical robotic process automation for solving short-term challenges, while establishing standard and re-usable interfaces that fit IT's long-term goals and objectives.
Yannic Suurmeijer, System Architect, ASML
13:30 PostNL: an insight into RPA at PostNL
Showcasing the solutions our automations have provided, the challenges we’ve faced, and the best practices we’ve developed to support our logistics operations.
Leonard Renne, RPA Developer, PostNL
13:45 Break (30')
14:15 Breakout Sessions: Round 1
Modern Document Understanding in the cloud platform: AI-driven UiPath Document Understanding
Mike Bos, Senior Automation Developer, Tacstone Technology
Process Orchestration: scale up and have your Robots work in harmony
Jon Smith, UiPath MVP, RPA Lead, Ciphix
UiPath Integration Service: connect applications, leverage prebuilt connectors, and set up customer connectors
Johans Brink, CTO, MvR digital workforce
15:00 Breakout Sessions: Round 2
Automation, and GenAI: practical use cases for value generation
Thomas Janssen, UiPath MVP, Senior Automation Developer, Automation Heroes
Human in the Loop/Action Center
Dion Mes, Principal Sales Engineer @UiPath
Improving development with coded workflows
Idris Janszen, Technical Consultant, Ilionx
15:45 End remarks
16:00 Community fun games, sharing knowledge, drinks, and bites 🍻
Cracking AI Black Box - Strategies for Customer-centric Enterprise Excellence - Quentin Reul
The democratization of Generative AI is ushering in a new era of innovation for enterprises. Discover how you can harness this powerful technology to deliver unparalleled customer value and securing a formidable competitive advantage in today's competitive market. In this session, you will learn how to:
- Identify high-impact customer needs with precision
- Harness the power of large language models to address specific customer needs effectively
- Implement AI responsibly to build trust and foster strong customer relationships
Whether you're at the early stages of your AI journey or looking to optimize existing initiatives, this session will provide you with actionable insights and strategies needed to leverage AI as a powerful catalyst for customer-driven enterprise success.
Finetuning GenAI For Hacking and DefendingPriyanka Aash
Generative AI, particularly through the lens of large language models (LLMs), represents a transformative leap in artificial intelligence. With advancements that have fundamentally altered our approach to AI, understanding and leveraging these technologies is crucial for innovators and practitioners alike. This comprehensive exploration delves into the intricacies of GenAI, from its foundational principles and historical evolution to its practical applications in security and beyond.
Keynote : Presentation on SASE TechnologyPriyanka Aash
Secure Access Service Edge (SASE) solutions are revolutionizing enterprise networks by integrating SD-WAN with comprehensive security services. Traditionally, enterprises managed multiple point solutions for network and security needs, leading to complexity and resource-intensive operations. SASE, as defined by Gartner, consolidates these functions into a unified cloud-based service, offering SD-WAN capabilities alongside advanced security features like secure web gateways, CASB, and remote browser isolation. This convergence not only simplifies management but also enhances security posture and application performance across global networks and cloud environments. Discover how adopting SASE can streamline operations and fortify your enterprise's digital transformation strategy.
Top 12 AI Technology Trends For 2024.pdfMarrie Morris
Technology has become an irreplaceable component of our daily lives. The role of AI in technology revolutionizes our lives for the betterment of the future. In this article, we will learn about the top 12 AI technology trends for 2024.
Keynote : AI & Future Of Offensive SecurityPriyanka Aash
In the presentation, the focus is on the transformative impact of artificial intelligence (AI) in cybersecurity, particularly in the context of malware generation and adversarial attacks. AI promises to revolutionize the field by enabling scalable solutions to historically challenging problems such as continuous threat simulation, autonomous attack path generation, and the creation of sophisticated attack payloads. The discussions underscore how AI-powered tools like AI-based penetration testing can outpace traditional methods, enhancing security posture by efficiently identifying and mitigating vulnerabilities across complex attack surfaces. The use of AI in red teaming further amplifies these capabilities, allowing organizations to validate security controls effectively against diverse adversarial scenarios. These advancements not only streamline testing processes but also bolster defense strategies, ensuring readiness against evolving cyber threats.
The History of Embeddings & Multimodal EmbeddingsZilliz
Frank Liu will walk through the history of embeddings and how we got to the cool embedding models used today. He'll end with a demo on how multimodal RAG is used.
TrustArc Webinar - Innovating with TRUSTe Responsible AI CertificationTrustArc
In a landmark year marked by significant AI advancements, it’s vital to prioritize transparency, accountability, and respect for privacy rights with your AI innovation.
Learn how to navigate the shifting AI landscape with our innovative solution TRUSTe Responsible AI Certification, the first AI certification designed for data protection and privacy. Crafted by a team with 10,000+ privacy certifications issued, this framework integrated industry standards and laws for responsible AI governance.
This webinar will review:
- How compliance can play a role in the development and deployment of AI systems
- How to model trust and transparency across products and services
- How to save time and work smarter in understanding regulatory obligations, including AI
- How to operationalize and deploy AI governance best practices in your organization
Increase Quality with User Access Policies - July 2024Peter Caitens
⭐️ Increase Quality with User Access Policies ⭐️, presented by Peter Caitens and Adam Best of Salesforce. View the slides from this session to hear all about “User Access Policies” and how they can help you onboard users faster with greater quality.
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...Fwdays
.NET 8 brought a lot of improvements for developers and maturity to the Azure serverless container ecosystem. So, this talk will cover these changes and explain how you can apply them to your projects. Another reason for this talk is the re-invention of Serverless from a DevOps perspective as a Platform Engineering trend with Backstage and the recent Radius project from Microsoft. So now is the perfect time to look at developer productivity tooling and serverless apps from Microsoft's perspective.
Demystifying Neural Networks And Building Cybersecurity ApplicationsPriyanka Aash
In today's rapidly evolving technological landscape, Artificial Neural Networks (ANNs) have emerged as a cornerstone of artificial intelligence, revolutionizing various fields including cybersecurity. Inspired by the intricacies of the human brain, ANNs have a rich history and a complex structure that enables them to learn and make decisions. This blog aims to unravel the mysteries of neural networks, explore their mathematical foundations, and demonstrate their practical applications, particularly in building robust malware detection systems using Convolutional Neural Networks (CNNs).
Demystifying Neural Networks And Building Cybersecurity Applications
Cascading meetup #4 @ BlueKai
1. Cascading Meetup #4
BlueKai
Cupertino, CA
2013-03-05
Copyright ©2013, Concurrent, Inc.
Tuesday, 05 March 13 1
2. Cascading Meetup
[Flow diagram: the Word Count example — Document Collection → Tokenize → Scrub token → HashJoin (Left) with Stop Word List (RHS) → GroupBy token → Count → Word Count, with M/R stage markers]
1. Enterprise Data Workflows
2. ANSI SQL Support
3. Test-Driven Development
3. Enterprise Data Workflows
Let’s consider an example app… at the front end:
LOB use cases drive demand for apps.
[Flow diagram: Customers and a Web App at the front end; logs landing in a Cache and a Logs store; source, sink, and trap taps feeding a Data Workflow on a Hadoop Cluster; PMML models from Data Modeling; Analytics Cubes, Customer Prefs, customer profile DBs, Support, and Reporting downstream]
LOB use cases drive the demand for Big Data apps
4. Enterprise Data Workflows
An example… in the back office:
Organizations have substantial investments in people, infrastructure, and process.
[Flow diagram: the same enterprise data workflow — Web App logs through taps into a Hadoop Cluster, with PMML modeling, Analytics Cubes, and Reporting downstream]
Enterprise organizations have seriously ginormous investments in existing back office practices:
people, infrastructure, processes
5. Enterprise Data Workflows
An example… for the heavy lifting!
“Main Street” firms are migrating workflows to Hadoop, for cost savings and scale-out.
[Flow diagram: the same enterprise data workflow — Web App logs through taps into a Hadoop Cluster, with PMML modeling, Analytics Cubes, and Reporting downstream]
“Main Street” firms have invested in Hadoop to address Big Data needs,
off-setting their rising costs for Enterprise licenses from SAS, Teradata, etc.
6. Two Avenues…
Enterprise: must contend with complexity at scale every day…
incumbents extend current practices and infrastructure investments – using J2EE, ANSI SQL, SAS, etc. – to migrate workflows onto Apache Hadoop while leveraging existing staff.

Start-ups: crave complexity and scale to become viable…
new ventures move into Enterprise space to compete using relatively lean staff, while leveraging sophisticated engineering practices, e.g., Cascalog and Scalding.

[Chart axes: complexity ➞, scale ➞]
Enterprise data workflows are observed in two modes: start-ups approaching complexity and incumbent firms grappling with complexity
7. Two Avenues…
Enterprise: must contend with complexity at scale every day…
incumbents extend current practices and infrastructure investments – using J2EE, ANSI SQL, SAS, etc. – to migrate workflows onto Apache Hadoop while leveraging existing staff.

Start-ups: crave complexity and scale to become viable…
new ventures move into Enterprise space to compete using relatively lean staff, while leveraging sophisticated engineering practices, e.g., Cascalog and Scalding.

Hadoop almost never gets used in isolation; data workflows define the “glue” required for system integration of Enterprise apps.

[Chart axes: complexity ➞, scale ➞]
Hadoop is almost never used in isolation.
Enterprise data workflows are about system integration.
There are a couple different ways to arrive at the party.
8. Cascading Meetup
[Flow diagram: the Word Count example — Document Collection → Tokenize → Scrub token → HashJoin (Left) with Stop Word List (RHS) → GroupBy token → Count → Word Count, with M/R stage markers]
1. Enterprise Data Workflows
2. ANSI SQL Support
3. Test-Driven Development
9. Cascading workflows – ANSI SQL
• collab with Optiq – industry-proven code base
• ANSI SQL parser/optimizer atop Cascading flow planner
• JDBC driver to integrate into existing tools and app servers
• relational catalog over a collection of unstructured data
• SQL shell prompt to run queries
[Flow diagram: the same enterprise data workflow — Web App logs through taps into a Hadoop Cluster, with PMML modeling, Analytics Cubes, and Reporting downstream]
ANSI SQL as “machine code” -- the lingua franca of Enterprise system integration.
Cascading partnered with Optiq, the team behind Mondrian, etc., with an Enterprise-proven code base for an ANSI SQL parser/optimizer.
10. Cascading workflows – ANSI SQL
• collab with Optiq – industry-proven code base
• ANSI SQL parser/optimizer atop Cascading flow planner
• JDBC driver to integrate into existing tools and app servers
• relational catalog over a collection of unstructured data
• SQL shell prompt to run queries

Premise: most SQL in the world gets written by machines…
This isn’t a database; this is about making machine-to-machine communications simpler and more robust at scale.

[Flow diagram: the same enterprise data workflow — Web App logs through taps into a Hadoop Cluster, with PMML modeling, Analytics Cubes, and Reporting downstream]
11. Cascading workflows – ANSI SQL
• enable analysts without retraining on Hadoop, etc.
• transparency for Support, Ops, Finance, et al.

a language for queries – not a database, but ANSI SQL as a DSL for workflows

[Flow diagram: the same enterprise data workflow — Web App logs through taps into a Hadoop Cluster, with PMML modeling, Analytics Cubes, and Reporting downstream]
12. ANSI SQL – reviews
Open Source 'Lingual' Helps SQL Devs Unlock Hadoop
Thor Olavsrud, 2013-02-22
cio.com/article/729283/Open_Source_Lingual_Helps_SQL_Devs_Unlock_Hadoop
Hadoop Apps Without MapReduce Mindsets
Adrian Bridgwater, 2013-02-28
drdobbs.com/open-source/hadoop-apps-without-mapreduce-mindsets/240149708
Concurrent gives old SQL users new Hadoop tricks
Jack Clark, 2013-02-20
theregister.co.uk/2013/02/20/hadoop_sql_translator_lingual_launches/
Concurrent Open Source Project Ties SQL to Hadoop
Michael Vizard, 2013-02-21
itbusinessedge.com/blogs/it-unmasked/concurrent-open-source-project-ties-sql-to-hadoop.html
Concurrent Releases Lingual, a SQL DSL for Hadoop
Boris Lublinsky, 2013-02-28
infoq.com/news/2013/02/Lingual
13. ANSI SQL – CSV data in local file system
cascading.org/lingual
The test database for MySQL is available for download from https://launchpad.net/test-db/
Here we have a bunch o’ CSV flat files in a directory in the local file system.
Use the “lingual” command line interface to overlay DDL to describe the expected table schema.
14. ANSI SQL – shell prompt, catalog
cascading.org/lingual
Use the “lingual” SQL shell prompt to run SQL queries interactively, show catalog, etc.
15. ANSI SQL – queries
cascading.org/lingual
Here’s an example SQL query on that “employee” test database from MySQL.
16. ANSI SQL – layers
abstraction    | RDBMS                                   | JVM Cluster
parser         | ANSI SQL compliant parser               | ANSI SQL compliant parser
optimizer      | logical plan, optimized based on stats  | logical plan, optimized based on stats
planner        | physical plan                           | API “plumbing”
machine data   | query history, table stats              | app history, tuple stats
topology       | b-trees, etc.                           | heterogeneous, distributed: Hadoop, IMDG, etc.
visualization  | ERD                                     | flow diagram
schema         | table schema                            | tuple schema
catalog        | relational catalog                      | tap usage DB
provenance     | (manual audit)                          | data set producers/consumers
When you peel back the onion skin on a SQL query, each of the abstraction layers used in an RDBMS has an analogue (or better) in the context of Enterprise Data Workflows running on JVM clusters
17. ANSI SQL – JDBC driver
public void run() throws ClassNotFoundException, SQLException {
  Class.forName( "cascading.lingual.jdbc.Driver" );
  Connection connection =
    DriverManager.getConnection( "jdbc:lingual:local;schemas=src/main/resources/data/example" );
  Statement statement = connection.createStatement();

  ResultSet resultSet = statement.executeQuery(
      "select *\n"
    + "from \"EXAMPLE\".\"SALES_FACT_1997\" as s\n"
    + "join \"EXAMPLE\".\"EMPLOYEE\" as e\n"
    + "on e.\"EMPID\" = s.\"CUST_ID\"" );

  while( resultSet.next() ) {
    int n = resultSet.getMetaData().getColumnCount();
    StringBuilder builder = new StringBuilder();

    for( int i = 1; i <= n; i++ ) {
      builder.append( ( i > 1 ? "; " : "" )
        + resultSet.getMetaData().getColumnLabel( i ) + "=" + resultSet.getObject( i ) );
    }
    System.out.println( builder );
  }

  resultSet.close();
  statement.close();
  connection.close();
}
Note that in this example the schema for the DDL has been derived directly from the CSV files.
In other words, point the JDBC connection at a directory of flat files and query as if they were already loaded into SQL.
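As a minimal sketch of that idea — inferring column names from a CSV header line the way a schema can be overlaid onto flat files — the helper below is purely illustrative (the class and method names are hypothetical, not the Lingual API):

```java
import java.util.Arrays;
import java.util.List;

public class SchemaSketch {

  // Hypothetical helper: infer column names from a CSV header line,
  // the way a table schema can be derived from a flat file.
  public static List<String> deriveColumns( String headerLine ) {
    // split on commas, trimming surrounding whitespace from each name
    String[] names = headerLine.trim().split( "\\s*,\\s*" );
    return Arrays.asList( names );
  }

  public static void main( String[] args ) {
    System.out.println( deriveColumns( "EMPID, NAME, CUST_ID, PROD_ID" ) );
    // prints: [EMPID, NAME, CUST_ID, PROD_ID]
  }
}
```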
18. ANSI SQL – JDBC driver
$ gradle clean jar
$ hadoop jar build/libs/lingual-examples-1.0.0-wip-dev.jar
CUST_ID=100; PROD_ID=10; EMPID=100; NAME=Bill
CUST_ID=150; PROD_ID=20; EMPID=150; NAME=Sebastian
Caveat: if you absolutely positively must have sub-second
SQL query response for Pb-scale data on a 1000+ node
cluster… Good luck with that! (call the MPP vendors)
This ANSI SQL library is primarily intended for batch
workflows – high throughput, not low-latency –
for many under-represented use cases in Enterprise IT.
It’s essentially ANSI SQL as a DSL.
success
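To make the join semantics concrete, here is a plain-Java sketch of the hash join the query above expresses — matching EMPLOYEE.EMPID against SALES_FACT_1997.CUST_ID over the toy rows shown in the output. This is an illustration only, not part of Lingual or Cascading:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class JoinSketch {

  // Inner join on e.EMPID = s.CUST_ID, mirroring the query's output above.
  public static Map<Integer, String> joinOnCustId(
      Map<Integer, Integer> sales,       // CUST_ID -> PROD_ID
      Map<Integer, String> employees ) { // EMPID -> NAME
    Map<Integer, String> joined = new LinkedHashMap<>();
    for( Map.Entry<Integer, Integer> sale : sales.entrySet() ) {
      String name = employees.get( sale.getKey() );
      if( name != null )  // inner join: drop unmatched rows
        joined.put( sale.getKey(), name );
    }
    return joined;
  }

  public static void main( String[] args ) {
    Map<Integer, Integer> sales = new LinkedHashMap<>();
    sales.put( 100, 10 );
    sales.put( 150, 20 );
    Map<Integer, String> employees = new HashMap<>();
    employees.put( 100, "Bill" );
    employees.put( 150, "Sebastian" );
    System.out.println( joinOnCustId( sales, employees ) );
    // prints: {100=Bill, 150=Sebastian}
  }
}
```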
19. Cascading Meetup
[Flow diagram: the Word Count example — Document Collection → Tokenize → Scrub token → HashJoin (Left) with Stop Word List (RHS) → GroupBy token → Count → Word Count, with M/R stage markers]
1. Enterprise Data Workflows
2. ANSI SQL Support
3. Test-Driven Development
21. Test-Driven Development (TDD)
In terms of Big Data apps, TDD is not generally part of the conversation.
TDD is not usually high on the list when people start discussing Big Data apps.
22. Traps – Cascading “exceptional data”
• assert patterns (regex) on the tuple streams
• adjust assert levels, like log4j levels
• define traps on branches
• tuples which fail asserts get trapped
[Flow diagram: the same enterprise data workflow — Web App logs through taps into a Hadoop Cluster, with PMML modeling, Analytics Cubes, and Reporting downstream]
An innovation in Cascading was to introduce the notion of a “data exception”,
based on setting stream assertion levels as part of the business logic of an app.
23. Traps – example code
// set up...
Pipe etlPipe = new Pipe( "etlPipe" );
// some processing...
AssertMatches assertMatches = new AssertMatches( ".*true" );
etlPipe = new Each( etlPipe, AssertionLevel.STRICT, assertMatches );
// some processing...
FlowDef flowDef = FlowDef.flowDef().setName( "etl" )
.addSource( etlPipe, jsonTap )
.addTrap( etlPipe, trapTap )
.addTailSink( etlPipe, cacheTap );
if( options.has( "assert" ) )
flowDef.setAssertionLevel( AssertionLevel.STRICT );
else
flowDef.setAssertionLevel( AssertionLevel.NONE );
Example use in Cascading code
24. Traps – redirect exceptions in production
shunt the trapped exceptional data to other parts of the organization:
• Ops: notifications
• QA: investigate data anomalies
• Support: review customer records
• Finance: audit
[Flow diagram: the same enterprise data workflow — Web App logs through taps into a Hadoop Cluster, with PMML modeling, Analytics Cubes, and Reporting downstream]
25. TDD – practice at scale
1. assert expected patterns in raw input
2. run just that, to find edge cases
3. handle the edge cases for input data
4. assert expected patterns after first chunk of processing
5. run just that, to verify failure
6. code until test passes
7. repeat #4 for each chunk
[Flow diagram: a large multi-flow app — a GIS export parsed (parse-tree, parse-gis, parse-road) into tree, road, and GPS branches; Scrub species and Estimate height with Failure Traps; Calculate/Filter distance and Sum moment; Estimate traffic, shade, and Albedo for Road Segments; Geohash joins against Tree Metadata, Road Metadata, and gps logs; Count gps_count and Max recent_visit feeding a reco output; M/R stage markers throughout]
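Step 1 above — asserting an expected pattern on raw input and trapping the records that fail — can be sketched in plain Java. This mimics the behavior of a Cascading trap but is not the Cascading API itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TrapSketch {

  // Records failing the asserted pattern go to the "trap" list
  // instead of failing the whole run.
  public static List<String> trapNonMatching( List<String> records, String regex ) {
    Pattern pattern = Pattern.compile( regex );
    List<String> trapped = new ArrayList<>();
    for( String record : records )
      if( !pattern.matcher( record ).matches() )
        trapped.add( record );
    return trapped;
  }

  public static void main( String[] args ) {
    List<String> raw = List.of( "id=1,valid=true", "id=2,valid=true", "garbage line" );
    System.out.println( trapNonMatching( raw, ".*true" ) );
    // prints: [garbage line]
  }
}
```

Running the job with only this assertion in place is what surfaces the edge cases (step 2) before any downstream processing is written.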
26. TDD – Cascalog features
consider that TDD is about asserting and negating logical
predicates…
• Cascalog is based on logical predicates
• function definitions as composable subqueries
• functions are not particularly far from being unit tests
• Midje: facts, mocks
sritchie.github.com/2011/09/30/testing-cascalog-with-midje.html
sritchie.github.com/2012/01/22/cascalog-testing-20.html
Moreover, the Cascalog language by Nathan Marz, Sam Ritchie, et al., nearly uses TDD as its methodology --
in the transition from ad-hoc queries as logic predicates, then composing those predicates into large-scale apps.
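The point that composable logical predicates sit close to unit tests can be sketched in plain-Java terms — an illustration only, not Cascalog or Midje syntax (the stop-word example echoes the Word Count workflow):

```java
import java.util.function.Predicate;

public class PredicateSketch {

  // Composable logical predicates, in the spirit of Cascalog subqueries.
  public static final Predicate<String> IS_TOKEN      = s -> s.matches( "[a-z]+" );
  public static final Predicate<String> NOT_STOP_WORD = s -> !s.equals( "the" );
  public static final Predicate<String> KEEP          = IS_TOKEN.and( NOT_STOP_WORD );

  public static void main( String[] args ) {
    // asserting a predicate *is* the unit test
    assert KEEP.test( "hadoop" );
    assert !KEEP.test( "the" );
    assert !KEEP.test( "123" );
    System.out.println( "all predicate facts hold" );
  }
}
```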
27. Cascading Meetup
[Flow diagram: the Word Count example — Document Collection → Tokenize → Scrub token → HashJoin (Left) with Stop Word List (RHS) → GroupBy token → Count → Word Count, with M/R stage markers]
1. Enterprise Data Workflows
2. ANSI SQL Support
3. Test-Driven Development
…plus, a proposal
28. ANSI SQL – multiple flows
[Flow diagram: the same large multi-flow app — tree, road, and GPS branches joined on geohash, with failure traps]
Suppose your organization is responsible for a large-scale app…
Multiple teams develop reusable libraries…
Suppose you have an app with a complex flow diagram like this, with contributions to the business logic from different departments…
29. ANSI SQL – multiple flows
[Flow diagram: the same large multi-flow app — tree, road, and GPS branches joined on geohash, with failure traps]
Data Analysts: ANSI SQL queries for data prep (displaces Hive, etc.)
Analysts are generally working with ANSI SQL queries in a DW, e.g., for ETL, data prep, pulling data cubes.
These can migrate into a Cascading app to run on Hadoop.
30. ANSI SQL – multiple flows
[Flow diagram: the same large multi-flow app — tree, road, and GPS branches joined on geohash, with failure traps]
Server-side Engineering: HBase tap for customer profiles (integrating other components)
Engineering provides integration with customer profiles, e.g., transactional data objects in HBase.
These can migrate into a Cascading app to run on Hadoop.
31. ANSI SQL – multiple flows
[Flow diagram: the same large multi-flow app — tree, road, and GPS branches joined on geohash, with failure traps]
Ops + Support: Traps get routed to customer review (ties into notifications, etc.)
Support needs to review exceptional data, via reports/notifications.
These can migrate into a Cascading app to run on Hadoop.
32. ANSI SQL – multiple flows
[Flow diagram: the same large multi-flow app — tree, road, and GPS branches joined on geohash, with failure traps]
Data Scientists: R => PMML for predictive models (displaces SAS, etc.)
Scientists perform their model creation work in R, Weka, SAS, Microstrategy, etc., which can export as PMML.
These can migrate into a Cascading app to run on Hadoop.
33. ANSI SQL – multiple flows
[Flow diagram: the same large multi-flow app — tree, road, and GPS branches joined on geohash, with failure traps]
App Engineering: Java/Scala/Clojure for business logic in data pipelines (displaces Pig, etc.)
Generally the revenue apps require some custom business logic -- representing business process for LOB.
These can migrate into a Cascading app to run on Hadoop.