Caserta Presentation:
General Data Protection Regulation (GDPR) is a business and technical challenge for companies worldwide - and the deadlines are coming fast! American institutions that do business in the EU or have customers from the EU will have their data practices affected. With this in mind, Caserta – joined by Waterline Data, Salt Recruiting, and Squire Patton Boggs – hosted a BDW Meetup on the GDPR, which is perhaps the most controversial data legislation that has been passed to date.
Joe Caserta, Founding President, Caserta, spoke on the basics of the GDPR, how it will impact data privacy around the world, and some techniques geared towards compliance.
This presentation will discuss the stories of three companies spanning different industries: the challenges they faced, how cloud analytics addressed them, the technologies implemented to solve the challenges, and how they benefited from their new cloud analytics environments.
The objectives of this session include:
• Detail and explain the key benefits and advantages of moving BI and analytics workloads to the cloud, and why companies shouldn’t wait any longer to make their move.
• Compare the different analytics cloud options companies have, and the pros and cons of each.
• Describe some of the challenges companies may face when moving their analytics to the cloud, and what they need to prepare for.
• Provide the case studies of three companies, what issues they were solving for, what technologies they implemented and why, and how they benefited from their new solutions.
• Learn what to look for when considering a partner and trusted advisor to assist with an analytics cloud migration.
An Overview of the Neo4j Cloud Strategy and the Future of Graph Databases in ... (Neo4j)
This document discusses how graphs and cloud computing can accelerate innovation. It notes that all data and organizations are naturally connected in complex ways and graphs are core to modern intelligent applications. Connections in data help with personalization, recommendations, health, fraud prevention, and more. The document highlights growing adoption of graph databases and Neo4j's cloud-managed graph database service, Neo4j Aura, which provides simplicity, flexibility, reliability, and empowers faster iteration and collaboration in the cloud.
The 20th annual Enterprise Data World (EDW) Conference took place in San Diego last month, April 17-21. It is recognized as the most comprehensive educational conference on data management in the world.
Joe Caserta was a featured presenter. His session, "Evolving from the Data Warehouse to Big Data Analytics - the Emerging Role of the Data Lake," highlighted the challenges and steps needed to become a data-driven organization.
Joe also participated in two panel discussions during the show:
• "Data Lake or Data Warehouse?"
• "Big Data Investments Have Been Made, But What's Next
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim... (DATAVERSITY)
J.B. Hunt, one of the leading providers of transportation and logistics services in North America, recognizes the criticality of customer responsiveness, service quality, and operational efficiency for its success. However, with its data spread across multiple sources, including legacy mainframe systems, the organization was struggling to meet data requirements from multiple departments. They struggled to troubleshoot operational issues and respond to customers quickly.
Join this webinar to hear about the optimized solution J.B. Hunt implemented, which automates real-time data pipelines for a reliable cloud data lake and provides multiple user groups an in-the-moment view of data without overwhelming internal operational systems. Discover how J.B. Hunt now leverages a modernized data environment to accelerate data delivery and drive various AI and analytics initiatives such as real-time service-pricing, competitive counterbidding, and improving their customer experience.
Learn how you can:
• Ingest data in real-time from legacy mainframe systems, enterprise applications, and more
• Create a reliable cloud data lake to accelerate AI and Analytic Initiatives
• Catalog, prepare, and provision data to empower data consumers
• Drive operational efficiency and customer experience with AI-augmented insights
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendees learned how a global leader in the test, measurement and control systems market reduced its big data implementation time from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focused on how to extend and optimize Hadoop-based analytics, highlighting the advantages and practical applications of deploying in the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft - Benefits of the Azure Cloud Service
- Q&A, Networking
Smarter businesses apply AI to learn and continuously evolve the way they work. To extract full value from AI, companies need a data strategy that gives them access to all their data – no matter where it lives – in an environment that easily scales and applies the latest discovery technology, including advanced analytics, visualization and AI. Learn how IBM Watson and Data provides all the tools companies need to embed AI, machine learning and deep learning in their business, while enabling professionals to gain the most from their data to drive smarter business and lead industry-changing transformations.
Joe Caserta was a featured speaker, along with MIT Sloan School faculty and other industry thought-leaders. His session "You're the New CDO, Now What?" discussed how new CDOs can accomplish their strategic objectives and overcome tactical challenges in this emerging executive leadership role.
In its tenth year, the MIT CDOIQ Symposium 2016 continues to explore the developing role of the Chief Data Officer.
Moving Past Infrastructure Limitations Presented by MediaMath
This presentation was given at a Big Data Warehousing Meetup with Caserta Concepts, MediaMath and Qubole. You can learn more about the event here: http://www.meetup.com/Big-Data-Warehousing/events/228372516/
Event description:
At Caserta Concepts, we are firm believers in big data thriving on the cloud. The instant-on, nearly unlimited storage and computing capabilities of AWS have made it the de facto solution for a full spectrum of organizations needing to process large amounts of data.
What's more, an ecosystem of value-added platforms has emerged to further ease and democratize the implementation of cloud based solutions. Qubole has developed a great platform for easily deploying and managing ephemeral and long-lived Hadoop and Spark clusters on AWS.
Moving Past Infrastructure Limitations: Data Warehousing at MediaMath
Over the past year and a half, MediaMath has undertaken a “data liberation” effort in an attempt to leave their big-box, monolithic data warehouse behind. In this talk, Rory Sawyer, Software Engineer at MediaMath, will describe how this effort transformed MediaMath’s legacy architecture and legacy mindset, which imposed harsh inefficiencies on data sharing and utilization. The current mindset removes these inefficiencies and allows them to say “yes” to more projects and ideas.
Rory will also demo how MediaMath uses Amazon Web Services and Qubole so that infrastructure is no longer a limiting factor on what and how users query. This combination allows them to scale their resources up and down as needed while bridging different data sources and execution engines. Using and extending MediaMath’s data warehousing is no longer a privileged activity but an ability that every employee and client has.
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017 (Caserta)
Over the past eight or nine years, applying DevOps practices to various areas of technology within business has grown in popularity and produced demonstrable results. These principles are particularly fruitful when applied to a data analytics environment. Bob Eilbacher explains how to implement a strong DevOps practice for data analysis, starting with the necessary cultural changes that must be made at the executive level and ending with an overview of potential DevOps toolchains. Bob also outlines why DevOps and disruption management go hand in hand.
Topics include:
- The benefits of a DevOps approach, with an emphasis on improving quality and efficiency of data analytics
- Why the push for a DevOps practice needs to come from the C-suite and how it can be integrated into all levels of business
- An overview of the best tools for developers, data analysts, and everyone in between, based on the business’s existing data ecosystem
- The challenges that come with transforming into an analytics-driven company and how to overcome them
- Practical use cases from Caserta clients
This presentation was originally given by Bob at the 2017 Strata Data Conference in New York City.
Focus on Your Analysis, Not Your SQL Code (DATAVERSITY)
This document discusses the challenges of using SQL for data analysis and introduces Alteryx as an alternative. It notes that SQL can be difficult to understand and repeat, while Alteryx allows users to see the full data workflow, perform transformations without coding, and access different data sources flexibly. The presentation includes an agenda, overview of Alteryx's benefits, and demonstration of its capabilities.
A modern, flexible approach to Hadoop implementation incorporating innovation... (DataWorks Summit)
A modern, flexible approach to Hadoop implementation incorporating innovations from HP Haven
Jeff Veis
Vice President
HP Software Big Data
Gilles Noisette
Master Solution Architect
HP EMEA Big Data CoE
Are you your company’s chief data officer? Given the scarcity of the official role, it’s likely that you’re not — at least in title. But that doesn't mean that you shouldn't operate like one. Do you approach data leadership as a C-level executive or a senior data head? Is your team’s output strategic or just operational? In this interactive keynote, one of the Windy City’s foremost data leaders will lead an interactive discussion on what it takes to lead like a chief, what it looks like, and how to get there and get it done.
Using Machine Learning to Understand and Predict Marketing ROI (DATAVERSITY)
Marketing is all about attracting, retaining and building profitable relationships with your customers, but how do you know which customers to target, which campaigns to run, and which marketing programs to invest in, to get the most return for your dollar?
Join Alteryx and Keyrus as we demonstrate how to combine all relevant marketing, sales and customer data, and perform sophisticated analytics to deepen customer insight and calculate ROI of marketing programs.
You’ll walk away knowing how to:
• Segment and profile your customers – take that raw data and translate it into real value
• Build a marketing attribution model within Alteryx, creating a personal answer engine for your company
• Leverage R or Python code in an Alteryx workflow so data scientists can collaborate with non-coding stakeholders in a code-friendly and code-free environment
Join Alteryx and Keyrus and get the actionable insights you need to drive marketing ROI analytics, and answer million-dollar questions without spending millions of dollars on standardized solutions.
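To make the attribution-modeling idea above concrete, here is a minimal sketch in plain Python (not Alteryx-specific) of a linear attribution model: each conversion's revenue is split equally across the channels that touched the customer, and per-channel ROI is computed against spend. The journey data, channel names, and spend figures are invented for illustration.

```python
from collections import defaultdict

def linear_attribution(journeys):
    """Split each conversion's revenue equally across the channels
    that touched the customer before converting."""
    credit = defaultdict(float)
    for touches, revenue in journeys:
        share = revenue / len(touches)
        for channel in touches:
            credit[channel] += share
    return dict(credit)

def roi(credit, spend):
    """Per-channel ROI: (attributed revenue - spend) / spend."""
    return {ch: (credit.get(ch, 0.0) - cost) / cost
            for ch, cost in spend.items()}

# Hypothetical journeys: (channel touches, conversion revenue)
journeys = [
    (["email", "search"], 100.0),           # two touches -> 50 each
    (["search"], 60.0),
    (["display", "email", "search"], 90.0), # three touches -> 30 each
]
credit = linear_attribution(journeys)
# email: 50 + 30 = 80, search: 50 + 60 + 30 = 140, display: 30
print(roi(credit, {"email": 40.0, "search": 70.0, "display": 60.0}))
```

A real model would weight touches by recency or use a statistical approach, but the equal-split baseline is a common starting point.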
Reveal the Intelligence in your Data with Talend Data Fabric (Jean-Michel Franco)
Discover the Winter'20 release of Talend Data Fabric.
Find out about the newly released product, Talend Data Inventory, and the powerful new capabilities and AI that accelerate and modernize data engineering. Find out how to:
- Ensure trusted data at first sight with Data Inventory
- Increase efficiency and productivity with Pipeline Designer
- Automate more integration tasks with AI and APIs
Data Catalog as the Platform for Data Intelligence (Alation)
Data catalogs are in wide use today across hundreds of enterprises as a means to help data scientists and business analysts find and collaboratively analyze data. Over the past several years, customers have increasingly used data catalogs in applications beyond their search & discovery roots, addressing new use cases such as data governance, cloud data migration, and digital transformation. In this session, the founder and CEO of Alation will discuss the evolution of the data catalog, the many ways in which data catalogs are being used today, the importance of machine learning in data catalogs, and discuss the future of the data catalog as a platform for a broad range of data intelligence solutions.
This document discusses how enterprise information management is key to effective governance, risk management, and compliance (GRC). It defines GRC and explains that traditional GRC strategies often fail because information is siloed across unstructured files and structured data systems. Effective GRC requires synchronizing information and activities across governance, risk, and compliance to operate efficiently, enable information sharing, report activities, and avoid duplication. The document proposes that an information management system like M-Files can bridge the gap by structuring unstructured content and building relationships between structured and unstructured data. This allows information to be more easily found, visualized, and analyzed to support GRC.
Reinventing the Modern Information Pipeline: Paxata and MapR (Lilia Gutnik)
(Presented at MapR's Big Data Everywhere event in Redwood City, CA in December 2016)
The relationship between business teams and IT has changed as the complexity of data has increased. A traditional data pipeline designed for an IT-centered approach to information management is not designed for the data demands of today's business decisions. Designing a big data strategy requires modernizing previous approaches. Self-service data preparation in a collaborative, intuitive, governed, and secure environment is the key to a nimble and decisive business unit.
During this Big Data Warehousing Meetup, Caserta Concepts and Databricks addressed the number one operational and analytic goal of nearly every organization today – to have a complete view of every customer. Customer Data Integration (CDI) must be implemented to cleanse and match customer identities within and across various data systems. CDI has been a long-standing data engineering challenge, not just one of logic and complexity but also of performance and scalability.
The speakers brought together best practice techniques with Apache Spark to achieve complete CDI.
Speakers:
Joe Caserta, President, Caserta Concepts
Kevin Rasmussen, Big Data Engineer, Caserta Concepts
Vida Ha, Lead Solutions Engineer, Databricks
The sessions covered a series of problems that are adequately solved with Apache Spark, as well as those that require additional technologies to implement correctly. Topics included:
· Building an end-to-end CDI pipeline in Apache Spark
· What works, what doesn’t, and how our use of Spark evolves
· Innovation with Spark including methods for customer matching from statistical patterns, geolocation, and behavior
· Using PySpark and Python’s rich module ecosystem for data cleansing, standardization, and matching
· Using GraphX for matching and scalable clustering
· Analyzing large data files with Spark
· Using Spark for ETL on large datasets
· Applying Machine Learning & Data Science to large datasets
· Connecting BI/Visualization tools to Apache Spark to analyze large datasets internally
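The deterministic core of the customer-matching topics above can be sketched in plain Python (the talk used PySpark; the normalization rules and sample records here are hypothetical): standardize fields, then cluster records that share a match key.

```python
import re
from collections import defaultdict

def normalize(record):
    """Standardize fields before matching (hypothetical rules:
    lowercase, strip punctuation, collapse whitespace)."""
    name = re.sub(r"[^a-z ]", "", record["name"].lower())
    name = re.sub(r"\s+", " ", name).strip()
    email = record["email"].strip().lower()
    return {"name": name, "email": email}

def match_customers(records):
    """Cluster records sharing a normalized email -- a simple
    deterministic match key; real CDI layers on fuzzy and
    statistical matching (geolocation, behavior, etc.)."""
    clusters = defaultdict(list)
    for i, rec in enumerate(records):
        clusters[normalize(rec)["email"]].append(i)
    return list(clusters.values())

records = [
    {"name": "Jane  Doe", "email": "JDoe@example.com "},
    {"name": "jane doe.", "email": "jdoe@example.com"},
    {"name": "Bob Smith", "email": "bob@example.com"},
]
print(match_customers(records))  # -> [[0, 1], [2]]
```

In Spark the same logic becomes a `groupBy` on the normalized key, which is what makes the approach scale to large customer datasets.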
The speakers also touched on data governance, rapidly onboarding new data, and how to balance rapid agility and time to market with critical decision support and customer interaction. They also shared examples of problems that Apache Spark is not optimized for.
ADV Slides: The World in 2045 – What Has Artificial Intelligence Created? (DATAVERSITY)
How will technology and society change in the next 25 years? We have been discussing how technology has evolved in the last few years; in this episode, we look forward to the next 25 years.
The year 2045 may seem far away, but we already have predictions about the technological innovations prevalent in 2045. Hint: Artificial intelligence will have a huge impact.
DataOps: Nine steps to transform your data science impact – Strata London, May 18 (Harvinder Atwal)
According to Forrester Research, only 22% of companies are currently seeing a significant return from data science expenditures. Most data science implementations are high-cost IT projects, local applications that are not built to scale for production workflows, or laptop decision support projects that never impact customers. Despite this high failure rate, we keep hearing the same mantra and solutions over and over again. Everybody talks about how to create models, but not many people talk about getting them into production where they can impact customers.
Harvinder Atwal offers an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale, used at companies like Facebook, Uber, LinkedIn, Twitter, and eBay. The key to adding value through DataOps is to adapt and borrow principles from Agile, Lean, and DevOps. However, DataOps is not just about shipping working machine learning models; it starts with better alignment of data science with the rest of the organization and its goals. Harvinder shares experience-based solutions for increasing your velocity of value creation, including Agile prioritization and collaboration, new operational processes for an end-to-end data lifecycle, developer principles for data scientists, cloud solution architectures to reduce data friction, self-service tools giving data scientists freedom from bottlenecks, and more. The DataOps methodology will enable you to eliminate daily barriers, putting your data scientists in control of delivering ever-faster cutting-edge innovation for your organization and customers.
25 May 2018, the General Data Protection Regulation (GDPR) deadline, is less than six months away.
With attention on the regulation at its peak, there is growing concern among organizations affected by it.
We would like to invite you to join our webinar, where we will share our approach and help your organization and your document repository become compliant with GDPR.
During the webinar, our special guests, George Parapadakis – Business Solutions Strategy, Alfresco and Bart van Bouwel – Managing Partner, CDI-Partners, will provide you with:
- How to implement GDPR in your document repository
- How the Alfresco Digital Business Platform can help your organization to be compliant with GDPR
- Xenit approach: a managed shared drive
- Xenit demonstration
- Top tips to start preparing for the GDPR.
Webinar: Designing Storage Architectures for Data Privacy, Compliance and Gov... (Storage Switzerland)
Managing data is about more than managing capacity growth; organizations today need to adhere to increasingly strict data privacy, compliance and governance regulations. Privacy regulations like GDPR and California’s Consumer Privacy Act place new expectations on organizations that require them to not only protect data but also organize it so it can be found and deleted on request. Traditional backup and archive are ill-equipped to help organization adhere to these new regulations.
In this webinar, join Storage Switzerland and Hitachi Vantara for a roundtable discussion on the meaning of these various regulations, their impact on traditional storage infrastructures, and how to design a storage architecture that can meet today’s regulations as well as tomorrow’s.
How Cloudera SDX can aid GDPR compliance 6.21.18 (Cloudera, Inc.)
Big data solutions from Cloudera can help organizations comply with the GDPR in three main ways:
1) Provide comprehensive encryption, access controls, and auditing to satisfy principles around integrity, confidentiality, and accountability.
2) Track the classification, usage, and lineage of personal data to demonstrate lawfulness, fairness, and transparency.
3) Enable capabilities like fast data updates, redaction, and erasure of individual records to comply with principles regarding purpose limitation, data minimization, accuracy, and storage limitation.
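The record-erasure capability in point 3 can be sketched in plain Python (not Cloudera-specific; the field names and retention policy are invented for illustration): on an erasure request, the subject's personal fields are dropped while non-personal business fields are retained.

```python
def erase_subject(records, subject_id, retain_fields=("order_id", "amount")):
    """Honor a GDPR erasure request: drop the subject's personal
    fields but keep non-personal business fields (hypothetical
    retention policy), marking the identity as erased."""
    out = []
    for rec in records:
        if rec.get("customer_id") == subject_id:
            redacted = {k: v for k, v in rec.items() if k in retain_fields}
            redacted["customer_id"] = "ERASED"
            out.append(redacted)
        else:
            out.append(rec)
    return out

# Hypothetical ledger mixing personal and business data
ledger = [
    {"customer_id": "c1", "name": "Jane Doe", "order_id": 7, "amount": 19.99},
    {"customer_id": "c2", "name": "Bob Smith", "order_id": 8, "amount": 5.00},
]
print(erase_subject(ledger, "c1"))
```

At platform scale the hard part is doing this across every copy of the data with full lineage, which is what the shared catalog and governance services are meant to provide.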
[Webinar Slides] Data Privacy – Learn What It Takes to Protect Your Information (AIIM International)
Follow along with these webinar slides as we take a close look at what it takes to prepare for all kinds of data privacy regulations – learn how to protect your data in order to be compliant with regulators or for healthy business practices in general.
Want to follow along with the webinar replay? Download it here for free: http://info.aiim.org/protect-your-information
Beyond GDPR Compliance - Role of Internal Audit (Omo Osagiede)
Internal audit can play a strategic role in supporting an organization's GDPR compliance and remediation activities by:
1) Providing expertise and a "big picture" view of personal data flows and requirements.
2) Identifying opportunities to improve data governance and privacy risk management practices.
3) Conducting reviews of key GDPR compliance elements like data mapping, privacy impact assessments, and data subject rights management.
Mastering Data Compliance in a Dynamic Business Landscape (Denodo)
Watch full webinar here: https://buff.ly/48rpLQ3
Join us for an enlightening webinar, "Mastering Data Compliance in a Dynamic Business Landscape," presented by Denodo Technologies and W5 Consulting. This session is tailored for business leaders and decision-makers who are navigating the complexities of data compliance in an ever-evolving business environment.
This webinar will focus on why data compliance is crucial for your business. Discover how to turn compliance into a competitive advantage, enhancing operational efficiency and market trust. We'll also address the risks of non-compliance, including financial penalties and the loss of customer trust, and provide strategies to proactively overcome these challenges.
Key Takeaways:
- How can your business leverage data management practices to stay agile and compliant in a rapidly changing regulatory landscape?
- Keys to balancing data accessibility with security and privacy in today's data-driven environment.
- What are the common pitfalls in achieving compliance with regulations like GDPR, CCPA, and HIPAA, and how can your business avoid them?
We will go beyond the technical aspects and delve into how you can strategically position your organization in the realm of data management and compliance. Learn how to craft a data compliance strategy that aligns with your business goals, enhances operational efficiency, and builds stakeholder trust.
Building the Governance Ready Enterprise for GDPR Compliance (Index Engines Inc.)
The EU General Data Protection Regulation (GDPR) fundamentally changes how organizations manage personal data, giving citizens the right to access, rectify, erase, restrict, and migrate their personal content held in any data center of an organization that does business in the European Union.
Index Engines' technology delivers extensive search and management solutions that empower you to find all personal data under management with considerable precision and meet or exceed the requirements of the regulation through implementation of powerful indexing technology. Index Engines supports all classes of data from primary storage to legacy backup data.
The document discusses organizations' experiences with GDPR compliance after the May 2018 deadline. It finds that many organizations are still dealing with residual risks and have uncovered more personal data than expected during their discovery processes. Specifically, organizations have struggled to fully comply with data deletion requests due to data being spread across systems without full lineage. The document advocates that organizations view GDPR not just as a compliance burden but as an opportunity to improve data governance, build customer trust, and enable digital expansion.
Date: 15th November 2017
Location: AI Lab Theatre
Time: 16:30 - 17:00
Speaker: Elisabeth Olafsdottir / Santiago Castro
Organisation: Microsoft / Keyrus
CISO Round Table on Effective Implementation of DLP & Data Security (Priyanka Aash)
The document discusses an effective implementation of data loss prevention (DLP) and data security. It covers key factors like the evolving threat landscape, business drivers for DLP, common challenges, and approaches to solve data security issues. An effective methodology is proposed, including identifying critical data and channels, deploying suitable policies, monitoring incidents, and establishing governance through continuous review and improvement. Critical success factors include business involvement, a phased implementation approach, and repeating the plan-do-check-act cycle periodically. The expected project outcomes are protection of critical channels, improved data tracking and awareness, and happier customers and auditors.
This document discusses security considerations for data lakes. It notes that data lakes consolidate an organization's most valuable data, making them an attractive target for hackers. The document outlines key risks like housing all customer data in a single repository and in the cloud. It proposes security design principles like zero trust and least privilege. The document then presents a protection framework with components like access controls, network security, data protection policies and governance. Specific capabilities are described for areas like platform access, policies, network isolation, and data protection. The goal is to properly secure the data lake while still enabling data sharing and analytics.
The document discusses the risks associated with big data, including increased data production leading to higher costs of replication and storage, evolving privacy and security regulations, and growing litigation and discovery obligations. It notes that most of the significant risks and costs of big data are not clearly visible and addresses challenges in areas like existing infrastructure, regulatory compliance, contracting, data retention, and eDiscovery.
Richard Hogg & Dennis Waldron - #InfoGov17 - Cognitive Unified Governance & P... (ARMA International)
GDPR is coming: May 25, 2018 brings a whole new order of EU personal data privacy and protection rights, duties, and obligations. What changes, what's your risk, and how can you start to prepare?
How can a Unified Governance strategy and capabilities transform both your information governance program, and provide a framework for personal data?
How that strategy can leverage metadata to support and accelerate meeting regulatory issues.
The General Data Protection Regulation and the DAMA DMBOK – Tools you can use for Compliance
Abstract: The General Data Protection Regulation will be the law governing data privacy in Europe in 2018. Surveys show that less than 50% of organisations are aware of the changes within the legislation, and even fewer have any plan for achieving compliance. In this session, Daragh O Brien takes us on a high level overview of the GDPR and how the disciplines of the DMBOK can help compliance.
Notes: DMBOK is an abbreviation for the "Data Management Body of Knowledge," which is published by DAMA International (The Data Management Association)
Information is currency in the 21st century...Is your data enabling you to drive the right digital transformation in your organisation? - Jasmit Sagoo, CTO, Veritas
Data- and database security & GDPR: end-to-end offer (Capgemini)
This document discusses Capgemini and Sogeti's end-to-end offering for database security and GDPR compliance. It outlines a four-phase approach including a GDPR readiness assessment, roadmap development, privacy impact assessment, and implementing database security solutions. Each phase has defined activities, timelines, and results to help organizations assess their GDPR compliance and secure databases containing personal data. The offering is designed to help organizations address new accountability and security requirements under the upcoming GDPR regulation.
My keynote speech at the ISACA IIA Belgium software watch day in October 2014 in Brussels on the value of big data and data analytics for auditors and other assurance professionals
Cloudera's big data platform can help organizations comply with the EU's General Data Protection Regulation (GDPR) in three key ways:
1. It provides a single system to securely store, govern, and manage all analytic workloads and personal data across on-premises, cloud, structured, and unstructured data sources.
2. Its shared services like data catalog, security, governance, and lifecycle management can be applied uniformly across the platform to meet GDPR principles like data minimization, storage limitation, and accuracy.
3. Specific capabilities like its GDPR data hub, consent management, and ability to delete individual data records upon request help automate key GDPR requirements at scale.
The EU General Data Protection Regulation and how Oracle can help (Niklas Hjorthen)
The document discusses Oracle's technology solutions that can help organizations comply with the EU General Data Protection Regulation (GDPR). It provides an overview of GDPR requirements and describes Oracle products that address key areas like data discovery, access controls, monitoring and auditing, and personal data management. It outlines a multi-step approach organizations can take using Oracle technologies to establish the necessary technical foundation and processes for GDPR compliance.
Similar to General Data Protection Regulation - BDW Meetup, October 11th, 2017
Introduction to Data Science (Data Summit, 2017) (Caserta)
This document summarizes an introduction to data science presentation by Joe Caserta and Bill Walrond of Caserta Concepts. Caserta Concepts is an internationally recognized data innovation and engineering consulting firm. The agenda covers why data science is important, challenges of working with big data, governing big data, the data pyramid, what data scientists do, standards for data science, and a demonstration of data analysis. Popular machine learning algorithms like regression, decision trees, k-means clustering and collaborative filtering are also discussed.
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017 (Caserta)
This document discusses the evolution of data analytics and modeling. It describes three waves: the first with slow hardware and manual entry; the second with faster PCs but tool explosions; and the third wave now with big data, cloud warehouses, and data-driven tools like Looker and BigQuery. It argues that in this current wave, having a flexible yet performant data model built on SQL in a warehouse, and using a language like LookML to define relationships and translate questions, allows gaining reliable answers with agility without worrying about low-level syntax or tools.
The Data Lake - Balancing Data Governance and Innovation (Caserta)
Joe Caserta gave the presentation "The Data Lake - Balancing Data Governance and Innovation" at DAMA NY's one day mini-conference on May 19th. Speakers covered emerging trends in Data Governance, especially around Big Data.
For more information on Caserta Concepts, visit our website at http://casertaconcepts.com/.
Caserta Concepts, Datameer and Microsoft shared their combined knowledge and a use case on big data, the cloud and deep analytics. Attendees learned how a global leader in the test, measurement and control systems market reduced their big data implementation time from 18 months to just a few.
Speakers shared how to provide a business user-friendly, self-service environment for data discovery and analytics, and focused on how to extend and optimize Hadoop-based analytics, highlighting the advantages and practical applications of deploying on the cloud for enhanced performance, scalability and lower TCO.
Agenda included:
- Pizza and Networking
- Joe Caserta, President, Caserta Concepts - Why are we here?
- Nikhil Kumar, Sr. Solutions Engineer, Datameer - Solution use cases and technical demonstration
- Stefan Groschupf, CEO & Chairman, Datameer - The evolving Hadoop-based analytics trends and the role of cloud computing
- James Serra, Data Platform Solution Architect, Microsoft - Benefits of the Azure Cloud Service
- Q&A, Networking
For more information on Caserta Concepts, visit our website: http://casertaconcepts.com/
This document discusses appropriate and inappropriate use cases for Apache Spark based on the type of data and workload. It provides examples of good uses, such as batch processing, ETL, and machine learning/data science. It also gives examples of bad uses, such as random access queries, frequent incremental updates, and low latency stream processing. The document recommends using a database instead of Spark for random access, updates, and serving live queries. It suggests using message queues instead of files for low latency stream processing. The goal is to help users understand how to properly leverage Spark for big data workloads.
This document discusses balancing data governance and innovation. It describes how traditional data analytics methods can inhibit innovation by requiring lengthy processes to analyze new data. The document advocates adopting a data lake approach using tools like Hadoop and Spark to allow for faster ingestion and analysis of diverse data types. It also discusses challenges around simultaneously enabling innovation through a data lake while still maintaining proper data governance, security, and quality. Achieving this balance is key for organizations to leverage data for competitive advantage.
Introducing Kudu, Big Data Warehousing MeetupCaserta
Not just an SQL interface or file system, Kudu - the new, updating column store for Hadoop, is changing the storage landscape. It's easy to operate and makes new data immediately available for analytics or operations.
At the Caserta Concepts Big Data Warehousing Meetup, our guests from Cloudera outlined the functionality of Kudu and talked about why it will become an integral component in big data warehousing on Hadoop.
To learn more about what Caserta Concepts has to offer, visit http://casertaconcepts.com/
How do you balance the need for structured and rule-based governance to assure enterprise data quality - with the imperative to innovate in order to stay relevant and competitive in today's business marketplace?
At the recent CDO Summit in NYC, a range of C-Level Executives across a variety of industries came to hear Joe Caserta, president of Caserta Concepts, put it all in perspective.
Joe talked about the challenges of "data sprawl" and the paradigm shift underway in the evolving big data and data-driven world.
For more information or to contact us, visit http://casertaconcepts.com/
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented What Data Do You Have and Where is it?
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
Joe Caserta, President at Caserta Concepts, presented "Setting Up the Data Lake" at a DAMA Philadelphia Chapter Meeting.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
Joe Caserta, President at Caserta Concepts presented at the 3rd Annual Enterprise DATAVERSITY conference. The emphasis of this year's agenda is on the key strategies and architecture necessary to create a successful, modern data analytics organization.
Joe Caserta presented Incorporating the Data Lake into Your Analytics Architecture.
For more information on the services offered by Caserta Concepts, visit our website at http://casertaconcepts.com/.
During a Big Data Warehousing Meetup in NYC, Elliott Cordo, Chief Architect at Caserta Concepts, discussed emerging trends in real time data processing. The presentation included processing frameworks such as Spark and Storm, as well as datastore technologies ranging from NoSQL to Hadoop. He also discussed exciting new AWS services such as Lambda, Kinesis, and Kinesis Firehose.
In this presentation at DAMA New York, Joe started by asking a key question: why are we doing this? Why analyze and share all these massive amounts of data? Basically, it comes down to the belief that in any organization, in any situation, if we can get the data and make it correct and timely, insights from it will become instantly actionable for companies to function more nimbly and successfully. Enabling the use of data can be a world-changing, world-improving activity, and this session presents the steps necessary to get you there. Joe explained the concept of the "data lake" and also emphasized the role of a strong data governance strategy that incorporates seven components needed for a successful program.
For more information on this presentation or Caserta Concepts, visit our website at http://casertaconcepts.com/.
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Caserta
Joe Caserta went over the details inside the big data ecosystem and the Caserta Concepts Data Pyramid, which includes Data Ingestion, Data Lake/Data Science Workbench and the Big Data Warehouse. He then dove into the foundation of dimensional data modeling, which is as important as ever in the top tier of the Data Pyramid. Topics covered:
- The 3 grains of Fact Tables
- Modeling the different types of Slowly Changing Dimensions
- Advanced Modeling techniques like Ragged Hierarchies, Bridge Tables, etc.
- ETL Architecture.
He also talked about ModelStorming, a technique used to quickly convert business requirements into an Event Matrix and Dimensional Data Model.
This was a jam-packed abbreviated version of 4 days of rigorous training of these techniques being taught in September by Joe Caserta (Co-Author, with Ralph Kimball, The Data Warehouse ETL Toolkit) and Lawrence Corr (Author, Agile Data Warehouse Design).
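The Type 2 slowly changing dimension handling mentioned above can be sketched in a few lines: when a tracked attribute changes, the current dimension row is expired and a new versioned row is inserted. A minimal in-memory illustration in Python, with hypothetical column names (`customer_id`, `start_date`, `end_date`) standing in for a real dimension table:

```python
from datetime import date

def scd2_update(dim_rows, natural_key, new_attrs, today):
    """Apply a Type 2 change: expire the current row, insert a new version."""
    for row in dim_rows:
        if row["customer_id"] == natural_key and row["end_date"] is None:
            if all(row.get(k) == v for k, v in new_attrs.items()):
                return dim_rows          # nothing changed; keep current version
            row["end_date"] = today      # close out the old version
            new_row = {**row, **new_attrs,
                       "start_date": today, "end_date": None}
            dim_rows.append(new_row)
            return dim_rows
    raise KeyError(f"no current row for customer {natural_key}")

dim = [{"customer_id": 42, "city": "NYC",
        "start_date": date(2015, 1, 1), "end_date": None}]
scd2_update(dim, 42, {"city": "Boston"}, date(2017, 10, 11))
# dim now holds the expired NYC row plus a current Boston row.
```

In a warehouse this would be an UPDATE plus INSERT inside one transaction; the sketch just shows the versioning logic itself.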
For more information, visit http://casertaconcepts.com/.
Against the backdrop of Big Data, the Chief Data Officer, by any name, is emerging as the central player in the business of data, including cybersecurity. The MITCDOIQ Symposium explored the developing landscape, from local organizational issues to global challenges, through case studies from industry, academic, government and healthcare leaders.
Joe Caserta, president at Caserta Concepts, presented "Big Data's Impact on the Enterprise" at the MITCDOIQ Symposium.
Presentation Abstract: Organizations are challenged with managing an unprecedented volume of structured and unstructured data coming into the enterprise from a variety of verified and unverified sources. With that is the urgency to rapidly maximize value while also maintaining high data quality.
Today we start with some history and the components of data governance and information quality necessary for successful solutions. I then bring it all to life with 2 client success stories, one in healthcare and the other in banking and financial services. These case histories illustrate how accurate, complete, consistent and reliable data results in a competitive advantage and enhanced end-user and customer satisfaction.
To learn more, visit www.casertaconcepts.com
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...weiwchu
We recently discovered that models trained with large-scale speech datasets sourced from the web could achieve superior accuracy and potentially lower cost than traditionally human-labeled or simulated speech datasets. We developed a customizable AI-driven data labeling system. It infers word-level transcriptions with confidence scores, enabling supervised ASR training. It also robustly generates phone-level timestamps even in the presence of transcription or recognition errors, facilitating the training of TTS models. Moreover, it automatically assigns labels such as scenario, accent, language, and topic tags to the data, enabling the selection of task-specific data for training a model tailored to that particular task. We assessed the effectiveness of the datasets by fine-tuning open-source large speech models such as Whisper and SeamlessM4T and analyzing the resulting metrics. In addition to openly available data, our data handling system can also be tailored to provide reliable labels for proprietary data from certain vertical domains. This customization enables supervised training of domain-specific models without the need for human labelers, eliminating data breach risks and significantly reducing data labeling cost.
Introduction to Data Science
1.1 What is Data Science, importance of data science,
1.2 Big data and data Science, the current Scenario,
1.3 Industry Perspective Types of Data: Structured vs. Unstructured Data,
1.4 Quantitative vs. Categorical Data,
1.5 Big Data vs. Little Data, Data science process
1.6 Role of Data Scientist
1. Overview of statistical software such as ODK, surveyCTO, and CSPro
2. Software installation(for computer, and tablet or mobile devices)
3. Create a data entry application
4. Create the data dictionary
5. Create the data entry forms
6. Enter data
7. Add Edits to the Data Entry Application
8. CAPI questions and texts
Big Data and Analytics Shaping the future of PaymentsRuchiRathor2
The payments industry is experiencing a data-driven revolution powered by big data and analytics.
Here's a glimpse into 5 ways this dynamic duo is transforming how we pay.
In essence, big data and analytics are playing a pivotal role in building a future filled with faster, more secure, and convenient payment methods for everyone.
Combined supervised and unsupervised neural networks for pulse shape discrimi...Samuel Jackson
Our methodology for pulse shape discrimination is split into two steps. Firstly, we learn a model to discriminate between pulses using "clean" low-rate examples by removing pile-up & saturated events. In addition to traditional tail sum discrimination, we investigate three different choices for discrimination between γ-pulses, fast neutrons, and thermal neutrons. We consider clustering the pulses directly using Gaussian Mixture Modelling (GMM), using variational autoencoders to learn a representation of the pulses and then clustering the learned representation (VAE+GMM), and using density ratio estimation to discriminate between a mixed (γ + neutron) and pure (γ only) sources using a multi-layer perceptron (MLP) as a supervised learning problem.
Secondly, we aim to classify and recover pile-up events in the < 150 ns regime by training a single unified multi-label MLP. To frame the problem as a multi-label supervised learning method, we first simulate pile-up events with known components. Then, using the simulated data and combining it with single event data, we train a final multi-label MLP to output a binary code indicating both how many and which type of events are present within an event window.
Towards an Analysis-Ready, Cloud-Optimised service for FAIR fusion dataSamuel Jackson
We present our work to improve data accessibility and performance for data-intensive tasks within the fusion research community. Our primary goal is to develop services that facilitate efficient access for data-intensive applications while ensuring compliance with FAIR principles [1], as well as adoption of interoperable tools, methods and standards.
The major outcome of our work is the successful creation and deployment of a data service for the MAST (Mega Ampere Spherical Tokamak) experiment [2], leading to substantial enhancements in data discoverability, accessibility, and overall data retrieval performance, particularly in scenarios involving large-scale data access. Our work follows the principles of Analysis-Ready, Cloud Optimised (ARCO) data [3] by using cloud optimised data formats for fusion data.
Our system consists of a query-able metadata catalogue, complemented with an object storage system for publicly serving data from the MAST experiment. We will show how our solution integrates with the Pandata stack [4] to enable data analysis and processing at scales that would have previously been intractable, paving the way for data-intensive workflows running routinely with minimal pre-processing on the part of the researcher. By using a cloud-optimised file format such as zarr [5] we can enable interactive data analysis and visualisation while avoiding large data transfers. Our solution integrates with common python data analysis libraries for large, complex scientific data such as xarray [6] for complex data structures and dask [7] for parallel computation and lazily working with larger that memory datasets.
The incorporation of these technologies is vital for advancing simulation, design, and enabling emerging technologies like machine learning and foundation models, all of which rely on efficient access to extensive repositories of high-quality data. Relying on the FAIR guiding principles for data stewardship not only enhances data findability, accessibility, and reusability, but also fosters international cooperation on the interoperability of data and tools, driving fusion research into new realms and ensuring its relevance in an era characterised by advanced technologies in data science.
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016) https://doi.org/10.1038/sdata.2016.18
[2] M Cox, The Mega Amp Spherical Tokamak, Fusion Engineering and Design, Volume 46, Issues 2–4, 1999, Pages 397-404, ISSN 0920-3796, https://doi.org/10.1016/S0920-3796(99)00031-9
[3] Stern, Charles, et al. "Pangeo forge: crowdsourcing analysis-ready, cloud optimized data production." Frontiers in Climate 3 (2022): 782909.
[4] Bednar, James A., and Martin Durant. "The Pandata Scalable Open-Source Analysis Stack." (2023).
[5] Alistair Miles (2024) ‘zarr-developers/zarr-python: v2.17.1’. Zenodo. doi: 10.5281/zenodo.10790679
[6] Hoyer, S. & Hamman, J., (20
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B...rightmanforbloodline
Solution Manual for First Course in Abstract Algebra A, 8th Edition by John B. Fraleigh, Verified Chapters 1 - 56,.pdf
Annex K RBF's The World Game pdf documentSteven McGee
Signals & Telemetry Annex K for RBF's The World Game / Trade Federations / USPTO 13/573,002 Heart Beacon Cycle Time - Space Time Chain meters, metrics, standards. Adaptive Procedural template framework structured data derived from DoD / NATO's system of systems engineering tech framework
Dataguard Switchover Best Practices using DGMGRL (Dataguard Broker Command Line)
General Data Protection Regulation - BDW Meetup, October 11th, 2017
1. Big Data Warehousing Meetup
General Data Protection Regulation
GDPR
October 11, 2017
2. About Caserta
Data Intelligence Consulting and Modern Data Engineering
Data Lakes, Data Laboratories, Data Warehouses
Award-winning company for Data Innovation
Data Science, Machine Learning, Artificial Intelligence
Internationally recognized work force
Keynote Speakers, Educators, Mentors
Strategy, Architecture, Governance, Implementation
5. GDPR Cannot be Ignored
GDPR Compliance Top Data Protection Priority for 92% of US Organizations in 2017
- PwC Survey
• The GDPR requirements will force U.S. companies to change
the way they process, store, and protect customers’ personal
data.
• Companies must be able to show compliance by May 25,
2018
• A data protection officer (DPO) may be required
6. GDPR in a Town Near You
The New York legislature, inspired by the GDPR, proposed the
Right to be Forgotten Act.
GDPR will continue influencing privacy regulations across
the globe
Companies that comply with the GDPR will be better
prepared for future changes in U.S. legislation.
7. Data Elements Regulated
Basic identity information such as name, address and ID numbers
Web data such as location, IP address, cookie data and RFID tags
Health and genetic data
Biometric data
Racial or ethnic data
Political opinions
Sexual orientation
8. The Technical Challenge
“Delete all my personal data without undue delay when it is
no longer necessary or when consent has been withdrawn”
Legal: Right to Erasure or Right to be Forgotten
Engineer: Need the ability to delete some specific subset
or all data associated with a customer from all data
systems
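The engineering requirement above (delete a specific subset or all data for a customer across every system) can be sketched as a routine that fans a delete request out over each registered store and reports what was purged for the audit trail. A minimal illustration in Python, with hypothetical in-memory dicts standing in for real databases and downstream systems:

```python
def erase_customer(customer_id, data_stores):
    """Delete every record tied to customer_id across all registered stores.

    Each store here is a hypothetical dict of record_id -> record; a real
    system would wrap a database, object store, or downstream API instead.
    """
    report = {}
    for name, store in data_stores.items():
        doomed = [rid for rid, rec in store.items()
                  if rec.get("customer_id") == customer_id]
        for rid in doomed:
            del store[rid]
        report[name] = len(doomed)  # audit trail: how many records were purged
    return report

# Usage: two toy "systems" holding records for customers 1 and 2.
stores = {
    "crm": {"a": {"customer_id": 1, "email": "x@y.com"},
            "b": {"customer_id": 2, "email": "z@y.com"}},
    "events": {"e1": {"customer_id": 1, "page": "/home"}},
}
print(erase_customer(1, stores))  # {'crm': 1, 'events': 1}
```

The hard part in practice is the registry itself: erasure only works if every system holding personal data is known and reachable, which is why the later slides stress keeping an inventory of data and processes.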
9. More GDPR Technical Goals
The pseudonymisation and encryption of personal data.
The ability to ensure the ongoing confidentiality, integrity,
availability and resilience of processing systems and services.
The ability to restore the availability and access to personal data in
a timely manner in the event of a physical or technical incident.
A process for regularly testing, assessing and evaluating the
effectiveness of technical and organizational measures for ensuring
the security of the processing.
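Pseudonymisation, the first goal above, replaces direct identifiers with tokens that cannot be re-linked without a separately held key. A minimal sketch using a keyed HMAC; the key handling and field names here are illustrative, not from the slides:

```python
import hashlib
import hmac

def pseudonymize(record, fields, key):
    """Replace the named identifier fields with keyed HMAC tokens.

    Without the key, tokens cannot be traced back to the original values;
    with it, the same input always yields the same token, so joins across
    datasets still work on the pseudonymized columns.
    """
    out = dict(record)
    for f in fields:
        token = hmac.new(key, str(record[f]).encode(), hashlib.sha256).hexdigest()
        out[f] = token[:16]  # truncated for readability; keep the full digest in practice
    return out

key = b"kept-in-a-separate-key-store"  # hypothetical: manage via a real key vault
rec = {"name": "Ada Lovelace", "email": "ada@example.com", "plan": "pro"}
masked = pseudonymize(rec, ["name", "email"], key)
# 'plan' survives untouched; 'name' and 'email' are now opaque tokens.
```

Because the mapping is only reversible via the key store, deleting or rotating the key also supports the erasure requirement for data already copied downstream.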
10. GDPR Three-Legged Stool
Metadata
github.com/linkedin/wherehows
Data Access
Something Similar to DALI (Data Access LinkedIn)
Data Lifecycle Management
gobblin.apache.org
11. GDPR Tips
Bake Data Privacy into the Design
Encrypt the Data, Implement Access Control Governance
Enable Fine Grain Access Control (FGAC)
Keep Inventory of Data and Processes
Document how data is collected, purged
Record or detect Data Lineage
Potentially Hire Data Protection Officer
Or Consultants to establish GDPR Strategy & Execution Plan
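Fine grain access control from the tips above can be illustrated as a column-level policy check that filters what each role may read before any row leaves the system. The roles and field names below are hypothetical:

```python
# Hypothetical column-level policy: which fields each role may read.
POLICY = {
    "analyst": {"country", "plan", "signup_date"},  # no direct identifiers
    "support": {"name", "email", "plan"},
    "dpo":     {"name", "email", "country", "plan", "signup_date"},
}

def read_record(record, role):
    """Return only the columns the given role is allowed to see."""
    allowed = POLICY.get(role, set())  # unknown roles see nothing
    return {k: v for k, v in record.items() if k in allowed}

row = {"name": "Ada", "email": "ada@example.com",
       "country": "UK", "plan": "pro", "signup_date": "2017-10-11"}
print(read_record(row, "analyst"))
# {'country': 'UK', 'plan': 'pro', 'signup_date': '2017-10-11'}
```

Real deployments push this policy into the query layer (views, Ranger-style plugins, or warehouse column grants), but the decision logic is the same: default-deny, with personal data exposed only to roles that need it.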
12. Thank You
Joe Caserta, President
joe@casertaconcepts.com
Twitter: joe_caserta