Jfokus: Bringing the cloud back down to earth - Grace Jansen
How can we effectively develop for the cloud when we as developers are coding back down on earth? This is where effective cloud-native developer tools can enable us either to be transported into the cloud or, alternatively, to bring the cloud back down to earth. But what tools should we be using for this? In this session, we'll explore some of the useful OSS tools and technologies that can be used by developers to effectively develop, design, and test cloud-native Java applications.
Apache Hadoop and the Big Data Opportunity in Banking
The document discusses Apache Hadoop and how it can help banks leverage big data opportunities. It provides an overview of what Apache Hadoop is, how it works, and the core projects. It then discusses how Hadoop can help banks create value by detecting fraud, managing risk, improving products based on customer data analysis, and more. The presenters are from Hortonworks, the lead commercial company for Hadoop, and Tresata, a company focused on using Hadoop for banking applications.
This document discusses PostgreSQL database architecture patterns for running PostgreSQL at scale when a relational database as a service like Amazon RDS won't meet your needs. It describes challenges faced with MySQL, Redshift, and Vertica, and how PostgreSQL was better suited thanks to techniques like partitioning by date, TOAST compression, foreign data wrappers, and "poor man's" parallel processing. Key takeaways are that PostgreSQL supported scaling to petabytes of data, sub-second queries across large date ranges, and the custom extensions they needed, while avoiding the limitations and expenses of other database options.
difference between a traditional ML model and a foundation model.
Gen AI tech stack
The architecture of today’s LLM applications
Decide Generative AI Strategy
LLM Framework Architecture
What’s the Hype around LLMOps?
Why is it so hard for enterprises to adopt artificial intelligence (AI) and machine learning (ML)?
Check out Bespin Global's webinar materials to learn how to apply AI and ML successfully.
[Table of Contents]
1. The big wave of digital transformation
- Gartner's picks for the companies that will lead the future
- Digital transformation at global financial companies, and how they view data
- Big data & AI use cases
2. Adopting a big data analytics system
- Why big data analytics systems go unadopted
- Big data analytics system adoption cases
3. Data Lake & Data Governance for data analysis
- The limits of data analysis, and the Data Lake
- Cloud migration
- The importance of Data Governance
4. Applying AI
- Amazon AI services
- Use cases
[Keynote] Data Driven Organizations with AWS Data - Speaker: Agnes Panosian, Head... - Amazon Web Services Korea
Data is at the center of every application, process, and business decision. Data is the cornerstone of digital transformation for almost every organization. Data leads to insights that fuel new experiences and drive innovation. Building a strategy that realizes the value of data for the entire organization is not an easy or simple journey. This session covers best practices for building a data-driven organization and how AWS can help you on that journey.
The 17th BOAZ Big Data Conference - [Junggochaeknara]: Optimizing an Elasticsearch cluster using real-time data - BOAZ Bigdata
The Junggochaeknara (used-book marketplace) team carried out the following data engineering project:
Optimizing the performance of an Elasticsearch indexing cluster using real-time used-book data
18th cohort: Nayeon Geum, Sookmyung Women's University, IT Engineering major
18th cohort: Gyuyeon Park, Kookmin University, School of Software
18th cohort: Geonwoo Kim, Kookmin University, Department of AI Big Data Convergence Management
Data Engineering: A Deep Dive into Databricks - Knoldus Inc.
During this session, you'll gain a comprehensive understanding of Databricks' capabilities for efficiently processing and managing data, with a focus on Apache Spark for data transformation. We'll cover data ingestion methods, storage, orchestration, and best practices to ensure your data engineering workflows are optimized for success.
The Elephant (BOAZ) Librarian's book recommendation solution
: "This book is exactly my taste; how do I find books with similar content?"
For readers who choose books by their plot, or who want to read books by authors they follow,
the Elephant Librarian suggests books tailored to your taste.
12th cohort: Hoseok Kang, Eunbi Ko, Eunji Ko, Taeil Yang, Jiin Lee, Junsu Jeon, Haewon Jeong
[BOAZ, Korea's first big data student association]
YouTube - https://www.youtube.com/channel/UCSniI26A56n2QZ71opJtTUg
Facebook - https://www.facebook.com/BOAZbigdata
Instagram - http://www.instagram.com/boaz_bigdata
Blog - https://blog.naver.com/boazbigdata
Amazon SageMaker is an end-to-end machine learning platform that allows users to build, train, and deploy machine learning models at scale. It provides pre-built machine learning algorithms, notebook instances to build models, one-click training for ML/DL models and custom algorithms, and deployment of trained models without additional engineering effort. SageMaker also manages and scales model inference clusters and APIs for production.
Architecture Design for Deep Neural Networks III - Wanjin Yu
Neural architecture search aims to automate neural network design. Recent approaches include:
(1) Reinforcement learning searches over large spaces but requires extensive computation.
(2) One-shot approaches like DARTS jointly optimize weights and architecture, improving efficiency.
(3) New methods like Proxyless NAS directly search on target tasks and hardware, finding mobile architectures.
Neural architecture search represents progress toward fully automatic deep learning and more specialized models.
An Introduction to Model Training with Amazon SageMaker - Youngjoon Choi, Solutions Architect / AI/ML Expert, AWS - AWS AIML Special Webinar - Amazon Web Services Korea
For those getting started with Amazon SageMaker Training and Processing, this session explains how they work and provides a guide you can follow. Users create an Amazon SageMaker notebook and then run training code on a separately defined training cluster of GPUs or high-performance CPUs, enabling efficient model training, data preprocessing, inference post-processing, and model evaluation. It also introduces how to use Amazon SageMaker Experiments to organize training runs and systematically compare evaluation metrics.
Hadoop World 2011: Advanced HBase Schema Design - Cloudera, Inc.
While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per second.
This talk will address the architectural challenges when designing for either read or write performance imposed by HBase. It will include examples of real world use-cases and how they can be implemented on top of HBase, using schemas that optimize for the given access patterns.
HBase can be an intimidating beast for someone considering its adoption. For what kinds of workloads is it well suited? How does it integrate into the rest of my application infrastructure? What are the data semantics upon which applications can be built? What are the deployment and operational concerns? In this talk, I'll address each of these questions in turn. As supporting evidence, both high-level application architecture and internal details will be discussed. This is an interactive talk: bring your questions and your use-cases!
The document discusses Facebook's use of HBase to store messaging data. It provides an overview of HBase, including its data model, performance characteristics, and how it was a good fit for Facebook's needs due to its ability to handle large volumes of data, high write throughput, and efficient random access. It also describes some enhancements Facebook made to HBase to improve availability, stability, and performance. Finally, it briefly mentions Facebook's migration of messaging data from MySQL to their HBase implementation.
HBaseCon 2013: Full-Text Indexing for Apache HBase - Cloudera, Inc.
This document discusses full-text indexing for HBase tables. It describes how Lucene indices are organized based on HBase regions. Index building is implemented using coprocessors to update indices on data changes. Index splitting is optimized to avoid blocking updates during region splits. Search performance of indexing 10 billion records was tested, showing search times of around 1 second.
Apache HBase is the Hadoop opensource, distributed, versioned storage manager well suited for random, realtime read/write access. This talk will give an overview on how HBase achieve random I/O, focusing on the storage layer internals. Starting from how the client interact with Region Servers and Master to go into WAL, MemStore, Compactions and on-disk format details. Looking at how the storage is used by features like snapshots, and how it can be improved to gain flexibility, performance and space efficiency.
This document discusses tuning HBase and HDFS for performance and correctness. Some key recommendations include:
- Enable HDFS sync on close and sync behind writes for correctness on power failures.
- Tune HBase compaction settings like blockingStoreFiles and compactionThreshold based on whether the workload is read-heavy or write-heavy.
- Size RegionServer machines based on disk size, heap size, and number of cores to optimize for the workload.
- Set client and server RPC chunk sizes like hbase.client.write.buffer to 2MB to maximize network throughput.
- Configure various garbage collection settings in HBase like -Xmn512m and -XX:+UseCMSInit
HBase and HDFS: Understanding FileSystem Usage in HBase - enissoz
This document discusses file system usage in HBase. It provides an overview of the three main file types in HBase: write-ahead logs (WALs), data files, and reference files. It describes durability semantics, IO fencing techniques for region server recovery, and how HBase leverages data locality through short-circuit reads, checksums, and block placement hints. The document is intended to help readers understand HBase's interactions with HDFS for tuning IO performance.
A 3-Dimensional Data Model in HBase for Large Time-Series Datasets (2012-09-15) - Dan Han
This document outlines a study on migrating relational database content to NoSQL storage systems like HBase. It discusses challenges in migration and the need for design patterns for HBase schemas. A 3-dimensional data model in HBase is proposed and evaluated using cosmology and bike rental datasets. Experiment results show the 3D model improves performance for queries that use HBase's version dimension. Future work includes further evaluation of the model's scalability and designing models for other dataset types.
Vladimir Rodionov (Hortonworks)
Time-series applications (sensor data, application/system logging events, user interactions etc) present a new set of data storage challenges: very high velocity and very high volume of data. This talk will present the recent development in Apache HBase that make it a good fit for time-series applications.
Apache HBase - Introduction & Use Cases - Data Con LA
HBase is an open source, distributed, column-oriented database modeled after Google's BigTable. It sits atop Hadoop, using HDFS for storage. HBase scales horizontally and supports fast random reads and writes. It is well-suited for large tables and high throughput access. Facebook uses HBase extensively for messaging and other applications due to its high write throughput and low latency reads. Other users include Flurry and Yahoo.
Speaker: Jesse Anderson (Cloudera)
As optional pre-conference prep for attendees who are new to HBase, this talk will offer a brief Cliff's Notes-level talk covering architecture, API, and schema design. The architecture section will cover the daemons and their functions, the API section will cover HBase's GET, PUT, and SCAN classes; and the schema design section will cover how HBase differs from an RDBMS and the amount of effort to place on schema and row-key design.
This document provides a user manual for an interactive development environment called Hog that allows for the creation of Apache Pig scripts. It describes Hog's installation, design including simple and complex interfaces, available node functions, output display options, and FAQs. The manual was developed in 2016 by KEYW Corp to guide users in leveraging Hog's various features.
This document provides an overview of Hog, an IDE created by KEYW Corp to allow analysts with minimal coding experience to perform queries on Apache Pig. Hog uses a drag-and-drop interface to build Pig scripts visually, avoiding the need for coding. It can generate Pig scripts, run them, and display outputs in graphical or tabular formats to help analysts explore and analyze data. Hog is designed for both novice analysts new to Pig as well as experienced developers.
This document discusses Spark, an open-source cluster computing framework. It notes that while Hadoop is useful for batch processing, it has limitations for interactive and iterative algorithms. Spark addresses these issues through its resilient distributed datasets (RDDs) which can be operated on in parallel and rebuilt if lost. RDDs support transformations like map and filter as well as actions that return values. The document provides examples of using Spark from Scala and discusses its architecture involving a DAG scheduler and task scheduler.
Welcome!
Michael Stack, Software Engineer, Cloudera & HBase PMC Chair
9:00-9:05am
Conference MC Michael Stack, Chair of the HBaseCon 2013 Program Committee, welcomes you to the conference and offers a preview of the day.
The Apache HBase Community: Best Ever and Getting Better
Amr Awadallah, CTO and Co-founder, Cloudera
9:05-9:15am
Amr comments on the explosion of interest in Apache HBase over the past few years, how that interest has influenced the Hadoop stack overall, and why Cloudera considers its involvement in the HBase community to be so important.
State of the Apache HBase Union
Michael Stack & Lars Hofhansl, Architect, Salesforce.com
9:15-9:40am
Release-managers-in-crime Michael and Lars offer a look back, and a look forward, at HBase releases and what they have brought us (and will bring us in the future).
The Apache HBase Ecosystem
Aaron Kimball, Chief Architect, WibiData
9:40-10:05am
Today, HBase stands as Apache Hadoop did years ago, a project with a growing and vibrant community in its own right. In this talk, Aaron will overview some of the projects built on top of HBase that you’ll get a chance to learn about during the day – each of these projects having grown out of a need to use HBase for an application that requires real-time atomic access to data. As an example, he’ll present the motivations for Kiji and how it is helping organizations create amazing new applications using HBase and Hadoop.
Overview of Apache HBase at Facebook (Slides Not Available)
Liyin Tang, Software Engineer, Facebook & HBase PMC Member
10:05-10:30am
In this keynote, you’ll get an overview of how HBase is used at Facebook. Explore Facebook’s applications using HBase as an OLTP service, which require high reliability, efficiency, and scalability, and how HBase can tolerate small network glitches and rack failures. You’ll also learn the use cases for adopting HBase as a batch processing service and various optimizations to scale processing throughput. Finally, learn Facebook’s thoughts about the future of HBase.
The Evolution of a Relational Database Layer over HBase - DataWorks Summit
Apache Phoenix is a SQL query layer over Apache HBase that allows users to interact with HBase through JDBC and SQL. It transforms SQL queries into native HBase API calls for efficient parallel execution on the cluster. Phoenix provides metadata storage, SQL support, and a JDBC driver. It is now a top-level Apache project after originally being developed at Salesforce. The speaker discussed Phoenix's capabilities like joins and subqueries, new features like HBase 1.0 support and functional indexes, and future plans like improved optimization through Calcite and transaction support.
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification - TrustArc
In a landmark year marked by significant AI advancements, it’s vital to prioritize transparency, accountability, and respect for privacy rights with your AI innovation.
Learn how to navigate the shifting AI landscape with our innovative solution, TRUSTe Responsible AI Certification, the first AI certification designed for data protection and privacy. Crafted by a team that has issued 10,000+ privacy certifications, this framework integrates industry standards and laws for responsible AI governance.
This webinar will review:
- How compliance can play a role in the development and deployment of AI systems
- How to model trust and transparency across products and services
- How to save time and work smarter in understanding regulatory obligations, including AI
- How to operationalize and deploy AI governance best practices in your organization
Self-Healing Test Automation Framework - Healenium - Knoldus Inc.
Revolutionize your test automation with Healenium's self-healing framework. Automate test maintenance, reduce flakes, and increase efficiency. Learn how to build a robust test automation foundation. Discover the power of self-healing tests. Transform your testing experience.
Top 12 AI Technology Trends For 2024 - Marrie Morris
Technology has become an irreplaceable component of our daily lives. The role of AI in technology revolutionizes our lives for the betterment of the future. In this article, we will learn about the top 12 AI technology trends for 2024.
The Challenge of Interpretability in Generative AI Models - Sara Kroft
Navigating the intricacies of generative AI models reveals a pressing challenge: interpretability. Our blog delves into the complexities of understanding how these advanced models make decisions, shedding light on the mechanisms behind their outputs. Explore the latest research, practical implications, and ethical considerations, as we unravel the opaque processes that drive generative AI. Join us in this insightful journey to demystify the black box of artificial intelligence.
Dive into the complexities of generative AI with our blog on interpretability. Find out why making AI models understandable is key to trust and ethical use and discover current efforts to tackle this big challenge.
"Hands-on development experience using wasm Blazor" - Vladyslav Furdak - Fwdays
I will share my personal experience of full-time development on wasm Blazor
What difficulties our team faced: life hacks with Blazor app routing, whether it is necessary to write JavaScript, which technology stack and architectural patterns we chose
What conclusions we drew and what mistakes we made
The Zaitechno Handheld Raman Spectrometer is a powerful and portable tool for rapid, non-destructive chemical analysis. It utilizes Raman spectroscopy, a technique that analyzes the vibrational fingerprint of molecules to identify their chemical composition. This handheld instrument allows for on-site analysis of materials, making it ideal for a variety of applications, including:
Material identification: Identify unknown materials, minerals, and contaminants.
Quality control: Ensure the quality and consistency of raw materials and finished products.
Pharmaceutical analysis: Verify the identity and purity of pharmaceutical compounds.
Food safety testing: Detect contaminants and adulterants in food products.
Field analysis: Analyze materials in the field, such as during environmental monitoring or forensic investigations.
The Zaitechno Handheld Raman Spectrometer is easy to use and features a user-friendly interface. It is compact and lightweight, making it ideal for field applications. With its rapid analysis capabilities, the Zaitechno Handheld Raman Spectrometer can help you improve efficiency and productivity in your research or quality control workflows.
UiPath Community Day Amsterdam: Code, Collaborate, Connect - UiPathCommunity
Welcome to our third live UiPath Community Day Amsterdam! Come join us for a half-day of networking and UiPath Platform deep-dives, for devs and non-devs alike, in the middle of summer ☀.
📕 Agenda:
12:30 Welcome Coffee/Light Lunch ☕
13:00 Event opening speech
Ebert Knol, Managing Partner, Tacstone Technology
Jonathan Smith, UiPath MVP, RPA Lead, Ciphix
Cristina Vidu, Senior Marketing Manager, UiPath Community EMEA
Dion Mes, Principal Sales Engineer, UiPath
13:15 ASML: RPA as Tactical Automation
Tactical robotic process automation for solving short-term challenges, while establishing standard and re-usable interfaces that fit IT's long-term goals and objectives.
Yannic Suurmeijer, System Architect, ASML
13:30 PostNL: an insight into RPA at PostNL
Showcasing the solutions our automations have provided, the challenges we’ve faced, and the best practices we’ve developed to support our logistics operations.
Leonard Renne, RPA Developer, PostNL
13:45 Break (30')
14:15 Breakout Sessions: Round 1
Modern Document Understanding in the cloud platform: AI-driven UiPath Document Understanding
Mike Bos, Senior Automation Developer, Tacstone Technology
Process Orchestration: scale up and have your Robots work in harmony
Jon Smith, UiPath MVP, RPA Lead, Ciphix
UiPath Integration Service: connect applications, leverage prebuilt connectors, and set up customer connectors
Johans Brink, CTO, MvR digital workforce
15:00 Breakout Sessions: Round 2
Automation, and GenAI: practical use cases for value generation
Thomas Janssen, UiPath MVP, Senior Automation Developer, Automation Heroes
Human in the Loop/Action Center
Dion Mes, Principal Sales Engineer @UiPath
Improving development with coded workflows
Idris Janszen, Technical Consultant, Ilionx
15:45 End remarks
16:00 Community fun games, sharing knowledge, drinks, and bites 🍻
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an... - Zilliz
Enterprises have traditionally prioritized data quantity, assuming more is better for AI performance. However, a new reality is setting in: high-quality data, not just volume, is the key. This shift exposes a critical gap – many organizations struggle to understand their existing data and lack effective curation strategies and tools. This talk dives into these data challenges and explores the methods of automating data curation.
DefCamp_2016_Chemerkin_Yury-publish.pdf - Presentation by Yury Chemerkin at DefCamp 2016 discussing mobile app vulnerabilities, data protection issues, and analysis of security levels across different types of mobile applications.
Retrieval Augmented Generation Evaluation with Ragas - Zilliz
Retrieval Augmented Generation (RAG) enhances chatbots by incorporating custom data into the prompt. Using large language models (LLMs) as judges has gained prominence in modern RAG systems. This talk will demo Ragas, an open-source automation tool for RAG evaluations. Christy will discuss and demo evaluating a RAG pipeline using Milvus and RAG metrics like context F1-score and answer correctness.
Keynote: Presentation on SASE Technology - Priyanka Aash
Secure Access Service Edge (SASE) solutions are revolutionizing enterprise networks by integrating SD-WAN with comprehensive security services. Traditionally, enterprises managed multiple point solutions for network and security needs, leading to complexity and resource-intensive operations. SASE, as defined by Gartner, consolidates these functions into a unified cloud-based service, offering SD-WAN capabilities alongside advanced security features like secure web gateways, CASB, and remote browser isolation. This convergence not only simplifies management but also enhances security posture and application performance across global networks and cloud environments. Discover how adopting SASE can streamline operations and fortify your enterprise's digital transformation strategy.
Increase Quality with User Access Policies - July 2024 - Peter Caitens
⭐️ Increase Quality with User Access Policies ⭐️, presented by Peter Caitens and Adam Best of Salesforce. View the slides from this session to hear all about “User Access Policies” and how they can help you onboard users faster with greater quality.
9. Case 3: user-action
● Users perform actions now and then
● Store every event
● Query the recent events of a user
10. In RDBMS
Actions table:
  id       PK
  user_id  IDX
  name
  time
● For a fast SELECT id, user_id, name, time FROM Actions WHERE user_id=XXX ORDER BY time DESC LIMIT 10 OFFSET 20, we must create an index on user_id. However, indices greatly decrease insert speed because of index rebuilds.
13. In RDBMS
Users table:
  id    IDX
  name
  sex
  age
Friendships table:
  user_id   IDX
  friend_id
  type
● SELECT * FROM Friendships WHERE user_id='XXX';
14. In HBase
row key: <user_id>
column family info:   info:name, info:sex, info:age
column family friend: friend:<user_id> = type
● Actually, this is a graph, which can be represented by a sparse matrix.
● Then you can use M/R to find something interesting,
e.g. the shortest path from user A to user B.
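The sparse-matrix view from the slide can be sketched with a plain dictionary standing in for the friend: column family; the shortest-path example is then just breadth-first search over that structure. At scale the slides suggest M/R, but the idea is the same (the toy graph below is invented for illustration):

```python
from collections import deque

# Toy stand-in for the HBase table: row key = user id,
# "friend:" column family = {friend_id: relationship type}.
friend_cf = {
    "A": {"B": "follow", "C": "friend"},
    "B": {"D": "friend"},
    "C": {"D": "follow"},
    "D": {},
}

def shortest_path(start, goal):
    """BFS over the sparse adjacency structure; returns the first
    shortest path found, or None if goal is unreachable."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for friend in friend_cf.get(path[-1], {}):
            if friend not in seen:
                seen.add(friend)
                queue.append(path + [friend])
    return None

print(shortest_path("A", "D"))  # ['A', 'B', 'D']
```

Each BFS frontier expansion maps naturally onto one M/R pass over the table, which is why the sparse-matrix framing matters.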
15. Case 5: access log
● Each log line contains time, ip, domain, url, referer, browser_cookie, login_id, etc.
● Logs will be analyzed every 5 minutes, every hour, daily, weekly, and monthly
17. In HBase
row key: <time><INC_COUNTER>
column family http: http:ip, http:domain, http:url, http:referer
column family user: user:browser_cookie, user:login_id
INC_COUNTER is used to distinguish adjacent rows that share the same time value.
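A small sketch of how such a row key might be assembled (the packing format and counter width are assumptions, not from the slides): the timestamp goes first in big-endian form so rows sort chronologically, and the counter breaks ties between log lines landing in the same millisecond.

```python
import itertools
import struct

_counter = itertools.count()  # per-writer increasing counter

def log_row_key(epoch_millis: int) -> bytes:
    """Row key = <time><INC_COUNTER>: big-endian timestamp, then a
    4-byte counter that distinguishes lines with identical times."""
    return struct.pack(">qI", epoch_millis, next(_counter) & 0xFFFFFFFF)

k1 = log_row_key(1_000)
k2 = log_row_key(1_000)  # same millisecond, next counter value
print(k1 != k2 and k1 < k2)  # True: distinct keys, still time-ordered
```

Because keys sort by time, the periodic 5-minute/hourly/daily analyses become range scans over the corresponding time window.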