Of course, you know what data is. You probably know what Big Data and small data are. But what the heck is all this buzz about data? Why is data so important today? These are the questions this session addresses. The session goes beyond definitions and descriptions: we will talk about data, about different options for data usage, and about how we can benefit from data.
Detecting solar farms with deep learning (Jason Brown)
Talk delivered at Free and Open Source Software for Geo North America 2019 (FOSS4GNA)
Large scale solar arrays or farms have been installed globally faster than can be reliably tracked by interested stakeholders. We have built a deep learning model with Sentinel 2 satellite imagery that allows us to create accurate, timely global maps of solar farms.
This document describes a geospatial modeling tool developed to retrieve climate data from large climate model databases in an efficient manner. The tool integrates R programming with ArcGIS to subset and extract grid point data for specific study areas from netCDF climate model files. It was tested on CORDEX climate model data and found to accurately obtain grid points, providing a less tedious method than manual retrieval. The tool allows climate data to be efficiently obtained and prepared as model inputs.
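The tool itself integrates R with ArcGIS; purely as an illustration of the underlying idea (subsetting grid points for a study area out of a netCDF climate model file), here is a minimal Python sketch using xarray. The file name, variable name, and coordinate bounds are hypothetical.

```python
# Minimal sketch: subset grid points from a netCDF climate file for a study
# area. The actual tool uses R + ArcGIS; this xarray version only illustrates
# the idea. File name, variable, and bounds are hypothetical.
import xarray as xr

ds = xr.open_dataset("cordex_tasmax.nc")  # hypothetical CORDEX file

# Bounding box of the study area (hypothetical coordinates).
subset = ds["tasmax"].sel(lat=slice(45.0, 47.5), lon=slice(5.0, 8.0))

# Export the selected grid points as a table ready for use as model input.
subset.to_dataframe().to_csv("study_area_tasmax.csv")
```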
This document discusses the use of machine learning techniques for analyzing astronomical and earth observation data. It notes that large sky survey projects generate huge amounts of data that exceeds our ability to analyze using traditional methods. Machine learning can help process and extract insights from this big data. Specifically, the document discusses how convolutional neural networks have achieved 98% accuracy in classifying galaxy images from sky surveys. It also provides examples of applying machine learning to tasks like terrain classification from earth observation satellite imagery.
Scientific Computing With Amazon Web Services (Jamie Kinney)
Researchers from around the world are increasingly using AWS for a wide array of use cases. This presentation describes how AWS facilitates scientific collaboration and powers some of the world's largest scientific efforts, including real-world examples from NASA JPL, the European Space Agency (ESA) and CERN's CMS particle detector.
The Pacific Research Platform (PRP) aims to achieve transparent and rapid data access among collaborating scientists at multiple institutions through an integrated implementation of data-focused networking that extends the university campus Science DMZ model to a regional, national, and, eventually, a global scale.
PRP researchers are routinely achieving high-performance end-to-end networking from their labs to their collaborators’ labs and data centers, traversing multiple, heterogeneous Science DMZs and wide-area networks connecting multiple campus gateways, enabling researchers across the partnership to transfer data over dedicated optical lightpaths at speeds from 10Gb/s to 100Gb/s.
STAIR Lab introduces two new datasets for deep learning: STAIR Captions, a dataset of 100,000 images with captions, and STAIR Actions, a video dataset of everyday actions covering 100 action categories. STAIR Captions is based on the MS-COCO 2014 dataset and uses a 2D CNN plus RNN model to generate captions. STAIR Actions contains videos from a research project and aims to help with action recognition tasks. Both datasets are publicly available to researchers through the STAIR Lab website.
Anatol Salanevich developed a method for seismic data analysis based on the maximin method. He created a program called Maximin in the Qt IDE. The program takes as input a dbf file of seismic events produced by CreateShapeGIS and outputs a csv file with a list of clusters.
The Next Light Wave: Why Too Much Light is an Issue (GTTP-GHOU-NUCLIO)
Presentation on the importance of light for astronomy and society, given at the International Conference on Communication and Light, 2-4 November, Braga, Portugal, by Pedro Russo.
EventNet is a neural network architecture that can efficiently process asynchronous event streams from event cameras in real-time. It uses a temporal coding function to recursively update the network's state as new events arrive, avoiding redundant computation. Experimental results show it can perform tasks like target motion estimation and ego-motion estimation at rates over 1000 Hz using only the new event data. Compared to frame-based and PointNet approaches, EventNet significantly reduces computation time by recursively updating representations rather than reprocessing all prior events.
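The recursive-update idea can be illustrated generically. The toy sketch below is not EventNet's actual temporal coding function, only the general pattern it exploits: each incoming event updates a running state in O(1), so no past event is ever reprocessed.

```python
# Toy illustration of event-driven recursive state updates: each new event
# updates a running summary in O(1) instead of reprocessing all past events.
# This is NOT EventNet's temporal coding function, just the general idea.
from dataclasses import dataclass

@dataclass
class RunningState:
    count: int = 0
    mean_x: float = 0.0
    mean_y: float = 0.0

    def update(self, x: float, y: float) -> None:
        # Incremental mean update: state(t) = f(state(t-1), event(t)).
        self.count += 1
        self.mean_x += (x - self.mean_x) / self.count
        self.mean_y += (y - self.mean_y) / self.count

state = RunningState()
for event in [(1.0, 2.0), (1.5, 2.5), (0.5, 1.0)]:  # stream of (x, y) events
    state.update(*event)
print(state.mean_x, state.mean_y)  # summary is available after every event
```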
This document discusses investigating the capabilities of ESRI products for handling large datasets and big data. The objectives are to study ESRI's existing abilities to process and analyze large data sets, and to examine ESRI's architecture for big data processing. The author works with New York taxi trip data, comparing different processing and visualization methods in Python, ArcPy, and Tableau Public. These include spatial joining, data filtering, and creating visualizations to analyze patterns and outliers. The conclusion evaluates the best method based on processing time, dependencies, and license restrictions. Objective 2 briefly outlines ESRI's machine-based architecture for hosting big data solutions.
The document discusses the impact and usage trends of the EGI (European Grid Initiative) federated cloud computing infrastructure. It notes that EGI has supported over 23,000 research papers since 2008. Usage of EGI resources has increased significantly in recent years across many research domains, with computing hours increasing by 40% from 2016 to 2017. EGI provides federated cloud computing resources to thousands of individual researchers and supports the long-tail of science through various applications and thematic services.
Mike Warren is the co-founder and CTO of Descartes Labs, a company that operates a geospatial analysis platform using multiple integrated satellite image datasets. The platform provides analysis-ready images with historical records for machine learning and allows users to find, measure, monitor changes over time, and predict future changes to minimize risk and optimize outcomes. It eliminates much of the data preparation time typically required by geospatial scientists by maintaining a growing archive of processed images and a robust pipeline for continuous updates as new images become available.
This document discusses using spatial analysis and mixture of Gaussians modeling to analyze geo-tagged tweets from a city to identify hot spots and patterns in people's behavior over time and location. The goal is to empirically model the spatial density of tweets. The document describes using expectation maximization to fit the mixture of Gaussians model to synthetic and real tweet location data, and issues that arose such as model collapsing and the lack of a global maximum. BIC was used to select the number of clusters but also had limitations. Future work proposed focusing on city centers and understanding when BIC works best.
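As a minimal sketch of the modeling step described above, the following uses scikit-learn's GaussianMixture (fitted via EM, as in the document) and BIC to choose the number of components; the point data here is synthetic rather than real tweet locations.

```python
# Minimal sketch: fit a mixture of Gaussians to 2-D point locations via EM
# and pick the number of clusters by BIC. Synthetic data stands in for the
# (lon, lat) coordinates of geo-tagged tweets.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(300, 2)),  # hot spot 1
    rng.normal(loc=(5, 5), scale=1.0, size=(200, 2)),  # hot spot 2
])

best_k, best_bic, best_model = None, np.inf, None
for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(points)
    bic = gmm.bic(points)
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gmm

print("clusters chosen by BIC:", best_k)
```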
SDSC Technology Forum: Increasing the Impact of High Resolution Topography Da... (OpenTopography Facility)
High-resolution topography is a powerful tool for studying the Earth's surface, vegetation, and urban landscapes, with broad scientific, engineering, and educational applications. Over the past decade, there has been dramatic growth in the acquisition of these data for scientific, environmental, engineering, and planning purposes. In the US, the U.S. Geological Survey is undertaking the 3D Elevation Program (3DEP) to map the entire lower 48 states with lidar by 2023.
The richness of these topography datasets makes them extremely valuable beyond the applications that drove their acquisition, so they are of interest to a large and varied user community. A cyberinfrastructure platform that enables users to efficiently discover, access, and process these massive volumes of data increases the impact of investments in data collection and catalyzes scientific discovery. It also informs the critical, elevation-dependent decisions made across our nation every day, ranging from the immediate safety of life, property, and the environment to long-term planning for infrastructure projects.
Join us to hear about the motivations, technology, and data assets behind the National Science Foundation funded OpenTopography platform, which aims to democratize access to high resolution topographic data. OpenTopography’s innovation is in co-locating massive volumes of topographic data with processing tools that enable users with varied expertise and application domains to quickly and easily access and process data, to enable innovation and decision making.
This document discusses using Predix technology to forecast energy generation from solar power plants. It describes how Predix can be used for now-casting, short-term forecasting, and long-term forecasting of solar energy production. Predix utilizes sensors and analytics to process data from solar installations and predict upcoming energy generation for balancing the energy grid and avoiding blackouts.
The purpose of this study is to develop a system that helps a user determine whether a location can be labeled a "Safe" residence. The output is based on an analysis of the city's local crime history, which involves examining a huge amount of geolocation data and zeroing in on a single area. Areas with the majority of crime incidents are highlighted as Unsafe. Clicking or hovering on a single record displays the name, the associated crime, and its rank based on the number of crimes that occurred. Big Data Hadoop and Hive systems are implemented in Azure for the analysis.
Keywords: Hadoop, Big Data, Hive, Azure
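The study runs its analysis in Hadoop/Hive on Azure; as a small illustrative stand-in for the core aggregation (count crimes per area, rank areas, flag the worst as Unsafe), here is a pandas sketch with hypothetical column names and data.

```python
# Illustrative stand-in for the core aggregation: count crimes per area,
# rank areas by incident count, and flag the worst as "Unsafe". The real
# study runs this in Hive on Azure; columns and data here are hypothetical.
import pandas as pd

crimes = pd.DataFrame({
    "area":  ["Downtown", "Downtown", "Harbor", "Suburb", "Downtown"],
    "crime": ["theft", "assault", "theft", "vandalism", "burglary"],
})

counts = crimes.groupby("area").size().sort_values(ascending=False)
threshold = counts.median()

for rank, (area, n) in enumerate(counts.items(), start=1):
    label = "Unsafe" if n > threshold else "Safe"
    print(f"#{rank} {area}: {n} incidents -> {label}")
```

The Hive equivalent would be a GROUP BY over areas with an ORDER BY on the counts.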
Valerii Vasylkov: Erlang. Measurements and benefits (Аліна Шепшелей)
The document discusses the benefits of Erlang, including its functional nature, powerful pattern matching, built-in concurrency and fault tolerance through its "let it crash" philosophy, ability to perform distributed computation, and capability for hot code upgrades without downtime. It covers Erlang's actor-model approach to concurrency, its use of processes and message passing, supervision trees for fault tolerance, and tools for debugging, profiling, and detecting bottlenecks.
Marina Bril: Organizing the work of marketing teams and the economic rationale... (Аліна Шепшелей)
1. Information on building a marketing team for projects.
2. Team members' areas of responsibility and the definition of KPIs.
3. An algorithm for prioritizing the tasks of agency and in-house employees.
JHipster is a Yeoman generator used to create a Spring Boot and AngularJS project. It saves development time by including accepted practices and scaffolding for both design and runtime. The generator supports technologies like Spring Boot, AngularJS, Bootstrap, and MySQL. Developers can add additional functionality through JHipster modules and sub-generators. The generated projects include tools for testing, deployment to Docker, and integration with services like Elasticsearch.
It's hard to imagine any business without IT now, just like any house without electricity. And it is hard to imagine any successful business with no clouds in its IT.
A step-by-step guide to building a multipurpose parser for scalable web data extraction.
The design and usage of a universal format for stripped web articles.
A comparison of the format with AMP (Google), Facebook Instant Articles, and Apple News.
This document discusses Google's involvement in virtual reality (VR) and its Daydream VR platform. It outlines some of the key differences between Daydream and other VR platforms like Oculus Rift, provides details on Android OS optimizations and the Google VR SDK for developing Daydream apps, and briefly touches on potential future applications of VR in areas like education, medicine, news and entertainment.
Dmitriy Kouperman: Working with legacy systems. Stabilization, monitoring, man... (Аліна Шепшелей)
About half of all developers have, one way or another, dealt with legacy projects. Not everyone can (or wants to) work with them. But with the right approach, such projects can be carried out with pleasure and even enthusiasm. We suggest a way of understanding such legacy code, discuss the relevant project management techniques and practices, and explore solutions the developers found useful:
• Examples of optimization that are worth a try;
• Monitoring applications with JavaMelody;
• Monitoring applications with logs and ELK (Elasticsearch + Logstash + Kibana);
• Monitoring applications with Java Mission Control and the Heap Dump Memory Analyzer Tool.
Dmytro Zaitsev: VIPER. Make your MVP cleaner (Аліна Шепшелей)
VIPER is an architectural pattern for structuring Android applications. It divides an app into distinct layers - View, Interactor, Presenter, Entity, and Router. The Presenter handles view logic and communication between the View and Interactor. The Interactor contains business logic. The View displays content from the Presenter. VIPER aims to make apps easier to understand, maintain, and test by separating concerns and reducing dependencies between layers. It is best for medium to large apps but may be overkill for small projects.
Anna Lavrova: Gladiator in the suit. Crisis is our brand! (Аліна Шепшелей)
The document discusses the challenges that can arise when a software development team loses its project lead. It notes that without a team lead to guide them, team members may leave the project. It also suggests that the development roadmap could lack estimates, clients may leave if release dates are not met, designs may not follow guidelines, retrospectives may not occur, and it may be unclear who is responsible for creating stories. The document closes by thanking the audience and providing contact information for any questions.
Mihail Patalaha: ASO. How to start and how to finish? (Аліна Шепшелей)
Mikhail Patalakha is a mobile ASO manager with experience managing over 50 successful projects. He provides tips for optimizing mobile app keywords and rankings, including opening the application, researching competitors' keywords, removing duplicates, getting new keywords from Google Keyword Planner, defining competition and traffic from services like SensorTower and ASOdesk, choosing keywords based on difficulty, and calculating approximate visitor numbers using a provided formula. His contact information is provided for further questions.
Andrew Veles: Product design is about the process (Аліна Шепшелей)
This document discusses product design and the product design process. It emphasizes that product design is about focus, thinking through every step of the process from initial ideas to implementation. This includes activities like creating portraits, user stories, specifications, site maps, flows, wireframes, prototypes, and UI design. It also notes that the goal is stable growth for the product over time, but that the solution designed may need to change as problems change. Examples are provided of redesigns for a mobile app, desktop app, and logo. The conclusion emphasizes that building the right features for the right users is more challenging than just building features.
Andrey Sobol: Blockchain crowdfunding, or "Mommy, look, I launched an IPO" (Аліна Шепшелей)
I will talk about:
• Why crowdfunding in cryptocurrency is a good idea
• How you can create a DIY IPO
• Why a blockchain IPO gives confidence to your investors
Vladimir Lozanov: How to deliver high quality apps to the App Store (Аліна Шепшелей)
Mobile QA teams are responsible for thoroughly testing apps before release to ensure high quality. They use a variety of manual and automated testing methods at different stages of development. QA works closely with development and customer support to catch bugs, validate fixes, and improve the product based on user feedback. The goal is to deliver stable, bug-free apps through collaboration across teams.
This document provides an overview of how SQL Server processes queries. It discusses the key components like the query processor, parser, algebrizer, optimizer and executor. The query processor breaks queries into logical and physical representations. The optimizer chooses the most efficient execution plan. The executor then runs the query. It also touches on topics like parameter sniffing, locking, deadlocks and the thread pool model.
This document summarizes Kx Systems, a company that provides a high-performance time-series database called kdb+. Kdb+ can process and analyze large volumes of real-time and historical time-series data extremely fast with low latency. It is widely used in financial services and is now being applied to other industries like manufacturing, utilities, and life sciences. Kx Systems offers software, consulting services, and can help clients integrate kdb+ with their existing technologies and scale their deployments.
Tooling Up for Efficiency: DIY Solutions @ Netflix - ABD319 - re:Invent 2017 (Amazon Web Services)
At Netflix, we have traditionally approached cloud efficiency from a human standpoint, whether it be in-person meetings with the largest service teams or manually flipping reservations. Over time, we realized that these manual processes are not scalable as the business continues to grow. Therefore, in the past year, we have focused on building out tools that allow us to make more insightful, data-driven decisions around capacity and efficiency. In this session, we discuss the DIY applications, dashboards, and processes we built to help with capacity and efficiency. We start at the ten thousand foot view to understand the unique business and cloud problems that drove us to create these products, and discuss implementation details, including the challenges encountered along the way. Tools discussed include Picsou, the successor to our AWS billing file cost analyzer; Libra, an easy-to-use reservation conversion application; and cost and efficiency dashboards that relay useful financial context to 50+ engineering teams and managers.
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ... (Ian Foster)
This document discusses computing challenges posed by rapidly increasing data scales in scientific applications and high performance computing. It introduces the concept of online data analysis and reduction as an alternative to traditional offline analysis. The key message is that dramatic changes in HPC system geography, driven by the different growth rates of the underlying technologies, are creating new application structures and computational-logistics problems, and with them exciting new computer science opportunities in online data analysis and reduction.
In this video from ChefConf 2014 in San Francisco, Cycle Computing CEO Jason Stowe outlines the biggest challenge facing us today, Climate Change, and suggests how Cloud HPC can help find a solution, including ideas around Climate Engineering, and Renewable Energy.
"As proof points, Jason uses three use cases from Cycle Computing customers, including from companies like HGST (a Western Digital Company), Aerospace Corporation, Novartis, and the University of Southern California. It’s clear that with these new tools that leverage both Cloud Computing, and HPC – the power of Cloud HPC enables researchers, and designers to ask the right questions, to help them find better answers, faster. This all delivers a more powerful future, and means to solving these really difficult problems."
Watch the video presentation: http://insidehpc.com/2014/09/video-hpc-cluster-computing-64-156000-cores/
Making Earth observation data available by using Amazon S3 is accelerating scientific discovery and enabling the creation of new products. Attend and learn how the scale and performance of Amazon S3 lets earth scientists, researchers, startups, and GIS professionals gather and analyse planetary-scale data without worrying about limitations of bandwidth, storage, memory, or processing power. Co-presented with support of the Australian Geoscience Data Cube collaboration, DigitalGlobe’s Geospatial Big Data Platform and the developer of the popular ObservedEarth mobile app.
Speakers:
Craig Lawton, Public Sector Solutions Architect, Amazon Web Services
Lachlan Hurst, Observed Earth
Matt Paget, Senior Experimental Scientist, CSIRO
Dan Getman, Digital Globe
This document discusses using deep learning techniques to detect extreme weather patterns in climate data. It begins by outlining the scientific motivation and successes of deep learning in computer vision. It then describes early successes applying deep learning to climate science tasks like classifying tropical cyclones, atmospheric rivers, and weather fronts. Challenges include dealing with multi-variate climate data and lack of labeled examples. Future work involves creating unified deep learning models that can perform detection, localization, and segmentation of extreme weather across different climate datasets.
Mehr und schneller ist nicht automatisch besser ("More and faster is not automatically better"), data2day, 06.10.16 (Boris Adryan)
The law of large numbers always holds: statistical certainty increases with the number of data points, provided the data are collected fairly. Unfortunately, collecting data often costs money, and so, especially in the field of sensor technology (keyword: Internet of Things), one is forced to make sensible compromises. In this talk I summarize the findings of a project in which data analytics showed that, going forward, only 60% of the deployed sensors are really needed. Nor does it always have to be real-time analysis: with a data strategy tailored to the business case, unnecessary expenses can be avoided.
This thesis proposal aims to develop a system called Eureka to efficiently discover training data for visual machine learning tasks. Eureka combines early discard filters, just-in-time machine learning, and the ability to create more accurate filters without writing new code. The goal is to reduce the manual effort required of domain experts to find and label rare phenomena in large unlabeled visual datasets. The proposal outlines research thrusts to apply Eureka in different computing environments like edge, cloud, and smart storage, as well as different problem domains including images, videos, and other multidimensional data. Initial experiments show Eureka can discover more true positives per unit time compared to naive hand-labeling.
How to expand the Galaxy from genes to Earth in six simple steps (and live sm... (Raffaele Montella)
FACE-IT is an effort to develop a new IT infrastructure to accelerate existing disciplinary research and enable information transfer among traditionally separate fields. At present, finding data and processing it into usable form can dominate research efforts. By providing ready access not only to data but also to the software tools used to process it for specific uses (e.g., climate impact and economic model inputs), FACE-IT allows researchers to concentrate their efforts on analysis. Lowering barriers to data access lets researchers stretch in new directions and learn from and respond to the needs of other fields. FACE-IT builds on the Globus Galaxies platform, which has been developed over the past several years at the University of Chicago. FACE-IT also benefits from substantial software development by the communities who created most of the domain-specific tools required to populate FACE-IT with useful capabilities. The FACE-IT Galaxy manages earth system datatypes (such as NetCDF), new tool parameters (dates, map, opendap), aggregated datatypes (RAFT), service providers, and cool map visualizers.
Stephen Cantrell, kdb+ Developer at Kx Systems: “Kdb+: How Wall Street Tech can Speed up the World” (Dataconomy Media)
You can see some additional notes here:
https://github.com/cantrells/berlin_kdb_demo?files=1
A modified k-means algorithm for big data clustering (SK Ahammad Fahad)
The amount of data grows every moment, and it comes from everywhere: social media, sensors, search engines, GPS signals, transaction records, satellites, financial markets, e-commerce sites, and so on. This large volume of data may be semi-structured, unstructured, or structured, so it is important to derive meaningful information from such huge data sets. Clustering is the process of categorizing data so that items are grouped in the same cluster when they are similar according to specific metrics. In this paper, we work on the k-means clustering technique to cluster big data. Several methods have been proposed for improving the performance of the k-means clustering algorithm. We propose a method that makes the algorithm less time-consuming and more effective and efficient, yielding better clustering with reduced complexity. According to our observation, the quality of the resulting clusters heavily depends on the selection of the initial centroids and on how data points change clusters in subsequent iterations. As we know, after a certain number of iterations, only a small portion of the data points change their clusters. Therefore, our proposed method first finds the initial centroids and then separates the data elements that will not change their cluster from those that may change their cluster in subsequent iterations, which significantly reduces the workload for very large data sets. We evaluate our method on different data sets and compare it with other methods.
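A simplified sketch of the kind of optimization the paper describes appears below: after each assignment step, points that are much closer to their own centroid than to any other are marked stable and skipped in later iterations. The margin test here is a stand-in heuristic, not the paper's exact interval criterion.

```python
# Simplified sketch of the paper's idea: after each assignment step, points
# that are much closer to their own centroid than to any other are marked
# "stable" and skipped in later iterations. The margin test is a stand-in
# heuristic, not the paper's exact interval criterion.
import numpy as np

def modified_kmeans(X, k, iters=20, margin=2.0, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    active = np.ones(len(X), dtype=bool)      # points still being re-checked

    for _ in range(iters):
        # Distances from active points to every centroid: shape (n_active, k).
        d = np.linalg.norm(X[active, None, :] - centroids[None, :, :], axis=2)
        labels[active] = d.argmin(axis=1)

        # Mark points stable when the runner-up centroid is >= margin times
        # farther away than the winner; they skip future assignment steps.
        d_sorted = np.sort(d, axis=1)
        idx = np.flatnonzero(active)
        active[idx[d_sorted[:, 1] >= margin * d_sorted[:, 0]]] = False

        # Centroids are still recomputed from all points each iteration.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```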
As a Presidio Fellow in Sustainability and Sports at the Presidio Graduate School, San Francisco, CA [http://www.presidio.edu/academics/presidiopro/certificates/sports-sustainability], I presented a class on energy efficiency and solar power in sports stadiums and arenas. It covers related issues of advanced BIM (Building Information Modeling or Building Intelligence Management), the Internet of Everything (IoT), continuous commissioning over the building lifecycle, LED lighting systems, and more.
Benchmarking search relevance in industry vs academia (Nick Craswell)
Update of my WSDM 2017 practice and experience talk (also on SlideShare) about lessons from industry on the use of offline metrics in information retrieval. Since a key need is more training and test sets, this talk describes our more recent data releases.
[CS570] Machine Learning Team Project (I know what items really are) (Kunwoo Park)
This document summarizes a team's approach to predicting which items users might be interested in using a recommendation system. It describes extracting features from user and item metadata to train an SVM model, but this was too computationally expensive. Instead, the team used logistic regression with stochastic gradient descent. They tested features like age, gender and network similarities. Their combined model outperformed random prediction baselines on the KDD Cup 2012 Track 1 dataset.
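The switch from an SVM to SGD-trained logistic regression can be sketched in a few lines with scikit-learn; the features and labels below are hypothetical stand-ins for the KDD Cup 2012 data.

```python
# Minimal sketch of switching from SVM to logistic regression trained with
# stochastic gradient descent, via scikit-learn. Features (age gap, gender
# match, network similarity) and data are hypothetical stand-ins.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 3))  # columns: age gap, gender match, similarity
y = (X[:, 2] + 0.1 * rng.standard_normal(1000) > 0.5).astype(int)

# loss="log_loss" makes SGDClassifier an SGD-trained logistic regression
# (the loss was named "log" in older scikit-learn versions).
model = SGDClassifier(loss="log_loss", max_iter=1000, random_state=0).fit(X, y)
print("predicted interest probability:", model.predict_proba(X[:3])[:, 1])
```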
This document discusses graph databases and Neo4j. It begins with an agenda that includes stories about graph databases in Washington DC, the state of graph databases in 2019, innovation waves, and recommendations for the future regarding AI and graphs. It then provides examples of how Neo4j is being used by organizations like ICIJ, NASA, and to search for cures for cancer. The document discusses the graph database market and Neo4j's dominance. It outlines the Neo4j graph platform vision and upcoming features. Specific customer use cases are presented, including ones for the German Center for Diabetes Research, Caterpillar, and DeviantArt.
Jim Gray presented on his work with large databases and grid computing. He discussed two major projects - TerraServer and SkyServer/World Wide Telescope. TerraServer is a photo database of the United States containing over 15 TB of imagery data accessed through an SQL database. SkyServer is a database of astronomical data containing images and attributes of celestial objects from surveys like SDSS. Gray discussed lessons learned from building and managing these large databases, and future plans to build databases from inexpensive disk bricks. He advocated for grid computing through web services as a way to federate and access distributed data sources on the internet.
Oleksandr Yefremov: Continuously delivering mobile projects (Аліна Шепшелей)
This document discusses best practices for continuously delivering mobile projects. It outlines a CI/CD workflow that includes running tests and manual QA on pull requests, notifying stakeholders, automatically generating changelogs and version bumps, preparing release artifacts, and publishing them to stores or S3. Key steps are running tests on pull requests, using strict PR naming conventions, notifying teams in Slack, automating versioning and publishing with scripts and Fastlane, and deploying beta builds to Fabric/Crashlytics. The full workflow aims to streamline mobile releases by automating repetitive tasks and integrating all steps.
Alexander Voronov: Test driven development in the real world (Аліна Шепшелей)
This document discusses test-driven development (TDD) practices. It covers topics like the benefits of cleaner interfaces and unbiased design when tests are written first. It also addresses challenges like introducing TDD to an existing codebase or team. Key points emphasized are starting simple with critical features, finding the lowest testable point, and making incremental changes to introduce tests and refactoring step-by-step. Continuous integration practices are also highlighted.
Apache Spark is an open-source parallel processing framework that supports in-memory processing to boost the performance of big-data analytic applications. We will cover approaches of processing Big Data on Spark cluster for real time analytic, machine learning and iterative BI and also discuss the pros and cons of using Spark in Azure cloud.
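A minimal PySpark sketch of the in-memory pattern mentioned above: load the data once, cache it, and reuse it across repeated analytic queries. The input path and column names are hypothetical.

```python
# Minimal PySpark sketch: load data once, cache it in memory, and reuse it
# across several analytic queries, which is where Spark's in-memory model
# pays off. The file path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("big-data-demo").getOrCreate()

events = spark.read.json("events.json").cache()  # keep in memory for reuse

# Repeated/iterative queries hit the cached data instead of re-reading it.
events.groupBy("user_id").count().show(5)
events.agg(F.avg("duration").alias("avg_duration")).show()

spark.stop()
```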
Valerii Iakovenko: Drones as part of the present (Аліна Шепшелей)
Drones are tools that have become firmly embedded in our lives. They are sources of geospatial information that form the basis of, and supplement, many monitoring and control systems. The talk will go into detail about agribusiness.
This document provides an overview and agenda for an Apache HBase workshop. It introduces HBase as an open-source NoSQL database built on Hadoop that uses a column-family data model. The agenda covers what HBase is, its data model including rows, columns, cells and versions, CRUD operations, architecture including regions and masters, schema design best practices, and the Java API. Performance tips are given for client reads and writes such as using batches, caching, and tuning durability.
Anton Ivinskyi: Application level metrics and performance tests (Аліна Шепшелей)
It is important to understand how your code behaves in production, not just guess how it should behave. Know what takes time and what goes wrong. Measure it all. Be ready for the load with performance tests.
Anton Parkhomenko: Boost your design workflow, or git rebase for designers (Аліна Шепшелей)
The document provides 4 tips to boost a designer's workflow: 1) Use Git to version and collaborate on design files, 2) Automate repetitive processes, 3) Be prepared for changes by using flexible components and responsive design, 4) Create prototypes to gather feedback early in the design process.
Kononenko Alina: Designing for Apple Watch and Apple TV (Аліна Шепшелей)
Apple Watch and Apple TV apps are inherently different from other apps, in both form and function. You will learn watchOS and tvOS user experience foundations and design principles, get a quick overview of the best existing solutions, and see possible ways of extending your projects to these platforms.
Gregory Shehet: 'Undefined' on prod, or how to test a React app (Аліна Шепшелей)
During the lecture we'll discuss unit testing of the interface. The technology stack: React (Redux, MobX), Mocha/Chai, React Test Utils, Enzyme, Tape/Ava. I will also mention how we at Grammarly are rewriting Selenium tests as unit tests, and how that works.
Alexey Osipenko: Basics of functional reactive programming (Аліна Шепшелей)
During the talk, we will incrementally construct primitives and an algebra (in the worst case, we will just come up with a library interface) for organizing an application's interaction with the mess of the real world. This approach keeps the application logic in pure functions and declaratively associates external events with the necessary output.
And no React JS.
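A tiny sketch of the kind of primitive such a talk might construct, assuming nothing about the speaker's actual library: an event stream with pure map/filter combinators, where side effects happen only in subscribers.

```python
# Tiny sketch of a functional-reactive primitive: a stream with pure
# map/filter combinators; side effects happen only in subscribe callbacks.
# Purely illustrative; not the talk's actual library.
class Stream:
    def __init__(self):
        self._subscribers = []

    def subscribe(self, fn):
        self._subscribers.append(fn)

    def emit(self, value):
        for fn in self._subscribers:
            fn(value)

    def map(self, f):  # pure transformation producing a new stream
        out = Stream()
        self.subscribe(lambda v: out.emit(f(v)))
        return out

    def filter(self, pred):
        out = Stream()
        self.subscribe(lambda v: pred(v) and out.emit(v))
        return out

clicks = Stream()
clicks.map(lambda e: e["x"]).filter(lambda x: x > 100).subscribe(print)
clicks.emit({"x": 150})  # prints 150; the logic above stayed pure
```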
Roman Ugolnikov: Migration and source control for your DB (Аліна Шепшелей)
The document discusses database migration and source control. It describes how database structure, data, and logic can change across versions. It recommends using tools like Liquibase and Flyway to manage database schema changes and keep the database schema in sync with code. These tools allow defining changes in migration files and rolling back changes if needed. The document also covers how the tools work, supported databases, file formats, preconditions, and provides a demo of using the tools for a sample database migration.
Alex Theedom: Java EE revisits design patterns (Аліна Шепшелей)
Enter "Django Channels": new way of desinging and thinking about your application. It separates transport and processing concerns in typical Django project using combination of ASGI (Asynchronous Server Gateway Interface) and worker processes, enabling your application to be "event-oriented" and implement new workflows for processing your data. How does it work? What do you need to start? Is it even useful? Learn for yourself with this introductory talk.
Alexey Tokar: To find a needle in a haystack (Аліна Шепшелей)
The talk will cover core principles of text search applicable to fixed-size dictionaries. We will take a deep look at some algorithms hidden inside huge search engines and behind basic search inputs on websites. My goal is to compare different search approaches and to provide an objective assessment based on the complexity, memory consumption, and CPU utilization of each.
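One classic structure for fixed-dictionary search is a trie; the sketch below is illustrative and not necessarily among the algorithms the talk compares.

```python
# Minimal trie sketch for fixed-dictionary word lookup, one of the classic
# structures behind dictionary search. Illustrative only; the talk compares
# several approaches by complexity, memory, and CPU cost.
class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True  # end-of-word marker

    def contains(self, word):
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return "$" in node

trie = Trie()
for w in ("needle", "need", "haystack"):
    trie.insert(w)
print(trie.contains("needle"), trie.contains("hay"))  # True False
```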
Den Golotyuk: Big data from 30 million daily users (Аліна Шепшелей)
This document summarizes the key details of an analytics company called .io over the past year, in three sentences:
The company has grown significantly in the past year and now serves over 30 million unique users across 200 customers globally. It focuses on collecting and processing huge data flows in a simple way for customers, handling the complex analytics internally and providing simple outputs. The company is run by a small team of 4 engineers and processes over 2 billion daily requests and 100 GB of daily backups across 150 cloud and physical nodes.
Anton Fedorchenko: Swift for server-side development (Аліна Шепшелей)
Since the Swift programming language was open-sourced in December 2015, its popularity has boomed. This smart move by Apple introduced new opportunities for the language and increased its impact on the developer community, including expanding Swift to other platforms and using it for server-side development. The presentation gives an introduction to server-side development with Swift, highlights the most popular frameworks and solutions, and covers key questions regarding the language's adoption.
Ruslan Shevchenko: Programming languages landscape: new & old ideas (Аліна Шепшелей)
In this lecture we will talk about emerging developments in the field of industrial programming languages, new ideas for the niche and mainstream markets, and what can be used now. We will mark the positions and main characteristics of today's spectrum of new languages, from Scala, Rust, and Julia to Wolfram and Racket.
Cracking AI Black Box - Strategies for Customer-centric Enterprise Excellence (Quentin Reul)
The democratization of Generative AI is ushering in a new era of innovation for enterprises. Discover how you can harness this powerful technology to deliver unparalleled customer value and secure a formidable competitive advantage in today's market. In this session, you will learn how to:
- Identify high-impact customer needs with precision
- Harness the power of large language models to address specific customer needs effectively
- Implement AI responsibly to build trust and foster strong customer relationships
Whether you're at the early stages of your AI journey or looking to optimize existing initiatives, this session will provide you with actionable insights and strategies needed to leverage AI as a powerful catalyst for customer-driven enterprise success.
TrustArc Webinar - Innovating with TRUSTe Responsible AI Certification (TrustArc)
In a landmark year marked by significant AI advancements, it’s vital to prioritize transparency, accountability, and respect for privacy rights with your AI innovation.
Learn how to navigate the shifting AI landscape with our innovative solution, TRUSTe Responsible AI Certification, the first AI certification designed for data protection and privacy. Crafted by a team that has issued 10,000+ privacy certifications, this framework integrates industry standards and laws for responsible AI governance.
This webinar will review:
- How compliance can play a role in the development and deployment of AI systems
- How to model trust and transparency across products and services
- How to save time and work smarter in understanding regulatory obligations, including AI
- How to operationalize and deploy AI governance best practices in your organization
Generative AI technology is a fascinating field that focuses on creating comp... (Nohoax Kanont)
Generative AI technology is a fascinating field that focuses on creating computer models capable of generating new, original content. It leverages the power of large language models, neural networks, and machine learning to produce content that can mimic human creativity. This technology has seen a surge in innovation and adoption since the introduction of ChatGPT in 2022, leading to significant productivity benefits across various industries. With its ability to generate text, images, video, and audio, generative AI is transforming how we interact with technology and the types of tasks that can be automated.
UiPath Community Day Amsterdam: Code, Collaborate, Connect (UiPathCommunity)
Welcome to our third live UiPath Community Day Amsterdam! Come join us for a half-day of networking and UiPath Platform deep-dives, for devs and non-devs alike, in the middle of summer ☀.
📕 Agenda:
12:30 Welcome Coffee/Light Lunch ☕
13:00 Event opening speech
Ebert Knol, Managing Partner, Tacstone Technology
Jonathan Smith, UiPath MVP, RPA Lead, Ciphix
Cristina Vidu, Senior Marketing Manager, UiPath Community EMEA
Dion Mes, Principal Sales Engineer, UiPath
13:15 ASML: RPA as Tactical Automation
Tactical robotic process automation for solving short-term challenges, while establishing standard and re-usable interfaces that fit IT's long-term goals and objectives.
Yannic Suurmeijer, System Architect, ASML
13:30 PostNL: an insight into RPA at PostNL
Showcasing the solutions our automations have provided, the challenges we’ve faced, and the best practices we’ve developed to support our logistics operations.
Leonard Renne, RPA Developer, PostNL
13:45 Break (30')
14:15 Breakout Sessions: Round 1
Modern Document Understanding in the cloud platform: AI-driven UiPath Document Understanding
Mike Bos, Senior Automation Developer, Tacstone Technology
Process Orchestration: scale up and have your Robots work in harmony
Jon Smith, UiPath MVP, RPA Lead, Ciphix
UiPath Integration Service: connect applications, leverage prebuilt connectors, and set up customer connectors
Johans Brink, CTO, MvR digital workforce
15:00 Breakout Sessions: Round 2
Automation, and GenAI: practical use cases for value generation
Thomas Janssen, UiPath MVP, Senior Automation Developer, Automation Heroes
Human in the Loop/Action Center
Dion Mes, Principal Sales Engineer @UiPath
Improving development with coded workflows
Idris Janszen, Technical Consultant, Ilionx
15:45 End remarks
16:00 Community fun games, sharing knowledge, drinks, and bites 🍻
It's your unstructured data: How to get your GenAI app to production (and spe... (Zilliz)
So you've successfully built a GenAI app POC for your company -- now comes the hard part: bringing it to production. Aparavi addresses the challenges of AI projects while safeguarding data privacy and PII. Our Service for RAG helps AI developers and data scientists scale their apps from thousands to millions of users using corporate unstructured data. Aparavi's AI Data Loader cleans, prepares, and then loads only the relevant unstructured data for each AI project/app, enabling you to operationalize the creation of GenAI apps easily and accurately while giving you the time to focus on what you really want to do: building a great AI application with useful and relevant context. All within your environment, and without ever having to share private corporate data with anyone - not even Aparavi.
"Making .NET Application Even Faster", Sergey Teplyakov.pptxFwdays
In this talk we're going to explore the performance improvement lifecycle: setting performance goals, using profilers to find the bottlenecks, making a fix, and validating that the fix works by benchmarking it. The talk will be useful for novice and seasoned .NET developers and architects interested in making their applications fast and understanding how things work under the hood.
Demystifying Neural Networks And Building Cybersecurity Applications (Priyanka Aash)
In today's rapidly evolving technological landscape, Artificial Neural Networks (ANNs) have emerged as a cornerstone of artificial intelligence, revolutionizing various fields including cybersecurity. Inspired by the intricacies of the human brain, ANNs have a rich history and a complex structure that enables them to learn and make decisions. This blog aims to unravel the mysteries of neural networks, explore their mathematical foundations, and demonstrate their practical applications, particularly in building robust malware detection systems using Convolutional Neural Networks (CNNs).
Discovery Series - Zero to Hero - Task Mining Session 1 (DianaGray10)
This session is focused on providing you with an introduction to task mining. We will go over different types of task mining and provide you with a real-world demo on each type of task mining in detail.
How UiPath Discovery Suite supports identification of Agentic Process Automat... (DianaGray10)
📚 Understand the basics of the new persona-based, LLM-powered Agentic Process Automation and discover how existing UiPath Discovery Suite products like Communication Mining, Process Mining, and Task Mining can be leveraged to identify APA candidates.
Topics Covered:
💡 Idea Behind APA: Explore the innovative concept of Agentic Process Automation and its significance in modern workflows.
🔄 How APA is Different from RPA: Learn the key differences between Agentic Process Automation and Robotic Process Automation.
🚀 Discover the Advantages of APA: Uncover the unique benefits of implementing APA in your organization.
🔍 Identifying APA Candidates with UiPath Discovery Products: See how UiPath's Communication Mining, Process Mining, and Task Mining tools can help pinpoint potential APA candidates.
🔮 Discussion on Expected Future Impacts: Engage in a discussion on the potential future impacts of APA on various industries and business processes.
Enhance your knowledge on the forefront of automation technology and stay ahead with Agentic Process Automation. 🧠💼✨
Speakers:
Arun Kumar Asokan, Delivery Director (US) @ qBotica and UiPath MVP
Naveen Chatlapalli, Solution Architect @ Ashling Partners and UiPath MVP
The Challenge of Interpretability in Generative AI Models (Sara Kroft)
Navigating the intricacies of generative AI models reveals a pressing challenge: interpretability. Our blog delves into the complexities of understanding how these advanced models make decisions, shedding light on the mechanisms behind their outputs. Explore the latest research, practical implications, and ethical considerations, as we unravel the opaque processes that drive generative AI. Join us in this insightful journey to demystify the black box of artificial intelligence.
Dive into the complexities of generative AI with our blog on interpretability. Find out why making AI models understandable is key to trust and ethical use and discover current efforts to tackle this big challenge.
Denis Reznik: Data-Driven Future
1. Data-Driven Future
What to Learn and What to Expect?
Denis Reznik
Data Architect at Intapp Kyiv
Microsoft Data Platform MVP
2. About me
• Denis Reznik
• Kyiv, Ukraine
• Data Architect at Intapp, Inc.
• Microsoft Data Platform MVP
• Co-Founder of Ukrainian Data Community
3. Agenda
• Data is a new Oil (c)
• Data and Science
• Data in Big Companies
• Data and Application Development
• Data-Driven Future
4. Data is a New Oil
“Data is the new oil. It’s valuable, but if unrefined it
cannot really be used. It has to be changed into gas,
plastic, chemicals, etc to create a valuable entity that
drives profitable activity; so must data be broken
down, analyzed for it to have value.”
(c) Clive Humby, UK mathematician
5. Data and Science
• Thousands of years: Empirical
• Few hundred years: Theoretical
• Last fifty years: Computational ("query the world")
• Last twenty years: eScience (Data Science) ("download the world")
10. Parallel Processing
Temperature Sensor Dataset (n items)
Q: How many times was the temperature above the norm during the last week?
A: 5
Time: 2 sec
Algorithmic complexity: O(n)
11. Parallel Processing
Temperature Sensor Datasets (k items in each)
Q: How many times was the temperature above the norm during the last week?
Partial answers per dataset: A: 1, A: 0, A: 3, A: 4
Time: 0.5 sec
Algorithmic complexity: O(n/k)
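A minimal sketch of the idea on these two slides, assuming the readings live in one Python list: split the n readings into k chunks, count threshold exceedances per chunk in parallel, and sum the partial answers, cutting the scan from O(n) to roughly O(n/k) wall-clock time with k workers. The threshold and data are made up.

```python
# Minimal sketch of the slides' idea: split n sensor readings into k chunks,
# count threshold exceedances per chunk in parallel, then sum the partial
# counts. With k workers the O(n) scan takes about O(n/k) wall-clock time.
from concurrent.futures import ProcessPoolExecutor
import random

NORM = 25.0  # hypothetical temperature norm

def count_above(chunk):
    return sum(1 for t in chunk if t > NORM)

if __name__ == "__main__":
    readings = [random.uniform(15, 30) for _ in range(1_000_000)]
    k = 4
    chunks = [readings[i::k] for i in range(k)]  # k strided chunks

    with ProcessPoolExecutor(max_workers=k) as pool:
        partials = list(pool.map(count_above, chunks))

    print("partial answers:", partials, "total:", sum(partials))
```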
19. Data-Driven Future
• The amount of data is growing, and this is cool
• More and more decisions are based on data
• More and more applications are developed
• It is exciting to be a Software Engineer now!