“Roger shares deep expertise (and "deep learning" insights too) with poise, clarity and humility. His data science classes provided a wealth of knowledge with practical insight and were extremely helpful. If you're looking for a strong technical mentor or someone to coach you out of a data science minefield, put Roger on speed dial! Thank you Roger for sharing useful expertise in a fresh, genuine, extremely helpful fashion.”
About
Activity
-
Building scalable solutions to address mental health is at the core of what we do at Headspace, and one of the foundational pillars of our strategy…
Liked by Roger Barga
-
If you are experiencing a Windows system not booting up, some steps from this old blog of mine could help. The steps are very similar with different…
Liked by Roger Barga
Publications
-
Predictive Analytics with Microsoft Azure Machine Learning: Build and Deploy Actionable Solutions in Minutes. Second Edition
Apress
Data Science and Machine Learning are in high demand, as customers are increasingly looking for ways to glean insights from all their data. More customers now realize that Business Intelligence is not enough, as the volume, speed, and complexity of data now defy traditional analytics tools. While Business Intelligence addresses descriptive and diagnostic analysis, Data Science unlocks new opportunities through predictive and prescriptive analysis.
The purpose of this book is to provide a gentle and instructionally organized introduction to the field of data science and machine learning, with a focus on building and deploying predictive models.
-
Project Daytona: Data Analytics as a Cloud Service
Proceedings of the International Conference of Data Engineering (ICDE), International Conference on Data Engineering, 7 March 2012
Spreadsheets are established data collection and analysis tools in business, technical computing and academic research. Excel, for example, offers an attractive user interface, provides an easy to use data entry model, and offers substantial interactivity for what-if analysis. However, spreadsheets and other common client applications do not offer scalable computation for large scale data analytics and exploration. Increasingly researchers in domains ranging from the social sciences to environmental sciences are faced with a deluge of data, often sitting in spreadsheets such as Excel or other client applications, and they lack a convenient way to explore the data, to find related data sets, or to invoke scalable analytical models over the data. To address these limitations, we have developed a cloud data analytics service based on Daytona, which is an iterative MapReduce runtime optimized for data analytics. In our model, Excel and other existing client applications provide the data entry and user interaction surfaces, Daytona provides a scalable runtime on the cloud for data analytics, and our service seamlessly bridges the gap between the client and cloud. Any analyst can use our data analytics service to discover and import data from the cloud, invoke cloud scale data analytics algorithms to extract information from large datasets, invoke data visualization, and then store the data back to the cloud all through a spreadsheet or other client application they are already familiar with.
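Daytona itself was never released as a public library, but the iterative MapReduce pattern the abstract describes can be sketched in a few lines of Python; the `mapper`, `reducer`, and `converged` names below are illustrative, not Daytona's actual API:

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """One MapReduce round: map each record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def iterative_map_reduce(records, mapper, reducer, converged, max_rounds=10):
    """Re-run MapReduce rounds until the caller-supplied test says stop —
    the shape of workload an iterative runtime like Daytona optimizes for."""
    state = None
    for _ in range(max_rounds):
        new_state = map_reduce(records, mapper, reducer)
        if converged(state, new_state):
            return new_state
        state = new_state
    return state

# Example: average value per category, a single (non-iterative) round.
rows = [("a", 2), ("a", 4), ("b", 6)]
result = map_reduce(
    rows,
    mapper=lambda r: [(r[0], r[1])],
    reducer=lambda k, vs: sum(vs) / len(vs),
)
# result == {"a": 3.0, "b": 6.0}
```

In the paper's model the client (e.g. Excel) would supply `records` and display `result`, while the rounds themselves run in the cloud.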
-
CloudClustering: Toward an iterative data processing pattern on the cloud
Proceedings IEEE Cloud 2011, The 4th International Conference on Cloud Computing, IEEE Computer Society
As the emergence of cloud computing brings the potential for large-scale data analysis to a broader community, architectural patterns for data analysis on the cloud, especially iterative algorithms, are increasingly useful. MapReduce suffers performance limitations for this purpose, as it is not inherently designed for iterative algorithms.
In this paper we describe our implementation of CloudClustering, a distributed k-means clustering algorithm on Microsoft’s Windows Azure cloud. The k-means algorithm makes a good case study because its characteristics are representative of many iterative data analysis algorithms. CloudClustering adopts a novel architecture to improve performance without sacrificing fault tolerance. To achieve this goal, we introduce a distributed fault tolerance mechanism called the buddy system, and we make use of data affinity and checkpointing. Our goal is to generalize this…
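The distributed runtime in the paper is Azure-specific, but the iterative core of k-means, the assignment/update loop the abstract refers to, can be sketched on a single machine (the buddy-system fault tolerance and checkpointing are omitted here):

```python
import math

def kmeans(points, centroids, max_iters=100):
    """Plain Lloyd's k-means on 2-D points; returns the final centroids."""
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [math.dist(p, c) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:   # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
final = kmeans(points, centroids=[(0, 0), (10, 10)])
# final == [(0.0, 0.5), (10.0, 10.5)]
```

In the distributed setting, the assignment step is what gets partitioned across workers; each iteration is a synchronization point, which is why plain MapReduce is a poor fit.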
-
A Scalable Communication Runtime for Clouds
Proceedings IEEE Cloud 2011, The 4th International Conference on Cloud Computing, IEEE Computer Society
Leveraging cloud computing to acquire the necessary computation resources to scale out parallel applications is becoming common practice. However, many such applications also require communication and synchronization between processes. Although commercial cloud platforms provide ready access to scalable compute and storage services, implementing communication and synchronization between cooperating processes and efficiently exchanging arbitrary-size messages remains a challenge for application developers. In clouds, durable queues provide basic abstractions for communication. However, they are not sufficient for applications that require transferring arbitrary-size messages or for applications that require higher-level abstractions such as broadcast. Furthermore, direct socket-based communication is susceptible to various fluctuations common in data center environments. We envision a solution to this problem that leverages scalable storage services, queues, and direct socket-based communication. Publish/subscribe (pub/sub) is a well-known communication pattern that can achieve the above capabilities in a loosely coupled fashion, which is highly desirable in cloud environments where most services are asynchronous. In this paper, we describe the architecture of a pub/sub library implemented on a commercial cloud computing platform, which can be used to develop various parallel applications. We also present an evaluation of our implementation using both micro-benchmarks and a real-world application. Together, these demonstrate that our approach is both effective and scalable in performing communication and synchronization in cloud-scale applications.
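The paper's library targets cloud queues and storage; the pub/sub pattern it builds on can be shown with a minimal in-process sketch (the `PubSub` class and its methods are illustrative, not the paper's API):

```python
from collections import defaultdict

class PubSub:
    """Minimal topic-based publish/subscribe broker (single process)."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver to every subscriber of the topic; publishers and
        # subscribers never reference each other directly, which is the
        # loose coupling the abstract highlights.
        for callback in self._subscribers[topic]:
            callback(message)

bus = PubSub()
received = []
bus.subscribe("results", received.append)
bus.subscribe("results", lambda m: received.append(m.upper()))
bus.publish("results", "done")
# received == ["done", "DONE"]
```

A cloud implementation replaces the in-memory subscriber list with durable queues or storage, so delivery survives the transient failures common in data centers.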
-
Accurate Latency Estimation in a Distributed Event Processing System
27th International Conference on Data Engineering (ICDE '11)
-
Bioinformatics and Data-Intensive Scientific Discovery in the Beginning of the 21st Century
in OMICS A Journal of Integrative Biology
-
The Client and the Cloud: Democratizing Research Computing
IEEE Internet Computing, IEEE Computer Society
-
Versioning for Workflow Evolution
Third International Workshop on Data Intensive Distributed Computing held in conjunction with the 19th International Symposium on High Performance Distributed Computing (HPDC'10)
Scientists working in eScience environments often use workflows to carry out their computations. Since the workflows evolve as the research itself evolves, these workflows can be a tool for tracking the evolution of the research. Scientists can trace their research and associated results through time or even go back in time to a previous stage and fork to a new branch of research. In this paper we introduce the workflow evolution framework (EVF), which is demonstrated through implementation in the Trident workflow workbench. The primary contribution of the EVF is efficient management of knowledge associated with workflow evolution. Since we believe evolution can be used for workflow attribution, our framework will motivate researchers to share their workflows and get the credit for their contributions.
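The core idea, revisions that remember their parent so a researcher can go back to an earlier stage and fork, can be sketched as a version tree; all names below are illustrative, not the EVF's actual model:

```python
class WorkflowVersion:
    """A workflow revision that keeps a parent pointer, forming a version tree."""
    def __init__(self, definition, parent=None):
        self.definition = definition
        self.parent = parent

    def fork(self, new_definition):
        """Branch a new line of research from this revision."""
        return WorkflowVersion(new_definition, parent=self)

    def history(self):
        """Walk back to the root revision, oldest first."""
        node, chain = self, []
        while node is not None:
            chain.append(node.definition)
            node = node.parent
        return list(reversed(chain))

v1 = WorkflowVersion("ingest -> clean")
v2 = v1.fork("ingest -> clean -> model")
branch = v1.fork("ingest -> clean -> visualize")  # fork from an earlier stage
# v2.history() == ["ingest -> clean", "ingest -> clean -> model"]
```

Because every revision records its ancestry, the same structure supports the attribution use case the abstract mentions: any derived workflow can be traced back to the original it forked from.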
-
Building the Trident Scientific Workflow Workbench for Data Management in the Cloud
International Conference on Advanced Engineering Computing and Applications in Sciences (ADVCOMP), IEEE
-
GrayWulf: Scalable Software Architecture for Data Intensive Computing
Hawaii International Conference on System Sciences (HICSS), IEEE Computer Society
-
Observing the Oceans - A 2020 Vision for Ocean Science
The Fourth Paradigm: Data Intensive Scientific Discovery, Microsoft Research
Patents
-
De-focusing over big data for extraction of unknown value
Issued USPTO 08452792
Techniques for defocusing queries over big datasets and dynamic datasets are provided to broaden search results and incorporate all potentially relevant data and avoid overly narrowing queries. An analytic component can receive queries directed at one region of a dataset and analyze the queries to generate inferences about the queries. The queries can then be defocused by a defocusing component and incorporate a larger dataset than originally searched to broaden the queries. The larger dataset can incorporate all, or a part of the original dataset and can also be disparate from the original dataset. Clusters of queries can also be merged and unified to deal with ‘local minima’ issues and broaden the understanding of the dataset. In other embodiments, dynamic data can be monitored and changes tracked, to ensure that all portions of the dataset are being searched by the queries.
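The patent text is abstract, but the basic move, relaxing a query's constraints so it covers a larger slice of the dataset, can be illustrated with a hypothetical numeric range filter; nothing in this sketch comes from the actual claims:

```python
def defocus(query, widen_by=0.5):
    """Broaden a numeric range query by widening its bounds proportionally.

    `query` is a dict like {"field": ..., "low": ..., "high": ...};
    the representation is invented here purely for illustration.
    """
    span = query["high"] - query["low"]
    return {
        "field": query["field"],
        "low": query["low"] - span * widen_by,
        "high": query["high"] + span * widen_by,
    }

narrow = {"field": "latency_ms", "low": 100, "high": 200}
broad = defocus(narrow)
# broad == {"field": "latency_ms", "low": 50.0, "high": 250.0}
```

The widened query trades precision for recall: results outside the original focus region are admitted so that potentially relevant data is not filtered out by an overly narrow initial query.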
-
AUTOMATIC SIGNIFICANCE TAGGING OF INCOMING COMMUNICATIONS
US 20090006366
-
Automatically managing incoming communications between sender and recipient, analyzing factors, selectively applying observed behavior, performing designated action
US 7,885,948
-
BREAK-THROUGH MECHANISM FOR PERSONAS ASSOCIATED WITH A SINGLE DEVICE
US 20110045806
-
CONSISTENCY SENSITIVE STREAMING OPERATORS
US 20090125635
-
DISTRIBUTED WORKFLOW FRAMEWORK
US 20110035506
-
Event stream conditioning
US 8,099,452
-
FEDERATED DISTRIBUTED WORKFLOW SCHEDULER
US 20110161391
-
Implementation of stream algebra over class instances
US 7,676,461
-
Optimized recovery logging
US 7,418,462
-
PUBLISHING WORK ACTIVITY TO SOCIAL NETWORKS
US 20090006415
-
Persistent client-server database sessions
US 7,386,557
-
Persistent stateful component-based applications via automatic recovery
US 7,461,292
-
Publishing work activity information key tags associated with shared databases in social networks
-
Recovery guarantees for general multi-tier applications
US 7,478,277
-
Recovery guarantees for software components
US 6,959,401
-
SINGLE DEVICE WITH MULTIPLE PERSONAS
US 20110061008
-
Streaming operator placement for distributed stream processing
US 8,060,614
-
TEMPORAL EVENT STREAM MODEL
US 20090125550
Honors & Awards
-
National Academy of Engineering, Frontiers of Engineering
National Academy of Engineering
-
Intel Graduate Fellowship Recipient
Intel Corporation
The Intel PhD Fellowship Program awards fellowships to PhD candidates doing work in fields related to Intel's business and research interests. These fellowships, available only at select U.S. universities, include tuition and a stipend. Approximately 35 fellowships are awarded annually.
Recommendations received
1 person has recommended Roger
More activity by Roger
-
You can store Azure #Database for #MySQL connection strings in #Azure Key Vault to securely manage sensitive information and ensure that it’s…
Liked by Roger Barga
-
I am excited to share that I am joining the CTO team for IBM Cloud as Lead Architect, Cross-Industry Solutions. My goal is to leverage my solutioning…
Liked by Roger Barga
-
Be sure to join for this month's #Azure #Database for #MySQL live #webinar (Jul 10 at 7:30 AM PDT) for technical deep dives, the latest service news…
Liked by Roger Barga
-
A move from #Azure #Database for #MySQL – Single Server to Flexible Server improves #performance, offers better value, and uses an easy #migration…
Liked by Roger Barga