Roger Barga

Seattle, Washington, United States

7K followers 500+ connections

About

Product leader with broad experience in enterprise computing, from strategy, product…

Activity

Building scalable solutions to address mental health is at the core of what we do at Headspace, and one of the foundational pillars of our strategy…

Liked by Roger Barga
Everyone starts at zero. Jeff Bezos giving a tour of Amazon's first "office" in 1994. Today, the company is valued at US $1.9 trillion. “The keys…

Liked by Roger Barga
If you are experiencing a windows system not booting up, some steps from this old blog of mine could help. The steps are very similar with different…

Liked by Roger Barga

Publications

Predictive Analytics with Microsoft Azure Machine Learning: Build and Deploy Actionable Solutions in Minutes. Second Edition

Apress August 23, 2015
Data Science and Machine Learning are in high demand, as customers are increasingly looking for ways to glean insights from all their data. More customers now realize that Business Intelligence is not enough as the volume, speed and complexity of data now defy traditional analytics tools. While Business Intelligence addresses descriptive and diagnostic analysis, Data Science unlocks new opportunities through predictive and prescriptive analysis.

The purpose of this book is to provide a gentle and instructionally organized introduction to the field of data science and machine learning, with a focus on building and deploying predictive models.

Project Daytona: Data Analytics as a Cloud Service

Proceedings of the International Conference on Data Engineering (ICDE), March 7, 2012
Spreadsheets are established data collection and analysis tools in business, technical computing and academic research. Excel, for example, offers an attractive user interface, provides an easy-to-use data entry model, and offers substantial interactivity for what-if analysis. However, spreadsheets and other common client applications do not offer scalable computation for large-scale data analytics and exploration. Increasingly, researchers in domains ranging from the social sciences to environmental sciences are faced with a deluge of data, often sitting in spreadsheets such as Excel or other client applications, and they lack a convenient way to explore the data, to find related data sets, or to invoke scalable analytical models over the data. To address these limitations, we have developed a cloud data analytics service based on Daytona, which is an iterative MapReduce runtime optimized for data analytics. In our model, Excel and other existing client applications provide the data entry and user interaction surfaces, Daytona provides a scalable runtime on the cloud for data analytics, and our service seamlessly bridges the gap between the client and cloud. Any analyst can use our data analytics service to discover and import data from the cloud, invoke cloud-scale data analytics algorithms to extract information from large datasets, invoke data visualization, and then store the data back to the cloud, all through a spreadsheet or other client application they are already familiar with.

CloudClustering: Toward an iterative data processing pattern on the cloud

Proceedings IEEE Cloud 2011, The 4th International Conference on Cloud Computing, IEEE Computer Society, July 4, 2011
As the emergence of cloud computing brings the potential for large-scale data analysis to a broader community, architectural patterns for data analysis on the cloud, especially iterative algorithms, are increasingly useful. MapReduce suffers performance limitations for this purpose as it is not inherently designed for iterative algorithms.

In this paper we describe our implementation of CloudClustering, a distributed k-means clustering algorithm on Microsoft’s Windows Azure cloud. The k-means algorithm makes a good case study because its characteristics are representative of many iterative data analysis algorithms. CloudClustering adopts a novel architecture to improve performance without sacrificing fault tolerance. To achieve this goal, we introduce a distributed fault tolerance mechanism called the buddy system, and we make use of data affinity and checkpointing. Our goal is to generalize this

A Scalable Communication Runtime for Clouds

Proceedings IEEE Cloud 2011, The 4th International Conference on Cloud Computing, IEEE Computer Society, June 7, 2011
Leveraging cloud computing to acquire the necessary computation resources to scale out parallel applications is becoming common practice. However, many such applications also require communication and synchronization between processes. Although commercial cloud platforms provide ready access to scalable compute and storage services, implementing communication and synchronization between cooperating processes and efficiently exchanging arbitrary-size messages remains a challenge for application developers. In clouds, durable queues provide basic abstractions for communication. However, they are not sufficient for applications that require transferring arbitrary-size messages or for applications that require higher-level abstractions such as broadcast. Furthermore, direct socket-based communication is susceptible to various fluctuations common in data center environments. We envision a solution to this problem that leverages scalable storage services, queues, and direct socket-based communication. Publish/subscribe (pub/sub) is a well-known communication pattern that can achieve the above capabilities in a loosely coupled fashion, which is highly desirable in cloud environments where most services are asynchronous. In this paper, we describe the architecture of a pub/sub library implemented on a commercial cloud computing platform, which can be used to develop various parallel applications. We also present an evaluation of our implementation using both microbenchmarks and a real-world application. Together, these demonstrate that our approach is both effective and scalable in performing communication and synchronization in cloud-scale applications.

Accurate Latency Estimation in a Distributed Event Processing System

27th International Conference on Data Engineering (ICDE '11) Apr 2011
Bioinformatics and Data-Intensive Scientific Discovery in the Beginning of the 21st Century

OMICS: A Journal of Integrative Biology, 2011

The Client and the Cloud: Democratizing Research Computing

IEEE Internet Computing, IEEE Computer Society 2011

Versioning for Workflow Evolution

Third International Workshop on Data Intensive Distributed Computing held in conjunction with the 19th International Symposium on High Performance Distributed Computing (HPDC'10) June 22, 2010

Scientists working in eScience environments often use workflows to carry out their computations. Since the workflows evolve as the research itself evolves, these workflows can be a tool for tracking the evolution of the research. Scientists can trace their research and associated results through time or even go back in time to a previous stage and fork to a new branch of research. In this paper we introduce the workflow evolution framework (EVF), which is demonstrated through implementation in the Trident workflow workbench. The primary contribution of the EVF is efficient management of knowledge associated with workflow evolution. Since we believe evolution can be used for workflow attribution, our framework will motivate researchers to share their workflows and get the credit for their contributions.

Building the Trident Scientific Workflow Workbench for Data Management in the Cloud

International Conference on Advanced Engineering Computing and Applications in Sciences (ADVCOMP), IEEE Oct 2009

GrayWulf: Scalable Software Architecture for Data Intensive Computing

Hawaii International Conference on System Sciences (HICSS), IEEE Computer Society 2009
Observing the Oceans - A 2020 Vision for Ocean Science

The Fourth Paradigm: Data Intensive Scientific Discovery, Microsoft Research 2009
Pan-STARRS: Learning to Ride the Data Tsunami

Microsoft eScience Workshop Dec 2008

Patents

De-focusing over big data for extraction of unknown value

Issued May 28, 2013, US 8,452,792

Techniques for defocusing queries over big datasets and dynamic datasets are provided to broaden search results and incorporate all potentially relevant data and avoid overly narrowing queries. An analytic component can receive queries directed at one region of a dataset and analyze the queries to generate inferences about the queries. The queries can then be defocused by a defocusing component and incorporate a larger dataset than originally searched to broaden the queries. The larger dataset can incorporate all, or a part of the original dataset and can also be disparate from the original dataset. Clusters of queries can also be merged and unified to deal with ‘local minima’ issues and broaden the understanding of the dataset. In other embodiments, dynamic data can be monitored and changes tracked, to ensure that all portions of the dataset are being searched by the queries.

AUTOMATIC SIGNIFICANCE TAGGING OF INCOMING COMMUNICATIONS
US 20090006366

Automatically managing incoming communications between sender and recipient, analyzing factors, selectively applying observed behavior, performing designated action
US 7,885,948

BREAK-THROUGH MECHANISM FOR PERSONAS ASSOCIATED WITH A SINGLE DEVICE
US 20110045806

CONSISTENCY SENSITIVE STREAMING OPERATORS
US 20090125635

DISTRIBUTED WORKFLOW FRAMEWORK
US 20110035506

Event stream conditioning
US 8,099,452

FEDERATED DISTRIBUTED WORKFLOW SCHEDULER
US 20110161391

Implementation of stream algebra over class instances
US 7,676,461

Optimized recovery logging
US 7,418,462

PUBLISHING WORK ACTIVITY TO SOCIAL NETWORKS
US 20090006415

Persistent client-server database sessions
US 7,386,557

Persistent stateful component-based applications via automatic recovery
US 7,461,292

Publishing work activity information key tags associated with shared databases in social networks
US Implementation of stream algebra over class instances

Recovery guarantees for general multi-tier applications
US 7,478,277

Recovery guarantees for software components
US 6,959,401

SINGLE DEVICE WITH MULTIPLE PERSONAS
US 20110061008

Streaming operator placement for distributed stream processing
US 8,060,614

TEMPORAL EVENT STREAM MODEL
US 20090125550

Honors & Awards

National Academy of Engineering, Frontiers of Engineering

National Academy of Engineering

2003
Intel Graduate Fellowship Recipient

Intel Corporation

Jun 1996

The Intel PhD Fellowship Program awards fellowships to PhD candidates doing work in fields related to Intel's business and research interests. These fellowships, available only at select U.S. universities, include tuition and a stipend. Approximately 35 fellowships are awarded annually.

Recommendations received

Keith Beggs

“Roger shares deep expertise (and "deep learning" insights too) with poise, clarity and humility. His data science classes provided a wealth of knowledge with practical insight and were extremely helpful. If you're looking for a strong technical mentor or someone to coach you out of a data science minefield, put Roger on speed dial! Thank you Roger for sharing useful expertise in a fresh, genuine, extremely helpful fashion.”

1 person has recommended Roger

More activity by Roger

You can store Azure #Database for #MySQL connection strings in #Azure Key Vault to securely manage sensitive information and ensure that it’s…

Liked by Roger Barga
I am excited to inform that I am joining the CTO team for IBM Cloud as Lead Architect, Cross-Industry Solutions. My goal is leverage my solutioning…

Liked by Roger Barga
Be sure to join for this month's #Azure #Database for #MySQL live #webinar (Jul 10 at 7:30 AM PDT) for technical deep dives, the latest service news…

Liked by Roger Barga
A move from #Azure #Database for #MySQL – Single Server to Flexible Server improves #performance, offers better value, and uses an easy #migration…

Liked by Roger Barga
#DataSaturday is back in Big D!

Liked by Roger Barga
